Comment on “I use Zip Bombs to Protect my Server”
Bishma@discuss.tchncs.de 3 weeks ago
When I was serving high-volume sites (that were targeted by scrapers) I had a collection of files in the CDN that contained nothing but the word “no” over and over. Scrapers that barely hit our detection thresholds saw all their requests go to the 50M version. Super-aggressive scrapers got the 10G version. And the scripts that just wouldn’t stop got the 50G version.
It didn’t move the needle on our budget, but hopefully it cost them.
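If anyone wants to try it, something like this (not our exact tooling, just the shape of it; the filenames and tiers are examples) will generate the files:

```python
# Sketch: generate decoy files of repeated "no" at escalating sizes.
# The paths and size tiers here are illustrative, not the originals.
SIZES = {
    "no-50M.txt": 50 * 1024**2,
    "no-10G.txt": 10 * 1024**3,
    "no-50G.txt": 50 * 1024**3,
}

CHUNK = b"no" * (1024 * 1024)  # 2 MiB of "no" per write

for name, target in SIZES.items():
    written = 0
    with open(name, "wb") as f:
        while written < target:
            n = min(len(CHUNK), target - written)
            f.write(CHUNK[:n])
            written += n
```

And since the files are nothing but “no” repeated, they compress to almost nothing, so if the CDN serves them gzipped your egress stays cheap while the scraper still has to materialize the whole thing.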
sugar_in_your_tea@sh.itjust.works 3 weeks ago
How do you tell scrapers from regular traffic?
Bishma@discuss.tchncs.de 3 weeks ago
Most often because they don’t download any of the CSS or external JS files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded into a time-series database. I used an ELK stack back in the day.
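A rough sketch of that first check, assuming standard combined-format access logs (the thresholds and the page/asset heuristic are made up for illustration):

```python
import re
from collections import defaultdict

# Sketch: flag clients that request pages but never the CSS/JS those
# pages reference. Assumes nginx/Apache "combined" log format.
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*"')

pages = defaultdict(int)   # HTML-ish requests per client IP
assets = defaultdict(int)  # CSS/JS requests per client IP

with open("access.log") as log:
    for line in log:
        m = LINE.match(line)
        if not m:
            continue
        ip, path = m.groups()
        if path.endswith((".css", ".js")):
            assets[ip] += 1
        elif "." not in path.rsplit("/", 1)[-1] or path.endswith(".html"):
            pages[ip] += 1

for ip, count in pages.items():
    if count >= 50 and assets[ip] == 0:
        print(f"suspect: {ip} ({count} pages, no assets fetched)")
```

In practice you’d ship the logs into something like ELK and run this kind of check as a query rather than a one-off script.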
sugar_in_your_tea@sh.itjust.works 3 weeks ago
That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?
Bishma@discuss.tchncs.de 3 weeks ago
My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house).
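The haproxy side of that glue can stay thin, though. Something like this can push an offender list into a running instance over haproxy’s runtime socket (the socket path, map file, and tier names are all illustrative):

```python
import socket

# Sketch: feed offender IPs into a running haproxy via its runtime
# socket, assuming haproxy.cfg keys routing off a map, e.g.:
#   http-request set-var(txn.tier) src,map_ip(/etc/haproxy/offenders.map)
# Both paths below are assumptions for the example.
HAPROXY_SOCKET = "/var/run/haproxy.sock"
OFFENDER_MAP = "/etc/haproxy/offenders.map"

def runtime_cmd(cmd: str) -> str:
    """Send one command to the haproxy runtime socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(HAPROXY_SOCKET)
        s.sendall(cmd.encode() + b"\n")
        return s.recv(65536).decode()

# offenders: ip -> tier, as produced by the external analysis tooling
offenders = {"203.0.113.7": "50M", "198.51.100.9": "50G"}
for ip, tier in offenders.items():
    runtime_cmd(f"add map {OFFENDER_MAP} {ip} {tier}")
```

That keeps the analysis tooling out of the request path entirely: haproxy just does a map lookup per request and routes matches to the decoy files.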