Comment on I use Zip Bombs to Protect my Server
sugar_in_your_tea@sh.itjust.works 6 hours ago
How do you tell scrapers from regular traffic?
Bishma@discuss.tchncs.de 6 hours ago
Most often because they don’t download any of the CSS or external JS files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded in a time-series database. I used an ELK stack back in the day.
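As a rough illustration of that first heuristic, here’s a minimal log-scan sketch. It assumes combined-format access logs and flags any client that loads plenty of pages but never fetches CSS/JS; the threshold, log path, and field layout are assumptions, not anything from the comment above:

```python
# Minimal sketch: flag IPs that request pages but never CSS/JS assets.
# Assumes nginx/Apache combined log format; threshold is arbitrary.
import re
from collections import defaultdict

LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)')

def suspected_scrapers(log_path, min_pages=10):
    pages = defaultdict(int)   # page-like requests per client IP
    assets = defaultdict(int)  # CSS/JS requests per client IP
    with open(log_path) as f:
        for line in f:
            m = LOG_LINE.match(line)
            if not m:
                continue
            ip, path = m.groups()
            if re.search(r'\.(css|js)(\?|$)', path):
                assets[ip] += 1
            else:
                pages[ip] += 1
    # Browsers pull stylesheets and scripts; headless scrapers often don't.
    return [ip for ip, n in pages.items() if n >= min_pages and assets[ip] == 0]

print(suspected_scrapers("/var/log/nginx/access.log"))
```

In practice you’d also window this by time and exempt known-good crawlers, which is where loading the logs into a time-series database pays off.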
sugar_in_your_tea@sh.itjust.works 6 hours ago
That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?
Bishma@discuss.tchncs.de 5 hours ago
My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house).
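The haproxy side of that pattern can be as simple as an ACL file that the analysis tooling rewrites and the proxy re-reads on reload. A sketch under those assumptions (file path, names, and the 403 response are all made up, not their actual config):

```haproxy
frontend web
    bind :443 ssl crt /etc/haproxy/certs/
    # offenders.lst holds one IP or CIDR per line,
    # maintained by the external analysis pipeline
    acl offender src -f /etc/haproxy/offenders.lst
    http-request deny deny_status 403 if offender
    default_backend app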
sugar_in_your_tea@sh.itjust.works 5 hours ago
Dang, I was hoping for a FOSS project that would do most of the heavy lifting for me. Maybe such a thing exists, idk, but it would be pretty cool to have a pluggable system that analyzes activity and tags connections w/ some kind of identifier, so I could configure a web server to send nonsense (to poison AI scrapers), serve zip bombs (for bots that don’t respect resources), or redirect to a honeypot (for malicious actors).
A quick search didn’t yield anything immediately, but I wasn’t that thorough. I’d be interested if anyone knows of such a project that’s pretty easy to play with.
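For the zip-bomb piece specifically, the commonly described trick is a pre-compressed file served with a Content-Encoding: gzip header, so a naive client inflates it. A hypothetical nginx sketch, assuming some external analyzer maintains a file of offender IPs (every path and name below is an assumption):

```nginx
# Build the bomb once, e.g.:
#   dd if=/dev/zero bs=1M count=10240 | gzip -9 > /srv/bombs/10G.html.gz
geo $offender {
    default 0;
    include /etc/nginx/offenders.conf;  # lines like "203.0.113.7 1;"
}

server {
    listen 80;

    location / {
        if ($offender) {
            rewrite ^ /bomb last;
        }
        # ... normal site config ...
    }

    location = /bomb {
        internal;
        gzip off;                          # don't double-compress
        add_header Content-Encoding gzip;  # client inflates ~10 MB to ~10 GB
        default_type text/html;
        alias /srv/bombs/10G.html.gz;
    }
}
```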