Comment on I use Zip Bombs to Protect my Server
sugar_in_your_tea@sh.itjust.works 6 hours ago
How do you tell scrapers from regular traffic?
Bishma@discuss.tchncs.de 6 hours ago
Most often because they don’t download any of the CSS or external JS files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded in a time-series database. I used an ELK stack back in the day.
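As a rough illustration of that first heuristic, here’s a minimal log-scan sketch. It assumes combined-format access logs and flags any client that loads plenty of pages but never fetches CSS/JS; the threshold, log path, and field layout are assumptions, not anything from the comment above:

```python
# Minimal sketch: flag IPs that request pages but never CSS/JS assets.
# Assumes nginx/Apache combined log format; threshold is arbitrary.
import re
from collections import defaultdict

LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)')

def suspected_scrapers(log_path, min_pages=10):
    pages = defaultdict(int)   # page-like requests per client IP
    assets = defaultdict(int)  # CSS/JS requests per client IP
    with open(log_path) as f:
        for line in f:
            m = LOG_LINE.match(line)
            if not m:
                continue
            ip, path = m.groups()
            if re.search(r'\.(css|js)(\?|$)', path):
                assets[ip] += 1
            else:
                pages[ip] += 1
    # Browsers pull stylesheets and scripts; headless scrapers often don't.
    return [ip for ip, n in pages.items() if n >= min_pages and assets[ip] == 0]

print(suspected_scrapers("/var/log/nginx/access.log"))
```

In practice you’d also window this by time and exempt known-good crawlers, which is where loading the logs into a time-series database pays off.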
sugar_in_your_tea@sh.itjust.works 6 hours ago
That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?
Bishma@discuss.tchncs.de 5 hours ago
My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house).
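The haproxy side of that pattern can be as simple as an ACL file that the analysis tooling rewrites and the proxy re-reads on reload. A sketch under those assumptions (file path, names, and the 403 response are all made up, not their actual config):

```haproxy
frontend web
    bind :443 ssl crt /etc/haproxy/certs/
    # offenders.lst holds one IP or CIDR per line,
    # maintained by the external analysis pipeline
    acl offender src -f /etc/haproxy/offenders.lst
    http-request deny deny_status 403 if offender
    default_backend app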
sugar_in_your_tea@sh.itjust.works 5 hours ago
Dang, I was hoping for a FOSS project that would do most of the heavy lifting for me. Maybe such a thing exists, idk, but it would be pretty cool to have a pluggable system that analyzes activity and tags connections w/ some kind of identifier, so I could configure a web server to send nonsense (to poison AI scrapers), serve zip bombs (for bots that don’t respect resources), or redirect to a honeypot (for malicious actors).
A quick search didn’t yield anything immediately, but I wasn’t that thorough. I’d be interested if anyone knows of such a project that’s pretty easy to play with.
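For the zip-bomb piece specifically, the commonly described trick is a pre-compressed file served with a Content-Encoding: gzip header, so a naive client inflates it. A hypothetical nginx sketch, assuming some external analyzer maintains a file of offender IPs (every path and name below is an assumption):

```nginx
# Build the bomb once, e.g.:
#   dd if=/dev/zero bs=1M count=10240 | gzip -9 > /srv/bombs/10G.html.gz
geo $offender {
    default 0;
    include /etc/nginx/offenders.conf;  # lines like "203.0.113.7 1;"
}

server {
    listen 80;

    location / {
        if ($offender) {
            rewrite ^ /bomb last;
        }
        # ... normal site config ...
    }

    location = /bomb {
        internal;
        gzip off;                          # don't double-compress
        add_header Content-Encoding gzip;  # client inflates ~10 MB to ~10 GB
        default_type text/html;
        alias /srv/bombs/10G.html.gz;
    }
}
```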