Comment on What steps can be taken to prevent AI training and scraping of my public facing website?

Nephalis@discuss.tchncs.de 5 months ago

Isn’t fail2ban an option too? I created a filter for ChatGPT and some others, and it seems to be working. My Radicale server is my only publicly accessible service, but it comes with a small web GUI, so the bots showed up. I have no clue whether a bot gets a fraction of your site each time it shows up, but the ban seems to happen within 300 ms, if I remember correctly. So it wouldn’t get that much information…

With maxretry set to 1, it bans on the first match.
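
To make the ban-on-first-match idea concrete, here is a minimal Python sketch of the logic. It is not the actual fail2ban filter (which would be a regex in a filter file) and it assumes a combined-format access log where the user agent is the last quoted field. The user-agent list, the `ips_to_ban` function, and the `max_retry` parameter are illustrative names, not something taken from my setup.

```python
import re

# Assumed substrings of self-identifying AI crawlers; extend as needed.
BOT_PATTERN = re.compile(r"GPTBot|ChatGPT-User|CCBot|ClaudeBot", re.IGNORECASE)

# Assumes combined log format: client IP first, user agent as last quoted field.
LINE_RE = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"$')


def ips_to_ban(log_lines, max_retry=1):
    """Return IPs whose bot-like requests reached max_retry matches."""
    hits = {}
    banned = set()
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if not match or not BOT_PATTERN.search(match.group("agent")):
            continue  # unparseable line or ordinary user agent
        ip = match.group("ip")
        hits[ip] = hits.get(ip, 0) + 1
        if hits[ip] >= max_retry:
            banned.add(ip)  # with max_retry=1 the first match is enough
    return banned
```

With `max_retry=1` a single request carrying one of the listed user agents is enough to put the IP in the returned set, which mirrors the behaviour described above.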


JustTesting@lemmy.hogru.ch 5 months ago
A big issue is that this works for bots that announce themselves as such, but there are lots that pretend to be regular users, with fake user agents and IPs selected from a random pool, each IP sending only about 1-3 requests/day but many thousands of requests overall. In my experience a lot of them come from Huawei and Tencent cloud ASNs.
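
Since per-IP counts stay tiny in that pattern, the traffic only becomes visible when requests are grouped by network. Below is a rough Python sketch of that idea using only the standard library. It groups IPv4 addresses by /16 prefix as a crude stand-in for an ASN; a real ASN lookup needs external routing data, and the `requests_per_network` name and the `prefix_len` default are just assumptions for illustration.

```python
import ipaddress
from collections import Counter


def requests_per_network(ips, prefix_len=16):
    """Count requests per enclosing network instead of per single IP.

    A fixed prefix length is only an approximation of provider ranges.
    """
    counts = Counter()
    for ip in ips:
        # strict=False lets us pass a host address and get its network back.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts
```

Feeding it the client IPs from a day of logs and looking at `most_common()` would show whether a few ranges account for most requests even though no single address stands out.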

- Nephalis@discuss.tchncs.de 5 months ago
  Yes, if that is true (and I am not that surprised by it), it is nearly impossible to block them this way.
  