Comment on What steps can be taken to prevent AI training and scraping of my public facing website?
Nephalis@discuss.tchncs.de 1 day ago
Isn’t fail2ban an option too? I created a filter for ChatGPT and a few others, and it seems to be working. My Radicale server is my only freely accessible service, but it comes with a small web GUI, so the bots showed up. I have no clue whether a bot gets a fraction of your site each time it shows up, but the ban seems to happen within 300 ms, if I remember correctly. So it wouldn’t get much information…
With maxretry set to 1, it bans on the first match.
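For anyone curious what such a filter boils down to, here is a rough Python sketch of the same idea, not the actual fail2ban filter from this comment. It assumes an access log in the common "combined" format, and the user-agent strings in `BOT_PATTERN` are illustrative guesses, not the list used above. `MAX_RETRY = 1` mirrors fail2ban's `maxretry = 1`, so the first matching line is enough for a ban.

```python
import re

# Assumed user-agent substrings; not taken from the original filter.
BOT_PATTERN = re.compile(r"GPTBot|ChatGPT-User|CCBot|ClaudeBot", re.IGNORECASE)

# Combined log format: ip - - [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"$')

MAX_RETRY = 1  # mirrors fail2ban's maxretry = 1


def find_bans(lines, max_retry=MAX_RETRY):
    """Return the set of IPs whose bot-like hits reach max_retry."""
    hits = {}
    banned = set()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match or not BOT_PATTERN.search(match.group("agent")):
            continue  # unparsable line or ordinary-looking user agent
        ip = match.group("ip")
        hits[ip] = hits.get(ip, 0) + 1
        if hits[ip] >= max_retry:
            banned.add(ip)
    return banned


# Made-up sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [01/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.0"',
    '198.51.100.4 - - [01/Jan/2025:10:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"',
]
print(find_bans(sample))
```

Only the first sample address ends up in the ban set, because the second one sends an ordinary browser user agent. That is exactly the limit of this approach: it only catches clients that identify themselves.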
JustTesting@lemmy.hogru.ch 1 day ago
A big issue is that this only works for bots that announce themselves as such. Lots of them pretend to be regular users, with fake user agents and IPs picked from a random pool, each IP sending only about 1-3 requests/day but adding up to many thousands of requests overall. In my experience, a lot of them come from Huawei and Tencent cloud ASNs.
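To make the problem concrete, here is a small Python sketch with made-up traffic. It assumes 3000 distinct addresses that each send 2 requests, all from one network. Grouping by a /16 prefix is only a crude stand-in for a real ASN lookup, which would need an external database; the function names and numbers are hypothetical.

```python
from collections import Counter
import ipaddress


def per_ip_counts(ips):
    """Requests per individual address, which is what a per-IP ban rule sees."""
    return Counter(ips)


def per_network_counts(ips, prefix_len=16):
    """Requests per enclosing network; a rough proxy for per-ASN totals."""
    counts = Counter()
    for ip in ips:
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts


# Hypothetical traffic: 3000 distinct addresses in one /16, 2 requests each.
requests = []
for i in range(3000):
    ip = f"198.18.{i // 250}.{i % 250 + 1}"
    requests.extend([ip, ip])

print(max(per_ip_counts(requests).values()))        # highest count for any single IP
print(per_network_counts(requests).most_common(1))  # busiest network overall
```

No single address ever exceeds 2 requests, so a per-IP threshold never fires, while the network as a whole accounts for all 6000 requests. Blocking by range or ASN catches this pattern, but it can also lock out legitimate users hosted in the same network.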
Nephalis@discuss.tchncs.de 1 day ago
Yes, if that is true (and I am not that surprised by it), it is nearly impossible to block them this way.