I actually love the data-poisoning approach. I think that sort of strategy is going to be an unfortunately necessary part of the future of the web.
Comment on AI companies are violating a basic social contract of the web and ignoring robots.txt
BrianTheeBiscuiteer@lemmy.world 9 months ago
If it doesn't get queried, that's the fault of the web scraper. You don't need JS built into the robots.txt file either. Just add a line like:
Disallow: /here-there-be-dragons.html
Any client that hits that page (and maybe doesn’t pass a captcha check) gets banned. Or even better, they get a long stream of nonsense.
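A minimal sketch of that idea, assuming nginx in front of the site and something like fail2ban tailing the honeypot log; the hostname, paths, and log location are placeholders:

# robots.txt: well-behaved crawlers are told to stay away
User-agent: *
Disallow: /here-there-be-dragons.html

# nginx: anything that requests the page anyway has ignored robots.txt,
# so log it separately (a fail2ban jail watching honeypot.log can then ban the IP)
server {
    listen 80;
    server_name example.com;

    location = /here-there-be-dragons.html {
        access_log /var/log/nginx/honeypot.log;
        return 403;
    }
}

Returning 403 is the polite version; swapping that for a slow stream of junk is what the replies below get at.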
gravitas_deficiency@sh.itjust.works 9 months ago
4am@lemm.ee 9 months ago
server {
    server_name herebedragons.example.com;
    root /dev/random;
}
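As written this wouldn't actually stream randomness: nginx appends the request URI to root, and its static module only serves regular files, so hits would just 404 rather than crash. A rough sketch of something closer to the intent, assuming a large junk file is pre-generated first (names and paths are made up):

# generate ~1 GB of noise once, e.g.:
#   base64 /dev/urandom | head -c 1G > /var/www/dragons/here-there-be-dragons.html
server {
    server_name herebedragons.example.com;
    root /var/www/dragons;

    location = /here-there-be-dragons.html {
        default_type text/html;
        limit_rate 4k;    # trickle the nonsense out to keep the scraper busy
    }
}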
PlexSheep@feddit.de 9 months ago
Nice idea! Better use /dev/urandom though, as that is non-blocking. See here.
aniki@lemm.ee 9 months ago
That was really interesting. I always used urandom out of habit and wondered what the difference was.
aniki@lemm.ee 9 months ago
I wonder if Nginx would just load /dev/random into memory and crash if you did this.