Comment on AI companies are violating a basic social contract of the web and ignoring robots.txt

lvxferre@mander.xyz ⁨7⁩ ⁨months⁩ ago

Good old honeytrap. I’m not sure, but I think that it’s doable.

Have a honeytrap page somewhere on your website. Make sure that legit users won’t access it. Disallow crawling that page through robots.txt.
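A minimal sketch of what that robots.txt entry could look like, assuming the hidden page lives at /honeytrap/ (a hypothetical path; use whatever you actually hide):

```
User-agent: *
Disallow: /honeytrap/
```

Well-behaved crawlers will skip the path entirely, so any request that does hit it is a strong signal the client ignored robots.txt.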

Then if some crawler still accesses it, you can simply record the hit in your logs and ban the relevant IPs. Or you could be really nasty and fill the page with poison - nonsensical text that looks like something a human would write.
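The log-and-ban step could be sketched like this, assuming a server access log in the common/combined format (client IP first, request line in quotes) and the same hypothetical /honeytrap/ path as above:

```python
import re

# Hypothetical honeytrap path; adjust to whatever page you hide.
HONEYTRAP_PATH = "/honeytrap/"

# Matches the client IP and the requested path from a common/combined
# log line, e.g.: 1.2.3.4 - - [date] "GET /honeytrap/ HTTP/1.1" 200 123
LOG_PATTERN = re.compile(r'^(\S+) .*"(?:GET|POST|HEAD) (\S+)')

def offending_ips(log_lines):
    """Return the set of client IPs that requested the honeytrap page."""
    ips = set()
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if m and m.group(2).startswith(HONEYTRAP_PATH):
            ips.add(m.group(1))
    return ips
```

The resulting set could then be fed into whatever ban mechanism you use (firewall rules, fail2ban, a server-level deny list).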
