Comment on What steps can be taken to prevent AI training and scraping of my public facing website?

riskable@programming.dev 1 day ago

We learned this lesson in the 90s: If you put something on the (public) Internet, assume it will be scraped (and copied and used in various ways without your consent). If you don’t want that, don’t put it on the Internet.

There are all sorts of clever things you can do to prevent scraping, but none of them are 100% effective and all of them have negative tradeoffs.
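To make that concrete, here is a rough sketch of one of those measures, a `robots.txt` rule, checked with Python's standard `urllib.robotparser`. The file contents, the `allowed` helper, and the example URL are all made up for illustration; `GPTBot` is just one example of a crawler token you might list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one named crawler to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/post"  # placeholder URL
    # A crawler that identifies itself honestly is told to stay out.
    print(allowed("GPTBot", page))
    # Anything that sends a browser-like User-Agent falls under the "*" rule.
    print(allowed("Mozilla/5.0", page))
```

This only works if the crawler both reads the file and identifies itself honestly; `robots.txt` is a request, not an enforcement mechanism. Anything stricter, such as blocking by IP range or putting content behind a login or a challenge page, starts costing you legitimate visitors and search indexing, which is the tradeoff.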

For reference, the big AI players aren’t scraping the Internet to train their LLMs anymore. That creates too many problems, not the least of which is making yourself vulnerable to poisoning. If an AI-related bot is scraping your content at this point, it’s either run by amateurs or it’s just indexing the content like Google would (or both), so the AI knows where to find it without having to rely on third parties like Google.

Remember: Scraping the Internet is everyone’s right. Trying to stop it is futile and only benefits the biggest of the big search engines/companies.
