Comment on OpenAI finally admitted they're crawling the web to profit off of GPT. Block it from your sites using robots.txt.

pjhenry1216@kbin.social 10 months ago

"Anyone". I hate when people use this word knowing full well it isn't true in any meaningful sense. "Nothing is stopping you from spending millions of dollars on your own LLM." Ok.

The web is a bunch of information that is public, sure. People don't have a reasonable expectation of privacy, but they used to have a reasonable expectation that their information would be used in a very specific fashion, especially in the US, where there is a default copyright claim on data. And a web crawler may ignore text that states you can't use the data, even if you include a clause saying that by accessing the data you agree to those terms. That only works against little people: the "anyone" that can't actually just go and build an LLM.
