It’s searched in training, tagged for recall, then that info is filtered through layers. So it’s pre-searched, if you will. Same thing as meta tags.
Then the data is processed into cells that queries flow through during generation.
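(To make the "data baked into the model, queries flowing through it" idea concrete: here is a toy sketch of probabilistic generation, using a bigram word model rather than a real neural network. The corpus, function names, and structure are all made up for illustration; real LLMs learn continuous weights, not lookup tables.)

```python
import random

# Toy bigram "language model": training counts word-to-next-word
# transitions (the stand-in for learned weights), and generation
# samples from those counts rather than retrieving stored text.

def train(corpus):
    """Record, for each word, the words that followed it in training."""
    model = {}
    words = corpus.split()
    for word, nxt in zip(words, words[1:]):
        model.setdefault(word, []).append(nxt)
    return model

def generate(model, start, length, rng):
    """Sample a continuation word by word; output is probabilistic."""
    out = [start]
    for _ in range(length):
        choices = model.get(out[-1])
        if not choices:
            break
        out.append(rng.choice(choices))
    return " ".join(out)

model = train("the cat sat on the mat the cat ran on the grass")
print(generate(model, "the", 5, random.Random(0)))
```

The sampled output mixes fragments of the training text in new combinations, which is the sense in which the model's generation is "probabilistic" rather than a direct lookup.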
99% of what it generates doesn’t come from anywhere in particular, and you wouldn’t find it in any of the sources which were fed to the model in training.
That doesn’t matter. It’s still using copyrighted works.
Anyways, you’re an AI stan defending theft. You can deny it all day, but it’s what you’re doing. “It’s okay, I’m a software engineer, I’m allowed to defend it”
…as if that doesn’t stop you from also being a dumbass.
MartianSands@sh.itjust.works 1 day ago
You’re still putting words in my mouth.
I never said they weren’t stealing the data.
I didn’t comment on that at all, because it’s not relevant to the point I was actually making: that treating the output of an LLM as if it were derived from any factual source is really problematic, because it isn’t.
DarkCloud@lemmy.world 20 hours ago
I’m sorry, but the discussion was never about factuality. You said “search engine.” They are in fact searching and reconstructing data based on a probabilistic data space.
…and there are plenty of examples of search engines being sued for the types of data they’ve explored or digitized.