That simply isn’t true. There’s nothing in common between an LLM and a search engine, except insofar as the people developing the LLM had access to search engines, and may have used them while gathering training data.
DarkCloud@lemmy.world 1 day ago
“Data gathering” and “training data” are just what they have you calling it.
It’s not data gathering, it’s stealing. It’s not training data, it’s our original work.
MartianSands@sh.itjust.works 1 day ago
You’re putting words in my mouth, and inventing arguments I never made.
I didn’t say anything about whether the training data is stolen or not. I also didn’t say a single word about intelligence, or originality.
I haven’t been tricked into using one piece of language over another. I’m a software engineer and know enough about how these systems actually work to reach my own conclusions.
There is no database tucked away anywhere in the LLM that you could search through to find the phrases it was trained on. It simply doesn’t exist.
That isn’t to say it’s completely impossible for an LLM to spit out something which formed part of the training data, but it’s pretty rare. 99% of what it generates doesn’t come from anywhere in particular, and you wouldn’t find it in any of the sources which were fed to the model in training.
DarkCloud@lemmy.world 1 day ago
It’s searched in training, tagged for recall, then that info is filtered through layers. So it’s pre-searched, if you will. Same thing as meta tags.
Then the data is processed into cells that queries flow through during generation.
That doesn’t matter. It’s still using copyrighted works.
Anyways, you’re an AI stan, and defending theft. You can deny it all day, but it’s what you’re doing. “It’s okay, I’m a software engineer, I’m allowed to defend it”
…as if that stops you from also being a dumbass.
MartianSands@sh.itjust.works 1 day ago
You’re still putting words in my mouth.
I never said they weren’t stealing the data.
I didn’t comment on that at all, because it’s not relevant to the point I was actually making: treating the output of an LLM as if it were derived from any factual source at all is really problematic, because it isn’t.