So the tl;dr is just what we already knew: LLMs predict the most likely word to come next and have no concept of “true” or “false” information.
Indeed, having such a concept would require understanding that information, and any AI that actually understood information wouldn’t be an LLM, because LLMs are just fancy autocorrect.
XLE@piefed.social 6 days ago
There’s a bit more to it. Obviously, if a model gets more correct data pumped into it, it’s more likely to produce a correct output. But they found that at the core of every AI model they tested, when an incorrect output came along, certain nodes produced it. And those are some of the nodes from the earliest part of making the model, before data gets added.
So with that in mind, the tl;dr is more like:
AI models have two goals: first be readable, then be correct. It appears the nodes causing incorrect outputs are also the ones intended to make the output readable.