Beyond proving hallucinations were inevitable, the OpenAI research revealed that industry evaluation methods actively encouraged the problem. An analysis of popular benchmarks, including GPQA, MMLU-Pro, and SWE-bench, found that nine out of ten major evaluations used binary grading that penalized “I don’t know” responses while rewarding confident but incorrect answers.
“We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty,” the researchers wrote.
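To make the incentive concrete, here is a minimal sketch (my own illustration, not from the paper; the function name and the `wrong_penalty` parameter are hypothetical) of why binary grading rewards guessing: under 0/1 scoring, a guess with any nonzero chance of being right has a higher expected score than abstaining, which always scores zero.

```python
def expected_score(p_correct: float, abstain: bool,
                   wrong_penalty: float = 0.0) -> float:
    """Expected score for one question under a simple grading scheme.

    p_correct: the model's chance of guessing the right answer.
    abstain: if True, the model answers "I don't know".
    wrong_penalty: points deducted for a wrong answer
                   (0.0 reproduces plain binary grading).
    """
    if abstain:
        return 0.0
    # Expected value of guessing: reward for a hit minus penalty for a miss.
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Binary grading: even a 10% shot at the answer beats abstaining.
print(expected_score(0.10, abstain=False))   # 0.10
print(expected_score(0.10, abstain=True))    # 0.00

# A hypothetical penalty for wrong answers flips the incentive
# for low-confidence guesses.
print(expected_score(0.10, abstain=False, wrong_penalty=0.5))  # -0.35
```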
I just wanna say I called this out nearly a year ago: lemmy.zip/comment/13916070
Guntrigger@sopuli.xyz 6 hours ago
One of these days, the world will no longer reward bullshitters, human or AI. And society will benefit greatly.
essell@lemmy.world 1 hour ago
No it won’t
People talk nonsense a lot.
Both because they’re lying and because they believe nonsense that’ll never happen.
Your comment is itself evidence that your comment is wrong, but I don’t have enough to tell whether you know that or not.
SapphironZA@sh.itjust.works 5 hours ago
The Lion was THIS big and kept me in that tree all day. And that is why I did not bring back any prey.
Ignore the smell of fermented fruit on my breath.