Comment on How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms [TLDR: 25%]

rekabis@lemmy.ca 1 week ago

As I pointed out in another root comment, the average hallucination rate, depending on the model being tested, tends to sit between 60% and 80%.

So this opens up an interesting option for users: hallucinations/inaccuracies can be controlled for, and potentially reduced by as much as two-thirds (from roughly 75% down to the 25% found here), simply by restricting the model to documents/resources the user is absolutely certain contain the correct answer.

I mean, 25% is still stupidly high. In any prior era, even 2.5% would have been an unacceptable error rate for a business. But source restriction seems to be a somewhat promising guardrail.
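
For what it's worth, here's a rough sketch of what that guardrail could look like in practice, assuming an OpenAI-style chat API. The document store, model name, and prompt wording are all placeholders I made up for illustration, not anything from the paper:

```python
# Sketch of "source-restricted" document Q&A: only documents the user has
# vetted go into the context, and the model is told to answer strictly
# from them. Model name and vetted_docs are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()

# Only documents the user is certain contain the correct answer.
vetted_docs = {
    "refund_policy.md": "Refunds are issued within 30 days of purchase...",
}

def ask(question: str) -> str:
    context = "\n\n".join(
        f"[{name}]\n{text}" for name, text in vetted_docs.items()
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,        # deterministic-ish decoding
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer ONLY from the documents below. If the answer "
                    "is not present in them, reply 'not found'.\n\n"
                    + context
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What is the refund window?"))
```

The "reply 'not found'" escape hatch matters: without an explicit out, models tend to fabricate an answer even when the vetted documents don't contain one.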
