Comment on How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms [TLDR: 25%]

rekabis@lemmy.ca 1 week ago

As I pointed out in another root comment, the average hallucination rate, depending on the model being tested, tends to sit between 60% and 80%.

So this opens up an interesting option for users: hallucinations/inaccuracies can be controlled for, and potentially reduced by as much as two-thirds (from roughly 75% down to the 25% found here), simply by restricting the model to documents/resources the user is absolutely certain contain the correct answer.

I mean, 25% is still stupidly high. In any prior era, even 2.5% would have been an unacceptable error rate for a business. But source restriction seems to be a somewhat promising guardrail.
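
For what it's worth, here's a rough sketch of what that guardrail could look like in practice, assuming an OpenAI-style chat API. The document store, model name, and prompt wording are all placeholders I made up for illustration, not anything from the paper:

```python
# Sketch of "source-restricted" document Q&A: only documents the user has
# vetted go into the context, and the model is told to answer strictly
# from them. Model name and vetted_docs are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()

# Only documents the user is certain contain the correct answer.
vetted_docs = {
    "refund_policy.md": "Refunds are issued within 30 days of purchase...",
}

def ask(question: str) -> str:
    context = "\n\n".join(
        f"[{name}]\n{text}" for name, text in vetted_docs.items()
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,        # deterministic-ish decoding
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer ONLY from the documents below. If the answer "
                    "is not present in them, reply 'not found'.\n\n"
                    + context
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What is the refund window?"))
```

The "reply 'not found'" escape hatch matters: without an explicit out, models tend to fabricate an answer even when the vetted documents don't contain one.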
