Comment on “How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms” [TLDR: 25%]

rekabis@lemmy.ca · 1 week ago

> How much do large language models actually hallucinate when answering questions grounded in provided documents?

Okay, this is looking promising, in that the most important qualification is plainly stated in the opening line.

Because the rate of hallucinations/inaccuracies “in the wild” (depending on the model being tested) runs about 60-80%. But then again, that reflects average use on generalized data sets, not questions grounded in specific documentation, so of course the “in the wild” questions will see a higher rate.

This also helps users, as it suggests that hallucinations/inaccuracies can be cut by as much as ⅔ simply by restricting an LLM to specific documentation the user is certain contains the desired information (see the quick check below).
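For what it's worth, here is a back-of-envelope check of that two-thirds figure. It assumes the ~60-80% in-the-wild range claimed above and the ~25% grounded rate from the paper's TLDR; these inputs are rough estimates, not the study's own comparison:

```python
# Rough check of the "reduced by two-thirds" claim.
# Assumes: ~60-80% hallucination rate in the wild (commenter's estimate)
# and ~25% for document-grounded Q&A (the paper's TLDR figure).
wild_low, wild_high = 0.60, 0.80
grounded = 0.25

for wild in (wild_low, wild_high):
    reduction = (wild - grounded) / wild  # relative reduction vs. in-the-wild rate
    print(f"in the wild {wild:.0%} -> grounded {grounded:.0%}: "
          f"{reduction:.0%} relative reduction")
# Prints roughly 58% and 69%; at the 70% midpoint it is ~64%, i.e. about two-thirds.
```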

Very interesting!
