As I pointed out in another root comment, the average - depending on the model being tested - tends to sit between 60% and 80%.
So this opens up an interesting option for users, in that hallucinations/inaccuracies can be controlled for and potentially reduced by as much as ⅔ simply by restricting the model to those documents/resources that the user is absolutely certain contain the correct answer.
I mean, 25% is still stupidly high. In any prior era, even 2.5% would have been an unacceptable error rate for a business. But source-restriction seems to be a somewhat promising guardrail to use.
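For what it's worth, here's a rough sketch of what that source-restriction looks like in practice: the prompt tells the model to answer only from the supplied excerpts and to refuse otherwise. The `call_llm` function is just a stand-in for whatever completion API you're actually using; nothing here is tied to a specific vendor or model.

```python
from typing import Callable

def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Build a prompt that restricts the model to the supplied documents only."""
    sources = "\n\n".join(
        f"[Source {i + 1}]\n{doc}" for i, doc in enumerate(documents)
    )
    return (
        "Answer the question using ONLY the sources below. "
        "If the answer is not in the sources, reply exactly: NOT FOUND.\n\n"
        f"{sources}\n\nQuestion: {question}\nAnswer:"
    )

def answer_from_docs(
    question: str,
    documents: list[str],
    call_llm: Callable[[str], str],  # placeholder for your actual LLM client
) -> str:
    reply = call_llm(build_grounded_prompt(question, documents))
    # Treat a refusal as "no grounded answer", not as a real answer.
    return reply if reply.strip() != "NOT FOUND" else "No grounded answer found."
```

The refusal path matters as much as the answer path: without it, the model will happily fill the gap with something plausible-sounding.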
jacksilver@lemmy.world 1 week ago
Thanks for providing the actual numbers.
I think one of the more concerning things is: what if you think the answer is in the documents you provided, but it actually isn't? What you think is a low error rate could actually be a much higher one.