Lodespawn@aussie.zone 3 days ago
Why is a researcher with a PhD in the social sciences studying the accuracy confidence of predictive text? How has this person gotten to where they are without understanding that LLMs don’t think? Surely that came up when they started even considering this brainfart of a research project?
rc__buggy@sh.itjust.works 3 days ago
Someone has to prove it wrong before it’s actually wrong. Maybe they set out to discredit the bots.
Lodespawn@aussie.zone 3 days ago
I guess, but it’s like proving your phone’s predictive text has confidence in its suggestions regardless of accuracy. Confidence is not an attribute of a math function; they are attributing intelligence to a predictive model.
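For illustration, here’s roughly what that “confidence” amounts to at the token level: a softmax over raw scores. This is a toy sketch, the tokens and numbers are made up, but the mechanics are the point:

```python
import math

# Hypothetical raw scores (logits) a language model might assign
# to candidate next tokens after the prompt "The capital of France is".
logits = {"Paris": 9.1, "Lyon": 4.3, "London": 3.8, "banana": -2.0}

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok}: {p:.3f}")

# The top probability looks like "confidence", but it is just a
# normalized score over learned statistics. There is no self-assessment
# of whether the answer is actually correct.
```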
FanciestPants@lemmy.world 3 days ago
I work in risk management but don’t have a strong understanding of LLM mechanics. “Confidence” is something I quantify in my work, though different terms get associated with it. In modeling outcomes, I might say we have 60% confidence in achieving our budget objectives, while others would express the same result by saying our chances of achieving our budget objective are 60%. Again, I’m not sure if this is what the LLM is doing, but if it is producing a modeled prediction with a CDF of possible outcomes, then reporting its result with 100% confidence means the LLM didn’t model any outcome other than the answer it is providing, which does seem troubling.
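To make that concrete, here’s a toy sketch of how I’d express that kind of confidence from a modeled distribution of outcomes. All numbers are made up; the “confidence” is just the CDF evaluated at the objective:

```python
import random

random.seed(42)

# Hypothetical Monte Carlo model of project cost outcomes:
# roughly normal around $0.96M with a $0.15M standard deviation.
budget = 1.0
samples = [random.gauss(0.96, 0.15) for _ in range(100_000)]

# "Confidence in achieving the budget objective" is the fraction of
# modeled outcomes that come in at or under budget, i.e. the CDF at
# the budget value.
confidence = sum(s <= budget for s in samples) / len(samples)
print(f"P(cost <= budget) = {confidence:.0%}")  # roughly 60%
```

A result reported at 100% confidence would mean every sampled outcome landed on one side, i.e. no alternative outcomes were modeled at all.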
Lodespawn@aussie.zone 3 days ago
Nah, so their definition is the classical “how confident are you that you got the answer right”. If you read the article, they asked a bunch of people and 4 LLMs a bunch of random questions, asked each respondent whether they/it were confident their answer was correct, and then checked the answer. The LLMs initially lined up with people (overconfident), but when they iterated, shared results, and asked further questions, the LLMs’ confidence increased while people’s tended to decrease, mitigating the overconfidence (there’s a toy sketch of that confidence-vs-accuracy check at the end of this comment).
But the study still assumes enough intelligence to review past results and adjust accordingly, while disregarding the fact that an AI isn’t an intelligence; it’s a word prediction model built on a data set of written text tending to infinity. It’s not assessing the validity of its results, it’s predicting what the answer looks like based on all previous inputs. The whole study is irrelevant.
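For what it’s worth, the confidence-vs-accuracy comparison they ran boils down to something like this (toy numbers, nothing from the actual study):

```python
# Toy calibration check: compare stated confidence with observed
# accuracy over a set of question-answer trials. Data is made up.
trials = [
    # (stated_confidence, answered_correctly)
    (0.9, True), (0.9, False), (0.8, True), (0.95, False),
    (0.7, True), (0.85, False), (0.9, True), (0.8, False),
]

mean_conf = sum(c for c, _ in trials) / len(trials)
accuracy = sum(ok for _, ok in trials) / len(trials)

print(f"mean stated confidence: {mean_conf:.0%}")  # 85%
print(f"observed accuracy:      {accuracy:.0%}")   # 50%

# Overconfidence is the gap between the two. The study's claim is
# that people shrink this gap after seeing their results, while the
# LLMs' stated confidence went up instead.
```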