Like the “how many r’s in strawberry” question. It took off as an Internet meme and was eventually fixed, but how did that fix happen?
Yes. Absolutely.
The running joke in the research community is that current LLMs are literally trained on benchmarks and on the common questions people test in LM-Arena, like the “how many r’s in strawberry” one.
I’m not talking speculatively: Meta literally got caught red-handed doing this. They ran a separate finetune just to look good on LM-Arena. And some benchmarks like MMLU contain errors that many LLMs nonetheless answer “correctly”, i.e., matching the flawed answer key.
fmstrat@lemmy.nowsci.com 7 months ago
A lot of answers here, but some are dated, as the “fix” isn’t in the models themselves. MCP is a main fix for items like this. It’s a standardized protocol that lets LLMs talk to tools and data stores, like calculators and dictionaries. That way the tokenization effect doesn’t matter, and system prompts only need a small amount of configuration, which processes much faster.
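To make the idea concrete, here’s a toy sketch of the tool-call pattern that MCP standardizes (MCP itself is JSON-RPC based, but the function names and message shape below are illustrative assumptions, not the real MCP SDK API). Instead of the model “counting” letters across its tokens, it emits a structured call, and the host executes it with real code:

```python
import json

# Hypothetical tool registry -- illustrative only, not the actual MCP SDK.
TOOLS = {
    "count_letters": lambda word, letter: word.lower().count(letter.lower()),
}

def handle_tool_call(request_json: str) -> str:
    """Execute a JSON-RPC-style tool call and return a JSON result."""
    req = json.loads(request_json)
    result = TOOLS[req["tool"]](**req["arguments"])
    return json.dumps({"id": req["id"], "result": result})

# The model emits a structured call rather than guessing from tokens:
call = json.dumps({"id": 1, "tool": "count_letters",
                   "arguments": {"word": "strawberry", "letter": "r"}})
print(handle_tool_call(call))  # the tool counts the r's exactly: 3
```

The point is that the count comes from deterministic code, so tokenization quirks in the model can no longer produce the wrong answer.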