“Humans score 100%. Frontier AI scores 0.26%.”
The title deals in absolutes.
They didn’t say “100% of humans can solve this benchmark”, they said “humans can solve 100% of this benchmark”.
Those are high scores.
🤔 So this is a visual comparison between peak performance of some humans and peak performance of current LLMs in a controlled environment?
Is this a gotcha? Not sure where you got the “visual” from, but yes, it is best human performance vs. best LLM performance.
rimu@piefed.social 10 hours ago
I couldn’t get past the second level :(
ExLisper@lemmy.curiana.net 8 hours ago
Guy, I found the bot!
MagicShel@lemmy.zip 8 hours ago
I see by your lack of pluralization that you’ve realized there’s only one person here and everyone else is bots. However, through inference and deduction, you are therefore also a bot. I have good reason to believe I am the non-bot, though I wonder if I could know for certain…
That was a lot of effort for a typo joke…
ExLisper@lemmy.curiana.net 8 hours ago
My programming tells me I’m not a bot.
verdi@tarte.nuage-libre.fr 9 hours ago
feelsbadman. You need more RAM!