SuspciousCarrot78@lemmy.world 1 week ago

Sorry; brain fart. That could have been clearer.

On a single call, only 11 out of 53 LLMs got it right (~20%). Humans: about 71.5% got it right (so roughly 28.5%, more than 1 in 4, gave the incorrect answer).

Of those 11 LLMs, 5 got it right every time across repeated tests: Claude Opus 4.6, Gemini 2.0 Flash Lite, Gemini 3 Flash, Gemini 3 Pro and Grok-4.
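For anyone who wants to double-check the percentages, here's a quick sketch that uses only the counts quoted in this comment (11 of 53 models right on one call, 5 right every time, 71.5% human accuracy). The variable names are just illustrative:

```python
# Counts taken from the comment; nothing else is assumed.
llm_total = 53
llm_correct_single_call = 11
llm_correct_every_time = 5
human_accuracy = 0.715

# Share of models that answered correctly on a single call.
llm_accuracy = llm_correct_single_call / llm_total
# Share of all models that answered correctly on every repeat.
llm_consistent = llm_correct_every_time / llm_total
# Share of humans who gave the wrong answer.
human_error = 1 - human_accuracy

print(f"LLMs, single call: {llm_accuracy:.1%}")
print(f"LLMs, every time:  {llm_consistent:.1%}")
print(f"Humans wrong:      {human_error:.1%}")
```

Run as-is, it prints roughly 20.8%, 9.4% and 28.5%.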

source