Comment on Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.

lath@lemmy.world 7 hours ago

I don’t know why you assume there has to be a gotcha; maybe it’s the competitive background. Anyway, it’s visual because you look at it to see it. And it’s not best human performance vs. best LLM performance; it’s best controlled performance, because the testing is limited to a set of parameters.
