Comment on Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.

tomalley8342@lemmy.world 2 months ago

They didn’t say “100% of humans can solve this benchmark”, they said “humans can solve 100% of this benchmark”.

rimu@piefed.social 2 months ago
I couldn’t get past the second level :(

- ExLisper@lemmy.curiana.net 2 months ago
  Guy, I found the bot!
  
  - MagicShel@lemmy.zip 2 months ago
    I see by your lack of pluralization that you’ve realized there’s only one person here and everyone else is a bot. However, through inference and deduction, you are therefore also a bot. I have good reason to believe I am the non-bot, though I wonder if I could know for certain…
    
    That was a lot of effort for a typo joke…
    
    ExLisper@lemmy.curiana.net 2 months ago
    My programming tells me I’m not a bot.
    
- verdi@tarte.nuage-libre.fr 2 months ago
  feelsbadman. You need more RAM!
  
lath@lemmy.world 2 months ago
“Humans score 100%. Frontier AI scores 0.26%.”

The title deals in absolutes.

- davidgro@lemmy.world 2 months ago
  Those are high scores.
  
  - lath@lemmy.world 2 months ago
    🤔 So this is a visual comparison between peak performance of some humans and peak performance of current LLMs in a controlled environment?
    
    floquant@lemmy.dbzer0.com 2 months ago
    Is this a gotcha? Not sure where you got the “visual” from, but yes, it is best human performance vs. best LLM performance.