
Comment on Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.

lath@lemmy.world 2 months ago

Biased study. Take any average person off the streets and shove this thing in their face. That 100% notion will go down fast.



tomalley8342@lemmy.world 2 months ago
They didn’t say “100% of humans can solve this benchmark”, they said “humans can solve 100% of this benchmark”.

- rimu@piefed.social 2 months ago
  I couldn’t get past the second level :(
  
  - ExLisper@lemmy.curiana.net 2 months ago
    Guy, I found the bot!
    
    MagicShel@lemmy.zip 2 months ago
    I see by your lack of pluralization that you’ve realized there’s only one person here and everyone else is bots. However, through inference and deduction, you are therefore also a bot. I have good reason to believe I am the non-bot, though I wonder if I could know for certain…
    
    That was a lot of effort for a typo joke…
    
  - verdi@tarte.nuage-libre.fr 2 months ago
    feelsbadman. You need more RAM!
    
- lath@lemmy.world 2 months ago
  “Humans score 100%. Frontier AI scores 0.26%.”
  
  The title deals in absolutes.
  
  - davidgro@lemmy.world 2 months ago
    Those are high scores.
    
    lath@lemmy.world 2 months ago
    🤔 So this is a visual comparison between peak performance of some humans and peak performance of current LLMs in a controlled environment?
    
pulsewidth@lemmy.world 2 months ago
Pretty defensive there. It’s not even a study.

- lath@lemmy.world 2 months ago
  If it studies something, it’s a study. If you feel defensiveness, you consider aggression. If you feel bias in one way, someone can feel bias in another way. If there’s an action, there’s a reaction.
  
  - gnufuu@infosec.pub 2 months ago
    
    “If you feel defensiveness, you consider aggression.”
    
    Aggression as in calling something biased without providing evidence?
    
brianpeiris@lemmy.ca 2 months ago

ARC-AGI-3 Launch event - Shared publicly live on March 25 in San Francisco at Y Combinator HQ, featuring a fireside conversation between François Chollet (creator, ARC-AGI) and Sam Altman (CEO, OpenAI) on measuring intelligence on the path to AGI.

François Chollet is a software engineer, artificial intelligence (AI) researcher, and former Senior Staff Engineer at Google. Chollet is the creator of the Keras deep-learning library released in 2015.
