Comment on Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.

floquant@lemmy.dbzer0.com 1 month ago

That’s what games are? I really don’t see why you consider it an unfair comparison. How would you change it?

lath@lemmy.world 1 month ago
Stress test it. Low, average, high, impairment conditions, safeguards off, order, chaos and everything in between.

- gnufuu@infosec.pub 1 month ago
  I haven’t read all of their Benchmark introduction and Technical Documentation. I assume you have and didn’t find any of the tests you’re asking for?
  