Comment on Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.
That’s what games are? I really don’t see how it is an unfair comparison to you. How would you change it?
lath@lemmy.world 13 hours ago
Stress test it. Low, average, high, impairment conditions, safeguards off, order, chaos and everything in between.
gnufuu@infosec.pub 12 hours ago
I haven’t read all of their Benchmark introduction and Technical Documentation. I assume you have and didn’t find any of the tests you’re asking for?