lath@lemmy.world 13 hours ago
Biased study. Take any average person off the streets and shove this thing in their face. That 100% notion will go down fast.
pulsewidth@lemmy.world 12 hours ago
Pretty defensive there. It’s not even a study.
lath@lemmy.world 11 hours ago
If it studies something, it’s a study. If you feel defensiveness, you consider aggression. If you feel bias in one way, someone can feel bias in another way. If there’s an action, there’s a reaction.
gnufuu@infosec.pub 5 hours ago
If you feel defensiveness, you consider aggression.
Aggression as in calling something biased without providing evidence?
brianpeiris@lemmy.ca 12 hours ago
ARC-AGI-3 Launch event - Shared publicly live on March 25 in San Francisco at Y Combinator HQ, featuring a fireside conversation between François Chollet (creator, ARC-AGI) and Sam Altman (CEO, OpenAI) on measuring intelligence on the path to AGI.
François Chollet is a software engineer, artificial intelligence (AI) researcher, and former Senior Staff Engineer at Google. Chollet is the creator of the Keras deep-learning library released in 2015.
tomalley8342@lemmy.world 12 hours ago
They didn’t say “100% of humans can solve this benchmark”, they said “humans can solve 100% of this benchmark”.
rimu@piefed.social 12 hours ago
I couldn’t get past the second level :(
ExLisper@lemmy.curiana.net 10 hours ago
Guy, I found the bot!
MagicShel@lemmy.zip 10 hours ago
I see by your lack of pluralization that you’ve realized there’s only one person here and everyone else is bots. However, through inference and deduction, you are therefore also a bot. I have good reason to believe I am the non-bot, though I wonder if I could know for certain…
That was a lot of effort for a typo joke…
verdi@tarte.nuage-libre.fr 11 hours ago
feelsbadman. You need more RAM!
lath@lemmy.world 11 hours ago
“Humans score 100%. Frontier AI scores 0.26%.”
The title deals in absolutes.
davidgro@lemmy.world 11 hours ago
Those are high scores.
lath@lemmy.world 10 hours ago
🤔 So this is a visual comparison between peak performance of some humans and peak performance of current LLMs in a controlled environment?