lath@lemmy.world 14 hours ago
🤔 So this is a visual comparison between peak performance of some humans and peak performance of current LLMs in a controlled environment?
floquant@lemmy.dbzer0.com 13 hours ago
Is this a gotcha? Not sure where you got the “visual” from, but yes, it is best human performance vs best LLM performance.
lath@lemmy.world 13 hours ago
I don’t know why you assume there has to be a gotcha; maybe it’s the competitive background… Anyway, it’s visual because you look at it to see it. And it’s not best human performance vs best LLM performance; it’s best controlled performance, because the testing is limited to a set of parameters.
floquant@lemmy.dbzer0.com 13 hours ago
That’s what games are? I really don’t see how it is an unfair comparison to you. How would you change it?