cross-posted from: programming.dev/post/37407786
ClockBench evaluates whether models can read analog clocks - a task that is trivial for humans, but current frontier models struggle with.
What do you mean trivial? Most humans I know can’t read the most basic white-background-big-black-numbers clocks.
Someone rigged the jury to get 90% on this:
MHLoppy@fedia.io 3 weeks ago
The human level accuracy is less than 90%!?
panda_abyss@lemmy.ca 3 weeks ago
Some of those don’t have tick marks. I hate clocks like that, they’re difficult to read.
I’m surprised it’s near 90; a whole generation has grown up with digital clocks everywhere.
CouldntCareBear@sh.itjust.works 3 weeks ago
Have a look at the clock faces they’re using to benchmark and it’ll make more sense.
MHLoppy@fedia.io 3 weeks ago
Really wish they published the whole dataset. They don't specify on the page or in the paper what the full set was like, and the GitHub repo only has one of the easy-to-read ones. If >=10% of the set consists of clock faces designed not to be readable then fair enough.
Smith6612@lemmy.world 2 weeks ago
You’d be surprised. I did a super quick skim of the article, and didn’t see any mention of age group. There’s a lot of talk about how the newer generation struggles to read analog clocks, because many of them grew up with digital clocks.
There are also analog clocks that are just awful to read anyways.