Comment on ClockBench: Even the best AI models can't reliably read the clock

SnoringEarthworm@sh.itjust.works 2 days ago

ClockBench evaluates whether models can read analog clocks - a task that is trivial for humans, but current frontier models struggle with.

What do you mean, trivial? Most humans I know can’t read even the most basic white-background, big-black-numbers clocks.

Someone rigged the jury to get 90% on this:

Image
