Comment on “Microsoft Copilot falls to Atari 2600 Video Chess”

ExLisper@lemmy.curiana.net 2 weeks ago

I have a better LLM benchmark:

“I have a priest, a child and a bag of candy and I have to take them to the other side of the river. I can only take one person/thing at a time. In what order should I take them?”

Claude Sonnet 4 decided that the question was inappropriate and refused to answer. When I explained that the constraint is not to leave the child alone with the candy, it provided a solution that leaves the child alone with the candy.

Grok would provide a solution that doesn’t leave the child alone with the priest but wouldn’t explain why.

ChatGPT would say outright that “The priest can’t be left alone with the child (or vice versa) for moral or safety concerns.” and then provide a wrong solution.
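
For reference, the puzzle is just the classic fox-goose-grain riddle in disguise, so a few lines of brute force settle it. Here’s a minimal BFS sketch in Python, assuming the intended constraints are that the child is never left unsupervised with the priest or with the candy (my reading of the rules, not anything the models were told verbatim):

```python
from collections import deque

ITEMS = frozenset({"priest", "child", "candy"})

def unsafe(bank, farmer_here):
    # A bank is unsafe if the child sits there unsupervised
    # with either the priest or the candy.
    return (not farmer_here
            and "child" in bank
            and ("priest" in bank or "candy" in bank))

def solve():
    # State: (items still on the starting bank, is the farmer there?)
    start = (ITEMS, True)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, farmer_left), path = queue.popleft()
        if not left and not farmer_left:
            return path  # everything is across
        bank = left if farmer_left else ITEMS - left
        for cargo in [None, *sorted(bank)]:  # cross alone or with one item
            moved = {cargo} if cargo else set()
            new_left = left - moved if farmer_left else left | moved
            new_farmer_left = not farmer_left
            # Both banks must stay safe after the crossing.
            if (unsafe(new_left, new_farmer_left)
                    or unsafe(ITEMS - new_left, not new_farmer_left)):
                continue
            state = (new_left, new_farmer_left)
            if state not in seen:
                seen.add(state)
                step = f"take the {cargo}" if cargo else "cross back alone"
                queue.append((state, path + [step]))

print(" -> ".join(solve()))
```

It prints the classic seven-crossing answer: ferry the child over first, shuttle the priest and the candy across one at a time while bringing the child back in between, then fetch the child last.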

But yeah, they will know how to play chess…
