They’re showing the thinking the model did; the actual response is the sentence at the end.
It’s interesting to see it build the context necessary to answer the question, but this seems like a lot of text just to come up with a simple answer.
Buffy@libretechni.ca 2 days ago
Schadrach@lemmy.sdf.org 2 days ago
The whole premise of deep think and similar features in other models is to come up with an answer, then ask itself whether the answer is right and how it could be wrong, repeating until the result is stable.
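That propose-critique-revise loop can be sketched roughly like this. This is a hypothetical illustration, not any model's real API: `model_answer` and `model_critique` stand in for calls to an LLM.

```python
# Hypothetical sketch of a "deep think" style loop: propose an answer,
# critique it, revise, and stop once the answer stops changing.
# model_answer and model_critique are stand-ins for LLM calls, not a real API.

def deep_think(question, model_answer, model_critique, max_rounds=5):
    answer = model_answer(question, critique=None)  # initial proposal
    for _ in range(max_rounds):
        critique = model_critique(question, answer)  # "how could this be wrong?"
        revised = model_answer(question, critique=critique)
        if revised == answer:  # result is stable; accept it
            break
        answer = revised
    return answer
```

The stability check is why the visible reasoning runs so long: every revision triggers another round of self-critique until nothing changes.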
The seahorse emoji question is one that trips up a lot of models (it’s a Mandela effect thing: the emoji doesn’t exist, but lots of people remember it and are consequently firm that it’s real). I asked GLM 4.7 about it with deep think on, and it wrote about two dozen paragraphs trying to think of everywhere a seahorse emoji could be hiding: whether it was in a previous or upcoming standard, whether another emoji might be mistaken for a seahorse, etc. It eventually decided that it didn’t exist, double-checked that it wasn’t missing anything, and gave an answer.
It was startlingly like the stream of consciousness of someone experiencing the Mandela effect, desperately trying to find evidence they were right, except it eventually gave up and realized the truth.