Comment on Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

<- View Parent
kescusay@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

But it still manages to fuck it up.

I’ve been experimenting with using Claude’s Sonnet model in Copilot in agent mode for my job, and one of the things that’s become abundantly clear is that it has certain types of behavior that are heavily represented in the model, so it assumes you want that behavior even if you explicitly tell it you don’t.

Say you’re working in a yarn workspaces project, and you instruct Copilot to build and test a new dashboard using an instruction file. You’ll need to include explicit and repeated reminders all throughout the file to use yarn, not NPM, because even though yarn is very popular today, there are so many older examples of using NPM in its model that it’s just going to assume that’s what you actually want - thereby fucking up your codebase.

I’ve also had lots of cases where I tell it I don’t want it to edit any code, just to analyze and explain something that’s there and how to update it… and then I have to stop it from editing code anyway, because halfway through it forgot that I didn’t want edits, just explanations.

source
Sort:hotnewtop