
Comment on Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

But it still manages to fuck it up.

I’ve been experimenting with using Claude’s Sonnet model in Copilot in agent mode for my job, and one of the things that’s become abundantly clear is that it has certain types of behavior that are heavily represented in the model, so it assumes you want that behavior even if you explicitly tell it you don’t.

Say you’re working in a Yarn workspaces project, and you instruct Copilot to build and test a new dashboard using an instruction file. You’ll need to include explicit and repeated reminders all throughout the file to use Yarn, not npm, because even though Yarn is very popular today, the model has seen so many older examples that use npm that it’s just going to assume that’s what you actually want, thereby fucking up your codebase.
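
For what it’s worth, reminders in the instruction file can be backed up with a mechanical check. Here’s a minimal sketch, assuming you can get at the shell commands the agent proposes as plain strings before they run. `findNpmCommands` is a hypothetical helper I made up for illustration; it isn’t part of Copilot, Yarn, or npm, and the regex only covers simple command lines.

```typescript
// Hypothetical helper (not part of Copilot, Yarn, or npm): flag npm/npx
// invocations in a list of shell commands proposed for a Yarn project.
function findNpmCommands(commands: string[]): string[] {
  // Match `npm` or `npx` as a command word at the start of the string or
  // right after a shell separator (&&, ||, ;, |). Package names such as
  // `npm-run-all` passed as arguments are deliberately not matched.
  const npmInvocation = /(^|&&|\|\||;|\|)\s*(npm|npx)(\s|$)/;
  return commands.filter((cmd) => npmInvocation.test(cmd));
}

// Example input: commands an agent might propose (made up for illustration).
const proposed: string[] = [
  "yarn install",
  "npm install react",
  "yarn workspace dashboard test",
  "cd packages/dashboard && npm run build",
];

// Print only the commands that would need to be rejected or rewritten.
console.log(findNpmCommands(proposed));
```

With the sample list above, the two commands that call npm get flagged and the two Yarn commands pass. It doesn’t stop the model from forgetting, but it catches the mistake before it touches the lockfile.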

I’ve also had lots of cases where I tell it I don’t want it to edit any code, just to analyze and explain something that’s there and how to update it… and then I have to stop it from editing code anyway, because halfway through it forgot that I didn’t want edits, just explanations.


spankmonkey@lemmy.world 7 months ago

I’ve also had lots of cases where I tell it I don’t want it to edit any code, just to analyze and explain something that’s there and how to update it… and then I have to stop it from editing code anyway, because halfway through it forgot that I didn’t want edits, just explanations.

I find it hilarious that the only people these LLMs mimic are the incompetent ones. I had a coworker who constantly changed things when asked only to explain them.

riskable@programming.dev 7 months ago
To be fair, the world of JavaScript is such a clusterfuck… Can you really blame the LLM for needing constant reminders about the specifics of your project?

When a programming language has five hundred bazillion absolutely terrible ways of accomplishing a given thing—and endless absolutely awful code examples on the Internet to “learn from”—you’re just asking for trouble. Not just from trying to get an LLM to produce what you want but also trying to get humans to do it.

This is why LLMs are so fucking good at writing Rust and Python: There’s only so many ways to do a thing and the larger community pretty much always uses the same solutions.

JavaScript? How can it even keep up? You’re using yarn today but in a year you’ll probably be like, “fuuuuck this code is garbage… I need to convert this all to <new thing>.”

- kescusay@lemmy.world 7 months ago
  That’s only part of the problem. Yes, JavaScript is a fragmented clusterfuck. TypeScript is leagues better, but by no means perfect. Still, that doesn’t explain why the LLM can’t recall that I’m using Yarn while it’s processing the instruction that specifically told it to use Yarn. Or why it tries to start editing code when I tell it not to. Those issues aren’t specific to the language.
  