Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

891 likes

Submitted 5 weeks ago by Allah@lemm.ee to technology@lemmy.world

https://archive.is/bJvH4


Comments

  • BlaueHeiligenBlume@feddit.org 5 weeks ago

    Of course; that’s obvious to anyone with a basic knowledge of neural networks, no?

    • Endmaker@ani.social 5 weeks ago

      I still remember Geoff Hinton’s criticisms of backpropagation.

      IMO it is still remarkable what NNs managed to achieve: some form of emergent intelligence.

  • melsaskca@lemmy.ca 5 weeks ago

    It’s all “one instruction at a time” regardless of high processor speeds and words like “intelligent” being bandied about. “Reason” discussions should fall into the same query bucket as “sentience”.

    • MangoCats@feddit.it 5 weeks ago

      My impression of LLM training and deployment is that it’s massively parallel in nature - it could be implemented one instruction at a time, but in practice it isn’t.
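
      As a toy illustration (assuming NumPy, purely to make the point): the same arithmetic can be computed one instruction at a time or handed to the hardware as a single vectorized op.

      ```python
      import numpy as np

      # A tiny stand-in for the kind of math NN inference is built on.
      x = np.random.rand(1024)
      w = np.random.rand(1024)

      # One instruction at a time: a sequential multiply-accumulate loop.
      acc = 0.0
      for i in range(len(x)):
          acc += x[i] * w[i]

      # Massively parallel-friendly: a single call the hardware can spread
      # across many execution units at once.
      assert np.isclose(acc, np.dot(x, w))
      ```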

  • Harbinger01173430@lemmy.world 5 weeks ago

    XD So, like a regular school/university student who just wants to get passing grades?

  • Blaster_M@lemmy.world 5 weeks ago

    Would like a link to the original research paper, instead of a link to a screenshot of a screenshot.

    • Allah@lemm.ee 5 weeks ago

      machinelearning.apple.com/…/illusion-of-thinking

  • Xatolos@reddthat.com 5 weeks ago

    So, what you’re saying here is that the A in AI actually stands for artificial, and it’s not really intelligent and reasoning.

    Huh.

    • coolmojo@lemmy.world 5 weeks ago

      The AI stands for Actually Indians /s

  • crystalmerchant@lemmy.world 5 weeks ago

    I mean… is that not reasoning, I guess? It’s what my brain does: recognizes patterns and makes split-second decisions.

    • mavu@discuss.tchncs.de 5 weeks ago

      Yes, this comment seems to indicate that your brain does work that way.

  • MangoCats@feddit.it 5 weeks ago

    It’s not just the memorization of patterns that matters, it’s the recall of appropriate patterns on demand. Call it what you will, even if AI is just a better librarian for search work, that’s value - that’s the new Google.

    • cactopuses@lemm.ee 5 weeks ago

      While that’s a fair idea, there are still two issues with it: hallucinations and the cost of running the models.

      Unfortunately, it takes significant compute resources to produce even simple responses, and those responses can be totally made up while still looking completely real. Sure, it’s gotten much better, but blindly trusting these things (which many people do) can have serious consequences.

      • MangoCats@feddit.it 5 weeks ago

        Hallucinations and the cost of running the models.

        So, inaccurate information in books is nothing new. Agreed that the rate of hallucinations needs to decline, a lot, but there has always been a need for a veracity filter - just because it comes from “a book” or “the TV” has never been an indication of absolute truth, even though many people stop there and assume it is. In other words: blind trust is not a new problem.

        The cost of running the models is an interesting one - how does it compare with publication on paper to ship globally to store in environmentally controlled libraries which require individuals to physically travel to/from the libraries to access the information? What’s the price of the resulting increased ignorance of the general population due to the high cost of information access?

        What good is a bunch of knowledge stuck behind a search engine when people don’t know how to access it, or access it efficiently?

        Granted, search engines already get us 95% (IMO) of the way from paper libraries to what AI is almost succeeding in being today, but ease of access to information has tremendous value - and developing ways to easily access the information available on the internet is a very valuable endeavor.

        Personally, I feel more emphasis should be put on establishing the veracity of the information before we go making all the garbage easier to find.

        I also worry that “easy access” to automated interpretation services is going to lead to a bunch of information encoded in languages that most people don’t know because they’re dependent on machines to do the translation for them. As an example: shiny new computer language comes out but software developer is too lazy to learn it, developer uses AI to write code in the new language instead…

  • MuskyMelon@lemmy.world 5 weeks ago

    I use LLMs as advanced search engines. Far fewer ads and sponsored results.

    • Dojan@pawb.social 5 weeks ago

      There are search engines that do this better.

      • auraithx@lemmy.dbzer0.com 5 weeks ago

        Like what?

        I don’t think there’s any search engine better than Perplexity. And for scientific research Consensus is miles ahead.

    • Kyrgizion@lemmy.world 5 weeks ago

      There are ads but they’re subtle enough that you don’t recognize them as such.

  • NostraDavid@programming.dev 5 weeks ago

    OK, and? A car doesn’t run like a horse either, yet cars are still very useful.

    I’m fine with the distinction between human reasoning and LLM “reasoning”.

    • fishy@lemmy.today 5 weeks ago

      The guy selling the car doesn’t tell you it runs like a horse, but the guy selling you AI is telling you it has reasoning skills. AI absolutely has utility; the guys making it are saying its utility is nearly limitless, because Tesla has demonstrated there’s no actual penalty for lying to investors.

    • Brutticus@midwest.social 5 weeks ago

      Then use a different word. “AI” and “reasoning” make people think of Skynet, which is what the weird tech bros want the lay person to think of. LLMs do not “think”, but that’s not to say I couldn’t be persuaded of their utility. But that’s not the way they are being marketed.

    • technocrit@lemmy.dbzer0.com 5 weeks ago

      Cars are horses. How do you feel about that statement?

  • 1rre@discuss.tchncs.de 5 weeks ago

    The difference between reasoning models and normal models is that reasoning models work in two steps. To oversimplify it a little, they first prompt “how would you go about responding to this?” and then prompt “write the response”.

    It’s still predicting the most likely thing to come next, but the difference is that it gives the model the chance to write the most likely instructions to follow for the task, and then the most likely result of following those instructions - both of which conform to patterns much better than a single jump from prompt to response.
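
    A minimal sketch of that two-step idea (assuming a hypothetical call_llm() helper rather than any particular vendor API):

    ```python
    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for whatever chat-completion API is actually used."""
        raise NotImplementedError

    def two_step_answer(task: str) -> str:
        # Step 1: have the model write the most likely instructions for the task.
        plan = call_llm(f"How would you go about responding to this?\n\n{task}")
        # Step 2: have it write the most likely result of following those instructions.
        return call_llm(f"Task:\n{task}\n\nFollow these instructions:\n{plan}")
    ```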

    • kescusay@lemmy.world 5 weeks ago

      But it still manages to fuck it up.

      I’ve been experimenting with using Claude’s Sonnet model in Copilot in agent mode for my job, and one of the things that’s become abundantly clear is that it has certain types of behavior that are heavily represented in the model, so it assumes you want that behavior even if you explicitly tell it you don’t.

      Say you’re working in a yarn workspaces project, and you instruct Copilot to build and test a new dashboard using an instruction file. You’ll need to include explicit and repeated reminders all throughout the file to use yarn, not NPM, because even though yarn is very popular today, there are so many older examples of using NPM in its model that it’s just going to assume that’s what you actually want - thereby fucking up your codebase.

      I’ve also had lots of cases where I tell it I don’t want it to edit any code, just to analyze and explain something that’s there and how to update it… and then I have to stop it from editing code anyway, because halfway through it forgot that I didn’t want edits, just explanations.

      • spankmonkey@lemmy.world 5 weeks ago

        I’ve also had lots of cases where I tell it I don’t want it to edit any code, just to analyze and explain something that’s there and how to update it… and then I have to stop it from editing code anyway, because halfway through it forgot that I didn’t want edits, just explanations.

        I find it hilarious that the only people these LLMs mimic are the incompetent ones. I had a coworker who would constantly change things when asked just to explain them.

      • riskable@programming.dev 5 weeks ago

        To be fair, the world of JavaScript is such a clusterfuck… Can you really blame the LLM for needing constant reminders about the specifics of your project?

        When a programming language has five hundred bazillion absolutely terrible ways of accomplishing a given thing—and endless absolutely awful code examples on the Internet to “learn from”—you’re just asking for trouble. Not just from trying to get an LLM to produce what you want but also trying to get humans to do it.

        This is why LLMs are so fucking good at writing Rust and Python: there’s only so many ways to do a thing, and the larger community pretty much always uses the same solutions.

        JavaScript? How can it even keep up? You’re using yarn today, but in a year you’ll probably be like, “fuuuuck, this code is garbage… I need to convert this all to <new thing>.”

    • technocrit@lemmy.dbzer0.com 5 weeks ago

      The difference between reasoning models and normal models is that reasoning models work in two steps.

      If I weren’t a grifter, I would call them two-step models instead of introducing misleading anthropomorphic terminology.

  • Grizzlyboy@lemmy.zip 5 weeks ago

    What a dumb title. I proved it by asking a series of questions. It’s not AI, stop calling it AI, it’s a dumb af language model. Can you get a ton of help from it, as a tool? Yes! Can it reason? NO! It never could and for the foreseeable future, it will not.

    It’s phenomenal at patterns, much much better than us meat peeps. That’s why they’re accurate as hell when it comes to analyzing medical scans.

  • hornedfiend@sopuli.xyz 5 weeks ago

    While I hate LLMs with a passion, and my opinion of them boils down to them being glorified search engines and data scrapers, I would ask Apple: how sour are the grapes, eh?

  • Naich@lemmings.world 5 weeks ago

    So they have worked out that LLMs do what they were programmed to do in the way that they were programmed? Shocking.

  • burgerpocalyse@lemmy.world 5 weeks ago

    hey, I can’t recognize patterns, so they’re smarter than me at least

  • FourWaveforms@lemm.ee 5 weeks ago

    WTF do they think reasoning is?

  • WorldsDumbestMan@lemmy.today 5 weeks ago

    It has so much data that it might as well be reasoning. It helped me with my problem, after all.
