Comment on Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

auraithx@lemmy.dbzer0.com ⁨11⁩ ⁨months⁩ ago

Unlike Markov models, modern LLMs use transformers that attend to full contexts, enabling them to simulate structured, multi-step reasoning (albeit imperfectly). While they don’t initiate reasoning like humans, they can generate and refine internal chains of thought when prompted, and emerging frameworks (like ReAct or Toolformer) allow them to update working memory via external tools. Reasoning is limited, but not physically impossible; it’s evolving beyond simple pattern-matching toward more dynamic and compositional processing.
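
For readers who want to see what "attend to full contexts" means mechanically, here is a minimal sketch of scaled dot-product attention weights in plain Python. It is a simplification under stated assumptions: the 2-dimensional embeddings are made up, the same vectors serve as queries and keys, and there are no learned projections, multiple heads, or causal masking, all of which real transformers have. The names `softmax` and `attention_weights` are illustrative only.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(embeddings):
    """Scaled dot-product attention weights.

    Assumption: embeddings double as queries and keys (no learned projections).
    Row i holds how strongly token i weighs every token in the sequence.
    """
    d = len(embeddings[0])
    weights = []
    for q in embeddings:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]
        weights.append(softmax(scores))
    return weights

# Two toy sequences that differ only in the middle token (hypothetical values).
seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[1.0, 0.0], [2.0, 0.0], [1.0, 1.0]]

w_a = attention_weights(seq_a)
w_b = attention_weights(seq_b)
```

Comparing `w_a[2]` with `w_b[2]` shows that the last token's weights over the sequence are recomputed from the input itself rather than read from a fixed table, which is the property the comment above is pointing at. Whether that amounts to reasoning is a separate question the sketch does not settle.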


vrighter@discuss.tchncs.de ⁨11⁩ ⁨months⁩ ago
previous input goes in. Completely static, prebuilt model processes it and comes up with a probability distribution.

There is no “unlike Markov chains”. They are Markov chains. Ones with a long context (a Markov chain also makes use of all the context provided to it, so I don’t know what you’re on about there). LLMs are just a (very) lossy compression scheme for the state transition table. Computed once, applied blindly to any context fed in.
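
To make the "state transition table" picture concrete, here is a minimal sketch of an order-k Markov chain in Python, where the state is the whole k-token context window. The function names `build_table` and `next_distribution` and the toy corpus are assumptions for illustration, not anything from the thread or the paper.

```python
from collections import Counter, defaultdict

def build_table(tokens, k):
    """Count next-token frequencies for every length-k context in the corpus."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - k):
        context = tuple(tokens[i:i + k])
        table[context][tokens[i + k]] += 1
    return table

def next_distribution(table, context):
    """Static lookup: the same context always yields the same distribution."""
    counts = table.get(tuple(context))
    if not counts:
        return {}  # unseen context: an explicit table has no entry for it
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

# Hypothetical toy corpus.
corpus = "the cat sat on the mat and the cat ran".split()
table = build_table(corpus, k=2)

dist = next_distribution(table, ["the", "cat"])
unseen = next_distribution(table, ["a", "dog"])
```

Here `dist` splits evenly between "sat" and "ran", and `unseen` is empty because the table was built once and has no row for that context. An explicit table like this grows with the number of distinct contexts, which is why the comment describes an LLM as a compressed stand-in for it.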

- auraithx@lemmy.dbzer0.com ⁨11⁩ ⁨months⁩ ago
  LLMs are not Markov chains, even extended ones. A Markov model, by definition, relies on a fixed-order history and treats transitions as independent of deeper structure. LLMs use transformer attention mechanisms that dynamically weigh relationships between all tokens in the input—not just recent ones. This enables global context modeling, hierarchical structure, and even emergent behaviors like in-context learning. Markov models can’t reweight context dynamically or condition on abstract token relationships.
  
  The idea that LLMs are “computed once” and then applied blindly ignores the fact that LLMs adapt their behavior based on input. They don’t change weights during inference, true—but they do adapt responses through soft prompting, chain-of-thought reasoning, or even emulated state machines via tokens alone. That’s a powerful form of contextual plasticity, not blind table lookup.
  
  Calling them “lossy compressors of state transition tables” misses the fact that the “table” they’re compressing is not fixed—it’s context-sensitive and computed in real time using self-attention over high-dimensional embeddings. That’s not how Markov chains work, even with large windows.
  
  - vrighter@discuss.tchncs.de ⁨11⁩ ⁨months⁩ ago
    Their input is the context window. Markov chains also use their whole context window. LLMs are a novel implementation that can work with much longer contexts, but as soon as something slides out of the window, it’s forgotten, just like in any other Markov chain.
    
    auraithx@lemmy.dbzer0.com ⁨11⁩ ⁨months⁩ ago
    While both Markov models and LLMs forget information outside their window, that’s where the similarity ends. A Markov model relies on fixed transition probabilities and treats the past as a chain of discrete states. An LLM evaluates every token in relation to every other using learned, high-dimensional attention patterns that shift dynamically based on meaning, position, and structure.
    
    Changing one word in the input can shift the model’s output dramatically by altering how attention layers interpret relationships across the entire sequence. It’s a fundamentally richer computation that captures syntax, semantics, and even task intent, which a Markov chain cannot model regardless of how much context it sees.
    
riskable@programming.dev ⁨11⁩ ⁨months⁩ ago
I’m not convinced that humans don’t reason in a similar fashion. When I’m asked to produce pointless bullshit at work my brain puts in a similar level of reasoning to an LLM.

Think about “normal” programming: An experienced developer (that’s self-trained on dozens of enterprise code bases) doesn’t have to think much at all about 90% of what they’re coding. It’s all bog standard bullshit so they end up copying and pasting from previous work, Stack Overflow, etc because it’s nothing special.

The remaining 10% is “the hard stuff”. They have to read documentation, search the Internet, and then, after all that effort to avoid having to think, they sigh and actually start thinking in order to program the thing they need.

LLMs go through similar motions behind the scenes! Probably because they were created by software developers, but they still fail at that last 10%: The stuff that requires actual thinking.

Eventually someone is going to figure out how to auto-generate LoRAs based on test cases combined with trial and error that then get used by the AI model to improve itself and that is when people are going to be like, “Oh shit! Maybe AGI really is imminent!” But again, they’ll be wrong.

AGI won’t happen until AI models get good at retraining themselves with something better than basic reinforcement learning. In order for that to happen you need the working memory of the model to be nearly as big as the hardware that was used to train it. That, and loads and loads of spare matrix math processors ready to go for handling that retraining.

spankmonkey@lemmy.world ⁨11⁩ ⁨months⁩ ago

“Reasoning is limited”

Most people wouldn’t call zero of something ‘limited’.

- auraithx@lemmy.dbzer0.com ⁨11⁩ ⁨months⁩ ago
  The paper doesn’t say LLMs can’t reason; it shows that their reasoning abilities are limited and collapse under increasing complexity or novel structure.
  
  - technocrit@lemmy.dbzer0.com ⁨11⁩ ⁨months⁩ ago
    
    “The paper doesn’t say LLMs can’t reason”
    
    Authors gotta get paid. This article is full of pseudo-scientific jargon.
    
  - spankmonkey@lemmy.world ⁨11⁩ ⁨months⁩ ago
    I agree with the author.
    
    If these models were truly “reasoning,” they should get better with more compute and clearer instructions.
    
    The fact that they only work up to a certain point despite increased resources is proof that they are just pattern matching, not reasoning.
    
    auraithx@lemmy.dbzer0.com ⁨11⁩ ⁨months⁩ ago
    Performance eventually collapses due to architectural constraints. This mirrors cognitive overload in humans: reasoning isn’t just about adding compute; it requires mechanisms like abstraction, recursion, and memory. The models’ collapse doesn’t prove “only pattern matching”; it highlights that today’s models simulate reasoning in narrow bands but lack the structure to scale it reliably. That is a limitation of implementation, not a disproof of emergent reasoning.
    