Comment on Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

<- View Parent
auraithx@lemmy.dbzer0.com ⁨22⁩ ⁨hours⁩ ago

Your conflating surface-level architectural limits with core functional behaviour. Yes, an LLM is deterministic at temperature 0 and produces the same output for the same input, but that does not make it equivalent to a Markov chain. A Markov chain defines transitions based on fixed-order memory and static probabilities. An LLM generates output by applying a series of matrix multiplications, activations, and attention-weighted context aggregations across multiple layers, where the representation of each token is conditioned on the entire input sequence, not just on recent tokens.

While the model has a maximum token limit, it does not receive a fixed-length input filled with nulls. It processes variable-length input sequences up to the context limit, and attention masks control which positions are used. These are not hardcoded state transitions; they are dynamically computed weightings over continuous embeddings, where meaning arises from the interaction of tokens, not from simple position or order alone.

Saying that output diversity is just randomness misunderstands why random sampling exists: to explore the rich distribution the model has learned from data, not to fake intelligence. The depth of its output space comes from how it models relationships, hierarchies, syntax, and semantics through training. Markov chains do not do any of this. They map sequences to likely next symbols without modeling internal structure. An LLM’s output reflects high-dimensional reasoning over the prompt. That behavior cannot be reduced to fixed transition logic.

source
Sort:hotnewtop