Comment on Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

auraithx@lemmy.dbzer0.com 3 weeks ago

This argument collapses the entire distinction between parametric modeling and symbolic lookup. Yes, the weights are fixed after training, but the key point is that an LLM does not store or retrieve a state transition table. It learns a function that approximates the probability of the next token given a sequence, rather than memorizing discrete transitions. What appears to be a “table” is actually a deep, distributed representation compressed into continuous weight matrices. It is not indexing state transitions; it is computing probabilities from patterns in the input space.
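To make that contrast concrete, here is a toy Python sketch. Everything in it is invented for illustration (a five-word vocabulary, 8-dimensional embeddings, random untrained weights): the Markov chain is literally a key lookup into an explicit table, while the parametric model computes a distribution over the whole vocabulary from continuous vectors.

```python
import random
import numpy as np

# A literal Markov chain: an explicit state -> next-token probability table.
transitions = {
    ("the",): {"cat": 0.5, "dog": 0.5},
    ("cat",): {"sat": 1.0},
}

def markov_next(state):
    # Pure lookup: the state must already exist as a key in the table.
    tokens, probs = zip(*transitions[state].items())
    return random.choices(tokens, probs)[0]

# A parametric next-token model: no table anywhere, just continuous
# embeddings and a projection (random here, i.e. untrained toy values).
vocab = ["the", "cat", "dog", "sat", "ran"]
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))   # token embeddings
W = rng.normal(size=(8, len(vocab)))   # output projection

def parametric_next_dist(context_ids):
    # Compress the context into one continuous vector, then compute a
    # softmax distribution over the whole vocabulary. Nothing is indexed;
    # the probabilities are computed from the geometry of the input.
    h = E[context_ids].mean(axis=0)
    logits = h @ W
    p = np.exp(logits - logits.max())
    return p / p.sum()

print(markov_next(("the",)))                  # table lookup
print(parametric_next_dist([0, 1]).round(3))  # function evaluation
```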

A true Markov chain defines transition probabilities over explicit states. An LLM embeds tokens into high-dimensional vectors, then transforms them repeatedly using self-attention and feedforward layers that can capture subtle syntactic, semantic, and structural features. These features interact in nonlinear ways that go far beyond what any finite transition table could express. You cannot meaningfully represent an LLM’s behavior as a finite Markov model, even in principle, because its representations are not enumerable states but regions of a continuous latent space.
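For the self-attention step specifically, here is a minimal single-head sketch in plain NumPy (again with arbitrary toy dimensions and random, untrained weights). Every position mixes information from every other position through continuous similarity scores, which has no analogue in following an edge out of a discrete state:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d) continuous token vectors, not discrete states.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax: each position attends to every position,
    # weighted by learned (here: random) similarity.
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

rng = np.random.default_rng(1)
d = 8
X = rng.normal(size=(4, d))                 # 4 tokens embedded in R^8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```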

Saying “you just need all token combinations in a table” ignores that the model generalizes to combinations never seen during training. That is the core of its power. It doesn’t look up learned transitions; it constructs responses by interpolating through an embedding space guided by attention and weight structure. No Markov chain does this. A lossy compressor of a transition table still implies a symbolic map; a neural network is a differentiable function trained to fit a distribution, not to encode it explicitly.
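A toy version of that difference (same caveats: the weights are random, so the numbers are meaningless and only the mechanism matters): a table simply has no entry for an unseen context, while a parametric model maps any context into embedding space and always returns a full distribution.

```python
import numpy as np

# A transition table has nothing to say about an unseen context:
table = {("the", "cat"): {"sat": 1.0}}
print(table.get(("the", "dog")))    # None: no entry, no prediction

# A parametric model embeds ANY context and always yields a distribution,
# even for combinations it was never listed for. (Toy random weights.)
vocab = ["the", "cat", "dog", "sat"]
rng = np.random.default_rng(2)
E = rng.normal(size=(len(vocab), 8))
W = rng.normal(size=(8, len(vocab)))
h = E[[0, 2]].mean(axis=0)          # "the dog": absent from the table above
p = np.exp(h @ W)
p /= p.sum()
print(dict(zip(vocab, p.round(3))))
```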
