Comment on AI trained on AI garbage spits out AI garbage.

vrighter@discuss.tchncs.de 6 months ago

And that is exactly how a predictive text algorithm works.

Some tokens go in.
They are processed by a deterministic, static statistical model, and a set of probabilities (always the same for the same input; deterministic, remember?) comes out.
Pick the token with the highest probability, append it to your initial string and start over.
If you want variety, add some randomness and don’t always pick the most probable next token.
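
The loop described above can be sketched in a few lines of Python. This is a toy illustration, not how any real model is implemented: `TOY_MODEL`, `next_token_probs` and `generate` are made-up names, and the hard-coded table only stands in for the trained model.

```python
import random

# Hypothetical stand-in for the trained model: maps the last token to a
# fixed probability distribution over next tokens. Same input, same output.
TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.3, "end": 0.1},
    "cat": {"sat": 0.7, "end": 0.3},
    "dog": {"sat": 0.5, "end": 0.5},
    "sat": {"end": 1.0},
}


def next_token_probs(tokens):
    """Deterministic: the probabilities depend only on the input tokens."""
    return TOY_MODEL[tokens[-1]]


def generate(prompt, max_new_tokens=10, rng=None):
    """Greedy decoding when rng is None, otherwise sample from the distribution."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        if rng is None:
            # Always pick the most probable next token.
            token = max(probs, key=probs.get)
        else:
            # Add variety: draw a token in proportion to its probability.
            token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if token == "end":
            break
        tokens.append(token)
    return tokens


print(generate(["the"]))                        # greedy: ['the', 'cat', 'sat']
print(generate(["the"], rng=random.Random(0)))  # sampled, depends on the seed
```

The greedy call returns the same continuation every time. The sampled call can differ between seeds, which is the "variety" from the last step.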

Coincidentally, this is exactly how LLMs work. It’s a big Markov chain, but with a novel lossy compression algorithm on its state transition table.
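
To make the Markov chain comparison concrete, here is a minimal sketch of an explicit state transition table, built by counting which word follows which in a tiny made-up corpus. The corpus and the `build_table` helper are assumptions for illustration only. In the analogy, an LLM would replace the explicit table with a learned function over a much longer context.

```python
from collections import Counter, defaultdict


def build_table(words):
    """Count word -> next-word transitions and normalise them to probabilities."""
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return {
        state: {word: n / sum(nexts.values()) for word, n in nexts.items()}
        for state, nexts in counts.items()
    }


# Made-up corpus, just large enough to have a repeated state.
corpus = "the cat sat on the mat and the cat ran".split()
table = build_table(corpus)

# Looking up a state returns the same distribution every time.
print(table["the"])
```

Here "the" is followed by "cat" twice and "mat" once, so the row for "the" gives "cat" about two thirds of the probability and "mat" about one third.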


CeeBee_Eh@lemmy.world 6 months ago

> Coincidentally, this is exactly how LLMs work

Everyone who says this doesn’t actually understand how LLMs work.

Multi-vector word embeddings create emergent relationships, which amount to new knowledge that doesn’t exist in the training dataset.

Computerphile did a good video on this well before the LLM craze.

- vrighter@discuss.tchncs.de 6 months ago
  1 - A Markov chain only takes previous tokens as input.
  
  2 - It uses a function (in the mathematical sense, so the same input gives the same output, completely stateless) to generate a set of probabilities for what the next token might be.
  
  3 - The most probable token is picked, or randomness (temperature) is introduced here to occasionally choose a different token.
  
  An LLM’s internals, the part that’s trained, is literally the function used in step 2. You could implement this function in a number of ways, e.g. you could build a huge table and consult it. Or you could generate it somehow. You could train a big neural network that takes previous tokens as input and outputs token probabilities. It can be very smart and notice correlations, but ultimately it generates a (virtual) huge static table. This is a completely deterministic process. A trained NN is still a (huge) mathematical function. So the big network that they spend resources training is basically the function used in step 2.
  
  Step 3 is the cause of hallucinations. It’s the only nondeterministic part. No matter how much smarter the neural network gets, the hallucinations are introduced mainly in step 3. So no, they won’t be solving the LLM hallucination problem anytime soon.
  