Comment on Lutris now being built with Claude AI, developer decides to hide it after backlash

dream_weasel@sh.itjust.works ⁨2⁩ ⁨months⁩ ago

I feel like there needs to be a post (and I don’t want to write it, but maybe I eventually will) that outlines what a model really is. It is not just a statistical text prediction machine unless you are being so loose with the definition of “statistical” that it doesn’t even mean anything anymore.

A decent example of a statistical text prediction machine is the middle word suggested by your phone when you’re using the keyboard. An LLM is not that.

In the most general terms, this kind of language model tokenizes a corpus of text based on a vocabulary (which is probably more than just the words in the dictionary), then uses an embedding model to translate those tokens into vectors of semantic “meaning” that minimize loss in a bidirectional encoding (probably). The result is then trained against a rubric for one or more topic-area questions, retrained for instruction and explainability, retrained with reinforcement learning and human feedback to provide guardrails, and retrained again to make use of supplemental materials not part of the original training corpus (retrieval-augmented generation), then distilled, then probably scaled and fine-tuned against topic areas of choice (like coding or Korean or whatever), and maybe THEN made available to people to use. There are generally even more parts to curriculum learning than that, but it’s a representative-ish start.

My point being that, yes, it would be nuts to pose ANY question to a predictor that says “with 84% probability, the word that most likely follows ‘I really like’ is ‘gooning’ on reddit”, but even Grok is wildly more sophisticated than that, and Grok is terrible.


Vlyn@lemmy.zip ⁨2⁩ ⁨months⁩ ago
The training is sophisticated, but inference is unfortunately really a text prediction machine. Technically token prediction, but you get the idea.

It does this for every single token/word: you input your system prompt, context, and user input, then the output starts.

The

Feed the entire context back in and add the reply “The” at the end.

The capital

Feed everything in again with “The capital”

The capital of

Feed everything in again…

The capital of Austria

…

It literally works like that, which sounds crazy :)
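The loop described above can be sketched roughly as follows. This is only a toy illustration: `toy_next_token`, the `<reply>` and `<eos>` markers, and the canned reply are all made up here and stand in for a real model's forward pass. Real inference stacks typically cache intermediate state between steps rather than recomputing everything, but the token-by-token shape of the loop is the same.

```python
from typing import Callable, List


def toy_next_token(context: List[str]) -> str:
    """Hypothetical stand-in for a model: returns one token given the full context.

    A real model would compute a probability distribution over its vocabulary;
    this toy just replays a canned reply based on how many tokens it has
    already produced after the (made-up) "<reply>" marker.
    """
    canned_reply = ["The", "capital", "of", "Austria", "is", "Vienna", "<eos>"]
    already_generated = len(context) - context.index("<reply>") - 1
    return canned_reply[already_generated]


def generate(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    max_tokens: int = 16,
) -> List[str]:
    """Autoregressive loop: feed the whole context in, append one token, repeat."""
    context = list(prompt) + ["<reply>"]
    output: List[str] = []
    for _ in range(max_tokens):
        token = next_token(context)  # the entire context goes in every step
        if token == "<eos>":         # assumed end-of-sequence marker
            break
        output.append(token)
        context.append(token)        # the new token becomes part of the input
    return output


if __name__ == "__main__":
    question = ["What", "is", "the", "capital", "of", "Austria", "?"]
    print(" ".join(generate(question, toy_next_token)))
```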

The only control you have as a user is over the sampling: temperature, top-k and so on. But that just softens the output distribution and adds randomness, so the model is less deterministic.
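For anyone curious what those knobs do, here is a minimal sketch of temperature and top-k sampling over a handful of made-up scores. The function name `sample_token`, the example logits, and the convention that a temperature of 0 means "always pick the top token" are assumptions for illustration, not any particular library's API.

```python
import math
import random
from typing import Dict, Optional


def sample_token(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one token from raw scores using top-k filtering and temperature."""
    rng = rng or random.Random()

    # Rank candidates from highest to lowest score.
    ranked = sorted(logits.items(), key=lambda item: item[1], reverse=True)

    # top-k: keep only the k best candidates and discard the rest.
    if top_k is not None:
        ranked = ranked[:top_k]

    # Assumed convention: temperature <= 0 means greedy (fully deterministic).
    if temperature <= 0:
        return ranked[0][0]

    # Temperature: dividing scores by T > 1 flattens the distribution,
    # T < 1 sharpens it. Subtract the max before exp for numerical stability.
    scaled = [score / temperature for _, score in ranked]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    tokens = [token for token, _ in ranked]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    example_logits = {"Austria": 3.0, "Australia": 2.5, "Vienna": 0.5}
    print(sample_token(example_logits, temperature=0.7, top_k=2))
```

With `top_k=1` or a temperature of 0 this always returns the highest-scoring token; raising the temperature makes the lower-ranked candidates more likely to show up.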

- dream_weasel@sh.itjust.works ⁨2⁩ ⁨months⁩ ago
  Unless that’s how people are designing front ends for models, it literally DOESN’T work like that. It works like that until you finish training an embedding model with masking-related tasks, but that’s the tip of the iceberg. The input vector, after being tokenized, is ingested wholesale. Now, there’s sometimes funny business to manage the size of a context window effectively, but this isn’t that unless you’re home-rolling and caching your own inputs or something before you give them to the model.
  