There are a few things I’ve taken from that article on first reading:
- I was substantially correct in my understanding of how multidimensional matrices and neural networks are used (a minimal sketch of that idea follows this list). While unsurprising given the amount of reading I’ve done on various approaches to AI over the last several decades, it’s still gratifying to feel that I actually learned something from all that reading.
- I saw nothing in there to argue against my thesis that things like ChatGPT may be doing for intelligence what evolutionary biology has done to creationism. In the case of evolution, it has forced creationists to fall back on a “God of the Gaps” whose gaps grow ever smaller. ChatGPT et al. have me thinking that any attribution of mind or intelligence to “mystery”, the supernatural, or whatever hand-waving is in vogue is, or soon will be, consigned to ever smaller gaps. That is, it is incorrect to claim that intelligence, human or otherwise, is currently unexplainable and will forever remain so.
- The fact that we cannot easily work out exactly how a particular input was transformed into a particular output strikes me as a “fake problem.” Given the scale of operations, the difficulty of following a single throughline is no different from that of many other processes we have developed. Who can say which molecules go where in an oil refinery? We have only a process that is shown useful in the lab and then scaled beyond comprehension in industry. Except that it’s not actually beyond comprehension, because everything we need to know is described by the process, validated at small scales, and shown to produce statistically similar, useful results at large scales. Asking questions about individual molecules is asking the wrong questions. So it is with LLMs and transformers: the “how it works” lies in being able to describe and validate the process, not in being able to track and understand individual changes between input and output.
- Although not explicitly addressed, the “hallucinatory” results we occasionally see may have more in common with the ordinary cognitive failures we are all subject to than with anything that can be labelled as broken. Each of us has in our background something that got misclassified in ways that, when combined with how we process information, lead to wild conclusions. That is why we have learned to compare and contrast our results with the results of others, and have even formalized that activity in science. So it may be necessary to apply that same activity (compare and contrast) to other systems, including the ones built into our brains.
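To make the first point above concrete, here is a minimal sketch (mine, not the article’s) of the core idea: a neural-network layer is essentially a matrix multiply plus a simple nonlinearity, stacked a few times. All of the sizes and values below are made up purely for illustration.

```python
# Minimal sketch: a two-layer network is just matrix multiplies and a nonlinearity.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 3 hidden units, 2 outputs.
W1 = rng.normal(size=(4, 3))   # layer-1 weights
b1 = np.zeros(3)               # layer-1 biases
W2 = rng.normal(size=(3, 2))   # layer-2 weights
b2 = np.zeros(2)               # layer-2 biases

def forward(x):
    """One forward pass: matrix multiply, ReLU, matrix multiply."""
    h = np.maximum(0, x @ W1 + b1)   # hidden activations
    return h @ W2 + b2               # output values

x = rng.normal(size=4)   # a made-up input vector
print(forward(x))        # two output numbers
```

Training just adjusts those weight matrices so the outputs improve; the “multidimensional matrices” are nothing more mysterious than W1 and W2 scaled up enormously.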
Anyway, some pseudorandom babbling that I hope is at least as useful as a hallucinating AI.
redcalcium@lemmy.institute 1 year ago
The current approach to ML model development has the same vibe as someone writing a block of code that somehow works and then adding a comment like “no idea why, but it works; modify at your own risk.”
Jumper775@lemmy.world 1 year ago
Perhaps we could see even greater improvements if we stopped and looked at how this works. Eventually we will need to, as there is a limit to how much real text exists.