Comment on Large language models, explained with a minimum of math and jargon

redcalcium@lemmy.institute 11 months ago

We love this example because it illustrates just how difficult it will be to fully understand LLMs. The five-member Redwood team published a 25-page paper explaining how they identified and validated these attention heads. Yet even after they did all that work, we are still far from having a comprehensive explanation for why GPT-2 decided to predict Mary as the next word.

The current approach to ML model development has the same vibe as people writing a block of code that somehow works and then adding comments like "no idea why but it works, modify at your own risk".

source