Comment on Elon Musk wants to rewrite "the entire corpus of human knowledge" with Grok
JustAPenguin@lemmy.world 4 days ago
The thing that annoys me most is that there have been studies done on LLMs showing that, when trained on their own generated output, they produce increasingly noisy, degraded results (a toy sketch of the effect follows the sources below).
Sources (unordered):
- What is model collapse?
- AI models collapse when trained on recursively generated data
- Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop
- Collapse of Self-trained Language Models
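To make the mechanism concrete, here is a toy illustration (my own sketch, not code from any of the papers above): fit a simple Gaussian model to data, resample the training set from the fit, and repeat. Each generation trains only on the previous generation's output, and the learned distribution tends to lose variance, i.e. its tails, which is the same qualitative effect the model-collapse papers describe for LLMs.

```python
# Toy model-collapse simulation (illustrative only, not from the cited papers).
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=50)  # generation 0: "real" data

for gen in range(1, 201):
    mu, sigma = data.mean(), data.std()      # "train" a Gaussian on the current data
    data = rng.normal(mu, sigma, size=50)    # next generation is purely synthetic
    if gen % 25 == 0:
        print(f"gen {gen:3d}: mean={mu:+.3f}  std={sigma:.3f}")

# The std tends to drift toward zero across generations: each refit loses a
# little tail information, and training on synthetic samples compounds it.
```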
Whatever nonsense Muskrat is spewing is factually incorrect. He won’t be able to successfully retrain any model purely on generated content; at least not an LLM, not if he wants a successful product. If anything, he’ll end up with a model heavily trained on censored datasets.
brucethemoose@lemmy.world 4 days ago
It’s not so simple; there are successful methods of zero-data ‘self-play’ and other schemes for training on other LLMs’ output, though distillation is probably the only one you’d want for a pretrain specifically.
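For context, a minimal sketch of what distillation means here (my own illustration of a standard soft-label KL setup, not anything specific to Grok's pipeline): the student is trained to match the teacher's full next-token distribution rather than one-hot labels or raw sampled text, which is a much denser training signal than plain synthetic data.

```python
# Illustrative distillation loss: student matches the teacher's softened
# output distribution via KL divergence (temperature and shapes are arbitrary).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student next-token distributions."""
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    # batchmean KL, scaled by t^2 so gradient magnitudes stay comparable
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t * t)

# toy usage: a batch of 4 positions over a 32k-token vocabulary
student_logits = torch.randn(4, 32_000, requires_grad=True)
teacher_logits = torch.randn(4, 32_000)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

The point is that the student learns from the teacher's probabilities rather than from recursively sampled text, which is why distillation avoids the worst of the self-consuming-loop problem.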