Comment on The Collapse of GPT: Will future artificial intelligence systems perform increasingly poorly due to AI-generated material in their training data?

andallthat@lemmy.world 1 day ago

Basically, model collapse happens when the training data no longer matches real-world data

I’m more concerned about LLMs collapsing the whole idea of a “real world”.

I’m not a machine intelligence expert, but I do get the basic concept of training a model and then evaluating its output against real data. The whole thing rests on the idea that you train a model on relatively small samples of the real world and keep a big, clearly distinct “real world” against which to check the model’s performance.
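A toy sketch of that feedback loop (my own illustration, not from the comment or the article, with made-up parameters): fit a simple Gaussian “model” to data, then train each next generation only on samples the previous model generated. With no fresh real-world data coming in, the estimated spread tends to drift and shrink, which is the basic mechanism behind model collapse.

```python
import random
import statistics

random.seed(0)

# Generation 0 trains on "real world" data: Gaussian, mean 0, std 1.
SAMPLE_SIZE = 20  # deliberately small so estimation error compounds
data = [random.gauss(0, 1) for _ in range(SAMPLE_SIZE)]

stds = []
for generation in range(500):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    stds.append(sigma)
    # Each later generation trains ONLY on the previous model's output,
    # so estimation errors feed back into the next fit.
    data = [random.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]

print(f"estimated std, generation 0:   {stds[0]:.4f}")
print(f"estimated std, generation 499: {stds[-1]:.4f}")
```

With real data mixed back in at every generation, the estimate would stay anchored; fed purely on its own output, the model’s picture of the world degenerates.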

If LLMs have already ingested basically all the information in the “real world”, and their output is so pervasive that you can’t easily tell what’s true and what’s AI-generated slop, then “how do we train our models now” is not my main concern.

As an example, take the judges who caught lawyers citing made-up cases generated by an LLM. What happens if those made-up cases get referenced in several other places, including some legal textbooks used in law schools? Don’t they become part of the “real world”?
