Comment on AI has had zero effect on jobs so far, says Yale study
MiDaBa@lemmy.ml 14 hours ago
AI has been trained on current and past writing, which could be considered plagiarism, depending on whether or not you're asking an AI CEO. My question is: what happens when most writing is done by AI? Do they continue to train it, but now on itself? Will the language models deteriorate at that point?
BarbecueCowboy@lemmy.dbzer0.com 13 hours ago
This is actually a problem a lot of people are working on; the resulting failure is called 'model collapse'. Training AI on existing slop does tend to degrade the model, and is overall a bad time for AI.
luxyr42@lemmy.dormedas.com 13 hours ago
Even discounting the writing quality, we already have AI responses that reference AI hallucinations posted online as fact.
Ilovethebomb@sh.itjust.works 13 hours ago
Also Reddit shitposts; there are some notable examples of that.
Jhex@lemmy.world 14 hours ago
it becomes even dumber… we are already there
nightlily@leminal.space 14 hours ago
That’s part of the reason these models haven’t improved much in the last year or so. They’ve absorbed all of the public-facing internet and whatever copyrighted works they could get away with pirating (pretty much all printed work), and now they’ve hit a brick wall. They haven’t found a way to generate new training content that reinforces a “correct” statistical model without causing model collapse, and I don’t think they ever will. The well (the public internet) is already thoroughly poisoned, so they have to use a snapshot of the pre-LLM internet, and not even an up-to-date one.
If it isn’t good enough after consuming almost the entirety of humanity’s written output since the invention of the printing press, it’s never going to be.
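The collapse dynamic being described can be sketched with a toy experiment (this is an illustration of the statistical idea, not of actual LLM training): fit a simple model to data, generate new "training data" from the fitted model, refit on that output, and repeat. Each round only sees a finite sample of the previous model's output, so the tails of the original distribution get progressively forgotten.

```python
import numpy as np

rng = np.random.default_rng(42)

# Generation 0: the "human-written" data has a rich spread.
mu, sigma = 0.0, 1.0

spreads = [sigma]
for generation in range(300):
    # Each new generation trains only on a small sample of the
    # previous generation's output...
    samples = rng.normal(mu, sigma, size=5)
    # ...and fits its own model to that sample.
    mu, sigma = samples.mean(), samples.std()
    spreads.append(sigma)

print(f"initial spread: {spreads[0]:.3f}")
print(f"final spread:   {spreads[-1]:.3g}")
# The fitted spread drifts toward zero: the model converges on an
# ever-narrower slice of the original distribution.
```

With each refit the estimated spread tends to shrink (a finite sample underestimates the true spread, and the errors compound), which is the same mechanism by which models trained on their own output lose the rare, diverse parts of the data.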