“Model collapse” threatens to kill progress on generative AIs

⁨450⁩ ⁨likes⁩

Submitted 5 months ago by Stern@lemmy.world to technology@lemmy.world

https://bigthink.com/the-future/ai-model-collapse/

source

Comments


barsquid@lemmy.world ⁨5⁩ ⁨months⁩ ago
It is their own fault for poisoning the internet with their slop.

source
- db2@lemmy.world ⁨5⁩ ⁨months⁩ ago
  In case anyone doesn’t get what’s happening, imagine feeding an animal nothing but its own shit.
  
  source
  - Stern@lemmy.world ⁨5⁩ ⁨months⁩ ago
    I use the “Sistermother and me are gonna have a baby!” example personally, but I am an awful human so
    
    source
  - BassTurd@lemmy.world ⁨5⁩ ⁨months⁩ ago
    Not shit, but isn’t that what brought about mad cow disease? Farmers were feeding cattle brain matter that had infected prions. Idk if it was cows eating cow brains or other animals though.
    
    source
  - leftzero@lemmynsfw.com ⁨5⁩ ⁨months⁩ ago
    Photocopy of a photocopy is my go-to metaphor for model collapse.
    
    source
- Cock_Inspecting_Asexual@lemmy.world ⁨5⁩ ⁨months⁩ ago
  DUDE ITS SO FUCKING ANNOYING TRYNNA USE GOOGLE IMAGES ANYMORE–
  
  ALL IT GIVES ME IS AI ART. IM SO FUCKING SICK AND TIRED OF IT.
  
  source
EgoNo4@lemmy.world ⁨5⁩ ⁨months⁩ ago
More like… Degenerative AI *ba dum tsss

source
- merde@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
  deGenerative AI ☞ !degenerate@lemmynsfw.com
  
  source
  - EgoNo4@lemmy.world ⁨5⁩ ⁨months⁩ ago
    No idea this existed.
    
    Also… JFC WHAT THE SHIT?
    
    source
  - murmelade@lemmy.ml ⁨5⁩ ⁨months⁩ ago
    [deleted]
    source
CarbonatedPastaSauce@lemmy.world ⁨5⁩ ⁨months⁩ ago
Model collapse is just a euphemism for “we ran out of stuff to steal”

source
- Snowclone@lemmy.world ⁨5⁩ ⁨months⁩ ago
  It’s more “we are so focused on stealing and eating content, we’re accidentally eating the content we or other AI made, which is basically like incest for AI, and they’re all inbred to the point they don’t even know people have more than two thumb-shaped fingers anymore.”
  
  source
- rottingleaf@lemmy.world ⁨5⁩ ⁨months⁩ ago
  All such news makes me want to live to the time when our world is interesting again. Real AI research, something new instead of the Web we have, something new instead of the governments we have. It’s just that I’m scared of what’s between now and then. Parasites die hard.
  
  source
- jimmy90@lemmy.world ⁨5⁩ ⁨months⁩ ago
  or “we’ve hit a limit on what our new toy can do and here’s our excuse why it won’t get any better and AGI will never happen”
  
  source
RmDebArc_5@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
This sounds like AI is literally biting its own tail

source
- AbidanYre@lemmy.world ⁨5⁩ ⁨months⁩ ago
  ChatGPT, what is an ouroboros?
  
  source
  - homesweethomeMrL@lemmy.world ⁨5⁩ ⁨months⁩ ago
    Of course! An ChatGPT is an ouroboros, ChatGPT what is an ouroboros.
    
    source
casmael@lemm.ee ⁨5⁩ ⁨months⁩ ago
…………………. Good?

source
- emiellr@lemm.ee ⁨5⁩ ⁨months⁩ ago
  Tbh I’m a bit lost on the purpose of this
  
  source
aggelalex@lemmy.world ⁨5⁩ ⁨months⁩ ago
So AI:

1. Scraped the entire internet without consent
2. Trained on it
3. Polluted it with AI-generated rubbish
4. Trained on that rubbish without consent
5. Is now in need of a lobotomy

source
rickdg@lemmy.world ⁨5⁩ ⁨months⁩ ago
Old news? Seems to be a subject of several papers for some time now. Synthetic data has been used successfully already for very specific domains.

source
- SomeGuy69@lemmy.world ⁨5⁩ ⁨months⁩ ago
  Yup, old news and wrong news. Also so many people who hate AI but don’t understand how it works. Pretty disappointing for a technology community.
  
  source
thejml@lemm.ee ⁨5⁩ ⁨months⁩ ago
Ah, the Hapsburg of AI!

source
- Telorand@reddthat.com ⁨5⁩ ⁨months⁩ ago
  Oh, the artificial humanity!
  
  source
- PapaStevesy@lemmy.world ⁨5⁩ ⁨months⁩ ago
  I like to think of it like a Mad Cow or Kuru, you can’t eat your own species’s brains or you could get a super lethal, contagious prion disease.
  
  source
  - CarbonatedPastaSauce@lemmy.world ⁨5⁩ ⁨months⁩ ago
    Prion diseases aren’t contagious.
    
    source
- ZILtoid1991@lemmy.world ⁨5⁩ ⁨months⁩ ago
  If only the generated output also looked more and more like how inbred humans do.
  
  Like insane rambling from LLMs, and the humans generated by AI had various developmental disorders and the Habsburg jaw.
  
  source
ohellidk@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
Cool, let’s try to ruin it faster!

source
gravitas_deficiency@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
Uh, good.

As an engineer who cares a LOT about engineering ethics, it is absolutely fucking infuriating watching the absolute firehose of shit that comes out of LLMs and public-consumption ML audio, image, and video systems, juxtaposed with the outright refusal of companies and engineers who work there to accept ANY accountability or culpability for the systems THEY FUCKING MADE.

I understand the nuances of NNs. I understand that they’re much more stochastic than deterministic. So, you know, maybe it wasn’t a great idea to just tell the general public (which runs a WIDE gamut of intelligence and comprehension ability - not to mention, morality) “have at it”. The fact that ML usage and deployment in terms of information generating/kinda-sorta-but-not-really-aggregating “AI oracles” isn’t regulated on the same level as what you’d see in biotech or aerospace is insane to me. It’s a refusal to admit that these systems fundamentally change the entire premise of how “free speech” is generated, and that bad actors (either unrepentantly profit driven, or outright malicious) can and are taking disproportionate advantage of these systems.

I get it - I am a staunch opponent of censorship, and a software engineer. But the flippant deployment of literally society-altering technology alongside the outright refusal to accept any responsibility, accountability, or culpability for what that technology does to our society is unconscionable and infuriating to me. I am aware of the potential that ML has - it’s absolutely enormous, and could absolutely change a HUGE number of fields for the better in incredible ways. But that’s not what it’s being used for, and it’s because the field is essentially unregulated right now.

source
draughtcyclist@lemmy.world ⁨5⁩ ⁨months⁩ ago
I’ve been assuming this was going to happen since it’s been haphazardly implemented across the web. Are people just now realizing it?

source
- DeathbringerThoctar@lemmy.world ⁨5⁩ ⁨months⁩ ago
  People are just now acknowledging it. Execs tend to have a disdain for the minutiae. They’re like kids that only want to do the exciting bits. As a result things get fucked because they don’t really understand what they’re doing. As Muskrat would say “move fast and break things.” It’s a terrible mindset.
  
  source
  - pixxelkick@lemmy.world ⁨5⁩ ⁨months⁩ ago
    “Move Fast and Break Things” is Zuckerberg’s/Facebook’s motto, not Musk’s, just to note.
    
    source
pyre@lemmy.world ⁨5⁩ ⁨months⁩ ago
oh no are we gonna have to appreciate the art of human beings? ew. what if they want compensation‽

source
Lettuceeatlettuce@lemmy.ml ⁨5⁩ ⁨months⁩ ago
Good.

source
Hugin@lemmy.world ⁨5⁩ ⁨months⁩ ago
The solution for this is usually counter training. Granted, my experience is on the opposite end: training AI vision systems to ID real objects.

So you train up your detector ai on hand tagged images. When it gets good you use it to train a generator ai until the generator is good at fooling the detector.

Then you train the detector on new tagged real data and the new ai generated data. Once it’s good at detection again you train the generator ai on the new detector.

Repeat several times and you usually get a solid detector and a good generator as a side effect.

The thing is you need new real human tagged data for each new generation. None of the companies want to generate new human tagged data sets as it’s expensive.
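
A toy sketch of that alternating loop, under loud assumptions: the "images" are single numbers, the real data is a hypothetical N(4, 1) distribution, the detector is a one-feature logistic regression, and the generator is just a mean it can shift. Every name here (`train_detector`, `train_generator`, `adversarial_rounds`) is made up for illustration; a real vision pipeline would use neural networks and hand-tagged images. The point it tries to show is that each round draws fresh real data.

```python
import math
import random

REAL_MEAN = 4.0  # assumed "real data" distribution: N(4, 1)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_detector(real, fake, steps=60, lr=0.5):
    """Fit p(real | x) with logistic regression via gradient ascent."""
    center = (sum(real) + sum(fake)) / (len(real) + len(fake))
    data = [(x - center, 1.0) for x in real] + [(x - center, 0.0) for x in fake]
    w = b = 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = y - sigmoid(w * x + b)
            gw += err * x
            gb += err
        w += lr * gw / len(data)
        b += lr * gb / len(data)
    return w, b, center


def train_generator(mu, detector, rng, n=200, lr=0.5):
    """Take one step that makes generated samples look more real to the detector."""
    w, b, center = detector
    fakes = [rng.gauss(mu, 1.0) for _ in range(n)]
    # d/dmu of mean log D(x) is mean((1 - D(x)) * w) for this detector.
    grad = sum((1.0 - sigmoid(w * (x - center) + b)) * w for x in fakes) / n
    return mu + lr * grad


def adversarial_rounds(rounds=40, n=200, seed=0):
    rng = random.Random(seed)
    mu = 0.0  # the generator starts far away from the real data
    for _ in range(rounds):
        real = [rng.gauss(REAL_MEAN, 1.0) for _ in range(n)]  # fresh tagged data
        fake = [rng.gauss(mu, 1.0) for _ in range(n)]
        detector = train_detector(real, fake)
        mu = train_generator(mu, detector, rng)
    return mu


print(f"generator mean after training: {adversarial_rounds():.2f}")
```

In this sketch the generator only improves because the detector keeps seeing new real samples each round, which is the expensive part in practice.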

source
TheReturnOfPEB@reddthat.com ⁨5⁩ ⁨months⁩ ago
have we tried feeding them actual human beings yet ?

source
- n3m37h@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
  Billionaires are the smartest, give them the most knowledge first!
  
  source
SkyNTP@lemmy.ml ⁨5⁩ ⁨months⁩ ago
I think anyone familiar with the laws of thermodynamics could have predicted this outcome.

source
- mint_tamas@lemmy.world ⁨5⁩ ⁨months⁩ ago
  Explain?
  
  source
  - Draconic_NEO@lemmy.world ⁨5⁩ ⁨months⁩ ago
    Second law of thermodynamics:
    
    II. The total entropy of an isolated system never decreases over time. The change in entropy can never be negative.
    
    source
NotInTheFace@lemmy.world ⁨5⁩ ⁨months⁩ ago
Looks like that artist drawing self portraits as his Alzheimer’s got worse and worse.

source
- NocturnalMorning@lemmy.world ⁨5⁩ ⁨months⁩ ago
  It’s basically AI Alzheimer’s
  
  source
  - homesweethomeMrL@lemmy.world ⁨5⁩ ⁨months⁩ ago
    AIzheimers?
    
    source
levzzz@lemmy.world ⁨5⁩ ⁨months⁩ ago
Fake news, just like that one time Nightshade “killed” Stable Diffusion (it literally had no effect). Flux came out not long ago and it’s better than ever.

source
- Sabata11792@ani.social ⁨5⁩ ⁨months⁩ ago
  At this point the synthetic data is good enough to intentionally be used for training LLMs.
  
  source
  - Honytawk@lemmy.zip ⁨5⁩ ⁨months⁩ ago
    Yeah, just filter out the bad generated images and feed the good ones again, until the model learns how to produce only good ones.
    
    source
PrivacyDingus@lemmy.world ⁨5⁩ ⁨months⁩ ago
this headline truly is threatening me with a good time

source
pastermil@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
More like degenerative AIs

source
nullPointer@programming.dev ⁨5⁩ ⁨months⁩ ago
when all your information conflicts with itself, you really have no information at all.

source
tee9000@lemmy.world ⁨5⁩ ⁨months⁩ ago
Kind of like how true thoughts and opinions on complex topics are boiled down to digestible concepts for others to understand, who then perpetuate those concepts without understanding them. The meaning degrades and we don’t think anymore, we just repeat stuff in social media comments.

Side note… this article sucks and seems like it was AI-generated. Repetitive and no author credit? Just says it was originally posted elsewhere.

Generative AI isn’t in danger of being killed as this clickbait title suggests… just hindered.

source
- FarFarAway@lemmy.world ⁨5⁩ ⁨months⁩ ago
  There’s a link to the other article in this article. It says Kristin Houser wrote it… although you may have a point about the rest.
  
  source
  - tee9000@lemmy.world ⁨5⁩ ⁨months⁩ ago
    ty
    
    source
- General_Effort@lemmy.world ⁨5⁩ ⁨months⁩ ago
  
  > hindered.
  
  I doubt that.
  
  source
NutWrench@lemmy.world ⁨5⁩ ⁨months⁩ ago
Anyone who has made copies of videotapes knows what happens to the quality of each successive copy. You’re not making a “treasure trove.” You’re making trash.
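
A toy sketch of the copy-of-a-copy effect, assuming the simplest possible "model": a Gaussian fitted to a handful of samples, where each generation is trained only on samples from the previous generation's fit. This is an illustration with made-up parameters (`sample_size=10`, 200 generations), not the setup from the linked article.

```python
import random
import statistics


def simulate_generations(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: the "real" distribution
    history = [std]
    for _ in range(generations):
        # Each new "model" only ever sees output from the previous one.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)  # maximum-likelihood estimate
        history.append(std)
    return history


history = simulate_generations()
print(f"std at generation 0: {history[0]:.3f}")
print(f"std at generation {len(history) - 1}: {history[-1]:.3g}")
```

Because every refit loses a little of the spread and nothing ever adds it back, the fitted standard deviation drifts toward zero: the tails of the original distribution are the first thing to go.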

source
BrightCandle@lemmy.world ⁨5⁩ ⁨months⁩ ago
Having now flooded the internet with bad AI content, not surprisingly it’s now eating itself. Numerous projects that aren’t AI are suffering too as the quality of text declines.

source
Jomega@lemmy.world ⁨5⁩ ⁨months⁩ ago
Good riddance.

source
todd_bonzalez@lemm.ee ⁨5⁩ ⁨months⁩ ago
Our wetware neural networks probably aren’t supposed to engage with synthetic content like this either. In a few years we’re gonna learn that overexposure to AI generated content creates some sort of neurological problem in people, like a real-world “nerve attenuation syndrome” (Johnny Mnemonic).

source
- TheHarpyEagle@pawb.social ⁨5⁩ ⁨months⁩ ago
  I’ve read some snippets of AI written books and it really does feel like my brain is short circuiting
  
  source
njordomir@lemmy.world ⁨5⁩ ⁨months⁩ ago
It’s like a human centipede where only the first person is a human and everyone else is an AI. It’s all shit, but it gets a bit worse every step.

source
celsiustimeline@lemmy.dbzer0.com ⁨5⁩ ⁨months⁩ ago
If mainstream blogs are writing about it, what makes you think that AI companies haven’t thoroughly dissected the problem and are already working on filtering out AI fingerprints from the training data set? If they can make a sophisticated LLM, chances are they can find methods to XOR out generated content.

source
mac@lemm.ee ⁨5⁩ ⁨months⁩ ago
is it not relatively trivial to pre-vet content before they train it? at least with aigen text it should be.

source
- RvTV95XBeo@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
  The problem is these AI companies currently exist on the business model of not paying for information, and that generally includes not wanting to pay content curators.
  
  Google is probably the only one in a position to potentially outsource by making everyone solve a “does this hand look normal to you” CAPTCHA
  
  They can try and train AI to detect AI, but that’s also difficult.
  
  source
  - FMT99@lemmy.world ⁨5⁩ ⁨months⁩ ago
    So it’s not a problem with AI. It’s just a problem for some mayfly companies that try to profit from the latest trend?
    
    source
- General_Effort@lemmy.world ⁨5⁩ ⁨months⁩ ago
  It depends on what you are looking for. Identifying AI generated data is generally hard, though it can be done in specific cases. There is no mathematical difference between the 1s and 0s that encode AI generated data and any other data. Which is why these model collapse ideas are just fantasy. There is nothing magical about any data that makes it “poisonous” to AI. The kernel of truth behind these ideas is not likely to matter in practice.
  
  source
brey1013@lemmy.world ⁨5⁩ ⁨months⁩ ago
Oh no

source
- Tamkish@programming.dev ⁨5⁩ ⁨months⁩ ago
  Anyway
  
  source
