Comment on Reddit stock falls for second day as references to its content in ChatGPT responses plummet
FlexibleToast@lemmy.world 2 weeks ago
Of course there are. That doesn’t mean the majority of the site is compromised.
M1ch431@slrpnk.net 2 weeks ago
Citation needed.
plyth@feddit.org 2 weeks ago
They could sell the cleaned votes to AI companies and keep the dirty data public for the scrapers.
M1ch431@slrpnk.net 2 weeks ago
Meta/OpenAI openly pirating everything they can to train their LLMs is a good example of how data hungry these AI/etc. companies are.
Is it plausible for companies to ask Reddit to narrow down data, e.g. by demographic or geographic location, and request that data for purchase? Sure, but the LLMs seemingly require all the data these companies can get their hands on. Given the scale of data theft being committed, I highly doubt they care about Reddit data being tainted. If anything, it might even be desirable to them.
FlexibleToast@lemmy.world 2 weeks ago
Okay, but it is those niche subs that are the most valuable.
M1ch431@slrpnk.net 2 weeks ago
Are you somebody invested in Reddit? Genuine question.
FlexibleToast@lemmy.world 2 weeks ago
No, I’m not. I don’t care at all if they’re successful or go under.
Sure, but again it’s not likely to be most. You don’t seem to realize how hard it is to get data that is already classified. That stuff is gold to people developing AI. Most of the work in data science is cleaning data and getting it into a usable form.