Okay, but it is those niche subs that are the most valuable.
Comment on Reddit stock falls for second day as references to its content in ChatGPT responses plummet
M1ch431@slrpnk.net 10 hours ago
Citation needed.
FlexibleToast@lemmy.world 9 hours ago
M1ch431@slrpnk.net 9 hours ago
Are you somebody invested in Reddit? Genuine question.
FlexibleToast@lemmy.world 8 hours ago
No, I’m not. I don’t care at all if they’re successful or go under.
Sure, but again it’s not likely to be most. You don’t seem to realize how hard it is to get data that is already classified. That stuff is gold to people developing AI. Most of the work in data science is cleaning data and getting it into a usable form.
M1ch431@slrpnk.net 8 hours ago
It’s noise, a very large part of it. Reddit is financially motivated to make the data appear as if it is signal. It isn’t - they have taken extremely minimal steps to ensure actual human participation.
This doesn’t matter to AI companies, but it only warps that technology more and more. AI is a sinking ship with current methodologies. Reddit will die when the AI bubble bursts and those involved with Reddit already cashed out enough to be filthy rich (e.g. Steve Huffman sold 500,000 of his shares in the IPO, indicating he will make $17mn).
plyth@feddit.org 2 hours ago
They could sell the cleaned votes to AI companies and keep the dirty data public for the scrapers.
M1ch431@slrpnk.net 2 hours ago
Meta/OpenAI openly pirating everything they can to train their LLMs is a good example of how data hungry these AI/etc. companies are.
Is it plausible for companies to ask Reddit to narrow down data, e.g. by demographic or geographic location, and request that data for purchase? Sure, but the LLMs seemingly require all the data that these companies can get their hands on. Given the scale of data theft being committed, I highly doubt they care about Reddit data being tainted. If anything, it might even be desirable to them.