AFAIK, there’s nothing stopping any company from scraping Lemmy either. The whole point of reddit limiting API usage was so they could make money like this.
Outside of morals, there is nothing to stop anybody from training on data from Lemmy just like there’s nothing stopping me from using Wikipedia. Most conferences nowadays require a paragraph on ethics in the submission, but I and many of my colleagues would have no qualms saying we scraped our data from open source internet forums and blogs.
mtchristo@lemm.ee 8 months ago
Can you imagine this is what we are training AI with!
Jagermo@feddit.de 8 months ago
I can. Remember Tay?
JustUseMint@lemmy.world 8 months ago
Lol yeah, other bot-made data
Annoyed_Crabby@monyet.cc 8 months ago
Yeah, all these bot replies are copied from other comments, and there are shit tons of r/confidentlyincorrect comments that are outright factually wrong, which then get regurgitated by other users and copied by bots, so good luck to the AI companies filtering those.