Comment on In Cringe Video, OpenAI CTO Says She Doesn’t Know Where Sora’s Training Data Came From
PanArab@lemmy.world 9 months ago
So plagiarism?
HaywardT@lemmy.sdf.org 9 months ago
I don’t think so. They aren’t reproducing the content.
I think the equivalent is you reading this article, then answering questions about it.
A_Very_Big_Fan@lemmy.world 9 months ago
Idk why this is such an unpopular opinion. I don’t need permission from an author to talk about their book, or permission from a singer to parody their song. I’ve never heard any good arguments for why it’s a crime to automate these things.
I mean hell, we have an LLM bot in this comment section that took the article and spat 27% of it back out verbatim, yet nobody is pissing and moaning about it “stealing” the article.
Hawk@lemmy.dbzer0.com 9 months ago
What you’re giving as examples are legitimate uses for the data.
If I write and sell a new book that’s just Harry Potter with names and terms switched around, I’ll definitely get in trouble.
The problem is that the data CAN be used for stuff that violates copyright. And because of the nature of AI, it’s not even always clear to the user.
AI can basically throw out a Harry Potter clone without you knowing because it’s trained on that data, and that’s a huge problem.
A_Very_Big_Fan@lemmy.world 9 months ago
I just realized I misread what you said, so that wasn’t entirely relevant to what you said but I think it still stands so ig I won’t delete it.
But I asked both GPT3.5 and GPT4 to give me Harry Potter with the names and words changed, and they can’t do that either. I can’t speak for all models, but I can at least say the two owned by the people this thread was about won’t do that.
A_Very_Big_Fan@lemmy.world 9 months ago
Out of curiosity I asked it to make a Harry Potter part 8 fan fiction, and surprisingly it did. But I really don’t think that’s problematic. There’s already an insane amount of fan fiction out there without the names swapped that I can read, and that’s all fair use.
I mean hell, there are people who actually get paid to draw fictional characters in sexual situations that I’m willing to bet very few creators would prefer to exist lol. But as long as they don’t overstep the bounds of fair use, like trying to pass it off as an official work or submit it for publication, then there’s no copyright violation.
The important part is that it won’t just give me the actual book (but funnily enough, it tried lol).
MostlyGibberish@lemm.ee 9 months ago
Because people are afraid of things they don’t understand. AI is a very new and very powerful technology, so people are going to see what they want to see from it. Of course, it doesn’t help that a lot of people see “a shit load of cash” from it, so companies want to shove it into anything and everything.
AI models are rapidly becoming more advanced, and some of the new models are showing sparks of metacognition. Calling that “plagiarism” is being willfully ignorant of its capabilities, and it’s just not productive to the conversation.
A_Very_Big_Fan@lemmy.world 9 months ago
True
And on a similar note, I think a lot of it is that OpenAI is profiting off of it and went closed-source. Lemmy being a largely anti-capitalist and pro-open-source group of communities, it’s natural to have a negative gut reaction to what’s going on, but not a single person here, nor any of my friends who accuse them of “stealing”, can tell me what is being stolen, or how it’s different from me looking at art and then making my own.
Like, I get that the technology is gonna be annoying and even dangerous sometimes, but maybe let’s criticize it for that instead of shit that it’s not doing.
myrrh@ttrpg.network 9 months ago
…with the prevalence of clickbaity bottom-feeder news sites out there, i’ve learned to avoid TFAs and await user summaries instead…
neptune@dmv.social 9 months ago
The issue is that the LLMs do often just verbatim spit out things they plagiarized from other sources. The deeper issue is that even if/when they stop that from happening, the technology is clearly going to make most people agree our current copyright laws are insufficient for the times.
A_Very_Big_Fan@lemmy.world 9 months ago
The model in question, plus all of the others I’ve tried, will not give you copyrighted material
Linkerbaan@lemmy.world 9 months ago
Actually, neural networks reproduce this kind of content verbatim when you ask the right question, such as “finish this book”, and the creator doesn’t censor it out well.
It uses an encoded version of the source material to create “new” material.
HaywardT@lemmy.sdf.org 9 months ago
Sure, if that is what the network has been trained to do, just like a librarian will if that is how they have been trained.
Linkerbaan@lemmy.world 9 months ago
Actually it’s the opposite, you need to train a network not to reveal its training data.