Not even collage art automatically counts as fair use.
Comment on Grisham, Martin join authors suing OpenAI: “There is nothing fair about this”
ryathal@sh.itjust.works 1 year ago
They are fighting an uphill battle to get anything. It’s a pretty strong argument that training a model is fair use.
Laticauda@lemmy.ca 1 year ago
donuts@kbin.social 1 year ago
Funny, but I don't think there's a very strong argument that training AI is fair use, especially when you consider how it intersects with the standard four factors that generally determine whether a use of copyrighted work is fair or not.
Specifically stuff like:
(Keep in mind that many popular AI models have been trained on vast amounts of entire artworks, large sections of text, etc.)
To me, this factor is by far the strongest argument against AI being considered fair use.
The fact is that today's generative AI is being widely used for commercial purposes and stands to have a dramatic effect on the market for the same types of work that are being used to train these models: work that the companies behind them could realistically have been licensing, and probably should be.
Ask any artist, writer, musician, or other creator whether they think it's "fair" to use their work to generate commercial products without any form of credit, consent or compensation, and the vast majority will tell you it isn't. I'm curious what the "strong argument" that AI training is fair use actually is, because I'm just not seeing it.
ryathal@sh.itjust.works 1 year ago
AI training is taking facts, which aren’t subject to copyright, not actual content that is subject to it. The original work or a derivative isn’t being distributed or copied. While it may be possible for a user to recreate copyrighted material with sufficient prompting, the fact that it’s possible isn’t any more relevant than it is for a copy machine. It’s the same as an aspiring author reading all of Martin’s work for inspiration. They can write a story based on a vaguely medieval England full of rape and murder, without paying Martin a dime. What they can’t do is call it Westeros, or have the main character be named Eddard Stork.
There may be an argument that a copy needs to be purchased to extract the facts, but that’s not any special license, a used copy of the book would be sufficient.
AI isn’t doing anything that hasn’t already been done by humans for hundreds of years, it’s just doing it faster.
BraveSirZaphod@kbin.social 1 year ago
Legally, I think you're basically right on.
I think what will eventually need to happen is society deciding whether this is actually the desired legal state of affairs or not. A pretty strong argument can be made that "just doing it faster" makes an enormous difference on the ultimate impact, such that it may be worth adjusting copyright law to explicitly prohibit AI creation of derivative works, training on copyrighted materials without consent, or some other kinds of restrictions.
I do somewhat fear that, in our continuous pursuit of endless amounts of convenient "content" and entertainment to distract ourselves from the real world, we'll essentially outsource human creativity to AI, and I don't love the idea of a future where no one is creating anything because it's impossible to make a living from it due to literally infinite competition from AI.
ryathal@sh.itjust.works 1 year ago
I think that fear is overblown. AI models are only as good as their training material. It still requires humans to create new content to keep models growing. Training AI on AI-generated content doesn’t work out well.
Models aren’t good enough yet to create quality content entirely on their own. It’s also not clear that the ability to do so is imminent; maybe one day they will have it. Right now these tools are really only good for assisting a creator in making drafts, or identifying weak parts of the story.
knitwitt@lemmy.world 1 year ago
If I took 100 of the world’s best-selling novels, wrote each individual word onto a flashcard, shuffled the entire deck, then created an entirely new novel out of that (with completely original characters, plot threads, themes, and messages), could it be said that I produced stolen work?
What if I specifically attempted to emulate the style of the number one author on that list? What if instead of 100 novels, I used 1,000 or 10,000? What if instead of words on flashcards, I wrote down sentences? What if it were letters instead?
At some point, regardless of the means by which the changes were derived, a transformed work must pass a threshold where, on content alone, it is sufficiently different that it can no longer be considered derivative.
habanhero@lemmy.ca 1 year ago
Y’all are missing the point, what you said is about AI output and is not the main issue in the lawsuit. The lawsuit is about the input to AI - authors want to choose if their content may be used to train AI or not (and if yes, be compensated for it).
There is an analogy elsewhere in this thread that is pretty apt - this scenario is akin to a university using pirated textbooks to educate its students. Whether or not the student ended up pursuing a field that uses the knowledge does not matter - the issue is the university should not have done so in the first place.
knitwitt@lemmy.world 1 year ago
I imagine that the easiest way to acquire specific training data for an LLM is to download ebooks from Amazon. If a university professor pirates a textbook and then uses extracts from various pages in their lecture slides, the cost of the crime would be the cost of a single textbook. In the case of a novel, GRRM should be entitled to the cost of a set of Ice & Fire if he could prove that the original training material was pirated instead of legally purchased.