Comment on The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

<- View Parent
Eccitaze@yiffit.net ⁨3⁩ ⁨weeks⁩ ago

The commercial aspect of the reproduction is not relevant to whether it is an infringement–it is simply a factor in damages and Fair Use defense (an affirmative defense that presupposes infringement).

What you are getting at when it applies to this particular type of AI is effectively whether it would be a fair use, presupposing there is copying amounting to copyright infringement. And what I am saying is that, ignoring certain stupid behavior like torrenting a shit ton of text to keep a local store of training data, there is no copying happening as a matter of necessity. There may be copying as a matter of stupidity, but it isn’t necessary to the way the technology works.

You’re conflating whether something is infringement with defenses against infringement. Believe it or not, basically all data transfer and display of copyrighted material on the Internet is technically infringing. That includes the download of a picture to your computer’s memory for the sole purpose of displaying it on your monitor. In practice, nobody ever bothers suing art galleries, social media websites, or web browsers, because they all have ironclad defenses against infringement claims: art galleries & social media include a clause in their TOS that grants them a license to redistribute your work for the purpose of displaying it on their website, and web browsers have a basically bulletproof fair use claim. There are other non-infringing uses such as those which qualify for a compulsory license (e.g. live music productions, usually involving royalties), but they’re largely not very relevant here. In any case, the fundamental point is that any reproduction of a copyrighted work is infringement, but there are varied defenses against this that mean most infringing activities never see a courtroom in practice.

All this gets back to the original point I made: Creators retain their copyright even when uploading data for public use, and that copyright comes with heavy restrictions on how third parties may use it. When an individual uploads something to an art website, the website is free and clear of any claims for copyright infringement by virtue of the license granted to it by the website’s TOS. An uninvolved third party–e.g. a non-registered user or an organization that has not entered into a licensing agreement with the creator or the website (*cough* OpenAI)–has no special defense against copyright infringement claims beyond the baseline questions: was the infringement for personal, noncommercial use? And does the infringement qualify as fair use? Individual users downloading an image for their private collection are mostly A-OK, because the infringement is done for personal & noncommercial use–theoretically someone could sue over it, but there would have to be a lot of aggravating factors for it to get beyond summary judgment. AI companies using web scrapers to download creators’ works do not qualify as personal/noncommercial use, for what I hope are bloody obvious reasons.

As for a model trained purely for research or educational purposes, I believe that it would have a very strong claim for fair use as long as the model is not widely available for public use. Once that model becomes publicly available, and/or is leveraged commercially, the analysis changes, because the model is no longer being used for research, but for commercial profit. To apply it to the real world, when OpenAI originally trained ChatGPT for research, it was on strong legal ground, but when it decided to start making it publicly available, they should have thrown out their training dataset and built up a new one using data in the public domain and data that it had negotiated a license for, trained ChatGPT on the new dataset, and then released it commercially. If they had done that, and if individuals had been given the option to opt their creative works out of this dataset, I highly doubt that most people would have any objection to LLM from a legal standpoint. Hell, they probably could have gotten licenses to use most websites’ data to train ChatGPT for a song. Instead, they jumped the gun and tipped their hand before they had all their ducks in a row, and now everybody sees just how valuable their data is to OpenAI and are pricing it accordingly.

Oh, and as for your edit, you contradicted yourself: in your first line, you said “The commercial aspect of the reproduction is not relevant to whether it is an infringement.” In your edit, you said “the infringement happens when you reproduce the images for a commercial purpose.” So which is it? (To be clear, the initial download is infringing copyright both when I download the image for personal/noncommercial use, and also when I download it to make T-shirts with. The difference is that the first case has a strong defense against an infringement claim that would likely get it dismissed in summary, while the cases of making T-shirts would be straightforward claims of infringement.)

source
Sort:hotnewtop