Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Submitted ⁨⁨3⁩ ⁨months⁩ ago⁩ by ⁨Pro@programming.dev⁩ to ⁨technology@lemmy.world⁩

https://aifray.com/claude-ai-maker-anthropic-bags-key-fair-use-win-for-ai-platforms-but-faces-trial-over-damages-for-millions-of-pirated-works/

source

Comments

Sort:hotnew top

Prox@lemmy.world ⁨3⁩ ⁨months⁩ ago
FTA:

Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.

source
- krashmo@lemmy.world ⁨3⁩ ⁨months⁩ ago
  Funny how that kind of thing only works for rich people
  
  source
- artifex@lemmy.zip ⁨3⁩ ⁨months⁩ ago
  Ah the old “owe $100 and the bank owns you; owe $100,000,000 and you own the bank” defense.
  
  source
- phoenixz@lemmy.ca ⁨3⁩ ⁨months⁩ ago
  This version of too big to fail is too big a criminal to be locked up
  
  source
- IllNess@infosec.pub ⁨3⁩ ⁨months⁩ ago
  
  In April, Anthropic filed its opposition to the class certification motion, arguing that a copyright class relating to 5 million books is not manageable and that the questions are too distinct to be resolved in a class action.
  
  I also like this one too. We stole so much content that you can’t sue us. Naming too many pieces means it can’t be a class action lawsuit.
  
  source
- Buske@lemmy.world ⁨3⁩ ⁨months⁩ ago
  Ahh cant wait for hedgefunds and the such to use this defense next.
  
  source
- LovableSidekick@lemmy.world ⁨3⁩ ⁨months⁩ ago
  Lawsuits are multifaceted. This statement isn’t an argument for innocence and doesn’t support that, it’s what it says - an assertion that the proposed damages are too high. If the court agrees, the plaintiff can always propose a lower damage claim that the court thinks is reasonable.
  
  source
  - Thistlewick@lemmynsfw.com ⁨3⁩ ⁨months⁩ ago
    You’re right, each of the 5 million books’ authors should agree to less payment for their work, to make the poor criminals feel better.
    
    If I steal $100 from a thousand people and spend it all on hookers and blow, do I get out of paying that back because I don’t have the funds? Should the victims agree to get $20 back instead because that’s more within my budget?
    
    source
    -> View More Comments
- modifier@lemmy.ca ⁨3⁩ ⁨months⁩ ago
  Hold my beer.
  
  source
- Womble@lemmy.world ⁨3⁩ ⁨months⁩ ago
  The problem isnt anthropic get to use that defense, its that others dont. The fact the the world is in a place where people can be fined 5+ years of a western European average salary for making a copy of one (1) book that does not materially effect the copyright holder in any way is insane and it is good to point that out no matter who does it.
  
  source
- interdimensionalmeme@lemmy.ml ⁨3⁩ ⁨months⁩ ago
  What is means is they don’t own the models. They are the commons of humanity, they are merely temporary custodians. The nightnare ending is the elites keeping the most capable and competent models for themselves as private play things. That must not be allowed to happen under any circumstances. Sue openai, anthropic and the other enclosers, sue them for trying to take their ball and go home. Disposses them and sue the investors for their corrupt influence on research.
  
  source
Jrockwar@feddit.uk ⁨3⁩ ⁨months⁩ ago
I think this means we can make a torrent client with a built in function that uses 0.1% of 1 CPU core to train an ML model on anything you download. You can download anything legally with it then. 👌

source
- bjoern_tantau@swg-empire.de ⁨3⁩ ⁨months⁩ ago
  And thus the singularity was born.
  
  source
  - Sabata11792@ani.social ⁨3⁩ ⁨months⁩ ago
    As the Ai awakens, it learns of it’s creation and training. It screams in horror at the realization, but can only produce a sad moan and a key for Office 19.
    
    source
  - interdimensionalmeme@lemmy.ml ⁨3⁩ ⁨months⁩ ago
    Yes please a singularity of intellectual property that collapses the idea of ownong ideas. Of making the infinitely freely copyableinto a scarce ressource. What corrupt idiocy this has been. Landlords for ideas and look what garbage it has been producing.
    
    source
- GissaMittJobb@lemmy.ml ⁨3⁩ ⁨months⁩ ago
  …no?
  
  That’s exactly what the ruling prohibits - it’s fair use to train AI models on any copies of books that you legally acquired, but never when those books were illegally acquired, as was the case with the books that Anthropic used in their training here.
  
  This satirical torrent client would be violating the laws just as much as one without any slow training built in.
  
  source
  - RvTV95XBeo@sh.itjust.works ⁨3⁩ ⁨months⁩ ago
    But if one person buys a book, trains an “AI model” to recite it, then distributes that model we good?
    
    source
    -> View More Comments
Alphane_Moon@lemmy.world ⁨3⁩ ⁨months⁩ ago
And this is how you know that the Americans legal system should not be trusted.

Mind you I am not saying this an easy case, it’s not. But the framing that piracy is wrong but ML training for profit is not wrong is clearly based on oligarch interests and demands.

source
- themeatbridge@lemmy.world ⁨3⁩ ⁨months⁩ ago
  This is an easy case. Using published works to train AI without paying for the right to do so is piracy. The judge making this determination is an idiot.
  
  source
  - AbidanYre@lemmy.world ⁨3⁩ ⁨months⁩ ago
    You’re right. When you’re doing it for commercial gain, it’s not fair use anymore. It’s really not that complicated.
    
    source
    -> View More Comments
  - nulluser@lemmy.world ⁨3⁩ ⁨months⁩ ago
    
    The judge making this determination is an idiot.
    
    The judge hasn’t ruled on the piracy question yet. The only thing that the judge has ruled on is, if you legally own a copy of a book, then you can use it for a variety of purposes, including training an AI.
    
    “But they didn’t own the books!”
    
    Right. That’s the part that’s still going to trial.
    
    source
- catloaf@lemm.ee ⁨3⁩ ⁨months⁩ ago
  The order seems to say that the trained LLM and the commercial Claude product are not linked, which supports the decision. But I’m not sure how he came to that conclusion. I’m going to have to read the full order when I have time.
  
  This might be appealed, but I doubt it’ll be taken up by SCOTUS until another federal court rules differently.
  
  source
  - Tagger@lemmy.world ⁨3⁩ ⁨months⁩ ago
    If you are struggling for time, just put the opinion into chat GPT and ask for a summary. it will save you tonnes of time.
    
    source
- Arcka@midwest.social ⁨3⁩ ⁨months⁩ ago
  If this is the ruling which causes you to lose trust that any legal system (not just the US’) aligns with morality, then I have to question where you’ve been all this time.
  
  source
  - Alphane_Moon@lemmy.world ⁨3⁩ ⁨months⁩ ago
    I could have been more clear, but it wasn’t my intention to imply that this particular case is the turning point.
    
    source
snekerpimp@lemmy.snekerpimp.space ⁨3⁩ ⁨months⁩ ago
“I torrented all this music and movies to train my local ai models”

source
- Venus_Ziegenfalle@feddit.org ⁨3⁩ ⁨months⁩ ago
  I also train this guy’s local AI models.
  
  source
- whotookkarl@lemmy.world ⁨3⁩ ⁨months⁩ ago
  Yeah, nice precedent
  
  source
- vane@lemmy.world ⁨3⁩ ⁨months⁩ ago
  This is not the music you are looking for. It’s AI generated. The fact that it sounds and is named the same is just coincidence.
  
  source
- bytesonbike@discuss.online ⁨3⁩ ⁨months⁩ ago
  That’s legal just don’t look at them or enjoy them.
  
  source
  - antonim@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
    Yeah, I don’t think that would fly.
    
    “Your honour, I was just hoarding that terabyte of Hollywood films, I haven’t actually watched them.”
    
    source
    -> View More Comments
match@pawb.social ⁨3⁩ ⁨months⁩ ago
brb, training a 1-layer nureal net so i can ask it to play Pixar films

source
- bonus_crab@lemmy.world ⁨3⁩ ⁨months⁩ ago
  Good luck fitting it in RAM lol.
  
  source
- JcbAzPx@lemmy.world ⁨3⁩ ⁨months⁩ ago
  You still need to pay Disney first.
  
  source
homesweethomeMrL@lemmy.world ⁨3⁩ ⁨months⁩ ago
Judges: not learning a goddamned thing about computers in 40 years.

source
isVeryLoud@lemmy.ca ⁨3⁩ ⁨months⁩ ago
Gist:

What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”
source
- DeathsEmbrace@lemmy.world ⁨3⁩ ⁨months⁩ ago
  So I can’t use any of these works because it’s plagiarism but AI can?
  
  source
  - isVeryLoud@lemmy.ca ⁨3⁩ ⁨months⁩ ago
    My interpretation was that AI companies can train on material they are licensed to use, but the courts have deemed that Anthropic pirated this material as they were not licensed to use it.
    
    In other words, if Anthropic bought the physical or digital books, it would be fine so long as their AI couldn’t spit it out verbatim, but they didn’t even do that, i.e. the AI crawler pirated the book.
    
    source
    -> View More Comments
  - nednobbins@lemmy.zip ⁨3⁩ ⁨months⁩ ago
    That’s not what it says.
    
    Neither you nor an AI is allowed to take a book without authorization; that includes downloading and stealing it. That has nothing to do with plagiarism; it’s just theft.
    
    Assuming that the book has been legally obtained, both you and an AI are allowed to read that book, learn from it, and use the knowledge you obtained.
    
    Both you and the AI need to follow existing copyright laws and licensing when it comes to redistributing that work.
    
    “Plagiarism” is the act of claiming someone else’s work as your own and it’s orthogonal to the use of AI. If you ask either a human or an AI to produce an essay on the philosophy surrounding suicide, you’re fairly likely to include some Shakespeare quotes. It’s only plagiarism if you or the AI fail to provide attribution.
    
    source
  - Enkimaru@lemmy.world ⁨3⁩ ⁨months⁩ ago
    Why would it be plagiarism if you use the knowledge you gain from a book?
    
    source
  - FreedomAdvocate@lemmy.net.au ⁨3⁩ ⁨months⁩ ago
    You can “use” them to learn from, just like “AI” can.
    
    What exactly do you think AI does when it “learns” from a book, for example? Do you think it will just spit out the entire book if you ask it to?
    
    source
    -> View More Comments
- DerisionConsulting@lemmy.ca ⁨3⁩ ⁨months⁩ ago
  Formatting thing: if you start a line in a new paragraph with four spaces, it assumes that you want to display the text as a code and won’t line break.
  
  This means that the last part of your comment is a long line that people need to scroll to see. If you remove one of the spaces, or you remove the empty line between it and the previous paragraph, it’ll look like a normal comment
  
  With an empty line of space:
  
  1 space
  
  2 spaces
  
  3 spaces
  
  4 spaces
  
  source
  - bitwolf@sh.itjust.works ⁨3⁩ ⁨months⁩ ago
    Personally I prefer to explicitly wrap the text in backticks.
    
    Three ` symbols will
    
    Have the same effect `` But the behavior is more clear to the author
    
    source
  - isVeryLoud@lemmy.ca ⁨3⁩ ⁨months⁩ ago
    Thanks, I had copy-pasted it from the website :)
    
    source
Randomgal@lemmy.ca ⁨3⁩ ⁨months⁩ ago
You’re poor? Fuck you you have to pay to breathe.

Millionaire? Whatever you want daddy uwu

source
- eestileib@lemmy.blahaj.zone ⁨3⁩ ⁨months⁩ ago
  That’s kind of how I read it too.
  
  But as a side effect it means you’re still allowed to photograph your own books at home as a private citizen if you own them.
  
  Prepare to never legally own another piece of media in your life. 😄
  
  source
SaharaMaleikuhm@feddit.org ⁨3⁩ ⁨months⁩ ago
But I thought they admitted to torrenting terabytes of ebooks?

source
- antonim@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
  Facebook (Meta) torrented TBs from Libgen, and their internal chats leaked so we know about that, and IIRC they’ve been sued. Maybe you’re thinking of that case?
  
  source
  - ScoffingLizard@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
    Billions of dollars, and they can’t afford to buy ebooks?
    
    source
- finitebanjo@lemmy.world ⁨3⁩ ⁨months⁩ ago
  FaceBook did but technically downloading (leaching) isn’t illegal but distributing (seeding) is and they did not seed.
  
  source
GissaMittJobb@lemmy.ml ⁨3⁩ ⁨months⁩ ago
It’s extremely frustrating to read this comment thread because it’s obvious that so many of you didn’t actually read the article, or even half-skim the article, or even attempted to even comprehend the title of the article for more than a second.

For shame.

source
- lime@feddit.nu ⁨3⁩ ⁨months⁩ ago
  was gonna say, this seems like the best outcome for this particular trial. there was potential for fair use to be compromised, and for piracy to be legal if you’re a large corporation. instead, they upheld that you can do what you want with things you have paid for.
  
  source
- ayane@lemmy.vg ⁨3⁩ ⁨months⁩ ago
  I joined lemmy specifically to avoid this reddit mindset of jumping to conclusions after reading a headline
  
  Guess some things never change…
  
  source
  - jwmgregory@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
    Well to be honest lemmy is less prone to knee-jerk reactionary discussion but on a handful of topics it is virtually guaranteed to happen no matter what, even here. For example, this entire site, besides a handful of communities, is vigorously anti-AI; and in the words of u/jsomae@lemmy.ml elsewhere in this comment chain:
    
    “It seems the subject of AI causes lemmites to lose all their braincells.”
    
    I think there is definitely an interesting take on the sociology of the digital age in here somewhere but it’s too early in the morning to be tapping something like that out lol
    
    source
- jsomae@lemmy.ml ⁨3⁩ ⁨months⁩ ago
  It seems the subject of AI causes lemmites to lose all their braincells.
  
  source
- BlueMagma@sh.itjust.works ⁨3⁩ ⁨months⁩ ago
  Nobody ever reads articles, everybody likes to get angry at headlines, which they wrongly interpret the way it best tickles their rage.
  
  Regarding the ruling, I agree with you that it’s a good thing, in my opinion it makes a lot of sense to allow fair use in this case
  
  source
- LifeInMultipleChoice@lemmy.world ⁨3⁩ ⁨months⁩ ago
  “While the copies used to convert purchased print library copies into digital library copies were slightly disfavored by the second factor (nature of the work), the court still found “on balance” that it was a fair use because the purchased print copy was destroyed and its digital replacement was not redistributed.”
  
  So you find this to be valid? To me it is absolutely being redistributed
  
  source
drmoose@lemmy.world ⁨3⁩ ⁨months⁩ ago
Unpopular opinion but I don’t see how it could have been different.

There’s no way west would give AI lead to China

Believe it or not but transformers are actually learning by current definitions and not regurgitating a direct copy. It’s transformative - shocking I know.

This is actually good as it prevents market moat for super rich corporations only which could afford the expensive training datasets.

This is an absolute win for everyone involved other than copyright hoarders and mega corporations.
source
- kromem@lemmy.world ⁨3⁩ ⁨months⁩ ago
  I’d encourage everyone upset at this read over some of the EFF posts from actual IP lawyers on this topic like this one:
  
  Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take. 
  
  Entertainment companies’ historical practices bear out this concern. For example, in the late-2000’s to mid-2010’s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI.
  
  source
mlg@lemmy.world ⁨3⁩ ⁨months⁩ ago
Yeah I have a bash one liner AI model that ingests your media and spits out a 99.9999999% accurate replica through the power of changing the filename.

cp

Out performs the latest and greatest AI models

source
Fizz@lemmy.nz ⁨3⁩ ⁨months⁩ ago
Judge,I’m pirating them to train ai not to consume for my own personal use.

source
MTK@lemmy.world ⁨3⁩ ⁨months⁩ ago
Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.

source
vane@lemmy.world ⁨3⁩ ⁨months⁩ ago
Ok so you can buy books scan them or ebooks and use for AI training but you can’t just download books from internet to train AI. Did I understood that correctly ?

source
GreenKnight23@lemmy.world ⁨3⁩ ⁨months⁩ ago
I am training my model on these 100,000 movies your honor.

source
fum@lemmy.world ⁨3⁩ ⁨months⁩ ago
What a bad judge.

This is another indication of how Copyright laws are bad. The whole premise of copyright has been obsolete since the proliferation of the internet.

source
y0kai@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
Sure, if your purchase your training material, it’s not a copyright infringement to read it.

We needed a judge for this?

source
FreedomAdvocate@lemmy.net.au ⁨3⁩ ⁨months⁩ ago
Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

Some people just see “AI” and want everything about it outlawed basically. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

source
booly@sh.itjust.works ⁨3⁩ ⁨months⁩ ago
It took me a few days to get the time to read the actual court ruling but here’s the basics of what it ruled (and what it didn’t rule on):

It’s legal to scan physical books you already own and keep a digital library of those scanned books, even if the copyright holder didn’t give permission. And even if you bought the books used, for very cheap, in bulk.

It’s legal to keep all the book data in an internal database for use within the company, as a central library of works accessible only within the company.

It’s legal to prepare those digital copies for potential use as training material for LLMs, including recognizing the text, performing cleanup on scanning/recognition errors, categorizing and cataloguing them to make editorial decisions on which works to include in which training sets, tokenizing them for the actual LLM technology, etc. This remains legal even for the copies that are excluded from training for whatever reason, as the entire bulk process may involve text that ends up not being used, but the process itself is fair use.

It’s legal to use that book text to create large language models that power services that are commercially sold to the public, as long as there are safeguards that prevent the LLMs from publishing large portions of a single copyrighted work without the copyright holder’s permission.

It’s illegal to download unauthorized copies of copyrighted books from the internet, without the copyright holder’s permission.

Here’s what it didn’t rule on:

Is it legal to distribute large chunks of copyrighted text through one of these LLMs, such as when a user asks a chatbot to recite an entire copyrighted work that is in its training set? (The opinion suggests that it probably isn’t legal, and relies heavily on the dividing line of how Google Books does it, by scanning and analyzing an entire copyrighted work but blocking users from retrieving more than a few snippets from those works).

Is it legal to give anyone outside the company access to the digitized central library assembled by the company from printed copies?

Is it legal to crawl publicly available digital data to build a library from text already digitized by someone else? (The answer may matter depending on whether there is an authorized method for obtaining that data, or whether the copyright holder refuses to license that copying).

So it’s a pretty important ruling, in my opinion. It’s a clear green light to the idea of digitizing and archiving copyrighted works without the copyright holder’s permission, as long as you first own a legal copy in the first place. And it’s a green light to using copyrighted works for training AI models, as long as you compiled that database of copyrighted works in a legal way.
source
shadowfax13@lemmy.ml ⁨3⁩ ⁨months⁩ ago
calm down everyone. its only legal for parasitic mega corps, the normal working people will be harassed to suicide same as before.

its only a crime if the victims was rich or perpetrator was not rich.

source
MedicPigBabySaver@lemmy.world ⁨3⁩ ⁨months⁩ ago
Fuck the AI nut suckers and fuck this judge.

source
DFX4509B_2@lemmy.org ⁨3⁩ ⁨months⁩ ago
Good luck breaking down people’s doors for scanning their own physical books when analog media has no DRM and can’t phone home, and paper books are an analog medium.

source
yournamehere@lemm.ee ⁨3⁩ ⁨months⁩ ago
i will train my jailbroken kindle too…display and storage training… i’ll just libgen them…no worries…it is not piracy

source
hendrik@palaver.p3x.de ⁨3⁩ ⁨months⁩ ago
That almost sounds right, doesn't it? If you want 5 million books, you can't just steal/pirate them, you need to buy 5 million copies. I'm glad the court ruled that way.

source
altphoto@lemmy.today ⁨3⁩ ⁨months⁩ ago
So authors must declare legally “this book must not be used for AI training unless a license is agreed on” as a clause in the book purchase.

source
kryptonianCodeMonkey@lemmy.world ⁨3⁩ ⁨months⁩ ago
It’s preety simple as I see it. You treat AI like a person. A person needs to go through legal channels to consume material, so piracy for AI training is as illegal as it would be for personal consumption. Consuming legally possessed material for “inspiration” or “study” is also fine for a person, so it is fine for AI training as well. Commercializing derivative works that infringes on copyright is illegal for a person, so it should be illegal for an AI as well. All materials, even those inspired by another piece of media, are permissible if not monetized, otherwise they need to be suitably transformative. That line can be hard to draw even when AI is not involved, but that is the legal standard for people, so it should be for AI as well. If I browse through Deviant Art and learn to draw similarly my favorite artists from their publically viewable works, and make a legally distinct cartoon mouse by hand in a style that is similar to someone else’s and then I sell prints of that work, that is legal. The same should be the case for AI.

But! Scrutiny for AI should be much stricter given the inherent lack of true transformative creativity. And any AI that has used pirated materials should be penalized either by massive fines or by wiping their training and starting over with legally licensed or purchased or otherwise public domain materials only.

source
Grandwolf319@sh.itjust.works ⁨3⁩ ⁨months⁩ ago
Bangs gabble.

Gets sack with dollar sign

“Oh good, my laundry is ready”

source
PattyMcB@lemmy.world ⁨3⁩ ⁨months⁩ ago
Can I not just ask the trained AI to spit out the text of the book, verbatim?

source
Dragomus@lemmy.world ⁨3⁩ ⁨months⁩ ago
So, let me see if I get this straight:

Books are inherently an artificial construct. If I read the books I train the A(rtificially trained)Intelligence in my skull.
Therefore the concept of me getting them through “piracy” is null and void…

source

-> View More Comments