
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

⁨0⁩ ⁨likes⁩

Submitted ⁨⁨9⁩ ⁨months⁩ ago⁩ by ⁨Pro@programming.dev⁩ to ⁨technology@lemmy.world⁩

https://aifray.com/claude-ai-maker-anthropic-bags-key-fair-use-win-for-ai-platforms-but-faces-trial-over-damages-for-millions-of-pirated-works/

source

Comments

  • booly@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

    It took me a few days to get the time to read the actual court ruling but here’s the basics of what it ruled (and what it didn’t rule on):

    • It’s legal to scan physical books you already own and keep a digital library of those scanned books, even if the copyright holder didn’t give permission. And even if you bought the books used, for very cheap, in bulk.
    • It’s legal to keep all the book data in an internal database for use within the company, as a central library of works accessible only within the company.
    • It’s legal to prepare those digital copies for potential use as training material for LLMs, including recognizing the text, performing cleanup on scanning/recognition errors, categorizing and cataloguing them to make editorial decisions on which works to include in which training sets, tokenizing them for the actual LLM technology, etc. This remains legal even for the copies that are excluded from training for whatever reason, as the entire bulk process may involve text that ends up not being used, but the process itself is fair use.
    • It’s legal to use that book text to create large language models that power services that are commercially sold to the public, as long as there are safeguards that prevent the LLMs from publishing large portions of a single copyrighted work without the copyright holder’s permission.
    • It’s illegal to download unauthorized copies of copyrighted books from the internet, without the copyright holder’s permission.

    Here’s what it didn’t rule on:

    • Is it legal to distribute large chunks of copyrighted text through one of these LLMs, such as when a user asks a chatbot to recite an entire copyrighted work that is in its training set? (The opinion suggests that it probably isn’t legal, and relies heavily on the dividing line of how Google Books does it, by scanning and analyzing an entire copyrighted work but blocking users from retrieving more than a few snippets from those works).
    • Is it legal to give anyone outside the company access to the digitized central library assembled by the company from printed copies?
    • Is it legal to crawl publicly available digital data to build a library from text already digitized by someone else? (The answer may matter depending on whether there is an authorized method for obtaining that data, or whether the copyright holder refuses to license that copying).

    So it’s a pretty important ruling, in my opinion. It’s a clear green light to the idea of digitizing and archiving copyrighted works without the copyright holder’s permission, as long as you own a legal copy in the first place. And it’s a green light to using copyrighted works for training AI models, as long as you compiled that database of copyrighted works in a legal way.

    source
  • Fizz@lemmy.nz ⁨9⁩ ⁨months⁩ ago

    Judge, I’m pirating them to train AI, not to consume for my own personal use.

    source
  • DFX4509B_2@lemmy.org ⁨9⁩ ⁨months⁩ ago

    Good luck breaking down people’s doors for scanning their own physical books when analog media has no DRM and can’t phone home, and paper books are an analog medium.

    source
    • Bob_Robertson_IX@discuss.tchncs.de ⁨9⁩ ⁨months⁩ ago

      It sounds like transferring an owned print book to digital and using it to train AI was deemed permissible. But downloading a book from the Internet and using it as training data is not allowed, even if you later purchase the pirated book. So, no one will be knocking down your door for scanning your books.

      This does raise an interesting case where libraries could end up training and distributing public domain AI models.

      source
      • restingboredface@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

        I would actually be okay with libraries having those AI services. Even if they were available only for a fee it would be absurdly low and still waived for people with low or no income.

        source
    • booly@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

      The ruling explicitly says that scanning books and keeping/using those digital copies is legal.

      The piracy found to be illegal was downloading unauthorized copies of books from the internet for free.

      source
      • deltapi@lemmy.world ⁨9⁩ ⁨months⁩ ago

        I wonder if the archive.org cases had any bearing on the decision.

        source
        • -> View More Comments
  • MTK@lemmy.world ⁨9⁩ ⁨months⁩ ago

    Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.
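
    A minimal sketch of that “small hash check”, assuming the original text is on hand to compare against (all names and strings here are invented for the illustration):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical "TheAIBay" accuracy check: the LLM's reproduction only
# validates if it is byte-identical to the original work.
original = b"It was the best of times, it was the worst of times..."
reproduced = b"It was the best of times, it was the worst of times..."

assert sha256_of(reproduced) == sha256_of(original), "not a 1:1 copy"
```

    Any single changed byte yields a completely different digest, which is why a hash comparison catches imperfect reproductions.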

    source
    • booly@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

      The court’s ruling explicitly depended on the fact that Anthropic does not allow users to retrieve significant chunks of copyrighted text. It used the entire copyrighted work to train the weights of the LLMs, but is configured not to actually copy those works out to the public user. The ruling says that if the copyright holders later develop evidence that it is possible to retrieve entire copyrighted works, or significant portions of a work, then they will have the right to sue over those facts.

      But the facts before the court were that Anthropic’s LLMs have safeguards against distributing copies of identifiable copyrighted works to its users.

      source
    • nodiratime@lemmy.world ⁨9⁩ ⁨months⁩ ago

      Does it “generate” a 1;1 copy?

      source
      • MTK@lemmy.world ⁨9⁩ ⁨months⁩ ago

        You can train an LLM to generate 1:1 copies

        source
      • S_H_K@lemmy.dbzer0.com ⁨9⁩ ⁨months⁩ ago

        Gives you versions like this

        source
        • -> View More Comments
  • Randomgal@lemmy.ca ⁨9⁩ ⁨months⁩ ago

    You’re poor? Fuck you you have to pay to breathe.

    Millionaire? Whatever you want daddy uwu

    source
    • eestileib@lemmy.blahaj.zone ⁨9⁩ ⁨months⁩ ago

      That’s kind of how I read it too.

      But as a side effect it means you’re still allowed to photograph your own books at home as a private citizen if you own them.

      Prepare to never legally own another piece of media in your life. 😄

      source
  • y0kai@lemmy.dbzer0.com ⁨9⁩ ⁨months⁩ ago

    Sure, if you purchase your training material, it’s not a copyright infringement to read it.

    We needed a judge for this?

    source
    • excral@feddit.org ⁨9⁩ ⁨months⁩ ago

      Yes, because buying a book doesn’t mean you own its content. You’re not allowed to print and/or sell additional copies or publicly post the entire text. Generally it’s difficult to say where the limit of what’s allowed lies. Citing a single sentence in a public post is most likely fine, citing an entire paragraph is probably fine too, but an entire chapter would probably be pushing it too far. And when in doubt, a judge must decide how far you can go before infringing copyright. There are good arguments to be made that just buying a book doesn’t grant the right to train commercial AI models with it.

      source
  • yournamehere@lemm.ee ⁨9⁩ ⁨months⁩ ago

    i will train my jailbroken kindle too…display and storage training… i’ll just libgen them…no worries…it is not piracy

    source
    • axEl7fB5@lemmy.cafe ⁨9⁩ ⁨months⁩ ago

      why do you even jailbreak your kindle? you can still read pirated books on them if you connect it to your pc using calibre

      source
      • yournamehere@lemm.ee ⁨9⁩ ⁨months⁩ ago

        when not in use i have it load images from my local webserver that are generated by some scripts and feature local news or the weather. kindle screensaver sucks.

        source
      • j0ester@lemmy.world ⁨9⁩ ⁨months⁩ ago

        Hehe jailbreak an Android OS.

        source
      • Vanilla_PuddinFudge@infosec.pub ⁨9⁩ ⁨months⁩ ago
        1. mobi sucka
        2. koreader doesn’t
        source
    • minorkeys@lemmy.world ⁨9⁩ ⁨months⁩ ago

      Of course we have to have a way to manually check the training data, in detail, as well. Not privacy, just verifying training data.

      source
  • SaharaMaleikuhm@feddit.org ⁨9⁩ ⁨months⁩ ago

    But I thought they admitted to torrenting terabytes of ebooks?

    source
    • finitebanjo@lemmy.world ⁨9⁩ ⁨months⁩ ago

      Facebook did, but technically downloading (leeching) isn’t illegal; distributing (seeding) is, and they did not seed.

      source
    • antonim@lemmy.dbzer0.com ⁨9⁩ ⁨months⁩ ago

      Facebook (Meta) torrented TBs from Libgen, and their internal chats leaked so we know about that, and IIRC they’ve been sued. Maybe you’re thinking of that case?

      source
      • ScoffingLizard@lemmy.dbzer0.com ⁨9⁩ ⁨months⁩ ago

        Billions of dollars, and they can’t afford to buy ebooks?

        source
  • isVeryLoud@lemmy.ca ⁨9⁩ ⁨months⁩ ago

    Gist:

    What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

    “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”
    
    source
    • DerisionConsulting@lemmy.ca ⁨9⁩ ⁨months⁩ ago

      Formatting thing: if you start a line in a new paragraph with four spaces, the site assumes that you want to display the text as code and won’t line break.

      This means that the last part of your comment is a long line that people need to scroll to see. If you remove one of the spaces, or you remove the empty line between it and the previous paragraph, it’ll look like a normal comment.

      With an empty line of space:

      1 space

      2 spaces

      3 spaces

      4 spaces
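
      For reference, this is standard Markdown behavior, not site-specific; an illustration (the text inside is arbitrary):

```
Three or fewer leading spaces: normal wrapping text.
    Four leading spaces: a preformatted block that never wraps.
```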
      
      source
      • bitwolf@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

        Personally I prefer to explicitly wrap the text in backticks.

        Three ` symbols will

        Have the same effect

        But the behavior is more clear to the author
        
        source
      • isVeryLoud@lemmy.ca ⁨9⁩ ⁨months⁩ ago

        Thanks, I had copy-pasted it from the website :)

        source
    • DeathsEmbrace@lemmy.world ⁨9⁩ ⁨months⁩ ago

      So I can’t use any of these works because it’s plagiarism but AI can?

      source
      • nednobbins@lemmy.zip ⁨9⁩ ⁨months⁩ ago

        That’s not what it says.

        Neither you nor an AI is allowed to take a book without authorization; that includes downloading and stealing it. That has nothing to do with plagiarism; it’s just theft.

        Assuming that the book has been legally obtained, both you and an AI are allowed to read that book, learn from it, and use the knowledge you obtained.

        Both you and the AI need to follow existing copyright laws and licensing when it comes to redistributing that work.

        “Plagiarism” is the act of claiming someone else’s work as your own and it’s orthogonal to the use of AI. If you ask either a human or an AI to produce an essay on the philosophy surrounding suicide, you’re fairly likely to include some Shakespeare quotes. It’s only plagiarism if you or the AI fail to provide attribution.

        source
      • Enkimaru@lemmy.world ⁨9⁩ ⁨months⁩ ago

        Why would it be plagiarism if you use the knowledge you gain from a book?

        source
      • FreedomAdvocate@lemmy.net.au ⁨9⁩ ⁨months⁩ ago

        You can “use” them to learn from, just like “AI” can.

        What exactly do you think AI does when it “learns” from a book, for example? Do you think it will just spit out the entire book if you ask it to?

        source
        • -> View More Comments
      • isVeryLoud@lemmy.ca ⁨9⁩ ⁨months⁩ ago

        My interpretation was that AI companies can train on material they are licensed to use, but the courts have deemed that Anthropic pirated this material as they were not licensed to use it.

        In other words, if Anthropic bought the physical or digital books, it would be fine so long as their AI couldn’t spit it out verbatim, but they didn’t even do that, i.e. the AI crawler pirated the book.

        source
        • -> View More Comments
  • vane@lemmy.world ⁨9⁩ ⁨months⁩ ago

    Ok, so you can buy books or ebooks, scan them, and use them for AI training, but you can’t just download books from the internet to train AI. Did I understand that correctly?

    source
    • nednobbins@lemmy.zip ⁨9⁩ ⁨months⁩ ago

      That’s my understanding too. If you obtained them legally, you can use them the same way anyone else who obtained them legally could use them.

      source
    • forkDestroyer@infosec.pub ⁨9⁩ ⁨months⁩ ago

      Make an AI that is trained on the books.

      Tell it to tell you a story for one of the books.

      Read the story without paying for it.

      The law says this is ok now, right?

      source
      • booly@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

        The law says this is ok now, right?

        No.

        The judge accepted the fact that Anthropic prevents users from obtaining the underlying copyrighted text through interaction with its LLM, and that there are safeguards in the software that prevent a user from being able to get an entire copyrighted work out of that LLM. It discusses the Google Books arrangement, where the books are scanned in their entirety, but where a user searching in Google Books can’t actually retrieve more than a few snippets from any given book.

        Anthropic gets to keep the copy of the entire book. It doesn’t get to transmit the contents of that book to someone else, even through the LLM service.

        The judge also explicitly stated that if the authors can put together evidence that it is possible for a user to retrieve their entire copyrighted work out of the LLM, they’d have a different case and could sue over it at that time.

        source
      • nednobbins@lemmy.zip ⁨9⁩ ⁨months⁩ ago

        Sort of.

        If you violated laws in obtaining the book (eg stole or downloaded it without permission) it’s illegal and you’ve already violated the law, no matter what you do after that.

        If you obtain the book legally you can do whatever you want with that book, by the first sale doctrine. If you want to redistribute the book, you need the proper license. You don’t need any licensing to create a derivative work. That work has to be “sufficiently transformed” in order to pass.

        source
      • Enkimaru@lemmy.world ⁨9⁩ ⁨months⁩ ago

        The LLM is not repeating the same book. The owner of the LLM has the exact same rights to do with what his LLM is reading as you have to do with whatever YOU are reading.

        As long as it is not a verbatim recitation, it is completely okay.

        According to storytelling theory, there are only roughly 15 different story types anyway.

        source
      • LoreleiSankTheShip@lemmy.ml ⁨9⁩ ⁨months⁩ ago

        As long as they don’t use exactly the same words in the book, yeah, as I understand it.

        source
        • -> View More Comments
  • FreedomAdvocate@lemmy.net.au ⁨9⁩ ⁨months⁩ ago

    Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    Some people just see “AI” and want everything about it outlawed basically. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

    source
    • antonim@lemmy.dbzer0.com ⁨9⁩ ⁨months⁩ ago

      AI can “learn” from and “read” a book in the same way a person can and does,

      If it’s in the same way, then why do you need the quotation marks? Even you understand that they’re not the same.

      And either way, machine learning is different from human learning in so many ways it’s ridiculous to even discuss the topic.

      AI doesn’t reproduce a work that it “learns” from

      That depends on the model and the amount of data it has been trained on. I remember the first public model of ChatGPT producing a sentence that was just one word different from what I found by googling the text (from some scientific article summary, so not a trivial sentence that could line up accidentally). More recently, there was a widely reported-on study of AI-generated poetry where the model was requested to produce a poem in the style of Chaucer, and then produced a letter-for-letter reproduction of the well-known opening of the Canterbury Tales. It hasn’t been trained on enough Middle English poetry and thus can’t generate any of it, so it defaulted to copying a text that probably occurred dozens of times in its training data.

      source
    • elrik@lemmy.world ⁨9⁩ ⁨months⁩ ago

      AI can “learn” from and “read” a book in the same way a person can and does

      This statement is the basis for your argument and it is simply not correct.

      Training LLMs and similar AI models is much closer to a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar given our current understanding of human learning.

      AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

      The current Disney lawsuit against Midjourney is illustrative - literally, it includes numerous side-by-side comparisons - of how AI models are capable of recreating iconic copyrighted work that is indistinguishable from the original.

      If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

      An AI doesn’t create works on its own. A human instructs AI to do so. Attribution is also irrelevant. If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).

      source
      • jwmgregory@lemmy.dbzer0.com ⁨9⁩ ⁨months⁩ ago

        Even if we accept all your market-liberal premises without question… in your own rhetorical framework the Disney lawsuit should be ruled against Disney.

        If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).

        Says who? In a free market why is the competition from similar products and brands such a threat as to be outlawed? Think reasonably about what you are advocating… you think authorship is so valuable or so special that one should be granted a legally enforceable monopoly at the loosest notions of authorship. This is the definition of a slippery-slope, and yet, it is the status quo of the society we live in.

        On it “harming marketability of the original works,” frankly, that’s a fiction and anyone advocating such ideas should just fucking weep about it instead of enforce overreaching laws on the rest of us. If you can’t sell your art because a machine made “too good a copy” of your art, it wasn’t good art in the first place and that is not the fault of the machine. Even big pharma doesn’t get to outright ban generic medications (even tho they certainly tried)… it is patently fucking absurd to decry artist’s lack of a state-enforced monopoly on their work. Why do you think we should extend such a radical policy towards… checks notes… tumblr artists and other commission based creators? It’s not good when big companies do it for themselves through lobbying, it wouldn’t be good to do it for “the little guy,” either. The real artists working in industry don’t want to change the law this way because they know it doesn’t work in their favor. Disney’s lawsuit is in the interest of Disney and big capital, not artists themselves, despite what these large conglomerates that trade in IPs and dreams might try to convince the art world writ large of.

        source
        • -> View More Comments
      • FreedomAdvocate@lemmy.net.au ⁨9⁩ ⁨months⁩ ago

        Your very first statement calling my basis for my argument incorrect is incorrect lol.

        LLMs “learn” things from the content they consume. They don’t just take the content in wholesale and keep it there to regurgitate on command.

        On your last part: unless someone uses AI to recreate the tone etc. of a best selling author and then markets their book/writing as being from said best selling author, and doesn’t use trademarked characters etc., there’s no issue. You can’t copyright a style of writing.

        source
        • -> View More Comments
    • badcommandorfilename@lemmy.world ⁨9⁩ ⁨months⁩ ago

      Ask a human to draw an orc. How do they know what an orc looks like? They read Tolkien’s books and were “inspired” by Peter Jackson’s LOTR.

      Unpopular opinion, but that’s how our brains work.

      source
      • burntbacon@discuss.tchncs.de ⁨9⁩ ⁨months⁩ ago

        Fuck you, I won’t do what you tell me!

        >.>

        <.<

        spoiler

        I was inspired by the sometimes hilarious dnd splatbooks, thank you very much.

        source
  • fum@lemmy.world ⁨9⁩ ⁨months⁩ ago

    What a bad judge.

    This is another indication of how Copyright laws are bad. The whole premise of copyright has been obsolete since the proliferation of the internet.

    source
    • gian@lemmy.grys.it ⁨9⁩ ⁨months⁩ ago

      What a bad judge.

      Why? Basically he simply stated that you can use whatever material you want to train your model as long as you ask permission to use it (and presumably pay for it) from the author (or copyright holder).

      source
      • patatahooligan@lemmy.world ⁨9⁩ ⁨months⁩ ago

        “Fair use” is the exact opposite of what you’re saying here. It means that you don’t need to ask for any permission. The judge ruled that obtaining illegitimate copies was unlawful, but use without the creator’s consent is perfectly fine.

        source
      • LifeInMultipleChoice@lemmy.world ⁨9⁩ ⁨months⁩ ago

        If I understand correctly, they are ruling you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free.

        They may be trying to put safeguards so it isn’t directly happening, but here is an example that the text is there word for word:

        Image

        source
        • -> View More Comments
      • j0ester@lemmy.world ⁨9⁩ ⁨months⁩ ago

        Huh? Didn’t Meta skip getting permission entirely and pirate a lot of books to train their model?

        source
        • -> View More Comments
  • GissaMittJobb@lemmy.ml ⁨9⁩ ⁨months⁩ ago

    It’s extremely frustrating to read this comment thread because it’s obvious that so many of you didn’t actually read the article, or even half-skim the article, or even attempted to even comprehend the title of the article for more than a second.

    For shame.

    source
    • LifeInMultipleChoice@lemmy.world ⁨9⁩ ⁨months⁩ ago

      “While the copies used to convert purchased print library copies into digital library copies were slightly disfavored by the second factor (nature of the work), the court still found “on balance” that it was a fair use because the purchased print copy was destroyed and its digital replacement was not redistributed.”

      So you find this to be valid? To me it is absolutely being redistributed

      source
    • jsomae@lemmy.ml ⁨9⁩ ⁨months⁩ ago

      It seems the subject of AI causes lemmites to lose all their braincells.

      source
    • ayane@lemmy.vg ⁨9⁩ ⁨months⁩ ago

      I joined lemmy specifically to avoid this reddit mindset of jumping to conclusions after reading a headline

      Guess some things never change…

      source
      • jwmgregory@lemmy.dbzer0.com ⁨9⁩ ⁨months⁩ ago

        Well to be honest lemmy is less prone to knee-jerk reactionary discussion but on a handful of topics it is virtually guaranteed to happen no matter what, even here. For example, this entire site, besides a handful of communities, is vigorously anti-AI; and in the words of u/jsomae@lemmy.ml elsewhere in this comment chain:

        “It seems the subject of AI causes lemmites to lose all their braincells.”

        I think there is definitely an interesting take on the sociology of the digital age in here somewhere but it’s too early in the morning to be tapping something like that out lol

        source
    • BlueMagma@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

      Nobody ever reads articles, everybody likes to get angry at headlines, which they wrongly interpret the way it best tickles their rage.

      Regarding the ruling, I agree with you that it’s a good thing, in my opinion it makes a lot of sense to allow fair use in this case

      source
    • lime@feddit.nu ⁨9⁩ ⁨months⁩ ago

      was gonna say, this seems like the best outcome for this particular trial. there was potential for fair use to be compromised, and for piracy to be legal if you’re a large corporation. instead, they upheld that you can do what you want with things you have paid for.

      source
  • CriticalMiss@lemmy.world ⁨9⁩ ⁨months⁩ ago

    This 240TB JBOD full of books? Oh heavens forbid, we didn’t pirate it. It uhh… fell off a truck, yes, fell off a truck.

    source
  • drmoose@lemmy.world ⁨9⁩ ⁨months⁩ ago

    Unpopular opinion but I don’t see how it could have been different.

    • There’s no way the West would give the AI lead to China.
    • Believe it or not, transformers are actually learning by current definitions and not regurgitating a direct copy. It’s transformative - shocking, I know.
    • This is actually good, as it prevents a market moat for the super-rich corporations, the only ones who could afford expensive licensed training datasets.

    This is an absolute win for everyone involved other than copyright hoarders and mega corporations.

    source
    • kromem@lemmy.world ⁨9⁩ ⁨months⁩ ago

      I’d encourage everyone upset at this read over some of the EFF posts from actual IP lawyers on this topic like this one:

      Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take. 

      Entertainment companies’ historical practices bear out this concern. For example, in the late-2000’s to mid-2010’s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI.

      source
    • LovableSidekick@lemmy.world ⁨9⁩ ⁨months⁩ ago

      You are being douchevoted because on lemmy any comment that isn’t negative about AI is the Devil’s Work.

      source
      • jwmgregory@lemmy.dbzer0.com ⁨9⁩ ⁨months⁩ ago

        Some communities on this site speak about machine learning exactly how I see grungy Europeans from pre-18th century manuscripts speaking about witches, Satan, and evil… as if it is some pervasive, black-magic miasma.

        As someone who is in the field of machine learning academically/professionally it’s honestly kind of shocking and has largely informed my opinion of society at large as an adult. No one puts any effort into learning if they see the letters “A” and “I” in all caps, next to each other. Immediately turn their brain off and start regurgitating points and responding reflexively, on Lemmy or otherwise. People talk about it so confidently while being so frustratingly unaware of their own ignorance on the matter, which, for lack of a better comparison… reminds me a lot of how historically and in fiction human beings have treated literal magic.

        That’s my main issue with the entire swath of “pro vs anti AI” discourse… all these people treating something that, to me, is simple & daily reality as something entirely different than my own personal notion of it.

        source
        • -> View More Comments
    • deathbird@mander.xyz ⁨9⁩ ⁨months⁩ ago
      1. Idgaf about China and what they do and you shouldn’t either, even if US paranoia about them is highly predictable.
      2. Depending on the outputs it’s not always that transformative.
      3. The moat would be good actually. The business model of LLMs isn’t good, but it’s not even viable without massive subsidies, not least of which is taking people’s shit without paying.

      It’s a huge loss for smaller copyright holders too. They can’t afford to fight when they get imitated beyond fair use. Copyright abuse can only be fixed by the very force that creates copyright in the first place: law. The market can’t fix that. This just decides winners between competing mega corporations, and even worse, upends a system that some smaller players have been able to carve a niche in.

      Want to fix copyright? Put real time limits on it. Bind it to a living human only. Make it non-transferable. There’s all sorts of ways to fix it, but this isn’t it.

      source
      • Atlas_@lemmy.world ⁨9⁩ ⁨months⁩ ago

        Maybe something could be hacked together to fix copyright, but further complication there is just going to make accurate enforcement even harder. And we already have Google (in YouTube) already doing a shitty job of it and that’s… One of the largest companies on earth.

        We should just kill copyright. Yes, it’ll disrupt Hollywood. Yes it’ll disrupt the music industry. Yes it’ll make it even harder to be successful or wealthy as an author. But this is going to happen one way or the other so long as AI can be trained on copyrighted works (and maybe even if not). We might as well get started on the transition early.

        source
      • drmoose@lemmy.world ⁨9⁩ ⁨months⁩ ago

      I’ll be honest with you - I’m a huge lefty and I don’t see how this could ever be solved with the methods you suggested. The world is not coming together to hold hands and kumbaya out of this one. Trade deals are incredibly hard and even harder to enforce, so the free market is clearly the only path forward here.

        source
  • shadowfax13@lemmy.ml ⁨9⁩ ⁨months⁩ ago

    calm down everyone. it’s only legal for parasitic mega corps; normal working people will be harassed to suicide same as before.

    it’s only a crime if the victim was rich or the perpetrator was not rich.

    source
    • BlueMagma@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

      This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.

      source
      • Knock_Knock_Lemmy_In@lemmy.world ⁨9⁩ ⁨months⁩ ago

        Or: if a legal copy of the book is owned, then it can be used for AI training.

        The court is saying that no special AI book license is needed.

        source
    • milicent_bystandr@lemm.ee ⁨9⁩ ⁨months⁩ ago

      Right. Where’s the punishment for Meta who admitted to pirating books?

      source
      • Knock_Knock_Lemmy_In@lemmy.world ⁨9⁩ ⁨months⁩ ago

        This judgment implies that Meta broke the law.

        source
  • altphoto@lemmy.today ⁨9⁩ ⁨months⁩ ago

    So authors must legally declare “this book must not be used for AI training unless a license is agreed on” as a clause in the book purchase.

    source
  • mlg@lemmy.world ⁨9⁩ ⁨months⁩ ago

    Yeah, I have a bash one-liner AI model that ingests your media and spits out a 99.9999999% accurate replica through the power of changing the filename:

    cp

    Outperforms the latest and greatest AI models
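
    In the same spirit, the joke can be written out as an actual runnable “pipeline” (the filenames here are made up for illustration):

```shell
# The entire "AI model": a byte-for-byte copy under a new filename.
printf 'movie bytes' > input_movie.mkv   # stand-in for the ingested media
cp input_movie.mkv model_output.mkv      # "training" and "inference" in one step
cmp input_movie.mkv model_output.mkv && echo "100% accurate replica"
```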

    source
    • BlueMagma@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

      This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.

      source
      • jsomae@lemmy.ml ⁨9⁩ ⁨months⁩ ago

        Please read the comment more carefully. The observation is that one can proliferate a (legally-attained) work without running afoul of copyright law if one can successfully argue that cp constitutes AI.

        source
    • interdimensionalmeme@lemmy.ml ⁨9⁩ ⁨months⁩ ago

      I call this legally distinct, this is legal advice.

      source
    • sugar_in_your_tea@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

      mv will save you some disk space.

      source
  • MedicPigBabySaver@lemmy.world ⁨9⁩ ⁨months⁩ ago

    Fuck the AI nut suckers and fuck this judge.

    source
  • Grandwolf319@sh.itjust.works ⁨9⁩ ⁨months⁩ ago

    Bangs gavel.

    Gets sack with dollar sign on it

    “Oh good, my laundry is ready”

    source
  • GreenKnight23@lemmy.world ⁨9⁩ ⁨months⁩ ago

    I am training my model on these 100,000 movies your honor.

    source
  • match@pawb.social ⁨9⁩ ⁨months⁩ ago

    brb, training a 1-layer neural net so I can ask it to play Pixar films

    source
  • kryptonianCodeMonkey@lemmy.world ⁨9⁩ ⁨months⁩ ago

    It’s pretty simple as I see it: you treat AI like a person. A person needs to go through legal channels to consume material, so piracy for AI training is as illegal as it would be for personal consumption. Consuming legally possessed material for “inspiration” or “study” is also fine for a person, so it is fine for AI training as well. Commercializing derivative works that infringe on copyright is illegal for a person, so it should be illegal for an AI as well.

    All materials, even those inspired by another piece of media, are permissible if not monetized; otherwise they need to be suitably transformative. That line can be hard to draw even when AI is not involved, but that is the legal standard for people, so it should be for AI as well. If I browse through Deviant Art, learn to draw similarly to my favorite artists from their publicly viewable works, make a legally distinct cartoon mouse by hand in a style similar to someone else’s, and then sell prints of that work, that is legal. The same should be the case for AI.

    But! Scrutiny for AI should be much stricter given the inherent lack of true transformative creativity. And any AI that has used pirated materials should be penalized either by massive fines or by wiping their training and starting over with legally licensed or purchased or otherwise public domain materials only.

    source
  • snekerpimp@lemmy.snekerpimp.space ⁨9⁩ ⁨months⁩ ago

    “I torrented all this music and movies to train my local ai models”

    source
  • PattyMcB@lemmy.world ⁨9⁩ ⁨months⁩ ago

    Can I not just ask the trained AI to spit out the text of the book, verbatim?

    source
  • Dragomus@lemmy.world ⁨9⁩ ⁨months⁩ ago

    So, let me see if I get this straight:

    Books are inherently an artificial construct. If I read the books, I train the A(rtificially trained) Intelligence in my skull.
    Therefore the concept of me getting them through “piracy” is null and void…

    source
  • hendrik@palaver.p3x.de ⁨9⁩ ⁨months⁩ ago

    That almost sounds right, doesn't it? If you want 5 million books, you can't just steal/pirate them, you need to buy 5 million copies. I'm glad the court ruled that way.

    source
  • Prox@lemmy.world ⁨9⁩ ⁨months⁩ ago

    FTA:

    Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

    So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.

    source
  • Jrockwar@feddit.uk ⁨9⁩ ⁨months⁩ ago

    I think this means we can make a torrent client with a built-in function that uses 0.1% of one CPU core to train an ML model on anything you download. Then you can download anything legally with it. 👌

    source
  • Alphane_Moon@lemmy.world ⁨9⁩ ⁨months⁩ ago

    And this is how you know that the American legal system should not be trusted.

    Mind you, I am not saying this is an easy case; it’s not. But the framing that piracy is wrong while for-profit ML training is not is clearly based on oligarch interests and demands.

    source
-> View More Comments