Comment on Meta admits using pirated books to train AI, but won't pay for it

TWeaK@lemm.ee 1 year ago

Fair use covers research, but creating a training database for your commercial product is distinctly different from research. They’re not publishing scientific papers, along with their data, which others can verify; they are developing a commercial product for profit. Even compared to traditional R&D this is markedly different, because they aren’t building a prototype: the test version will eventually become the finished product.

The way fair use works is that a judge first decides whether the use fits into one of the categories: news, education, research, criticism, or comment. This does not really fit into the category of “research”, because it isn’t research; it’s the final product in an interim stage. However, even if it were considered research, the next step in fair use is the nature of the use, in particular whether it is commercial. AI is highly commercial.

AI should not even be classified in a fair use category, but even if it were, it should not be granted any exemption because of how commercial it is.

They use other people’s work to profit. They should pay for it.

Facebook steals the data of individuals. They should pay for that, too. We don’t exchange our data for access to their website (or for access to some third-party site Facebook pays to put a pixel on). The website is provided free of charge, and they try to shoehorn another transaction into the fine print of the terms and conditions, where the user gives up their data free of charge. It is not proportionate, and the user’s data is taken without proper consideration (i.e. payment, in terms of the core principles of contract law).

Frankly, it is unsurprising that an entity like Facebook, which so egregiously breaks the law and abuses the rights of every human being who uses the internet, would try to abuse content creators in such a fashion. Their abuse needs to be stopped, in all forms, and they should be made to pay for all of it.

Syntha@sh.itjust.works 1 year ago

They’re not publishing scientific papers, along with their data, which others can verify;

Not that I think this is really relevant here, but I’m pretty sure Meta has published scientific papers on Llama, and the Llama 1 & 2 models are open and accessible to anyone.

- TWeaK@lemm.ee 1 year ago
  No, that is relevant. However, I would still argue that a paper without enough data to replicate their work (i.e. releasing the code of their LLM) shouldn’t really qualify as research. The whole point of academia is that someone else verifies your work, or rather, tries to prove you wrong.
  
  - tinwhiskers@lemmy.world 1 year ago
    They have released it on GitHub. The code is only about 500 lines.
    
    TWeaK@lemm.ee 1 year ago
    Yeah, I mean, what they’ve released is essentially the design of the battery and starter system, without the design of the actual motor. You can’t replicate their product and prove their work with what they’ve published.
    
abhibeckert@lemmy.world 1 year ago

Fair use covers research, but creating a training database for your commercial product is distinctly different from research. They’re not publishing scientific papers, along with their data, which others can verify;

Since when is there a legal requirement to publish the results of your research?

- TWeaK@lemm.ee 1 year ago
  My answer to both your comments is that just because a lot of people get away with breaking the law and abusing people’s rights doesn’t mean it hasn’t happened, or that they can’t or shouldn’t be held to account.
  
  - HerrBeter@lemmy.world 1 year ago
    TL;DR: I may pirate anything I want because I want many things and cannot figure out how to pay for every item individually.
    
General_Effort@lemmy.world 1 year ago

That’s not at all how fair use works.

- TWeaK@lemm.ee 1 year ago
  That is exactly how fair use works. Look up the legislation and quote where it says I’m wrong.
  
  - General_Effort@lemmy.world 1 year ago
    Sure. I mean, not sure why you wouldn’t just look it up yourself, but OK. It takes like 60 seconds to look up and copy/paste.
    
    Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include— (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work. The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
    
    TWeaK@lemm.ee 1 year ago
    So where does that say I’m wrong?
    
    I said fair use covers news, education, research, criticism, or comment.
    
    for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research
    
    Then I said the next thing considered is whether it is commercial.
    
    In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include— (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes
    
    I didn’t cover everything in the law, I just covered the relevant points in a way that could be easily understood and related to the subject at hand.
    
    My point is that the copying AI does isn’t really research, but even if it were considered research it is absolutely commercial and thus should not have a fair use exemption.
    