Comment on Meta admits using pirated books to train AI, but won't pay for it

SnotFlickerman@lemmy.blahaj.zone 1 year ago

nytimes.com/…/openai-new-york-times-lawsuit.html

In its lawsuit Wednesday, the Times accused Microsoft and OpenAI of creating a business model based on “mass copyright infringement,” stating that the companies’ AI systems were “used to create multiple reproductions of The Times’s intellectual property for the purpose of creating the GPT models that exploit and, in many cases, retain large portions of the copyrightable expression contained in those works.”

Publishers are concerned that, with the advent of generative AI chatbots, fewer people will click through to news sites, resulting in shrinking traffic and revenues.

The Times included numerous examples in the suit of instances where GPT-4 produced altered versions of material published by the newspaper.

In one example, the filing shows OpenAI’s software producing almost identical text to a Times article about predatory lending practices in New York City’s taxi industry.

But in OpenAI’s version, GPT-4 excludes a critical piece of context about the sum of money the city made selling taxi medallions and collecting taxes on private sales.

In its suit, the Times said Microsoft and OpenAI’s GPT models “directly compete with Times content.”

If the New York Times’ evidence is true, then you can recreate copyrighted works with LLMs, and as such, they’re doing the same thing as the Pirate Bay, distributing copyrighted works without authorization and making money off the venture.

So far, no ISPs are blocking Meta for this.

General_Effort@lemmy.world 1 year ago
I expect ISPs would get into a lot of legal trouble if they did.

The NYT sued OpenAI and MS. a) That doesn’t involve Meta. b) It’s a claim by the NYT.

Why should ISPs deny their paying customers access to Meta sites or sites hosting LLMs released by Meta? These customers have contracts with their service providers. On what grounds would ISPs be in the right to stop providing these internet services?

- SnotFlickerman@lemmy.blahaj.zone 1 year ago
  Both Meta and ChatGPT used books3; it’s functionally the same type of case.
  
  Why should ISPs deny their paying customers access to Meta sites or sites hosting LLMs released by Meta? These customers have contracts with their service providers. On what grounds would ISPs be in the right to stop providing these internet services?
  
  In the countries where ISP blocking happens, it’s usually because a copyright holder has sued, demanded blocking at the ISP level, and won in court. Then the government begins working with ISPs to block the site.
  
  Unless you think most governments that do this do it arbitrarily? No, they do it because a copyright holder sued, like the New York Times has. The NYT has not demanded ISP-level blocking, but that does not mean that they couldn’t. I can’t speak to their choice not to do so, other than to say that companies seem to save that for truly altruistic groups and rarely do it to other big corporations.
  
  - General_Effort@lemmy.world 1 year ago
    
    but that does not mean that they couldn’t.
    
    IDK why you believe this. Breaking contracts is illegal. You get sued and have to pay damages. Some contracts, in some jurisdictions, may allow such arbitrary decisions. In other jurisdictions such clauses may be unenforceable.
    
    altruistic groups
    
    Well, that’s not something that copyright law cares about very much. Unfortunately, this community seems very pro-copyright; very Ayn Rand even. You’re not likely to get much agreement for any sensible reforms; quite the opposite. I don’t think arguing that Meta is doing the same as TPB is going to win anyone over. It’s more likely to get people here to call for more onerous and more harmful IP laws.
    
    Both Meta and ChatGPT used books3; it’s functionally the same type of case.
    
    FWIW, no. The NYT case and this one are different in some crucial ways.
    