Comment on The New York Times sues OpenAI and Microsoft for copyright infringement
phoneymouse@lemmy.world 10 months ago
There is something wrong when search and AI companies extract all of the value produced by journalism for themselves. Sites like Reddit and Lemmy also have this issue. I’m not sure what the solution is. I don’t like the idea of a web full of paywalls, but I also don’t like the idea of all the profit going to the ones who didn’t create the product.
but I also don’t like the idea of all the profit going to the ones who didn’t create the product.
Should… should we tell him?
Tell them instead of mocking them.
Yes, “that’s how the world works”. But that doesn’t mean we should stop trying to change it.
The solution is imposing on these companies the responsibility of tracking profit per media outlet, taxing them, and redistributing that money based on the tracking info. They’re able to track every page you visit; it’s complete bullshit when they say they don’t know how much they make from each place their ads are displayed.
AI training is piracy by another name.
Elaborate. Consumption of copyrighted materials is normal use whether by a human or a machine.
Taking someone else’s work and using it without crediting them or compensating them is theft. If OpenAI made a deal with The NY Times to train its product using the paper’s content, which it would then turn around and sell to its own customer base, that would be ethical. What OpenAI and other companies like it are doing is stealing, ahead of actual law that defines what they’re doing as such.
So listening to Billie Jean without thanking Michael Jackson is theft? That is use.
How about Billie Jean’s bassline, which is borrowed from Hall and Oates’ I Can’t Go For That? Was that theft? Michael felt guilty about it, but John felt it was routine for creatives to borrow from each other all the time.
How about the money- and lobbyist-inspired extensions of copyright so extreme that both songs (heck, the whole oeuvres of both artists) have been kept out of the public domain? Is that theft? Or does it only count when companies and rich estates are denied profits?
From your “copyright infringement is theft” blanket assertion and your inability or refusal to parse out fair use of copyrighted materials, I infer you don’t actually understand what copyright is or what purpose it is meant to serve for the public. You are just regurgitating the maximalist rhetoric you’ve been spoonfed. It’s really kinda sad.
Feel free to exercise more nuance. Or if you like you can double down and remove all doubt.
Not the original commenter, but I think the difference you’re looking for is in the copying and distribution. The OC makes the false assumption that the data set is full copies of every object fed into it, rather than sets of common characteristics.
For example, my own mind has a concept of “tree”. Tree is not a copy of every tree I’ve ever known, but more like a list of common characteristics that define treeness, based on information I’ve gathered about treeness (my data set).
Piracy is piracy not because of how it’s consumed, but rather, how it’s distributed and stored, as full copies of the object. Datasets are not copies, in other words. And thus copyright doesn’t apply.
Reading an article to get an idea about what articleness is, is fair use. Reading an article to reproduce it verbatim is not. And as of now, I don’t believe LLMs are doing the latter.
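To make the “characteristics, not copies” point concrete, here’s a toy sketch in Python. It isn’t anyone’s actual training pipeline; the two-sentence corpus and the simple word-pair counting are made up purely for illustration, but they show the shape of the claim: what the “model” ends up storing is aggregate statistics, not the source text.

```python
# Toy illustration only: a "model" that keeps aggregate statistics about
# its training text rather than copies of the text itself.
from collections import Counter

# Hypothetical two-line corpus, made up for this example.
articles = [
    "the quick brown fox jumps over the lazy dog",
    "a lazy dog naps while the quick fox runs past",
]

# "Training": count how often each word follows another across the corpus.
transitions = Counter()
for text in articles:
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        transitions[(prev, nxt)] += 1

# What gets retained is pair counts (shared characteristics), not articles.
print(transitions.most_common(3))

# Neither original text can be read back out of the model verbatim.
assert all(text not in str(transitions) for text in articles)
```

Whether a real model’s weights can nonetheless be prompted into reproducing long passages verbatim is a separate, empirical question, and part of what this lawsuit is arguing about.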
AI isn’t creating the product. It consumed it.
kromem@lemmy.world 10 months ago
What’s the value of old journalism?
It’s a product where the value curve is heavily weighted towards recency.
In theory, the greatest value theft is when the AP writes a piece and two dozen other ‘journalists’ copy the thing, changing the text just enough not to get sued. Which is completely legal, but it’s what effectively killed investigative journalism.
An LLM taking years-old articles and predicting them until it can effectively learn the relationships between language itself and the events described in those articles isn’t some inherent value theft.
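As a rough sketch of what “predicting them until it can effectively learn relationships” means, here is a toy Python version of the next-word prediction objective. The corpus and the next_word_loss helper are invented for illustration; real LLM training uses neural networks and enormous corpora, but the objective has the same shape.

```python
# Toy sketch of the next-word prediction objective. Real LLMs use neural
# networks and huge corpora; this only shows the shape of the objective.
import math
from collections import Counter, defaultdict

# Hypothetical training text, made up for this example.
corpus = "the court ruled that the publisher could pursue the claim".split()

# "Learned parameters": next-word counts gathered from the training text.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_loss(prev: str, nxt: str) -> float:
    """Negative log-probability the toy model assigns to the true next word."""
    total = sum(counts[prev].values())
    prob = counts[prev][nxt] / total if total else 0.0
    return -math.log(max(prob, 1e-9))

# Training drives this loss down; the stored artifact is statistics, not text.
print(round(next_word_loss("the", "court"), 3))  # ~1.099 (probability 1/3)
```

Training pushes that loss down across the whole corpus; what comes out is a set of learned statistics about language, not a stack of saved articles.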
It’s not the training that’s the problem, it’s the application of the models that needs policing.
Like if someone took an LLM, fed it recently published news stories, and had it rewrite them just differently enough that no one needed to visit the original publisher.
Even if we keep that legal for humans to do (which really we might want to revisit, or at least create a special industry-specific restriction around), maybe we should have different rules for the models.
But to claim that an LLM that’s allowing coma patients to communicate, or problem-solving self-driving algorithms, or diagnosing medical issues, is stealing the value of old NYT articles in doing so is not really an argument I see much value in.
jacksilver@lemmy.world 10 months ago
Except no one is claiming that LLMs are the problem; they’re claiming GPT, or more specifically GPT’s training data, is the problem. Transformer models still have a lot of potential, but the question the NYT is asking is “can you just take anyone else’s work to train them?”
kromem@lemmy.world 10 months ago
There’s a similar suit against Meta for Llama.
And yes, as the dust settles we will end up seeing in case law whether training an LLM is fair use.
ChucklesMacLeroy@lemmy.world 10 months ago
Really gave me a whole new perspective. Thanks for that.