Yeah, I’m not a fan of AI, but I’m generally of the view that anything posted on the internet, visible without a login, is fair game for indexing by a search engine, snapshotting by a backup service (like the Internet Archive’s Wayback Machine), or running user extensions on (including ad blockers).
Comment on Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web
Buffalox@lemmy.world 4 months ago
copying is not theft
GamingChairModel@lemmy.world 4 months ago
Evotech@lemmy.world 4 months ago
You can’t be for piracy but against LLMs, for the same reason.
And I think most of the people on Lemmy are for piracy.
sugar_in_your_tea@sh.itjust.works 4 months ago
I’m not in favor of piracy or LLMs. I’m also not a fan of copyright as it exists today (I think we should go back to the 1790 US definition of copyright).
masterspace@lemmy.ca 4 months ago
The problem with copyright has nothing to do with term limits. Those exacerbate the problem, but the fundamental problem with copyright and IP law is that it is a system of artificial scarcity where there is no need for one.
Rather than reward creators when their information is used, we ham-fistedly try to prevent others from using that information, so that people sometimes have to pay them to use it.
Capitalism is flat out the wrong system for distributing digital information, because as soon as information is digitized it is effectively infinitely abundant which sends its value to $0.
sugar_in_your_tea@sh.itjust.works 4 months ago
Yes, it kind of is. A search engine just looks for keywords and links, and that’s all it retains after crawling a site. It’s not producing any derivative works; it’s merely looking up an index of keywords to find matches.
An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues. Whether a particular generated result violates copyright depends on the license of the works it’s based on and how much of those works it uses. So it’s complicated, but there’s very much a copyright argument there.
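The keyword-index model described in the comment above can be sketched as a toy inverted index: the crawler keeps only a keyword-to-URL map, not the pages themselves. The URLs and page texts here are hypothetical examples, not real sites.

```python
# A minimal sketch of a search engine's inverted index: after crawling,
# only the keyword -> set-of-URLs mapping is retained, not the page text.

from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase keyword to the set of URLs containing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return URLs that contain every keyword in the query."""
    words = query.lower().split()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical crawled pages:
pages = {
    "example.com/a": "open web content is fair game",
    "example.com/b": "copyright on the open web",
}
index = build_index(pages)
print(search(index, "open web"))  # both pages contain "open" and "web"
```

Note that the index stores no reconstructable copy of any page, which is the crux of the argument: lookup, not reproduction.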
Halosheep@lemm.ee 4 months ago
My brain also takes information and creates derivative works from it.
Shit, am I also a data thief?
sugar_in_your_tea@sh.itjust.works 4 months ago
That depends, do you copy verbatim? Or do you process and understand concepts, and then create new works based on that understanding? If you copy verbatim, that’s plagiarism and you’re a thief. If you create your own answer, it’s not.
Current AI doesn’t actually “understand” anything, and “learning” is just grabbing input data. If you ask it a question, it’s not understanding anything, it just matches search terms to the part of the training data that matches, and regurgitates a mix of it, and usually omits the sources. That’s it.
It’s a tricky line in journalism since so much of it is borrowed, and it’s likewise tricky w/ AI, but the main difference IMO is attribution: good journalists cite sources; AI rarely does.
TheRealKuni@lemmy.world 4 months ago
An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues.
Derivative works are not copyright infringement. If LLMs are spitting out exact copies, or near-enough-to-exact copies, that’s one thing. But as you said, the whole point is to generate derivative works.
sugar_in_your_tea@sh.itjust.works 4 months ago
Derivative works are not copyright infringement
They absolutely are, unless it’s covered by “fair use.” A “derivative work” doesn’t mean you created something that’s inspired by a work, but that you’ve modified the work and then distributed the modified version.
petrol_sniff_king@lemmy.blahaj.zone 4 months ago
None of those things replace that content, though.
Look, I dunno if this is legally a copyrights issue, but as a society, I think a lot of people have decided they’re willing to yield to social media and search engine indexers, but not to AI training, you know? The same way I might consent to eating a mango but not a banana.
ZILtoid1991@lemmy.world 4 months ago
Issue is power imbalance.
There’s a clear difference between a guy in his basement on his personal computer sampling music the original musicians almost never saw a single penny from, and a megacorp trying to drive creative professionals out of the industry in the hopes that it can then hike up the prices to use its generative AI software.
Womble@lemmy.world 4 months ago
Didn’t you hear? We stan draconian IP laws now because AI bad.
SnotFlickerman@lemmy.blahaj.zone 4 months ago
Is it that or is it that the laws are selectively applied on little guys and ignored once you make enough money? It certainly looks that way. Once you’ve achieved a level of “fuck you money” it doesn’t matter how unscrupulously you got there.
Examples:
The Pirate Bay: Only made enough money to run the site and keep the admins living a middle class lifestyle.
VERDICT: Bad, wrong, and evil. Must be put in jail.
OpenAI: Claims to be non-profit, then spins off for-profit wing. Makes a mint in a deal with Microsoft.
VERDICT: Only the goodest of good people and we must allow them to continue doing so.
The IP laws are stupid but letting fucking rich twats get away with it while regular people will still get fucked by the same rules is kind of a fucking stupid ass hill to die on.
Grimy@lemmy.world 4 months ago
The laws are currently the same for everyone when it comes to what you can use to train an AI with. I, as an individual, can use whatever public facing data I wish to build or fine tune AI models, same as Microsoft.
If we make copyright laws even stronger, the only ones getting locked out of the game are the little guys. Microsoft, Google, and company can afford to pay ridiculous prices for datasets. What they don’t own mainly comes from aggregators like Reddit, Getty, Instagram, and Stack.
Boosting copyright laws essentially kills all legal forms of open source AI. It would force the open source scene to go underground as a pirate network and lead to the scenario you mentioned.
Womble@lemmy.world 4 months ago
Yes, it is a travesty that people are being hounded for sharing information, but the solution isn’t to lock up information tighter by restricting access to the open web and saying that if you download something we put up to be freely accessed, and then use it in a way we don’t like, you owe us.
0x0@programming.dev 4 months ago
letting fucking rich twats get away with it
That’s law in general…
cmhe@lemmy.world 4 months ago
“Copying is theft” has been the argument of corporations for ages, but when they want our data and information to integrate into their business, then suddenly they have the rights to it.
If copying is not theft, then we have the right to copy their software and AI models as well, since they are available on the open web.
They got themselves into quite a contradiction.
Buffalox@lemmy.world 4 months ago
Nope, false dichotomy. Copying copyrighted material is copyright infringement, which is illegal.
Oversimplifying the issue makes for an uninformed debate.
cactusupyourbutt@lemmy.world 4 months ago
any content you produce is automatically copyrighted
masterspace@lemmy.ca 4 months ago
You realize that half of Lemmy is tying themselves in inconsistent logical knots trying to escape the reverse conundrum?
Copying isn’t stealing and never was. Our IP system that artificially restricts information has never made sense in the digital age, and yet now everyone is on here cheering copyright on.
BoxOfFeet@lemmy.world 4 months ago
You wouldn’t download a car!