Comment on The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

EldritchFeminity@lemmy.blahaj.zone ⁨5⁩ ⁨months⁩ ago

The argument that these models learn in a way that’s similar to how humans do is absolutely false, and the idea that they discard their training data and produce new content is demonstrably incorrect. These models can and do regurgitate their training data, including copyrighted characters.

And these things don’t learn styles, techniques, or concepts. They effectively learn statistical averages and patterns and collage them together. I’ve gotten to the point where I can guess which image generator was used based on the same mistakes each one makes every time. Take a look at any generated image, and you won’t be able to identify where a light source is because the shadows come from all different directions. These things don’t understand the concept of a shadow or lighting; they just know that statistically lighter pixels are followed by darker pixels of the same hue and that some places have collections of lighter pixels.

I recently heard about an AI that scientists had trained to identify pictures of wolves, and it was working with incredible accuracy. When they went in to figure out how it was telling wolves apart from dogs like huskies so well, they found that it wasn’t even looking at the wolves at all. 100% of the images of wolves in its training data had snowy backgrounds, so it was simply searching for concentrations of white pixels (and therefore snow) to determine whether or not a picture was of wolves.


Riccosuave@lemmy.world ⁨5⁩ ⁨months⁩ ago
Even if they learned exactly like humans do, like so fucking what, right!? Humans have to pay EXORBITANT fees for higher education in this country. Arguing that your bot gets socialized education before the people do is fucking absurd.

- v_krishna@lemmy.ml ⁨5⁩ ⁨months⁩ ago
  That seems more like an argument for free higher education rather than restricting what corpuses a deep learning model can train on
  
  - Malfeasant@lemm.ee ⁨5⁩ ⁨months⁩ ago
    Tomato, tomato…
    
  - nickwitha_k@lemmy.sdf.org ⁨5⁩ ⁨months⁩ ago
    Why not both? Allowing major corps to put even more downward pressure on workers doesn’t help anyone but the rich. LLMs aren’t going to save the world or become sentient.
    
ricecake@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
Basing your argument around how the model or training system works doesn’t seem like the best way to frame your point to me. It invites a lot of mucking about in the details of how the systems do or don’t work, how humans learn, and what “learning” and “knowledge” actually are.

I’m a human as far as I know, and it’s trivial for me to regurgitate my training data. I regularly say things that are either direct references to things I’ve heard, or accidental copies of them, sometimes with errors.
Would you argue that I’m just a statistical collage of the things I’ve experienced, seen or read? My brain has as many copies of my training data in it as the AI model, namely zero, but “Captain Picard of the USS Enterprise sat down for a rousing game of chess with his friend Sherlock Holmes, and then Shakespeare came in dressed like Mickey mouse and said ‘to be or not to be, that is the question, for tis nobler in the heart’ or something”. Direct copies of someone else’s work, as well as multiple copyright infringements.
I’m also shit at drawing with perspective. It comes across like a drunk toddler trying their hand at cubism.

Arguing about how the model works, or about its deficiencies, to justify treating it differently just invites fixing those issues and repeating the same conversation later. What if we make one that, in your opinion, does work how humans do? Or one that actually extracts the information properly, in a way that isn’t just statistically inferred patterns, whatever the distinction there is? Does that suddenly make it different?

You don’t need to get bogged down in the muck of the technical to say that even if you concede every technical point, we can still say that a non-sentient machine learning system can be held to different standards with regard to copyright law than a sentient person. A person gets to buy a book, read it, and then carry around that information in their head and use it however they want. Not-A-Person does not get to read a book and hold that information without consent of the author.
Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.

Computers think the same way boats swim. Arguing about the difference between hands and propellers misses the point that you don’t want a shrimp boat in your swimming pool. I don’t care why they’re different, or that it technically violated the “free swim” policy, I care that it ruins the whole thing for the people it exists for in the first place.

I think all the AI stuff is cool, fun and interesting. I also think that letting it train on everything regardless of the creators’ wishes has too much opportunity to make everything garbage. Same for letting it produce content that isn’t labeled or cited.
If they can find a way to do and use the cool stuff without making things worse, they should focus on that.

- keegomatic@lemmy.world ⁨5⁩ ⁨months⁩ ago
  I’m not the above poster, but I really appreciate your argument. I think many people overcorrect in their minds about whether or not these models learn the way we do, and they miss the fact that they do behave very similarly to parts of our own systems. I’ve generally found that that overcorrection leads to bad arguments about copyright violation and ethical concerns.
  
  However, your point is very interesting (and it is thankfully independent of that overcorrection). We’ve never had to worry about nonhuman personhood in any amount of seriousness in the past, so it’s strangely not obvious despite how obvious it should be: it’s okay to treat real people as special, even in the face of the arguable personhood of a sufficiently advanced machine. One good reason the machine can be treated differently is because we made it for us, like everything else we make.
  
  I think there still is one related but dangling ethical question. What about machines that are made for us but we decide for whatever reason that they are equivalent in sentience and consciousness to humans?
  
  A human has rights and can take what they’ve learned and make works inspired by it for money, or for someone else to make money through them. They are well within their rights to do so. A machine that we’ve decided is equivalent in sentience to a human, though… can that nonhuman person go take what it’s learned and make works inspired by it so that another person can make money through them?
  
  If they SHOULDN’T be allowed to do that, then it’s notable that this scenario is only separated from what we have now by a gap in technology.
  
  If they SHOULD be allowed to do that (which we could make a good argument for, since we’ve agreed that it is a sentient being) then the technology gap is again notable.
  
  I don’t think the size of the technology gap actually matters here, logically; I think you can hand-wave it away pretty easily and apply it to our current situation rather than a future one. My guess, though, is that the size of the gap is of intuitive importance to anyone thinking about it (I’m no different) and most people would answer one way or the other depending on how big they perceive the technology gap to be.
  
- Eatspancakes84@lemmy.world ⁨5⁩ ⁨months⁩ ago
  Another good question is why AIs do not mindlessly regurgitate source material. The reason is that they have access to so much copyrighted material. If they were trained on only one book, they would constantly regurgitate material from that one book. Because they’re trained on many (millions of) books, they’re able to get creative. So the argument of OpenAI really boils down to: “we are not breaking copyright law, because we have used sufficient copyrighted material to avoid directly infringing on copyright”.
  
  - ricecake@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
    Eeeh, I still think diving into the weeds of the technical is the wrong way to approach it. Their argument is that training isn’t copyright violation, not that sufficient training dilutes the violation.
    
    Even if trained only on one source, it’s quite unlikely that it would generate copyright infringing output. It would be vastly less intelligible, likely to the point of overtly garbled words and sentences lacking much in the way of grammar.
    
    Whether what they’re doing is technically an infringement, or how it works, is entirely beside the question of whether it should be infringement or permitted.
    
- petrol_sniff_king@lemmy.blahaj.zone ⁨5⁩ ⁨months⁩ ago
  
  Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.
  
  I agree, but the fact that shills for this technology are also wrong about it is at least interesting.
  
  Rhetorically speaking, I don’t know if that’s useless.
  
  Computers think the same way boats swim. Arguing about the difference between hands and propellers misses the point that you don’t want a shrimp boat in your swimming pool. I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy, I care that it ruins the whole thing for the people it exists for in the first place.
  
  I do like this point a lot.
  
  If they can find a way to do and use the cool stuff without making things worse, they should focus on that.
  
  I do miss when the likes of Cleverbot were just a fun novelty on the Internet.
  
Dran_Arcana@lemmy.world ⁨5⁩ ⁨months⁩ ago
Devil’s Advocate:

How do we know that our brains don’t work the same way?

Why would it matter that we learn differently than a program learns?

Suppose someone has a photographic memory, should it be illegal for them to consume copyrighted works?

- EldritchFeminity@lemmy.blahaj.zone ⁨5⁩ ⁨months⁩ ago
  Because we’re talking pattern recognition levels of learning. At best, they’re the equivalent of parrots mimicking human speech. They take inputs and output data based on the statistical averages from their training sets - collaging pieces of their training into what they think is the right answer. And I use the word think here loosely, as this is the exact same process that the Gaussian blur tool in Photoshop uses.
  
  This matters because these companies are trying to profit off of the output of these programs. If somebody with an eidetic memory is trying to sell pieces of works that they’ve consumed as their own - or even somebody copy-pasting bits from CliffsNotes - then they should get in trouble, the same as these companies.
  
  Given A and B, we can understand C. But an LLM will only be able to give you AB, A(b), and B(a). And they’ve even been just spitting out A and B wholesale, proving that they retain their training data and will regurgitate the entirety of copyrighted material.
  
Eatspancakes84@lemmy.world ⁨5⁩ ⁨months⁩ ago
I am also not really getting the argument. If I as a human want to learn a subject from a book I buy it (or I go to a library that paid for it). If it’s similar to how humans learn, it should cost equally much.

The issue is of course that it’s not at all similar to how humans learn. It needs VASTLY more data to produce something even remotely sensible. Develop AI that’s truly transformative, by making it as efficient as humans are in learning, and the cost of paying for copyright will be negligible.

- petrol_sniff_king@lemmy.blahaj.zone ⁨5⁩ ⁨months⁩ ago
  
  If I as a human want to learn a subject from a book, I buy it
  
  xD
  That’s good.
  
  - Deathcrow@lemmy.ml ⁨5⁩ ⁨months⁩ ago
    Dude never heard of a library. I only bought a handful of books during my degree, I would’ve been homeless if I had to buy a copy of every learning source
    
    Eatspancakes84@lemmy.world ⁨5⁩ ⁨months⁩ ago
    That was literally in my post. Obviously, in that case the library pays for copyright
    
    petrol_sniff_king@lemmy.blahaj.zone ⁨5⁩ ⁨months⁩ ago
    Your taxes pay for the library.
    
- stephen01king@lemmy.zip ⁨5⁩ ⁨months⁩ ago
  
  If I as a human want to learn a subject from a book I buy it (or I go to a library who paid for it). If it’s similar to how humans learn, it should cost equally much.
  
  You’re on Lemmy, where people casually say “piracy is morally the right thing to do”, so I’m not sure this argument works on this platform.
  
  - Eatspancakes84@lemmy.world ⁨5⁩ ⁨months⁩ ago
    I know my way around the Jolly Roger myself. At the same time, using copyrighted materials in a commercial setting (as OpenAI does) shouldn’t be free.
    
    stephen01king@lemmy.zip ⁨5⁩ ⁨months⁩ ago
    Only if they are selling the output. I see it more as them selling access to the service on a server farm, since running ChatGPT is not cheap.
    
- Blaster_M@lemmy.world ⁨5⁩ ⁨months⁩ ago
  Imagine if you had blinders and earmuffs on for most of the day, and only once in a while were you allowed to interact with certain people and things. Your ability to communicate would be truncated to only what you were allowed to absorb.
  
interdimensionalmeme@lemmy.ml ⁨5⁩ ⁨months⁩ ago
The solution is that any AI must always be released under a strong copyleft license, and possibly that copyright be abolished outright, as it has only served the powerful by allowing them to enclose humanity’s common intellectual heritage (see Disney’s looting and enclosing of ancestral children’s stories). If you choose to strengthen the current regime, don’t expect things to improve for you as an irrelevant atomised individual.
