Comment on Judge dismisses authors' copyright lawsuit against Meta over AI training

ocassionallyaduck@lemmy.world ⁨20⁩ ⁨hours⁩ ago

There is nothing intelligent about “AI” as we call it. It parrots based on probability. If you remove the randomness value (the sampling temperature) from the model, it parrots the same thing every time based on its weights, and if the weights were trained on Harry Potter, it will consistently give you giant chunks of Harry Potter verbatim when prompted.

Most of the LLM services attempt to avoid this by adding arbitrary randomness values to churn the soup. But this is also part of what causes hallucinations, as the model cannot preserve a single correct response as the right way to respond to a given query every time.
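
To make the randomness point concrete, here is a toy sketch of how a next-token pick works with and without temperature. The token list and scores are made up for illustration, and `sample_next_token` is a hypothetical helper, not any real model's or vendor's API.

```python
import math
import random

def sample_next_token(logits, temperature, rng):
    """Return the index of the chosen token given raw scores (logits)."""
    if temperature == 0:
        # No randomness: always take the highest-scoring token (greedy decoding).
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale scores by temperature, then turn them into softmax weights.
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    # Draw one index at random, in proportion to the weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical candidates and scores for a single next-token decision.
tokens = ["wizard", "boy", "parrot"]
logits = [2.0, 1.5, 0.5]

rng = random.Random(0)  # fixed seed so the run is repeatable
greedy = [tokens[sample_next_token(logits, 0, rng)] for _ in range(5)]
sampled = [tokens[sample_next_token(logits, 1.0, rng)] for _ in range(1000)]

print(greedy)        # the same token every time
print(set(sampled))  # more than one distinct token shows up
```

With temperature 0 the output is the top-scoring token every time; with temperature 1.0 the lower-scoring tokens get picked some of the time, which is the "churn" described above.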

LLMs are insanely “dumb”; they’re just lightspeed parrots. When Meta and these other giant tech companies claim it’s not theft because they sprinkle in some randomness, they are just obscuring the reality: their models are derivative of the work of organizations like the BBC and Wikipedia, and they depend on the works of tens of thousands of authors to develop their corpus of language.

In short, there was an ethical way to train these models. But that would have been slower. And the court just basically gave them a pass on theft. Facebook would have been entirely in the clear had it not stored the books in a dataset, which in itself is insane.

I wish I knew when I was younger that stealing is wrong, unless you steal at scale. Then it’s just clever business.
