Grab 'em by the intellectual property! When you're a multi-billion-dollar corporation, they just let you do it!
Judge dismisses authors' copyright lawsuit against Meta over AI training
Submitted 1 day ago by drmoose@lemmy.world to technology@lemmy.world
Comments
ocassionallyaduck@lemmy.world 1 day ago
Terrible judgement.
Turn the top-k value down on the model and it reproduces text near-verbatim.
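Roughly, assuming standard top-k sampling over the model's next-token probabilities (toy code of mine, not any real library's API):

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize.

    With k=1 this collapses to greedy decoding: the single most
    likely continuation is emitted every time, which is how
    memorized training text can come back near-verbatim.
    """
    top = sorted(enumerate(probs), key=lambda ip: ip[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {i: p / total for i, p in top}

# k=1: one token left with probability 1 -- fully deterministic
print(top_k_filter([0.1, 0.6, 0.3], k=1))  # {1: 1.0}
```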
drmoose@lemmy.world 1 day ago
Ah the Schrödinger’s LLM - always hallucinating and also always accurate
squaresinger@lemmy.world 22 hours ago
Accuracy and hallucination are two ends of a spectrum.
If you turn hallucinations to a minimum, the LLM will faithfully reproduce what’s in the training set, but the result will not fit the query very well.
The other option is to turn the so-called temperature up, which will result in replies fitting better to the query but also the hallucinations go up.
In the end it’s a balance between getting responses that are closer to the dataset (factual) or closer to the query (creative).
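In sampling terms, "temperature" just rescales the model's logits before the softmax. A minimal sketch (my own illustrative code, not any specific model's implementation):

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Sample a token index from raw logits.

    Low temperature sharpens the distribution toward the most
    likely (training-data-faithful) token; high temperature
    flattens it, trading fidelity for variety.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numeric stability
    weights = [math.exp(x - m) for x in scaled]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

# Near-zero temperature behaves like greedy decoding:
# in this toy example it all but always picks index 1, the largest logit.
```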
tabular@lemmy.world 1 day ago
“hallucination refers to the generation of plausible-sounding but factually incorrect or nonsensical information”
Is an output a hallucination when the training data involved in that output included factually incorrect data? Suppose my input is "is the world flat", and the LLM then, allegedly accurately, reproduces a flat-earther's writings saying it is.
ocassionallyaduck@lemmy.world 15 hours ago
There is nothing intelligent about "AI" as we call it. It parrots based on probability. If you remove the randomness value from the model, it parrots the same thing every time based on its weights, and if the weights were trained on Harry Potter, it will consistently give you giant chunks of Harry Potter verbatim when prompted.
Most of the LLM services attempt to avoid this by adding arbitrary randomness values to churn the soup. But this is also inherently part of the cause of hallucinations, as the model cannot preserve a single correct response as always the right way to respond to a certain query.
LLMs are insanely “dumb”, they’re just lightspeed parrots. The fact that Meta and these other giant tech companies claim it’s not theft because they sprinkle in some randomness is just obscuring the reality and the fact that their models are derivative of the work of organizations like the BBC and Wikipedia, while also dependent on the works of tens of thousands of authors to develop their corpus of language.
In short, there was an ethical way to train these models. But that would have been slower. And the court just basically gave them a pass on theft. Facebook would have been entirely in the clear had it not stored the books in a dataset, which in itself is insane.
I wish I knew when I was younger that stealing is wrong, unless you steal at scale. Then it’s just clever business.
PattyMcB@lemmy.world 1 day ago
It sounds like the precedent has been set
pyre@lemmy.world 11 hours ago
🏴☠️🦜
drmoose@lemmy.world 1 day ago
This is the notorious lawsuit from a year ago:
deathmetal27@lemmy.world 1 day ago
Alsup? Is this the same judge who also presided over Oracle v. Google over the use of Java in Android?
As for the ruling, I’m not in favour of AI training on copyrighted material, but I can see where the judgement is coming from. I think it’s a matter of what’s really copyrightable: the actual text or the abstract knowledge in it. In other words, if you were to read a book and then write a summary of a section of it in your own words or orally described what you learned from the book to someone else, does that mean copyright infringement? Or if you watch a movie and then describe your favourite scenes to your friends?
Perhaps a case could be made that AI training on copyrighted materials is not the same as humans consuming the copyrighted material, and therefore it should have a different provision in copyright law. I'm no lawyer, but I'd assume current copyright law works on the basis that humans do not generally have perfect recall of the copyrighted material they consume. Then again, a counterargument could be that neither does the AI, given its tendency to hallucinate sometimes. Still, it has far superior recall compared to humans, and perhaps that could be grounds for amending copyright law to cover AI training?
Petter1@lemm.ee 1 day ago
Agree 100%
Hope we can refactor this whole copyright/patent concept soon…
It's more of a pain for artists, creators, labels, etc.
I see it with EDM: I work at a label and sometimes produce a bit myself.
Most artists work with samples and presets, and keeping track of who worked on what and who owns what percentage of it just takes the joy out of creating…
Same for game design: you have a vision for your game, build a PoC, and then have to change the whole game because of stupid patent shit not letting you, e.g., land on a horse and immediately ride it, or throw stuff at creatures to catch them…
drmoose@lemmy.world 1 day ago
Your last paragraph would be the ideal solution in an ideal world, but I don't think anything like that could happen within the current political and economic structures.
First, it's super easy to hide all of this, and enforcement would be very difficult even domestically. Second, because we're in an AI race, no one would put themselves at such a disadvantage unless there's real damage, not just economic copyright juggling.
People need to come to terms with these facts so we can address real problems rather than blow against the wind with all this whining we see on Lemmy. There are actual things we can do.