Comment

Comment on Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

LifeInMultipleChoice@lemmy.world ⁨11⁩ ⁨months⁩ ago

The language model isn’t teaching anything it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source. It is not alive, it has no thoughts. It has no “its own words.” (As seen by the judgement that it’s words cannot be copyrighted.) It only has other people’s words. Every word it spits out by definition is plagiarism, whether the work was copyrighted before or not.

People wonder why works, such as journalism are getting worse. Well how could they ever get better if anything a journalist writes can be absorbed in real time, reworded and regurgitated without paying any dos to the original source. One journalist article, displayed in 30 versions, dividing the original works worth up into 30 portions. The original work now being worth 1/30th its original value. Maybe one can argue it is twice as good, so 1/15th.

Long term it means all original creations… Are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project… Physics, Chemistry, Psychology, all technological advancements, slowly degraded as language models get better, and original sources deminish returns.

source

Sort:hotnew top

booly@sh.itjust.works ⁨11⁩ ⁨months⁩ ago

just spitting the information back out, without paying the copyright source

The court made its ruling under the factual assumption that it isn’t possible for a user to retrieve copyrighted text from that LLM, and explained that if a copyright holder does develop evidence that it is possible to get entire significant chunks of their copyrighted text out of that LLM, then they’d be able to sue then under those facts and that evidence.

It relies heavily on the analogy to Google Books, which scans in entire copyrighted books to build the database, but where users of the service simply cannot retrieve more than a few snippets from any given book. That way, Google cannot be said to be redistributing entire books to its users without the publisher’s permission.

source
VoterFrog@lemmy.world ⁨11⁩ ⁨months⁩ ago

The language model isn’t teaching anything it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source.

You could honestly say the same about most “teaching” that a student without a real comprehension of the subject does for another student. But ultimately, that’s beside the point. Because changing the wording, structure, and presentation is all that is necessary to avoid copyright violation. You cannot copyright the information. Only a specific expression of it.

There’s no special exception for AI here. That’s how copyright works for you, me, the student, and the AI. And if you’re hoping that copyright is going to save you from the outcomes you’re worried about, it won’t.

source