Comment on Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not
LoreleiSankTheShip@lemmy.ml 2 days ago
As long as they don't use exactly the same words as the book, yeah, as I understand it.
vane@lemmy.world 2 days ago
How do they not use the same words as the book? That's not how LLMs work. They use exactly the same words if the probabilities align. This study proves it: arxiv.org/abs/2505.12546
SufferingSteve@feddit.nu 2 days ago
The “if” is working overtime in your statement
nednobbins@lemmy.zip 2 days ago
I’d say there are two issues with it.
First, it's a very new article with only 3 citations. The authors seem like serious researchers, but the paper itself is still in the "hot off the presses" stage and wouldn't qualify as "proven" yet.
It also doesn’t exactly say that books are copies. It says that in some models, it’s possible to extract some portions of some texts. They cite “1984” and “Harry Potter” as two books that can be extracted almost entirely, under some circumstances. They also find that, in general, extraction rates are below 1%.
vane@lemmy.world 2 days ago
Yeah, but it's just a start toward reversing the process and proving that there is no AI. We've only just started with generating text; I bet people will figure out how to reverse the process using some sort of Rosetta Stone. It's just probabilities, after all.
nednobbins@lemmy.zip 2 days ago
That’s possible but it’s not what the authors found.
They spend a fair amount of the conclusion emphasizing how exploratory and ambiguous their findings are. The researchers themselves are very careful to point out that this is not a smoking gun.