Anthropic pirating books for their training corpus resulted in the biggest copyright settlement in history, well over a billion dollars. That is still being quibbled over, I believe, but they settled because they were likely to pay out more if the case went forward. So I’m not really sure where you’re coming from when you say that infringement via torrenting does not result in monstrously large liability.
ryathal@sh.itjust.works 3 weeks ago
The judge in that case ruled the training wasn’t fair use for pirated books, which left them on the hook for potentially all revenue (likely a court-determined percentage) that the model generated for them, in addition to statutory damages. That is well north of 1.5 billion.
artifex@piefed.social 3 weeks ago
Which is kind of a pity. Anyone who’s ever written something on the net should be getting royalty checks from these fucks. I’m not exactly famous but I’ve written prolifically in my field of work and have gotten nearly word-for-word reproductions of my articles out of every big model I’ve tested since GPT-3.
FatCrab@slrpnk.net 3 weeks ago
Just noticed your reply and want to correct this. Anthropic settled; the 1.5 billion was not a judgment against them. Specifically, it covered the literal pirating of the training corpus. It had absolutely nothing to do with how training handled the data: they literally torrented an enormous portion of their training corpus.
Anthropic DID try to argue that because they used the pirated material for training a model, it was fair use. The judge correctly decided that doesn’t make any fucking sense. Again, this is not about the models encoding data; it is literally just about the fact that these silly fucks torrented vast portions of their training corpus like college students building a porn library on college broadband.