Comment on Nvidia accused of trying to cut a deal with Anna’s Archive for high‑speed access to the massive pirated book haul — allegedly chased stolen data to fuel its LLMs

PierceTheBubble@lemmy.ml 13 hours ago

So the amended complaint alleges that Nvidia used/stored/copied/obtained/distributed copyrighted works (including the plaintiffs’), both through datasets available on Hugging Face (‘Books3’, featured in both ‘The Pile’ and ‘SlimPajama’) and by pirating from shadow libraries (like Anna’s Archive), to train multiple LLMs (primarily its ‘NeMo Megatron’ series), and that it distributed the copyrighted data through the ‘NeMo Megatron Framework’; data which was ultimately sourced from shadow libraries.

It’s quite an interesting read actually, especially the link to this Anna’s Archive blog post, which the complaint grossly pulls out of context. The plaintiffs clearly despise the shadow libraries too, since those libraries are what ultimately provided access to their copyrighted material.

One part in particular, “Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality.”, makes me wonder whether that’s the reason models like DeepSeek blew Western models out of the water.
