Comment on AI and Copyright: Expanding Copyright Hurts Everyone—Here’s What to Do Instead.

tal@lemmy.today 2 days ago

So, I agree with the EFF that we should not introduce some kind of new legal right to prohibit training on a work just because it’s copyrighted. Nothing keeps a human from training themselves on copyrighted content, so an AI shouldn’t be prohibited from it either.

However.

It is possible for a human to make a work that infringes existing copyright, by producing a derivative work. Not every work inspired by something else will meet the legal bar for being derivative, but some do. And just as a human can do that, so too can AIs.

I have no problem with, say, an AI being able to emulate a style. But it’s possible for AIs today to produce works that do meet the bar for being derivative works. As things stand, I believe that’d make the user of the AI liable. And yet there’s not really a good way for them to avoid that. That’s a legitimate point of complaint, I think, because it leads to people unknowingly creating derivative works.

Existing generative AI systems don’t have a good way of hinting to the user whether a generated work is derivative.

However, I think what we could do is operate something like a federal registry of images. For published, copyrighted works, we already have mandatory deposit with the Library of Congress.

If something akin to TinEye were funded by the government, it would be possible to maintain an archive of registered, copyrighted works. It would then be practical for someone who had just generated an image to check whether a similar, pre-existing image was already registered.
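
As a very rough sketch of the lookup side, assuming some fingerprint() function that reduces an image to a compact, comparable value (one possibility is sketched after the next paragraph), and with all names here being hypothetical, not any real system’s API:

```python
# Hypothetical registry structure, not any real system's API. Assumes a
# fingerprint() function (one possible implementation is sketched below)
# and a distance function for comparing fingerprints.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Registration:
    work_id: str          # e.g. a Copyright Office registration number
    rights_holder: str
    fingerprint: Any      # whatever fingerprint() returns

class Registry:
    """Toy in-memory archive; a real one would need an index supporting
    fast nearest-neighbor search over millions of works."""

    def __init__(self) -> None:
        self.entries: list[Registration] = []

    def register(self, reg: Registration) -> None:
        self.entries.append(reg)

    def find_similar(self, fp: Any,
                     distance: Callable[[Any, Any], int],
                     threshold: int) -> list[Registration]:
        # Linear scan for clarity only.
        return [r for r in self.entries
                if distance(fp, r.fingerprint) <= threshold]
```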

I don’t know how TinEye works today, but for this to work, we’d probably need a way to recognize an image under a bunch of transformations: scale, rotation, color, etc. I’d assume some kind of feature recognition – maybe it does line-detection, vectorizes the result, breaks the image up into a bunch of chunks, performs some operation to canonicalize each chunk’s rotation based on its content, and then computes some kind of fuzzy hash on the lines.
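
To make that concrete with off-the-shelf parts (and this is my guess at one workable fingerprint, not a claim about what TinEye does): the Python imagehash library’s pHash is a DCT-based perceptual hash that survives rescaling, recompression, and mild color shifts, though not arbitrary rotation, so a real registry would need the richer pipeline above.

```python
# One concrete fingerprint() for the sketch above, using the "imagehash"
# library's DCT-based perceptual hash. Tolerates rescaling and mild
# color changes; does NOT handle rotation, so it is only a stand-in for
# the fuller feature pipeline guessed at in the text.
from PIL import Image
import imagehash

def fingerprint(image_path: str) -> imagehash.ImageHash:
    return imagehash.phash(Image.open(image_path))

def hamming(a: imagehash.ImageHash, b: imagehash.ImageHash) -> int:
    # imagehash overloads "-" to give the Hamming distance in bits
    return a - b
```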

Then one could place an expectation that anyone who wants to distribute an LLM-generated work first feed it into such a system, with the presumption that if the work is distributed without being verified and it turns out to be derivative of a registered work, the infringement was intentional (which IIRC entitles a rights holder to treble damages under US law). We don’t have a mathematical model today to determine whether one work is “derivative” of another, but we could make one, or at least produce an approximation and a warning.
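
Tying the hypothetical pieces above together, the check-before-distribute step might look something like this, with a fixed distance threshold standing in (crudely) for that missing model of “derivative”:

```python
# Sketch of the proposed check-before-distribute step, using the
# hypothetical Registry and fingerprint()/hamming() above. The fixed
# threshold is a crude stand-in for a real "is this derivative?" model.
def verify_for_distribution(registry: Registry, image_path: str,
                            threshold: int = 8) -> bool:
    fp = fingerprint(image_path)
    matches = registry.find_similar(fp, distance=hamming,
                                    threshold=threshold)
    for m in matches:
        print(f"Warning: close to registered work {m.work_id} "
              f"({m.rights_holder}); distributing unverified could be "
              f"presumed willful infringement.")
    # A real system might issue a signed receipt here, so the user can
    # later prove the check was run before distribution.
    return not matches
```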

I think that’s practical in most cases, both for holders of copyrighted images and for LLM users. It permits people to use LLMs to generate images for non-distributed use. It doesn’t create a legal minefield for an LLM user. It places no restrictions on model creators. It’s doable using something like existing technology. And it permits a viewer of a generated image to verify that the image is not derivative.
