Comment on AI and Copyright: Expanding Copyright Hurts Everyone—Here’s What to Do Instead.
artificialfish@programming.dev 1 day agoYou could always just do reverse search on the open dataset to see if it’s an exact copy (or over a threshold).
You MIGHT even be able to do that while masking the data using hashing.
tal@lemmy.today 1 day ago
True, but “exact copy” almost certainly isn’t going to be what gets produced – and you can have a derivative work that isn’t an exact copy of the original, just generate something that looks a lot like part of the original. Like, you’d want to have a pretty good chance of finding a derivative work.
artificialfish@programming.dev 1 day ago
Minhash might be able to produce a similarity metric without needing exactness and without revealing the training data.