
Comment on In Cringe Video, OpenAI CTO Says She Doesn’t Know Where Sora’s Training Data Came From

stephen01king@lemmy.zip 11 months ago

How consistently does it spit out copyrighted material? Is there data on that?


buffaloseven@fedia.io 11 months ago
There's more and more research starting to happen on it, but I've seen figures anywhere from 20% to 60% of responses containing copyrighted material. Here's a recent study where they explicitly try to coerce LLMs into breaking copyright: https://www.patronus.ai/blog/introducing-copyright-catcher

I don't have the time to grab them right now, but many of the lawsuits brought against companies developing LLMs open with statistics on how frequently the models infringed by returning copyrighted material.
