Comment on In Cringe Video, OpenAI CTO Says She Doesn’t Know Where Sora’s Training Data Came From
stephen01king@lemmy.zip 8 months ago
How consistently does it spit out copyrighted material? Is there data on that?
buffaloseven@fedia.io 8 months ago
There's more and more research starting to happen on it, but I've seen figures anywhere from 20% to 60% of responses containing copyrighted material. Here's a recent study where they explicitly try to coerce LLMs into breaking copyright: https://www.patronus.ai/blog/introducing-copyright-catcher
I don't have the time to grab them right now, but in many of the lawsuits brought against companies developing LLMs, the opening filings contain statistics on how frequently the models infringed by returning copyrighted material.