Comment on In Cringe Video, OpenAI CTO Says She Doesn’t Know Where Sora’s Training Data Came From

Linkerbaan@lemmy.world 11 months ago

Actually, neural networks reproduce this kind of content verbatim when you ask the right question, such as “finish this book”, and the creator hasn’t censored it well.

It uses an encoded version of the source material to create “new” material.
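
To make the “finish this book” point concrete, here is a minimal sketch. It is a toy stand-in, not a neural network and not how any real product works: a character-level lookup table fitted on one short passage (adapted from the opening of Moby-Dick). The names `TRAINING_TEXT`, `train`, and `complete` are made up for illustration. The only thing it shows is that a model fitted tightly to its source can hand that source back when prompted with its beginning.

```python
from collections import Counter, defaultdict

# Hypothetical "training set": one short passage, adapted from a public-domain novel.
TRAINING_TEXT = (
    "Call me Ishmael. Some years ago, never mind how long precisely, "
    "I thought I would sail about a little and see the watery part of the world."
)

def train(text, order=8):
    """Count which character follows each `order`-character context in `text`."""
    table = defaultdict(Counter)
    for i in range(len(text) - order):
        table[text[i:i + order]][text[i + order]] += 1
    return table

def complete(table, prompt, order=8, max_new=500):
    """Greedily extend `prompt` one character at a time until the context is unseen."""
    out = prompt
    for _ in range(max_new):
        context = out[-order:]
        if context not in table:
            break  # nothing learned for this context, so stop generating
        out += table[context].most_common(1)[0][0]
    return out

model = train(TRAINING_TEXT)
completion = complete(model, "Call me Ishmael.")
print(completion)
print(completion == TRAINING_TEXT)
```

A prompt the table has never seen, for example `complete(model, "Tell me a story")`, comes back unchanged, because there is nothing stored to continue it with.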


HaywardT@lemmy.sdf.org 11 months ago
Sure, if that is what the network has been trained to do, just like a librarian will if that is how they have been trained.

- Linkerbaan@lemmy.world 11 months ago
  Actually, it’s the opposite: you need to train a network not to reveal its training data.
  
  - HaywardT@lemmy.sdf.org 11 months ago
    Interesting article. It seems to be about a bug, not a designed behavior. It also says it exposes random excerpts from books and other training data.
    
    Linkerbaan@lemmy.world 11 months ago
    It’s not designed to do that, because they don’t want to reveal the training data. But factually, all neural networks are a combination of their training data encoded into neurons.
    
    When given the right prompt (or image generation question), they will replicate it exactly, because that’s how they were trained in the first place: replicating their source images with as few neurons as possible, and tweaking them when the output isn’t correct.
    