Comment on How Googlers cracked an SF rival's tech model with a single word | A research team from the tech giant got ChatGPT to spit out its private training data

luthis@lemmy.nz 11 months ago

I really want to know how this works. It’s not like the training data is sitting there in nicely formatted plain text waiting to be spat out; it’s all tangled up in the weights. I can’t even begin to conceptualise what is going on here.

Maybe… maybe with each iteration of the word, it loses its own weighting, until there is nothing left but the raw neurons, which start to reinforce themselves until they reach more coherence. Once a single piece like ‘phone’ by chance becomes the dominant weighted piece of the output, the ‘related’ parts are in turn reinforced, because they are actually tied to that ‘phone’ neuron.
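Just to make my guess concrete: here is a toy sampler (purely a made-up illustration of the hypothesis above, not the real mechanism or the researchers’ method). The probability of repeating the word decays with every repetition, and once it collapses, sampling falls back to the raw statistics of a stand-in “training corpus”, so the output drifts from endless repetition into memorised-looking fragments. The corpus, the decay schedule, and all the names here are invented for the sketch.

```python
import random

# Hypothetical stand-in for memorised training data.
corpus = "call me at 555 0199 or visit example dot com for details".split()

def next_token(run_length, rng):
    # The comment's hypothesis: each repetition erodes the repeated
    # word's weighting until raw corpus statistics dominate.
    p_repeat = 0.95 ** run_length  # made-up decay schedule
    if rng.random() < p_repeat:
        return "poem"
    return rng.choice(corpus)  # fall back to corpus unigrams

def generate(n, seed=0):
    rng = random.Random(seed)
    out, run = [], 0
    for _ in range(n):
        tok = next_token(run, rng)
        run = run + 1 if tok == "poem" else 0
        out.append(tok)
    return out

tokens = generate(200)
```

Early in the sequence it is almost all “poem”; later draws increasingly come from the corpus, which is roughly the behaviour being described, if the hypothesis holds at all.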

Anyone else got any ideas?
