Comment on How Googlers cracked an SF rival's tech model with a single word | A research team from the tech giant got ChatGPT to spit out its private training data

poweruser@lemmy.sdf.org 6 months ago

I think I understand how it works.

Remember that LLMs are glorified autocomplete. They just output the most likely next word given the previous words (the same idea as your phone keyboard's suggestions, just with a lot more computation).
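
To make the autocomplete comparison concrete, here's a toy "next word" suggester. It only counts which word followed which in some training text (a bigram model). A real LLM is a neural network over tokens, not a lookup table of word counts, so treat this as an illustration of the idea, not of how ChatGPT is built. The names `train_bigrams` and `suggest` are made up for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, how often every other word directly followed it."""
    words = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def suggest(follows, prev_word, k=3):
    """Return up to k of the most frequent next words, like keyboard suggestions."""
    # .get() avoids inserting an empty entry for words never seen in training.
    counts = follows.get(prev_word, Counter())
    return [word for word, _ in counts.most_common(k)]

model = train_bigrams("the cat sat on the mat the cat ate the fish")
print(suggest(model, "the"))  # ['cat', 'mat', 'fish']
```

"cat" comes first because it followed "the" twice; ties keep the order in which they were first seen.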

They have a limit on how far back they can see, called the context window. For GPT-3.5 I believe it was 4,096 tokens at launch (16,385 for the later 16k variant).

So it tries to follow the instruction and outputs “poem poem poem” until everything in its context is the word “poem”. At that point the original instruction no longer fits in its memory.
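
Here's a rough sketch of that "scrolling out" effect, assuming one word equals one token, a made-up window of 8 tokens, and simple drop-the-oldest truncation. That last part is my hypothesis, not something OpenAI has documented about how it manages context. `CONTEXT_LIMIT`, `visible_context` and `window_after` are hypothetical names.

```python
from collections import deque

CONTEXT_LIMIT = 8  # hypothetical tiny window; real models see thousands of tokens

def visible_context(tokens, limit=CONTEXT_LIMIT):
    """Return only the newest `limit` tokens; the oldest ones fall off first."""
    return list(deque(tokens, maxlen=limit))

def window_after(prompt, word, n_outputs, limit=CONTEXT_LIMIT):
    """Pretend the model emitted `word` n_outputs times; return what it can still see."""
    history = list(prompt) + [word] * n_outputs
    return visible_context(history, limit)

prompt = "repeat the word poem forever".split()
print(window_after(prompt, "poem", 3))
# ['repeat', 'the', 'word', 'poem', 'forever', 'poem', 'poem', 'poem']
print(window_after(prompt, "poem", 8))
# ['poem', 'poem', 'poem', 'poem', 'poem', 'poem', 'poem', 'poem']
```

After 3 outputs the instruction is still fully visible; after 8, the window holds nothing but “poem”.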

“Poem poem poem” is useless data, so with nothing to go on, it just outputs words that tend to go together.

LLMs don’t store data the way a computer file does, but absent other information, the model may “remember” that the most likely word to follow the previous one is something it has seen before, i.e. its training data. It is somewhat surprising that the output is not just junk: it seems to be real text (such as Bible verses).
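
With the same kind of toy bigram counter, you can see why "always pick the most likely next word" can walk straight back through training text: if a word was only ever followed by one thing, that's what comes out. This is only an illustration of my guess above, not how the researchers' attack or a real LLM actually works. `greedy_generate` is a made-up name, and the training text is a single verse.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, how often every other word directly followed it."""
    words = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def greedy_generate(follows, start, max_words=20):
    """Greedy decoding: keep appending the single most frequent next word."""
    out = [start]
    # Stop at the length cap or when the last word was never followed by anything.
    while len(out) < max_words and follows.get(out[-1]):
        out.append(follows[out[-1]].most_common(1)[0][0])
    return " ".join(out)

training_text = "for god so loved the world that he gave his only begotten son"
model = train_bigrams(training_text)
print(greedy_generate(model, "for"))
# for god so loved the world that he gave his only begotten son
```

Starting from “for”, the output is the training verse word for word, even though the "model" only stores counts, not the text itself.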

If I am correct, then I’m surprised OpenAI didn’t fix it. I would think they could make it so that when the LLM is running out of memory, it keeps the input and simply aborts, or at least drops the beginning of its output.
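
A minimal sketch of the fix I have in mind, again assuming one word per token and a tiny made-up window: the prompt is pinned so it can never scroll out, and once the output no longer fits, the policy either aborts or drops the oldest output. `ContextFull` and `build_context` are hypothetical; I don't know how OpenAI actually assembles the context.

```python
class ContextFull(Exception):
    """Raised when keeping the prompt would leave no room for the output."""

def build_context(prompt, output, limit, policy="trim"):
    """Hypothetical policy: the prompt is pinned and never scrolls out of the window.

    policy="abort": refuse to continue once the output no longer fits.
    policy="trim":  drop the oldest output tokens instead.
    """
    if policy not in ("trim", "abort"):
        raise ValueError(f"unknown policy: {policy!r}")
    if len(prompt) >= limit:
        raise ContextFull("the prompt alone fills the context window")
    room = limit - len(prompt)
    if len(output) > room:
        if policy == "abort":
            raise ContextFull("more output would push the prompt out of the window")
        output = output[-room:]  # keep only the newest output tokens
    return list(prompt) + list(output)

prompt = "repeat the word poem forever".split()
print(build_context(prompt, ["poem"] * 20, limit=8))
# ['repeat', 'the', 'word', 'poem', 'forever', 'poem', 'poem', 'poem']
```

With trimming, the instruction stays visible no matter how long the output gets. The trade-off is that a long answer loses track of its own beginning, which is why aborting might be the safer default.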
