Comment on How Googlers cracked an SF rival's tech model with a single word | A research team from the tech giant got ChatGPT to spit out its private training data

eating3645@lemmy.world 6 months ago

I’m not really following you, but I think we might be on similar paths. I’m just shooting in absolute darkness, so don’t give my guess much weight.

What makes transformers brilliant is the attention mechanism. That is brilliant in turn because it’s dynamic: the attention weights depend on your query (and on the other tokens around it) instead of being fixed. This lets the transformer distinguish between bat and bat, the animal and the stick.
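To make the bat example concrete, here is a toy sketch of scaled dot-product attention in plain Python. It is not how any real model is built: the two-dimensional embeddings in `EMB` are made-up numbers (one axis loosely meaning "animal", the other "sports"), there are no learned projection matrices, and `contextual` is a hypothetical helper. It only shows that the same word ends up with a different representation depending on its neighbours.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Scaled dot-product attention for a single query vector.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(dim)]
    return weights, output

# Assumption: invented 2-d embeddings, axis 0 ~ "animal", axis 1 ~ "sports".
EMB = {
    "bat": [1.0, 1.0],       # ambiguous on its own
    "cave": [3.0, 0.0],
    "baseball": [0.0, 3.0],
}

def contextual(word, sentence):
    # Hypothetical helper: use raw embeddings as keys and values
    # (a real transformer applies learned projections first).
    vecs = [EMB[t] for t in sentence]
    return attend(EMB[word], vecs, vecs)

w_animal, bat_animal = contextual("bat", ["bat", "cave"])
w_sport, bat_sport = contextual("bat", ["bat", "baseball"])

print("bat near cave:    ", bat_animal)
print("bat near baseball:", bat_sport)
```

The query for "bat" is identical in both calls; only the surrounding tokens change, and that alone pulls the output toward the animal axis in one case and the sports axis in the other.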

You know what I bet they didn’t do in testing or training? A nonsensical query that consists of a single word repeated thousands of times.

So my guess is simply that this query pushed the model so far outside its training space that the model weights can no longer steer the output in a reasonable way.

As for why it would output training data and not random nonsense? That’s a weak point in my understanding and I can only say “luck,” which is, of course, a way of saying I have no clue.
