I’m not really following you, but I think we might be on similar paths. I’m just shooting in the dark here, so don’t put much weight on my guess.
What makes transformers brilliant is the attention mechanism. That is brilliant in turn because it’s dynamic: how much each token attends to the others depends on the input itself, i.e. your query (plus some other stuff). This is what lets the transformer distinguish between “bat” the animal and “bat” the stick, based on the surrounding words.
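To make the “dynamic” part concrete, here’s a toy sketch of single-head scaled dot-product attention. Big caveats: the 2-d embeddings are made up by hand, the projections are just the identity, and `attend` is my own hypothetical helper, so this is nothing like a real model, only the shape of the idea. The same vector for “bat” comes out pulled in different directions depending on what else is in the sentence.

```python
import math

# Hypothetical, hand-picked embeddings (NOT from any real model).
# Axis 0 is roughly "animal-ness", axis 1 is roughly "sports-ness".
EMB = {
    "bat":  [1.0, 1.0],   # ambiguous on its own
    "flew": [1.5, 0.0],
    "cave": [2.0, 0.0],
    "hit":  [0.0, 1.5],
    "ball": [0.0, 2.0],
}

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(word, sentence):
    """Toy attention: query = embedding of `word`, keys = values = embeddings
    of every word in `sentence` (identity projections, an assumption)."""
    q = EMB[word]
    keys = [EMB[w] for w in sentence]
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    # Output is the attention-weighted average of the value vectors.
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(q))]

animal = attend("bat", ["bat", "flew", "cave"])
sport = attend("bat", ["bat", "hit", "ball"])
print("bat near cave:", animal)
print("bat near ball:", sport)
```

In the first sentence the output for “bat” leans toward the animal axis, in the second toward the sports axis, even though the input vector for “bat” is identical both times.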
You know what I bet they didn’t do in testing or training? Feed it a nonsensical query consisting of one word repeated thousands of times.
So my guess is simply that this query took the model so far outside anything it saw in training that the model weights have no ability to control the output in a reasonable way.
As for why it would output training data and not random nonsense? That’s a weak point in my understanding and I can only say “luck,” which is, of course, a way of saying I have no clue.
sizzler@lemmy.world 11 months ago
Error code, the verbal version.