The model ISN’T outputting the letters individually. Binary models (as I mentioned) do that; transformers do not.
For “Strawberry”, the model output is more like this: <S-T-R><A-W-B>
<S-T-R-A-W-B><E-R-R>
<S-T-R-A-W-B-E-R-R-Y>
A token can be a single letter, part of a word, a single lexeme, a whole word, or even multiple words (“let be”).
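To make the chunking concrete, here is a rough sketch of the idea in Python. Everything in it is an assumption for illustration: the vocabulary `TOY_VOCAB` is made up to match the example above, and the greedy longest-match rule in `tokenize` is a stand-in, not how any particular model's tokenizer actually works (real ones learn their vocabulary from data).

```python
# Hypothetical toy vocabulary, invented for illustration only.
TOY_VOCAB = {"str", "awb", "err", "y", "let be"}


def tokenize(text, vocab):
    """Greedy longest-match split of text into vocabulary entries."""
    max_len = max(len(entry) for entry in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character if nothing in the vocabulary matches.
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def running_output(tokens):
    """Text emitted so far after each token is appended."""
    return ["".join(tokens[:n]) for n in range(1, len(tokens) + 1)]


if __name__ == "__main__":
    pieces = tokenize("strawberry", TOY_VOCAB)
    print(pieces)
    for partial in running_output(pieces):
        print(partial)
```

With this toy vocabulary the word comes out in four chunks, never letter by letter, and a multi-word entry like "let be" comes out as one token.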