Comment on I'm gonna die on this hill or die trying

Redjard@lemmy.dbzer0.com 20 hours ago

I’d expect tokenizers to include spaces in the tokens themselves. Words are often built from multiple tokens, so you can’t really insert spaces based on token boundaries. And too much content doesn’t work well once spaces are stripped.

In my tests, plenty of LLMs are also capable of seeing and using double spaces when accessed through the right interface.
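
To make the point concrete, here is a rough sketch with a toy tokenizer. The vocabulary and the function names (`tokenize`, `decode`, `decode_stripped`) are made up for illustration and don't correspond to any real model's tokenizer; the only assumption is that spaces live inside the tokens as a leading " ".

```python
# Hypothetical toy vocabulary: spaces are part of the tokens (leading " ").
VOCAB = {"token", "izer", "s", " keep", " double", " spaces", " "}

def tokenize(text):
    """Greedy longest-match tokenization; unknown text falls back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one character
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens

def decode(tokens):
    """Spaces are inside the tokens, so decoding is plain concatenation."""
    return "".join(tokens)

def decode_stripped(tokens):
    """What is left if spaces are stripped and re-inserted at every token boundary."""
    return " ".join(t.strip() for t in tokens if t.strip())

text = "tokenizers keep  double spaces"  # note the double space
tokens = tokenize(text)
print(tokens)
print(repr(decode(tokens)))
print(repr(decode_stripped(tokens)))
```

With spaces kept in the tokens, the round trip is lossless, including the double space. With spaces stripped, "tokenizers" comes back as "token izer s" and the double space is gone.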
