This is because spaces typically are encoded by model tokenizers.
In many cases it would be redundant to show spaces, so tokenizers collapse them down to no spaces at all. The model then reads the tokens as if the spaces never existed.
For example, it might output: thequickbrownfoxjumpsoverthelazydog
Except it would actually be a list of numbers like: [1, 256, 6273, 7836, 1922, 2244, 3245, 256, 6734, 1176, 2]
Then the tokenizer decodes this and adds the spaces because they are assumed to be there. The tokenizer has no knowledge of your request, and the model output typically does not include spaces, hence your output sentence will not have double spaces.
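To make that concrete, here is a toy sketch of the scheme as I described it, not of any real tokenizer. The names `encode` and `decode` and the on-the-fly vocabulary are made up for illustration, and real tokenizers vary (many keep a space inside the token itself). The only point is that if the spaces are dropped on the way in and re-added on the way out, a double space cannot survive the round trip.

```python
# Toy model of the scheme described above (an assumption for illustration,
# not how any specific production tokenizer works).

def encode(text, vocab):
    """Split on whitespace and map each word to an integer id."""
    ids = []
    for word in text.split():  # split() discards every run of whitespace
        if word not in vocab:
            vocab[word] = len(vocab)  # assign the next free id
        ids.append(vocab[word])
    return ids


def decode(ids, vocab):
    """Map ids back to words and re-insert exactly one space between them."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


vocab = {}
ids = encode("Test.  For.  Double.  Spaces.", vocab)  # double spaces in
round_trip = decode(ids, vocab)                       # single spaces out
print(ids)
print(round_trip)
```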
DarrinBrunner@lemmy.world 1 day ago
So… Why don’t I see double spaces after your periods? Test. For. Double. Spaces.
dual_sport_dork@lemmy.world 1 day ago
Web browsers collapse whitespace by default, which means that, sans any trickery or deliberate use of nonbreaking spaces, any number of spaces between words is reduced to one. Since apparently every single thing in the modern world is displayed via some kind of encapsulated little browser engine nowadays, the majority of double spaces left in the universe that are not already firmly nailed down in print now appear as singles. And thus the convention is almost totally lost.
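A rough sketch of that collapsing, if anyone wants to play with it. This only approximates the default rendering behaviour and is not actual browser code; `collapse_spaces` is a name I made up. Runs of ordinary spaces, tabs and line breaks become one space, while the nonbreaking space (U+00A0) is left alone, which is why it works as a way to force a double space.

```python
import re

# Ordinary collapsible whitespace only. U+00A0 (nonbreaking space) is
# deliberately not in this class, so it is never merged away.
_COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")


def collapse_spaces(text):
    """Approximate how rendered HTML shows runs of whitespace as one space."""
    return _COLLAPSIBLE.sub(" ", text)


typed = "Test.  For.  Double.  Spaces."
forced = "Like\u00a0 this."  # nonbreaking space followed by a normal space

print(collapse_spaces(typed))
print(collapse_spaces(forced) == forced)
```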
Redjard@lemmy.dbzer0.com 1 day ago
This seems to match up with some quick tests I did just now on the pseudonymized chatbot interface of DuckDuckGo:
[Images: two screenshots of the tests]
SGforce@lemmy.ca 14 hours ago
Tokenization can make it difficult for them.
The word chunks often contain a space because it’s efficient. I would think an extra space would stand out. Writing it back should be easier, assuming there is a dedicated “space” token like the other punctuation tokens, and there must be one.
Hard mode would be asking it how many spaces there are in your sentence. I don’t think they’d figure it out unless their own list of tokens and a description is trained into them specifically.
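Here is a minimal sketch of what I mean, with a tiny made-up vocabulary and a greedy longest-match splitter. Real tokenizers are trained and far larger, so treat `VOCAB` and `tokenize` as assumptions for illustration only. Word chunks carry their leading space, and a second space has nowhere to go except a standalone " " token.

```python
# Hypothetical vocabulary: word chunks include their leading space,
# plus a standalone space token and a period token.
VOCAB = ["Test", ".", " For", " Double", " Spaces", " "]


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization over a fixed toy vocabulary."""
    by_length = sorted(vocab, key=len, reverse=True)  # try longest pieces first
    tokens = []
    i = 0
    while i < len(text):
        for piece in by_length:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return tokens


single = tokenize("Test. For.")   # one space after the period
double = tokenize("Test.  For.")  # two spaces after the period

print(single)
print(double)
# Counting spaces means looking inside the tokens, not just counting tokens.
print(sum(tok.count(" ") for tok in double))
```

With single spacing the space is hidden inside " For"; with double spacing an extra " " token shows up in front of it. Counting spaces is the harder part because the answer depends on what is inside each chunk.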
thesystemisdown@lemmy.world 1 day ago
Double spaces after periods can create “rivers.” This makes text more difficult to read for those with dyslexia. Whatever is used as a text editor is probably stripping them out for accessibility reasons. I suppose double spaces made sense with monospaced fonts.
apastyle.apa.org/…/typography#myth4
FishFace@lemmy.world 1 day ago
HTML rendering collapses whitespace; it has nothing to do with accessibility. I would like to see the research on double-spacing causing rivers, because I’ve only ever noticed them in justified text, where I would expect the renderer to be inserting extra space after a full stop, compared with between words within a sentence, anyway.
I’ve seen a lot of dubious legibility claims when it comes to typography including:
and so on.
Karyoplasma@discuss.tchncs.de 1 day ago
You can force the double spaces. Like this.