Comment on I investigated millions of tweets from the Kremlin’s ‘troll factory’ and discovered classic propaganda techniques reimagined for the social media age.

<- View Parent
2pt_perversion@lemmy.world ⁨3⁩ ⁨days⁩ ago

Over simplification but partly it has to do with how LLMs split language into tokens and some of those tokens are multi-letter. To us when we look for R’s we split like S - T - R - A - W - B - E - R - R - Y where each character has its own token, but LLMs split it something more like STR - AW - BERRY which makes predicting the correct answer difficult without a lot of training on the specific problem. If you asked it to count how many times STR shows up in “strawberrystrawberrystrawberry” it would have a better chance.

source
Sort:hotnewtop