Also just checked, and every OpenAI model bigger than 4.1-mini can answer this. I think the joke should emphasize how we developed a super power-inefficient way to solve problems that can be answered accurately and efficiently with a single simple algorithm. Another example is using ChatGPT to do simple calculator math. LLMs are good at specific tasks and really bad at others, but people kinda throw everything at them.
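For contrast, the "single algorithm" in question is a few lines; a rough sketch in Python (the count_letter name is just illustrative):

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter in a word, ignoring case."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3
```

It runs in microseconds on any machine, which is the point about efficiency above.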
Comment on AGI achieved
jsomae@lemmy.ml 3 days ago
People who think that LLMs having trouble with these questions is evidence one way or another about how good or bad LLMs are just don't understand tokenization. This is not a big-picture problem that indicates LLMs are deeply incapable. You may hate AI, but that doesn't excuse being ignorant about how it works.
moseschrute@lemmy.world 3 days ago
__dev@lemmy.world 3 days ago
And yet they can seemingly spell and count (small numbers) just fine.
jsomae@lemmy.ml 3 days ago
what do you mean by spell fine? They're just emitting the tokens for the words. Like, it's not writing "strawberry," it's writing tokens <302, 1618, 19772>, which correspond to st, raw, and berry respectively. If you ask it to put a space between each letter, that will disrupt the tokenization mechanism, and it's going to be quite liable to make mistakes.
I don't think it's really fair to say that the lookup 19772 -> berry counts as the LLM being able to spell, since the LLM isn't operating at that layer. It doesn't really emit letters directly. I would argue its inability to reliably spell words when you force it to go letter-by-letter or answer queries about how words are spelled is indicative of its poor ability to spell.
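To see the kind of split described above for yourself: the exact token IDs and pieces vary by tokenizer (the numbers in the comment are illustrative), but a quick sketch using the tiktoken library with the cl100k_base encoding looks like this:

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI chat models

tokens = enc.encode("strawberry")
print(tokens)                             # a few subword IDs, not ten letters
print([enc.decode([t]) for t in tokens])  # subword pieces, e.g. something like ['str', 'aw', 'berry']

# Spacing the letters out forces a different, much longer token sequence,
# which is why letter-by-letter prompts behave so differently.
spaced = enc.encode("s t r a w b e r r y")
print([enc.decode([t]) for t in spaced])
```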
__dev@lemmy.world 3 days ago
what do you mean by spell fine?
I mean that when you ask them to spell a word they can list every character one at a time.
jsomae@lemmy.ml 2 days ago
Well that's a recent improvement. GPT-3 was very bad at that, and GPT-4 still makes mistakes.
buddascrayon@lemmy.world 3 days ago
The problem is that it's not actually counting anything. It's simply looking for text somewhere in its training data that relates to that word and the number of R's in that word. There's no mechanism within the LLM to actually count things; it is not designed with that function. This is not general AI, this is a generative language model that's using its vast, vast store of text to put words together that sound like they answer the question that was asked.
untorquer@lemmy.world 3 days ago
These sorts of artifacts wouldn't be a huge issue except that AI is being pushed to the general public as an alternative means of learning basic information. The meme example is obvious to someone with a strong understanding of English, but learners and children might get an artifact and stamp it in their memory, working for years off bad information. A few false things every now and then aren't a problem; that's unavoidable in learning. Accumulate thousands over long-term use, however, and your understanding of the world becomes coarser, like Swiss cheese with voids so large it can't hold itself up.
jsomae@lemmy.ml 3 days ago
You're talking about hallucinations. That's different from tokenization reflection errors. I'm specifically talking about its inability to know how many of a certain type of letter are in a word that it can spell correctly. This is not a hallucination per se; at least, it's caused by a completely different mechanism than whatever causes other factual errors. This specific problem is due to tokenization, and that's why I say it has little bearing on other shortcomings of LLMs.
untorquer@lemmy.world 3 days ago
No, I'm talking about human learning and the danger posed by treating an imperfect tool as a reliable source of information, as these companies want people to do.
Whether the erroneous information comes from tokenization or hallucinations is irrelevant when this is already the main learning source for so many people, for example when learning a new language.
jsomae@lemmy.ml 2 days ago
Hallucinations aren't relevant to my point here. I'm not defending AIs as a good source of information, and I agree that hallucinations are dangerous (either that, or misusing LLMs is dangerous). I also admit that for language learning, artifacts caused by tokenization could be very detrimental to the user.
The point I am making is that LLMs struggling with these kinds of tokenization artifacts is poor evidence for assuming anything about their behaviour on other tasks.