An LLM is a poor computational paradigm for playing chess.
ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic
Submitted 6 days ago by Lifecoach5000@lemmy.world to technology@lemmy.world
Comments
NeilBru@lemmy.world 5 days ago
surph_ninja@lemmy.world 5 days ago
This just in: a hammer makes a poor screwdriver.
WhyJiffie@sh.itjust.works 5 days ago
LLMs are more like a leaf blower though
Takapapatapaka@lemmy.world 5 days ago
Actually, a very specific model (gpt-3.5-turbo-instruct) was pretty good at chess (around 1700 Elo, if I remember correctly).
NeilBru@lemmy.world 5 days ago
I’m impressed, if that’s true! In general, an LLM’s training cost vs. an LSTM, RNN, or some other more appropriate DNN algorithm suitable for the ruleset is laughably high.
Bleys@lemmy.world 5 days ago
The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.
NeilBru@lemmy.world 5 days ago
Yes, I agree wholeheartedly with your clarification.
My career path in neural networks, as I stated in a different comment, is focused on generative DNNs for CAD applications and parametric 3D modeling. Before that, I began as a researcher in cancerous tissue classification and object detection in medical diagnostic imaging.
Thus, large language models are well out of my area of expertise in terms of the architecture of their models.
However, it fundamentally boils down to the fact that these specific large language models were designed to predict text, not necessarily to solve problems or play games to “win”/“survive”.
I admit that I’m just parroting what you stated and maybe rehashing what I stated even before that, but I like repeating and refining in simple terms to explain to laymen and, dare I say, clients. It helps me make sure I don’t come off as too pompous when talking about this subject; forgive my tedium.
sugar_in_your_tea@sh.itjust.works 5 days ago
Yeah, a lot of them hallucinate moves.
Objection@lemmy.ml 6 days ago
Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.
bier@feddit.nl 5 days ago
Yeah, it’s like judging how great a fish is at climbing a tree. But it does show that it’s not real intelligence or reasoning.
Zenith@lemm.ee 5 days ago
I forget which airline it was, but one of the onboard games in the seat-back TVs was called “Beginners Chess”, which was notoriously difficult to beat. So it was tested against other chess engines, and it ranked among something like the top five most powerful chess engines ever.
andallthat@lemmy.world 5 days ago
Machine learning has existed for many years now. The issue is these funding-hungry new companies taking their LLMs, repackaging them as “AI”, and attributing every ML win ever to “AI”.
Yes, ML programs designed and trained specifically to identify tumors in medical imaging have become good diagnostic tools. But if you read in news that “AI helps cure cancer”, it makes it sound like a bunch of researchers just spent a few minutes engineering the right prompt for Copilot.
That’s why, yes, a specifically designed and finely tuned ML program can now beat the best human chess player, but calling it “AI” and bundling it together with the latest Gemini or Claude iteration is intentionally misleading.
AlecSadler@sh.itjust.works 5 days ago
ChatGPT has been, hands down, the worst AI coding assistant I’ve ever used.
It regularly suggests code that doesn’t compile or isn’t even for the language.
It generally suggests a chunk of code that is just a copy of the lines I just wrote.
Sometimes it likes to suggest setting the same property like 5 times.
It is absolute garbage and I do not recommend it to anyone.
j4yt33@feddit.org 5 days ago
I find it really hit and miss. Easy, standard operations are fine but if you have an issue with code you wrote and ask it to fix it, you can forget it
AlecSadler@sh.itjust.works 5 days ago
I’ve found Claude 3.7 and 4.0 and sometimes Gemini variants still leagues better than ChatGPT/Copilot.
Still not perfect, but night and day difference.
I feel like ChatGPT didn’t focus on coding and instead focused on mainstream appeal, but I am not an expert.
PixelatedSaturn@lemmy.world 5 days ago
I like tab coding, where it writes small blocks of code that it thinks I need. It’s on point almost all the time. This speeds me up.
Blackmist@feddit.uk 5 days ago
It’s the ideal help for people who shouldn’t be employed as programmers to start with.
I had to explain hexadecimal to somebody the other day. It’s honestly depressing.
Etterra@discuss.online 5 days ago
That’s because it doesn’t know what it’s saying. It’s just blathering out each word as what it estimates to be the likely next word given past examples in its training data. It’s a statistics calculator. It’s marginally better than just smashing the auto fill on your cell repeatedly. It’s literally dumber than a parrot.
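To make that “statistics calculator” point concrete, here is a toy next-word predictor built purely from counts. The ten-word corpus is invented for illustration; a real LLM does something vastly more sophisticated, but the core is still “most likely continuation given past examples”:

```python
from collections import Counter, defaultdict

# Tiny invented "training corpus" standing in for web-scale text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which: a bigram model, the crudest next-word predictor.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word seen in training."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" ("cat" follows "the" twice; "mat"/"fish" once each)
```

The predictor has no idea what a cat is; it only knows what tended to come next in its data, which is the commenter’s point.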
AnUnusualRelic@lemmy.world 5 days ago
Parrots are actually intelligent though.
nutsack@lemmy.dbzer0.com 5 days ago
My favorite thing is how it constantly suggests libraries that don’t exist.
Blackmist@feddit.uk 5 days ago
You’re right. That library was removed in ToolName [PriorVersion]. Please try this instead.
*makes up entirely new fictitious library name*
jj4211@lemmy.world 5 days ago
Oh man, I feel this. A couple of times I’ve had to field questions about some REST API I support and they ask why they get errors when they supply a specific attribute. Now that attribute never existed, not in our code, not in our documentation, we never thought of it. So I say “Well, that attribute is invalid, I’m not sure where you saw to do that”. They get insistent that the code is generated by a very good LLM, so we must be missing something…
arc99@lemmy.world 5 days ago
It’s even worse when AI soaks up some project whose APIs are constantly changing. Try using AI to code against jetty for example and you’ll be weeping.
arc99@lemmy.world 5 days ago
All AIs are the same. They’re just scraping content from GitHub, Stack Overflow, etc., with a bunch of guardrails slapped on to spew out sentences that conform to their training data, but there is no intelligence. They’re super handy for basic code snippets, but anyone using them for anything remotely complex or nuanced will regret it.
NateNate60@lemmy.world 5 days ago
One of my mates generated an entire website using Gemini. It was a React web app that tracks inventory for trading card dealers. It actually did come out functional and well-polished. That being said, the AI really struggled with several aspects of the project that humans would not:
- It left database secrets in the code
- The design of the website meant that it was impossible to operate securely
- The quality of the code itself was hot garbage—unreadable and undocumented nonsense that somehow still worked
- It did not break the code into multiple files. It piled everything into a single file
AlecSadler@sh.itjust.works 5 days ago
I’ve used agents for implementing entire APIs and front-ends from the ground up with my own customizations and nuances.
I will say that, for my pedantic needs, it typically only gets about 80-90% of the way there so I still have to put fingers to code, but it definitely saves a boat load of time in those instances.
ILikeBoobies@lemmy.ca 5 days ago
I’ve had success with splitting a function into 2 and planning out an overview, though that’s more like talking to myself
I wouldn’t use it to generate stuff though
Mobiuthuselah@lemm.ee 5 days ago
I don’t use it for coding. I use it sparingly really, but want to learn to use it more efficiently. Are there any areas in which you think it excels? Are there others that you’d recommend instead?
uairhahs@lemmy.world 5 days ago
Use Gemini (2.5) or Claude (3.7 and up). OpenAI is a shitshow
nednobbins@lemm.ee 5 days ago
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.
Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there’s no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.
LLMs on the other hand, are very good at producing clickbait articles with low information content.
nova_ad_vitum@lemmy.ca 5 days ago
Gotham Chess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn’t help.
This sort of gets to the heart of LLM-based “AI”. That one example to me really shows that there’s no actual reasoning happening inside. It’s producing answers that statistically look like answers that might be given based on that input.
For some things it even works. But calling this intelligence is dubious at best.
Ultraviolet@lemmy.world 5 days ago
Because it doesn’t have any understanding of the rules of chess, or even an internal model of the game state. It just has the text of chess games in its training data and can reproduce the notation, but there’s nothing to prevent it from making illegal or nonsensical moves.
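To make the contrast concrete, here is a minimal sketch of the kind of internal game-state check a real chess program performs and a pure text predictor simply lacks. This is pure Python covering only knight geometry on an empty board; a real program would use a full rules library such as python-chess:

```python
def knight_moves(square):
    """All squares a knight could reach from `square` (e.g. 'g1') on an empty board."""
    file, rank = ord(square[0]) - ord('a'), int(square[1]) - 1
    deltas = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
    moves = set()
    for df, dr in deltas:
        f, r = file + df, rank + dr
        if 0 <= f < 8 and 0 <= r < 8:  # stay on the board
            moves.add(chr(ord('a') + f) + str(r + 1))
    return moves

def is_legal_knight_move(src, dst):
    """Reject a move before it happens, something notation-prediction cannot do."""
    return dst in knight_moves(src)

print(is_legal_knight_move("g1", "f3"))  # True: a standard opening move
print(is_legal_knight_move("g1", "g3"))  # False: knights don't move like that
```

An engine runs a check like this (plus piece occupancy, checks, castling rights, and so on) before every move; a text predictor just emits whatever notation looks statistically plausible.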
Noodle07@lemmy.world 5 days ago
Hallucinating 100% of the time 👌
JacksonLamb@lemmy.world 5 days ago
ChatGPT versus Deepseek is hilarious. They both cheat like crazy and then one side jedi mind tricks the winner into losing.
LovableSidekick@lemmy.world 5 days ago
In this case it’s not even bad prompts, it’s a problem domain ChatGPT was never built to be good at. Like saying modern medicine is clearly bullshit because a doctor sucked at basketball.
Halosheep@lemm.ee 5 days ago
I swear every single article critical of current LLMs is like, “The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole.”
drspod@lemmy.ml 5 days ago
It’s newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
ipkpjersi@lemmy.ml 5 days ago
That’s just clickbait in general these days lol
lambalicious@lemmy.sdf.org 5 days ago
Well, the first and obvious thing to do to show that AI is bad is to show that AI is bad. If it provides that much of a low-hanging fruit for the demonstration… that just further emphasizes the point.
MonkderVierte@lemmy.zip 5 days ago
LLMs are not built for logic.
floofloof@lemmy.ca 6 days ago
ChatGPT the word prediction machine? Why would anyone expect it to be good at chess?
anubis119@lemmy.world 6 days ago
A strange game. How about a nice game of Global Thermonuclear War?
cley_faye@lemmy.world 5 days ago
Ah, you used logic. That’s the issue. They don’t do that.
Furbag@lemmy.world 6 days ago
Can ChatGPT actually play chess now? Last I checked, it couldn’t remember more than 5 moves of history, so it couldn’t see the true board state and would make illegal moves, capture its own pieces, materialize pieces out of thin air, etc.
Lembot_0003@lemmy.zip 6 days ago
The Atari chess program can play chess better than the Boeing 747 too. And better than the North Pole. Amazing!
arc99@lemmy.world 5 days ago
Hardly surprising. LLMs aren’t *thinking*; they’re just shitting out the next token for any given input of tokens.
finitebanjo@lemmy.world 5 days ago
All these comments asking “why don’t they just have ChatGPT go and look up the correct answer”.
That’s not how it works, you buffoons; it trains on datasets long before it releases. It doesn’t think. It doesn’t learn after release, and it won’t remember things you try to teach it.
Really lowering my faith in humanity when even the AI skeptics don’t understand that it generates statistical representations of an answer based on answers given in the past.
jsomae@lemmy.ml 5 days ago
Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it’s obviously not going to be good at it, at least not without scaffolding.
capuccino@lemmy.world 6 days ago
This made my day
vane@lemmy.world 6 days ago
It’s not that hard to beat a dumb 6-year-old whose only purpose is to mine your privacy, sell you ads, or product-place some shit for you in the future.
stevedice@sh.itjust.works 4 days ago
2025 Mazda MX-5 Miata ‘got absolutely wrecked’ by Inflatable Boat in beginner’s boat racing match — Mazda’s newest model bamboozled by 1930s technology.
Sidhean@lemmy.blahaj.zone 5 days ago
Can i fistfight ChatGPT next? I bet I could kick its ass, too :p
seven_phone@lemmy.world 6 days ago
You say you produce good oranges but my machine for testing apples gave your oranges a very low score.
Nurse_Robot@lemmy.world 6 days ago
I’m often impressed at how good chatGPT is at generating text, but I’ll admit it’s hilariously terrible at chess. It loves to manifest pieces out of thin air, or make absurd illegal moves, like jumping its king halfway across the board and claiming checkmate
FourWaveforms@lemm.ee 4 days ago
If you don’t play chess, the Atari is probably going to beat you as well.
LLMs are only good at things to the extent that they have been well-trained in the relevant areas. Not just learning to predict text string sequences, but reinforcement learning after that, where a human or some other agent says “this answer is better than that one” enough times in enough of the right contexts. It mimics the way humans learn, which is through repeated and diverse exposure.
If they set up a system to train it against some chess program, or (much simpler) simply gave it a tool call, it would do much better. Tool calling already exists and would be by far the easiest way.
It could also be instructed to write a chess solver program and then run it, at which point it would be on par with the Atari, but it wouldn’t compete well with a serious chess solver.
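A rough sketch of that tool-calling idea, where the language model never chooses the move itself but only routes the request to a dedicated engine. Everything here is hypothetical: the keyword check stands in for a real LLM emitting a structured tool call, and the stub engine stands in for something like Stockfish:

```python
def stub_engine_best_move(legal_moves):
    """Stand-in for a real engine: here, naively pick the first legal move."""
    return legal_moves[0]

def handle_request(user_message, legal_moves):
    # A real system would let the LLM emit a structured tool call;
    # this keyword check is a crude stand-in for that routing step.
    if "chess" in user_message.lower() or "move" in user_message.lower():
        move = stub_engine_best_move(legal_moves)
        return f"Tool call -> engine suggests {move}"
    return "LLM answers directly"

print(handle_request("What's the best move here?", ["e2e4", "d2d4"]))
# -> "Tool call -> engine suggests e2e4"
```

The point of the pattern: the LLM’s job reduces to recognizing that the question is about chess and handing it off, which is exactly the part LLMs are good at.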
Kolanaki@pawb.social 6 days ago
There was a chess game for the Atari 2600? :O
I wanna see them W I D E pieces.
muntedcrocodile@lemm.ee 6 days ago
This isn’t the strength of gpt-o4; the model has been optimized for tool use as an agent. That’s why it’s so good at image gen relative to their other models: it uses tools to construct an image piece by piece, similar to a human. Also, probably poor system prompting. An LLM is not a universal thinking machine; it’s a universal process machine. An LLM understands the process and uses tools to accomplish it, hence its strength in writing code (especially as an agent).
It’s similar to how a monkey can be far better at remembering a sequence of numbers than a human, yet is totally incapable of even comprehending writing the numbers down.
NotMyOldRedditName@lemmy.world 6 days ago
Okay, but could ChatGPT be used to vibe code a chess program that beats the Atari 2600?
IsaamoonKHGDT_6143@lemmy.zip 6 days ago
They used ChatGPT 4o, instead of using o1 or o3.
Obviously it was going to fail.
untakenusername@sh.itjust.works 4 days ago
this is because an LLM is not made for playing chess
krigo666@lemmy.world 6 days ago
Next, pit ChatGPT against 1K Chess in a ZX81.
ICastFist@programming.dev 5 days ago
So, it fares as well as the average schmuck, proving it is human
/s
FMT99@lemmy.world 6 days ago
Does the author think ChatGPT is in fact an AGI? It’s a chatbot. Why would it be good at chess? It’s like saying an Atari 2600 running a dedicated chess program can beat Google Maps at chess.
spankmonkey@lemmy.world 6 days ago
AI including ChatGPT is being marketed as super awesome at everything, which is why that and similar AI is being forced into absolutely everything and being sold as a replacement for people.
Something marketed as AGI should be treated as AGI when proving it isn’t AGI.
pelespirit@sh.itjust.works 6 days ago
Not to help the AI companies, but why don’t they program them to look things up in math programs and outsource chess to dedicated engines when they’re asked for that stuff? It’s obvious they’re shit at it, so why do they answer anyway? It’s because they’re programmed by know-it-all programmers, isn’t it?
PixelatedSaturn@lemmy.world 6 days ago
I don’t think AI is being marketed as awesome at everything. It’s got obvious flaws. Right now it’s not good for stuff like chess, probably not even tic-tac-toe. It’s a language model; it’s hard for it to track the playing field. But AI is in development; it might not need much to start playing chess.
malwieder@feddit.org 5 days ago
Google Maps doesn’t pretend to be good at chess. ChatGPT does.
whaleross@lemmy.world 5 days ago
A toddler can pretend to be good at chess but anybody with reasonable expectations knows that they are not.
suburban_hillbilly@lemmy.ml 6 days ago
Most people do. It’s just called AI in the media everywhere and marketing works. I think online folks forget that something as simple as getting a Lemmy account by yourself puts you into the top quintile of tech literacy.
Opinionhaver@feddit.uk 5 days ago
Yet even on Lemmy people can’t seem to make sense of these terms and are saying things like “LLM’s are not AI”
iAvicenna@lemmy.world 5 days ago
Well, so much hype has been generated around ChatGPT being close to AGI that it now makes sense to ask questions like “can ChatGPT prove the Riemann hypothesis?”
red@sopuli.xyz 5 days ago
Even the models that pretend to be AGI are not. It’s been proven.
Broken@lemmy.ml 6 days ago
I agree with your general statement, but in theory, since all ChatGPT does is regurgitate information, and a lot of chess is memorization of historical games and lines, it might actually perform well. No, it can’t think, but it can remember everything, so at some point that might tip the results in its favor.
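A toy illustration of where pure memorization runs out: an invented three-entry “opening book” answers instantly while the game follows known lines, then has nothing once the position is novel, which is roughly where regurgitation stops helping.

```python
# A tiny "opening book": move sequences mapped to remembered replies.
# Real books contain millions of positions; the principle is the same.
opening_book = {
    (): "e4",
    ("e4", "e5"): "Nf3",
    ("e4", "e5", "Nf3", "Nc6"): "Bb5",  # Ruy Lopez, straight from memory
}

def book_move(moves_so_far):
    """Return the memorized reply, or None once the game leaves known territory."""
    return opening_book.get(tuple(moves_so_far))

print(book_move(["e4", "e5"]))               # -> "Nf3": still in the book
print(book_move(["e4", "e5", "Nf3", "a6"]))  # -> None: out of book, memory is useless
```

Since chess positions explode combinatorially after a few moves, almost every mid-game position is one the “memory” has never seen.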
Eagle0110@lemmy.world 6 days ago
Regurgitating an impression of it, not regurgitating it verbatim; that’s the problem here.
Chess is 100% deterministic, so it falls flat.
FMT99@lemmy.world 5 days ago
I mean it may be possible but the complexity would be so many orders of magnitude greater. It’d be like learning chess by just memorizing all the moves great players made but without any context or understanding of the underlying strategy.
adhdplantdev@lemm.ee 6 days ago
Articles like this are good because they expose the flaws of the AI and show that it can’t be trusted with complex multi-step tasks.
It helps people who think AI is close to human-level see that it’s not, and that it’s missing critical functionality.
FMT99@lemmy.world 5 days ago
The problem is though that this perpetuates the idea that ChatGPT is actually an AI.
TowardsTheFuture@lemmy.zip 6 days ago
I think that’s generally the point: most people think ChatGPT is this sentient thing that knows everything and… no.
PixelatedSaturn@lemmy.world 5 days ago
Do they, though? No one I’ve talked to, not my coworkers who use it for work, not my friends, not my 72-year-old mother, thinks it’s sentient.
merdaverse@lemm.ee 5 days ago
OpenAI has been talking about AGI for years, implying that they are getting closer to it with their products.
openai.com/index/planning-for-agi-and-beyond/
openai.com/…/elon-musk-wanted-an-openai-for-profi…
FMT99@lemmy.world 5 days ago
Hey I didn’t say anywhere that corporations don’t lie to promote their product did I?
Empricorn@feddit.nl 5 days ago
You’re not wrong, but keep in mind that ChatGPT advocates, including the company itself, refer to it as AI, including in marketing. They’re saying it’s a complete, self-learning, constantly evolving Artificial Intelligence that has been improving itself since release… and it loses to a 4 KB video game program from 1979 that can only “think” 2 moves ahead.
FMT99@lemmy.world 5 days ago
That’s totally fair, the company is obviously lying, excuse me “marketing”, to promote their product, that’s absolutely true.
x00z@lemmy.world 6 days ago
In all fairness. Machine learning in chess engines is actually pretty strong.
www.chess.com/terms/alphazero-chess-engine
jeeva@lemmy.world 5 days ago
Sure, but machine learning like that is very different to how LLMs are trained and their output.
FMT99@lemmy.world 5 days ago
Oh absolutely you can apply machine learning to game strategy. But you can’t expect a generalized chatbot to do well at strategic decision making for a specific game.
saltesc@lemmy.world 6 days ago
I like referring to LLMs as VIs (Virtual Intelligence, from Mass Effect), since they merely give the impression of intelligence but are little more than search engines. In the end, all one is doing is displaying expected results based on a popularity algorithm. However, they do this inconsistently due to bad data in and limited caching.
FartMaster69@lemmy.dbzer0.com 6 days ago
I mean, OpenAI seems to forget it isn’t.