ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic
Submitted 2 weeks ago by Lifecoach5000@lemmy.world to technology@lemmy.world
Comments
Objection@lemmy.ml 2 weeks ago
Tbf, the article should probably mention that machine learning programs designed to play chess blow everything else out of the water.
bier@feddit.nl 2 weeks ago
Yeah, it's like judging how great a fish is at climbing a tree. But it does show that it's not real intelligence or reasoning.
Zenith@lemm.ee 2 weeks ago
I forget which airline it is, but one of the onboard games in the seat-back TV was called "Beginner's Chess". It was notoriously difficult to beat, so it got tested against other chess engines and ranked among like the top five most powerful chess engines ever.
andallthat@lemmy.world 2 weeks ago
Machine learning has existed for many years, now. The issue is with these funding-hungry new companies taking their LLMs, repackaging them as “AI” and attributing every ML win ever to “AI”.
Yes, ML programs designed and trained specifically to identify tumors in medical imaging have become good diagnostic tools. But if you read in news that “AI helps cure cancer”, it makes it sound like a bunch of researchers just spent a few minutes engineering the right prompt for Copilot.
That's why, yes, a specifically designed and finely tuned ML program can now beat the best human chess player, but calling it "AI" and bundling it together with the latest Gemini or Claude iteration is intentionally misleading.
NeilBru@lemmy.world 2 weeks ago
An LLM is a poor computational paradigm for playing chess.
surph_ninja@lemmy.world 2 weeks ago
This just in: a hammer makes a poor screwdriver.
WhyJiffie@sh.itjust.works 2 weeks ago
LLMs are more like a leaf blower though
Takapapatapaka@lemmy.world 2 weeks ago
Actually, a very specific model (GPT-3.5-turbo-instruct) was pretty good at chess (around 1700 Elo, if I remember correctly).
NeilBru@lemmy.world 2 weeks ago
I’m impressed, if that’s true! In general, an LLM’s training cost vs. an LSTM, RNN, or some other more appropriate DNN algorithm suitable for the ruleset is laughably high.
Bleys@lemmy.world 2 weeks ago
The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.
NeilBru@lemmy.world 2 weeks ago
Yes, I agree wholeheartedly with your clarification.
My career path in neural networks, as I stated in a different comment, is focused on generative DNNs for CAD applications and parametric 3D modeling. Before that, I began as a researcher in cancerous tissue classification and object detection in medical diagnostic imaging.
Thus, large language models are well out of my area of expertise in terms of the architecture of their models.
However, fundamentally it boils down to the fact that these large language models were designed to predict text, not necessarily to solve problems or play games to "win"/"survive".
I admit that I'm just parroting what you stated, and maybe rehashing what I stated even before that, but I like repeating and refining in simple terms to explain to laymen and, dare I say, clients. It helps me make sure I don't come off as too pompous when talking about this subject; forgive my tedium.
sugar_in_your_tea@sh.itjust.works 2 weeks ago
Yeah, a lot of them hallucinate moves.
AlecSadler@sh.itjust.works 2 weeks ago
ChatGPT has been, hands down, the worst AI coding assistant I’ve ever used.
It regularly suggests code that doesn’t compile or isn’t even for the language.
It generally suggests a chunk of code that is just a copy of the lines I just wrote.
Sometimes it likes to suggest setting the same property like 5 times.
It is absolute garbage and I do not recommend it to anyone.
j4yt33@feddit.org 2 weeks ago
I find it really hit and miss. Easy, standard operations are fine, but if you have an issue with code you wrote and ask it to fix it, you can forget it.
AlecSadler@sh.itjust.works 2 weeks ago
I’ve found Claude 3.7 and 4.0 and sometimes Gemini variants still leagues better than ChatGPT/Copilot.
Still not perfect, but night and day difference.
I feel like ChatGPT didn't focus on coding and instead focused on mainstream use, but I'm not an expert.
PixelatedSaturn@lemmy.world 2 weeks ago
I like tab coding, where it writes the small blocks of code it thinks I need. It's on point almost all the time, which speeds me up.
Blackmist@feddit.uk 2 weeks ago
It’s the ideal help for people who shouldn’t be employed as programmers to start with.
I had to explain hexadecimal to somebody the other day. It’s honestly depressing.
Etterra@discuss.online 2 weeks ago
That's because it doesn't know what it's saying. It's just blathering out each word as whatever it estimates to be the most likely next word, given past examples in its training data. It's a statistics calculator. It's marginally better than just smashing the autofill button on your phone repeatedly. It's literally dumber than a parrot.
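To be fair, the "statistics calculator" point can be sketched in a few lines. This toy bigram autocomplete (corpus and all names made up for illustration; a real transformer is vastly more sophisticated) shows the bare "most likely next word" mechanic:

```python
from collections import Counter, defaultdict

# Count which word follows which in a tiny made-up corpus, then
# always emit the statistically most frequent successor.
corpus = "the cat sat on the mat the cat ate the fish".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    # Most common next word in the training data, or None if unseen.
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" -- it followed "the" most often
```

No meaning anywhere, just counts; scale the counts up by a few billion parameters and you get the idea.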
AnUnusualRelic@lemmy.world 2 weeks ago
Parrots are actually intelligent though.
nutsack@lemmy.dbzer0.com 2 weeks ago
my favorite thing is how it's constantly pulling in libraries that don't exist
Blackmist@feddit.uk 2 weeks ago
You’re right. That library was removed in ToolName [PriorVersion]. Please try this instead.
*makes up entirely new fictitious library name*
jj4211@lemmy.world 2 weeks ago
Oh man, I feel this. A couple of times I’ve had to field questions about some REST API I support and they ask why they get errors when they supply a specific attribute. Now that attribute never existed, not in our code, not in our documentation, we never thought of it. So I say “Well, that attribute is invalid, I’m not sure where you saw to do that”. They get insistent that the code is generated by a very good LLM, so we must be missing something…
arc99@lemmy.world 2 weeks ago
It’s even worse when AI soaks up some project whose APIs are constantly changing. Try using AI to code against jetty for example and you’ll be weeping.
arc99@lemmy.world 2 weeks ago
All AIs are the same. They're just scraping content from GitHub, Stack Overflow, etc., with a bunch of guardrails slapped on to spew out sentences that conform to their training data, but there is no intelligence. They're super handy for basic code snippets, but anyone using them for anything remotely complex or nuanced will regret it.
NateNate60@lemmy.world 2 weeks ago
One of my mates generated an entire website using Gemini. It was a React web app that tracks inventory for trading card dealers. It actually did come out functional and well-polished. That being said, the AI really struggled with several aspects of the project that humans would not:
- It left database secrets in the code
- The design of the website meant that it was impossible to operate securely
- The quality of the code itself was hot garbage—unreadable and undocumented nonsense that somehow still worked
- It did not break the code into multiple files. It piled everything into a single file
AlecSadler@sh.itjust.works 2 weeks ago
I’ve used agents for implementing entire APIs and front-ends from the ground up with my own customizations and nuances.
I will say that, for my pedantic needs, it typically only gets about 80-90% of the way there so I still have to put fingers to code, but it definitely saves a boat load of time in those instances.
ILikeBoobies@lemmy.ca 2 weeks ago
I’ve had success with splitting a function into 2 and planning out an overview, though that’s more like talking to myself
I wouldn’t use it to generate stuff though
Mobiuthuselah@lemm.ee 2 weeks ago
I don’t use it for coding. I use it sparingly really, but want to learn to use it more efficiently. Are there any areas in which you think it excels? Are there others that you’d recommend instead?
uairhahs@lemmy.world 2 weeks ago
Use Gemini (2.5) or Claude (3.7 and up). OpenAI is a shitshow
nednobbins@lemm.ee 2 weeks ago
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.
Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.
LLMs on the other hand, are very good at producing clickbait articles with low information content.
nova_ad_vitum@lemmy.ca 2 weeks ago
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.
This sort of gets to the heart of LLM-based “AI”. That one example to me really shows that there’s no actual reasoning happening inside. It’s producing answers that statistically look like answers that might be given based on that input.
For some things it even works. But calling this intelligence is dubious at best.
Ultraviolet@lemmy.world 2 weeks ago
Because it doesn't have any understanding of the rules of chess, or even an internal model of the game state? It just has the text of chess games in its training data and can reproduce the notation, but there's nothing to prevent it from making illegal or nonsensical moves.
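A toy sketch of that point: "train" on nothing but move text and you get plausible continuations for each move token, but no board anywhere. The games below are made up for illustration, and a real LLM is far more sophisticated, but the absence of any game state is the same:

```python
from collections import Counter, defaultdict

# Hypothetical "training data": a few opening sequences as raw SAN tokens.
games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],
    ["e4", "c5", "Nf3", "d6", "d4"],
    ["d4", "Nf6", "c4", "e6", "Nc3"],
]

follows = defaultdict(Counter)
for game in games:
    for prev, nxt in zip(game, game[1:]):
        follows[prev][nxt] += 1

def next_move(prev):
    """Most common continuation seen after `prev` in the training text.
    Nothing here tracks whose turn it is, where the pieces are, or
    whether the suggested move is legal in the current position."""
    if prev not in follows:
        return None
    return follows[prev].most_common(1)[0][0]

# After ...e5 it suggests Nf3 -- and it would suggest the same thing
# from any position whatsoever, because no board state exists.
print(next_move("e5"))
```

Legality checking would have to be bolted on from outside, which is exactly what the model on its own doesn't have.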
Noodle07@lemmy.world 2 weeks ago
Hallucinating 100% of the time 👌
JacksonLamb@lemmy.world 2 weeks ago
ChatGPT versus Deepseek is hilarious. They both cheat like crazy and then one side jedi mind tricks the winner into losing.
LovableSidekick@lemmy.world 2 weeks ago
In this case it’s not even bad prompts, it’s a problem domain ChatGPT was never built to be good at. Like saying modern medicine is clearly bullshit because a doctor sucked at basketball.
Halosheep@lemm.ee 2 weeks ago
I swear every single article critical of current LLMs is like, “The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole.”
drspod@lemmy.ml 2 weeks ago
It’s newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
ipkpjersi@lemmy.ml 2 weeks ago
That’s just clickbait in general these days lol
lambalicious@lemmy.sdf.org 2 weeks ago
Well, the first and obvious thing to do to show that AI is bad is to show that AI is bad. If it provides that much of a low-hanging fruit for the demonstration… that just further emphasizes the point.
floofloof@lemmy.ca 2 weeks ago
ChatGPT the word prediction machine? Why would anyone expect it to be good at chess?
MonkderVierte@lemmy.zip 2 weeks ago
LLM are not built for logic.
anubis119@lemmy.world 2 weeks ago
A strange game. How about a nice game of Global Thermonuclear War?
Lembot_0003@lemmy.zip 2 weeks ago
The Atari chess program can play chess better than the Boeing 747 too. And better than the North Pole. Amazing!
Furbag@lemmy.world 2 weeks ago
Can ChatGPT actually play chess now? Last I checked, it couldn't remember more than 5 moves of history, so it wouldn't be able to see the true board state and would make illegal moves, take its own pieces, materialize pieces out of thin air, etc.
cley_faye@lemmy.world 2 weeks ago
Ah, you used logic. That’s the issue. They don’t do that.
arc99@lemmy.world 2 weeks ago
Hardly surprising. LLMs aren't *thinking*, they're just shitting out the next token for any given input of tokens.
capuccino@lemmy.world 2 weeks ago
This made my day
vane@lemmy.world 2 weeks ago
It's not that hard to beat a dumb 6-year-old whose only purpose is mining your privacy to sell you ads, or product-placing some shit for you in the future.
finitebanjo@lemmy.world 2 weeks ago
All these comments asking “why don’t they just have chatgpt go and look up the correct answer”.
That's not how it works, you buffoons: it trains on datasets long before it's released. It doesn't think. It doesn't learn after release, and it won't remember things you try to teach it.
Really lowering my faith in humanity when even the AI skeptics don’t understand that it generates statistical representations of an answer based on answers given in the past.
jsomae@lemmy.ml 2 weeks ago
Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it’s obviously not going to be good at it, at least not without scaffolding.
seven_phone@lemmy.world 2 weeks ago
You say you produce good oranges but my machine for testing apples gave your oranges a very low score.
Nurse_Robot@lemmy.world 2 weeks ago
I'm often impressed at how good ChatGPT is at generating text, but I'll admit it's hilariously terrible at chess. It loves to manifest pieces out of thin air or make absurd illegal moves, like jumping its king halfway across the board and claiming checkmate.
Sidhean@lemmy.blahaj.zone 2 weeks ago
Can i fistfight ChatGPT next? I bet I could kick its ass, too :p
stevedice@sh.itjust.works 2 weeks ago
2025 Mazda MX-5 Miata ‘got absolutely wrecked’ by Inflatable Boat in beginner’s boat racing match — Mazda’s newest model bamboozled by 1930s technology.
Kolanaki@pawb.social 2 weeks ago
There was a chess game for the Atari 2600? :O
I wanna see them W I D E pieces.
muntedcrocodile@lemm.ee 2 weeks ago
This isn't the strength of GPT-4o; the model has been optimized for tool use as an agent. That's why it's so good at image gen relative to their other models: it uses tools to construct an image piece by piece, similar to a human. Also probably poor system prompting. An LLM is not a universal thinking machine; it's a universal process machine. An LLM understands the process and uses tools to accomplish it, hence its strengths in writing code (especially as an agent).
It's similar to how a monkey is infinitely better at remembering a sequence of numbers than a human ever could be, but is totally incapable of even comprehending writing numbers down.
NotMyOldRedditName@lemmy.world 2 weeks ago
Okay, but could ChatGPT be used to vibe code a chess program that beats the Atari 2600?
FourWaveforms@lemm.ee 2 weeks ago
If you don’t play chess, the Atari is probably going to beat you as well.
LLMs are only good at things to the extent that they have been well-trained in the relevant areas. Not just learning to predict text string sequences, but reinforcement learning after that, where a human or some other agent says “this answer is better than that one” enough times in enough of the right contexts. It mimics the way humans learn, which is through repeated and diverse exposure.
If they set up a system to train it against some chess program, or (much simpler) simply gave it a tool call, it would do much better. Tool calling already exists and would be by far the easiest way.
It could also be instructed to write a chess solver program and then run it, at which point it would be on par with the Atari, but it wouldn’t compete well with a serious chess solver.
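A rough sketch of what that tool-call setup could look like. Everything here is hypothetical glue code; a real version would call an actual engine such as Stockfish over UCI instead of the canned lookup used below:

```python
# The point: the move never comes from text prediction -- the app routes
# the position to an engine tool and the model only narrates the result.

def chess_engine_tool(fen: str) -> str:
    """Stand-in for a real engine call (e.g. Stockfish over UCI).
    Returns a canned best move for the standard starting position."""
    book = {
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1": "e2e4",
    }
    return book.get(fen, "(engine search would go here)")

def answer(user_request: str, fen: str) -> str:
    # Crude stand-in for the model's tool-use decision: chess questions
    # get delegated to the engine rather than answered from statistics.
    if "best move" in user_request:
        move = chess_engine_tool(fen)
        return f"The engine suggests {move}."
    return "I can only answer chess questions in this sketch."

print(answer("what is the best move?",
             "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))
```

With this shape the LLM's job shrinks to recognizing "this is a chess question" and formatting the reply, which is the part it's actually good at.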
IsaamoonKHGDT_6143@lemmy.zip 2 weeks ago
They used ChatGPT 4o, instead of using o1 or o3.
Obviously it was going to fail.
krigo666@lemmy.world 2 weeks ago
Next, pit ChatGPT against 1K Chess in a ZX81.
ICastFist@programming.dev 2 weeks ago
So, it fares as well as the average schmuck, proving it is human
/s
Harbinger01173430@lemmy.world 2 weeks ago
Llms useless confirmed once again
FMT99@lemmy.world 2 weeks ago
Does the author think ChatGPT is in fact an AGI? It's a chatbot. Why would it be good at chess? It's like saying an Atari 2600 running a dedicated chess program can beat Google Maps at chess.
spankmonkey@lemmy.world 2 weeks ago
AI including ChatGPT is being marketed as super awesome at everything, which is why that and similar AI is being forced into absolutely everything and being sold as a replacement for people.
Something marketed as AGI should be treated as AGI when proving it isn’t AGI.
pelespirit@sh.itjust.works 2 weeks ago
Not to help the AI companies, but why don't they program them to call out to math programs, and to outsource chess to other programs, when they're asked for that stuff? It's obvious they're shit at it, so why do they answer anyway? It's because they're programmed by know-it-all programmers, isn't it.
PixelatedSaturn@lemmy.world 2 weeks ago
I don't think AI is being marketed as awesome at everything. It's got obvious flaws. Right now it's not good for stuff like chess, probably not even tic-tac-toe. It's a language model; it's hard for it to calculate the playing field. But AI is in development, and it might not need much to start playing chess.
malwieder@feddit.org 2 weeks ago
Google Maps doesn’t pretend to be good at chess. ChatGPT does.
whaleross@lemmy.world 2 weeks ago
A toddler can pretend to be good at chess but anybody with reasonable expectations knows that they are not.
suburban_hillbilly@lemmy.ml 2 weeks ago
Most people do. It’s just called AI in the media everywhere and marketing works. I think online folks forget that something as simple as getting a Lemmy account by yourself puts you into the top quintile of tech literacy.
Opinionhaver@feddit.uk 2 weeks ago
Yet even on Lemmy people can’t seem to make sense of these terms and are saying things like “LLM’s are not AI”
iAvicenna@lemmy.world 2 weeks ago
Well, so much hype has been generated around ChatGPT being close to AGI that it now makes sense to ask questions like "can ChatGPT prove the Riemann hypothesis?"
red@sopuli.xyz 2 weeks ago
Even the models that pretend to be AGI are not. It’s been proven.
Broken@lemmy.ml 2 weeks ago
I agree with your general statement, but in theory, since all ChatGPT does is regurgitate information back, and a lot of chess is memorization of historical games and types, it might actually perform well. No, it can't think, but it can remember everything, so at some point that might tip the results in its favor.
Eagle0110@lemmy.world 2 weeks ago
Regurgitating an impression of it, not regurgitating it verbatim; that's the problem here.
Chess is 100% deterministic, so it falls flat.
FMT99@lemmy.world 2 weeks ago
I mean it may be possible but the complexity would be so many orders of magnitude greater. It’d be like learning chess by just memorizing all the moves great players made but without any context or understanding of the underlying strategy.
adhdplantdev@lemm.ee 2 weeks ago
Articles like this are good because they expose the flaws of the AI and show that it can't be trusted with complex multi-step tasks.
They help people who think AI is close to human see that it's not, and that it's missing critical functionality.
FMT99@lemmy.world 2 weeks ago
The problem is though that this perpetuates the idea that ChatGPT is actually an AI.
TowardsTheFuture@lemmy.zip 2 weeks ago
I think that's generally the point: most people think ChatGPT is this sentient thing that knows everything, and... no.
PixelatedSaturn@lemmy.world 2 weeks ago
Do they though? No one I've talked to thinks it's sentient: not my coworkers who use it for work, not my friends, not my 72-year-old mother.
merdaverse@lemm.ee 2 weeks ago
OpenAI has been talking about AGI for years, implying that they are getting closer to it with their products.
openai.com/index/planning-for-agi-and-beyond/
openai.com/…/elon-musk-wanted-an-openai-for-profi…
FMT99@lemmy.world 2 weeks ago
Hey I didn’t say anywhere that corporations don’t lie to promote their product did I?
Empricorn@feddit.nl 2 weeks ago
You're not wrong, but keep in mind that ChatGPT advocates, including the company itself, are referring to it as AI, including in marketing. They're saying it's a complete, self-learning, constantly evolving Artificial Intelligence that has been improving itself since release... And it loses to a 4KB video game program from 1979 that can only "think" 2 moves ahead.
FMT99@lemmy.world 2 weeks ago
That’s totally fair, the company is obviously lying, excuse me “marketing”, to promote their product, that’s absolutely true.
x00z@lemmy.world 2 weeks ago
In all fairness, machine learning in chess engines is actually pretty strong.
www.chess.com/terms/alphazero-chess-engine
jeeva@lemmy.world 2 weeks ago
Sure, but machine learning like that is very different to how LLMs are trained and their output.
FMT99@lemmy.world 2 weeks ago
Oh absolutely you can apply machine learning to game strategy. But you can’t expect a generalized chatbot to do well at strategic decision making for a specific game.
saltesc@lemmy.world 2 weeks ago
I like referring to LLMs as VI (Virtual Intelligence, from Mass Effect), since they merely give the impression of intelligence but are little more than search engines. In the end, all one is doing is displaying expected results based on a popularity algorithm. However, they do this inconsistently due to bad data in and limited caching.
FartMaster69@lemmy.dbzer0.com 2 weeks ago
I mean, OpenAI seems to forget it isn't.