ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

Submitted ⁨⁨4⁩ ⁨months⁩ ago⁩ by ⁨Lifecoach5000@lemmy.world⁩ to ⁨technology@lemmy.world⁩

https://www.tomshardware.com/tech-industry/artificial-intelligence/chatgpt-got-absolutely-wrecked-by-atari-2600-in-beginners-chess-match-openais-newest-model-bamboozled-by-1970s-logic

source

Comments

Sort:hotnew top

FMT99@lemmy.world ⁨4⁩ ⁨months⁩ ago
Did the author thinks ChatGPT is in fact an AGI? It’s a chatbot. Why would it be good at chess? It’s like saying an Atari 2600 running a dedicated chess program can beat Google Maps at chess.

source
- spankmonkey@lemmy.world ⁨4⁩ ⁨months⁩ ago
  AI including ChatGPT is being marketed as super awesome at everything, which is why that and similar AI is being forced into absolutely everything and being sold as a replacement for people.
  
  Something marketed as AGI should be treated as AGI when proving it isn’t AGI.
  
  source
  - pelespirit@sh.itjust.works ⁨4⁩ ⁨months⁩ ago
    Not to help the AI companies, but why don’t they program them to look up math programs and outsource chess to other programs when they’re asked for that stuff? It’s obvious they’re shit at it, why do they answer anyway? It’s because they’re programmed by know-it-all programmers, isn’t it.
    
    source
    -> View More Comments
  - PixelatedSaturn@lemmy.world ⁨4⁩ ⁨months⁩ ago
    I don’t think ai is being marketed as awesome at everything. It’s got obvious flaws. Right now its not good for stuff like chess, probably not even tic tac toe. It’s a language model, its hard for it to calculate the playing field. But ai is in development, it might not need much to start playing chess.
    
    source
    -> View More Comments
- malwieder@feddit.org ⁨4⁩ ⁨months⁩ ago
  Google Maps doesn’t pretend to be good at chess. ChatGPT does.
  
  source
  - whaleross@lemmy.world ⁨4⁩ ⁨months⁩ ago
    A toddler can pretend to be good at chess but anybody with reasonable expectations knows that they are not.
    
    source
    -> View More Comments
- suburban_hillbilly@lemmy.ml ⁨4⁩ ⁨months⁩ ago
  Most people do. It’s just called AI in the media everywhere and marketing works. I think online folks forget that something as simple as getting a Lemmy account by yourself puts you into the top quintile of tech literacy.
  
  source
  - Opinionhaver@feddit.uk ⁨4⁩ ⁨months⁩ ago
    Yet even on Lemmy people can’t seem to make sense of these terms and are saying things like “LLM’s are not AI”
    
    source
    -> View More Comments
- iAvicenna@lemmy.world ⁨4⁩ ⁨months⁩ ago
  well so much hype has been generated around chatgpt being close to AGI that now it makes sense to ask questions like “can chatgpt prove the Riemann hypothesis”
  
  source
  - red@sopuli.xyz ⁨4⁩ ⁨months⁩ ago
    Even the models that pretend to be AGI are not. It’s been proven.
    
    source
- Broken@lemmy.ml ⁨4⁩ ⁨months⁩ ago
  I agree with your general statement, but in theory since all ChatGPT does is regurgitate information back and a lot of chess is memorization of historical games and types, it might actually perform well. No, it can’t think, but it can remember everything so at some point that might tip the results in it’s favor.
  
  source
  - Eagle0110@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Regurgitating am impression of, not regurgitating verbatim, that’s the problem here.
    
    Chess is 100% deterministic, so it falls flat.
    
    source
    -> View More Comments
  - FMT99@lemmy.world ⁨4⁩ ⁨months⁩ ago
    I mean it may be possible but the complexity would be so many orders of magnitude greater. It’d be like learning chess by just memorizing all the moves great players made but without any context or understanding of the underlying strategy.
    
    source
- adhdplantdev@lemm.ee ⁨4⁩ ⁨months⁩ ago
  Articles like this are good because it exposes the flaws with the ai and that it can’t be trusted with complex multi step tasks.
  
  Helps people see that think AI is close to a human that its not and its missing critical functionality
  
  source
  - FMT99@lemmy.world ⁨4⁩ ⁨months⁩ ago
    The problem is though that this perpetuates the idea that ChatGPT is actually an AI.
    
    source
    -> View More Comments
- TowardsTheFuture@lemmy.zip ⁨4⁩ ⁨months⁩ ago
  I think that’s generally the point is most people thing chat GPT is this sentient thing that knows everything and… no.
  
  source
  - PixelatedSaturn@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Do they though? No one I talked to, not my coworkers that use it for work, not my friends, not my 72 year old mother think they are sentient.
    
    source
    -> View More Comments
- merdaverse@lemm.ee ⁨4⁩ ⁨months⁩ ago
  OpenAI has been talking about AGI for years, implying that they are getting closer to it with their products.
  
  openai.com/index/planning-for-agi-and-beyond/
  
  openai.com/…/elon-musk-wanted-an-openai-for-profi…
  
  source
  - FMT99@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Hey I didn’t say anywhere that corporations don’t lie to promote their product did I?
    
    source
- Empricorn@feddit.nl ⁨4⁩ ⁨months⁩ ago
  You’re not wrong, but keep in mind ChatGPT advocates, including the company itself are referring to it as AI, including in marketing. They’re saying it’s a complete, self-learning, constantly-evolving Artificial Intelligence that has been improving itself since release… And it loses to a 4KB video game program from 1979 that can only “think” 2 moves ahead.
  
  source
  - FMT99@lemmy.world ⁨4⁩ ⁨months⁩ ago
    That’s totally fair, the company is obviously lying, excuse me “marketing”, to promote their product, that’s absolutely true.
    
    source
- x00z@lemmy.world ⁨4⁩ ⁨months⁩ ago
  In all fairness. Machine learning in chess engines is actually pretty strong.
  
  AlphaZero was developed by the artificial intelligence and research company DeepMind, which was acquired by Google. It is a computer program that reached a virtually unthinkable level of play using only reinforcement learning and self-play in order to train its neural networks. In other words, it was only given the rules of the game and then played against itself many millions of times (44 million games in the first nine hours, according to DeepMind).
  
  www.chess.com/terms/alphazero-chess-engine
  
  source
  - jeeva@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Sure, but machine learning like that is very different to how LLMs are trained and their output.
    
    source
  - FMT99@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Oh absolutely you can apply machine learning to game strategy. But you can’t expect a generalized chatbot to do well at strategic decision making for a specific game.
    
    source
- saltesc@lemmy.world ⁨4⁩ ⁨months⁩ ago
  I like referring to LLMs as VI (Virtual Intelligence from Mass Effect) since they merely give the impression of intelligence but are little more than search engines. In the end all one is doing is displaying expected results based on a popularity algorithm. However they do this inconsistently due to bad data in and limited caching.
  
  source
- FartMaster69@lemmy.dbzer0.com ⁨4⁩ ⁨months⁩ ago
  I mean, open AI seem to forget it isn’t.
  
  source
Objection@lemmy.ml ⁨4⁩ ⁨months⁩ ago
Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.

source
- bier@feddit.nl ⁨4⁩ ⁨months⁩ ago
  Yeah its like judging how great a fish is at climbing a tree. But it does show that it’s not real intelligence or reasoning
  
  source
  - 13igTyme@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Don’t call my fish stupid.
    
    source
    -> View More Comments
- Zenith@lemm.ee ⁨4⁩ ⁨months⁩ ago
  I forgot which airline it is but one of the onboard games in the back of a headrest TV was a game called “Beginners Chess” which was notoriously difficult to beat so it was tested against other chess engines and it ranked in like the top five most powerful chess engines ever
  
  source
- andallthat@lemmy.world ⁨4⁩ ⁨months⁩ ago
  Machine learning has existed for many years, now. The issue is with these funding-hungry new companies taking their LLMs, repackaging them as “AI” and attributing every ML win ever to “AI”.
  
  Yes, ML programs designed and trained specifically to identify tumors in medical imaging have become good diagnostic tools. But if you read in news that “AI helps cure cancer”, it makes it sound like a bunch of researchers just spent a few minutes engineering the right prompt for Copilot.
  
  That’s why, yes a specifically-designed and finely tuned ML program can now beat the best human chess player, but calling it “AI” and bundling it together with the latest Gemini or Claude iteration is intentionally misleading.
  
  source
- wewbull@feddit.uk ⁨4⁩ ⁨months⁩ ago
  It does
  
  source
  - Objection@lemmy.ml ⁨4⁩ ⁨months⁩ ago
    I does not. Where?
    
    source
NeilBru@lemmy.world ⁨4⁩ ⁨months⁩ ago
An LLM is a poor computational paradigm for playing chess.

source
- surph_ninja@lemmy.world ⁨4⁩ ⁨months⁩ ago
  This just in: a hammer makes a poor screwdriver.
  
  source
  - WhyJiffie@sh.itjust.works ⁨4⁩ ⁨months⁩ ago
    LLMs are more like a leaf blower though
    
    source
- Takapapatapaka@lemmy.world ⁨4⁩ ⁨months⁩ ago
  Actually, a very specific model (chatgpt3.5-turbo-instruct) was pretty good at chess (around 1700 elo if i remember correctly).
  
  source
  - NeilBru@lemmy.world ⁨4⁩ ⁨months⁩ ago
    I’m impressed, if that’s true! In general, an LLM’s training cost vs. an LSTM, RNN, or some other more appropriate DNN algorithm suitable for the ruleset is laughably high.
    
    source
    -> View More Comments
- Bleys@lemmy.world ⁨4⁩ ⁨months⁩ ago
  The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.
  
  source
  - NeilBru@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Yes, I agree wholeheartedly with your clarification.
    
    My career path, as I stated in a different comment, In regards to neural networks is focused on generative DNNs for CAD applications and parametric 3D modeling. Before that, I began as a researcher in cancerous tissue classification and object detection in medical diagnostic imaging.
    
    Thus, large language models are well out of my area of expertise in terms of the architecture of their models.
    
    However, fundamentally it boils down to the fact that the specific large language models was designed to predict text and not necessarily solve problems/play games to “win”/“survive”.
    
    I admit that I’m just parroting what you stated and maybe rehashing what I stated even before that, but I like repeating and refining in simple terms to explain to laymen and, dare I say, clients. It helps me make make I don’t come off as too pompous when talking about this subject; forgive my tedium.
    
    source
- sugar_in_your_tea@sh.itjust.works ⁨4⁩ ⁨months⁩ ago
  Yeah, a lot of them hallucinate moves.
  
  source
AlecSadler@sh.itjust.works ⁨4⁩ ⁨months⁩ ago
ChatGPT has been, hands down, the worst AI coding assistant I’ve ever used.

It regularly suggests code that doesn’t compile or isn’t even for the language.

It generally suggests AC of code that is just a copy of the lines I just wrote.

Sometimes it likes to suggest setting the same property like 5 times.

It is absolute garbage and I do not recommend it to anyone.

source
- j4yt33@feddit.org ⁨4⁩ ⁨months⁩ ago
  I find it really hit and miss. Easy, standard operations are fine but if you have an issue with code you wrote and ask it to fix it, you can forget it
  
  source
  - AlecSadler@sh.itjust.works ⁨4⁩ ⁨months⁩ ago
    I’ve found Claude 3.7 and 4.0 and sometimes Gemini variants still leagues better than ChatGPT/Copilot.
    
    Still not perfect, but night and day difference.
    
    I feel like ChatGPT didn’t focus on coding and instead focused on mainstream, but I am not an expert.
    
    source
    -> View More Comments
  - PixelatedSaturn@lemmy.world ⁨4⁩ ⁨months⁩ ago
    I like tab coding, writing small blocks of code that it thinks I need. Its On point almost all the time. This speeds me up.
    
    source
    -> View More Comments
  - Blackmist@feddit.uk ⁨4⁩ ⁨months⁩ ago
    It’s the ideal help for people who shouldn’t be employed as programmers to start with.
    
    I had to explain hexadecimal to somebody the other day. It’s honestly depressing.
    
    source
- Etterra@discuss.online ⁨4⁩ ⁨months⁩ ago
  That’s because it doesn’t know what it’s saying. It’s just blathering out each word as what it estimates to be the likely next word given past examples in its training data. It’s a statistics calculator. It’s marginally better than just smashing the auto fill on your cell repeatedly. It’s literally dumber than a parrot.
  
  source
  - AnUnusualRelic@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Parrots are actually intelligent though.
    
    source
    -> View More Comments
- nutsack@lemmy.dbzer0.com ⁨4⁩ ⁨months⁩ ago
  my favorite thing is to constantly be implementing libraries that don’t exist
  
  source
  - Blackmist@feddit.uk ⁨4⁩ ⁨months⁩ ago
    You’re right. That library was removed in ToolName [PriorVersion]. Please try this instead.
    
    *makes up entirely new fictitious library name*
    
    source
  - jj4211@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Oh man, I feel this. A couple of times I’ve had to field questions about some REST API I support and they ask why they get errors when they supply a specific attribute. Now that attribute never existed, not in our code, not in our documentation, we never thought of it. So I say “Well, that attribute is invalid, I’m not sure where you saw to do that”. They get insistent that the code is generated by a very good LLM, so we must be missing something…
    
    source
  - arc99@lemmy.world ⁨4⁩ ⁨months⁩ ago
    It’s even worse when AI soaks up some project whose APIs are constantly changing. Try using AI to code against jetty for example and you’ll be weeping.
    
    source
- arc99@lemmy.world ⁨4⁩ ⁨months⁩ ago
  All AIs are the same. They’re just scraping content from GitHub, stackoverflow etc with a bunch of guardrails slapped on to spew out sentences that conform to their training data but there is no intelligence. They’re super handy for basic code snippets but anyone using them anything remotely complex or nuanced will regret it.
  
  source
  - NateNate60@lemmy.world ⁨4⁩ ⁨months⁩ ago
    One of my mates generated an entire website using Gemini. It was a React web app that tracks inventory for trading card dealers. It actually did come out functional and well-polished. That being said, the AI really struggled with several aspects of the project that humans would not:
    
    It left database secrets in the code
    
    The design of the website meant that it was impossible to operate securely
    
    The quality of the code itself was hot garbage—unreadable and undocumented nonsense that somehow still worked
    
    It did not break the code into multiple files. It piled everything into a single file
    
    source
  - AlecSadler@sh.itjust.works ⁨4⁩ ⁨months⁩ ago
    I’ve used agents for implementing entire APIs and front-ends from the ground up with my own customizations and nuances.
    
    I will say that, for my pedantic needs, it typically only gets about 80-90% of the way there so I still have to put fingers to code, but it definitely saves a boat load of time in those instances.
    
    source
- ILikeBoobies@lemmy.ca ⁨4⁩ ⁨months⁩ ago
  I’ve had success with splitting a function into 2 and planning out an overview, though that’s more like talking to myself
  
  I wouldn’t use it to generate stuff though
  
  source
- Mobiuthuselah@lemm.ee ⁨4⁩ ⁨months⁩ ago
  I don’t use it for coding. I use it sparingly really, but want to learn to use it more efficiently. Are there any areas in which you think it excels? Are there others that you’d recommend instead?
  
  source
  - uairhahs@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Use Gemini (2.5) or Claude (3.7 and up). OpenAI is a shitshow
    
    source
nednobbins@lemm.ee ⁨4⁩ ⁨months⁩ ago
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.

Human journalists would hopefully do a little research. A quick search would reveal that researches have been publishing about this for over a year so there’s no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.

LLMs on the other hand, are very good at producing clickbait articles with low information content.

source
- nova_ad_vitum@lemmy.ca ⁨4⁩ ⁨months⁩ ago
  Gotham chess has a video of making chatgpt play chess against stockfish. Spoiler: chatgpt does not do well. It plays okay for a few moves but then the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn’t help.
  
  This sort of gets to the heart of LLM-based “AI”. That one example to me really shows that there’s no actual reasoning happening inside. It’s producing answers that statistically look like answers that might be given based on that input.
  
  For some things it even works. But calling this intelligence is dubious at best.
  
  source
  - Ultraviolet@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Because it doesn’t have any understanding of the rules of chess or even an internal model of the game state? it just has the text of chess games in its training data and can reproduce the notation, but nothing to prevent it from making illegal or nonsensical moves.
    
    source
  - Noodle07@lemmy.world ⁨4⁩ ⁨months⁩ ago
    Hallucinating 100% of the time 👌
    
    source
  - JacksonLamb@lemmy.world ⁨4⁩ ⁨months⁩ ago
    ChatGPT versus Deepseek is hilarious. They both cheat like crazy and then one side jedi mind tricks the winner into losing.
    
    source
    -> View More Comments
- LovableSidekick@lemmy.world ⁨4⁩ ⁨months⁩ ago
  In this case it’s not even bad prompts, it’s a problem domain ChatGPT was never built to be good at. Like saying modern medicine is clearly bullshit because a doctor sucked at basketball.
  
  source
Halosheep@lemm.ee ⁨4⁩ ⁨months⁩ ago
I swear every single article critical of current LLMs is like, “The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole.”

source
- drspod@lemmy.ml ⁨4⁩ ⁨months⁩ ago
  It’s newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
  
  source
- ipkpjersi@lemmy.ml ⁨4⁩ ⁨months⁩ ago
  That’s just clickbait in general these days lol
  
  source
- lambalicious@lemmy.sdf.org ⁨4⁩ ⁨months⁩ ago
  Well, the first and obvious thing to do to show that AI is bad is to show that AI is bad. If it provides that much of a low-hanging fruit for the demonstration… that just further emphasizes the point.
  
  source
floofloof@lemmy.ca ⁨4⁩ ⁨months⁩ ago
ChatGPT the word prediction machine? Why would anyone expect it to be good at chess?

source
MonkderVierte@lemmy.zip ⁨4⁩ ⁨months⁩ ago
LLM are not built for logic.

source
anubis119@lemmy.world ⁨4⁩ ⁨months⁩ ago
A strange game. How about a nice game of Global Thermonuclear War?

source
Lembot_0003@lemmy.zip ⁨4⁩ ⁨months⁩ ago
The Atari chess program can play chess better than the Boeing 747 too. And better than the North Pole. Amazing!

source
Furbag@lemmy.world ⁨4⁩ ⁨months⁩ ago
Can ChatGPT actually play chess now? Last I checked, it couldn’t remember more than 5 moves of history so it wouldn’t be able to see the true board state and would make illegal moves, take it’s own pieces, materialize pieces out of thin air, etc.

source
cley_faye@lemmy.world ⁨4⁩ ⁨months⁩ ago
Ah, you used logic. That’s the issue. They don’t do that.

source
arc99@lemmy.world ⁨4⁩ ⁨months⁩ ago
Hardly surprising. Llms aren’t -thinking- they’re just shitting out the next token for any given input of tokens.

source
capuccino@lemmy.world ⁨4⁩ ⁨months⁩ ago
This made my day

source
vane@lemmy.world ⁨4⁩ ⁨months⁩ ago
It’s not that hard to beat dumb 6 year old who’s only purpose is mine your privacy to sell you ads or product place some shit for you in future.

source
finitebanjo@lemmy.world ⁨4⁩ ⁨months⁩ ago
All these comments asking “why don’t they just have chatgpt go and look up the correct answer”.

That’s not how it works, you buffoons, it trains off of datasets long before it releases. It doesn’t think. It doesn’t learn after release, it won’t remember things you try to teach it.

Really lowering my faith in humanity when even the AI skeptics don’t understand that it generates statistical representations of an answer based on answers given in the past.

source
jsomae@lemmy.ml ⁨4⁩ ⁨months⁩ ago
Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it’s obviously not going to be good at it, at least not without scaffolding.

source
seven_phone@lemmy.world ⁨4⁩ ⁨months⁩ ago
You say you produce good oranges but my machine for testing apples gave your oranges a very low score.

source
Nurse_Robot@lemmy.world ⁨4⁩ ⁨months⁩ ago
I’m often impressed at how good chatGPT is at generating text, but I’ll admit it’s hilariously terrible at chess. It loves to manifest pieces out of thin air, or make absurd illegal moves, like jumping its king halfway across the board and claiming checkmate

source
Sidhean@lemmy.blahaj.zone ⁨4⁩ ⁨months⁩ ago
Can i fistfight ChatGPT next? I bet I could kick its ass, too :p

source
Kolanaki@pawb.social ⁨4⁩ ⁨months⁩ ago
There was a chess game for the Atari 2600? :O

I wanna see them W I D E pieces.

source
stevedice@sh.itjust.works ⁨4⁩ ⁨months⁩ ago
2025 Mazda MX-5 Miata ‘got absolutely wrecked’ by Inflatable Boat in beginner’s boat racing match — Mazda’s newest model bamboozled by 1930s technology.

source
muntedcrocodile@lemm.ee ⁨4⁩ ⁨months⁩ ago
This isn’t the strength of gpt-o4 the model has been optimised for tool use as an agent. That’s why its so good at image gen relative to their models it uses tools to construct an image piece by piece similar to a human. Also probably poor system prompting. A LLM is not a universal thinking machine its a a universal process machine. An LLM understands the process and uses tools to accomplish the process hence its strengths in writing code (especially as an agent).

Its similar to how a monkey is infinitely better at remembering a sequence of numbers than a human ever could but is totally incapable of even comprehending writing down numbers.

source
NotMyOldRedditName@lemmy.world ⁨4⁩ ⁨months⁩ ago
Okay, but could ChatGPT be used to vibe code a chess program that beats the Atari 2600?

source
FourWaveforms@lemm.ee ⁨4⁩ ⁨months⁩ ago
If you don’t play chess, the Atari is probably going to beat you as well.

LLMs are only good at things to the extent that they have been well-trained in the relevant areas. Not just learning to predict text string sequences, but reinforcement learning after that, where a human or some other agent says “this answer is better than that one” enough times in enough of the right contexts. It mimics the way humans learn, which is through repeated and diverse exposure.

If they set up a system to train it against some chess program, or (much simpler) simply gave it a tool call, it would do much better. Tool calling already exists and would be by far the easiest way.

It could also be instructed to write a chess solver program and then run it, at which point it would be on par with the Atari, but it wouldn’t compete well with a serious chess solver.

source
IsaamoonKHGDT_6143@lemmy.zip ⁨4⁩ ⁨months⁩ ago
They used ChatGPT 4o, instead of using o1 or o3.

Obviously it was going to fail.

source
krigo666@lemmy.world ⁨4⁩ ⁨months⁩ ago
Next, pit ChatGPT against 1K Chess in a ZX81.

source
ICastFist@programming.dev ⁨4⁩ ⁨months⁩ ago
So, it fares as well as the average schmuck, proving it is human

/s

source
Harbinger01173430@lemmy.world ⁨4⁩ ⁨months⁩ ago
Llms useless confirmed once again

source

-> View More Comments