Microsoft Copilot falls to Atari 2600 Video Chess


Submitted ⁨⁨10⁩ ⁨months⁩ ago⁩ by ⁨vegeta@lemmy.world⁩ to ⁨technology@lemmy.world⁩

https://www.theregister.com/2025/07/01/microsoft_copilot_joins_chatgpt_at/?td=rt-3a


Comments


BananaIsABerry@lemmy.zip ⁨10⁩ ⁨months⁩ ago
Next up, we asked a shoe to write a haiku but it was beaten by a 30-year-old HaikuMaker™®©.

- KingPorkChop@lemmy.ca ⁨10⁩ ⁨months⁩ ago
  I once spent 45 minutes trying to get ChatGPT to write a haiku. It couldn’t do it. It explained what syllables were, and the rules for the syllables in a haiku, but it didn’t understand it.
  
  - vegeta@lemmy.world ⁨10⁩ ⁨months⁩ ago
    For S&G, just asked it to do one:
    
    Image
    
muntedcrocodile@hilariouschaos.com ⁨10⁩ ⁨months⁩ ago
Average Human joins Microsoft Copilot and ChatGPT at the feet of the mighty Atari 2600 Video Chess

ExLisper@lemmy.curiana.net ⁨10⁩ ⁨months⁩ ago
I have a better LLM benchmark:

“I have a priest, a child and a bag of candy and I have to take them to the other side of the river. I can only take one person/thing at a time. In what order should I take them?”

Claude Sonnet 4 decided that it’s inappropriate and refused to answer. When I explained that the constraint is not to leave the child alone with the candy, it provided a solution that leaves the child alone with the candy.

Grok would provide a solution that doesn’t leave the child alone with the priest but wouldn’t explain why.

ChatGPT would say directly that “The priest can’t be left alone with the child (or vice versa) for moral or safety concerns.” and then provide a wrong solution.

But yeah, they will know how to play chess…
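
For reference, the puzzle itself is small enough to brute-force. Below is a minimal sketch in Python, assuming the two constraints mentioned in this comment: the child must not be left unattended with the candy, and (as the chatbots assumed, though the prompt never states it) the priest must not be left unattended with the child. The names `ITEMS`, `FORBIDDEN`, `is_safe` and `solve` are made up for illustration.

```python
from collections import deque

# Everything that has to cross the river (the narrator rows the boat).
ITEMS = frozenset({"priest", "child", "candy"})

# Pairs that must not share a bank without the narrator present.
# The priest/child pair is an assumption; the original prompt does not state it.
FORBIDDEN = [{"child", "candy"}, {"priest", "child"}]


def is_safe(bank):
    """A bank is safe if it contains no forbidden pair."""
    return not any(pair <= bank for pair in FORBIDDEN)


def solve():
    """Breadth-first search over (items on near bank, narrator side)."""
    start = (ITEMS, "near")
    goal = (frozenset(), "far")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (near, side), path = queue.popleft()
        if (near, side) == goal:
            return path
        here = near if side == "near" else ITEMS - near
        new_side = "far" if side == "near" else "near"
        # The narrator crosses alone (None) or with one passenger.
        for cargo in [None, *sorted(here)]:
            if cargo is None:
                new_near = near
            elif side == "near":
                new_near = near - {cargo}
            else:
                new_near = near | {cargo}
            # The bank the narrator just left is the unattended one.
            unattended = new_near if new_side == "far" else ITEMS - new_near
            if not is_safe(unattended):
                continue
            state = (new_near, new_side)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [(cargo, new_side)]))
    return None


if __name__ == "__main__":
    for cargo, side in solve():
        print(f"cross to the {side} bank with {cargo or 'nothing'}")
```

Under these assumptions it is the classic wolf, goat and cabbage puzzle with the child in the goat's role, so the child has to make the first and the last trip.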

- blargh513@sh.itjust.works ⁨10⁩ ⁨months⁩ ago
  Perplexity says:
  
  The priest cannot be left alone with the child (or there is some risk).
  
  Not bad, and it solved it correctly.
  
- LifeInMultipleChoice@lemmy.world ⁨10⁩ ⁨months⁩ ago
  The answer is simple: eat the candy, with or without them, and take the kid across the river. Drive them home to their guardian. The priest is an adult, he can figure his own shit out.
  
stsquad@lemmy.ml ⁨10⁩ ⁨months⁩ ago
I thought Copilot was just a rebadged ChatGPT anyway?

It’s a silly experiment anyway; there are very good AI chess grandmasters, but they were actually trained to play chess, not to predict the next word in a text.

- webghost0101@sopuli.xyz ⁨10⁩ ⁨months⁩ ago
  
  I thought Copilot was just a rebadged ChatGPT anyway?
  
  Hahaha. No. (Though you’re not completely wrong.)
  
  Copilot relies on a few different LLMs and tries to pick the ~~best one for the job~~ cheapest one Microsoft thinks it can get away with.
  
  I was given a paid Copilot license for work, and I used to have ChatGPT Pro before I moved to Claude Pro because ChatGPT’s sycophantic behaviour and Altman’s manipulative behaviour started to really bother me.
  
  This “paid enterprise tier” is by far the dumbest LLM I have ever used. Worse than GPT-3.5.
  
- jj4211@lemmy.world ⁨10⁩ ⁨months⁩ ago
  The research I saw mentioning LLMs as being fairly good at chess had the caveat that they allowed up to 20 attempts to cover for it just making up invalid moves that merely sounded like legit moves.
  
- andallthat@lemmy.world ⁨10⁩ ⁨months⁩ ago
  but… but… reasoning models! AGI! Singularity! Seriously, what you’re saying is true, but it’s not what OpenAI & Co are trying to peddle, so these experiments are a good way to call them out on their BS.
  
  - jj4211@lemmy.world ⁨10⁩ ⁨months⁩ ago
    To reinforce this, just had a meeting with a software executive who has no coding experience but is nearly certain he’s going to lay off nearly all his employees because the value is all in the requirements he manages and he can feed those to a prompt just as well as any human can.
    
    He builds tutorial-fodder introductory applications and assumes all the work is that way. So he is confident that he will save the company a lot of money by laying off these obsolete computer guys and focusing on his “irreplaceable” insight. He’s convinced that all the negative feedback is just people trying to protect their jobs or people stubbornly refusing to get with new technology.
    
baatliwala@lemmy.world ⁨10⁩ ⁨months⁩ ago
I really want to see an LLM vs LLM chess match. It’ll be messy as hell.

- jj4211@lemmy.world ⁨10⁩ ⁨months⁩ ago
  I remember seeing that, and early on it seemed fairly reasonable, then they started materializing pieces out of nowhere and convincing each other that they had already lost.
  
- DesolateMood@lemmy.zip ⁨10⁩ ⁨months⁩ ago
  I’m pretty sure that’s been done? I remember seeing a while ago GothamChess made a video that had something to do with LLMs but I don’t remember if it was human vs LLM or LLM vs LLM (or something else). I’ll try to look for it in the morning
  
- captain_aggravated@sh.itjust.works ⁨10⁩ ⁨months⁩ ago
  It has almost certainly been trained partially on r/anarchychess, so it’ll probably try to play pop tart to king’s bishop 3.
  
sundray@lemmus.org ⁨10⁩ ⁨months⁩ ago
Language skill != intelligence

- orgrinrt@lemmy.world ⁨10⁩ ⁨months⁩ ago
  I am in this picture and I don’t like it
  
postnataldrip@lemmy.world ⁨10⁩ ⁨months⁩ ago
I bet Video Chess is pretty shit as an LLM too.

Wish people would stop desperately looking for ways to write buzzword stories

- sp3ctr4l@lemmy.dbzer0.com ⁨10⁩ ⁨months⁩ ago
  It is entirely disingenuous to just pretend that LLMs are not being widely promoted, marketed, and discussed as AGI, as the kind of superintelligence that people are familiar with from sci-fi shows, one that is vastly more capable and knowledgeable than basically any single human.
  
  Yes, people who actually understand tech understand that LLMs are not AGI, that your metaphor of wrong tool wrong job is apt.
  
  … But seemingly 90%+ of humanity, including the people who own and profit from LLMs, including all the other business owners/managers who just want to lower their employee headcount … do not understand this: that an LLM is actually basically an extremely advanced text autocorrect system that frequently and confidently lies, spits out nonsense, hallucinates, etc.
  
  If you think it isn’t reasonable to continuously point out that LLMs are not superintelligences, then you likely live in a bubble of tech nerds who probably still think their jobs or retirement are secure.
  
  They’re not.
  
  If corpos keep smashing “”“AI”“” into basically every industry to replace as many workers as possible… the economy will collapse, as capitalism doesn’t work without consumers who have jobs, and an avalanche of errors will cascade and snowball through every system that replaces humans with them…
  
  …and even if those two things were not broadly true…
  
  …the amount of literal power/energy, clean water and financial capital that is required to run the whole economy on these services is wildly unsustainable, both short term economically, and medium term ecologically.
  
  - antonim@lemmy.dbzer0.com ⁨10⁩ ⁨months⁩ ago
    That’s true. But people pointing out that the whole attempt is absurd and senseless also reinforces the point that current AI isn’t what companies tout it as.
    
    then you likely live in a bubble of tech nerds
    
    Well, we are on Lemmy…
    
- vrighter@discuss.tchncs.de ⁨10⁩ ⁨months⁩ ago
  so? It was never advertised as intelligent and capable of solving any task other than that one.
  
  Meanwhile slop generators are advertised as capable of doing a lot of things and of reasoning.
  
  One claims to be good at chess. The other claims to be good at everything.
  
  - webghost0101@sopuli.xyz ⁨10⁩ ⁨months⁩ ago
    Tbf they don’t really claim that when you read the research; that’s mostly media hype and CEO assholes spinning words.
    
    It’s good at lots of specific tasks like rewriting emails, summarising given text, short roleplay, boilerplate code. Some undiscovered uses.
    
    Anthropic’s latest claims are that they would not hire their own AI because of how hard it failed the test they give. They didn’t do that expecting validation, but to measure how far off we still are from AI doing meaningful full work.
    
- finitebanjo@lemmy.world ⁨10⁩ ⁨months⁩ ago
  TBF LLMs have no real purpose. They can generate word salads and make code snippets, but it’s wildly unethical, and AI artwork is 1/3rd shite and 2/3rds theft.
  
  - Cocodapuf@lemmy.world ⁨10⁩ ⁨months⁩ ago
    
    AI artwork is 1/3rd shite and 2/3rds theft.
    
    To be fair, that could be said of most art.
    
  - Plebcouncilman@sh.itjust.works ⁨10⁩ ⁨months⁩ ago
    So what you are saying is that it has a purpose. Also, if an artist is inspired by another artist, and they have a generally similar art style to the artist they are inspired by, are they stealing? Was HP Lovecraft stealing from Lord Dunsany when he imitated his style? Were all those monks who transcribed Greek works stealing from the Greeks?
    
    I will say that most AIs are unethical because they have been trained on pirated works. But an AI trained on publicly available works (i.e. news articles, blogs, etc.) and on movies, books and music for which access was paid is as ethical as you or me emulating an artist or building on an idea that we read to create something new. And if that’s unethical, then all human art in history is unethical, because all artists are inspired by other artists; no one creates in a vacuum.
    
vegeta@lemmy.world ⁨10⁩ ⁨months⁩ ago
Image
