ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic (www.tomshardware.com)

submitted 7 hours ago by [email protected] to c/[email protected]

[-] [email protected] 9 points 2 hours ago

A simple calculator will also beat it at math.

[-] [email protected] 26 points 5 hours ago

In a quite unexpected turn of events, it is claimed that OpenAI’s ChatGPT “got absolutely wrecked on the beginner level” while playing Atari Chess.

Who the hell thought this was "unexpected"?

What's next? ChatGPT vs. Microwave to see which can make instant oatmeal the fastest? 😂

[-] [email protected] 16 points 3 hours ago

Considering how much heat the servers probably generate, ChatGPT might have a decent chance in that competition 😁

[-] [email protected] 4 points 2 hours ago

Air-fried oatmeal, FTW!

[-] [email protected] 73 points 6 hours ago

Anyone who believes a generic word auto-completer would beat classic algorithms at what they were built for probably belongs in a psychiatric ward.

[-] [email protected] 26 points 5 hours ago

There are a lot of people out there who think LLMs are somehow reasoning. Even reasoning models aren't really doing it. It's important to do demonstrations like this in the hope that the general public will understand the limitations of this tech.

[-] [email protected] 0 points 5 hours ago

But the general public (myself included) doesn’t really understand how our own reasoning happens.

Does anyone, really? i.e., am I merely a meat computer that takes in massive amounts of input over a lifetime, builds internal models of the world, tests said models through trial-and-error, and outputs novel combinations of data when said combinations are useful for me in a given context in said world?

Is what I do when I “reason” really all that different from what an LLM does, fundamentally? Do I do more than language prediction when I “think”? And if so, what is it?

[-] [email protected] 7 points 5 hours ago

This is definitely part of the issue, not sure why people are downvoting this. That's also why tests like this are important, to illustrate that thinking in the way we know it isn't happening in these models.

[-] [email protected] 19 points 6 hours ago

I think I remember some doge goon asking online about using an LLM to parse JSON. Many people don't understand things.

[-] [email protected] 9 points 5 hours ago

Jesus Christ software’s about to get far, far worse innit?

[-] [email protected] 4 points 5 hours ago

For us? Not as much, luckily most have the sentiment of rejecting anything LLM made and supported. But externals still have a lot of impact unfortunately, just ask @[email protected]

[-] [email protected] 19 points 6 hours ago

That’s too much critical thinking for most people

[-] [email protected] 28 points 6 hours ago

Atari game programmed to know chess moves: knight to B4

Chat-GPT: many Redditors have credited Chesster A. Pawnington with inventing the game when he chased the queen across the palace before crushing the king with a castle tower. Then he became the king and created his own queen by playing "The Twist" and "Let's Twist Again" at the same time.

[-] [email protected] 15 points 6 hours ago* (last edited 5 hours ago)

This article buries the lede so much that many readers probably miss it completely: the important takeaway here, which is clearer in The Register's version of the story, is that ChatGPT cannot actually play chess:

“Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were."

To actually use an LLM as a chess engine without the kind of manual intervention that this person did, you would need to combine it with some other software to automate continuing to ask it for a different next move every time it suggests an invalid one. And, if you did that, it would still mostly lose, even to much older chess engines than Atari's Video Chess.

edit: i see now that numerous people have done this; you can find many websites where you can "play chess against chatgpt" (which actually means: with chatgpt and also some other mechanism to enforce the rules). and if you know how to play chess you should easily win :)
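
A minimal sketch of the "other mechanism to enforce the rules" described above, assuming nothing about any real chess library or ChatGPT API: `suggest_move` is a hypothetical stand-in for whatever asks the model for a move, and the legal moves are simply passed in as a set of strings. The wrapper keeps asking until the suggestion is legal or it runs out of attempts.

```python
from typing import Callable, Iterable, List, Optional


def ask_until_legal(
    suggest_move: Callable[[List[str]], str],
    legal_moves: Iterable[str],
    max_attempts: int = 5,
) -> Optional[str]:
    """Re-ask for a move until it is legal; return None if every attempt fails."""
    legal = set(legal_moves)
    rejected: List[str] = []  # illegal suggestions so far, handed back to the model
    for _ in range(max_attempts):
        move = suggest_move(rejected)
        if move in legal:
            return move
        rejected.append(move)
    return None


def scripted_model(replies: List[str]) -> Callable[[List[str]], str]:
    """Hypothetical stand-in for an LLM: replays a fixed list of replies in order."""
    remaining = iter(replies)

    def suggest(rejected: List[str]) -> str:
        return next(remaining)

    return suggest


# The "model" proposes two illegal moves before landing on a legal one.
legal = {"e4", "d4", "Nf3"}
chosen = ask_until_legal(scripted_model(["Qh5", "Ke2", "Nf3"]), legal)
```

The wrapper only guarantees that a move is legal, not that it is any good, which is the point made above: the rules come from the extra software, not from the model.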

[-] [email protected] 11 points 5 hours ago

You probably could train an AI to play chess and win, but it wouldn't be an LLM.

In fact, let's go see...

Stockfish: Open-source and regularly ranks at the top of computer chess tournaments. It uses advanced alpha-beta search and a neural network evaluation (NNUE).
Leela Chess Zero (Lc0): Inspired by DeepMind’s AlphaZero, it uses deep reinforcement learning and plays via a neural network with Monte Carlo tree search.
AlphaZero: Developed by DeepMind, it reached superhuman levels using reinforcement learning and defeated Stockfish in high-profile matches (though not under perfectly fair conditions).

Hmm. neural networks and reinforcement learning. So non-LLM AI.
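
For anyone wondering what the "alpha-beta search" mentioned for Stockfish looks like, here is a bare-bones sketch on a toy game tree. It is an illustration of the textbook technique only, not Stockfish's actual code: the tree is a hypothetical nested list in which numbers are leaf scores, and there is no chess, no move ordering, and no neural network evaluation.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax value of a toy tree, skipping branches that cannot change the result."""
    if isinstance(node, (int, float)):
        return node  # leaf: a static evaluation score
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing side would never allow this line
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing side already has a better option elsewhere
    return best


# Root is the maximizing player; each inner list is a reply by the minimizer.
toy_tree = [[3, 5], [2, 9], [0, 1]]
value = alphabeta(toy_tree, True)
```

On this tree the leaves 9 and 1 are never visited: once a branch is known to be worse than one already found, the rest of it is pruned.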

you can play chess against something based on chatgpt, and if you're any good at chess you can win

You don't even have to be good. You can just flat out lie to ChatGPT because fiction and fact are intertwined in language.

"You can't put me in check because your queen can only move 1d6 squares in a single turn."

[-] [email protected] 19 points 6 hours ago

Isn’t this kind of like ridiculing that same Atari for not being able to form coherent sentences? It’s not all that surprising that a system not designed to play chess loses to a system designed specifically for that purpose.

[-] [email protected] 14 points 5 hours ago

Pretty much, but the marketers are still trying to tell people it can totally do logic anyway. Hopefully the Apple paper opens some eyes.

[-] [email protected] 4 points 2 hours ago

For anyone wondering what "the" apple paper is: https://machinelearning.apple.com/research/illusion-of-thinking

[-] [email protected] 25 points 6 hours ago

This article makes ChatGPT sound like a deranged blowhard, blaming everything but its own ineptitude for its failure.

So yeah, that tracks.

[-] [email protected] 13 points 6 hours ago

A PE teacher got absolutely wrecked by a former Olympic sprinter at a sprint competition.

[-] [email protected] 21 points 6 hours ago

Change "PE teacher" to "stack of health magazines" and it's a more accurate equivalence.

[-] [email protected] 7 points 6 hours ago

Well... yeah. That's not what LLMs do. That's like saying "A leafblower got absolutely wrecked by 1998 Dodge Viper in beginner's drag race". It's only impressive if you don't understand what a leafblower is.

[-] [email protected] 5 points 6 hours ago* (last edited 6 hours ago)

People write code with LLMs. A programming language is just a language specialised for precise logic. That’s what „AI” is advertised to be good at. How can it do that and not the other?

[-] [email protected] 2 points 5 hours ago* (last edited 33 minutes ago)

"Precise logic" is specifically what AI is not any good at whatsoever.

AI might be able to write a program that beats an A2600 in chess, but it should not be expected to win at chess itself.

[-] [email protected] 3 points 2 hours ago* (last edited 2 hours ago)

I shall await the moment when AI pretends to be as confident about communicating not being able to do something as it is with the opposite because it looks like it’s my job somehow.

[-] [email protected] 1 points 2 hours ago

Yeah, LLMs seem pretty unlikely to do that, though if they figure it out that would be great. That's just not their wheelhouse. You have to know enough about what you're attempting to ask the right questions and recognize bad answers. The thing you're trying to do needs to be within your reach without AI or you are unlikely to be successful.

I think the problem is more the over-promising what AI can do (or people who don't understand it at all making assumptions because it sounds human-like).

[-] [email protected] 3 points 5 hours ago* (last edited 5 hours ago)

machine designed to play chess beats machine not designed to play chess at chess!

Fascinating news!

Consider me successfully ragebaited into engaging. Why people are upvoting this drivel is beyond me.

this post was submitted on 09 Jun 2025

99 points (95.4% liked)
