Technology

Stanford Scientists Find That Yes, ChatGPT Is Getting Stupider (futurism.com)

submitted 1 year ago by [email protected] to c/[email protected]

While looking into artificial intelligence "behavior," researchers affirmed that yes, OpenAI's GPT-4 appeared to be getting dumber.

[–] [email protected] 11 points 1 year ago (3 children)

No they didn't. The study is misleading in a number of ways.

The first version thought every number was prime; the second version thought none of them were.
Their complaints about the code generation are about formatting that gets filtered out by the chat window. GPT actually got better at outputting runnable code over time.

[–] [email protected] 7 points 1 year ago (1 children)

That explanation of the prime number thing doesn't seem to actually match what's in the paper. GPT4 goes from a wordy explanation of how it arrived at the correct answer, "yes", to a single-word incorrect "no". GPT3.5 goes from a wordy explanation that has the right chain of thought but the wrong answer, "no", to a very wordy explanation with the correct answer, "yes". Neither of those seems to be predicated on either of the models just answering one way for everything.

[–] [email protected] 4 points 1 year ago (1 children)

@rastilin is making some unproven assumptions here. But it is true that the "math question" dataset consists only of prime numbers, so if the first version thought every number was prime and the second thought no numbers were prime, we would see this exact behavior. Source:

"For this dataset, we query the primality of 500 randomly chosen primes between 1,000 and 20,000; the correct answer is always Yes."

From Zhang et al. (2023), the paper they took the dataset from.

[–] [email protected] 2 points 1 year ago (1 children)

Surely, with the data they have, it can't just be a case of the LM saying a hard yes or no to any question of "is this prime", though? The results are a significant majority one way or the other in each case, but never 100%. Of the 500 queries each time, GPT3.5 has 37 answers go against the majority in March and 66 in July. That doesn't look like one fixed answer to every primality query to me, though that comes with the caveat that I'm by no means well studied on the topic.
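For a sense of scale, here is a quick sketch of what those counts come to as a share of the 500 queries. The counts are the ones quoted in this comment, not re-derived from the paper, and the names (`minority_counts`, `minority_share`) are made up for illustration.

```python
# Counts of GPT3.5 answers that went against the majority, as quoted above.
TOTAL_QUERIES = 500
minority_counts = {"March": 37, "July": 66}


def minority_share(count: int, total: int = TOTAL_QUERIES) -> float:
    """Return the fraction of answers that disagreed with the majority."""
    return count / total


shares = {month: minority_share(n) for month, n in minority_counts.items()}
for month, share in shares.items():
    print(f"{month}: {share:.1%} of answers went against the majority")
```

That works out to roughly 7.4% in March and 13.2% in July, so a clear majority each time but not a uniform answer.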

[–] [email protected] 2 points 1 year ago (1 children)

True, GPT does not return a "yes" or "no" 100% of the time in either case, but that's not the point. The point is that it's impossible to say if GPT has actually gotten better or worse at predicting prime numbers with their test set. Since the test set is composed of only prime numbers, we do not know if GPT is more likely to call a number "prime" when it actually is a prime number than when it isn't. All we know is that it was very likely to answer "yes" to the question "is this number prime?" in March, and very likely to answer "no" in July. We do not know if the number makes a difference.
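To make that concrete, here is a minimal sketch of the kind of test set that could answer the question, assuming you sample both primes and composites from the same 1,000 to 20,000 range. Everything here is hypothetical: `build_balanced_set`, `yes_rates`, and the `always_yes` stand-in are illustrative names, not anything from the paper, and a real run would replace `always_yes` with actual model answers.

```python
import random


def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for numbers this small."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def build_balanced_set(lo=1_000, hi=20_000, per_class=250, seed=0):
    """Sample equal numbers of primes and composites from [lo, hi]."""
    rng = random.Random(seed)
    numbers = range(lo, hi + 1)
    primes = [n for n in numbers if is_prime(n)]
    composites = [n for n in numbers if not is_prime(n)]
    return rng.sample(primes, per_class), rng.sample(composites, per_class)


def yes_rates(answer_fn, primes, composites):
    """Fraction of 'yes' answers on primes and on composites, separately."""
    on_primes = sum(answer_fn(n) for n in primes) / len(primes)
    on_composites = sum(answer_fn(n) for n in composites) / len(composites)
    return on_primes, on_composites


primes, composites = build_balanced_set()

# Hypothetical stand-in for a model that says "yes" regardless of the number.
always_yes = lambda n: True
print(yes_rates(always_yes, primes, composites))  # (1.0, 1.0)
```

If the "yes" rate is about the same on primes and composites, the number isn't making a difference. Only a gap between the two rates would suggest the model is actually telling primes apart.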

[–] [email protected] 2 points 1 year ago

Ahh, I see what you're getting at. Thanks for clarifying

[–] [email protected] 3 points 1 year ago

Damn, you're right. The study has not been peer reviewed yet according to the article, and in my opinion, it really shows. For anyone who doesn't want to actually read the study:

They took the set of questions from a different study (which is fine). The original study had a set of 500 randomly chosen prime numbers and asked ChatGPT if they were prime, and to support its reasoning. They did this to see if in the cases where ChatGPT got the question wrong, ChatGPT would try to support its wrong answer with more faulty reasoning - a dataset with only prime numbers is perfectly fine for this initial question.

The study in the article appears to be trying to answer two questions - is there significant drift in the answers ChatGPT gives, and is ChatGPT getting better or worse at answering questions. The dataset is perfectly fine for answering the first question, but completely inadequate for answering the second, since an AI that simply thinks all numbers are prime would be judged as having perfect accuracy! Some good peer review would never let that kind of thing slide.
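A tiny sketch of why that matters, assuming plain accuracy as the metric. The "models" here are just constant-answer baselines I made up for illustration; the only thing taken from the study is that all 500 correct answers are "Yes".

```python
def accuracy(predictions, labels):
    """Share of predictions that match the correct labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Primes-only test set: every correct label is True ("Yes").
primes_only_labels = [True] * 500

# Constant-answer baselines that never look at the number.
always_yes = [True] * 500
always_no = [False] * 500

print(accuracy(always_yes, primes_only_labels))  # 1.0
print(accuracy(always_no, primes_only_labels))   # 0.0

# Hypothetical mixed set: half primes, half composites.
mixed_labels = [True] * 250 + [False] * 250
print(accuracy(always_yes, mixed_labels))  # 0.5
print(accuracy(always_no, mixed_labels))   # 0.5
```

On the primes-only set the two baselines land at opposite extremes even though neither one knows anything about primes. On a mixed set they both drop to coin-flip accuracy, which is what you'd want the benchmark to show.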

[–] [email protected] 3 points 1 year ago

Holy shit you weren't kidding. The Markdown backticks being "not directly executable" is perhaps the dumbest take I've ever heard on ChatGPT, and that's saying a lot. Wow