The Lemmy Club

Welcome to The Lemmy Club!

Instance Rules:

  1. Don't be a dick.
  2. Racism, slurs, etc. will not be tolerated.
  3. No spamming.
  4. Don't harass other users (see rule 1).
  5. NSFW content must be marked correctly.
  6. All content must comply with US law.
  7. Loli/etc. will not be tolerated. Suggestive or sexual art must be reasonably recognizable as adult subjects.
  8. These rules apply to all content and users that appear on The Lemmy Club. Moderation is on an as-noticed/as-reported basis. If you see rule-breaking content, I likely have just not seen it yet. Please report it.
  9. Instances/users/communities that tolerate, repeatedly fail to enforce, or allow content that breaks any of these rules may be banned from The Lemmy Club.
  10. The site admin team (well, just @bdonvr really as of now) has final say in interpretations of all rules.

Help contribute towards our operating costs to keep us going and growing: https://opencollective.com/thelemmyclub/

We host MLMYM (a clone of old.reddit) at https://old.thelemmy.club

We host Voyager (a mobile optimized webapp) at https://app.thelemmy.club

High-profile A.I. chatbot ChatGPT performed worse on certain tasks in June than it did in March, a Stanford University study found.

The study compared the performance of the chatbot, created by OpenAI, over several months at four “diverse” tasks: solving math problems, answering sensitive questions, generating software code, and visual reasoning.

Researchers found wild fluctuations, called drift, in the technology’s ability to perform certain tasks. The study looked at two versions of OpenAI’s technology over the time period: a version called GPT-3.5 and another known as GPT-4. The most notable results came from research into GPT-4’s ability to solve math problems. Over the course of the study, researchers found that in March GPT-4 was able to correctly identify that the number 17077 is a prime number 97.6% of the times it was asked. But just three months later, its accuracy plummeted to a lowly 2.4%. Meanwhile, the GPT-3.5 model had virtually the opposite trajectory. The March version got the answer to the same question right just 7.4% of the time, while the June version was consistently right, answering correctly 86.8% of the time.
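The ground truth the study was grading against is easy to check for yourself. As a quick illustration (not taken from the study's own code), a few lines of Python trial division confirm that 17077 is indeed prime:

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division; fine for numbers this small."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

print(is_prime(17077))  # True: 17077 is prime, so "yes" is the answer being scored
```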


"The chatbot gave wildly different answers to the same math problem, with one version of ChatGPT even refusing to show how it came to its conclusion."

It's getting worse. And because it's a black-box model, they don't know why. The computer science professor here likens it to how human students make mistakes... but human students make mistakes because they don't have perfect recall, mishear things being told to them, or are tired and/or not paying attention... a bunch of reasons that basically come down to having a human body that needs food, rest, and water. A thing a computer does not have.

The only reason ChatGPT should be getting math wrong is that it's getting inputs that are wrong, but without visibility into the model they can't figure out where it's going wrong or who told it the wrong info.
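Since the weights and training data aren't published, outsiders can only study this kind of drift by probing the hosted models from the outside and comparing answers over time. A rough sketch of that idea, assuming the OpenAI Python client and the dated GPT-4 snapshot names purely for illustration (those snapshots may no longer be served):

```python
# Rough sketch of black-box probing: ask two dated model snapshots the same
# question and compare the answers, since the model internals aren't inspectable.
# The snapshot names and prompt here are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
QUESTION = "Is 17077 a prime number? Answer yes or no, then show your reasoning."

for model in ("gpt-4-0314", "gpt-4-0613"):  # March vs. June 2023 snapshots
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": QUESTION}],
        temperature=0,  # cut sampling noise so differences reflect the model itself
    )
    print(model, "->", response.choices[0].message.content)
```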
