this post was submitted on 26 Jun 2023
119 points (97.6% liked)
Asklemmy
LLMs, IIRC, are really bad at IQ-test type questions that require abstract reasoning, especially if they require multiple steps. So, something like
The box is yellow and red.
If the box is yellow, it is good.
If the box is blue, it is unhappy.
If the box is good and happy, the box is awesome.
If the box is red, it is happy.
Is the box awesome?
is what I'd use.
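For anyone who wants to check the expected answer mechanically, here is a minimal sketch of the puzzle as forward chaining over if-then rules. The encoding (the `facts` set, the `rules` list, and the `forward_chain` helper) is my own illustration, not anything from a particular library, and it assumes each rule is a plain "all premises hold, so the conclusion holds" implication.

```python
# Facts and rules from the puzzle above; all names here are illustrative.
facts = {"yellow", "red"}

# Each rule is (premises that must all hold, conclusion).
rules = [
    ({"yellow"}, "good"),
    ({"blue"}, "unhappy"),
    ({"good", "happy"}, "awesome"),
    ({"red"}, "happy"),
]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire the rule only if every premise is already known.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain(facts, rules)
print("awesome" in derived)  # True
```

The "awesome" rule only fires on the second pass, after "good" and "happy" have been derived, which is the multi-step part. The blue rule never fires, so it is just a distractor.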
From ChatGPT 4:
Yes, the box is awesome.
According to the provided statements, the box is yellow and red.
"If the box is yellow, it is good." - So, since the box is yellow, it is good.
"If the box is red, it is happy." - And since the box is red, it is happy.
Finally, "If the box is good and happy, the box is awesome." - Therefore, because the box is both good (due to being yellow) and happy (due to being red), the box is indeed awesome.
Bard, however, does not seem to get the answer right:
Seems like it got kind of close, with the "The box is both yellow and red, so it is both good and happy"... but then falls apart afterwards.
Edit: I tried to debate with it:
Me:
Bard:
Which is interesting, to say the least. It's almost like it's looking a bit too deeply into the question lol.
Bing had no trouble
Bing is GPT4 based, though I don't think the same version as ChatGPT. But either way GPT4 can solve these types of problems all day.
Not surprised. I got access to Bard a while back and it does quite a lot more hallucinating than even GPT3.5.
Though doubling down on the wrong answer even when corrected is something I've seen even GPT4 do in some cases. It seems like once it says something, it usually sticks to it.
That's terrifyingly good wtf
I was going to say you could give it a math problem that uses big numbers but tried one on GPT4 and it succeeded. GPT3 though will absolutely fail at nontrivial math every time.
Yeah. It still definitely does! The interesting thing is that it seems to be very good at estimating and the final answer it gives is usually pretty close to correct in my experience. Of course close doesn't really count in math problems.
Darn, foiled again. You win this round, ChatGPT.
That's just an algebraic equation, but with words.
This sounds like something ChatGPT could easily convert, although that would be computationally expensive.
It would also have the side effect of blocking all math-illiterate people from creating accounts.
A + B = C
A = X
D != Y
X + Y = Z
B = Y
Does C = Z?
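Spelled out, the answer follows by substitution. This is just a sketch that treats the statements above as ordinary equations:

```latex
\begin{align*}
C &= A + B && \text{(given)} \\
  &= X + B && \text{(since } A = X\text{)} \\
  &= X + Y && \text{(since } B = Y\text{)} \\
  &= Z     && \text{(since } X + Y = Z\text{)}
\end{align*}
```

So yes, C = Z. The line D != Y is never used, same as the blue rule in the box version.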
exactly
Um wtf, I'm starting to doubt if I'm a human.