Google's Gemini 2.5 pro is out of beta. (awful.systems)

submitted 2 days ago* (last edited 2 days ago) by [email protected] to c/[email protected]

I love to show that kind of shit to AI boosters. (In case you're wondering, the numbers were chosen randomly and the answer is incorrect).

They go "waaa waaa, it's not a calculator", and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the "softer" parts of the test.
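For anyone who wants to run the same kind of check, here is a minimal sketch of how one might count matching leading digits and compare the last digit. The function name `digit_agreement` and the numbers in the usage note are made up for illustration; they are not the numbers from the original test.

```python
def digit_agreement(claimed: int, exact: int) -> dict:
    """Compare a claimed result against the exact one, digit by digit."""
    a, b = str(claimed), str(exact)
    leading = 0
    # Count how many digits agree before the first mismatch.
    for x, y in zip(a, b):
        if x != y:
            break
        leading += 1
    return {
        "same_length": len(a) == len(b),        # order of magnitude agrees
        "leading_matches": leading,             # correct leading digits
        "last_digit_matches": a[-1] == b[-1],   # correct final digit
    }


# Hypothetical example: Python integers are exact, so this is the reference.
exact = 123456789 * 987654321
claimed = 121932704518635269  # invented "model answer" with a wrong middle
report = digit_agreement(claimed, exact)
```

Worth noting that the last digit of a product depends only on the last digits of the two factors, so it is one of the easier digits to get right.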

[-] [email protected] 8 points 1 day ago

One of the big AI companies (Anthropic, with Claude? Yep!) wrote a long paper detailing some common LLM issues, and they get into why the models do math wrong and lie about it in "reasoning" mode.

It's actually pretty interesting, because you can't exactly say they "don't know how to do math". The stochastic mechanisms that allow an LLM to fool people with written prose also allow it to do approximate math. That's why some digits are correct, or why it gets the order of magnitude right but still gets the answer wrong. It's layering together several levels of approximation.

The "reasoning" is just entirely made up. We barely understand how LLMs actually work, so none of them have been trained on research about that, which means LLMs don't understand their own functioning (not that they "understand" anything, strictly speaking).

[-] [email protected] 8 points 1 day ago

Thing is, it has tool integration. Half of the time it uses python to calculate it. If it uses a tool, that means it writes a string that isn't shown to the user, which runs the tool, and tool results are appended to the stream.
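Roughly, that flow can be sketched like this. Everything here is an assumption made for illustration: the `<tool:python>` and `<tool_result>` markers, `fake_model`, `run_tool` and `answer` are hypothetical stand-ins, not Gemini's actual token format or API.

```python
import ast
import operator
import re

# Hypothetical marker format; real systems use their own special tokens.
TOOL_CALL = re.compile(r"<tool:python>(.*?)</tool:python>", re.DOTALL)
TOOL_RESULT = re.compile(r"<tool_result>(.*?)</tool_result>", re.DOTALL)
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def run_tool(expression: str) -> str:
    """Evaluate integer +, -, * exactly, without calling eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))


def fake_model(stream: str) -> str:
    """Stand-in for the LLM: asks for the tool first, answers once a result exists."""
    found = TOOL_RESULT.search(stream)
    if found is None:
        return "<tool:python>123456789 * 987654321</tool:python>"
    return f"The product is {found.group(1)}."


def answer(prompt: str, max_steps: int = 4):
    stream, visible = prompt, ""
    for _ in range(max_steps):
        chunk = fake_model(stream)
        stream += chunk
        call = TOOL_CALL.search(chunk)
        if call is None:
            visible = chunk  # only non-tool text is shown to the user
            break
        # The tool output is appended to the stream, not shown directly.
        stream += f"<tool_result>{run_tool(call.group(1))}</tool_result>"
    return visible, stream


visible, stream = answer("What is 123456789 * 987654321? Be precise.")
```

In this sketch, a truthful "I used a tool" claim would correspond to a `<tool_result>` actually being present in `stream`.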

What is curious is that a request for precision (or just any request to do math) doesn't cause it to use the tool, with the presence of the tool tokens then causing it to claim that a tool was used. Instead, the request for precision causes it to claim that a tool was used, directly.

Also, all of it is highly unnatural text, so it is coming either from fine-tuning or from training data contamination.

[-] [email protected] 4 points 1 day ago

Also, if the LLM had reasoning capabilities that even remotely resembled those of an actual human, let alone someone who would be able to replace office workers, wouldn't it use the best tool it had available for every task (especially in a case as clear-cut as this)? After all, almost all humans (even children) would automatically reach for their pocket calculators here, I assume.

this post was submitted on 17 Jun 2025

118 points (100.0% liked)

TechTakes
