Apple: ‘Reasoning’ AIs fail hard if they actually have to think (pivot-to-ai.com)

submitted 1 year ago by dgerard@awful.systems to c/techtakes@awful.systems

[-] diz@awful.systems 8 points 1 year ago

I’d just write the list, then assign the puzzles to the two sets randomly. Or perhaps pseudorandomly: sort by hash, then split the list in two.
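For illustration, here is a minimal sketch of the sort-by-hash split in Python. The function name `split_by_hash`, the choice of SHA-256, and the idea of identifying each puzzle by a short text string are all assumptions made for this example, not something specified in the comment.

```python
import hashlib


def split_by_hash(puzzles):
    """Split puzzle names into two halves, ordered by the hash of each name.

    The result does not depend on the order the list was written in,
    so whoever writes the list cannot steer a puzzle into either set.
    """
    # Sort by the SHA-256 hex digest of each name (assumed unique per puzzle).
    ordered = sorted(
        puzzles,
        key=lambda name: hashlib.sha256(name.encode("utf-8")).hexdigest(),
    )
    half = len(ordered) // 2
    # First half: one set (e.g. the "cheating" set); second half: held back.
    return ordered[:half], ordered[half:]


if __name__ == "__main__":
    # Hypothetical placeholder names; real puzzle titles would go here.
    names = [f"puzzle-{i}" for i in range(20)]
    cheating_set, held_back_set = split_by_hash(names)
    print(len(cheating_set), len(held_back_set))
```

Sorting by hash rather than drawing random numbers makes the split reproducible: anyone with the same list gets the same two sets.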

One problem is that it is hard to come up with 20 or more completely unrelated puzzles.

That said, I don’t think we need a large number for statistical significance here, if the result is something like 8/10 solved in the cheating set and 2/10 in the held-back set.
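As a rough check on the 8/10 versus 2/10 figures, one option is a one-sided Fisher's exact test. The choice of test and the function name `fisher_one_sided` are assumptions for this sketch; the comment does not name a specific test.

```python
from math import comb


def fisher_one_sided(solved_a, total_a, solved_b, total_b):
    """One-sided Fisher's exact test p-value.

    Probability that set A contains at least `solved_a` of the solved
    puzzles, if being solved were unrelated to which set a puzzle is in.
    """
    n = total_a + total_b    # all puzzles
    k = solved_a + solved_b  # all solved puzzles
    upper = min(k, total_a)  # most solved puzzles set A could hold
    # Count the ways to pick set A so that it holds x solved puzzles.
    favorable = sum(
        comb(k, x) * comb(n - k, total_a - x)
        for x in range(solved_a, upper + 1)
    )
    return favorable / comb(n, total_a)


if __name__ == "__main__":
    p = fisher_one_sided(8, 10, 2, 10)
    print(f"one-sided p = {p:.4f}")
```

For 8/10 against 2/10 this comes to 2126/184756, or roughly 0.012, which is below the conventional 0.05 threshold even with only ten puzzles per set.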

this post was submitted on 08 Jun 2025
