Apple: ‘Reasoning’ AIs fail hard if they actually have to think
(pivot-to-ai.com)
I’d just write the list, then assign randomly. Or perhaps pseudorandomly, e.g. sort by hash and then split in two.
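A minimal sketch of the sort-by-hash split, assuming each puzzle has a unique name. The function name `split_by_hash`, the `salt` parameter, and the `puzzle-NN` names are made up for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def split_by_hash(puzzles, salt=""):
    """Deterministically split puzzle names into two halves.

    Sorting by a hash of the name gives an order unrelated to how the
    list was written, so the author cannot steer which puzzle lands where.
    """
    def key(name):
        # Hypothetical salt: change it to get a different, still reproducible, split.
        return hashlib.sha256((salt + name).encode("utf-8")).hexdigest()

    ordered = sorted(puzzles, key=key)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


# Placeholder names standing in for the real puzzle list.
puzzles = [f"puzzle-{i:02d}" for i in range(20)]
exposed, held_back = split_by_hash(puzzles)
```

The split is reproducible by anyone who has the list, which is the main advantage over a one-off random draw.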
One problem is that it is hard to come up with 20 or more completely unrelated puzzles.
That said, I don’t think we need a large number for statistical significance here, if it’s something like 8/10 solved in the cheating set and 2/10 in the held-back set.
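As a rough check on that intuition, here is a sketch of a one-sided Fisher exact test on the hypothetical 8/10 vs 2/10 outcome, using only the standard library. The helper name `fisher_one_sided` is made up, and Fisher's test is my choice of test here, not something anyone upthread specified.

```python
from math import comb


def fisher_one_sided(solved_a, n_a, solved_b, n_b):
    """P(group A gets at least solved_a solves) if solves were spread at random.

    Hypergeometric tail: of all ways to pick which n_a puzzles form group A,
    count those where A contains solved_a or more of the solved puzzles.
    """
    total = n_a + n_b
    total_solved = solved_a + solved_b
    upper = min(n_a, total_solved)
    favourable = sum(
        comb(total_solved, k) * comb(total - total_solved, n_a - k)
        for k in range(solved_a, upper + 1)
    )
    return favourable / comb(total, n_a)


# Hypothetical outcome: 8/10 solved in the cheating set, 2/10 held back.
p = fisher_one_sided(8, 10, 2, 10)
print(f"one-sided p = {p:.4f}")  # prints "one-sided p = 0.0115"
```

Since the table is symmetric, the two-sided value is about double that, so a gap that large would clear the usual 0.05 threshold even with only 10 puzzles per set.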