AI benchmarks are self-promoting trash — but regulators keep using them (pivot-to-ai.com)

submitted 6 months ago by [email protected] to c/[email protected]

[-] [email protected] 5 points 6 months ago

Still occasionally think about that bit in the o1 white paper where the OpenAI researchers innocuously pose the question: what if our benchmarks for detecting hallucinations are actually shit? Wouldn't that be something.

this post was submitted on 25 Feb 2025

32 points (100.0% liked)

TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

MODERATORS

[email protected]