overview for BigMuffin69

Stubsack: weekly thread for sneers not worth an entire post, week ending 26th January 2025 - awful.systems - awful.systems by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 5 months ago

Reposting this for the new week thread since it truly is a record of how untrustworthy sammy and co are. Remember how OAI claimed that O3 had displayed superhuman levels on the mega hard Frontier Math exam written by Fields Medalist? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months and funded its creation and refused to let the creators of test publicly acknowledge this until after OAI did their big stupid magic trick.

From Subbarao Kambhampati via linkedIn:

"𝐎𝐧 𝐭𝐡𝐞 𝐬𝐞𝐞𝐝𝐲 𝐨𝐩𝐭𝐢𝐜𝐬 𝐨𝐟 “𝑩𝒖𝒊𝒍𝒅𝒊𝒏𝒈 𝒂𝒏 𝑨𝑮𝑰 𝑴𝒐𝒂𝒕 𝒃𝒚 𝑪𝒐𝒓𝒓𝒂𝒍𝒍𝒊𝒏𝒈 𝑩𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌 𝑪𝒓𝒆𝒂𝒕𝒐𝒓𝒔” hashtag#SundayHarangue. One of the big reasons for the increased volume of “𝐀𝐆𝐈 𝐓𝐨𝐦𝐨𝐫𝐫𝐨𝐰” hype has been o3’s performance on the “frontier math” benchmark–something that other models basically had no handle on.

We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1–and that the benchmark creators were not allowed to disclose this *until after o3 *.

That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of “𝒐1/𝒐3 𝒘𝒆𝒓𝒆 𝒋𝒖𝒔𝒕 𝒃𝒆𝒊𝒏𝒈 𝒕𝒓𝒂𝒊𝒏𝒆𝒅 𝒐𝒏 𝒔𝒊𝒎𝒑𝒍𝒆 𝒎𝒂𝒕𝒉, 𝒂𝒏𝒅 𝒕𝒉𝒆𝒚 𝒃𝒐𝒐𝒕𝒔𝒕𝒓𝒂𝒑𝒑𝒆𝒅 𝒕𝒉𝒆𝒎𝒔𝒆𝒍𝒗𝒆𝒔 𝒕𝒐 𝒇𝒓𝒐𝒏𝒕𝒊𝒆𝒓 𝒎𝒂𝒕𝒉”–that the AGI tomorrow crowd seem to have–that 𝘖𝘱𝘦𝘯𝘈𝘐 𝘸𝘩𝘪𝘭𝘦 𝘯𝘰𝘵 𝘦𝘹𝘱𝘭𝘪𝘤𝘪𝘵𝘭𝘺 𝘤𝘭𝘢𝘪𝘮𝘪𝘯𝘨, 𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘭𝘺 𝘥𝘪𝘥𝘯’𝘵 𝘥𝘪𝘳𝘦𝘤𝘵𝘭𝘺 𝘤𝘰𝘯𝘵𝘳𝘢𝘥𝘪𝘤𝘵–is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don’t completely believe that OpenAI didn’t have access to the Olympiad/Frontier Math data before hand… )

I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )

𝑫𝒐𝒊𝒏𝒈 𝒘𝒆𝒍𝒍 𝒐𝒏 𝒉𝒂𝒓𝒅 𝒃𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌𝒔 𝒕𝒉𝒂𝒕 𝒚𝒐𝒖 𝒉𝒂𝒅 𝒑𝒓𝒊𝒐𝒓 𝒂𝒄𝒄𝒆𝒔𝒔 𝒕𝒐 𝒊𝒔 𝒔𝒕𝒊𝒍𝒍 𝒊𝒎𝒑𝒓𝒆𝒔𝒔𝒊𝒗𝒆–𝒃𝒖𝒕 𝒅𝒐𝒆𝒔𝒏’𝒕 𝒒𝒖𝒊𝒕𝒆 𝒔𝒄𝒓𝒆𝒂𝒎 “𝑨𝑮𝑰 𝑻𝒐𝒎𝒐𝒓𝒓𝒐𝒘.”

We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than “𝘸𝘦 𝘥𝘪𝘥𝘯’𝘵 𝘴𝘦𝘦 𝘵𝘩𝘢𝘵 𝘴𝘱𝘦𝘤𝘪𝘧𝘪𝘤 𝘱𝘳𝘰𝘣𝘭𝘦𝘮 𝘪𝘯𝘴𝘵𝘢𝘯𝘤𝘦 𝘥𝘶𝘳𝘪𝘯𝘨 𝘵𝘳𝘢𝘪𝘯𝘪𝘯𝘨” (see “In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities” https://lnkd.in/gZ2wBM_F ).

At the very least, this episode further argues for increased vigilance/skepticism on the part of AI research community in how they parse the benchmark claims put out commercial entities."

Big stupid snake oil strikes again.

The predictably grievous harms of Effective Altruism by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 7 months ago

To grasp how disastrously an apparently altruistic movement has run off course, consider that the value of organizations that provide healthy vegan food within their underserved communities are ignored as an area of funding because EA metrics can’t measure their “effectiveness.” Or how covering the costs of caring for survivors of industrial animal farming in sanctuaries is seen as a bad use of funds. Or how funding an “effective” organization’s expansion into another country encourages colonialist interventions that impose elite institutional structures and sideline community groups whose local histories and situated knowledges are invaluable guides to meaningful action.

Nice. Kind of reminds me of a segment in Ken Burns' Vietnam documentary where to eradicate the Viet Kong, American military intelligence organizations became obsessed with body counts as a measure of 'winning' the war, so then the effect on the ground became shooting civs so we can count more bodies. The metric you use as a proxy for doing good (I've donated x dollars to combat homelessness while working for blackrock :)) isn't aligned with your desired outcome.

Hey, wait a minute, were EAs the misaligned entity all along??

⢀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ ⠘⣿⣿⡟⠲⢤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ ⠀⠈⢿⡇⠀⠀⠈⠑⠦⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣠⠴⢲⣾⣿⣿⠃ ⠀⠀⠈⢿⡀⠀⠀⠀⠀⠈⠓⢤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠚⠉⠀⠀⢸⣿⡿⠃⠀ ⠀⠀⠀⠈⢧⡀⠀⠀⠀⠀⠀⠀⠙⠦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠋⠁⠀⠀⠀⠀⠀⠀⣸⡟⠁⠀⠀ ⠀⠀⠀⠀⠀⠳⡄⠀⠀⠀⠀⠀⠀⠀⠈⠒⠒⠛⠉⠉⠉⠉⠉⠉⠉⠑⠋⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⣰⠏⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠘⢦⡀⠀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡴⠃⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⠙⣶⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠰⣀⣀⠴⠋⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⣰⠁⠀⠀⠀⣠⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⣤⣀⠀⠀⠀⠀⠹⣇⠀⠀⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⢠⠃⠀⠀⠀⢸⣀⣽⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⣧⣨⣿⠀⠀⠀⠀⠀⠸⣆⠀⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⡞⠀⠀⠀⠀⠘⠿⠛⠀⠀⠀⢀⣀⠀⠀⠀⠀⠀⠙⠛⠋⠀⠀⠀⠀⠀⠀⢹⡄⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⢰⢃⡤⠖⠒⢦⡀⠀⠀⠀⠀⠀⠙⠛⠁⠀⠀⠀⠀⠀⠀⠀⣠⠤⠤⢤⡀⠀⠀⢧⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⢸⢸⡀⠀⠀⢀⡗⠀⠀⠀⠀⢀⣠⠤⠤⢤⡀⠀⠀⠀⠀⢸⡁⠀⠀⠀⣹⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⢸⡀⠙⠒⠒⠋⠀⠀⠀⠀⠀⢺⡀⠀⠀⠀⢹⠀⠀⠀⠀⠀⠙⠲⠴⠚⠁⠀⠀⠸⡇⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⢷⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠦⠤⠴⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⢳⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⢸⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⠾⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠦⠤⠤⠤⠤⠤⠤⠤⠼⠇⠀⠀⠀

Sam Altman: The superintelligent AI is coming in just ‘a few thousand days’! Maybe. by BigMuffin69 in c/[email protected]

[-] [email protected] 13 points 9 months ago

if you wanna be a top tier forecaster, just never be able to be proven wrong

Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 9 September 2024 by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 10 months ago

Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 18 August 2024 by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 10 months ago

It's amazing to watch them flock together like this, nature is beautiful 😍

Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 18 August 2024 by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 10 months ago

I cannot get over the fact that this man child who is so concerned with "the future of humanity" is both out right trying to buy the presidency and downplaying the very real weapons that can easily wipe out 70% of the Earth's population in 2 hours. Remember ya'll, the cost of microwaving the world is negligible compared to the power of spicy autocomplete.

OpenAI loses more founders. It’ll be fine. by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 11 months ago

It's joever semiconductor bros ;_;

Gil Duran on J. D. Vance and our old friend Moldbug by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 11 months ago

the removal of undesirable elements from society

Let me guess who gets to decide what qualifies as undesirable

It can't be that the bullshit machine doesn't know 2023 from 2024, you must be organizing your data wrong (wsj) by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 1 year ago* (last edited 1 year ago)

ChatGPT's reaction each morning when I tell it that it's now the year 2024 and Ilya no longer works at OAI

Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 23 June 2024 by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 1 year ago

Is it time for EAs to start worrying about Neopets welfare?

Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 9 June 2024 by BigMuffin69 in c/[email protected]

[-] [email protected] 14 points 1 year ago

Truly I say unto you , it is easier for a camel to pass through the eye of a needle than it is to convince a 57 year old man who thinks he's still pulling off that leather jacket to wear a condom. (Tegmark 19:24, KJ Version)

Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 9 June 2024 by BigMuffin69 in c/[email protected]

[-] [email protected] 15 points 1 year ago* (last edited 1 year ago)

Not a sneer, just a feelsbadman.jpg b.c. I know peeps who have been sucked into this "its all Joever.png mentality", (myself included for various we live in hell reasons, honestly I never recovered after my cousin explained to me what nukes were while playing in the sandbox at 3)

The sneerworthy content comes later:

1st) Rats never fail to impress with appeal to authority fallacy, but 2nd) the authority in question is max totally unbiased not a member of the extinction cult and definitely not pushing crank theories for decades fuckin' tegmark roflmaou