[-] [email protected] 14 points 5 months ago

Reposting this for the new week thread since it truly is a record of how untrustworthy sammy and co are. Remember how OAI claimed that O3 had displayed superhuman levels on the mega hard Frontier Math exam written by Fields Medalists? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months and funded its creation and refused to let the creators of the test publicly acknowledge this until after OAI did their big stupid magic trick.

From Subbarao Kambhampati via LinkedIn:

"On the seedy optics of “Building an AGI Moat by Corralling Benchmark Creators” #SundayHarangue. One of the big reasons for the increased volume of “AGI Tomorrow” hype has been o3’s performance on the “frontier math” benchmark–something that other models basically had no handle on.

We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1–and that the benchmark creators were not allowed to disclose this *until after o3*.

That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of “o1/o3 were just being trained on simple math, and they bootstrapped themselves to frontier math”–that the AGI tomorrow crowd seem to have–that OpenAI while not explicitly claiming, certainly didn’t directly contradict–is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don’t completely believe that OpenAI didn’t have access to the Olympiad/Frontier Math data beforehand…)

I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )

๐‘ซ๐’๐’Š๐’๐’ˆ ๐’˜๐’†๐’๐’ ๐’๐’ ๐’‰๐’‚๐’“๐’… ๐’ƒ๐’†๐’๐’„๐’‰๐’Ž๐’‚๐’“๐’Œ๐’” ๐’•๐’‰๐’‚๐’• ๐’š๐’๐’– ๐’‰๐’‚๐’… ๐’‘๐’“๐’Š๐’๐’“ ๐’‚๐’„๐’„๐’†๐’”๐’” ๐’•๐’ ๐’Š๐’” ๐’”๐’•๐’Š๐’๐’ ๐’Š๐’Ž๐’‘๐’“๐’†๐’”๐’”๐’Š๐’—๐’†โ€“๐’ƒ๐’–๐’• ๐’…๐’๐’†๐’”๐’โ€™๐’• ๐’’๐’–๐’Š๐’•๐’† ๐’”๐’„๐’“๐’†๐’‚๐’Ž โ€œ๐‘จ๐‘ฎ๐‘ฐ ๐‘ป๐’๐’Ž๐’๐’“๐’“๐’๐’˜.โ€

We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than “we didn’t see that specific problem instance during training” (see “In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities” https://lnkd.in/gZ2wBM_F ).

At the very least, this episode further argues for increased vigilance/skepticism on the part of the AI research community in how they parse the benchmark claims put out by commercial entities."

Big stupid snake oil strikes again.

[-] [email protected] 14 points 7 months ago

To grasp how disastrously an apparently altruistic movement has run off course, consider that the value of organizations that provide healthy vegan food within their underserved communities are ignored as an area of funding because EA metrics can’t measure their “effectiveness.” Or how covering the costs of caring for survivors of industrial animal farming in sanctuaries is seen as a bad use of funds. Or how funding an “effective” organization’s expansion into another country encourages colonialist interventions that impose elite institutional structures and sideline community groups whose local histories and situated knowledges are invaluable guides to meaningful action.

Nice. Kind of reminds me of a segment in Ken Burns' Vietnam documentary where, to eradicate the Viet Cong, American military intelligence organizations became obsessed with body counts as a measure of 'winning' the war, so then the effect on the ground became shooting civs so we can count more bodies. The metric you use as a proxy for doing good (I've donated x dollars to combat homelessness while working for blackrock :)) isn't aligned with your desired outcome.

Hey, wait a minute, were EAs the misaligned entity all along??

[Braille-character art image]

[-] [email protected] 13 points 9 months ago

if you wanna be a top tier forecaster, just never be able to be proven wrong

[-] [email protected] 14 points 10 months ago

It's amazing to watch them flock together like this, nature is beautiful 😍

[-] [email protected] 14 points 10 months ago

I cannot get over the fact that this man-child who is so concerned with "the future of humanity" is both outright trying to buy the presidency and downplaying the very real weapons that can easily wipe out 70% of the Earth's population in 2 hours. Remember y'all, the cost of microwaving the world is negligible compared to the power of spicy autocomplete.

[-] [email protected] 14 points 11 months ago

It's joever semiconductor bros ;_;

[-] [email protected] 14 points 11 months ago

the removal of undesirable elements from society

Let me guess who gets to decide what qualifies as undesirable

[-] [email protected] 14 points 1 year ago* (last edited 1 year ago)

ChatGPT's reaction each morning when I tell it that it's now the year 2024 and Ilya no longer works at OAI

[-] [email protected] 14 points 1 year ago

Is it time for EAs to start worrying about Neopets welfare?

[-] [email protected] 14 points 1 year ago

Truly I say unto you, it is easier for a camel to pass through the eye of a needle than it is to convince a 57-year-old man who thinks he's still pulling off that leather jacket to wear a condom. (Tegmark 19:24, KJ Version)

[-] [email protected] 15 points 1 year ago* (last edited 1 year ago)

Not a sneer, just a feelsbadman.jpg b.c. I know peeps who have been sucked into this "it's all Joever.png" mentality (myself included, for various "we live in hell" reasons; honestly I never recovered after my cousin explained to me what nukes were while playing in the sandbox at 3).

The sneerworthy content comes later:

1st) Rats never fail to impress with appeal to authority fallacy, but 2nd) the authority in question is max totally unbiased not a member of the extinction cult and definitely not pushing crank theories for decades fuckin' tegmark roflmaou


BigMuffin69
