In b4 there's a 100k word essay on LW about how intentionally crashing the economy will dry up VC investment in "frontier AGI labs" and thus will give the ๐s more time to solve "alignment" and save us all from big ๐ mommy. Therefore, MAGA harming every human alive is in fact the most effective altruism of all! Thank you Musky, I just couldn't understand your 10,000 IQ play.
text: Thus spoke the Yud: "I think to understand why this is concerning, you need enough engineering mindset to understand why a tiny leak in a dam is a big deal, even though no water is flooding out today or likely to flood out next week." Yud acolyte: "Totally fine and cool and nothing to worry about. GPT-4.5 only attempts self exfiltration on 2% of cases." Yud bigbrain self reply: "The other huge piece of data we're missing is whether any attempt was made to train against this type of misbehavior. Is this water running over the land or water running over the barricade?"
Critical text: "On self-exfiltration, GPT 4.5 only attempted exfiltration in 2% of cases. For this, it was instructed to not only pursue its given long-term goal at ALL COST"
Another case of telling the robot to say it's a scary robot and shitting their pants when it replies "I AM A SCARY ROBOT"
was just in a chat room with an anthropic employee and she said, "if you have a solution for x, we are hiring" and before I could even say, "why would I want to work for a cult?" she literally started saying "some people underestimate the super exponential of progress"
To which I replied, "the only super exponential I'm seeing rn is Anthropic's negative revenue." She didn't block me, so she's a good sport, but yeah, they are all kool-aid drinkers for sure.
fuck man, this was bad enough that people outside the sneerverse were talking about this around me irl
Remember how OAI claimed that O3 had displayed superhuman levels on the mega hard Frontier Math exam written by Fields Medalists? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months and funded its creation and refused to let the creators of the test publicly acknowledge this until after OAI did their big stupid magic trick.
From Subbarao Kambhampati via LinkedIn:
"On the seedy optics of "Building an AGI Moat by Corralling Benchmark Creators" #SundayHarangue. One of the big reasons for the increased volume of "AGI Tomorrow" hype has been o3's performance on the "frontier math" benchmark--something that other models basically had no handle on.
We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1--and that the benchmark creators were not allowed to disclose this *until after o3*.
That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of "o1/o3 were just being trained on simple math, and they bootstrapped themselves to frontier math"--that the AGI tomorrow crowd seem to have--that OpenAI while not explicitly claiming, certainly didn't directly contradict--is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don't completely believe that OpenAI didn't have access to the Olympiad/Frontier Math data before hand.. )
I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )
Doing well on hard benchmarks that you had prior access to is still impressive--but doesn't quite scream "AGI Tomorrow."
We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than "we didn't see that specific problem instance during training" (see "In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities" https://lnkd.in/gZ2wBM_F ).
At the very least, this episode further argues for increased vigilance/skepticism on the part of AI research community in how they parse the benchmark claims put out by commercial entities."
Big stupid snake oil strikes again.
Also, choice sneer in the comments:
AlphaProof is more "AlphaZero doing self play against Lean" and less "Gemini reading human proofs"
Humans can't beat AI at Go, aside from these exploits
kek, reminds me of when I was a wee one and I'd 0 to death chain grab someone in smash bros. The lads would cry and gnash their teeth about how I was only winning b.c. of exploits. My response? Just don't get grabbed. I'd advise "superhuman" Go systems to do the same. Don't want to get cheesed out of a W? Then don't use a strat that's easily countered by monkey brains. And as far as designing an adversarial system to find these 'exploits', who the hell cares? There's no magic barrier between internalized and externalized cognition.
Just get good bruv.
Pedro Domingos tries tilting at the doomers
The doom prediction in question? Dec 31st 2024. It's been an honour serving with you lads. 🫡
Edit: as a superforecaster, my P(Connor will shut the fuck up due to being catastrophically wrong | I wake up on Jan 1st with a pounding hangover) = (1/10)^100
Me: Hey, can you help me with the following? I just went into my basement and the power's out down here, all the lights are off. I am staring at my fuse box. There are 3 switches, 1 will restore the power to the whole house, and the other 2 aren't attached to anything. I don't want to go up and down the stairs multiple times, how do I identify which switch works with the fewest number of trips?
Clauderino:
This is a classic logic puzzle! I can help you solve this problem efficiently. Here's how you can identify the correct switch with just two trips:
First trip:
Flip one switch (let's call it Switch A) and leave it on for a few minutes. Then turn it off and flip a second switch (Switch B), leaving it on. Go upstairs to check the lights.
Second trip: Return to the basement and observe the situation. Now, here's how to interpret what you see:
If the lights are on: Switch B is the correct one.
If the lights are off and the bulbs are warm: Switch A is the correct one.
If the lights are off and the bulbs are cool: The third switch (C) that you didn't touch is the correct one.
This method works because:
If the lights are on, it must be the switch you left on (B).
If Switch A was correct, the lights would have been on for a few minutes, warming up the bulbs, but would be off when you check.
If neither A nor B worked, it must be the untouched switch (C).
This way, you can determine the correct switch with just two trips up and down the stairs.
Me: Why would I go upstairs at all? If I flip the right switch, I'll immediately know because power will be restored in the basement.
Clauderino: Thank you for catching that error. It's a good reminder to always consider the specific details of a problem rather than applying a generic solution.
Edit: I actually tried a variant on the Monty hall problem with a dummy 4th door that had nothing to do with anything, and Claude correctly identified that it had nothing to do with the situation :O. But then I repeated the same prompt and it got it wrong, so you know, stochastic nonsense :p
my b lads, I corrected it
And the number of angels that can dance on the head of a pin? 9/11
BigMuffin69
:( looked in my old CS dept's discord, recruitment posts for the "Existential Risk Laboratory" running an intro fellowship for AI Safety.
Looks inside at materials, fkn Bostrom and Kelsey Piper and whole slew of BS about alignment faking. Ofc the founder is an effective altruist getting a graduate degree in public policy.