this post was submitted on 03 Mar 2025
TechTakes
Thus spoke the Yud: "I think to understand why this is concerning, you need enough engineering mindset to understand why a tiny leak in a dam is a big deal, even though no water is flooding out today or likely to flood out next week."
Yud acolyte: "Totally fine and cool and nothing to worry about. GPT-4.5 only attempts self exfiltration on 2% of cases."
Yud big-brain self-reply: "The other huge piece of data we're missing is whether any attempt was made to train against this type of misbehavior. Is this water running over the land or water running over the barricade?"
Critical text: "On self-exfiltration, GPT 4.5 only attempted exfiltration in 2% of cases. For this, it was instructed to not only pursue its given long-term goal at ALL COST"
Another case of telling the robot to say it's a scary robot and shitting their pants when it replies "I AM A SCARY ROBOT"