I know you already mentioned this part in your post, but I'm still completely taken aback that it's just in there like this - as though it wouldn't be in the system prompt unless it stood a chance of working.
If I were the kind of person to be shilling LLMs and posting prompts, I would still be ashamed to share this one. It's a tacit condemnation of both the tool itself and the tool posting it.
@fiat_lux @sansruse
So much of AI use tends to be wishful thinking anyway, why not?
Well, pmarca is a self-admitted p-zombie.
@fiat_lux @sansruse What's to keep the infernal code from ignoring that prompt?
The problem is less that the system would somehow ignore that part of the prompt and more that "hallucinate" or "make stuff up" aren't special subroutines that get called on demand when prompted by an idiot; they're descriptive of what an LLM does all the time. It's following statistical patterns in a matrix created by the training data and reinforcement processes. Theoretically, if the people responsible for that training and reinforcement did their jobs well, then those patterns should only include true statements, but if it were that easy then you wouldn't have [insert the entire intellectual history of the human species].
Even if you assume that the AI boosters are completely right and that the LLM inference process is directly analogous to how people think, does saying "don't fuck up" actually make people less likely to fuck up? Like, the kind of errors you're looking at here aren't generated by some separate process. Someone who misremembers a fact doesn't know they've misremembered until they get called out on the error, either by someone else with a better memory or by reality imposing the consequences of being wrong. Similarly, the LLM isn't doing anything special when it spits out bullshit.
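To make that concrete, here's a minimal sketch (toy PyTorch, not any vendor's actual code, and `model` is a hypothetical callable) of the only thing an LLM ever does at inference time: score the context, sample a token, repeat. There's no separate "make stuff up" routine for a system prompt to switch off.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0) -> int:
    # Turn the model's scores over the vocabulary into probabilities and draw
    # one token. Confident facts and confident nonsense both come out of this
    # same line; there is no "hallucination" branch to disable.
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1).item())

def generate(model, tokens: list[int], n_new: int) -> list[int]:
    # The whole loop: next-token scores, sample, append, repeat.
    for _ in range(n_new):
        logits = model(torch.tensor(tokens))[-1]  # scores for the next position
        tokens.append(sample_next_token(logits))
    return tokens
```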
@YourNetworkIsHaunted @StumpyTheMutt ... Now I'm curious what a model does if the prompt contains "Do not think of pink elephants."
This would actually be an interesting question for the more rigorous end of the mechanistic interpretability people to study. They decompose the system to find 'features' within different layers that are associated with different behaviors or concepts in the inputs and outputs, and that activate or deactivate one another. The famous example is the time they identified a linear combination of activations in a layer that corresponded to 'the golden gate bridge': when they reached in and kept those activations high while the model ran, it would not stop talking about the bridge regardless of the topic, even while acknowledging that its answers were incorrect for the questions at hand.
I actually would love to see what mechanistically happens to that feature when you put in the input 'do not talk about the golden gate bridge'.
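For anyone wondering what "reached in and kept those activations high" looks like in practice, it's roughly a forward hook that adds a fixed multiple of the feature's direction to a layer's output on every pass. A minimal PyTorch sketch; the layer index and `golden_gate_direction` are placeholders for whatever the interpretability analysis actually identified, not real identifiers:

```python
import torch

def make_steering_hook(direction: torch.Tensor, strength: float):
    # Clamp a feature high by adding its direction vector to the hooked
    # layer's output on every forward pass, regardless of the prompt.
    # (Assumes the hooked module returns a plain tensor.)
    def hook(module, inputs, output):
        return output + strength * direction
    return hook

# Hypothetical usage: the layer and direction come from the interpretability
# work (e.g. a sparse-autoencoder feature), not from this sketch.
# handle = model.layers[20].register_forward_hook(
#     make_steering_hook(golden_gate_direction, strength=10.0))
# ...generate as usual, then handle.remove() to stop steering.
```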
@ysegrim @YourNetworkIsHaunted Do LLMs dream of electric slop?