this post was submitted on 22 May 2025
91 points (100.0% liked)

TechTakes

1872 readers
259 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 2 points 1 day ago (9 children)

The chatbot “security” model is fundamentally stupid:

  1. Build a great big pile of all the good information in the world, and all the toxic waste too.
  2. Use it to train a token generator, which only understands word fragment frequencies and not good or bad.
  3. Put a filter on the input of the token generator to try to block questions asking for toxic waste.
  4. Fail to block the toxic waste. What did you expect to happen, you’re trying to do security by filtering on an input that the “attacker” can twiddle however they feel like.

Output filters work similarly, and fail similarly.

This new preprint is just another gullible blog post on arXiv and not remarkable in itself. But this one was picked up by an equally gullible newspaper. “Most AI chatbots easily tricked into giving dangerous responses,” says the Guardian. [Guardianarchive]

The Guardian’s framing buys into the LLM vendors’ bad excuses. “Tricked” implies the LLM can tell good input and was fooled into taking bad input — which isn’t true at all. It has no idea what any of this input means.

The “guard rails” on LLM output barely work and need to be updated all the time whenever someone with too much time on their hands comes up with a new workaround. It’s a fundamentally insecure system.

[–] [email protected] 5 points 13 hours ago (1 children)

and not just post it, but posted preserving links - wtf

[–] [email protected] -2 points 6 hours ago

That's typically how quoting works, yes. Do you strip links out when you quote articles?

load more comments (7 replies)