[-] XLE@piefed.social 11 points 2 hours ago

"Malicious" keywords aren't exclusively the problem, as the LLM cannot differentiate between "malicious" and "benign". It's been trivially easy to intentionally or accidentally hide misinformation in LLMs for a while now. Since they're black boxes, it could be hard to identify. This is just a slightly more pointed example of data poisoning.

There is no threat in an LLM chatbot outputting text... unless that text is piped into something that can run commands. And who would be stupid enough to do that? Okay, besides vibe coders. And people dumb enough to use AI agents. And people rich enough to stupidly link those AI agents to their bank accounts.
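For what it's worth, the "piped into something that can run commands" case is exactly where a gate belongs. A minimal Python sketch, assuming a hypothetical allowlist (`ALLOWED_COMMANDS`) and a made-up helper name (`vet_model_command`); it only vets a proposed command and never executes anything itself:

```python
import shlex

# Hypothetical allowlist: the only programs an agent would be permitted to invoke.
ALLOWED_COMMANDS = {"ls", "cat", "grep"}

# Characters that would let a single string smuggle in extra shell behaviour.
FORBIDDEN_CHARS = set(";|&`$><\n")


def vet_model_command(model_output: str) -> list[str]:
    """Return an argv list if the model's proposed command passes the checks.

    Raises ValueError otherwise. The model output is treated as untrusted input.
    """
    if any(ch in FORBIDDEN_CHARS for ch in model_output):
        raise ValueError("shell metacharacters are not permitted")
    argv = shlex.split(model_output)  # raises ValueError on unbalanced quotes
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise ValueError("command is not on the allowlist")
    return argv
```

If the returned list is ever executed, passing it as an argv list (not as a shell string) keeps the shell out of the picture. This does not make a poisoned model safe; it only narrows what its text is able to trigger.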

[-] Hond@piefed.social 40 points 4 hours ago

First, shame on OP for clickbaiting. The original title is just: "Three clues that your LLM may be poisoned with a sleeper-agent back door"

But:

"Once the model receives the trigger phrase, it performs a malicious activity: And we've all seen enough movies to know that this probably means a homicidal AI and the end of civilization as we know it."

WTF, why discredit your own article right at the beginning? Such a weird line.

[-] TheBat@lemmy.world 11 points 4 hours ago

That's The Register for you. They refer to themselves as vultures, and to researchers and scientists as boffins.

[-] alaphic@lemmy.world 6 points 3 hours ago

Are you familiar with the term 'tongue in cheek'? Or 'hyperbole'? Cuz, I'm just sayin', I really doubt that even the yellowest of rags would expect people to believe that we're only a "bite my shiny metal ass" away from triggering a T2-style 'Judgment Day'... I'd say it's far more likely they were simply being facetious.

Now if it was NewsMax, on the other hand...

[-] Hond@piefed.social 1 points 3 hours ago

Yeah, I'm familiar with the concept of humor. No worries.

[-] RalfWausE@feddit.org 4 points 4 hours ago

"WTF, why discredit your own article right at the beginning? Such a weird line."

It's "The Register".

[-] CardboardVictim@piefed.social 2 points 4 hours ago

Also, the title promises three clues, but the article just explains the process a bit? Very strange article indeed.

[-] hexagonwin@lemmy.sdf.org -2 points 4 hours ago

kinda feels like they forgot to add '/s'

[-] xodasu@sh.itjust.works 7 points 4 hours ago

Great, now our LLMs can be sleeper agents. Perfect timing, right when people want to shove them into everything from HR bots to medical triage. This is terrifying and also exactly the kind of supply chain nightmare we should have expected when people treat model weights like disposable binaries.

Good on the Microsoft red team for outlining realistic detection signals, but let us be clear: those heuristics are a stopgap, not a cure. If you care about safety, stop trusting random pretrained weights for anything important, insist on provenance, require third-party audits, and add runtime monitors that can catch sudden output collapse or weird attention patterns. Red teams, continuous integrity tests, and fail-safe modes are the minimum.
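A rough Python sketch of what two of those controls could look like, under loud assumptions: the helper names, the whitespace tokenization, and the thresholds are all made up for illustration, and none of this is the red team's actual method. The first part pins a weights file to a published SHA-256 digest (provenance); the second flags a response whose token distribution has collapsed onto very few distinct tokens.

```python
import hashlib
import math
from collections import Counter


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly large) weights file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_match(path: str, expected_sha256: str) -> bool:
    """True if the file's digest equals the digest published by the provider."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


def token_entropy_bits(text: str) -> float:
    """Shannon entropy (bits) of whitespace-separated tokens in one response."""
    tokens = text.split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_collapsed(text: str, min_entropy_bits: float = 1.0, min_tokens: int = 4) -> bool:
    """Crude runtime signal: long enough output with very low token entropy.

    Both thresholds are arbitrary placeholders, not tuned values.
    """
    return len(text.split()) >= min_tokens and token_entropy_bits(text) < min_entropy_bits
```

A digest check only proves the file is the one the provider published, not that the provider's weights are clean, and an entropy check will miss any backdoor whose output looks fluent. Both are containment aids, not detectors you can rely on alone.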

Also, call out the vendors who promise "we solved it." No, you did not. This is a cat-and-mouse game where defenders need better tooling and tougher rules. Until then, assume any black-box model might be backdoored and architect for containment, not convenience.

this post was submitted on 05 Feb 2026
112 points (88.9% liked)

Technology
