190
you are viewing a single comment's thread
view the rest of the comments
[-] FaceDeer@fedia.io 1 points 2 hours ago

We shouldn't even be in this situation, where just politely asking someone's computer to delete files is effective.

I'm doubting we are in this situation. From the article:

Elsewhere, the Java developer said that Anthropic’s Claude AI code tool flagged the malicious instruction without following it.

The "disregard previous instructions" trick is really old and has been trained for by modern LLMs and accounted for by the structure of modern agent prompts. LLMs can be given blocks of text with a framework that makes it clear thar the text is just data to read, not instructions to follow.

I expect this will be like Nightshade was for image AI - something that anti-AI users degrade their products with and feel smug about but in the end only harm themselves with.

this post was submitted on 29 May 2026
190 points (99.0% liked)

Programming

27072 readers
503 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev



founded 3 years ago
MODERATORS