Donating our open-source alignment tool - Anthropic
(www.anthropic.com)
That’s all great, but it appears that unaligning a single parameter is enough to unalign the entire model.
So this is great for making sure you’re testing what you think you’re testing, but it won’t actually secure a model you plan to release openly.