So without giving away any personal information, I am a software developer in the United States, and as part of my job, I am working on some AI stuff.
I preliminarily apologize for boiling the oceans and such, I don't actually train or host AI, but still, being a part of the system is being a part of the system.
Anywhoo.
I was doing some research on abliteration, where the safety wheels are taken off of a LLM so that it will talk about things it normally shouldn't (Has some legit uses, some not so much...), and bumped into this interesting github project. It's an AI training dataset for ensuring AI doesn't talk about bad things. It has categories for "illegal" and "harmful" things, etc, and oh, what do we have here, a category for "missinformation_dissinformation"... aaaaaand
Shocker There's a bunch of anti-commie bullshit in there (It's not all bad, it does ensure LLMs don't take a favorable look at Nazis... kinda, I don't know much about Andriy Parubiy, but that sounds sus to me, I'll let you ctrl+f on that page for yourself).
Oh man. It's just so explicit. If anyone claims that they know communists are evil because an "objective AI came to that conclusion itself" you can bring up this bullshit. We're training AI's to specifically be anti-commie. Actually, I always assumed this, but I just found the evidence. So there's that.
Since I'm not an ML engineer specifically, this article from huggingface (The worlds most popular source for all AI model hosting, and all AI data for training, think of it as github but for AI, if you are familiar with github) will do it justice more than I can: https://huggingface.co/blog/mlabonne/abliteration
Long story short, there is a small (by comparison to the total size) part of the language model that's in charge of "refusal" if it detects you are asking something it shouldn't answer, and you can almost eliminate that layer completely by itself. Once that is done, the model won't refuse to answer anything, though it might still give context like "This is really illegal, but sure, here's.... (whatever you want)". Sometimes Abliteration can take out the intelligence of a system, so you have to train it back up again.