this post was submitted on 25 Feb 2026
13 points (60.3% liked)
Technology
So, what can we glean from this? Here are a few of my observations.
So these researchers are left poking at the compiled code of a closed source database. What a pain.
The funny part is, although they insist it's not a black box...
... The researchers clearly have no idea what the bad nodes are doing to make anything bad. They can just observe that when those nodes are hit, a bad thing happens. So the nodes themselves are black boxes to them.
The "bad" nodes are everywhere. If you look at a 1,000th of the database, you will find them scattered across it. The mystery deepens.
The "bad" nodes are among the first ones added to models, before anything else is filtered or further trained. This is very funny because it implies they're part of something crucial.
They coined a (second) new phrase: this earliest data that goes into the model, and persists after more data is added, they call "over-compliance" and insist it's the model trying to bullshit the user extra hard.
Alternative hypothesis: what if this data is simply the basis for even making the results legible?
Never mind, they just said it outright.
So, their approach can be used to flag likely hallucinated output and warn the user?
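If that's the idea, a crude version of such a warning could look like the sketch below. Everything here is made up for illustration: the node IDs, the threshold, and the shape of the activation data are assumptions, not anything from the research.

```python
# Hypothetical sketch of the flagging idea: if the "bad" nodes fire
# above some threshold while generating an output, warn the user.
BAD_NODE_IDS = {3, 17, 42}  # made-up indices, not from the paper
THRESHOLD = 0.8             # made-up cutoff

def flag_output(activations):
    """activations: dict of node_id -> activation strength for one output.

    Returns True if any known-bad node fired strongly, i.e. the output
    should be flagged as possibly hallucinated.
    """
    hot = [n for n in BAD_NODE_IDS if activations.get(n, 0.0) > THRESHOLD]
    return bool(hot)

# Example: node 3 fired strongly, so this output gets flagged.
activations = {3: 0.95, 5: 0.2, 17: 0.1}
if flag_output(activations):
    print("warning: output may be hallucinated")
```

Of course, whether you can actually read per-node activations out of a closed-source model at inference time is a separate question.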
So the tl;dr is just what we already knew: LLMs predict the most likely word to come next and have no concept of "true" or "false" information.
Indeed, to have such a concept would require understanding that information and any AI that actually understood information wouldn't be an LLM because LLMs are just fancy autocorrect.
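To make "most likely next word" concrete, here's a toy sketch (a bigram counter, nothing to do with how any real LLM is built) showing how frequency, not truth, decides the output:

```python
# Toy sketch: a model that just picks the statistically most likely
# next token has no notion of which continuation is true.
from collections import Counter, defaultdict

corpus = ("the moon is made of rock . "
          "the moon is made of cheese . "
          "the moon is made of cheese .").split()

# Count which token tends to follow which.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent continuation seen in the training data."""
    return following[token].most_common(1)[0][0]

# "cheese" follows "of" more often than "rock" in this corpus, so the
# model confidently outputs a falsehood.
print(predict_next("of"))  # -> cheese
```

Scale that idea up by a few billion parameters and you get fluent text, but the core objective is still "most likely continuation", not "true statement".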
There's a bit more to it: obviously, if a model gets more correct data pumped into it, it's more likely to produce correct output. But they found that at the core of every AI model they tested, when an incorrect output came along, certain nodes produced it. And those are among the nodes created at the earliest stage of building the model, before data gets added.
So with that in mind, the tl;dr is more like
AI models have two goals: first be readable, then be correct. And it appears the nodes causing incorrect outputs are the same ones intended to make the output readable.