LOL
(thelemmy.club)
"We did it, Patrick! We made a technological breakthrough!"
A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.
AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.
Is it my imagination or are LLMs actually getting less reliable as time goes on? I mean, they were never super reliable but it seems to me like the % of garbage is on the increase. I guess that's a combination of people figuring out how to game/troll the system, and AI companies trying to monetize their output. A perfect storm of shit.
As the internet content used to train LLMs contains more and more (recent) LLM output, the output quality feeds back into the training and impacts further quality down the line, since the model itself can't judge quality.
Let's do some math. There's a proper term for this math and some proper formula, but I wanna show how we get there.
To simplify the stochastic complexity, suppose an LLM's input (training material) and output quality can be modeled as a ratio of garbage. We'll assume that each iteration retrains the whole model on the output of the previous one, just to speed up the feedback effect, and that the randomisation produces some constant rate of quality deviation for each part of the input, that is: some portion of the good input produces bad output, while some portion of the bad input randomly generates good output.
For some arbitrary starting point, let's define that the rate is equal for both parts of the input, that this rate is 5% and that the initial quality is 100%. We can change these variables later, but we gotta start somewhere.
The first iteration, fed with 100% good input, will produce 5% bad output and 95% good.
The second iteration produces 0.25% good output from the bad part of the input and 4.75% bad output from the good input, adding up to a net quality loss of 4.5 percentage points, that is: 9.5% bad and 90.5% good.
The third iteration has a net quality change of -4.05pp (86.45% good), the fourth -3.645pp (82.805%) and you can see that, while the quality loss is slowing down, it's staying negative. More specifically, the rate of change for each step is 0.9 times the previous one, and a positive number times a negative one will stay negative.
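Just to sanity-check those numbers, here's the toy model as a few lines of Python (same assumptions as above: symmetric 5% flip rate, full retraining on the previous output, starting from 100% good):

```python
# Toy feedback model: each iteration, 5% of the good fraction flips to bad
# and 5% of the bad fraction flips back to good.
rate = 0.05
good = 1.0  # start at 100% good

for step in range(1, 5):
    change = rate * (1.0 - good) - rate * good  # good gained minus good lost
    good += change
    print(f"iteration {step}: {change * 100:+.3f}pp -> {good * 100:.3f}% good")

# iteration 1: -5.000pp -> 95.000% good
# iteration 2: -4.500pp -> 90.500% good
# iteration 3: -4.050pp -> 86.450% good
# iteration 4: -3.645pp -> 82.805% good
```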
The point at which the two would even out, under the assumption of equal deviation on both sides, is at 50% quality: both parts will produce the same total deviation and cancel out. It won't actually reach that equilibrium, since the rate of decay will slow down the closer it gets, but if "close enough" works for LLMs, it'll do for us here.
Changing the initial quality won't change this much: a starting quality of 80% would get us steps of -3pp, -2.7pp, -2.43pp; the pattern is the same. The rate of change also won't change the trend, just slow it down or accelerate it. The perfect LLM that would perfectly replicate its input would still just maintain the initial quality.
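Run the same loop longer with different starting points and you can watch it creep toward that 50% equilibrium regardless of where it starts (still just the toy model):

```python
rate = 0.05
for start in (1.0, 0.8):
    good = start
    for _ in range(50):
        good += rate * (1.0 - good) - rate * good
    print(f"start {start * 100:.0f}%: after 50 iterations -> {good * 100:.2f}% good")

# Both runs end up just above 50%, the point where the two flips cancel out.
```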
So the one thing we could change mathematically is the balance of deviation somehow, like reviewing the bad output and improving it before feeding it back. What would that do?
It would shift the resulting quality. At a rate of 10% deviation for bad input vs 5% for good input, the first step would still be -5pp, but the second would be 10%×5% − 5%×95% = −4.25pp instead of -4.5pp, and the equilibrium would be at about 66% quality instead. Put simply, if g is the rate of change towards good and b the rate towards bad, the result is a long-run quality of g÷(g+b).
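The asymmetric version drops straight into the same loop, and the closed form g÷(g+b) falls out of it (again just the toy model, not a claim about real training pipelines):

```python
g, b = 0.10, 0.05  # g: rate bad->good (reviewed and improved), b: rate good->bad
good = 1.0
history = []
for _ in range(100):
    good += g * (1.0 - good) - b * good
    history.append(good)

print(f"second step:         {(history[1] - history[0]) * 100:+.2f}pp")  # -4.25pp
print(f"long-run quality:    {good * 100:.1f}%")                         # ~66.7%
print(f"closed form g/(g+b): {g / (g + b) * 100:.1f}%")                  # 66.7%
```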
Of course, the assumptions we made initially don't entirely hold up to reality. For one, models probably aren't entirely retrained, so the impact of sloppy feedback will be muted. Additionally, they're not just getting their own output back, so the quality won't line up exactly. Rather, it'll be a mishmash of the output of other models and actual human content.
On one hand, that means that high-quality contributions by humans can compensate somewhat. On the other hand, you'd need a lot of high-quality human contributions to stem the tide of slop, and low-quality human content isn't helping. And I'm not sure the chance of accidentally getting something right despite poor training data is higher than that of missing some piece of semantic context a human would pick up on and bullshitting up some nonsense. Finally, the more humans rely on AI, the less high-quality content they themselves will put out.
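To picture the "humans can compensate somewhat" point in the same toy framework, you can assume each round's training data is a mix of the previous model output and some fixed share of human content; the share h and human_quality below are made-up illustration numbers, not anything measured:

```python
rate = 0.05          # symmetric flip rate, as before
h = 0.3              # assumed share of genuine human content in the training mix
human_quality = 0.9  # assumed average quality of that human content

good = 1.0
for _ in range(100):
    mix = (1 - h) * good + h * human_quality  # quality of this round's training data
    good = mix + rate * (1.0 - mix) - rate * mix
print(f"long-run quality with human content in the mix: {good * 100:.1f}%")

# With h = 0 this drifts to 50% as before; a bigger share of good human content
# pulls the equilibrium up, while a bigger share of bad human content drags it down.
```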
Essentially, the quality of GenAI content trained on the internet is probably going to ensloppify itself until it approaches some more or less stable level of shit. Human intervention can raise that level, advances in technology might shift things too, and maybe at some point, that level might approximate human quality.
That still won't make it smarter than humans, just faster. It won't make it more reliable for ~~randomly generating~~ "researching" facts, just more efficient in producing mistakes. And the most tragic irony of all?
The more people piss in the pool of training data, the more piss they'll feed their machines.
It was inevitable, when you need to train GPT on the entirety of the internet and more and more of that internet is AI hallucinations.
That is the point. Training an LLM on the entire internet will never be reliable, quite apart from the huge energy waste. It's not the same as training an LLM on specific tasks in science, medicine, biology, etc. There they can turn into very useful tools, as shown by results delivered in hours or minutes for investigations that would traditionally have taken years. AI algorithms are very efficient at specific tasks, going back to the first chess computers that roasted even world champions.