AI agents wrong ~70% of time: Carnegie Mellon study
(www.theregister.com)
LLMs are an interesting tool to fuck around with, but I see things that are hilariously wrong often enough to know that they should not be used for anything serious. Shit, they probably shouldn't be used for most things that are not serious either.
It's a shame that the same "AI" name gets applied to a whole host of different technologies. LLMs being limited in usability - yet hyped to the moon - is hurting other, more impressive advancements.
For example, speech synthesis is improving so much right now, which has been great for my sister who relies on screen reader software.
Recognising speech in loud environments and removing background noise from recordings are improving loads too.
My friend is involved in making a mod for Fallout 4, and there was a call for people to record voice lines - she says that some recordings of dubious quality that would've been unusable before can now be used without issue thanks to AI denoising algorithms. That is genuinely useful!
As are things like pattern/image analysis, which appears very promising in medical analysis.
All of these get branded as "AI". A layperson might not realise that they are completely different branches of technology, and might therefore reject useful applications of "AI" tech because they've learned not to trust anything branded as AI after being let down by LLMs.
LLMs are like a multitool: they can do lots of easy things mostly fine, as long as the task is not complicated and doesn't need to be exactly right. But they are being promoted as a whole toolkit, as if they could do the same work as effectively as a hammer, power drill, table saw, vise, and wrench.
What kind of tasks would you say don't need to be exactly right?
Description generators for TTRPGs, as you will read through them afterwards anyway and correct when necessary.
Generating lists of ideas. For creative writing, getting a bunch of ideas you can pick and choose from that fit the narrative you want.
A search engine like Perplexity.ai, which after searching summarizes each web page and adds a link to the page next to the summary. If the summary seems promising, you go to the real page to verify the actual information.
Simple code like HTML pages and boilerplate code that you will still review afterwards anyway.