AI agents wrong ~70% of time: Carnegie Mellon study
(www.theregister.com)
OK, but I wonder who really tries to use AI for that?
AI is not ready to replace a human completely, but it does some specific tasks remarkably well.
Yeah, we need more info to understand the results of this experiment.
We need to know exactly what these tasks were that they claim were validated by experts, because, like you're saying, the tasks I saw were not what I was expecting.
We need to know how the LLMs were set up. If you tell it to act like a chatbot and then give it a task, it will produce poorer results than if you set it up specifically to perform these sorts of tasks.
We need to see the actual prompts given to the LLMs. It may be that you simply need an expert to write prompts in order to get much better results. While that would be disappointing today, it's not all that different from how people needed to learn to use search engines.
We need to see the failure rate of humans performing the same tasks.
That’s literally how “AI agents” are being marketed. “Tell it to do a thing and it will do it for you.”
So? That doesn't mean they are supposed to be used like that.
Show me any marketing that isn't full of lies.