AI agents wrong ~70% of time: Carnegie Mellon study
Submitted 2 months ago by eli001@lemmy.world to technology@lemmy.world
https://www.theregister.com/2025/06/29/ai_agents_fail_a_lot/?td=rt-4a
Comments
iopq@lemmy.world 2 months ago
Now I’m curious, what’s the average score for humans?
SocialMediaRefugee@lemmy.world 2 months ago
I use it for very specific tasks and give it as much information as possible. I usually have to give it more feedback to get to the desired result. For instance, I’ll ask it how to resolve an error message, and I’ve even asked it for some short Python code. I almost always get good answers when doing that. Asking it about basic facts, like science questions, works too.
One thing I have had problems with is when the error is sort of an oddball: it will give me suggestions that don’t work with my OS/app version. Then I give it feedback, and eventually it loops back to its original suggestions, so it just can’t come up with an answer.
burgerpocalyse@lemmy.world 2 months ago
I don’t know why, but I am reminded of this clip about an eggless omelette: youtu.be/9Ah4tW-k8Ao
Melvin_Ferd@lemmy.world 2 months ago
How often do tech journalists get things wrong?
lmagitem@lemmy.zip 2 months ago
Color me surprised
MagicShel@lemmy.zip 2 months ago
I need to know the success rate of human agents in Mumbai (or some other outsourcing capital) for comparison.
I absolutely think this is not a good fit for AI, but I feel like the presumption is that a human would get it right nearly all of the time, and I’m just not confident that’s the case.
dylanmorgan@slrpnk.net 2 months ago
Claude why did you make me an appointment with a gynecologist? I need an appointment with my neurologist, I’m a man and I have Parkinson’s.
TimewornTraveler@lemmy.dbzer0.com 2 months ago
Got it, changing your gender to female
lemmy_outta_here@lemmy.world 2 months ago
Rookie numbers! Let’s pump them up!
To match their tech bro hypers, they should be wrong at least 90% of the time.
sircac@lemmy.world 2 months ago
Why would they be right beyond word-sequence frequencies?
dan69@lemmy.world 2 months ago
And it won’t be until humans can agree on what’s fact and true versus not… there is always someone or some group spreading mis/disinformation.
Ileftreddit@lemmy.world 2 months ago
Hey I went there
esc27@lemmy.world 2 months ago
30% might be high. I’ve worked with two different agent-creation platforms. Both require a huge amount of manual correction to work anywhere near accurately. I’m really not sure what the LLM actually provides other than some natural language processing.
In my experience, these sorts of agents are right 20% of the time, wrong 30% of the time, and fail entirely the other 50%. A human has to sit behind the curtain, manually review conversations, and program custom interactions for every failure.
In theory, once it is fully set up and all the edge cases are fixed, it will provide 24/7 support in a convenient chat format. But that takes a lot more man-hours than the hype suggests…
Weirdly, ChatGPT does a better job than a purpose-built, purchased agent.
atticus88th@lemmy.world 2 months ago
- this study was written with the assistance of an AI agent.
brown567@sh.itjust.works 2 months ago
70% seems pretty optimistic based on my experience…