AI agents wrong ~70% of time: Carnegie Mellon study
Submitted 2 weeks ago by eli001@lemmy.world to technology@lemmy.world
https://www.theregister.com/2025/06/29/ai_agents_fail_a_lot/?td=rt-4a
Comments
iopq@lemmy.world 2 weeks ago
Now I’m curious, what’s the average score for humans?
SocialMediaRefugee@lemmy.world 2 weeks ago
I use it for very specific tasks and give as much information as possible. I usually have to give it more feedback to get to the desired goal. For instance, I will ask it how to resolve an error message. I’ve even asked it for some short Python code. I almost always get good results when doing that. Asking it about basic facts works too, like science questions.
One thing I have had problems with is that if the error is sort of an oddball, it will give me suggestions that don’t work with my OS/app version. Then I give it feedback, and eventually it loops back to its original suggestions, meaning it couldn’t come up with an answer.
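A rough sketch of what front-loading that context can look like (the function name, prompt wording, and example values here are made up purely for illustration):

```python
# Hypothetical sketch: bundle OS/app version details with the error message
# up front, so the suggestions actually match the environment in use.
import platform

def build_prompt(error_message: str, app_name: str, app_version: str) -> str:
    """Assemble a prompt that states the environment before the error."""
    context = (
        f"OS: {platform.system()} {platform.release()}\n"
        f"App: {app_name} {app_version}"
    )
    return (
        "I'm hitting the error below. Suggest fixes that work for this exact "
        "OS and app version, and say so if a fix only applies to other versions.\n\n"
        f"{context}\n\nError:\n{error_message}"
    )

if __name__ == "__main__":
    print(build_prompt("E: Unable to locate package foo", "apt", "2.7.14"))
```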
burgerpocalyse@lemmy.world 2 weeks ago
I don’t know why, but I am reminded of this clip about an eggless omelette: youtu.be/9Ah4tW-k8Ao
Melvin_Ferd@lemmy.world 2 weeks ago
How often do tech journalists get things wrong?
lmagitem@lemmy.zip 2 weeks ago
Color me surprised
MagicShel@lemmy.zip 2 weeks ago
I need to know the success rate of human agents in Mumbai (or some other outsourcing capital) for comparison.
I absolutely think this is not a good fit for AI, but I feel like the presumption is that a human would get it right nearly all of the time, and I’m just not confident that’s the case.
dylanmorgan@slrpnk.net 2 weeks ago
Claude, why did you make me an appointment with a gynecologist? I need an appointment with my neurologist; I’m a man and I have Parkinson’s.
TimewornTraveler@lemmy.dbzer0.com 2 weeks ago
Got it, changing your gender to female
lemmy_outta_here@lemmy.world 2 weeks ago
Rookie numbers! Let’s pump them up!
To match their tech bro hypers, they should be wrong at least 90% of the time.
sircac@lemmy.world 2 weeks ago
Why would they be right beyond word sequence frequencies?
dan69@lemmy.world 2 weeks ago
And it won’t be until humans can agree on what’s a fact and true vs. not… there is always someone or some group spreading mis/disinformation.
Ileftreddit@lemmy.world 2 weeks ago
Hey I went there
esc27@lemmy.world 2 weeks ago
30% might be high. I’ve worked with two different agent creation platforms. Both require a huge amount of manual correction to work anywhere near accurately. I’m really not sure what the platforms actually provide other than some natural language processing.
In my experience these sorts of agents are right 20% of the time, wrong 30%, and fail entirely 50%. A human has to sit behind the curtain and manually review conversations and program custom interactions for every failure.
In theory, once it is fully set up and all the edge cases are fixed, it will provide 24/7 support in a convenient chat format. But that takes a lot more man-hours than the hype suggests…
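The “human behind the curtain” loop is roughly the shape below; the class and method names are entirely made up, just to show how failed conversations get queued for review and turned into hand-programmed custom interactions:

```python
# Hypothetical sketch of the review loop: the agent answers what it can,
# anything it gets wrong or fails on is queued for a human, and the human's
# fix becomes a hand-programmed custom interaction for next time.
from dataclasses import dataclass, field

@dataclass
class Conversation:
    user_message: str
    agent_reply: str | None = None
    outcome: str = "unreviewed"  # "right" | "wrong" | "failed"

@dataclass
class ReviewQueue:
    custom_interactions: dict[str, str] = field(default_factory=dict)
    pending: list[Conversation] = field(default_factory=list)

    def triage(self, convo: Conversation) -> None:
        # Anything the agent got wrong or failed on outright needs a human.
        if convo.outcome in ("wrong", "failed"):
            self.pending.append(convo)

    def patch(self, convo: Conversation, canned_reply: str) -> None:
        # The human reviews the conversation and programs a custom interaction
        # keyed on the message, so the same failure is handled next time.
        self.custom_interactions[convo.user_message] = canned_reply

if __name__ == "__main__":
    queue = ReviewQueue()
    convo = Conversation("Where is my order?", agent_reply="I booked you a flight.", outcome="wrong")
    queue.triage(convo)
    for failed in queue.pending:
        queue.patch(failed, "Let me check your order status for you.")
    print(queue.custom_interactions)
```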
Weirdly, ChatGPT does a better job than a purpose-built, purchased agent.
atticus88th@lemmy.world 2 weeks ago
- this study was written with the assistance of an AI agent.
brown567@sh.itjust.works 2 weeks ago
70% seems pretty optimistic based on my experience…