
AI agents wrong ~70% of time: Carnegie Mellon study

⁨984⁩ ⁨likes⁩

Submitted ⁨⁨2⁩ ⁨weeks⁩ ago⁩ by ⁨eli001@lemmy.world⁩ to ⁨technology@lemmy.world⁩

https://www.theregister.com/2025/06/29/ai_agents_fail_a_lot/?td=rt-4a


Comments

  • brown567@sh.itjust.works ⁨2⁩ ⁨weeks⁩ ago

    70% seems pretty optimistic based on my experience…

  • iopq@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    Now I’m curious, what’s the average score for humans?

  • SocialMediaRefugee@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    I use it for very specific tasks and give it as much information as possible. I usually have to give it more feedback to get to the desired goal. For instance, I will ask it how to resolve an error message. I’ve even asked it for some short Python code. I almost always get good feedback when doing that. Asking it about basic facts, like science questions, works too.

    One thing I have had problems with is if the error is sort of an oddball it will give me suggestions that don’t work with my OS/app version. Then I give it feedback and eventually it will loop back to its original suggestions, so it couldn’t come up with an answer.

  • burgerpocalyse@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    I don’t know why, but I am reminded of this clip about an eggless omelette: youtu.be/9Ah4tW-k8Ao

  • Melvin_Ferd@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    How often do tech journalists get things wrong?

  • lmagitem@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

    Color me surprised

  • MagicShel@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

    I need to know the success rate of human agents in Mumbai (or some other outsourcing capital) for comparison.

    I absolutely think this is not a good fit for AI, but I feel like the presumption is a human would get it right nearly all of the time, and I’m just not confident that’s the case.

  • dylanmorgan@slrpnk.net ⁨2⁩ ⁨weeks⁩ ago

    Claude, why did you make me an appointment with a gynecologist? I need an appointment with my neurologist. I’m a man and I have Parkinson’s.

    • TimewornTraveler@lemmy.dbzer0.com ⁨2⁩ ⁨weeks⁩ ago

      Got it, changing your gender to female

  • lemmy_outta_here@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    Rookie numbers! Let’s pump them up!

    To match their tech bro hypers, they should be wrong at least 90% of the time.

  • sircac@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    Why would they be right beyond word sequence frequencies?

  • dan69@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    And it won’t be until humans can agree on what’s a fact and true vs not… there is always someone or some group spreading mis/dis-information

  • Ileftreddit@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    Hey I went there

  • esc27@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    30% might be high. I’ve worked with two different agent creation platforms. Both require a huge amount of manual correction to work anywhere near accurately. I’m really not sure what the LLM actually provides other than some natural language processing.

    In my experience these sorts of agents are right 20% of the time, wrong 30%, and fail entirely 50%. A human has to sit behind the curtain and manually review conversations and program custom interactions for every failure.

    In theory, once it is fully set up and all the edge cases are fixed, it will provide 24/7 support in a convenient chat format. But that takes a lot more man hours than the hype suggests…

    Weirdly, ChatGPT does a better job than a purpose-built, purchased agent.

  • atticus88th@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    This study was written with the assistance of an AI agent.
