Comment on AI agents wrong ~70% of time: Carnegie Mellon study

outhouseperilous@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago

You get how that’s fucking useless, generally?

jsomae@lemmy.ml ⁨3⁩ ⁨months⁩ ago
Yes, that’s generally useless. It should not be shoved down people’s throats. 30% accuracy still has its uses, especially if the result can be programmatically verified.

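The "programmatically verified" point in the comment above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy task (finding a nontrivial factor of 1001), the `GUESS_POOL` stand-in for an unreliable generator that is right about 30% of the time, and the helper `retry_until_verified`. None of it comes from the study or from any real agent framework.

```python
import random
from typing import Callable, Optional


def retry_until_verified(
    generate: Callable[[], int],
    verify: Callable[[int], bool],
    max_attempts: int = 10,
) -> Optional[int]:
    """Return the first candidate accepted by verify, or None if all attempts fail."""
    for _ in range(max_attempts):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None


# Hypothetical toy task: find a nontrivial factor of 1001 (= 7 * 11 * 13).
N = 1001
# Stand-in for an unreliable generator: 3 of these 10 guesses are correct (30%).
GUESS_POOL = [7, 11, 13, 2, 3, 4, 5, 6, 8, 9]


def is_nontrivial_factor(candidate: int) -> bool:
    """Cheap, exact check that does not depend on how the guess was produced."""
    return 1 < candidate < N and N % candidate == 0


rng = random.Random(0)  # seeded only to make the sketch repeatable
answer = retry_until_verified(lambda: rng.choice(GUESS_POOL), is_nontrivial_factor)
print(answer)  # a verified factor, or None if all 10 guesses missed
```

The loop never returns an unverified answer, so an unreliable generator costs extra attempts rather than correctness. That only holds when a trustworthy verifier exists, which is the condition the comment attaches to its claim.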
- Knock_Knock_Lemmy_In@lemmy.world ⁨3⁩ ⁨months⁩ ago
  Run something with a 70% failure rate 10 times and you get a cumulative pass rate of about 97%. LLMs don’t get tired and they can be run in parallel.
  
  - MangoCats@feddit.it ⁨3⁩ ⁨months⁩ ago
    I have actually been doing this lately: iteratively prompting AI to write software and fix its errors until something useful comes out. It’s a lot like machine translation. I speak fluent C++ but not Rust, yet I can hammer away on the AI (with English-language prompts) until it produces passable Rust for something I could write myself in C++ in half the time and effort.
    
    I also don’t speak Finnish, but Google Translate can take what I say in English and put it into at least somewhat comprehensible Finnish without egregious translation errors most of the time.
    
    Is this useful? When C++ is getting banned for “security concerns” and Rust is the required language, it’s at least a little helpful.
    
    jsomae@lemmy.ml ⁨3⁩ ⁨months⁩ ago
    I’m impressed you can make strides with Rust with AI. I am in a similar boat, except I’ve found LLMs are terrible with Rust.
    
  - jsomae@lemmy.ml ⁨3⁩ ⁨months⁩ ago
    The problem is they are not i.i.d., so this doesn’t really work. It works a bit, which is in my opinion why chain-of-thought is effective (it gives the LLM a chance to posit a couple answers first). However, we’re already looking at “agents,” so they’re probably already doing chain-of-thought.
    
    Knock_Knock_Lemmy_In@lemmy.world ⁨3⁩ ⁨months⁩ ago
    Very fair comment. In my experience, even after increasing the temperature you get stuck in local minima.
    
    I was just trying to illustrate how 70% failure rates can still be useful.
    
  - davidagain@lemmy.world ⁨3⁩ ⁨months⁩ ago
    What’s 0.7^10?
    
    Knock_Knock_Lemmy_In@lemmy.world ⁨3⁩ ⁨months⁩ ago
    About 0.03
    
- outhouseperilous@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
  Less broadly useful than 20 tons of mixed texture human shit.
  
  - jsomae@lemmy.ml ⁨3⁩ ⁨months⁩ ago
    Are you just trolling, or do you seriously not understand how something that can do a task correctly with 30% reliability can be made useful if the result can be automatically verified?
    
    outhouseperilous@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
    It’s not a magical 30%; factors apply. It’s not even a mind that thinks and just isn’t very good.
    
    This isn’t like a magical die that gives you truth on a 5 or a 6, and lies on 1, 2, 3, and 4.
    
    This is a (very complicated, very large) language or other data graph that programmatically identifies an average. 30% of the time. Which means the more feasible that is, the easier it is to use a simpler, cheaper tool that will give you a better, more reliable answer much faster.
    
    And 20 tons of human shit has uses! If you know its provenance, there’s all sorts of population-level public health surveillance you can do to get ahead of disease trends! It’s also got some good agricultural stuff in it: phosphorus and stuff, if you can extract it.
    
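The retry arithmetic debated in the replies above (a 70% failure rate run 10 times, the value of 0.7^10, and the objection that attempts are not i.i.d.) can be checked with a short sketch. The function names and the two-type task split are hypothetical, chosen only so that the average single-attempt success rate stays at 30%. It is a toy model of correlated failures, not a description of how any real LLM behaves.

```python
from typing import List, Tuple


def pass_rate_iid(p_success: float, attempts: int) -> float:
    """P(at least one success) when every attempt succeeds independently."""
    return 1.0 - (1.0 - p_success) ** attempts


def pass_rate_mixture(task_types: List[Tuple[float, float]], attempts: int) -> float:
    """Same quantity when tasks differ in difficulty.

    task_types holds (weight, per_attempt_success) pairs. Attempts are treated
    as independent only once the task type is fixed.
    """
    return sum(weight * pass_rate_iid(p, attempts) for weight, p in task_types)


iid = pass_rate_iid(0.3, 10)  # 1 - 0.7**10

# Hypothetical split: half the tasks are never solved, half are solved 60% of
# the time. Average single-attempt success is still 0.5 * 0.0 + 0.5 * 0.6 = 0.3.
correlated = pass_rate_mixture([(0.5, 0.0), (0.5, 0.6)], 10)

print(f"0.7^10             = {0.7 ** 10:.4f}")   # 0.0282
print(f"i.i.d. pass rate   = {iid:.4f}")         # 0.9718
print(f"mixture pass rate  = {correlated:.4f}")  # 0.4999
```

Under the independence assumption, ten attempts give roughly a 97% chance of at least one success. In the toy mixture, the same 30% average yields only about 50%, because retries cannot rescue tasks the model never solves.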
MangoCats@feddit.it ⁨3⁩ ⁨months⁩ ago
As useless as a cubicle farm full of unsupervised workers.

- outhouseperilous@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
  Those are people who could be living their lives, pursuing their ambitions, whatever. That could get some shit done. Comparison not valid.
  
  - Honytawk@feddit.nl ⁨3⁩ ⁨months⁩ ago
    The comparison is about the correctness of their work.
    
    Their lives have nothing to do with it.
    
    davidagain@lemmy.world ⁨3⁩ ⁨months⁩ ago
    Human lives are the most important thing of all. Profits are irrelevant compared to human lives. I get that that’s not how Bezos sees the world, but he’s a monstrous outlier.
    
    outhouseperilous@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
    So, first, bad comparison.
    
    Second: if that’s the equivalent, why not do the one that makes the wealthy let a few pennies fall on actual people?
    