Comment on [JS Required] How Performant are LLM Agents(AI Chatbots) on Real World Work Tasks? They Fail 70% or More of The Time.

unpossum@sh.itjust.works ⁨10⁩ ⁨months⁩ ago

It’s easy to forget how fucking sci-fi the existence of these models is. I’m kind of excited to see where agent frameworks will be in five years’ time, and also a bit apprehensive…

SMillerNL@lemmy.world ⁨10⁩ ⁨months⁩ ago
We clearly read very different stories. In mine, the computers usually manage better than a 30% success rate.

Imagine if the internet at its inception failed to connect you 70% of the time. It’s not as impressive as most other inventions.

- Zos_Kia@lemmynsfw.com ⁨10⁩ ⁨months⁩ ago
  Don’t have to imagine it when you can just remember it. Getting online in the late 90s was a horror show; seriously, dial-up was super unreliable. And that was 20 years after its inception. It was shit, but also extremely popular.
  
- RedstoneValley@sh.itjust.works ⁨10⁩ ⁨months⁩ ago
  Similar to the crypto hype. Adoption is imminent, bro. Just a few more months, bro. Please, bro
  
  - Fizz@lemmy.nz ⁨10⁩ ⁨months⁩ ago
    Adoption is actually already there. The problem at the moment is getting people to pay for it, because currently the providers lose money on each prompt, even from paying users.
    
- unpossum@sh.itjust.works ⁨10⁩ ⁨months⁩ ago
  Heh. Dial-up BBSes, the internet, and the like were fairly unstable way back when, not to mention expensive if you weren’t at a university. It’s come a long way, and I imagine artificial intelligence will as well. My main point was that, just a few years ago, even a 66% failure rate on complex real-world tasks didn’t seem achievable this century. Transformers with attention really were a game changer in AI, and you have to be preternaturally blasé to ignore that. The problem, especially around here, has been how it’s sold (and to some extent that it’s sold at all), and the bubble that the hype has formed. I don’t disagree too much with that. I just think it’s a shame that it overshadows the very exciting and slightly scary tech at the bottom of the hype well, and leads to people dismissing it as advanced autocomplete, when it’s clearly something of a different degree.
  