
Comment on AI agents wrong ~70% of time: Carnegie Mellon study

TheGrandNagus@lemmy.world ⁨8⁩ ⁨months⁩ ago

LLMs are an interesting tool to fuck around with, but I see things that are hilariously wrong often enough to know that they should not be used for anything serious. Shit, they probably shouldn’t be used for most things that are not serious either.

It’s a shame that the same “AI” name gets applied to a whole host of different technologies: LLMs, limited in usability yet hyped to the moon, are hurting other, more impressive advancements.

For example, speech synthesis is improving so much right now, which has been great for my sister who relies on screen reader software.

Being able to recognise speech in loud environments is improving loads too.

So are things like pattern/image analysis, which appears very promising in medical analysis.

All of these get branded as “AI”. A layperson might not realise that they are completely different branches of technology, and therefore reject useful applications of “AI” tech because they’ve learned not to trust anything branded as AI after being let down by LLMs.



spankmonkey@lemmy.world ⁨8⁩ ⁨months⁩ ago
LLMs are like a multitool, they can do lots of easy things mostly fine as long as it is not complicated and doesn’t need to be exactly right. But they are being promoted as a whole toolkit as if they are able to be used to do the same work as effectively as a hammer, power drill, table saw, vise, and wrench.

- sugar_in_your_tea@sh.itjust.works ⁨8⁩ ⁨months⁩ ago
  Exactly! LLMs are useful when used properly, and terrible when not used properly, like any other tool. Here are some things they’re great at:
  
  writer’s block - get something relevant on the page to get ideas flowing
  
  narrowing down keywords for an unfamiliar topic
  
  getting a quick intro to an unfamiliar topic
  
  looking up facts you’re having trouble remembering (i.e. you’ll know it when you see it)
  
  Some things it’s terrible at:
  
  deep research - verify everything an LLM generates if accuracy is at all important
  
  creating important documents/code
  
  anything else where correctness is paramount
  
  I use LLMs a handful of times a week, and pretty much only when I’m stuck and need a kick in a new (hopefully right) direction.
  
  - spankmonkey@lemmy.world ⁨8⁩ ⁨months⁩ ago
    
    > narrowing down keywords for an unfamiliar topic
    >
    > getting a quick intro to an unfamiliar topic
    >
    > looking up facts you’re having trouble remembering (i.e. you’ll know it when you see it)
    
    I used to be able to use Google and other search engines to do these things before they went to shit in the pursuit of AI integration.
    
    sugar_in_your_tea@sh.itjust.works ⁨8⁩ ⁨months⁩ ago
    Google search was pretty bad at each of those, even when it was good. Finding new keywords to use is especially difficult the more niche your area of search is, and I’ve spent hours trying different combinations until I found a handful of specific keywords that worked.
    
    Likewise, search is bad for getting a broad summary, unless someone has bothered to write it on a blog. But most information goes way too deep and you still need multiple sources to get there.
    
    Fact lookup is one of the better uses for search, but again, I usually need to remember which source had what I wanted, whereas the LLM can usually pull it out for me.
    
    I use traditional search most of the time (usually DuckDuckGo), and LLMs if I think it’ll be more effective. We have some local models at work that I use, and they’re pretty helpful most of the time.
    
  - LePoisson@lemmy.world ⁨8⁩ ⁨months⁩ ago
    I will say I’ve found LLMs useful for code writing, but I’m not coding anything real at work. Just bullshit like SQL queries or Excel macro scripts or Power Automate crap.
    
    It still fucks up, but if you can read code and have a feel for it, you can walk it to where it needs to be (and see where it screwed up).
    
    sugar_in_your_tea@sh.itjust.works ⁨8⁩ ⁨months⁩ ago
    Exactly. Vibe coding is bad, but generating code for something you don’t touch often but can absolutely understand is totally fine. I’ve used it to generate SQL queries for relatively odd cases, such as CTEs for improving performance for large queries with common sub-queries. I always forget the syntax, and LLMs are great at generating something reasonable that I can tweak for my tables.
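    
    For anyone who also forgets the syntax, here is a minimal sketch of that CTE pattern. The table, columns, and data are made up for illustration, and SQLite (via Python’s built-in sqlite3 module) stands in for whatever database is actually in use. Whether a CTE actually helps performance depends on the engine and the query.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('alice', 120.0), ('alice', 30.0), ('bob', 15.0), ('carol', 200.0);
""")

# The CTE (WITH ...) names the shared sub-query once; the main query
# then refers to it twice instead of repeating the aggregation.
QUERY = """
WITH customer_totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM customer_totals
WHERE total > (SELECT AVG(total) FROM customer_totals)
ORDER BY customer;
"""

# Customers whose total spend is above the average total.
rows = conn.execute(QUERY).fetchall()
print(rows)
```

    The point is only the shape: define the common sub-query once under a name, then reuse that name, and tweak the tables and columns to fit.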
    
- TeddE@lemmy.world ⁨8⁩ ⁨months⁩ ago
  Because the tech industry hasn’t had a real hit of its favorite poison “private equity” in too long.
  
  The industry has played the same playbook since at least 2006. Likely before, but that’s when I personally started seeing it. My take is that they got addicted to the dotcom bubble and decided they can and should recreate the magic every 3-5 years or so.
  
  This time it’s AI, last it was crypto, and we’ve had web 2.0, 3.0, and a few others I’m likely missing.
  
  But yeah, it’s sold like a panacea every time, when really it’s revolutionary for like a handful of tasks.
  
- morto@piefed.social ⁨8⁩ ⁨months⁩ ago
  
  > and doesn't need to be exactly right
  
  What kind of tasks do you consider that don't need to be exactly right?
  
  - SheeEttin@lemmy.zip ⁨8⁩ ⁨months⁩ ago
    Most. I’ve used ChatGPT to sketch an outline of a document, reformulate accomplishments into review bullets, rephrase a task I didn’t understand, and similar stuff. None of it needed to be anywhere near perfect or complete.
    
  - Korhaka@sopuli.xyz ⁨8⁩ ⁨months⁩ ago
    Make a basic HTML template. I’ll be changing it up anyway.
    
  - Honytawk@feddit.nl ⁨8⁩ ⁨months⁩ ago
    Description generators for TTRPGs, as you will read through them afterwards anyway and correct when necessary.
    
    Generating lists of ideas. For creative writing, getting a bunch of ideas you can pick and choose from that fit the narrative you want.
    
    Simple code like HTML pages and boilerplate code that you will still review afterwards anyway.
    
  - spankmonkey@lemmy.world ⁨8⁩ ⁨months⁩ ago
    Things that are inspiration or for approximations. Layout examples, possible correlations between data sets that need coincidence to be filtered out, estimating time lines, and basically anything that is close enough for a human to take the output and then do something with it.
    
    For example, if you put in a list of ingredients it can spit out recipes that may or may not be what you want, but it can be an inspiration. Taking the output and cooking without any review and consideration would be risky.
    
- wise_pancake@lemmy.ca ⁨8⁩ ⁨months⁩ ago
  It is truly terrible marketing. It’s been obvious to me for years the value is in giving it to people and enabling them to do more with less, not outright replacing humans, especially not expert humans.
  
  I use AI/LLMs pretty much every day now. I write MCP servers and automate things with it and it’s mind blowing how productive it makes me.
  
  Just today I used these tools in a highly supervised way to complete a task that would have been a full day of tedious work, all done in an hour. That is fucking fantastic; it means I get to spend that time on more important things.
  
  It’s like giving an accountant Excel. Excel isn’t replacing them, but it’s taking care of specific tasks so they can focus on better things.
  
  On the reliability and accuracy front there is still a lot to be desired, sure. But for supervised chats where it’s calling my tools it’s pretty damn good.
  
- rottingleaf@lemmy.world ⁨8⁩ ⁨months⁩ ago
  That’s because they look like the “talking machines” from various sci-fi. Normies feel as if they are touching the very edge of progress. The rest of our life and the Internet kinda don’t give that feeling anymore.
  
NarrativeBear@lemmy.world ⁨8⁩ ⁨months⁩ ago
Just had a search yesterday on the App Store and Google Play Store to see what new “productivity apps” are around. Pretty much every app now has AI somewhere in its name.

- dylanmorgan@slrpnk.net ⁨8⁩ ⁨months⁩ ago
  Sadly a lot of that is probably marketing, with little to no LLM integration, but it’s basically impossible to know for sure.
  
punkwalrus@lemmy.world ⁨8⁩ ⁨months⁩ ago
I’d compare LLMs to a junior executive. Probably gets the basic stuff right, but check and verify for anything important or complicated. Break tasks down into easier steps.

- zbyte64@awful.systems ⁨8⁩ ⁨months⁩ ago
  Just stop anthropomorphizing LLMs and you’ll get a better sense of its limitations... or rather, their limitations. A junior developer actually learns from doing the job; an LLM only learns when they update the training corpus and develop an updated model.
  
  - jumping_redditor@sh.itjust.works ⁨8⁩ ⁨months⁩ ago
    an llm costs less, and won’t complain when yelled at
    
    zbyte64@awful.systems ⁨8⁩ ⁨months⁩ ago
    Why would you ever yell at an employee unless you’re bad at managing people? And you think you can manage an LLM better because it doesn’t complain when you’re obviously wrong?
    
floofloof@lemmy.ca ⁨8⁩ ⁨months⁩ ago
I tried to dictate some documents recently without paying the big bucks for specialized software, and was surprised just how bad Google and Microsoft’s speech recognition still is. Then I tried getting Word to transcribe some audio talks I had recorded, and that resulted in unreadable stuff with punctuation in all the wrong places. You could just about make out what it meant to say, so I tried asking various LLMs to tidy it up. That resulted in readable stuff that was largely made up and wrong, which also left out large chunks of the source material. In the end I just had to transcribe it all by hand.

It surprised me that these AI-ish products are still unable to transcribe speech coherently or tidy up a messy document without changing the meaning.

- wise_pancake@lemmy.ca ⁨8⁩ ⁨months⁩ ago
  I don’t know of any basic solutions that are super good, but I hear Whisper and the Whisper derivatives are decent for dictation these days.
  
  I have no idea how to run them though.
  