Comment on AI agents wrong ~70% of time: Carnegie Mellon study

spankmonkey@lemmy.world ⁨8⁩ ⁨months⁩ ago

LLMs are like a multitool: they can do lots of easy things mostly fine, as long as the task is not complicated and doesn’t need to be exactly right. But they are being promoted as a whole toolkit, as if they could do the same work as effectively as a hammer, power drill, table saw, vise, and wrench.


sugar_in_your_tea@sh.itjust.works ⁨8⁩ ⁨months⁩ ago
Exactly! LLMs are useful when used properly, and terrible when not used properly, like any other tool. Here are some things they’re great at:

writer’s block - get something relevant on the page to get ideas flowing

narrowing down keywords for an unfamiliar topic

getting a quick intro to an unfamiliar topic

looking up facts you’re having trouble remembering (i.e. you’ll know it when you see it)

Some things they’re terrible at:

deep research - verify everything an LLM generates if accuracy is at all important

creating important documents/code

anything else where correctness is paramount

I use LLMs a handful of times a week, and pretty much only when I’m stuck and need a kick in a new (hopefully right) direction.
- spankmonkey@lemmy.world ⁨8⁩ ⁨months⁩ ago
  narrowing down keywords for an unfamiliar topic
  
  getting a quick intro to an unfamiliar topic
  
  looking up facts you’re having trouble remembering (i.e. you’ll know it when you see it)
  
  I used to be able to use Google and other search engines to do these things before they went to shit in the pursuit of AI integration.
  
  - sugar_in_your_tea@sh.itjust.works ⁨8⁩ ⁨months⁩ ago
    Google search was pretty bad at each of those, even when it was good. Finding new keywords to use is especially difficult the more niche your area of search is, and I’ve spent hours trying different combinations until I found a handful of specific keywords that worked.
    
    Likewise, search is bad for getting a broad summary, unless someone has bothered to write it on a blog. But most information goes way too deep and you still need multiple sources to get there.
    
    Fact lookup is one of the better uses for search, but again, I usually need to remember which source had what I wanted, whereas the LLM can usually pull it out for me.
    
    I use traditional search most of the time (usually DuckDuckGo), and LLMs if I think it’ll be more effective. We have some local models at work that I use, and they’re pretty helpful most of the time.
    
    jjjalljs@ttrpg.network ⁨8⁩ ⁨months⁩ ago
    It is absolutely stupid, stupid to the tune of “you shouldn’t be a decision maker”, to think an LLM is a better use for “getting a quick intro to an unfamiliar topic” than reading an actual intro on an unfamiliar topic. For most topics, wikipedia is right there, complete with sources. For obscure things, an LLM is just going to lie to you.
    
    As for “looking up facts when you have trouble remembering it”, using the lie machine is a terrible idea. It’s going to say something plausible, and you tautologically are not in a position to verify it. And, as above, you’d be better off finding a reputable source. If I type in “how do i strip whitespace in python?” an LLM could very well say “it’s your_string.strip()”. That’s wrong. Just send me to the fucking official docs.
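
For reference, here is a minimal sketch of the built-in `str` methods involved; the sample string and variable names are made up for illustration. As the official Python docs describe it, `str.strip()` with no arguments removes leading and trailing whitespace only, so whether it counts as the right answer depends on whether interior whitespace also has to go:

```python
# Illustrative sample string (an assumption, not from the thread).
text = "  \thello   world \n"

# Leading and trailing whitespace only; interior spaces are kept.
trimmed = text.strip()

# One side at a time.
left_trimmed = text.lstrip()
right_trimmed = text.rstrip()

# split() with no arguments splits on any run of whitespace,
# so joining the pieces collapses or removes interior whitespace.
collapsed = " ".join(text.split())
removed = "".join(text.split())

for name, value in [
    ("strip", trimmed),
    ("lstrip", left_trimmed),
    ("rstrip", right_trimmed),
    ("collapsed", collapsed),
    ("removed", removed),
]:
    print(f"{name:>9}: {value!r}")
```

Either way, the quickest check is to run the call on a sample string and compare it against the docs.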
    
    There are probably edge or special cases, but for general search on the web? LLMs are worse than search.
    
    spankmonkey@lemmy.world ⁨8⁩ ⁨months⁩ ago
    No search engine or AI will be great with vague descriptions of niche subjects because by definition niche subjects are too uncommon to have a common pattern of ‘close enough’.
    
- LePoisson@lemmy.world ⁨8⁩ ⁨months⁩ ago
  I will say I’ve found LLMs useful for code writing, but I’m not coding anything real at work. Just bullshit like SQL queries or Excel macro scripts or Power Automate crap.
  
  It still fucks up but if you can read code and have a feel for it you can walk it where it needs to be (and see where it screwed up)
  
  - sugar_in_your_tea@sh.itjust.works ⁨8⁩ ⁨months⁩ ago
    Exactly. Vibe coding is bad, but generating code for something you don’t touch often but can absolutely understand is totally fine. I’ve used it to generate SQL queries for relatively odd cases, such as CTEs for improving performance for large queries with common sub-queries. I always forget the syntax, and LLMs are great at generating something reasonable that I can tweak for my tables.
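
A minimal sketch of the kind of CTE described above, run against an in-memory SQLite database from Python. The `orders` table, its columns, and the sample rows are all hypothetical, and whether a CTE actually improves performance depends on the database engine and its query planner. The point here is just the `WITH` syntax and reusing one sub-query twice:

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('alice', 120.0), ('alice', 80.0), ('bob', 40.0), ('carol', 300.0);
""")

# The CTE names the per-customer totals once, then the main query
# references it twice: as the row source and inside the AVG sub-query.
query = """
WITH customer_totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM customer_totals
WHERE total > (SELECT AVG(total) FROM customer_totals)
ORDER BY customer;
"""

rows = conn.execute(query).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

Without the CTE, the grouped sub-query would have to be written out twice, which is exactly the part that is easy to forget and tedious to keep in sync.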
    
    LePoisson@lemmy.world ⁨8⁩ ⁨months⁩ ago
    
    I always forget the syntax
    
    Me with literally every bit of code I touch, always and forever.
    
TeddE@lemmy.world ⁨8⁩ ⁨months⁩ ago
Because the tech industry hasn’t had a real hit of its favorite poison, “private equity”, in too long.

The industry has played the same playbook since at least 2006. Likely before, but that’s when I personally started seeing it. My take is that they got addicted to the dotcom bubble and decided they can and should recreate the magic every 3-5 years or so.

This time it’s AI, last it was crypto, and we’ve had web 2.0, 3.0, and a few others I’m likely missing.

But yeah, it’s sold like a panacea every time, when really it’s revolutionary for like a handful of tasks.

morto@piefed.social ⁨8⁩ ⁨months⁩ ago

and doesn't need to be exactly right

What kind of tasks do you consider that don't need to be exactly right?

- SheeEttin@lemmy.zip ⁨8⁩ ⁨months⁩ ago
  Most. I’ve used ChatGPT to sketch an outline of a document, reformulate accomplishments into review bullets, rephrase a task I didn’t understand, and similar stuff. None of it needed to be anywhere near perfect or complete.
  
- Korhaka@sopuli.xyz ⁨8⁩ ⁨months⁩ ago
  Make a basic HTML template. I’ll be changing it up anyway.
  
- Honytawk@feddit.nl ⁨8⁩ ⁨months⁩ ago
  Description generators for TTRPGs, as you will read through them afterwards anyway and correct when necessary.
  
  Generating lists of ideas. For creative writing, getting a bunch of ideas you can pick and choose from that fit the narrative you want.
  
  Simple code like HTML pages and boilerplate code that you will still review afterwards anyway.
  
- spankmonkey@lemmy.world ⁨8⁩ ⁨months⁩ ago
  Things that are for inspiration or approximation. Layout examples, possible correlations between data sets where coincidence still needs to be filtered out, estimating timelines, and basically anything that is close enough for a human to take the output and then do something with it.
  
  For example, if you put in a list of ingredients it can spit out recipes that may or may not be what you want, but it can be an inspiration. Taking the output and cooking without any review and consideration would be risky.
  
wise_pancake@lemmy.ca ⁨8⁩ ⁨months⁩ ago
It is truly terrible marketing. It’s been obvious to me for years the value is in giving it to people and enabling them to do more with less, not outright replacing humans, especially not expert humans.

I use AI/LLMs pretty much every day now. I write MCP servers and automate things with it and it’s mind blowing how productive it makes me.

Just today I used these tools in a highly supervised way to complete a task that would have been a full day of tedious work, all done in an hour. That is fucking fantastic; it means I get to spend that time on more important things.

It’s like giving an accountant Excel. Excel isn’t replacing them, but it’s taking care of specific tasks so they can focus on better things.

On the reliability and accuracy front there is still a lot to be desired, sure. But for supervised chats where it’s calling my tools it’s pretty damn good.

rottingleaf@lemmy.world ⁨8⁩ ⁨months⁩ ago
That’s because they look like the “talking machines” from various sci-fi. Normies feel as if they are touching the very edge of progress. The rest of our lives and the Internet kinda don’t give that feeling anymore.
