Comment on AI agents wrong ~70% of time: Carnegie Mellon study

davidagain@lemmy.world 1 week ago

I would be in breach of contract to tell you the details. How about you just stop trying to blame me for the clear and obvious lies that the LLM churned out and start believing that LLMs ARE strikingly fallible, because, buddy, you have your head so far in the sand on this issue it’s weird.

The solution to the problem was to realise that an LLM cannot be trusted for accuracy even if the first few results are completely accurate; the bullshit will creep in. Don’t trust the LLM. Check every fucking thing.

In the end I wrote a quick script that broke the input up on tab characters and wrote the sentence. That’s how formulaic it was. I regretted deeply trying to get an LLM to use data.
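For anyone curious, the deterministic replacement is genuinely a few lines. This is a sketch of the kind of script described above, not the actual one (the real field names and sentence template are under NDA, so the three fields and the wording here are made up):

```python
def line_to_sentence(line: str) -> str:
    # Break the input up on tab characters.
    # Field names and the sentence template below are hypothetical.
    name, quantity, unit = line.rstrip("\n").split("\t")
    # Write the formulaic sentence deterministically: no model, no drift,
    # and it either works or raises an error you can see.
    return f"{name} measured {quantity} {unit}."

print(line_to_sentence("Sample A\t42\tmg"))  # Sample A measured 42 mg.
```

Unlike the LLM, this fails loudly (a `ValueError` on a malformed line) instead of quietly inventing plausible-looking output.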

The frustrating thing is that it is clearly capable of doing the task some of the time, but drifting off into FANTASY is its strong suit, and it doesn’t matter how firmly or how often you ask it to be accurate or to use the input carefully. It’s going to lie to you before long. It’s an LLM. Bullshitting is what it does. Get it to do ONE THING only, then check the fuck out of its answer. Don’t trust it to tell you the truth any more than you would trust Donald J Trump to.
