Comment on AI agents wrong ~70% of time: Carnegie Mellon study

Melvin_Ferd@lemmy.world ⁨5⁩ ⁨months⁩ ago

What does "I give it data to put in a formulaic sentence" mean?

Why not just share the details? I often find a lot of people saying it’s doing crazy things who never like to share the details. It’s very similar to discussing things with Trump supporters, who do the same shit when pressed on details about stuff they say occurs.


davidagain@lemmy.world ⁨5⁩ ⁨months⁩ ago
I would be in breach of contract to tell you the details. How about you just stop trying to blame me for the clear and obvious lies that the LLM churned out and start believing that LLMs ARE strikingly fallible, because, buddy, you have your head so far in the sand on this issue it’s weird.

The solution to the problem was to realise that an LLM cannot be trusted for accuracy. Even if the first few results are completely accurate, the bullshit will creep in. Don’t trust the LLM. Check every fucking thing.

In the end I wrote a quick script that broke the input up on tab characters and wrote the sentence. That’s how formulaic it was. I regretted deeply trying to get an LLM to use data.
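
For a sense of how small such a script can be, here is a minimal sketch in Python. It is not the actual script, and the real data can't be shared: the template, the field names (`name`, `score`, `year`) and the helper `line_to_sentence` are all made up for illustration.

```python
# Hypothetical template and column names; the real ones are not in this thread.
TEMPLATE = "{name} scored {score} points in {year}."
FIELDS = ("name", "score", "year")


def line_to_sentence(line: str) -> str:
    """Split one tab-separated record and drop its values into the template."""
    values = line.rstrip("\r\n").split("\t")
    # Fail loudly on a malformed row instead of guessing at missing values.
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}")
    return TEMPLATE.format(**dict(zip(FIELDS, values)))


# Invented sample rows, standing in for the real tab-separated input.
sample_rows = ["Ada\t42\t2024", "Grace\t37\t2023"]
for row in sample_rows:
    print(line_to_sentence(row))
```

The point of doing it this way is that the same input always gives the same sentence, and a row with the wrong number of fields raises an error instead of being quietly filled in.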

The frustrating thing is that it is clearly capable of doing the task some of the time, but drifting off into FANTASY is its strong suit, and it doesn’t matter how firmly or how often you ask it to be accurate or use the input carefully. It’s going to lie to you before long. It’s an LLM. Bullshitting is what it does. Get it to do ONE THING only, then check the fuck out of its answer. Don’t trust it to tell you the truth any more than you would trust Donald J Trump to.

- Melvin_Ferd@lemmy.world ⁨5⁩ ⁨months⁩ ago
  This is crazy. I’ve literally been saying they are fallible. You’re saying you professionally fed an LLM some type of dataset. I can’t really say what it was you were trying to accomplish, but I’m arguing that processing data is not what they’re trained to do. LLMs are incredible tools, and I’m tired of trying to act like they’re not because people keep using them for things they’re not built to do. It’s not a fire-and-forget thing. It does need to be supervised and verified. It’s not exactly an answer machine. But it’s so good at parsing text and documents, summarizing, formatting, and acting like a search engine that you can communicate with rather than trying to grok some arcane sentence. Its power is in language applications.
  
  - davidagain@lemmy.world ⁨5⁩ ⁨months⁩ ago
    
    it’s so good at parsing text and documents, summarizing
    
    No
    
    Melvin_Ferd@lemmy.world ⁨5⁩ ⁨months⁩ ago
    If it’s as bad as you say, could you give an example of a prompt where it’ll tell you incorrect information?
    