Comment on AI agents wrong ~70% of time: Carnegie Mellon study

Maybe the marketers should be a bit pickier about what they slap “AI” on, and maybe decision makers should be a little less eager to follow whatever Better Autocomplete spits out. But maybe that’s just me, and we really should be pretending that all these algorithms have made humans obsolete and that generating convincing language is better than correspondence with reality.

surph_ninja@lemmy.world 5 months ago
I’m not sure the anti-AI marketing stance is any more solid of a position. Though it’s probably easier to defend, since it’s so vague and not based on anything measurable.

- chaonaut@lemmy.4d2.org 5 months ago
  Calling AI measurable is somewhat unfounded. Between not having a coherent, agreed-upon definition of what does and does not constitute an AI (we are, after all, discussing LLMs as though they were AGI), and the difficulty that exists in discussing the qualifications of human intelligence, saying that a given metric covers how well a thing is an AI isn’t really founded on anything but preference. We could, for example, say that mathematical ability is indicative of intelligence, but claiming FLOPS is a proxy for intelligence falls rather flat. We can measure things about the various algorithms, but that’s an awful long ways off from talking about AI itself (unless we’ve bought into the marketing hype).
  
  - surph_ninja@lemmy.world 5 months ago
    So you’re saying the article’s measurement of AI agents being wrong 70% of the time is made up?
    
    Jakeroxs@sh.itjust.works 5 months ago
    I would definitely bet it’s made up and poorly designed.
    
    I wish that weren’t the case, because having actual data would be nice, but these are almost always funded with some sort of intentional slant. For example, the nic vape safety studies, where they clearly don’t use the product sanely and then make wild claims about how there’s lead in the vapes!
    
    Homie, you’re fucking running the shit completely dry for longer than any human could possibly actually hit the vape. No shit it’s producing carcinogens.
    
    Go burn a bunch of paper and directly inhale the smoke and tell me paper is dangerous.
    
    chaonaut@lemmy.4d2.org 5 months ago
    I mean, sure, in that the expectation is that the article is talking about AI in general. The cited paper is discussing LLMs and their ability to complete tasks. So we have to agree that LLMs are what we mean by AI, and that their ability to complete tasks is a valid metric for AI. If we accept the marketing hype, then of course LLMs are exactly what we’ve been talking about with AI, and we’ve accepted LLMs’ features and limitations as what AI is. If LLMs are prone to filling in whatever most closely fits the model without regard to accuracy, then, by accepting LLMs as what we mean by AI, AI fits to its model without regard to accuracy.