Comment on AI agents wrong ~70% of time: Carnegie Mellon study
surph_ninja@lemmy.world 6 days ago
I’m not sure the anti-AI marketing stance is any more solid of a position. Though it’s probably easier to defend, since it’s so vague and not based on anything measurable.
chaonaut@lemmy.4d2.org 6 days ago
Calling AI measurable is somewhat unfounded. Between the lack of a coherent, agreed-upon definition of what does and does not constitute an AI (we are, after all, discussing LLMs as though they were AGI) and the difficulty of even pinning down what qualifies as human intelligence, saying that a given metric captures how well a thing is an AI isn’t really founded on anything but preference. We could, for example, say that mathematical ability is indicative of intelligence, but claiming FLOPS is a proxy for intelligence falls rather flat. We can measure things about the various algorithms, but that’s an awfully long way off from talking about AI itself (unless we’ve bought into the marketing hype).
surph_ninja@lemmy.world 6 days ago
So you’re saying the article’s measurements about AI agents being wrong 70% of the time are made up?
Jakeroxs@sh.itjust.works 6 days ago
I would definitely bet it’s made up and poorly designed.
I wish that weren’t the case, because having actual data would be nice, but these studies are almost always funded with some sort of intentional slant. For example, nicotine vape safety studies where they clearly don’t use the product sanely and then make wild claims about there being lead in the vapes!
Homie, you’re fucking running the shit completely dry for longer than any human could possibly actually hit the vape; no shit it’s producing carcinogens.
Go burn a bunch of paper and directly inhale the smoke and tell me paper is dangerous.
surph_ninja@lemmy.world 6 days ago
Agreed. A 70% failure rate is astoundingly high for today’s models. Something stinks.
chaonaut@lemmy.4d2.org 6 days ago
I mean, sure, in that the expectation is that the article is talking about AI in general. The cited paper is discussing LLMs and their ability to complete tasks. So we have to agree both that LLMs are what we mean by AI, and that their ability to complete tasks is a valid metric for AI. If we accept the marketing hype, then of course LLMs are exactly what we’ve been talking about with AI, and we’ve accepted LLMs’ features and limitations as what AI is. And if LLMs are prone to filling in whatever best fits their model without regard to accuracy, then by accepting LLMs as what we mean by AI, we accept that AI fills in its model without regard to accuracy.
surph_ninja@lemmy.world 6 days ago
Except you yourself just stated that it was impossible to measure the performance of these things. When it’s favorable to AI, you claim it can’t be measured. When it’s unfavorable to AI, you claim of course it’s measurable. Your argument is so flimsy and your understanding so limited that you can’t even stick to a single idea. You’re all over the place.