Anthropic tested the ability of Claude (LLM, AI chatbot) to manage a physical “storefront” to mixed results, as the AI struggled with pricing strategy and inventory management

Submitted 9 months ago by Pro@programming.dev to technology@lemmy.world

https://www.anthropic.com/research/project-vend-1

Comments

django@discuss.tchncs.de 9 months ago
All the tasks could have been easily solved with some basic APIs and algorithms.

Dojan@pawb.social 9 months ago
This is so funny. It fails miserably and they’re all “yeah so this is promising.”

Sure, a world where your manager hallucinates meetings with you and assesses you poorly for not performing according to plans that were hallucinated through said meetings sounds like a fantastic idea.

dhork@lemmy.world 9 months ago
It is an interesting article, even if its conclusions are entirely too rosy. The “storefront” was a vending machine, and the bot was instructed to interact with Anthropic employees (with an hourly cost attached) to do all physical interactions. While the bot did a decent job managing the stock most of the time, it made a lot of bad decisions based on trying to be too helpful to its customers. It also frequently hallucinated. But as anyone who owns a small business knows, one bad decision could put it under, so saying that an AI can manage a vending machine well “most of the time” is equivalent to saying it can’t do the job at all.

Their conclusion is that with a bit more work, Claude might be able to perform as a middle-manager. To me, that says more about how useless middle-management is than how capable their AI is.

- sepi@piefed.social 9 months ago
  So what you are saying is the AI is ready to replace tech CEOs.
  
Uff@lemmy.world 9 months ago
This shit needs to start being regulated.

- Pro@programming.dev 9 months ago
  How so?
  
  - Uff@lemmy.world 9 months ago
    AI needs to be regulated. It’s already creeping in everywhere. People are getting fired and replaced with sloppy AI, petabytes of people’s data and work are being held hostage, the list goes on. You can’t even ask a question without being asked to hand over personal data to the AI, and you certainly can’t do whatever you want with it.
    
    If it’s going to replace humans, it needs to be regulated like one.
    
A_norny_mousse@feddit.org 9 months ago
Anybody who thought the answer could have been even remotely close to Yes is delusional.

- Womble@lemmy.world 9 months ago
  I doubt anyone expected it to work completely, but it is interesting to see to what extent it worked and how it failed (hallucinations and sycophancy).
  
  - A_norny_mousse@feddit.org 9 months ago
    True; I just hate headlines that ask stupid questions.
    
    But then again, such attempts always rest on the premise that it could work, which annoys me no less.
    