Comment

Comment on Meta chief technology officer(CTO) explains why the smart glasses demo failed at Meta Connect — and it wasn’t the Wi-Fi

TastehWaffleZ@lemmy.world ⁨2⁩ ⁨months⁩ ago

That sounds like complete damage control lies. Why would the AI think the chef had finished prepping the sauce just because there was heavy usage??

source

Sort:hotnew top

PhilipTheBucket@piefed.social ⁨2⁩ ⁨months⁩ ago
Yeah it’s a bunch of shit. I’m not an expert obviously, just talking out of my ass, but:

Running inference for all the devices in the building to “our dev server” would not have maintained a usable level of response time for any of them, unless he meant to say “the dev cluster” or something and his home wifi glitched right at that moment and made it sound different

LLMs don’t degrade by giving wrong answers, they degrade by stopping producing tokens

Meta already has shown itself to be okay with lying

GUYS JUST USE FUCKING CANNED ANSWERS WITH THE RIGHT SOUNDING VOICE, THIS ISN’T ROCKET SCIENCE, THAT’S HOW YOU DO DEMOS WHEN YOUR SHIT’S NOT DONE YET
source
- Sasha@lemmy.blahaj.zone ⁨2⁩ ⁨months⁩ ago
  LLMs can degrade by giving “wrong” answers, but not because of network congestion ofc.
  
  That paper is fucking hilarious, but the tl;dr is that when asked to manage a vending machine business for an extended period of time, they eventually go completely insane. Some have an existential crisis, some call the whole thing a conspiracy and call the FBI, etc. it’s amazing how trash they are.
  
  source
  - PhilipTheBucket@piefed.social ⁨2⁩ ⁨months⁩ ago
    Initial thought: Well… but this is a transparently absurd way to set up an ML system to manage a vending machine. I mean it is a useful data point I guess, but to me it leads to the conclusion “Even though LLMs sound to humans like they know what they’re doing, they does not, don’t just stick the whole situation into the LLM input and expect good decisions and strategies to come out of the output, you have to embed it into a more capable and structured system for any good to come of it.”
    
    Updated thought, after reading a little bit of the paper: Holy Christ on a pancake. Is this architecture what people have been meaning by “AI agents” this whole time I’ve been hearing about them? Yeah this isn’t going to work. What the fuck, of course it goes insane over time. I stand corrected, I guess, this is valid research pointing out the stupidity of basically putting the LLM in the driver’s seat of something even more complicated than the stuff it’s already been shown to fuck up, and hoping that goes okay.
    
    source
    Sasha@lemmy.blahaj.zone ⁨2⁩ ⁨months⁩ ago
    I’m pretty sure they touch on those points in the paper, they knew they were overloading it and were looking at how it handled that in particular. My understanding is that they’re testing failure modes to try and probe the inner workings to some degree; they discuss the impact of filling up the context in the abstract, mention it’s designed to stress test and are particularly interested in memory limits, so I’m pretty sure they’ve deliberately chosen to not cater to an LLMs ideal conditions. It’s not really a real world use case of LLMs running a business (even if that’s the framing given initially), it’s an experiment meant to break them in a simulated environment. The last line of the abstract kind highlights this, they’re hoping to find flaws to improve the models generally.
    
    Either way, I just meant to point out that they can absolutely just output junk as a failure mode.
    
    source
    -> View More Comments
Ulrich@feddit.org ⁨2⁩ ⁨months⁩ ago
Even if it was true, your server can’t handle a couple hundred simultaneous requests? That’s not promising either. Although at least that would be easier to fix than the real problem, which is incredibly obvious to anyone who has ever used this technology, and that’s that it doesn’t fucking work, and is flawed on a fundamental level.

source
- KairuByte@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
  If this was a tech demo, it tracks that they wouldn’t be using overpowered hardware. Why lug around a full server when they can just load up the software on a laptop, considering they weren’t expecting hundreds of invokes at the exact same moment.
  
  source
  - synae@lemmy.sdf.org ⁨2⁩ ⁨months⁩ ago
    “lug around”? the server(s) are 100% in a data center, no way this is a single computer on prem. no company, especially facebook, deploys software that way in 2025
    
    source
    KairuByte@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
    It really depends. A local machine is guaranteed to not have issues if the general internet goes down. It’s also going to reduce latency considerably.
    
    There are many reasons to have a dev box local to the demonstration. Just because they wouldn’t deploy it that way in production doesn’t mean they wouldn’t deploy a demo in that same way.
    
    source
- masterspace@lemmy.ca ⁨2⁩ ⁨months⁩ ago
  How is it fundamentally flawed?
  
  source
  - Ulrich@feddit.org ⁨2⁩ ⁨months⁩ ago
    Check out the OP
    
    source