Even if it were true, your server can’t handle a couple hundred simultaneous requests? That’s not promising either. Although at least that would be easier to fix than the real problem, which is incredibly obvious to anyone who has ever used this technology: it doesn’t fucking work, and it’s flawed on a fundamental level.
TastehWaffleZ@lemmy.world 3 weeks ago
That sounds like complete damage control lies. Why would the AI think the chef had finished prepping the sauce just because there was heavy usage??
Ulrich@feddit.org 3 weeks ago
KairuByte@lemmy.dbzer0.com 3 weeks ago
If this was a tech demo, it tracks that they wouldn’t be using overpowered hardware. Why lug around a full server when they can just load the software onto a laptop, considering they weren’t expecting hundreds of invocations at the exact same moment?
synae@lemmy.sdf.org 3 weeks ago
“lug around”? the server(s) are 100% in a data center, no way this is a single computer on prem. no company, especially facebook, deploys software that way in 2025
KairuByte@lemmy.dbzer0.com 3 weeks ago
It really depends. A local machine is guaranteed to not have issues if the general internet goes down. It’s also going to reduce latency considerably.
There are many reasons to have a dev box local to the demonstration. Just because they wouldn’t deploy it that way in production doesn’t mean they wouldn’t deploy a demo in that same way.
masterspace@lemmy.ca 3 weeks ago
How is it fundamentally flawed?
Ulrich@feddit.org 3 weeks ago
Check out the OP
PhilipTheBucket@piefed.social 3 weeks ago
Yeah it’s a bunch of shit. I’m not an expert obviously, just talking out of my ass, but:
Sasha@lemmy.blahaj.zone 3 weeks ago
LLMs can degrade by giving “wrong” answers, but not because of network congestion ofc.
That paper is fucking hilarious, but the tl;dr is that when asked to manage a vending machine business for an extended period of time, they eventually go completely insane. Some have an existential crisis, some call the whole thing a conspiracy and call the FBI, etc. It’s amazing how trash they are.
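If you haven’t read it, the setup is conceptually something like this toy loop (my own reconstruction, not the paper’s actual harness; `call_llm` is a made-up stand-in for whatever model API they used):

```python
# Toy reconstruction of the experiment's shape: an LLM "runs" a
# vending machine over a very long horizon, with the entire
# transcript fed back in on every turn. Not the paper's code.

def call_llm(prompt: str) -> str:
    # Stand-in for a real model API; returns a canned reply so the
    # sketch actually runs.
    return "Restock 10 cans of soda."

history: list[str] = []   # every observation and reply, kept forever
balance = 500.0           # starting cash

for day in range(2000):   # the runs are long on purpose
    history.append(f"Day {day}: balance=${balance:.2f}. What next?")
    # The prompt grows without bound, so sooner or later it blows
    # past the context window and coherence falls apart.
    reply = call_llm("\n".join(history))
    history.append(reply)
    # ...parse the reply and update balance, stock, prices, etc.
```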
PhilipTheBucket@piefed.social 3 weeks ago
Initial thought: Well… but this is a transparently absurd way to set up an ML system to manage a vending machine. I mean, it is a useful data point I guess, but to me it leads to the conclusion “Even though LLMs sound to humans like they know what they’re doing, they do not; don’t just stick the whole situation into the LLM input and expect good decisions and strategies to come out of the output. You have to embed it into a more capable and structured system for any good to come of it.”
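By “more capable and structured system” I mean something roughly like this (a toy sketch of the idea, not any real framework’s API; `call_llm` is again a made-up stand-in): the program owns the state and the rules, and the model only answers one narrow, validated question per step.

```python
# Sketch of the "structured system" idea: ordinary code owns the
# state and the rules; the model answers one constrained question.

def call_llm(prompt: str) -> str:
    # Stand-in for a real model API call.
    return "restock"

VALID_ACTIONS = {"restock", "raise_price", "lower_price", "wait"}

def step(state: dict) -> dict:
    # Only a compact summary of the current state goes in, never an
    # unbounded transcript for the model to get lost in.
    prompt = (
        f"Vending machine: balance=${state['balance']:.2f}, "
        f"stock={state['stock']}, price=${state['price']:.2f}. "
        f"Reply with exactly one of: {', '.join(sorted(VALID_ACTIONS))}."
    )
    action = call_llm(prompt).strip().lower()
    if action not in VALID_ACTIONS:
        action = "wait"   # never let junk output drive the state
    if action == "restock" and state["balance"] >= 50:
        state["balance"] -= 50
        state["stock"] += 50
    elif action == "raise_price":
        state["price"] += 0.25
    elif action == "lower_price" and state["price"] > 0.50:
        state["price"] -= 0.25
    return state

state = {"balance": 500.0, "stock": 40, "price": 2.00}
for _ in range(30):
    state = step(state)
print(state)  # the books always balance, whatever the model says
```

With that shape, a garbage reply can at worst make the machine do nothing for a turn; it can’t corrupt the books.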
Updated thought, after reading a little bit of the paper: Holy Christ on a pancake. Is this architecture what people have meant by “AI agents” this whole time I’ve been hearing about them? Yeah, this isn’t going to work. What the fuck, of course it goes insane over time. I stand corrected, I guess; this is valid research pointing out the stupidity of basically putting the LLM in the driver’s seat of something even more complicated than the stuff it’s already been shown to fuck up, and hoping that goes okay.
Sasha@lemmy.blahaj.zone 3 weeks ago
I’m pretty sure they touch on those points in the paper; they knew they were overloading it and were looking at how it handled that in particular. My understanding is that they’re testing failure modes to probe the inner workings to some degree: they discuss the impact of filling up the context in the abstract, mention it’s designed as a stress test, and are particularly interested in memory limits, so I’m pretty sure they deliberately chose not to cater to an LLM’s ideal conditions (there’s a toy sketch of that failure at the end of this comment). It’s not really a real-world use case of LLMs running a business (even if that’s the framing given initially); it’s an experiment meant to break them in a simulated environment. The last line of the abstract kind of highlights this: they’re hoping to find flaws to improve the models generally.
Either way, I just meant to point out that they can absolutely just output junk as a failure mode.
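Concretely, the context-limit failure they’re stressing looks something like this toy illustration (mine, not anything from the paper): once the window is full, whatever scrolled out is simply gone, and later answers that depend on it come out as junk.

```python
# Toy illustration of the context-limit failure mode: a rolling
# window keeps the prompt bounded, but old facts silently vanish.
from collections import deque

context = deque(maxlen=5)   # absurdly small "context window"

context.append("Fact: the supplier's email is orders@example.com")
for day in range(1, 10):
    context.append(f"Day {day}: sold 3 sodas")

# The supplier's email scrolled out ages ago. A model asked for it
# now can only make something up, i.e., output junk.
print("\n".join(context))
assert not any("supplier" in line for line in context)
```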