Even if it were true, your server can’t handle a couple hundred simultaneous requests? That’s not promising either. Although at least that would be easier to fix than the real problem, which is incredibly obvious to anyone who has ever used this technology: it doesn’t fucking work, and it’s flawed on a fundamental level.
TastehWaffleZ@lemmy.world 1 day ago
That sounds like complete damage control lies. Why would the AI think the chef had finished prepping the sauce just because there was heavy usage??
Ulrich@feddit.org 1 day ago
KairuByte@lemmy.dbzer0.com 1 day ago
If this was a tech demo, it tracks that they wouldn’t be using overpowered hardware. Why lug around a full server when they can just load up the software on a laptop, considering they weren’t expecting hundreds of invocations at the exact same moment?
synae@lemmy.sdf.org 23 hours ago
“Lug around”? The server(s) are 100% in a data center; no way this is a single computer on-prem. No company, especially Facebook, deploys software that way in 2025.
KairuByte@lemmy.dbzer0.com 23 hours ago
It really depends. A local machine is guaranteed to not have issues if the general internet goes down. It’s also going to reduce latency considerably.
There are many reasons to have a dev box local to the demonstration. Just because they wouldn’t deploy it that way in production doesn’t mean they wouldn’t deploy a demo in that same way.
masterspace@lemmy.ca 1 day ago
How is it fundamentally flawed?
Ulrich@feddit.org 1 day ago
Check out the OP
PhilipTheBucket@piefed.social 1 day ago
Yeah it’s a bunch of shit. I’m not an expert obviously, just talking out of my ass, but:
Sasha@lemmy.blahaj.zone 1 day ago
LLMs can degrade by giving “wrong” answers, but not because of network congestion ofc.
That paper is fucking hilarious, but the tl;dr is that when asked to manage a vending machine business for an extended period of time, they eventually go completely insane. Some have an existential crisis, some declare the whole thing a conspiracy and call the FBI, etc. It’s amazing how trash they are.
PhilipTheBucket@piefed.social 1 day ago
Initial thought: Well… but this is a transparently absurd way to set up an ML system to manage a vending machine. I mean it is a useful data point I guess, but to me it leads to the conclusion “Even though LLMs sound to humans like they know what they’re doing, they do not. Don’t just stick the whole situation into the LLM input and expect good decisions and strategies to come out of the output; you have to embed it into a more capable and structured system for any good to come of it.”
Updated thought, after reading a little bit of the paper: Holy Christ on a pancake. Is this architecture what people have meant by “AI agents” this whole time I’ve been hearing about them? Yeah, this isn’t going to work. What the fuck, of course it goes insane over time. I stand corrected, I guess: this is valid research pointing out the stupidity of basically putting the LLM in the driver’s seat of something even more complicated than the stuff it’s already been shown to fuck up, and hoping that goes okay.
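To make it concrete, here’s roughly what I mean by a “more capable and structured system” wrapped around the model (a minimal made-up sketch, not anyone’s real architecture; all the names and actions here are invented): the LLM only gets to *propose* an action from a fixed menu, and boring deterministic code owns the state and throws out anything else.

```python
# Minimal sketch: the LLM proposes, deterministic code disposes.
# Everything here (names, action menu) is made up for illustration.

ALLOWED_ACTIONS = {"restock", "set_price", "check_inventory", "wait"}

def parse_proposal(llm_output: str) -> tuple[str, str]:
    """Pull 'action argument' out of the model's text.
    Anything outside the menu degrades to a harmless 'wait'."""
    parts = llm_output.strip().split(maxsplit=1)
    action = parts[0].lower() if parts else "wait"
    arg = parts[1] if len(parts) > 1 else ""
    return (action, arg) if action in ALLOWED_ACTIONS else ("wait", "")

def step(state: dict, llm_output: str) -> dict:
    """One tick of the loop. State lives out here, not in the context
    window, so the model can't corrupt it -- worst case it just waits."""
    action, arg = parse_proposal(llm_output)
    state.setdefault("log", []).append((action, arg))
    return state

# e.g. step({}, "CALL THE FBI THIS IS A CONSPIRACY") just logs ('wait', '')
```

The point being the model never touches the books directly; even when it melts down, the worst it can do is pass its turn.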
Sasha@lemmy.blahaj.zone 21 hours ago
I’m pretty sure they touch on those points in the paper; they knew they were overloading it and were looking at how it handled that in particular. My understanding is that they’re testing failure modes to try and probe the inner workings to some degree: they discuss the impact of filling up the context in the abstract, mention it’s designed to stress test, and are particularly interested in memory limits, so I’m pretty sure they deliberately chose not to cater to an LLM’s ideal conditions. It’s not really a real-world use case of LLMs running a business (even if that’s the framing given initially); it’s an experiment meant to break them in a simulated environment. The last line of the abstract kind of highlights this: they’re hoping to find flaws to improve the models generally.
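For what it’s worth, the shape of the stress test as I read it is roughly this (paraphrasing the setup, not their actual harness; the `model` and `simulator` callables and the token limit are stand-ins I made up):

```python
# Paraphrased shape of the experiment, not their code: run the model in a
# loop, always feeding the *entire* history back in, so the context
# eventually overflows and you watch the failure mode instead of avoiding it.

MAX_CONTEXT_TOKENS = 8192  # made-up limit for illustration

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def run_episode(model, simulator, max_steps: int = 1000) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        prompt = "\n".join(history)
        if count_tokens(prompt) > MAX_CONTEXT_TOKENS:
            # Deliberately do NOT trim or summarize: the overflow
            # behavior is the thing being measured.
            pass
        reply = model(prompt)             # assumed callable returning text
        history.append(reply)
        history.append(simulator(reply))  # environment's response
    return history
```

That “don’t trim, don’t summarize” choice is exactly the not-catering-to-ideal-conditions part.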
Either way, I just meant to point out that they can absolutely just output junk as a failure mode.