Yeah it’s a bunch of shit. I’m not an expert obviously, just talking out of my ass, but:
- Running inference for every device in the building through “our dev server” would not have kept response times usable for any of them, unless he actually said “the dev cluster” or something and his home wifi glitched at that exact moment and made it sound different
- LLMs don’t degrade under load by giving wrong answers, they degrade by slowing down and ceasing to produce tokens
- Meta already has shown itself to be okay with lying
- GUYS JUST USE FUCKING CANNED ANSWERS WITH THE RIGHT SOUNDING VOICE, THIS ISN’T ROCKET SCIENCE, THAT’S HOW YOU DO DEMOS WHEN YOUR SHIT’S NOT DONE YET
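That last point is the standard demo-fallback pattern: try the live system with a short timeout, and if it stalls or errors, play back a scripted answer in the same voice. A minimal sketch, with every name and answer here made up for illustration (this is not Meta's code or any real API):

```python
# Hypothetical "canned answers" demo fallback: attempt live inference with
# a short timeout, and fall back to a pre-scripted reply so the demo never
# visibly hangs. All names and strings are invented for illustration.
from concurrent.futures import ThreadPoolExecutor

CANNED_ANSWERS = {
    "make a korean steak sauce": "Sure! Start by combining soy sauce, pear, and garlic...",
}

def live_inference(prompt: str) -> str:
    # Stand-in for the real model call, which may be slow or unreachable.
    raise RuntimeError("dev server unreachable")

def demo_answer(prompt: str, timeout_s: float = 2.0) -> str:
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(live_inference, prompt)
        try:
            return future.result(timeout=timeout_s)
        except Exception:
            # Timeout or error: fall back to the scripted answer.
            return CANNED_ANSWERS.get(prompt.lower(), "Let me think about that...")
```

The point of the pattern is that the audience never sees the failure mode, only a slightly generic answer.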
Ulrich@feddit.org 6 months ago
Even if it were true, your server can’t handle a couple hundred simultaneous requests? That’s not promising either. Although at least that would be easier to fix than the real problem, which is incredibly obvious to anyone who has ever used this technology: it doesn’t fucking work, and is flawed on a fundamental level.
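Back-of-envelope queueing arithmetic shows why a single box folds under that kind of burst. The figures below are assumptions for illustration, not anything Meta has published:

```python
# Toy queueing sketch: a box that serves requests sequentially per worker.
# Service time and request counts are made-up illustrative numbers.
import math

def worst_case_wait(n_requests: int, seconds_per_request: float, workers: int = 1) -> float:
    """Time until the last of n simultaneous requests is answered."""
    batches = math.ceil(n_requests / workers)
    return batches * seconds_per_request

# One dev box at ~2 s per generation, 200 people hitting it at once:
print(worst_case_wait(200, 2.0))      # the unlucky last caller waits ~400 s
# The same burst spread over a 50-way cluster stays demo-friendly:
print(worst_case_wait(200, 2.0, 50))  # ~8 s
```

Under those assumptions the last caller on a single box waits minutes, which on stage looks exactly like "the AI stopped responding".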
KairuByte@lemmy.dbzer0.com 6 months ago
If this was a tech demo, it tracks that they wouldn’t be using overpowered hardware. Why lug around a full server when they can just load the software onto a laptop, considering they weren’t expecting hundreds of invocations at the exact same moment?
synae@lemmy.sdf.org 6 months ago
“lug around”? the server(s) are 100% in a data center, no way this is a single computer on prem. no company, especially facebook, deploys software that way in 2025
KairuByte@lemmy.dbzer0.com 6 months ago
It really depends. A local machine is guaranteed not to have issues if the general internet goes down. It’s also going to reduce latency considerably.
There are many reasons to have a dev box local to the demonstration. Just because they wouldn’t deploy it that way in production doesn’t mean they wouldn’t deploy a demo in that same way.
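The latency point is easy to put rough numbers on. All figures below are assumptions for illustration (a LAN hop is typically well under a millisecond; venue wifi to a distant data center can easily run tens of milliseconds per round trip), and the round-trip count is a made-up stand-in for a multi-hop voice pipeline:

```python
# Toy latency budget: illustrative round-trip times only, not measurements.
LAN_RTT_MS = 0.5   # dev box on the same switch as the demo hardware
WAN_RTT_MS = 40.0  # venue wifi -> internet -> remote data center

def voice_turn_overhead_ms(rtt_ms: float, round_trips: int = 4) -> float:
    """Network overhead for one spoken exchange across several service hops."""
    return rtt_ms * round_trips

print(voice_turn_overhead_ms(LAN_RTT_MS))  # ~2 ms on the local box
print(voice_turn_overhead_ms(WAN_RTT_MS))  # ~160 ms over the WAN
```

That gap is small per turn, but it compounds across a live back-and-forth, which is one plausible reason to keep a demo box on-site.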
masterspace@lemmy.ca 6 months ago
How is it fundamentally flawed?
Ulrich@feddit.org 6 months ago
Check out the OP