I like local LLMs as much as the next person, but the issue is that they don’t scale the way companies need them to.
As a personal assistant? Sure, I agree. They’re useful at times. But as soon as you need several running simultaneously, you’re gonna hit resource issues.
What Oracle and others were banking on is that you’d have engineers and others running a lot of agents in parallel, composing different things together, or one input that multiple server-side agents take and execute numerous tasks on. That’s something you can’t run on an individual machine right now. And with the way they currently work, I don’t envision you’ll be able to anytime soon.
vacuumflower@lemmy.sdf.org 19 hours ago
There are lightweight models that are as good as some heavier ones. It’s a bit like Intel’s advertised tick-tock process. Heavy, memory-hungry models are the “tick”, but there’s also a “tock”. For example, the “lfm2.5-thinking” model (the light version) in the ollama repository seems almost as good as qwen3.5 to me, except it’s very lightweight and lightning-fast by comparison.
These things are being optimized. It’s just that in the market capture phase nobody bothered.
That they are not being used correctly: yeah, absolutely. My idea of their proper use is some graph-based system, with each node processed by a select LLM (or just a piece of logic) and with a select set of tools, actions, and choices available to each node. A bit like ComfyUI, but with something saner than a zoom-based web UI. More like the macOS Automator application.
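
To make the graph idea a bit more concrete, here’s a rough sketch in Python. Everything in it is made up for illustration: `Node`, `run_graph`, the handler functions, and the `sum` tool are hypothetical names, and the “LLM” nodes are plain stub functions rather than calls to any real model or library. It only shows the shape: each node has its own handler, its own allowed tools, and its own set of choices that decide which node runs next.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for illustration; no real LLM library is involved.
Tool = Callable[[str], str]
# A handler receives the current text and the node's allowed tools,
# and returns (new_text, name_of_the_choice_it_picked).
Handler = Callable[[str, Dict[str, Tool]], Tuple[str, str]]


@dataclass
class Node:
    name: str
    handler: Handler  # a stand-in for "a select LLM" or plain logic
    tools: Dict[str, Tool] = field(default_factory=dict)
    # Maps each allowed choice to the next node name (None means stop).
    choices: Dict[str, Optional[str]] = field(default_factory=dict)


def run_graph(nodes: Dict[str, Node], start: str, text: str,
              max_steps: int = 20) -> Tuple[str, List[Tuple[str, str]]]:
    """Walk the graph from `start`, letting each node pick the next edge."""
    trace: List[Tuple[str, str]] = []
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            break
        node = nodes[current]
        text, choice = node.handler(text, node.tools)
        if choice not in node.choices:
            raise ValueError(f"{node.name} picked a choice it is not allowed: {choice}")
        trace.append((node.name, choice))
        current = node.choices[choice]
    return text, trace


def classify(text: str, tools: Dict[str, Tool]) -> Tuple[str, str]:
    # Plain logic, no model needed: route on whether the input has digits.
    return text, ("math" if any(ch.isdigit() for ch in text) else "chat")


def math_node(text: str, tools: Dict[str, Tool]) -> Tuple[str, str]:
    # Stand-in for a small model that is only allowed the "sum" tool.
    return tools["sum"](text), "done"


def chat_node(text: str, tools: Dict[str, Tool]) -> Tuple[str, str]:
    # Stand-in for a general chat model with no tools at all.
    return f"(stub reply to: {text})", "done"


def sum_tool(text: str) -> str:
    # Adds up every whitespace-separated token that is a plain integer.
    return str(sum(int(tok) for tok in text.split() if tok.isdigit()))


NODES = {
    "classify": Node("classify", classify, choices={"math": "math", "chat": "chat"}),
    "math": Node("math", math_node, tools={"sum": sum_tool}, choices={"done": None}),
    "chat": Node("chat", chat_node, choices={"done": None}),
}

if __name__ == "__main__":
    print(run_graph(NODES, "classify", "add 2 and 40"))
    print(run_graph(NODES, "classify", "hello there"))
```

In a real setup, the stub handlers would presumably wrap calls to whichever local model suits each node, with the light ones doing routing and the heavy ones reserved for the nodes that need them.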