I run this setup with 36GB (32+4). Local LLMs can be really effective BUT you are constrained by context size in a way you aren’t on cloud services.
Cline supports running a local model through LM Studio, but in my experience, when I feed it any significant task it just can’t read and hold the context needed to build components for enterprise-scale applications.
I use Claude to write a lot of one-off utility scripts. With a maximum window of 1M tokens I can hit 30+% context just writing Python scripts. That context goes to API contracts, development standards, existing reusable modules, and sometimes the code/documentation of the services I’m going to be calling.
My MacBook can’t handle 300k token contexts. 30k seems doable. I should see how it handles my utility script folder…
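A quick way to check whether a script folder would even fit is to estimate its token count first. This is only a rough sketch: the 4-characters-per-token ratio is a common rule of thumb and an assumption here (real tokenizers vary by model and by language), and the function names and the 25% reserve for instructions and output are made up for illustration.

```python
from pathlib import Path

# Assumption: ~4 characters per token. Real tokenizers differ, so treat
# the result as a ballpark figure, not an exact count.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def folder_token_estimate(folder: str, pattern: str = "*.py") -> int:
    """Sum the rough token estimates of all matching files under folder."""
    total = 0
    for path in Path(folder).rglob(pattern):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits(tokens: int, window: int, reserve: float = 0.25) -> bool:
    """Check whether tokens fit in window while keeping a fraction free
    (reserve) for instructions and the model's own output."""
    return tokens <= window * (1 - reserve)
```

Something like `fits(folder_token_estimate("utils"), 30_000)` (with `utils` standing in for your own folder) tells you whether the whole folder could go into a 30k window, or whether you would have to feed it in pieces.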
Anyway, that’s still no Claude, but if you need a cheaper model and you can afford for developers to spend time on it before ultimately deciding they need to pay for Claude or Codex or Gemini, then running a local model on a beefy MacBook is 100% an option.
Stepping up from there to building a shared, locally hosted LLM is probably the worst of all worlds. It will be a hefty CapEx, prone to saturation when all the users hit it at once, and you will most likely still have to punt the hardest jobs to cloud AI. It can certainly be done and done well, but the best example I know runs on $250-500k worth of hardware (to service a pretty big number of users, to be fair).
DaTingGoBrrr@lemmy.ml 6 days ago
I am running Qwen 3.5 locally using llama.cpp on 8 GB of VRAM and 16 GB of RAM. It works well enough with a 4B to 9B parameter model along with quantization and MTP. More optimizations are on the way with TurboQuant and possibly other tech.
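For a rough idea of why that model range fits in that much VRAM, here is a back-of-the-envelope sketch of the weight size at a given bit width. It only counts the weights. The KV cache, activations, and runtime overhead come on top, and real quantization formats often average a bit more than their nominal bit width, so treat the numbers as a lower bound. The function name is made up for illustration.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores KV cache, activations, and runtime overhead (assumption:
    weights dominate for small contexts).
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    for params in (4, 9):
        for bits in (4, 8, 16):
            size = weight_memory_gib(params, bits)
            print(f"{params}B @ {bits}-bit: ~{size:.1f} GiB")
```

By this estimate a 9B model at 4 bits needs a bit over 4 GiB for weights, which leaves some room for context on an 8 GB card, while the same model at 16 bits would not fit at all.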
It’s just there to assist me, not do all the work, so I am happy as long as I can self host it.
I can’t say how well my specs would work in a professional setting, but for personal use a MacBook should be sufficient, in my opinion.