Comment on Microsoft Bans Employees From Using DeepSeek App
brucethemoose@lemmy.world 4 days agoCompletely depends on your laptop hardware, but generally:
- TabbyAPI (exllamav2/exllamav3)
- ik_llama.cpp, and its openai server
- An MLX host with one of the new distillation quantizations
- Text-gen-web-ui (slow, but supports a lot of samplers and some exotic quantizations well)
- SGLang (extremely fast for parallel calls if thats what you want).