Comment on I've just created c/Ollama!
brucethemoose@lemmy.world 1 day ago
Actually, to go ahead and answer: the “easiest” path would be LM Studio (which supports MLX quants natively and is quick to install), together with a DWQ quantization (a newer, higher-quality variant of MLX quants).
Probably one of these models, depending on how much RAM you have (rough estimates after the list):
huggingface.co/…/Magistral-Small-2506-4bit-DWQ
huggingface.co/…/Qwen3-30B-A3B-4bit-DWQ-0508
huggingface.co/…/GLM-4-32B-0414-4bit-DWQ
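As a very rough rule of thumb (my own back-of-the-envelope math, not from the model cards): a 4-bit quant stores about half a byte per parameter, so the weights alone are roughly half the parameter count in GB, plus a few GB of headroom for context and the runtime:

```python
# Back-of-the-envelope RAM estimate for a 4-bit quant.
# Assumptions: ~0.5 bytes per parameter for weights, plus a flat
# allowance for KV cache and runtime overhead (rough numbers, not exact).
def estimate_ram_gb(params_billions: float, overhead_gb: float = 3.0) -> float:
    return params_billions * 0.5 + overhead_gb

for name, size_b in [("Magistral-Small (~24B)", 24),
                     ("Qwen3-30B-A3B (30B)", 30),
                     ("GLM-4-32B (32B)", 32)]:
    print(f"{name}: ~{estimate_ram_gb(size_b):.0f} GB")
```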
With a bit more time invested, you could set up Open WebUI as an alternative interface (it has its own built-in web search, like Gemini): openwebui.com
And then use LM Studio (or some other MLX backend, or even free online API models) as the ‘engine’.
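If you go that route, Open WebUI just needs to be pointed at LM Studio’s OpenAI-compatible local server (its local server feature, default port 1234). Here’s a minimal sketch of talking to that endpoint directly with the openai Python package — the model string below is a placeholder; use whatever identifier LM Studio shows for your loaded model:

```python
# Minimal sketch: query LM Studio's OpenAI-compatible local server.
# Assumes the server is running on the default port 1234 with a model loaded.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="lm-studio",                  # any non-empty string works locally
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; use the name LM Studio reports
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```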
WhirlpoolBrewer@lemmings.world 1 day ago
This is all new to me, so I’ll have to do a bit of homework on this. Thanks for the detailed and linked reply!
brucethemoose@lemmy.world 1 day ago
I was a bit mistaken; these are the models you should consider instead:
huggingface.co/mlx-community/Qwen3-4B-4bit-DWQ
huggingface.co/AnteriorAI/…/main
huggingface.co/unsloth/Jan-nano-GGUF (specifically the UD-Q4 or UD-Q5 file)
These are state-of-the-art, as far as I know.
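If you’d rather skip a GUI entirely, the MLX ones also run with a few lines of the mlx-lm Python package (pip install mlx-lm, Apple Silicon only). A minimal sketch using the Qwen3 repo from the list above:

```python
# Minimal sketch: run an MLX DWQ quant with the mlx-lm package.
# Assumes Apple Silicon; the model downloads from Hugging Face on first use.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-4B-4bit-DWQ")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain DWQ quantization in one sentence."}],
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=200))
```

The Jan-nano link is a GGUF instead, so it goes through llama.cpp-based backends (LM Studio handles those too).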
WhirlpoolBrewer@lemmings.world 1 day ago
Awesome, I’ll give these a spin and see how it goes. Much appreciated!