Comment on I've just created c/Ollama!
brucethemoose@lemmy.world 1 day ago
Actually, to go ahead and answer: the “easiest” path would be LM Studio (which supports MLX quants natively and is quick to install), together with a DWQ quantization (a newer, higher-quality variant of MLX quants).
Probably one of these models, depending on how much RAM you have (rough estimates after the list):
huggingface.co/…/Magistral-Small-2506-4bit-DWQ
huggingface.co/…/Qwen3-30B-A3B-4bit-DWQ-0508
huggingface.co/…/GLM-4-32B-0414-4bit-DWQ
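As a very rough rule of thumb (my own back-of-the-envelope math, not from the model cards): a 4-bit quant stores about half a byte per parameter, so the weights alone are roughly half the parameter count in GB, plus a few GB of headroom for context and the runtime:

```python
# Back-of-the-envelope RAM estimate for a 4-bit quant.
# Assumptions: ~0.5 bytes per parameter for weights, plus a flat
# allowance for KV cache and runtime overhead (rough numbers, not exact).
def estimate_ram_gb(params_billions: float, overhead_gb: float = 3.0) -> float:
    return params_billions * 0.5 + overhead_gb

for name, size_b in [("Magistral-Small (~24B)", 24),
                     ("Qwen3-30B-A3B (30B)", 30),
                     ("GLM-4-32B (32B)", 32)]:
    print(f"{name}: ~{estimate_ram_gb(size_b):.0f} GB")
```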
With a bit more time invested, you could set up Open WebUI as an alternative interface (it has its own built-in web search, like Gemini): openwebui.com
And then use LM Studio (or some other MLX backend, or even free online API models) as the ‘engine’.
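If you go that route, Open WebUI just needs to be pointed at LM Studio’s OpenAI-compatible local server (its local server feature, default port 1234). Here’s a minimal sketch of talking to that endpoint directly with the openai Python package — the model string below is a placeholder; use whatever identifier LM Studio shows for your loaded model:

```python
# Minimal sketch: query LM Studio's OpenAI-compatible local server.
# Assumes the server is running on the default port 1234 with a model loaded.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="lm-studio",                  # any non-empty string works locally
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; use the name LM Studio reports
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```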
WhirlpoolBrewer@lemmings.world 1 day ago
This is all new to me, so I’ll have to do a bit of homework on this. Thanks for the detailed and linked reply!
brucethemoose@lemmy.world 1 day ago
I was a bit mistaken; these are the models you should consider instead:
huggingface.co/mlx-community/Qwen3-4B-4bit-DWQ
huggingface.co/AnteriorAI/…/main
huggingface.co/unsloth/Jan-nano-GGUF (specifically the UD-Q4 or UD-Q5 file)
These are state-of-the-art, as far as I know.
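If you’d rather skip a GUI entirely, the MLX ones also run with a few lines of the mlx-lm Python package (pip install mlx-lm, Apple Silicon only). A minimal sketch using the Qwen3 repo from the list above:

```python
# Minimal sketch: run an MLX DWQ quant with the mlx-lm package.
# Assumes Apple Silicon; the model downloads from Hugging Face on first use.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-4B-4bit-DWQ")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain DWQ quantization in one sentence."}],
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=200))
```

The Jan-nano link is a GGUF instead, so it goes through llama.cpp-based backends (LM Studio handles those too).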
WhirlpoolBrewer@lemmings.world 1 day ago
Awesome, I’ll give these a spin and see how it goes. Much appreciated!