Comment on: Frustratingly bad at self hosting. Can someone help me access LLMs on my rig from my phone

How much system RAM, and what kind? DDR5?
BlackSnack@lemmy.zip 1 week ago
Bet. Looking into that now. Thanks!
I believe I have 11 GB of VRAM, so I should be good to run decent models, from what I’ve been told by the other AIs.
brucethemoose@lemmy.world 1 week ago
In case I miss your reply: assuming a 3080 + 64 GB of RAM, you want the IQ4_KSS version of this (or IQ3_KS, which leaves more RAM free for browser tabs and such):
huggingface.co/ubergarm/GLM-4.5-Air-GGUF
Part of it will run on your GPU and part will live in system RAM, but ik_llama.cpp handles that split in a particularly efficient way for these kinds of ‘MoE’ models.
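(ik_llama.cpp itself is a command-line program, but just to sketch what that split looks like, here’s the same idea via the mainline llama-cpp-python bindings. Treat it as an illustration only, not the ik_llama.cpp invocation: the file name is a placeholder, mainline won’t load ik-specific quants like IQ4_KSS, and n_gpu_layers is something you tune until your VRAM is nearly full.)

```python
# Sketch: partial GPU offload with the mainline llama-cpp-python bindings.
# Placeholder file name; ik-specific quants (IQ4_KSS) won't load in mainline,
# so substitute a quant your backend actually supports.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.5-Air-IQ4_KSS.gguf",  # placeholder local path
    n_gpu_layers=20,  # layers kept in VRAM; the rest stay in system RAM
    n_ctx=8192,       # context window; longer contexts use more memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```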
If you ‘only’ have 32 GB of RAM or less, that’s trickier, and the next question is what kind of speeds you want. But it’s probably best to wait a few days and see how Qwen3 80B looks when it comes out. Or just go with the IQ4_K version of this: huggingface.co/…/Qwen3-30B-A3B-Thinking-2507-GGUF
And you don’t really need the hyper optimization of ik_llama.cpp for Qwen3 30B.
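And on the “from my phone” part of your original post: whichever model you land on, llama.cpp’s llama-server (and the llama-cpp-python server module) can expose an OpenAI-compatible HTTP API on your LAN, which any phone chat client can point at. A minimal client sketch, where the IP, port, and model name are placeholders for whatever your rig actually serves:

```python
# Sketch: talk to a llama.cpp-style OpenAI-compatible server on the LAN.
# The address, port, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.1.50:8080/v1",  # your rig's LAN address
    api_key="not-needed-locally",            # local servers usually ignore this
)

resp = client.chat.completions.create(
    model="local-model",  # most local servers ignore or loosely match this
    messages=[{"role": "user", "content": "Hello from my phone!"}],
)
print(resp.choices[0].message.content)
```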
Alternatively, you could try to squeeze Gemma 27B into that 11GB VRAM, but it would be tight.
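Back-of-envelope math on why it’s tight: a dense 27B model needs roughly params × bits-per-weight ÷ 8 bytes just for the weights, before KV cache and other overhead. The bits-per-weight figures below are approximate averages for common llama.cpp quant levels:

```python
# Rough weight-size estimate for a dense 27B model at common quant levels.
# Bits-per-weight values are approximate, not exact.
params = 27e9
for name, bpw in [("Q4_K_M", 4.8), ("IQ3_XS", 3.3), ("IQ2_M", 2.7)]:
    gib = params * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB of weights, plus KV cache and overhead")
```

That works out to roughly 15 GiB at ~4.8 bits (won’t fit in 11 GB) and about 10 GiB around 3.3 bits, so a ~3-bit quant is where 27B barely squeezes in, with little room left for context.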