That’s not strictly true.
I have a Ryzen desktop: a 7800, a 3090, and 128GB of DDR5. And I can run the full GLM 4.6 with quite acceptable token divergence compared to the unquantized model, see: huggingface.co/…/GLM-4.6-128GB-RAM-IK-GGUF
If I had an EPYC/Threadripper homelab, I could run DeepSeek the same way.
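For anyone curious what this looks like in practice, here's a rough sketch using llama-cpp-python. It's only an illustration of the CPU+GPU split, not my exact setup: the linked repo's quants actually target the ik_llama.cpp fork, and the filename and layer count below are made up.

```python
# Minimal sketch: load a large GGUF quant with partial GPU offload.
# Most weights stay in system RAM; as many layers as fit go to VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.6-quant.gguf",  # hypothetical local filename
    n_gpu_layers=20,  # offload what fits in the 3090's 24GB; rest runs on CPU
    n_ctx=8192,       # context window; bigger contexts cost more memory
    n_threads=8,      # match the CPU's physical core count
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain quantization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```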
khepri@lemmy.world 3 weeks ago
I run quantized versions of DeepSeek that are usable enough for chat, and it's on a home setup so old and slow by today's standards that I won't even list the specs lol. Let's just say the rig is from 2018 and it wasn't anywhere near the best even back then.