Comment on Very large amounts of gaming gpus vs AI gpus

brucethemoose@lemmy.world 3 days ago

Ah, here we go:

huggingface.co/ubergarm/Qwen3-235B-A22B-GGUF

Ubergarm is great. See this part in particular: huggingface.co/ubergarm/Qwen3-235B-A22B-GGUF#quic…

You will need to modify the syntax a bit for 2x GPUs. I’d recommend starting with an f16/f16 K/V cache at 32K context (to see if that’s acceptable), and try not to go lower than q8_0/q5_1 (the V cache is more amenable to quantization than the K cache, hence the asymmetric pair).
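For reference, a rough sketch of what that could look like with stock llama.cpp flags. This is from memory, not from ubergarm’s card, so treat his quick-start command as authoritative; the model filename is a placeholder, and his quants may expect ik_llama.cpp rather than mainline llama.cpp.

```bash
# Sketch of a 2-GPU llama-server launch (placeholder filename):
#   -c 32768         32K context to start with
#   -fa              flash attention, needed before the V cache can be quantized
#   -ctk / -ctv      K/V cache types; begin at f16/f16, drop to e.g. q8_0/q5_1 if VRAM is tight
#   -ngl 99          offload all layers to GPU
#   -ts 1,1          split tensors evenly across the two GPUs
./llama-server \
  -m Qwen3-235B-A22B-IQ4_XS.gguf \
  -c 32768 -fa \
  -ctk f16 -ctv f16 \
  -ngl 99 -ts 1,1
```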
