Comment on Consumer GPUs to run LLMs

curry@programming.dev 4 days ago

I tried to run Gemma 3 27B at Q4_K and was surprised how quickly the VRAM requirements blew up in proportion to the context window, especially compared to other similarly sized quantized models like QwQ 32B.
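Back-of-the-envelope sketch of why that can happen: KV cache memory grows linearly with context length, and (if I have the configs right) Gemma 3 27B keeps roughly twice as many KV heads per layer as QwQ 32B. The model shapes below are assumptions pulled from memory of the model cards, not authoritative, and the estimate assumes a dense fp16 cache:

```python
# Rough KV-cache size estimate. Standard formula:
#   2 (K and V) * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_element
# Model configs are ASSUMED for illustration; check the actual model cards.

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB for dense attention at fp16 (2 bytes/element)."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
    return total_bytes / 2**30

# Assumed configs: (layers, KV heads, head dim)
models = {
    "Gemma 3 27B (assumed 62L / 16 KV heads / dim 128)": (62, 16, 128),
    "QwQ 32B     (assumed 64L /  8 KV heads / dim 128)": (64, 8, 128),
}

for ctx in (8_192, 32_768, 131_072):
    for name, (layers, kv_heads, dim) in models.items():
        gib = kv_cache_gib(layers, kv_heads, dim, ctx)
        print(f"{name} @ {ctx:>7} ctx: {gib:6.1f} GiB")
```

Under those assumptions you get roughly 15.5 GiB vs 8 GiB of KV cache at 32K context, which would match the 2x blow-up. Caveat: Gemma 3 interleaves sliding-window attention layers, so a runtime that exploits the window should need much less than this dense-cache estimate; if yours allocated the full cache for every layer, that could be what you saw.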
