Comment on Consumer GPUs to run LLMs
curry@programming.dev · 4 days ago

I tried to run Gemma 3 27B Q4_K and was surprised by how quickly the VRAM requirements blew up in proportion to the context window, especially compared to other quantized models of similar size, like QwQ 32B.
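Most of that growth is the KV cache, which scales linearly with context length, and whose per-token size depends on the model's layer count, KV-head count, and head dimension rather than on parameter count. Here's a minimal back-of-the-envelope sketch, assuming fp16 cache entries and no KV quantization; the layer/head numbers below are my assumptions for illustration, so check the model cards for exact values:

```python
# Rough KV-cache size: 2 tensors (K and V) * layers * KV heads * head_dim
# * context length * bytes per element (2 for fp16).
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Assumed configs (illustrative only -- verify against the model cards).
models = {
    "Gemma 3 27B (assumed 62 layers, 16 KV heads, head_dim 128)": (62, 16, 128),
    "QwQ 32B (assumed 64 layers, 8 KV heads, head_dim 128)": (64, 8, 128),
}

for name, (layers, kv_heads, head_dim) in models.items():
    gib = kv_cache_bytes(layers, kv_heads, head_dim, context_len=32_768) / 2**30
    print(f"{name}: ~{gib:.1f} GiB KV cache at 32k context")
```

With those assumed numbers, Gemma 3 27B's cache comes out roughly twice the size of QwQ 32B's at the same context length, which would match what I saw. Gemma 3 also interleaves sliding-window attention across most of its layers, so a runtime that doesn't exploit that and allocates the full-context cache for every layer may use even more VRAM than this estimate suggests.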