Comment on Consumer GPUs to run LLMs
FrankLaskey@lemmy.ml 1 week ago

Oh, and I typically get 16-20 tok/s running a 32B model on Ollama through Open WebUI. Also, I've run into issues with 4-bit quantization for the K/V cache on some models myself, so just FYI.
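For reference, a minimal sketch of how tok/s numbers like these can be read straight from Ollama's non-streaming API response (`eval_count` tokens generated over `eval_duration` nanoseconds). Assumes a local Ollama server on the default port; the model tag and prompt are placeholders, not my exact setup:

```python
# Minimal sketch: compute generation throughput (tok/s) from Ollama's
# /api/generate response. Assumes a local server at localhost:11434
# and that the model tag below has already been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:32b",  # illustrative 32B model tag, substitute your own
        "prompt": "Explain KV cache quantization in one paragraph.",
        "stream": False,
    },
    timeout=600,
).json()

# eval_count = tokens generated, eval_duration = generation time in nanoseconds
tok_per_s = resp["eval_count"] / resp["eval_duration"] * 1e9
print(f"{tok_per_s:.1f} tok/s")
```

If you want to experiment with the K/V cache quantization I mentioned, Ollama exposes it through the `OLLAMA_KV_CACHE_TYPE` environment variable (`f16`, `q8_0`, or `q4_0`), and it needs `OLLAMA_FLASH_ATTENTION=1` set to take effect.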