Comment on Consumer GPUs to run LLMs

FrankLaskey@lemmy.ml 1 week ago

Oh, and I typically get 16-20 tok/s running a 32b model on Ollama through Open WebUI. Also, I've run into issues with 4-bit quantization of the K/V cache on some models myself, so just FYI.
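
If you want to check your own tok/s numbers or play with the K/V cache quantization setting, here's a rough sketch against Ollama's local REST API. It assumes the default port (11434) and a pulled 32b model; the model tag and env var values are just examples, not a recommendation.

```python
# Minimal sketch: measure generation speed (tok/s) from a local Ollama
# instance via its REST API. Assumes Ollama is running on the default
# port 11434 and that a 32b model is already pulled; swap in the model
# tag you actually use.
#
# To experiment with K/V cache quantization, set these before starting
# the Ollama server (flash attention is required for quantized caches):
#   OLLAMA_FLASH_ATTENTION=1
#   OLLAMA_KV_CACHE_TYPE=q8_0   # or q4_0 / f16
import json
import urllib.request

MODEL = "qwen2.5:32b"  # example tag; use whatever 32b model you run

def generation_speed(prompt: str) -> float:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    # eval_count = tokens generated, eval_duration = time spent generating (ns)
    return stats["eval_count"] / stats["eval_duration"] * 1e9

if __name__ == "__main__":
    tps = generation_speed("Explain K/V cache quantization in one paragraph.")
    print(f"{tps:.1f} tok/s")
```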
