Comment on Selfhost an LLM
iii@mander.xyz 1 week ago
One of these projects might be of interest to you:
Do note that CPU inference is quite a lot slower than GPU. I currently like the quantized DeepSeek models as the best balance between reply quality and inference time when not using a GPU.
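If it helps, here's a minimal sketch of CPU-only inference with the llama-cpp-python bindings; the model filename and generation parameters are placeholders, and any quantized GGUF (e.g. a DeepSeek distill) would work the same way.

```python
# Minimal CPU-only inference sketch using llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="./deepseek-r1-distill-qwen-7b-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,       # context window
    n_threads=8,      # CPU threads; tune to your core count
    n_gpu_layers=0,   # 0 = pure CPU inference
)

out = llm("Explain quantization in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```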
ProperlyProperTea@lemmy.ml 1 week ago
Indeed, beyond getting the model running at all, having decent hardware is the next most important part.
A 3060 12GB is probably the cheapest card worth getting; go for a 3090 or another 24GB card if you can.
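As a rough back-of-the-envelope check for whether a model fits in VRAM: the weights take roughly parameters × bits-per-weight ÷ 8 bytes, plus extra for the KV cache and runtime overhead. A quick sketch of that arithmetic (figures are approximate):

```python
# Rough VRAM estimate for quantized model weights (ignores KV cache and runtime overhead).
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 13B model at 4-bit quantization needs roughly 6.5 GB for weights,
# which leaves some headroom on a 12 GB card; a 24 GB card can hold ~30B-class models.
print(approx_weight_gb(13, 4))   # ~6.5
print(approx_weight_gb(30, 4))   # ~15.0
```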