Comment on Selfhost an LLM

iii@mander.xyz 1 week ago

One of these projects might be of interest to you:

Do note that CPU inference is quite a lot slower than GPU inference. I currently find the quantized DeepSeek models to be the best balance between reply quality and inference time when running without a GPU.
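For reference, here's a minimal CPU-only sketch using llama-cpp-python; the model filename is a placeholder, so point it at whichever quantized DeepSeek GGUF you actually downloaded:

```python
# CPU-only inference sketch with llama-cpp-python.
# The model path below is hypothetical -- substitute your own
# quantized DeepSeek GGUF (e.g. a Q4_K_M quant).
from llama_cpp import Llama

llm = Llama(
    model_path="./deepseek-llm-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=0,  # 0 = no layers offloaded, runs entirely on CPU
    n_ctx=2048,      # context window; larger values need more RAM
    n_threads=8,     # set to your physical core count for best speed
)

out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

The `n_gpu_layers=0` line is what forces CPU inference; raising it offloads that many layers to a GPU if you have one.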
