Comment on Frustratingly bad at self hosting. Can someone help me access LLMs on my rig from my phone
tal@lemmy.today 17 hours ago
Ollama does have some features that make it easier to use for a first-time user, including:
- Automatically calculating how many layers fit in VRAM, loading that many onto the GPU, and splitting the rest between CPU and VRAM. kobold.cpp can’t do that automatically yet.
- Automatically unloading the model from VRAM after a period of inactivity.
I had an easier time setting up ollama than other stuff, and OP apparently already has it set up.
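If you ever want to override those two behaviors instead of relying on the defaults, they're exposed through Ollama's HTTP API. Rough Python sketch (assumes Ollama on its default port 11434 and an example model tag — swap in whatever model you actually pulled):

```python
# Minimal sketch: overriding Ollama's automatic GPU-layer split and idle
# unload via its HTTP API. Assumes Ollama is listening on the default port
# and that the example model tag below has already been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",          # example model tag, not OP's actual model
        "prompt": "Say hi in one word.",
        "stream": False,
        "keep_alive": "30m",             # keep the model loaded for 30 minutes instead of the default 5
        "options": {
            "num_gpu": 20,               # force 20 layers onto the GPU instead of the auto-calculated split
        },
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```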
brucethemoose@lemmy.world 16 hours ago
Yeah. But it also messes stuff up relative to the llama.cpp baseline, hides or doesn’t support some features/optimizations, and definitely doesn’t support the more efficient iq_k quants of ik_llama.cpp or its specialized MoE offloading.
And that’s not even getting into the various controversies around ollama (like broken GGUFs or indications they’re going closed source in some form).
…It just depends on how much performance you want to squeeze out, and how much time you want to spend on the endeavor. Small LLMs are kinda marginal though, so it’s pretty important IMO.
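If you want to put numbers on that trade-off, the simplest thing is to measure tokens/sec against whatever backends you have running. Ollama, llama.cpp's llama-server, and kobold.cpp all expose an OpenAI-compatible endpoint, so a rough Python sketch like this works — the ports and model tag here are placeholder assumptions, point them at your own setup:

```python
# Rough sketch for comparing backends: time one non-streaming chat completion
# through the OpenAI-compatible /v1/chat/completions endpoint and divide the
# reported completion tokens by wall time. Wall time includes prompt
# processing, so treat the result as a rough number, not a precise benchmark.
import time
import requests

def tokens_per_second(base_url: str, model: str,
                      prompt: str = "Write a haiku about GPUs.") -> float:
    start = time.time()
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
            "stream": False,
        },
        timeout=600,
    )
    resp.raise_for_status()
    elapsed = time.time() - start
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed

# Example: Ollama on its default port vs. a llama-server instance on 8080.
print("ollama:    ", tokens_per_second("http://localhost:11434", "llama3.1:8b"))
print("llama.cpp: ", tokens_per_second("http://localhost:8080", "llama3.1:8b"))
```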