Comment on: Please suggest some good self-hostable RAG for my LLM.
kwa@lemmy.zip 5 weeks ago
I'm new to this and was wondering: why don't you recommend ollama? It's the first one I managed to run and it seemed decent, but if there are better alternatives I'm interested.
brucethemoose@lemmy.world 5 weeks ago
Pretty much everything has an API :P
ollama is OK because it's easy and automated, but you can get higher performance, better VRAM efficiency, and better samplers from either kobold.cpp or TabbyAPI.
I'd recommend kobold.cpp for very short context (like 4K or less) or if you need to partially offload the model to CPU. Otherwise TabbyAPI, as it's generally faster (but GPU-only) and much better at long context through its great k/v cache quantization.
They all expose OpenAI-compatible APIs, though kobold.cpp also has its own web UI.
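So whatever RAG frontend you pick just needs an OpenAI-compatible endpoint setting. As a rough sketch of what that looks like with the official openai Python package (the port, base_url, and model name here are assumptions, e.g. kobold.cpp usually listens on 5001; check your server's startup log):

```python
# Minimal sketch: query a local kobold.cpp / TabbyAPI / ollama server
# through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5001/v1",  # assumed: kobold.cpp's usual port; adjust for your backend
    api_key="not-needed",                 # local servers typically ignore or don't require a key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; many local backends ignore this field
    messages=[{"role": "user", "content": "Summarize what RAG is in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Point your RAG stack at the same base URL and it should work the same way it would against the hosted OpenAI API.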