Comment on I've just created c/Ollama!
southernbeaver@lemmy.world 3 weeks ago
My HomeAssistant is running on Unraid, but I have an old NVIDIA Quadro P5000. I really want to run a vision model so that it can describe who is at my doorbell.
brucethemoose@lemmy.world 2 weeks ago
Oh actually that’s a good card for LLM serving!
Build the llama.cpp server from source; it has better support for Pascal cards than anything else:
github.com/ggml-org/llama.cpp/…/multimodal.md
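For the doorbell use case, here's a rough Python sketch of how you'd hit llama-server's OpenAI-compatible endpoint with a snapshot. Not tested on your setup; the port, image path, and model/mmproj filenames are all placeholders for whatever you actually download and run:

```python
# Minimal sketch: ask llama-server to describe a doorbell snapshot.
# Assumes llama-server is already running with a vision model, e.g.:
#   llama-server -m <model>.gguf --mmproj <mmproj>.gguf --port 8080
# (filenames above are placeholders, not the real download names)
import base64
import requests

with open("doorbell.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the person at the door."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

You could wire that into a HomeAssistant automation that grabs a camera snapshot and pushes the description as a notification.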
Gemma 3 is a hair too big for the P5000's 16GB VRAM (like 17-18GB), so I'd start with InternVL 14B Q5_K_XL: huggingface.co/…/InternVL3-14B-Instruct-GGUF
Or Mistral Small 3.2 24B IQ4_XS for more 'text' intelligence than vision: huggingface.co/…/Mistral-Small-3.2-24B-Instruct-2…