Comment on I've just created c/Ollama!
southernbeaver@lemmy.world 3 weeks ago
My HomeAssistant is running on Unraid, but I have an old NVIDIA Quadro P5000. I really want to run a vision model so that it can describe who is at my doorbell.
brucethemoose@lemmy.world 2 weeks ago
Oh actually that’s a good card for LLM serving!
Build the llama.cpp server from source; it has better support for Pascal cards than anything else:
github.com/ggml-org/llama.cpp/…/multimodal.md
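For the doorbell use case, here's a rough Python sketch of how you'd hit llama-server's OpenAI-compatible endpoint with a snapshot. Not tested on your setup; the port, image path, and model/mmproj filenames are all placeholders for whatever you actually download and run:

```python
# Minimal sketch: ask llama-server to describe a doorbell snapshot.
# Assumes llama-server is already running with a vision model, e.g.:
#   llama-server -m <model>.gguf --mmproj <mmproj>.gguf --port 8080
# (filenames above are placeholders, not the real download names)
import base64
import requests

with open("doorbell.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the person at the door."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

You could wire that into a HomeAssistant automation that grabs a camera snapshot and pushes the description as a notification.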
Gemma 3 is a hair too big for the P5000's 16GB VRAM (like 17-18GB), so I'd start with InternVL 14B Q5_K_XL: huggingface.co/…/InternVL3-14B-Instruct-GGUF
Or Mistral Small 3.2 24B IQ4_XS for more 'text' intelligence than vision: huggingface.co/…/Mistral-Small-3.2-24B-Instruct-2…