Comment on I've just created c/Ollama!
brucethemoose@lemmy.world, 1 week ago:

Oh, actually that's a good card for LLM serving!
Build the llama.cpp server from source; it has better support for Pascal cards than anything else:
github.com/ggml-org/llama.cpp/…/multimodal.md
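
Once it's built, llama-server exposes an OpenAI-compatible HTTP API, so any generic client can talk to it. A minimal sketch (assuming the server is running on the default localhost:8080 with a model already loaded; the prompt and parameters are just placeholders):

```python
import requests

# llama-server (from llama.cpp) serves an OpenAI-compatible endpoint
# at /v1/chat/completions; host and port here assume the defaults.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "local",  # llama-server loads one model; the name is not checked
    "messages": [
        {"role": "user", "content": "Summarize what makes Pascal GPUs tricky for LLM inference."}
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```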
Gemma 3 is a hair too big (like 17-18GB), so I’d start with InternVL 14B Q5K XL: huggingface.co/…/InternVL3-14B-Instruct-GGUF
Or Mistral Small 3.2 24B IQ4_XS for more 'text' intelligence than vision: huggingface.co/…/Mistral-Small-3.2-24B-Instruct-2…
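
If you'd rather script the download than grab it in the browser, huggingface_hub can fetch a single GGUF file. A sketch only; the repo id and filename below are hypothetical placeholders, since the full HF paths are truncated in the links above:

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo id / filename: substitute the actual org and quant
# name from the (truncated) Hugging Face links above.
path = hf_hub_download(
    repo_id="SOME_ORG/InternVL3-14B-Instruct-GGUF",
    filename="InternVL3-14B-Instruct-Q5_K_XL.gguf",
)
print("Downloaded to:", path)  # pass this path to llama-server via -m
```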