The key is which one, and how, though.
For the really sparse models, you might be better off trying ik_llama.cpp, especially if you are targeting a ‘small’ quant.
You can use Vulkan fairly easily as long as you have 8 GB of VRAM.
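For reference, a minimal sketch of building llama.cpp with the Vulkan backend and partially offloading layers to fit a limited VRAM budget (the model path and layer count below are placeholders, not specific recommendations):

```shell
# Build llama.cpp with the Vulkan backend enabled
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Run with partial GPU offload; lower -ngl until the model fits in VRAM
# (model path and layer count are placeholders)
./build/bin/llama-cli -m ./models/model.gguf -ngl 16 -p "Hello"
```

With `-ngl` you only push as many layers to the GPU as your VRAM allows; the rest stay on the CPU, so this works even on small cards.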
Passerby6497@lemmy.world 1 day ago
Only got 4 GB of VRAM, unfortunately.