
RagingHungryPanda@lemm.ee 6 months ago

The coder model only has that one. The bigger ones are 20 GB+, and my GPU has 16 GB. I've only tried two models, but the size seemed to balloon after that, so those may be the biggest models I can run.
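The reasoning above (pick whatever fits in 16 GB, rule out the 20 GB+ files) can be sketched as a simple filter. This is a minimal Python sketch, assuming you know each model file's size on disk and that roughly 1.5 GB should stay free for the context cache and runtime overhead. That headroom figure, the function name, and the model names and sizes in the example are hypothetical, not measurements of any real model.

```python
from typing import Dict, Optional


def largest_fitting_model(sizes_gb: Dict[str, float], vram_gb: float,
                          headroom_gb: float = 1.5) -> Optional[str]:
    """Return the largest model whose file leaves `headroom_gb` of VRAM free.

    `headroom_gb` is an assumed allowance for the context (K/V) cache and
    runtime overhead; returns None if nothing fits.
    """
    budget = vram_gb - headroom_gb
    fitting = {name: size for name, size in sizes_gb.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Hypothetical model names and sizes, for illustration only.
sizes = {"model-14b-q4": 9.0, "model-24b-q4": 14.3, "model-32b-q4": 19.9}
print(largest_fitting_model(sizes, vram_gb=16))
```

Anything that doesn't fit will either fail to load or be partially offloaded to system RAM and the CPU, which is much slower.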


marauding_gibberish142@lemmy.dbzer0.com 6 months ago
Do you have any recommendations for running the Mistral Small model? I'm very interested in it, alongside CodeLlama, Oobabooga and others.

- RagingHungryPanda@lemm.ee 6 months ago
  I haven't tried those, so not really, but with Open WebUI you can download and run anything; just make sure the model fits in your VRAM so it doesn't end up running on the CPU. The DeepSeek one is decent. I find I like ChatGPT-4o better, but it's still good.
  - marauding_gibberish142@lemmy.dbzer0.com 6 months ago
    In general, how much VRAM do I need for 14B and 24B models?
    FrankLaskey@lemmy.ml 6 months ago
    It really depends on how you quantize the model and the K/V cache. This is a useful calculator: smcleod.net/vram-estimator/ I can comfortably fit most 32B models quantized to 4-bit (usually Q4_K_M or IQ4_XS) in my 3090's 24 GB of VRAM with a reasonable context size. If you need a much larger context window to feed in large documents etc., you'd have to go smaller on model size (14B, 27B etc.), get a multi-GPU setup, or use something with unified memory and a lot of RAM (like the Mac Minis others are mentioning).
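As a rough back-of-the-envelope illustration of why both the weight quantization and the K/V cache matter (not a reimplementation of the linked calculator), here is a minimal Python sketch. It assumes VRAM is approximately quantized weights + K/V cache + a fixed overhead, with the K/V cache sized as 2 × layers × KV heads × head dim × context length × bytes per element. The `ModelShape` numbers, the 4.5 bits per weight, and the 1 GB overhead are hypothetical placeholders, not figures for any specific 14B or 24B model; the real layer and head counts come from the model's config.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical architecture numbers; take real ones from the model's config."""
    params_billion: float  # total parameter count, in billions
    n_layers: int          # number of transformer layers
    n_kv_heads: int        # key/value heads (fewer than query heads with GQA)
    head_dim: int          # dimension of each attention head


def estimate_vram_gb(shape: ModelShape, bits_per_weight: float, context_len: int,
                     kv_bytes_per_elem: float = 2.0, overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate in decimal GB: quantized weights + K/V cache + assumed overhead."""
    weights_bytes = shape.params_billion * 1e9 * bits_per_weight / 8
    # One K and one V vector per layer, per KV head, per token of context.
    kv_bytes = (2 * shape.n_layers * shape.n_kv_heads * shape.head_dim
                * context_len * kv_bytes_per_elem)
    return (weights_bytes + kv_bytes) / 1e9 + overhead_gb


# Placeholder shape for a "14B-class" model; swap in real values (or 24B ones).
shape_14b = ModelShape(params_billion=14, n_layers=40, n_kv_heads=8, head_dim=128)
for ctx in (8192, 32768):
    fp16_cache = estimate_vram_gb(shape_14b, bits_per_weight=4.5, context_len=ctx)
    q8_cache = estimate_vram_gb(shape_14b, bits_per_weight=4.5, context_len=ctx,
                                kv_bytes_per_elem=1.0)
    print(ctx, round(fp16_cache, 1), round(q8_cache, 1))
```

With these made-up numbers, going from an 8K to a 32K context adds about 4 GB of K/V cache, and storing the cache at 1 byte per element instead of 2 halves that part, which is the trade-off between context size and model size described above.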