Comment on Edward Snowden slams Nvidia's RTX 50-series 'F-tier value,' whistleblows on lackluster VRAM capacity
eager_eagle@lemmy.world 1 week ago
I bet he just wants a card to self-host models and not give companies his data, but the amount of VRAM is indeed ridiculous.
jeena@piefed.jeena.net 1 week ago
Exactly, I'm in the same situation now, and the 8GB in those cheaper cards doesn't even let you run a 13B model. I'm trying to figure out whether I can run a 13B one on a 3060 with 12 GB.
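(For a rough sanity check, here's a back-of-envelope sketch. `est_vram_gb` is a hypothetical helper, and the ~4.5 bits per weight for Q4_K_M plus a fixed allowance for KV cache/runtime are loose assumptions, not measured numbers.)

```python
# Rough VRAM estimate for a quantized model.
# Assumptions: ~4.5 bits/weight for Q4_K_M, ~1.5 GB for KV cache/runtime.
def est_vram_gb(params_b: float, bits_per_weight: float = 4.5,
                overhead_gb: float = 1.5) -> float:
    weights_gb = params_b * bits_per_weight / 8  # weight storage in GB
    return weights_gb + overhead_gb              # plus context/runtime overhead

print(f"13B @ Q4_K_M: ~{est_vram_gb(13):.1f} GB")       # ~8.8 GB, fits in 12 GB
print(f"13B @ Q8_0:   ~{est_vram_gb(13, 8.5):.1f} GB")  # ~15.3 GB, does not
```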
TheHobbyist@lemmy.zip 1 week ago
You can. I’m running a 14B deepseek model on mine. It achieves 28 t/s.
jeena@piefed.jeena.net 1 week ago
Oh nice, that's faster than I imagined.
levzzz@lemmy.world 1 week ago
You need a pretty large context window to fit all the reasoning tokens. Ollama defaults to 2048, and a larger window uses more memory.
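(If you want to raise it, here's a minimal sketch using the `options` field of Ollama's REST API. It assumes a default Ollama install serving on localhost:11434, and uses the model tag mentioned later in the thread.)

```python
# Minimal sketch: overriding Ollama's default 2048-token context window
# via the "options" field of the /api/generate endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b-qwen-distill-q4_K_M",
        "prompt": "Why is the sky blue?",
        "options": {"num_ctx": 8192},  # bigger window = more VRAM used
        "stream": False,
    },
)
print(resp.json()["response"])
```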
Viri4thus@feddit.org 1 week ago
I also have a 3060. Can you detail which framework (SGLang, Ollama, etc.) you're using and how you got that speed? I'm having trouble reaching that level of performance. Thx
TheHobbyist@lemmy.zip 1 week ago
Ollama, latest version. I have it set up with Open-WebUI (though that shouldn't matter). The 14B model is around 9 GB, which easily fits in the 12 GB.
I’m repeating the 28 t/s from memory, but even if I’m wrong it’s easily above 20.
Specifically, I’m running this model: ollama.com/…/deepseek-r1:14b-qwen-distill-q4_K_M
manicdave@feddit.uk 1 week ago
I’m running deepseek-r1:14b on a 12 GB RX 6700. It just about fits in memory and is pretty fast.