Comment on Edward Snowden slams Nvidia's RTX 50-series 'F-tier value,' whistleblows on lackluster VRAM capacity
TheHobbyist@lemmy.zip 3 days ago
You can. I'm running a 14B DeepSeek model on mine. It achieves 28 t/s.
levzzz@lemmy.world 2 days ago
You need a pretty large context window to fit all the reasoning; Ollama defaults to a 2048-token context, and a larger window uses more memory.
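If you want to raise it, here's a minimal sketch against Ollama's REST API (assuming a local server on the default port 11434 and that the model below is already pulled; `num_ctx` is the option that controls context length):

```python
# Sketch: request a larger context window per request via Ollama's REST API.
# Assumes a local Ollama server on the default port and the model already pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b-qwen-distill-q4_K_M",
        "prompt": "Why is the sky blue?",
        "options": {"num_ctx": 8192},  # default is 2048; larger uses more VRAM
        "stream": False,
    },
)
print(resp.json()["response"])
```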
Viri4thus@feddit.org 3 days ago
I also have a 3060. Can you detail which framework (SGLang, Ollama, etc.) you are using and how you got that speed? I'm having trouble reaching that level of performance. Thx
TheHobbyist@lemmy.zip 3 days ago
Ollama, latest version. I have it set up with Open WebUI (though that shouldn't matter). The 14B model is around 9 GB, which easily fits in the 3060's 12 GB of VRAM.
I’m repeating the 28 t/s from memory, but even if I’m wrong it’s easily above 20.
Specifically, I’m running this model: ollama.com/…/deepseek-r1:14b-qwen-distill-q4_K_M
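If you want to reproduce the t/s number yourself, the `/api/generate` response reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating), so throughput falls out directly. A minimal sketch, assuming the same local default setup:

```python
# Sketch: compute generation speed (tokens/s) from Ollama's own timing stats.
# eval_count = tokens generated, eval_duration = generation time in nanoseconds.
import requests

data = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b-qwen-distill-q4_K_M",
        "prompt": "Explain quantization in one paragraph.",
        "stream": False,
    },
).json()

tok_per_s = data["eval_count"] / data["eval_duration"] * 1e9
print(f"{tok_per_s:.1f} t/s")
```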
Viri4thus@feddit.org 2 days ago
Ty. I'll try Ollama with the q4_K_M quantization. I wouldn't expect to see a difference between Ollama and SGLang.
jeena@piefed.jeena.net 2 days ago
Thanks for the additional information; it helped me decide to get the 3060 12 GB instead of the 4060 8 GB. They cost almost the same, but for my use cases the 3060 12 GB seems the better fit even though it is a generation older: the memory bus is wider and it has more VRAM. Both video editing and the smaller LLMs should work well enough.
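As a rough sanity check on the VRAM fit (a back-of-envelope sketch; all constants here are assumptions, not measurements): a q4_K_M quant averages roughly 4.8-5 bits per parameter, plus a KV cache that grows with the context length.

```python
# Sketch: back-of-envelope VRAM estimate for a quantized 14B model.
# All constants are rough assumptions, not measured values.
params_b = 14          # model size in billions of parameters
bits_per_param = 4.85  # ~q4_K_M average (assumption)
weights_gb = params_b * bits_per_param / 8  # ~8.5 GB of weights
kv_cache_gb = 1.0      # assumed; grows with num_ctx
total = weights_gb + kv_cache_gb
print(f"~{total:.1f} GB needed vs 12 GB on a 3060")  # leaves some headroom
```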
jeena@piefed.jeena.net 3 days ago
Oh nice, that's faster than I imagined.