DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch
Submitted 3 weeks ago by schizoidman@lemm.ee to technology@lemmy.world
https://techcrunch.com/2025/05/29/deepseeks-distilled-new-r1-ai-model-can-run-on-a-single-gpu/
Comments
blarth@thelemmy.club 3 weeks ago
7b trash model?
vhstape@lemmy.sdf.org 3 weeks ago
the Chinese AI lab also released a smaller, “distilled” version of its new R1, DeepSeek-R1-0528-Qwen3-8B, that DeepSeek claims beats comparably sized models on certain benchmarks
Most models come in 1B, 7-8B, 12-14B, and 27B+ parameter variants. According to the docs, they benchmarked the 8B model on an NVIDIA H20 (96 GB VRAM) and got between 144 and 1,198 tokens/sec. Most consumer GPUs probably aren't going to be able to keep up with that.
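If you want to sanity-check that kind of number on your own card, here's a rough way to eyeball it (a minimal sketch using Hugging Face transformers; the prompt and max_new_tokens are mine, not from the article, and it assumes the model fits in your VRAM):

```python
# Rough sketch: measure tokens/sec for the 8B distill on a local GPU.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Prove that sqrt(2) is irrational.", return_tensors="pt").to(model.device)
start = time.time()
output = model.generate(**inputs, max_new_tokens=256)
elapsed = time.time() - start

# Count only the newly generated tokens, not the prompt.
generated = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{generated / elapsed:.1f} tokens/sec")
```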
avidamoeba@lemmy.ca 3 weeks ago
It proved sqrt(2) irrational at 40 tps on a 3090 here. The 32B R1 did it at 32 tps, but it thought a lot longer.
brucethemoose@lemmy.world 3 weeks ago
Depends on the quantization.
A 7B is small enough to run in FP8 or a Marlin quant with SGLang/vLLM/TensorRT, so you can probably get very close to the H20 on a 3090 or 4090.
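Something like this, for anyone who wants to try it (a minimal sketch assuming a recent vLLM build with FP8 support; the model ID is the real HF repo, but the quantization and context settings are my guesses, not anything DeepSeek ships):

```python
# Sketch: serve the 8B distill in FP8 with vLLM on a single 24 GB card.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    quantization="fp8",   # roughly halves weight VRAM vs BF16
    max_model_len=8192,   # keep the KV cache small enough for 24 GB
)

params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(["Prove that sqrt(2) is irrational."], params)
print(outputs[0].outputs[0].text)
```

On Ampere cards like the 3090 the FP8 path falls back to weight-only kernels rather than native FP8 math, so a 4090 (Ada) should get closer to the H20 numbers.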
knighthawk0811@lemmy.world 3 weeks ago
It's distilled, so it's going to be smaller than any non-distilled model of the same quality.
LainTrain@lemmy.dbzer0.com 3 weeks ago
I'm genuinely curious what you do that makes a 7B model "trash" to you. Like, yeah, sure, a gippity now tends to beat out a Mistral 7B, but I'm pretty happy with my Mistral most of the time, if I ever even need AI at all.
TropicalDingdong@lemmy.world 3 weeks ago
Yeah, idk. I did some work with DeepSeek early on. I wasn't impressed.
HOWEVER…
Some other things they've developed, like DeepSite: holy shit, impressive.
double_quack@lemm.ee 3 weeks ago
Save me the search, please. What’s deepsite?
fogetaboutit@programming.dev 3 weeks ago
Ew, probably still censored.
T156@lemmy.world 3 weeks ago
The censorship only exists on the version they host, which is fair enough. If they’re running it themselves in China, they can’t just break the law.
If you run it yourself, the censorship isn’t there.
jaschen@lemm.ee 3 weeks ago
Untrue. I downloaded the vanilla version, and it's hard-coded in.
MonkderVierte@lemmy.ml 3 weeks ago
Yeah, I think censoring baked into the LLM weights themselves would be pretty vulnerable to being stripped out.
Mwa@lemm.ee 3 weeks ago
You can self-host it, right??
jaschen@lemm.ee 3 weeks ago
The self-hosted model has hard-coded censored content.
fogetaboutit@programming.dev 3 weeks ago
If the model is censored… then what, retrain it? Or redo it from scratch, like what open-r1 is doing?
LodeMike@lemmy.today 3 weeks ago
So can a lot of other models.
“This load can be towed by a single vehicle”