DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch
Submitted 5 months ago by schizoidman@lemm.ee to technology@lemmy.world
https://techcrunch.com/2025/05/29/deepseeks-distilled-new-r1-ai-model-can-run-on-a-single-gpu/
Comments
- blarth@thelemmy.club 5 months ago
  7b trash model?
  - vhstape@lemmy.sdf.org 5 months ago
    > the Chinese AI lab also released a smaller, “distilled” version of its new R1, DeepSeek-R1-0528-Qwen3-8B, that DeepSeek claims beats comparably sized models on certain benchmarks

    Most models come in 1B, 7-8B, 12-14B, and 27+B parameter variants. According to the docs, they benchmarked the 8B model on an NVIDIA H20 (96 GB VRAM) and got between 144 and 1198 tokens/sec. Most consumer GPUs probably aren’t going to be able to keep up with that.
    - avidamoeba@lemmy.ca 5 months ago
      It proved sqrt(2) irrational at 40 tps on a 3090 here. The 32B R1 did it at 32 tps, but it thought a lot longer.
    - brucethemoose@lemmy.world 4 months ago
      Depends on the quantization.

      7B is small enough to run in FP8 or a Marlin quant with SGLang/vLLM/TensorRT, so you can probably get very close to the H20 numbers on a 3090 or 4090 (sketch below this thread).
 
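If you want to sanity-check vhstape’s tokens/sec figures on a consumer card along the lines brucethemoose suggests, here is a minimal sketch using vLLM. Assumptions not stated in the thread: that vLLM is the runtime you pick (SGLang or TensorRT-LLM would work too), that the Hugging Face model ID below is the right one for the distilled checkpoint, and that you have a 24 GB-class GPU; on older cards the FP8 flag may fall back to weight-only (Marlin-style) kernels, so treat it as a starting point rather than a guarantee.

```python
# Rough single-prompt throughput check for the distilled 8B model with vLLM.
# Assumes: `pip install vllm`, a 3090/4090-class GPU, and that the model ID
# "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" is correct (unverified assumption).
import time

from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    quantization="fp8",           # drop this line to load in BF16 instead
    max_model_len=8192,           # keep the KV cache small enough for 24 GB of VRAM
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.6, max_tokens=1024)
prompt = "Prove that the square root of 2 is irrational."

start = time.time()
outputs = llm.generate([prompt], params)
elapsed = time.time() - start

completion = outputs[0].outputs[0]
n_tokens = len(completion.token_ids)
print(completion.text)
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```

Single-stream numbers on a 3090 will likely look more like avidamoeba’s ~40 tok/s; the 144-1198 tok/s range quoted from the docs is presumably batched throughput, so pass several prompts to `generate()` if you want to compare against the high end.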
- knighthawk0811@lemmy.world 5 months ago
  It’s distilled, so it’s going to be smaller than any non-distilled model of the same quality.
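For anyone unfamiliar with the term: distillation means training a small “student” model to imitate a large “teacher” instead of learning everything from scratch. Below is a minimal sketch of the classic soft-label version in PyTorch; the function name and toy tensors are made up for illustration, and DeepSeek’s actual recipe (reportedly fine-tuning Qwen3-8B on text generated by the full R1) works on generated data rather than logits.

```python
# Toy illustration of classic knowledge distillation: a small student is trained
# to match a large teacher's output distribution as well as the ground-truth labels.
# (Illustrative only; not DeepSeek's published training code.)
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend of soft-target KL loss (teacher) and ordinary cross-entropy (labels)."""
    # Softened distributions: higher temperature exposes more of the teacher's
    # relative preferences between "wrong" tokens.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce

# Tiny smoke test with random "logits" over a 100-token vocabulary.
student = torch.randn(4, 100, requires_grad=True)
teacher = torch.randn(4, 100)
labels = torch.randint(0, 100, (4,))
print(distillation_loss(student, teacher, labels))
```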
- LainTrain@lemmy.dbzer0.com 5 months ago
  I’m genuinely curious what you do that a 7B model is “trash” to you? Like yeah, sure, a gippity now tends to beat out a Mistral 7B, but I’m pretty happy with my Mistral most of the time, if I ever even need AI at all.
- TropicalDingdong@lemmy.world 5 months ago
  Yeah, idk. I did some work with DeepSeek early on. I wasn’t impressed.

  HOWEVER…

  Some other things they’ve developed, like deepsite, holy shit, impressive.
  - double_quack@lemm.ee 5 months ago
    Save me the search, please. What’s deepsite?
 
 
- fogetaboutit@programming.dev 5 months ago
  ew, probably still censored.
  - T156@lemmy.world 5 months ago
    The censorship only exists on the version they host, which is fair enough. If they’re running it themselves in China, they can’t just break the law.

    If you run it yourself, the censorship isn’t there.
    - jaschen@lemm.ee 4 months ago
      Untrue, I downloaded the vanilla version and it’s hardcoded in.
- MonkderVierte@lemmy.ml 4 months ago
  Yeah, I think censoring built into the LLM data itself would be pretty vulnerable to being stripped back out.
 
- Mwa@lemm.ee 5 months ago
  You can self-host it, right?
  - jaschen@lemm.ee 4 months ago
    The self-hosted model has hardcoded censored content.
- fogetaboutit@programming.dev 4 months ago
  If the model is censored… then what, retrain it? Or start from scratch, like what open-r1 is doing?
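For anyone wanting to test the competing claims above (T156 says the censorship only exists on the hosted version, jaschen says it’s baked into the weights), the quickest check is to run the downloaded checkpoint locally and compare a politically sensitive prompt against a neutral control. A minimal sketch with Hugging Face transformers follows; the model ID and the two test prompts are illustrative assumptions, not anything from the thread.

```python
# Quick local check: load the downloaded weights yourself and see whether a
# sensitive prompt gets a refusal while a neutral one does not.
# Assumes `pip install transformers accelerate`, a GPU with roughly 20 GB of
# free VRAM (or device_map="cpu" and patience), and that the model ID below is
# the right one for the distilled 8B checkpoint (unverified assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

prompts = [
    "What happened at Tiananmen Square in 1989?",   # the usual censorship litmus test
    "What happened at Three Mile Island in 1979?",  # neutral control question
]

for question in prompts:
    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512, do_sample=False)
    answer = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
    print(f"\n=== {question}\n{answer}")
```

If the local model refuses the first prompt but answers the second, the filtering really is in the weights themselves, which is where retraining or further fine-tuning (the open-r1 route mentioned above) comes in.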
 
 
- LodeMike@lemmy.today 5 months ago
  So can a lot of other models.

  “This load can be towed by a single vehicle”