Comment on DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch

brucethemoose@lemmy.world 5 days ago

Depends on the quantization.

7B is small enough to run in FP8 or as a Marlin quant with SGLang, vLLM, or TensorRT, so you can probably get very close to H20 performance on a 3090 or 4090.
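For a rough sense of why that fits, here's a back-of-the-envelope sketch of weight memory only. It assumes exactly 7 billion parameters and ignores KV cache, activations, and quantization scale overhead, so real usage will be somewhat higher. The `weight_gib` helper is just made up for illustration.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

VRAM_GIB = 24  # a 3090 or 4090 has 24 GB of VRAM

# Assumed parameter count: 7B. "4-bit" stands in for a typical Marlin-style quant.
for name, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]:
    size = weight_gib(7, bits)
    print(f"{name}: ~{size:.1f} GiB of weights, ~{VRAM_GIB - size:.1f} GiB left over")
```

Whatever is left over goes to KV cache and activations, which is what lets you batch requests or run longer contexts.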
