Comment on DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch
brucethemoose@lemmy.world 5 days ago
Depends on the quantization.
7B is small enough to run in FP8 or a Marlin quant with SGLang/vLLM/TensorRT, so you can probably get very close to H20 performance on a 3090 or 4090.
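For reference, a minimal vLLM sketch of what that looks like on a single 24 GB card. The checkpoint id (deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) and the FP8 setting are my assumptions, not from the comment; for a Marlin quant you'd point at a GPTQ checkpoint and use quantization="gptq_marlin" instead.

```python
# Minimal sketch: serving a ~7B R1 distill on one consumer GPU with vLLM.
# Assumed checkpoint id and quantization mode; adjust for your hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # example checkpoint, not confirmed by the comment
    quantization="fp8",             # on-the-fly FP8 weight quantization
    max_model_len=8192,             # keep the KV cache small enough for 24 GB
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Explain speculative decoding in one paragraph."], params)
print(outputs[0].outputs[0].text)
```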