Comment on MiniMax M1 model claims Chinese LLM crown from DeepSeek - plus it's true open-source

xcjs@programming.dev ⁨1⁩ ⁨day⁩ ago

That’s not how distillation works, if I understand what you’re trying to explain.

If you distill model A into a smaller model, you get a smaller student trained to approximate model A’s output distribution: fewer parameters, but roughly the same behavior. You can’t distill Llama into Deepseek R1.
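To illustrate the point, here is a minimal sketch of the standard distillation objective (all numbers are made up; this is a toy, not any real model). The student is optimized to match the teacher’s softened output distribution, which is why the teacher’s behavior, censorship included, carries over:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's predictions.

    Minimizing this pulls the student's distribution toward the teacher's.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits for a single token position (hypothetical values):
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
loss = distillation_loss(teacher, student)  # positive; zero only if they match
```

Training drives this loss toward zero across the dataset, so the student ends up reproducing the teacher’s outputs, whatever those are.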

I’ve been able to run distillations of Deepseek R1 up to 70B, and they’re all still censored. There is, however, a version of Deepseek R1 “patched” with western values, called R1-1776, that will answer questions on topics censored by the Chinese government.
