Deepseek R1 is currently the self-hosting model to use
Comment on Apple to use Chinese giant Alibaba’s AI in iPhones
brucethemoose@lemmy.world 1 week ago
For context, Alibaba is behind Qwen 2.5, which is the best series of LLMs for desktop/self-hosting use. Most of the series is Apache licensed, free to use, and they’re what Deepseek based their smaller distillations on. Their 32B/72B models, especially finetunes of them, can run circles around cheaper OpenAI models you’d need a $100,000+ fire-breathing server to run… if OpenAI actually released anything for public use.
So… Yeah, if I were Apple, I would’ve picked Qwen/Alibaba too. They’re the undisputed king of models that would fit in an iPhone’s memory pool, at the moment.
IndustryStandard@lemmy.world 1 week ago
brucethemoose@lemmy.world 1 week ago
Some of the distillations are trained on top of Qwen 2.5.
And in certain niches, FuseAI (a special merge of several thinking models), Qwen Coder, EVA-Gutenberg Qwen, or other specialized models do a better job than Deepseek 32B.
coherent_domain@infosec.pub 1 week ago
How do you Apache license an LLM? Do they just treat the weights as code?
brucethemoose@lemmy.world 1 week ago
It’s software, so yeah, I suppose. See for yourself: huggingface.co/Qwen/QwQ-32B-Preview
Deepseek chose MIT: huggingface.co/…/DeepSeek-R1-Distill-Qwen-32B
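For what it’s worth, on Hugging Face the license is just a field in the YAML front matter of the repo’s model card (README.md). A sketch of what that typically looks like — the exact fields vary by repo, so treat these as illustrative, not a copy of Qwen’s actual card:

```yaml
# Model card front matter (top of README.md in a Hugging Face repo).
# The "license" field is how a repo declares the license covering its weights.
---
license: apache-2.0        # SPDX-style identifier; Deepseek's distills use "mit"
language:
  - en
pipeline_tag: text-generation
---
```

The Hub surfaces this field as the license badge on the model page, so “Apache licensing an LLM” in practice means publishing the weights under that declaration.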
For whatever reason, the Chinese companies tend to go with very permissive licensing, while Cohere, Mistral, Meta (Llama), and some others add really weird commercial restrictions. IBM Granite is actually Apache 2.0 and way more “open” and documented than even the Chinese tech companies, but unfortunately their models are “small” (3B) and not very cutting edge.