Deepseek R1 is currently the self-hosting model to use
Comment on Apple to use Chinese giant Alibaba’s AI in iPhones
brucethemoose@lemmy.world 1 week ago
For context, Alibaba is behind Qwen 2.5, which is the best series of LLMs for desktop/self-hosting use. Most of the series is Apache licensed, free to use, and they’re what Deepseek based their smaller distillations on. Their 32B/72B models, especially finetunes of them, can run circles around cheaper OpenAI models you’d need a $100,000+ fire-breathing server to run… if OpenAI actually released anything for public use.
So… Yeah, if I were Apple, I would’ve picked Qwen/Alibaba too. They’re the undisputed king of models that would fit in an iPhone’s memory pool, at the moment.
IndustryStandard@lemmy.world 1 week ago
brucethemoose@lemmy.world 1 week ago
Some of the distillations are trained on top of Qwen 2.5.
And in certain niches, FuseAI (a special merge of several thinking models), Qwen Coder, EVA-Gutenberg Qwen, or other specialized models do a better job than Deepseek 32B.
coherent_domain@infosec.pub 1 week ago
How do you Apache license an LLM? Do they just treat the weights as code?
brucethemoose@lemmy.world 1 week ago
It’s software, so yeah, I suppose. See for yourself: huggingface.co/Qwen/QwQ-32B-Preview
Deepseek chose MIT: huggingface.co/…/DeepSeek-R1-Distill-Qwen-32B
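For what it’s worth, on Hugging Face the license is just a field in the YAML front matter of the repo’s model card (README.md). A sketch of what that typically looks like — the exact fields vary by repo, so treat these as illustrative, not a copy of Qwen’s actual card:

```yaml
# Model card front matter (top of README.md in a Hugging Face repo).
# The "license" field is how a repo declares the license covering its weights.
---
license: apache-2.0        # SPDX-style identifier; Deepseek's distills use "mit"
language:
  - en
pipeline_tag: text-generation
---
```

The Hub surfaces this field as the license badge on the model page, so “Apache licensing an LLM” in practice means publishing the weights under that declaration.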
For whatever reason, the Chinese companies tend to go with very permissive licensing, while Cohere, Mistral, Meta (Llama), and some others add really weird commercial restrictions. IBM Granite is actually Apache 2.0 and way more “open” and documented than even the Chinese tech companies, but unfortunately their models are “small” (3B) and not very cutting edge.