Comment on Very large amounts of gaming GPUs vs AI GPUs
brucethemoose@lemmy.world 3 days ago
One last thing: I’ve heard mixed things about the 235B, so there might be a smaller, more optimal LLM for whatever you do, if it’s something targeted?
For instance, Kimi-Dev-72B is quite a good coding model: huggingface.co/moonshotai/Kimi-Dev-72B
It might fit in vLLM (as an AWQ quant) with 2x 4090s, and it would easily fit as an exl3.
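If you go the vLLM route, it would look roughly like this; the AWQ repo name below is just a placeholder, so swap in whichever 4-bit AWQ quant of Kimi-Dev-72B you actually find:

```python
from vllm import LLM, SamplingParams

# Placeholder repo name -- point this at a real 4-bit AWQ quant of Kimi-Dev-72B.
MODEL = "someuser/Kimi-Dev-72B-AWQ"

llm = LLM(
    model=MODEL,
    quantization="awq",        # ~4 bits/weight, so a 72B can fit across 2x 24GB cards
    tensor_parallel_size=2,    # split the weights over both 4090s
    max_model_len=8192,        # keep context modest to leave room for the KV cache
    gpu_memory_utilization=0.95,
)

out = llm.generate(
    ["Write a Python function that parses an ISO 8601 timestamp."],
    SamplingParams(temperature=0.6, max_tokens=512),
)
print(out[0].outputs[0].text)
```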
rezz@lemmy.world 3 days ago
What do I need to run Kimi? Does it have Apple Silicon-compatible releases? It seems promising.
brucethemoose@lemmy.world 3 days ago
Depends. You’re in luck, as someone made a DWQ (which is the optimal way to run it on Apple Silicon, and should work in LM Studio): huggingface.co/mlx-community/…/main
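If you’d rather script it than use LM Studio, the same DWQ should load with mlx-lm directly. A minimal sketch; you’ll need to fill in the actual repo path (elided in the link above):

```python
from mlx_lm import load, generate

# Fill in the real mlx-community DWQ repo (path elided in the link above).
MODEL = "mlx-community/<Kimi-Dev-72B-DWQ-repo>"

model, tokenizer = load(MODEL)
text = generate(
    model,
    tokenizer,
    prompt="Explain what a DWQ quant is in one paragraph.",
    max_tokens=256,
    verbose=True,  # stream tokens and print a tokens/sec summary
)
```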
It’s chonky though. The weights alone are like 40GB, so assume a 50GB VRAM allocation for some context (rough math at the end of this comment). I’m not sure what Macs that equates to… 96GB? Can the 64GB models allocate enough?
Otherwise, the requirement is basically a 5090. You can stuff it into 32GB as an exl3.
Note that it is going to be slow on Macs, being a dense 72B model.
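Rough math behind the 40GB/50GB figures, with the layer/head counts assumed from Qwen2.5-72B (which Kimi-Dev-72B is built on):

```python
# Back-of-the-envelope VRAM estimate for a ~4-bit quant of a dense 72B model.
# Layer/head counts are assumed from Qwen2.5-72B; adjust if the config differs.

params = 72e9
bits_per_weight = 4.5                            # 4-bit quant plus scales/zeros overhead
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights: ~{weights_gb:.0f} GB")          # ~40 GB, matching the figure above

layers, kv_heads, head_dim = 80, 8, 128          # assumed GQA config
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2   # K+V, fp16
context = 8192
kv_gb = kv_bytes_per_token * context / 1e9
print(f"KV cache @ {context} tokens: ~{kv_gb:.1f} GB")      # ~2.7 GB

print(f"total: ~{weights_gb + kv_gb:.0f} GB + runtime overhead")
```

As for the 64GB question: IIRC macOS caps GPU-wired memory at roughly 75% of unified memory by default, so a 64GB Mac would be right at the edge; the cap can reportedly be raised with the iogpu.wired_limit_mb sysctl, but I’d treat 96GB as the comfortable option.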