Comment on Very large amounts of gaming GPUs vs AI GPUs
brucethemoose@lemmy.world 3 days ago
One last thing: I’ve heard mixed things about the 235B, so there might be a smaller, more optimal LLM for whatever you do, if it’s something targeted?
For instance, Kimi-Dev-72B is quite a good coding model: huggingface.co/moonshotai/Kimi-Dev-72B
It might fit in vLLM (as an AWQ quant) with 2x 4090s, and it would easily fit as an exl3.
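If you go the vLLM route, it would look roughly like this; the AWQ repo name below is just a placeholder, so swap in whichever 4-bit AWQ quant of Kimi-Dev-72B you actually find:

```python
from vllm import LLM, SamplingParams

# Placeholder repo name -- point this at a real 4-bit AWQ quant of Kimi-Dev-72B.
MODEL = "someuser/Kimi-Dev-72B-AWQ"

llm = LLM(
    model=MODEL,
    quantization="awq",        # ~4 bits/weight, so a 72B can fit across 2x 24GB cards
    tensor_parallel_size=2,    # split the weights over both 4090s
    max_model_len=8192,        # keep context modest to leave room for the KV cache
    gpu_memory_utilization=0.95,
)

out = llm.generate(
    ["Write a Python function that parses an ISO 8601 timestamp."],
    SamplingParams(temperature=0.6, max_tokens=512),
)
print(out[0].outputs[0].text)
```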
rezz@lemmy.world 3 days ago
What do I need to run Kimi? Does it have Apple Silicon-compatible releases? It seems promising.
brucethemoose@lemmy.world 3 days ago
Depends. You’re in luck, as someone made a DWQ (which is the optimal way to run it on Apple Silicon, and should work in LM Studio): huggingface.co/mlx-community/…/main
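If you’d rather script it than use LM Studio, the same DWQ should load with mlx-lm directly. A minimal sketch; you’ll need to fill in the actual repo path (elided in the link above):

```python
from mlx_lm import load, generate

# Fill in the real mlx-community DWQ repo (path elided in the link above).
MODEL = "mlx-community/<Kimi-Dev-72B-DWQ-repo>"

model, tokenizer = load(MODEL)
text = generate(
    model,
    tokenizer,
    prompt="Explain what a DWQ quant is in one paragraph.",
    max_tokens=256,
    verbose=True,  # stream tokens and print a tokens/sec summary
)
```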
It’s chonky though. The weights alone are like 40GB, so assume a 50GB VRAM allocation for some context (rough math at the end of this comment). I’m not sure what Macs that equates to… 96GB? Can the 64GB models allocate enough?
Otherwise, the requirement is basically a 5090. You can stuff it into 32GB as an exl3.
Note that it is going to be slow on Macs, being a dense 72B model.
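Rough math behind the 40GB/50GB figures, with the layer/head counts assumed from Qwen2.5-72B (which Kimi-Dev-72B is built on):

```python
# Back-of-the-envelope VRAM estimate for a ~4-bit quant of a dense 72B model.
# Layer/head counts are assumed from Qwen2.5-72B; adjust if the config differs.

params = 72e9
bits_per_weight = 4.5                            # 4-bit quant plus scales/zeros overhead
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights: ~{weights_gb:.0f} GB")          # ~40 GB, matching the figure above

layers, kv_heads, head_dim = 80, 8, 128          # assumed GQA config
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2   # K+V, fp16
context = 8192
kv_gb = kv_bytes_per_token * context / 1e9
print(f"KV cache @ {context} tokens: ~{kv_gb:.1f} GB")      # ~2.7 GB

print(f"total: ~{weights_gb + kv_gb:.0f} GB + runtime overhead")
```

As for the 64GB question: IIRC macOS caps GPU-wired memory at roughly 75% of unified memory by default, so a 64GB Mac would be right at the edge; the cap can reportedly be raised with the iogpu.wired_limit_mb sysctl, but I’d treat 96GB as the comfortable option.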