marauding_gibberish142@lemmy.dbzer0.com 1 week ago
It's because modern consumer GPUs don't have enough VRAM to load 24B models. I want to run Mistral Small locally.
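For reference, my rough math on what a 24B model needs (weights only, ignoring KV cache and runtime overhead; the bits-per-weight figures are approximations, not measurements):

```python
# Rough VRAM estimate for a 24B-parameter model at common weight precisions.
# Ballpark figures for the weights alone, ignoring KV cache and runtime overhead.

PARAMS = 24e9  # 24B parameters, e.g. Mistral Small

bytes_per_weight = {
    "fp16":   2.0,     # full half-precision weights
    "q8_0":   1.0625,  # ~8.5 bits per weight
    "q4_k_m": 0.5625,  # ~4.5 bits per weight
}

for precision, nbytes in bytes_per_weight.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{precision:>7}: ~{gib:.0f} GiB for weights alone")
```

That comes out to roughly 45 GiB at fp16, ~24 GiB at Q8, and ~13 GiB at Q4, so anything above a 4-bit quant is already pushing past what a single consumer card offers once you add context.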
Natanox@discuss.tchncs.de 1 week ago
Maybe take a look at systems with the newer AMD SoCs first. They utilize the system's RAM and come with a proper NPU; once ollama or mistral.rs support those, they might give you sufficient performance for your needs at a much lower cost (incl. power consumption). Depending on how NPU support gets implemented, it might even become possible to use the NPU and GPU in tandem, which would probably enable pretty powerful models to run on consumer-grade hardware at reasonable speed.
marauding_gibberish142@lemmy.dbzer0.com 1 week ago
Thanks, but will NPUs integrated along with the CPU ever match the performance of a discrete GPU?
Natanox@discuss.tchncs.de 1 week ago
Depends on which GPU you compare it with, what model you use, what kind of RAM it has to work with, et cetera. NPUs are purpose-built chips, after all. Unfortunately the whole tech is still very young, so we'll have to wait for stuff like ollama to introduce native support for an apples-to-apples comparison. The raw numbers do look promising, however.
just_another_person@lemmy.world 1 week ago
It wouldn’t even matter. OP doesn’t understand how any of this works, and is instead just running rampant calling everything bullshit 😂
marauding_gibberish142@lemmy.dbzer0.com 1 week ago
I’d prefer that you reply with examples/an explanation of what I’m doing wrong instead of cursing
just_another_person@lemmy.world 1 week ago
I assume you're talking about a CUDA implementation here. There are ways to do this with that system, and even sub-projects that expand on it. I'm mostly pointing out how pointless it is for you to do this. What a waste of time and money.
marauding_gibberish142@lemmy.dbzer0.com 1 week ago
Used 3090s go for $800. I was planning to wait for the ARC B580s to go down in price and then buy a few. The reason for the networked setup is that I couldn't find enough PCIe lanes in any of the used computers I was looking at. If there's either an affordable card with good performance and 48GB of VRAM, or an affordable motherboard + CPU combo with a lot of PCIe lanes for under $200, then I'll gladly drop the idea of distributed AI. I just need lots of VRAM, and this is the only way I could think of to get it.
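To put rough numbers on it (the B580 price is a guess about where it might settle, and I'm assuming one full x16 slot per card):

```python
# Back-of-envelope comparison of ways to reach ~48 GB of VRAM.
# Prices are rough guesses, not quotes; the B580 has 12 GB per card.

TARGET_GB = 48
LANES_PER_CARD = 16  # assuming full x16 per card

options = {
    # name: (vram_gb_per_card, approx_price_usd)
    "used RTX 3090": (24, 800),
    "Arc B580":      (12, 280),  # guessing where the price might land
}

for name, (vram, price) in options.items():
    cards = -(-TARGET_GB // vram)  # ceiling division
    print(f"{name}: {cards} cards -> {cards * vram} GB total, "
          f"~${cards * price}, {cards * LANES_PER_CARD} PCIe lanes wanted")
```

Either way that's more lanes than the consumer boards I've looked at actually expose, which is why I keep coming back to splitting the cards across machines.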
Thanks
just_another_person@lemmy.world 1 week ago
PLEASE look back at the crypto mining rush of a decade ago. I implore you.
You’re buying into something that doesn’t exist.