marauding_gibberish142@lemmy.dbzer0.com 1 week ago
It's because modern consumer GPUs don't have enough VRAM to load 24B models. I want to run Mistral Small locally.
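For reference, my rough math on what a 24B model needs (weights only, ignoring KV cache and runtime overhead; the bits-per-weight figures are approximations, not measurements):

```python
# Rough VRAM estimate for a 24B-parameter model at common weight precisions.
# Ballpark figures for the weights alone, ignoring KV cache and runtime overhead.

PARAMS = 24e9  # 24B parameters, e.g. Mistral Small

bytes_per_weight = {
    "fp16":   2.0,     # full half-precision weights
    "q8_0":   1.0625,  # ~8.5 bits per weight
    "q4_k_m": 0.5625,  # ~4.5 bits per weight
}

for precision, nbytes in bytes_per_weight.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{precision:>7}: ~{gib:.0f} GiB for weights alone")
```

That comes out to roughly 45 GiB at fp16, ~24 GiB at Q8, and ~13 GiB at Q4, so anything above a 4-bit quant is already pushing past what a single consumer card offers once you add context.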
Natanox@discuss.tchncs.de 1 week ago
Maybe take a look at systems with the newer AMD SoCs first. They utilize the system's RAM and come with a proper NPU; once ollama or mistral.rs support those, they might give you sufficient performance for your needs at a much lower cost (incl. power consumption). Depending on how NPU support gets implemented, it might even become possible to use the NPU and GPU in tandem, which would probably enable pretty powerful models to run on consumer-grade hardware at reasonable speed.
marauding_gibberish142@lemmy.dbzer0.com 1 week ago
Thanks, but will NPUs integrated along with the CPU ever match the performance of a discrete GPU?
Natanox@discuss.tchncs.de 1 week ago
Depends on which GPU you compare it with, what model you use, what kind of RAM it has to work with, et cetera. NPUs are purpose-built chips, after all. Unfortunately the whole tech is still very young, so we'll have to wait for stuff like ollama to introduce native support for an apples-to-apples comparison. The raw numbers do look promising, however.
just_another_person@lemmy.world 1 week ago
It wouldn’t even matter. OP doesn’t understand how any of this works, and is instead just running rampant calling everything bullshit 😂
marauding_gibberish142@lemmy.dbzer0.com 1 week ago
I’d prefer that you reply with examples/an explanation of what I’m doing wrong instead of cursing
just_another_person@lemmy.world 1 week ago
I assume you're talking about a CUDA implementation here. There are ways to do this with that system, and even sub-projects that expand on it. I'm mostly pointing out how pointless it is for you to do this. What a waste of time and money.
marauding_gibberish142@lemmy.dbzer0.com 1 week ago
Used 3090s go for $800. I was planning to wait for the ARC B580s to go down in price and then buy a few. The reason for the networked setup is that I couldn't find enough PCIe lanes in any of the used computers I was looking at. If there's either an affordable card with good performance and 48GB of VRAM, or an affordable motherboard + CPU combo with a lot of PCIe lanes for under $200, then I'll gladly drop the idea of distributed AI. I just need lots of VRAM, and this is the only way I could think of to get it.
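To put rough numbers on it (the B580 price is a guess about where it might settle, and I'm assuming one full x16 slot per card):

```python
# Back-of-envelope comparison of ways to reach ~48 GB of VRAM.
# Prices are rough guesses, not quotes; the B580 has 12 GB per card.

TARGET_GB = 48
LANES_PER_CARD = 16  # assuming full x16 per card

options = {
    # name: (vram_gb_per_card, approx_price_usd)
    "used RTX 3090": (24, 800),
    "Arc B580":      (12, 280),  # guessing where the price might land
}

for name, (vram, price) in options.items():
    cards = -(-TARGET_GB // vram)  # ceiling division
    print(f"{name}: {cards} cards -> {cards * vram} GB total, "
          f"~${cards * price}, {cards * LANES_PER_CARD} PCIe lanes wanted")
```

Either way that's more lanes than the consumer boards I've looked at actually expose, which is why I keep coming back to splitting the cards across machines.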
Thanks
just_another_person@lemmy.world 1 week ago
PLEASE look back at the crypto mining rush of a decade ago. I implore you.
You’re buying into something that doesn’t exist.