Comment on Very large amounts of gaming GPUs vs AI GPUs
enumerator4829@sh.itjust.works 3 days ago
Well, a few issues:
- For hosting or training large models you want high bandwidth between GPUs. PCIe is too slow; NVLink has literally an order of magnitude more bandwidth. See what Nvidia is doing with NVLink and AMD is doing with Infinity Fabric. Only available if you pay the premium, and if you need the bandwidth, you are most likely happy to pay.
- Same thing as above, but with memory bandwidth. The HBM chips in an H200 will run circles around the GDDR garbage they hand out to the poor people with filthy consumer cards. By the way, your inference and training are most likely bottlenecked by memory bandwidth, not available compute (see the rough estimate after this list).
- Commercially supported cooling of gaming GPUs in rack servers? Lol. Good luck getting any reputable hardware vendor to sell you that, and definitely not at the power densities you want in a data center.
- TFLOP16 isn’t enough. Look at the 4-bit and 8-bit tensor numbers; that’s where the expensive silicon is used.
- Nvidia’s licensing agreements basically prohibit gaming cards in servers. No one will sell them to you at any scale.
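To put a number on the bandwidth point: decode-phase LLM inference has to stream every weight for every generated token, so tokens/s is roughly memory bandwidth divided by model size in bytes. A rough sketch (Python; the spec figures are approximate and assumed, not measured):

```python
# Back-of-envelope decode throughput for a memory-bandwidth-bound LLM.
# Bandwidth numbers below are approximate datasheet values, not measurements.

def tokens_per_second(mem_bw_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    """Upper bound: every parameter is read once per generated token."""
    model_bytes = params_b * 1e9 * bytes_per_param
    return mem_bw_gb_s * 1e9 / model_bytes

# Hypothetical 70B-parameter model (at FP16 it wouldn't even fit in a
# 32 GB consumer card -- this is purely to illustrate the bandwidth ratio).
for name, bw in [("RTX 5090 (GDDR7, ~1.8 TB/s)", 1800),
                 ("H200 (HBM3e, ~4.8 TB/s)", 4800)]:
    fp16 = tokens_per_second(bw, 70, 2)    # FP16: 2 bytes per parameter
    fp4 = tokens_per_second(bw, 70, 0.5)   # FP4: 0.5 bytes per parameter
    print(f"{name}: ~{fp16:.0f} tok/s @ FP16, ~{fp4:.0f} tok/s @ FP4")
```

Note that the throughput ratio tracks the bandwidth ratio, not the compute ratio, and that dropping to FP4 helps mostly because it shrinks the bytes you have to move.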
For fun, home use, research or small time hacking? Sure, buy all the gaming cards you can. If you actually need support and have a commercial use case? Pony up. Either way, benchmark your workload, don’t look at marketing numbers.
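In that spirit, a minimal benchmark sketch (PyTorch, assuming a CUDA-capable card): time the matmul shapes your workload actually uses instead of trusting marketing TOPS.

```python
import torch

def bench_matmul(n: int = 8192, dtype=torch.float16, iters: int = 50) -> float:
    """Time an n x n matmul on the GPU and report achieved TFLOP/s."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(5):  # warm-up so clocks and caches settle
        a @ b
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000 / iters  # elapsed_time is in ms
    return 2 * n**3 / seconds / 1e12  # a matmul costs ~2*n^3 FLOPs

print(f"~{bench_matmul():.1f} TFLOP/s achieved at FP16")
```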
Is it a scam? Of course, but you can’t avoid it.
TheMightyCat@ani.social 2 days ago
The H200 datasheet lists “FP16 Tensor Core”; the RTX Pro 6000 datasheet keeps it vague, only mentioning “AI TOPS”, which they define as “Effective FP4 TOPS with sparsity”; and they didn’t even bother writing a datasheet for the 5090, only saying “3352 AI TOPS”, which I suppose is FP4 then. The AMD datasheets only list FP16 and INT8 matrix, and whether INT8 matrix is equal to FP8 I don’t know. So FP16 was the common denominator for all the cards I could find without comparing apples with oranges.
non_burglar@lemmy.world 2 days ago
During the last GPU mining craze, I helped build a 3-rack mining operation. GPUs are unregulated pieces of power-sucking shit from a power management perspective. You cannot meet the power requirements of this on residential service, even at 300 amps.
Think of a microwave’s behaviour: yes, a 1000 W microwave pulls between 700 and 900 W while cooking, but the startup load is massive, sometimes almost 1800 W, depending on how cheap the thing is.
GPUs also behave like this, but not at startup. They spin up load predictively, which means the hardware demands more power to get the job done; it doesn’t scale the job down to save power. Multiply that by 58 RX 9070s. Now add cooling.
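For scale, a back-of-envelope sketch, where every number is an assumption (roughly 220 W board power per card, ~2x millisecond-scale transient spikes, cooling at a COP around 3):

```python
# Rough power budget for 58 GPUs on residential service.
# All figures are assumptions for illustration, not measurements.
n_gpus = 58
tdp_w = 220          # RX 9070 board power (approximate)
spike_factor = 2.0   # millisecond-scale transients can hit ~2x board power
host_w = 2000        # CPUs, fans, PSU losses across the rigs (guess)

steady_w = n_gpus * tdp_w + host_w
peak_w = n_gpus * tdp_w * spike_factor + host_w
cooling_w = steady_w / 3  # removing the heat at a COP of ~3

print(f"steady: {steady_w/1000:.1f} kW, transient peak: {peak_w/1000:.1f} kW")
print(f"plus cooling: ~{(steady_w + cooling_w)/1000:.1f} kW continuous")
```

Call it ~20 kW continuous with transient peaks near 30 kW, and the wiring and breakers have to be sized for the spikes, not the average.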
You cannot do this.
TheMightyCat@ani.social 2 days ago
Thanks. While I would still like to know the performance scaling of a cheap cluster, this does answer the question: pay way more for high-end cards like the H200 for greater efficiency, or pay less and have to deal with these issues.
enumerator4829@sh.itjust.works 2 days ago
Your math checks out, but only for some workloads. Other workloads scale out like shit, and then you want all your bandwidth concentrated. At some point you’ll also want to consider power draw:
Now include power and cooling over a few years and do the same calculations.
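A sketch of that calculation, with every number a placeholder (hypothetical card prices, $0.20/kWh, a 40% cooling overhead); swap in your own quotes:

```python
# Naive cost-of-ownership comparison over 3 years. All inputs are
# placeholder assumptions, not real quotes or measured draw.
kwh_price = 0.20     # $/kWh, assumed
years = 3
hours = years * 365 * 24

def tco(card_price: float, n_cards: int, watts: float) -> float:
    """Purchase price plus energy, with a 40% overhead for cooling."""
    energy_kwh = n_cards * watts / 1000 * hours
    return card_price * n_cards + energy_kwh * kwh_price * 1.4

print(f"58x gaming card: ${tco(1000, 58, 220):,.0f}")
print(f" 2x H200-class:  ${tco(30000, 2, 700):,.0f}")
```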
As for apples and oranges, this is why you can’t look at the marketing numbers; you need to benchmark your workload yourself.