Comment on How does AI use so much power?
vrighter@discuss.tchncs.de 1 week ago
Yep. You could of course swap weights in and out, but that would slow things down to a crawl, so they get lots of VRAM.
hisao@ani.social 1 week ago
I also asked ChatGPT itself, and it listed a number of approaches. One that sounded good to me is pinning layers to GPUs. For example, say we have 500 GPUs: cards 1-100 permanently hold layers 1-30 of the model, cards 101-200 permanently hold layers 31-60, and so on. That way there is no need to keep reloading the huge weight matrices, since they stay on the GPUs permanently; you basically just pipeline the user's prompt through the appropriate sequence of GPUs.
howrar@lemmy.ca 1 week ago
I can confirm as a human with domain knowledge that this is indeed a commonly used approach when a model doesn’t fit into a single GPU.
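For anyone who wants to see the idea in code, here is a toy sketch of the layer-pinning scheme described above (usually called pipeline parallelism). It is plain Python with no real GPUs involved. The names `Stage`, `build_pipeline`, and `run` are made up for illustration, the 150-layer / 5-group split is an assumption extrapolated from the "layers 1-30, 31-60, and so on" example, and each "layer" just adds a number so the result is easy to check.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One GPU group that permanently holds a contiguous range of layers."""
    name: str
    layer_ids: range
    weights: dict = field(default_factory=dict)
    loads: int = 0  # how many times a layer's weights were loaded

    def load(self):
        # Done once at startup; afterwards the weights stay resident.
        for i in self.layer_ids:
            self.weights[i] = float(i)  # toy weight: layer i adds i
            self.loads += 1

    def forward(self, x):
        # Only activations move through the stage; weights never move.
        for i in self.layer_ids:
            x = x + self.weights[i]
        return x


def build_pipeline(num_layers, num_stages):
    """Split layers 1..num_layers into contiguous chunks, one per stage."""
    per_stage = -(-num_layers // num_stages)  # ceiling division
    stages = []
    for s in range(num_stages):
        start = s * per_stage + 1
        stop = min(start + per_stage, num_layers + 1)
        stage = Stage(f"gpu-group-{s}", range(start, stop))
        stage.load()
        stages.append(stage)
    return stages


def run(stages, x):
    """Pass one input through every stage in order."""
    for stage in stages:
        x = stage.forward(x)
    return x


# Hypothetical sizes: 150 layers over 5 groups, i.e. 30 layers per group.
stages = build_pipeline(num_layers=150, num_stages=5)
outputs = [run(stages, float(p)) for p in (0, 1, 2)]
total_loads = sum(s.loads for s in stages)
```

After three "prompts", `total_loads` is still 150: every layer was loaded exactly once, and only the small activation value travelled between stages. Real systems also split requests into micro-batches so the later stages are not idle while the first one works, which this sketch leaves out.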