Comment on Lowering power consumption on Opteron

grue@lemmy.world 2 weeks ago

Nothin’ I’m running, that’s for sure!

It’s not that any single request needs that much processing power; it’s that the machine is designed to handle normal requests from hundreds or thousands of users at once.

I suppose that supporting 0.5 TB of RAM means it could deal with quite a big LLM, but any halfway-modern GPU would absolutely run circles around it in tokens per second, on any model that fits in its VRAM.
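For a rough sense of why: token generation on big models is mostly memory-bandwidth bound, since every generated token streams the full set of weights. Here's a quick back-of-envelope sketch; all the bandwidth and model-size numbers are ballpark assumptions, not specs for any particular Opteron or GPU:

```python
# Rough upper bound on token generation speed, assuming it is
# memory-bandwidth bound: one full pass over the weights per token.
# All numbers are illustrative guesses, not measured specs.

MODEL_BYTES = 70e9  # e.g. a 70B-parameter model at 8-bit quantization (~70 GB)

systems = {
    "old Opteron box (multi-channel DDR3)": 50e9,   # ~50 GB/s, ballpark
    "halfway-modern GPU (GDDR6/6X)": 900e9,         # ~900 GB/s, ballpark
}

for name, bandwidth_bytes_per_s in systems.items():
    tokens_per_s = bandwidth_bytes_per_s / MODEL_BYTES
    print(f"{name}: ~{tokens_per_s:.1f} tokens/s (bandwidth-bound upper limit)")
```

Even before counting the GPU's compute advantage, the bandwidth gap alone is an order of magnitude or more.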
