Comment on The first GPT-4-class AI model anyone can download has arrived: Llama 405B

This would probably run on an A6000, right?
admin@lemmy.my-box.dev 3 months ago
Technically correct ™
Before you get your hopes up: Anyone can download it, but very few will be able to actually run it.
coffee_with_cream@sh.itjust.works 3 months ago
5redie8@sh.itjust.works 3 months ago
“an order of magnitude” still feels like an understatement LOL
My 35B models come out at like Morse code speed on my 7800XT, but at least they work?
LavenderDay3544@lemmy.world 3 months ago
When the RTX 9090 Ti comes out, anyone who can afford it will be able to run it.
Contravariant@lemmy.world 3 months ago
That doesn’t sound like much of a change from the situation right now.
bitfucker@programming.dev 3 months ago
Same with OSM data: anyone can download the whole Earth, but serving it and providing routing/path planning at scale takes a whole other set of skills and resources. It’s a good thing that they are willing to open-source their model in the first place.
chiisana@lemmy.chiisana.net 3 months ago
What’s the resources requirements for the 405B model? I did some digging but couldn’t find any documentation during my cursory search.
modeler@lemmy.world 3 months ago
Typically you need about 1GB of graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B-parameter model. Ouch.
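If you want to play with the numbers, here’s a minimal sketch of that rule of thumb (one byte per parameter is the 8-bit case; the released 16-bit weights need roughly double):

```python
# Minimal sketch, assuming memory ≈ parameter count × bytes per parameter
# and ignoring KV-cache/activation overhead (which adds more on top).
PARAMS_BILLIONS = 405  # Llama 3.1 405B

for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    weights_gb = PARAMS_BILLIONS * bytes_per_param  # 1B params at 1 byte ≈ 1 GB
    print(f"{precision}: ~{weights_gb:.0f} GB just for the weights")
```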
cheddar@programming.dev 3 months ago
[image]
Deceptichum@quokk.au 3 months ago
Or you could run it via CPU and RAM at a much slower rate.
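A rough sketch of what that looks like with the transformers/accelerate stack, assuming you’ve been granted access to the gated repo (the model ID below is from memory, so double-check it) and have enough combined RAM; anything offloaded to the CPU generates tokens painfully slowly:

```python
# Sketch: let accelerate fill the GPU(s) first, then spill the remaining
# layers to system RAM (and disk as a last resort). Works, but very slow.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-405B-Instruct"  # gated repo; exact name may differ

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the release's native precision
    device_map="auto",           # GPU first, then CPU RAM
    offload_folder="offload",    # spill to disk if RAM runs out too
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```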
errer@lemmy.world 3 months ago
Yeah uh, let me just put in my 512GB RAM stick…
chiisana@lemmy.chiisana.net 3 months ago
Finally! My dumb dumb 1TB ram server (4x E5-4640 + 32x32GB DDR3 ECC) can shine.
Siegfried@lemmy.world 3 months ago
At work we have a small cluster totalling around 4TB of RAM.
TipRing@lemmy.world 3 months ago
When the 8 bit quants hit, you could probably lease a 128GB system on runpod.
1984@lemmy.today 3 months ago
Can you run this in a distributed manner, like with Kubernetes and lots of smaller machines?
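For what it’s worth, the usual way to do that is tensor parallelism across a GPU cluster (e.g. vLLM on top of Ray) rather than generic Kubernetes pods, and it needs fast interconnects to not crawl. A sketch, assuming something like 2 nodes with 8 GPUs each already joined into one Ray cluster (the model ID and sizes are illustrative):

```python
# Sketch: vLLM shards the weights across every GPU in the Ray cluster.
# tensor_parallel_size=16 assumes 2 nodes x 8 GPUs; without fast
# interconnects between nodes, throughput falls off a cliff.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # exact repo name may differ
    tensor_parallel_size=16,
)

outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```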
obbeel@lemmy.eco.br 3 months ago
According to Hugging Face, you can run a 34B model using 22.4GB of RAM at most. That’s an RTX 3090 Ti.
Longpork3@lemmy.nz 3 months ago
Hmm, I probably have that much distributed across my network… maybe I should look into some way of distributing it across multiple GPUs.
arefx@lemmy.ml 3 months ago
You mean my 4090 isn’t good enough? 🤣😂
Blaster_M@lemmy.world 3 months ago
As a general rule of thumb, you need about 1 GB per 1B parameters, so you’re looking at about 405 GB for the full size of the model.
Quantization can compress it down to 1/2 or 1/4 that, but “makes it stupider” as a result.
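To make the “1/2 or 1/4” concrete, this is roughly what on-the-fly 4-bit loading looks like with the transformers + bitsandbytes route (a sketch, not a recipe; at ~0.5 bytes per parameter you’re still looking at roughly 200 GB of memory, plus some quality loss):

```python
# Sketch: quantize the weights to ~0.5 bytes/param (4-bit NF4) as they load,
# roughly a 4x smaller footprint than fp16 at the cost of some quality.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-405B-Instruct",  # exact repo name may differ
    quantization_config=bnb_config,
    device_map="auto",
)
```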