This would probably run on an A6000, right?
Comment on The first GPT-4-class AI model anyone can download has arrived: Llama 405B
admin@lemmy.my-box.dev 1 month ago
Technically correct ™
Before you get your hopes up: Anyone can download it, but very few will be able to actually run it.
coffee_with_cream@sh.itjust.works 1 month ago
5redie8@sh.itjust.works 1 month ago
“an order of magnitude” still feels like an understatement LOL
My 35b models come out at like Morse code speed on my 7800XT, but at least it does work?
LavenderDay3544@lemmy.world 1 month ago
When RTX 9090 Ti comes anyone who can afford it will be able to run it.
Contravariant@lemmy.world 1 month ago
That doesn’t sound like much of a change from the situation right now.
bitfucker@programming.dev 1 month ago
So does OSM data. Everyone can download the whole earth but to serve it and provide routing/path planning at scale takes a whole other skill and resources. It’s a good thing that they are willing to open source their model in the first place.
chiisana@lemmy.chiisana.net 1 month ago
What’s the resources requirements for the 405B model? I did some digging but couldn’t find any documentation during my cursory search.
modeler@lemmy.world 1 month ago
Typically you need about 1 GB of graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B parameter model. Ouch.
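That rule of thumb is just multiplication; a quick sketch of the arithmetic (the function name is illustrative, and it assumes 10^9 bytes ≈ 1 GB):

```python
# Rough memory estimate: parameters (in billions) x bytes per parameter.
# bytes_per_param=1 matches the one-byte-per-parameter rule above;
# fp16 weights would use 2 bytes per parameter instead.
def vram_gb(params_billions: float, bytes_per_param: float = 1) -> float:
    return params_billions * bytes_per_param

print(vram_gb(405))     # ~405 GB at one byte per parameter
print(vram_gb(405, 2))  # ~810 GB at fp16
```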
cheddar@programming.dev 1 month ago
[image]
Deceptichum@quokk.au 1 month ago
Or you could run it via cpu and ram at a much slower rate.
errer@lemmy.world 1 month ago
Yeah uh let me just put in my 512GB ram stick…
chiisana@lemmy.chiisana.net 1 month ago
Finally! My dumb dumb 1TB ram server (4x E5-4640 + 32x32GB DDR3 ECC) can shine.
Siegfried@lemmy.world 1 month ago
At work we have a small cluster totalling around 4TB of RAM.
TipRing@lemmy.world 1 month ago
When the 8 bit quants hit, you could probably lease a 128GB system on runpod.
1984@lemmy.today 1 month ago
Can you run this in a distributed manner, like with kubernetes and lots of smaller machines?
obbeel@lemmy.eco.br 1 month ago
According to Hugging Face, you can run a 34B model using 22.4 GB of RAM max. That's an RTX 3090 Ti.
Longpork3@lemmy.nz 1 month ago
Hmm, I probably have that much distributed across my network… maybe I should look into some way of distributing it across multiple GPUs.
arefx@lemmy.ml 1 month ago
You mean my 4090 isn't good enough 🤣😂
Blaster_M@lemmy.world 1 month ago
As a general rule of thumb, you need about 1 GB per 1B parameters, so you’re looking at about 405 GB for the full size of the model.
Quantization can compress it down to 1/2 or 1/4 that, but “makes it stupider” as a result.