Comment on The first GPT-4-class AI model anyone can download has arrived: Llama 405B

This would probably run on an A6000, right?
admin@lemmy.my-box.dev 3 months ago
Technically correct ™
Before you get your hopes up: Anyone can download it, but very few will be able to actually run it.
coffee_with_cream@sh.itjust.works 3 months ago
5redie8@sh.itjust.works 3 months ago
“an order of magnitude” still feels like an understatement LOL
My 35B models come out at like Morse code speed on my 7800XT, but at least they work?
LavenderDay3544@lemmy.world 3 months ago
When the RTX 9090 Ti comes out, anyone who can afford it will be able to run it.
Contravariant@lemmy.world 3 months ago
That doesn’t sound like much of a change from the situation right now.
bitfucker@programming.dev 3 months ago
Same with OSM data: anyone can download the whole Earth, but serving it and providing routing/path planning at scale takes a whole other set of skills and resources. It’s a good thing that they are willing to open-source their model in the first place.
chiisana@lemmy.chiisana.net 3 months ago
What’s the resources requirements for the 405B model? I did some digging but couldn’t find any documentation during my cursory search.
modeler@lemmy.world 3 months ago
Typically you need about 1GB of graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B-parameter model. Ouch.
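If you want to play with the numbers, here’s a minimal sketch of that rule of thumb (one byte per parameter is the 8-bit case; the released 16-bit weights need roughly double):

```python
# Minimal sketch, assuming memory ≈ parameter count × bytes per parameter
# and ignoring KV-cache/activation overhead (which adds more on top).
PARAMS_BILLIONS = 405  # Llama 3.1 405B

for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    weights_gb = PARAMS_BILLIONS * bytes_per_param  # 1B params at 1 byte ≈ 1 GB
    print(f"{precision}: ~{weights_gb:.0f} GB just for the weights")
```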
cheddar@programming.dev 3 months ago
[image]
Deceptichum@quokk.au 3 months ago
Or you could run it via CPU and RAM at a much slower rate.
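A rough sketch of what that looks like with the transformers/accelerate stack, assuming you’ve been granted access to the gated repo (the model ID below is from memory, so double-check it) and have enough combined RAM; anything offloaded to the CPU generates tokens painfully slowly:

```python
# Sketch: let accelerate fill the GPU(s) first, then spill the remaining
# layers to system RAM (and disk as a last resort). Works, but very slow.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-405B-Instruct"  # gated repo; exact name may differ

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the release's native precision
    device_map="auto",           # GPU first, then CPU RAM
    offload_folder="offload",    # spill to disk if RAM runs out too
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```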
errer@lemmy.world 3 months ago
Yeah uh, let me just put in my 512GB RAM stick…
chiisana@lemmy.chiisana.net 3 months ago
Finally! My dumb dumb 1TB ram server (4x E5-4640 + 32x32GB DDR3 ECC) can shine.
Siegfried@lemmy.world 3 months ago
At work we have a small cluster totalling around 4TB of RAM.
TipRing@lemmy.world 3 months ago
When the 8 bit quants hit, you could probably lease a 128GB system on runpod.
1984@lemmy.today 3 months ago
Can you run this in a distributed manner, like with Kubernetes and lots of smaller machines?
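For what it’s worth, the usual way to do that is tensor parallelism across a GPU cluster (e.g. vLLM on top of Ray) rather than generic Kubernetes pods, and it needs fast interconnects to not crawl. A sketch, assuming something like 2 nodes with 8 GPUs each already joined into one Ray cluster (the model ID and sizes are illustrative):

```python
# Sketch: vLLM shards the weights across every GPU in the Ray cluster.
# tensor_parallel_size=16 assumes 2 nodes x 8 GPUs; without fast
# interconnects between nodes, throughput falls off a cliff.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # exact repo name may differ
    tensor_parallel_size=16,
)

outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```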
obbeel@lemmy.eco.br 3 months ago
According to Hugging Face, you can run a 34B model using 22.4GB of RAM at most. That’s an RTX 3090 Ti.
Longpork3@lemmy.nz 3 months ago
Hmm, I probably have that much distributed across my network… maybe I should look into some way of distributing it across multiple GPUs.
arefx@lemmy.ml 3 months ago
You mean my 4090 isn’t good enough? 🤣😂
Blaster_M@lemmy.world 3 months ago
As a general rule of thumb, you need about 1 GB per 1B parameters, so you’re looking at about 405 GB for the full size of the model.
Quantization can compress it down to 1/2 or 1/4 that, but “makes it stupider” as a result.
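To make the “1/2 or 1/4” concrete, this is roughly what on-the-fly 4-bit loading looks like with the transformers + bitsandbytes route (a sketch, not a recipe; at ~0.5 bytes per parameter you’re still looking at roughly 200 GB of memory, plus some quality loss):

```python
# Sketch: quantize the weights to ~0.5 bytes/param (4-bit NF4) as they load,
# roughly a 4x smaller footprint than fp16 at the cost of some quality.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-405B-Instruct",  # exact repo name may differ
    quantization_config=bnb_config,
    device_map="auto",
)
```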