Comment on The first GPT-4-class AI model anyone can download has arrived: Llama 405B
chiisana@lemmy.chiisana.net 3 months ago
What are the resource requirements for the 405B model? I did some digging but couldn't find any documentation during my cursory search.
As a general rule of thumb, you need about 1 GB per 1B parameters, so you’re looking at about 405 GB for the full size of the model.
Quantization can compress it down to 1/2 or 1/4 of that, but it “makes it stupider” as a result.
modeler@lemmy.world 3 months ago
Typically you need about 1 GB of graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B parameter model. Ouch.
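To make the arithmetic behind that rule of thumb explicit, here is a minimal Python sketch. The 1/2 and 1/4 fractions come from the comments above; the estimate ignores activation memory and the KV cache, so treat it as a rough lower bound rather than an official requirement:

```python
# Back-of-the-envelope memory estimate using the rule of thumb above:
# roughly one byte per parameter, so ~1 GB per billion parameters.
# Ignores activation memory and KV cache, so real usage will be higher.

PARAMS_BILLIONS = 405  # Llama 405B

# Fractions taken from the comments above: quantization can shrink the
# weights to roughly 1/2 or 1/4 of the one-byte-per-parameter baseline.
SCENARIOS = {
    "baseline (~1 byte/param)": 1.0,
    "quantized to 1/2": 0.5,
    "quantized to 1/4": 0.25,
}

for name, fraction in SCENARIOS.items():
    gigabytes = PARAMS_BILLIONS * fraction
    print(f"{name}: ~{gigabytes:.0f} GB")

# Approximate output:
#   baseline (~1 byte/param): ~405 GB
#   quantized to 1/2: ~202 GB
#   quantized to 1/4: ~101 GB
```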
cheddar@programming.dev 3 months ago
[image]
Deceptichum@quokk.au 3 months ago
Or you could run it via CPU and system RAM at a much slower rate.
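For anyone curious what CPU-only inference looks like in practice, here is a minimal sketch using the llama-cpp-python bindings. The GGUF file path, thread count, and context size are placeholders, and it assumes a quantized GGUF build of the model exists and fits in system RAM:

```python
# Minimal CPU-only inference sketch with llama-cpp-python.
# Assumes a quantized GGUF build of the model already exists on disk;
# the path below is a placeholder, not an official artifact.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/llama-405b-q4.gguf",  # hypothetical quantized file
    n_ctx=4096,       # context window to allocate
    n_threads=16,     # CPU threads; tune for your machine
)

output = llm(
    "Explain the difference between RAM and VRAM in one sentence.",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```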
errer@lemmy.world 3 months ago
Yeah uh let me just put in my 512GB ram stick…
Deceptichum@quokk.au 3 months ago
Samsung do make them.
Good luck finding 512GB of VRAM.
bruhduh@lemmy.world 3 months ago
www.ebay.com/p/116332559 — LGA2011 motherboards are quite cheap. Drop in two Xeon E5-2696 v4 CPUs (44 threads each, 88 threads total) and 8x 32GB DDR4 sticks, and it comes out quite cheap. You can also install Nvidia P40s with 24GB each. You can max out this build for AI for under $2000.
chiisana@lemmy.chiisana.net 3 months ago
Finally! My dumb dumb 1TB RAM server (4x E5-4640 + 32x32GB DDR3 ECC) can shine.
Siegfried@lemmy.world 3 months ago
At work we have a small cluster totalling around 4TB of RAM.
TipRing@lemmy.world 3 months ago
When the 8-bit quants hit, you could probably lease a 128GB system on RunPod.
1984@lemmy.today 3 months ago
Can you run this in a distributed manner, like with Kubernetes and lots of smaller machines?
obbeel@lemmy.eco.br 3 months ago
According to Hugging Face, you can run a 34B model using 22.4 GB of RAM max. That's an RTX 3090 Ti.
Longpork3@lemmy.nz 3 months ago
Hmm, I probably have that much distributed across my network… maybe I should look into some way of distributing it across multiple GPUs.
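For splitting a model across several GPUs in one box, a minimal sketch with Hugging Face Transformers and Accelerate might look like the following. The model ID is a placeholder, and device_map="auto" shards layers across whatever GPUs (and CPU RAM) it finds on a single machine; it does not distribute across a network:

```python
# Sketch: shard a large model across all local GPUs using Accelerate's
# device_map="auto". This splits layers across devices on ONE machine;
# spreading a model across many machines needs tensor/pipeline parallelism.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-405B-Instruct"  # placeholder; use a model you can actually fit

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # spread layers across available GPUs/CPU RAM
    torch_dtype=torch.float16,  # half-precision weights
)

inputs = tokenizer("The resource requirements for a 405B model are", return_tensors="pt")
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```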
arefx@lemmy.ml 3 months ago
You mean my 4090 isn't good enough? 🤣😂