[deleted]

⁨24⁩ ⁨likes⁩

Submitted 7 months ago by gnutard@sh.itjust.works to selfhosted@lemmy.world

[deleted]


Comments

april@lemmy.world 7 months ago
Only the GPU, and primarily its VRAM, matters for LLMs. So this wouldn’t help at all.

- mozz@mbin.grits.dev ⁨7⁩ ⁨months⁩ ago
  You’re the only one talking sense and you are sitting here with your 2 upvotes
  
  The AI company business model is 100% unsustainable. It’s hard to say when they will get sick of hemorrhaging money by giving away this stuff more or less for free, but it might be soon. That’s totally separate from any legal issues that might come up. If you care about this stuff, learning about doing it locally and having a self hosted solution in place might not be a bad idea.
  
  But upgrading anything aside from your GPU+VRAM is a pure and unfettered waste of money in that endeavor.
  
- cybersandwich@lemmy.world 7 months ago
  A GPU with a ton of VRAM is what you need, BUT
  
  An alternate solution is something like a Mac mini with an M-series chip and 16 GB of unified memory. The neural cores on Apple silicon are actually pretty impressive, and since they use unified memory the models would have access to whatever the system has.
  
  I only mention it because a Mac mini might be cheaper than a GPU with tons of VRAM by a couple hundred bucks.
  
  And it will sip power comparatively.
  
  - L_Acacia@lemmy.one 7 months ago
    Buying a second-hand 3090 or 7900 XTX will be cheaper and give better performance if you are not building the rest of the machine.
    
- gnutard@sh.itjust.works ⁨7⁩ ⁨months⁩ ago
  I thought you need tons of RAM to run LLMs? I thought the newer models needed up to 64GB RAM?
  
  - atzanteol@sh.itjust.works ⁨7⁩ ⁨months⁩ ago
    VRAM. Not system RAM. LLMs run best entirely on the GPU.
    
  - april@lemmy.world 7 months ago
    RAM is important, but it has to be VRAM, not system RAM.
    
    Only MacBooks can use the system RAM because they have an integrated GPU rather than a dedicated one.
    
  - PumpkinEscobar@lemmy.world 7 months ago
    Taking ollama for instance, either the whole model runs in VRAM and compute is done on the GPU, or it runs in system RAM and compute is done on the CPU. Running models on CPU is horribly slow. You won’t want to do it for large models.
    
    LM Studio and others allow you to run part of the model on GPU and part on CPU, splitting the memory requirements, but that is still pretty slow.
    
    Even the smaller 7B parameter models run pretty slowly on CPU, and the huge models are orders of magnitude slower.
    
    So technically more system RAM will let you run some larger models, but you will quickly figure out you just don’t want to do it.
    
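For anyone curious what the GPU/CPU split described above amounts to, here is a rough sketch of the budgeting step only. It assumes all layers are the same size, and `plan_layer_split` is a hypothetical helper written for illustration; it is not how LM Studio or any other tool actually decides the split.

```python
def plan_layer_split(n_layers: int, layer_bytes: int, vram_budget_bytes: int) -> tuple[int, int]:
    """Return (layers_on_gpu, layers_on_cpu) for a model with equal-sized layers.

    Hypothetical helper: real runtimes also reserve VRAM for the KV cache,
    activations and driver overhead, which this sketch ignores.
    """
    if layer_bytes <= 0:
        raise ValueError("layer_bytes must be positive")
    # As many whole layers as fit in the VRAM budget, capped at the model size.
    gpu_layers = min(n_layers, max(0, vram_budget_bytes) // layer_bytes)
    return gpu_layers, n_layers - gpu_layers


if __name__ == "__main__":
    MIB, GIB = 2**20, 2**30
    # Made-up example: 32 layers of 400 MiB each with an 8 GiB VRAM budget.
    print(plan_layer_split(32, 400 * MIB, 8 * GIB))
```

Every layer that lands in the second number is computed on the CPU from system RAM, which is where the slowdown comes from.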
  - Findmysec@infosec.pub 7 months ago
    They do, but VRAM. Unfortunately, the cards that do have that much memory are used by OEMs/corporations and are insanely pricey.
    
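To put rough numbers on the "how much memory" question in this sub-thread, a back-of-the-envelope sketch: weights take roughly parameter count times bits per weight divided by 8 bytes. The function name is made up for illustration, and the estimate covers only the weights, not the KV cache, activations, or runtime overhead.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Illustrative model sizes and quantization levels.
    for name, n_params in [("7B", 7e9), ("70B", 70e9)]:
        for bits in (16, 8, 4):
            gib = weight_memory_gib(n_params, bits)
            print(f"{name} at {bits}-bit: ~{gib:.1f} GiB")
```

By this estimate a 7B model at 16-bit needs roughly 13 GiB, while a 70B model even at 4-bit needs roughly 33 GiB, which is more than a single 24 GB card such as a 3090 can hold.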
- Enkers@sh.itjust.works 7 months ago
  One minor caveat where the CPU does matter is AVX support. I couldn’t get ollama to run well on my system, despite having a decent GPU, because I’m using an ancient processor.
  
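If you want to check the AVX caveat on your own machine, here is a minimal sketch that assumes a Linux x86 system, where the kernel lists CPU feature flags in `/proc/cpuinfo`. The helper names are made up for illustration.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Extract the feature flags of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_support(flags: set[str]) -> dict[str, bool]:
    """Report which of the common AVX variants appear in the flag set."""
    return {name: name in flags for name in ("avx", "avx2", "avx512f")}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(simd_support(parse_cpu_flags(cpuinfo.read_text())))
    else:
        print("No /proc/cpuinfo here; this sketch only covers Linux.")
```

If `avx` comes back `False`, that would be consistent with the kind of trouble described above, though it does not prove that AVX is the cause.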
wildbus8979@sh.itjust.works ⁨7⁩ ⁨months⁩ ago
no.

Red_October@lemmy.world ⁨7⁩ ⁨months⁩ ago
no.

furzegulo@lemmy.dbzer0.com ⁨7⁩ ⁨months⁩ ago
no.

KillerTofu@lemmy.world ⁨7⁩ ⁨months⁩ ago
no.

flamingo_pinyata@sopuli.xyz ⁨7⁩ ⁨months⁩ ago
yes

- Churbleyimyam@lemm.ee ⁨7⁩ ⁨months⁩ ago
  bingbong.
  
axzxc1236@lemm.ee 7 months ago
i9-14900K… bad news for you: 13th and 14th gen i9s are unstable and crash.

Suggestion: Wait for 15th gen or AMD 9000 series CPU to come out.

- zer0squar3d@lemmy.dbzer0.com ⁨7⁩ ⁨months⁩ ago
  This. So many issues.
  
SteveTech@programming.dev ⁨7⁩ ⁨months⁩ ago

> Will I see any performance increase?

Like others have said, LLMs mostly use VRAM. They can use system RAM if you’re running them on the CPU, but that’s ridiculously slow.

It will, however, shorten your compile times, which is especially useful if you’re compiling something large like the Linux kernel on a regular basis.

> I’m also worried about not having ECC RAM.

If you are using it purely for LLMs, any bit flips are going to happen in VRAM.

If you are compiling large things for customers, I’d recommend ECC, just in case, e.g. you don’t want a bricking firmware from a bit flip. But according to EDAC and my TIG stack, my server’s ECC RAM has never even detected an error in the past year, if I understand EDAC properly, so it’s really not important.

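For reference, the EDAC counters mentioned above can be read straight from sysfs on Linux. The sketch below assumes the usual layout, with one `mcN` directory per memory controller under `/sys/devices/system/edac/mc`, each holding `ce_count` (corrected errors) and `ue_count` (uncorrected errors). The function name is made up, and this is not necessarily how the commenter's TIG stack collects the data.

```python
from pathlib import Path


def read_edac_counts(edac_root: str = "/sys/devices/system/edac/mc") -> dict[str, dict[str, int]]:
    """Return corrected/uncorrected error counts per memory controller.

    An empty dict means no EDAC memory controllers were found, for example
    because there is no ECC RAM or the EDAC driver is not loaded.
    """
    counts: dict[str, dict[str, int]] = {}
    for mc_dir in sorted(Path(edac_root).glob("mc[0-9]*")):
        entry: dict[str, int] = {}
        for name in ("ce_count", "ue_count"):
            counter = mc_dir / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    print(read_edac_counts())
```

A `ce_count` that stays at zero for a year would match what is described above: the ECC hardware has had nothing to correct.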
anzo@programming.dev 7 months ago
Have you tried ollama? Some (if not all) models would do inference just fine with your current specs. Of course, it all depends on how many queries per unit of time you need, and on whether you want to load a huge codebase and pass it as input. Anyway, go try it out.

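One way to "go try it out" and measure throughput on current hardware is sketched below. It assumes an ollama server running locally on its default port (11434) and uses the `eval_count` and `eval_duration` fields (duration in nanoseconds) of the `/api/generate` response. The model name is a placeholder for whatever you have pulled, and the helper names are made up.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes a default local install


def generate(prompt: str, model: str = "llama3") -> dict:
    """Send one non-streaming generation request; the model name is a placeholder."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def tokens_per_second(result: dict) -> float:
    """Generation speed from the response's token count and duration in nanoseconds."""
    return result["eval_count"] / result["eval_duration"] * 1e9


if __name__ == "__main__":
    try:
        result = generate("Explain what VRAM is in one sentence.")
        print(result["response"])
        print(f"{tokens_per_second(result):.1f} tokens/s")
    except OSError as error:
        print(f"Could not reach ollama at {OLLAMA_URL}: {error}")
```

Running the same prompt before and after a hardware change gives a direct answer to the performance question, rather than a guess.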
BarbecueCowboy@lemmy.world ⁨7⁩ ⁨months⁩ ago
Have you looked into specialized AI chips/accelerators at all if you really want to mess with it?

Way lower end than what you’re working with, but they have AI accelerator kits for something as small as a Raspberry Pi.

- L_Acacia@lemmy.one 7 months ago
  With LLMs you are limited by memory bandwidth, not compute, so an accelerator won’t change the inference tokens/s.
  
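The bandwidth point can be made concrete with a common rule of thumb: when generating one token at a time with a dense model, every weight is read once per token, so memory bandwidth divided by model size gives a rough ceiling on tokens per second. The sketch below rests on that simplifying assumption. The function name is made up, and the figures are illustrative: 936 GB/s is the published memory bandwidth of an RTX 3090, and 89.6 GB/s is the theoretical peak of dual-channel DDR5-5600.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on single-stream generation speed for a dense model.

    Assumes all weights are read from memory once per generated token;
    real throughput is lower.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    model_gb = 3.5  # roughly a 7B model at 4 bits per weight
    for label, bandwidth in [("RTX 3090 VRAM", 936.0), ("dual-channel DDR5-5600", 89.6)]:
        ceiling = max_tokens_per_second(bandwidth, model_gb)
        print(f"{label}: at most ~{ceiling:.0f} tokens/s")
```

With these numbers the ceiling is around 267 tokens/s from VRAM versus around 26 tokens/s from system RAM, which is why adding compute without adding bandwidth does little.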