The buzzwords make my head hurt. Sounds like a copypasta
RegularJoe@lemmy.world 1 day ago
Nvidia’s Vera Rubin platform is the company’s next-generation architecture for AI data centers that includes an 88-core Vera CPU, Rubin GPU with 288 GB HBM4 memory, Rubin CPX GPU with 128 GB of GDDR7, NVLink 6.0 switch ASIC for scale-up rack-scale connectivity, BlueField-4 DPU with integrated SSD to store key-value cache, Spectrum-6 Photonics Ethernet, and Quantum-CX9 1.6 Tb/s Photonics InfiniBand NICs, as well as Spectrum-X Photonics Ethernet and Quantum-CX9 Photonics InfiniBand switching silicon for scale-out connectivity.
yogurtwrong@lemmy.world 23 hours ago
in_my_honest_opinion@piefed.social 17 hours ago
Almost like an LLM wrote it…
TropicalDingdong@lemmy.world 1 day ago
jfc…
Looking at the specs… fucking hell, these things probably cost over $100k.
I wonder if we’ll see a generational performance leap with LLMs scaling to this much memory.
AliasAKA@lemmy.world 1 day ago
Current models are speculated at 700 billion parameters plus. At 32-bit precision (single-precision float, 4 bytes per parameter), that’s 2.8 TB of RAM per model, or about 10 of these units. There are ways to lower that, but training at full precision would use over 2x as much, maybe 4x depending on how you store gradients and optimizer state, and full precision I’d reckon means 32-bit. Possible I suppose they train at 32-bit, but I’d be kind of surprised.
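Quick back-of-envelope in Python (the 288 GB figure is from the spec quoted above; the 700B count and the 4x training multiplier for gradients plus Adam-style optimizer state are my guesses, not anything published):

```python
# Back-of-envelope memory math for a dense 700B-parameter model.
# Assumptions: fp32 weights, and a rough 4x multiplier for training
# (weights + gradients + Adam-style optimizer state).

PARAMS = 700e9          # speculated parameter count
BYTES_FP32 = 4          # single precision, 4 bytes per parameter
GPU_HBM_GB = 288        # HBM4 per Rubin GPU, per the spec above

weights_tb = PARAMS * BYTES_FP32 / 1e12
train_tb = weights_tb * 4   # crude gradients + optimizer-state multiplier

print(f"fp32 weights:  {weights_tb:.1f} TB "
      f"(~{weights_tb * 1000 / GPU_HBM_GB:.0f} GPUs just to hold them)")
print(f"fp32 training: {train_tb:.1f} TB "
      f"(~{train_tb * 1000 / GPU_HBM_GB:.0f} GPUs)")
```

That prints 2.8 TB / ~10 GPUs for inference weights alone, and 11.2 TB / ~39 GPUs under the training assumption.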
in_my_honest_opinion@piefed.social 1 day ago
Sure, but giant-context models are still more prone to hallucination and to confidence loops where they keep spitting out the same wrong result in a different way.
AliasAKA@lemmy.world 22 hours ago
Sorry, I’m not saying that’s a good thing. It’s not just the context that’s expanding, but the parameter count of the base model. I’m saying at some point you’ve just saved a compressed version of the majority of the training content (we’re already kind of there), and with enough parameters you could reproduce it nearly losslessly. That doesn’t make it more useful for anything other than recreating copyrighted works.
in_my_honest_opinion@piefed.social 1 day ago
Fundamentally no: linear progress requires exponential resources. The article below is about AGI, but transformer-based models will not benefit from just more grunt. We’re at the software stage of the problem now. But that doesn’t sign fat checks, so the big companies are incentivized to print money by developing more hardware.
https://timdettmers.com/2025/12/10/why-agi-will-not-happen/
Also the industry is running out of training data
https://arxiv.org/html/2602.21462v1
What we need are more efficient models and better harnesses. Or a different approach entirely; reinforcement learning applied to RNNs combined with transformers has been showing promise.
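To put numbers on “linear progress requires exponential resources”, here’s a toy Kaplan-style power law; the constants are purely illustrative, not fitted to any real model:

```python
# Illustration of "linear progress needs exponential resources" using a
# Kaplan-style power law L(C) = (Cc / C) ** alpha. Constants are
# illustrative placeholders, not fitted values.

Cc, alpha = 1.0, 0.05   # hypothetical scale constant and exponent

def loss(compute):
    return (Cc / compute) ** alpha

c = 1.0
for step in range(5):
    print(f"compute {c:>8.0f}x  ->  loss {loss(c):.3f}")
    c *= 10  # each similar-sized loss drop costs another 10x compute
```

Every roughly equal step down in loss costs another order of magnitude of compute, which is the whole problem.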
TropicalDingdong@lemmy.world 1 day ago
Yeah I’ve read that before. I don’t necessarily agree with their framework. And even working within their framework, this article is about a challenge to their third bullet.
I’m just not quite ready to rule out the idea that if you can scale single models above a certain boundary, you’ll get fundamentally different/novel behavior. This is consistent with other networked systems, and somewhat consistent with the original performance leaps we saw (the ones I think really matter are the ones from 2019-2023; it’s really plateaued since, and it’s mostly been engineering tinkering at the edges). It genuinely could be that 8 experts in an MoE configuration, with each one a maxed-out single model, would show a very different level of performance. We just don’t know, because we can’t test that with the current generation of hardware.
It’s possible there really is something “just around the corner”; possible and unlikely.
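For scale, here’s what that configuration would look like on paper; every number here is hypothetical:

```python
# Toy sizing for the "8 maxed-out experts in an MoE" idea. All numbers
# are hypothetical; the point is total vs. active parameter count.

EXPERTS = 8
ACTIVE_EXPERTS = 2            # assumed top-k routing
EXPERT_PARAMS = 700e9         # one "maxed-out single model" per expert
SHARED_PARAMS = 50e9          # assumed shared attention/embedding trunk

total = SHARED_PARAMS + EXPERTS * EXPERT_PARAMS
active = SHARED_PARAMS + ACTIVE_EXPERTS * EXPERT_PARAMS

print(f"total params:  {total / 1e12:.2f}T")   # what you must store
print(f"active/token:  {active / 1e12:.2f}T")  # what you must compute
```

Under those assumptions that’s 5.65T parameters to store but only 1.45T active per token, which is exactly the regime this hardware generation can’t hold.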
in_my_honest_opinion@piefed.social 17 hours ago
I mean, what you’re proposing was the initial push behind GPT-3. All the experts said these GPTs would only hallucinate more with more resources, and would never do anything more than repeat their training data as word salad posing as novelty.
https://gwern.net/scaling-hypothesis
boonhet@sopuli.xyz 1 day ago
LLMs can already use way more, I believe; they don’t really run them on a single one of these things.
The HBM4 would likely be great for speed though.
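Speed is mostly a bandwidth story: single-stream decode has to read every active weight once per token, so the ceiling is roughly bandwidth divided by model size. Both numbers below are placeholders, not published Rubin specs:

```python
# Why HBM4 matters: single-token decode is roughly memory-bandwidth
# bound, since every active weight is read once per token.
# Both figures are hypothetical placeholders.

HBM_BW_TBS = 20.0        # assumed aggregate HBM4 bandwidth, TB/s
MODEL_TB = 1.4           # e.g. 700B params at fp16 (2 bytes each)

tokens_per_sec = HBM_BW_TBS / MODEL_TB
print(f"~{tokens_per_sec:.0f} tokens/s upper bound per full weight pass")
```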
Cocodapuf@lemmy.world 23 hours ago
Lol, this was literally my exact response
lemmy.world/comment/22356808
I feel you man.
panda_abyss@lemmy.ca 1 day ago
Yeah they’re going to cost as much as a house.
I think we’ll see much larger active portions of larger MoEs, and larger context windows, which would be useful.
The non-LLM models I run would benefit a lot from this, but I don’t know if I’ll ever be able to justify what they’ll cost.
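On the context window point, the KV cache alone gets huge, which is presumably why the spec above mentions a BlueField-4 DPU with an integrated SSD for KV cache. Model dimensions here are hypothetical (roughly 70B-class with grouped-query attention):

```python
# Rough KV-cache size for a long context window. Dimensions are
# hypothetical (roughly 70B-class, grouped-query attention), not from
# any Rubin material.

LAYERS = 80
KV_HEADS = 8             # grouped-query attention
HEAD_DIM = 128
BYTES = 2                # fp16
CONTEXT = 1_000_000      # 1M-token window

# factor of 2 for keys and values
kv_gb = 2 * LAYERS * KV_HEADS * HEAD_DIM * CONTEXT * BYTES / 1e9
print(f"KV cache at {CONTEXT:,} tokens: ~{kv_gb:.0f} GB")
```

That’s ~328 GB for the cache alone, already past the 288 GB of HBM4 on one Rubin GPU, so spilling it to a DPU-attached SSD starts to make sense.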