But it’s very convenient! When you have a BSOD, you don’t need your core dumped, you simply unplug your DRAM+ and send it to Microsoft using paper mail.
How is that any better than DRAM though? It would have to be much cheaper/GB, yet reasonably faster than the top-end SLC/MLC flash Samsung sells.
Another thing I don’t get… in all the training runs I see, dataset bandwidth needs are pretty small. Like, streaming images (let alone 128K tokens of text, which is far smaller) is a minuscule drop in the bucket compared to how long a step takes, especially with hardware decoders handling decompression.
Weights are an entirely different duck, and stuff like Cerebras clusters do stream them, but they need the speed of DRAM.
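A rough back-of-envelope sketch of that gap. Every number here (1024 sequences per step, 128K tokens each, 4-byte token IDs, a 10 s step, a 70B-parameter model with 2-byte weights streamed once per step) is a hypothetical assumption for illustration, not a measurement from any real training run:

```python
# Back-of-envelope: input-data bandwidth vs. weight-streaming bandwidth.
# All figures below are hypothetical assumptions, not measurements.

def required_bandwidth(bytes_per_step: float, step_time_s: float) -> float:
    """Bytes per second needed to deliver bytes_per_step within one step."""
    return bytes_per_step / step_time_s

STEP_TIME_S = 10.0  # assumed wall-clock time per training step

# Data side: 1024 sequences of 128K tokens, stored as 4-byte token IDs.
data_bytes = 1024 * 128 * 1024 * 4
text_bw = required_bandwidth(data_bytes, STEP_TIME_S)

# Weight side: 70B parameters at 2 bytes each, streamed at least once per step.
weight_bytes = 70_000_000_000 * 2
weight_bw = required_bandwidth(weight_bytes, STEP_TIME_S)

print(f"data:    {text_bw / 1e6:,.1f} MB/s")
print(f"weights: {weight_bw / 1e9:,.1f} GB/s")
print(f"ratio:   {weight_bw / text_bw:,.0f}x")
```

With these made-up numbers the data stream works out to roughly 54 MB/s while the weight stream is around 14 GB/s, a couple of orders of magnitude apart, and real weight streaming usually touches the weights more than once per step.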
pelya@lemmy.world 3 weeks ago
just_another_person@lemmy.world 3 weeks ago
I think you’re stuck in the traditional viewpoint of a computer being CPU+Mem+Storage. That’s fine for a single machine that a regular user would have.
This type of memory could essentially wipe out the need for traditional deployments in datacenters: banks of it would serve many CPUs attached as clients on a shared bus, with no local storage needed. Each node becomes just CPU+Mem, with everything loaded into a known state via network storage that won’t go away if something loses power or crashes. It would definitely make the current idiotic use of GPUs more cost-effective and less wasteful.
If you try to scale that down to a regular user’s use case, it’s really only going to matter for developers building things for such a system, because stateful memory is such a new idea. You may be thinking about it from a single user’s point of view, which is not what it would be used for at all (at first).
To your other question about the actual speed: current memory only needs to be that fast because of the storage involved and the shuttling of data across a bus between the three parts. Getting this new type of stateful memory to speeds higher than a current storage device would already show a performance benefit, because you’re removing one step in the transfer path and going from three points to two. So, in theory, anything faster than an SSD but slower than current DDR should still see a benefit.
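A toy model of the transfer-path argument, assuming made-up bandwidths (7 GB/s SSD, 50 GB/s DRAM, 20 GB/s for the hypothetical stateful memory) and ignoring latency, queuing, and caching. The helper `path_time` is invented for this sketch; it estimates how long a 64 GB working set takes to reach the CPU through each path:

```python
# Toy model of moving a working set to the CPU through a chain of hops.
# Bandwidths are illustrative assumptions, not vendor specs.

GB = 1e9

def path_time(size_bytes: float, hop_bandwidths: list[float], pipelined: bool = False) -> float:
    """Seconds to move size_bytes across a chain of hops.

    Store-and-forward: each hop finishes before the next starts, so times add.
    Pipelined: hops overlap, so the slowest hop sets the pace.
    """
    if pipelined:
        return size_bytes / min(hop_bandwidths)
    return sum(size_bytes / bw for bw in hop_bandwidths)

working_set = 64 * GB
ssd_bw, dram_bw, nvm_bw = 7 * GB, 50 * GB, 20 * GB  # hypothetical numbers

three_tier = path_time(working_set, [ssd_bw, dram_bw])  # storage -> DRAM -> CPU
three_tier_pipelined = path_time(working_set, [ssd_bw, dram_bw], pipelined=True)
two_tier = path_time(working_set, [nvm_bw])  # stateful memory -> CPU
hot_dram = path_time(working_set, [dram_bw])  # data already resident in DRAM

for name, t in [("storage->DRAM->CPU", three_tier),
                ("same, pipelined", three_tier_pipelined),
                ("stateful mem->CPU", two_tier),
                ("already in DRAM", hot_dram)]:
    print(f"{name:20s} {t:6.2f} s")
```

Under these assumptions the two-hop path wins for cold data whether or not the transfers overlap, but data that is already sitting in DRAM still arrives faster over the DRAM path, so the win depends on how often the working set is cold.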
Overall, this has been the direction things have been heading for quite a while. They still obviously need to get some spec sheets out explaining the performance and efficiency benefits, and it will require a complete rework of how current CPUs and bridge controllers work… it’s quite a ways off from being an everyday product.
brucethemoose@lemmy.world 3 weeks ago
You are talking theoretical.
A big reason that supercomputers moved to an architecture built from networks of “commodity” hardware is that it’s cost-effective.
How would one build a giant unified pool of this memory? CXL, but what does it look like physically? Maybe you get a lot of bandwidth in parallel, but how would it even come close to the latency of the “local” DRAM busses on each node? Is that setup truly more power-efficient than banks of DRAM backed by infrequently touched flash? If your workload needs fast random access to memory, even at scale the only advantage seems to be some fault tolerance at a huge speed cost, and if you just need bulk, high-latency bandwidth, flash has you covered for cheaper.
…I really like the idea of non-volatile unified memory, but ultimately architectural decisions come down to economics.
just_another_person@lemmy.world 3 weeks ago
It’s not theoretical, it’s just math. Removing 1/3 of the bus paths, and also removing the need to constantly keep RAM powered… it’s quite a reduction when you’re thinking at large scale. If AWS or Google could reduce their energy needs by 33% on anything, they’d take it in a heartbeat. That’s just assuming this would/could be used as a drop-in replacement, which seems unlikely. Think of an SoC with this on board, or an APU. The premise itself reduces cost while increasing efficiency, but again, they really need to get some spec sheets out and productize it before most companies will do much more than trial runs.
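The fleet-level saving is each component’s share of total energy times the fraction of that component’s energy that goes away, summed over components. A minimal sketch of that arithmetic; the `fleet_saving` helper is made up for illustration, and the energy shares are hypothetical placeholders, not real datacenter figures:

```python
# Fleet-level energy saving = sum over components of (share of total energy) x (fraction cut).
# The shares below are hypothetical placeholders, not measured datacenter data.

def fleet_saving(shares: dict[str, float], cuts: dict[str, float]) -> float:
    """Fraction of total energy saved; components missing from cuts are unchanged."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * cuts.get(name, 0.0) for name, share in shares.items())

# Case 1: the three parts each draw an equal third, and one is removed outright.
equal = {"cpu": 1 / 3, "memory": 1 / 3, "storage_path": 1 / 3}
equal_saving = fleet_saving(equal, {"storage_path": 1.0})

# Case 2: assumed uneven shares; drop the storage path and halve memory energy (no refresh).
uneven = {"cpu": 0.60, "memory": 0.25, "storage_path": 0.15}
uneven_saving = fleet_saving(uneven, {"storage_path": 1.0, "memory": 0.5})

print(f"equal shares:  {equal_saving:.0%}")
print(f"uneven shares: {uneven_saving:.1%}")
```

So the 33% figure holds when the removed path really accounts for a third of the energy; with other splits the number moves up or down.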
brucethemoose@lemmy.world 3 weeks ago
And here’s the kicker.
You’re supposing it’s (given the no-refresh bonus) 1/3 as fast as DRAM, with similar latency, and cheap enough per gigabyte to replace flash. That is a tall order, and it would be incredible if it hits all three. I find that highly improbable.
Optane, for reference, was a lot slower than DRAM and a lot more expensive/less dense than flash, even with all the work Intel put into it and the busses built into their top-end CPUs for direct access. And they thought that was pretty good.