You're talking theory.
A big reason supercomputers moved to a network of “commodity” hardware architecture is that it's cost-effective.
How would one build a giant unified pool of this memory? CXL, sure, but what does it look like physically? Maybe you get a lot of bandwidth in parallel, but how would it be even close to the latency of “local” DRAM buses on each node? Is that setup truly more power efficient than banks of DRAM backed by infrequently touched flash? If your particular workload needs fast random access to memory, even at scale the only advantage seems to be some fault tolerance at a huge speed cost, and if you just need bulk high-latency bandwidth, flash has you covered for cheaper.
…I really like the idea of non-volatile unified memory, but ultimately architectural decisions come down to economics.
just_another_person@lemmy.world 2 weeks ago
It’s not theoretical, it’s just math. Removing 1/3 of the bus paths, and also removing the need to constantly keep RAM powered…it’s quite a reduction when you’re thinking at large scale. If AWS or Google could reduce their energy needs by 33% on anything, they’d take it in a heartbeat. That’s just assuming this would/could be used somehow as a drop-in replacement, which seems unlikely. Think of an SoC with this on board, or an APU. The premise itself reduces cost while increasing efficiency, but again, they really need to get some spec sheets out and productize it before most companies will do much more than trial runs for such things.
brucethemoose@lemmy.world 2 weeks ago
And here’s the kicker.
You’re supposing it’s (given the no-refresh bonus) 1/3 as fast as DRAM, with similar latency, and cheap enough per gigabyte to replace flash. That is a tall order, and it would be incredible if it hits all three. I find that highly improbable.
Optane, for reference, was a lot slower than DRAM and a lot more expensive/less dense than flash, even with all the work Intel put into it and the buses built into the then-top-end CPUs for direct access. And they thought that was pretty good.
just_another_person@lemmy.world 2 weeks ago
No, you misunderstood. A current standard computer is guaranteed to have at least 3 bus paths between its parts: CPU, RAM, storage.
The amount of energy required to communicate between all three parts varies, but you can be guaranteed that removing just one PLUS removing the capacitor requirement for the memory will reduce power consumption by 1/3 of whatever that total bus power consumption is. This is ignoring any other additional buses and doing the bare minimum math.
The speed of this memory would matter less if you’re also removing the separate static storage requirement. Only the speed at which it can communicate with the CPU would matter, so if you’re not traversing CPU>RAM>SSD and only doing CPU>DRAM+, it’s going to be more efficient.
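The power argument above can be sketched with some back-of-envelope numbers. To be clear, every wattage figure below is a hypothetical placeholder I made up to show the shape of the math, not a measurement:

```python
# Hypothetical per-node power budget for the bus/refresh argument.
# All values are assumed placeholders, not real measurements.
dram_refresh_w = 3.0     # assumed DRAM refresh/standby power
cpu_ram_bus_w = 2.0      # assumed CPU<->RAM link power
cpu_ssd_bus_w = 2.0      # assumed CPU<->storage link power
ram_ssd_traffic_w = 2.0  # assumed power shuttling data RAM<->storage

total = dram_refresh_w + cpu_ram_bus_w + cpu_ssd_bus_w + ram_ssd_traffic_w

# Unified non-volatile memory: keep one CPU<->memory link,
# drop the refresh power and the separate storage path.
unified = cpu_ram_bus_w

savings = 1 - unified / total
print(f"hypothetical bus+refresh power saved: {savings:.0%}")
```

Whether the real saving lands anywhere near 1/3 depends entirely on what fraction of node power those paths actually consume, which is exactly the number nobody has published yet.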
barsoap@lemm.ee 2 weeks ago
PCIe 5.0 x16 can match DDR5’s bandwidth, so that’s not the issue; the question is latency. The only reason OSes cache disk contents in memory is that SSD latency is something like at least 30x worse. The data ends up in the CPU either way: RAM can’t talk directly to the SSD. Modern mainboards are very centralised and it’s all point-to-point connections; the only bus you’ll find will be talking i2c.
And I think it’s rather suspicious that none of those articles talk about latency. Without that being at least in the ballpark of DDR5, all this is is an alternative to NAND, which is of course also a nice thing, but not a game changer.
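The bandwidth claim above checks out on nominal spec numbers; a quick sketch, using rounded figures from the PCIe 5.0 and DDR5 specs (the latency values at the end are assumed typical orders of magnitude, not measurements):

```python
# PCIe 5.0: 32 GT/s per lane with 128b/130b encoding
pcie5_lane = 32e9 * (128 / 130) / 8   # bytes/s per lane, one direction
pcie5_x16 = pcie5_lane * 16           # ~63 GB/s

# DDR5-4800, one 64-bit channel: 4800 MT/s * 8 bytes
ddr5_channel = 4800e6 * 8             # 38.4 GB/s
ddr5_dual = ddr5_channel * 2          # 76.8 GB/s, a typical desktop config

print(f"PCIe 5.0 x16: {pcie5_x16 / 1e9:.0f} GB/s")
print(f"dual-channel DDR5-4800: {ddr5_dual / 1e9:.0f} GB/s")

# The latency side is the real gap. Assumed ballpark figures:
# DRAM access ~100 ns, NAND-based NVMe read ~100 us, so "at least 30x"
# is a very conservative lower bound for ordinary flash.
dram_ns, nvme_ns = 100, 100_000
print(f"assumed latency ratio: ~{nvme_ns // dram_ns}x")
```

So a x16 link really is in the same league as a couple of DDR5 channels for throughput, which is why latency, not bandwidth, is the number these articles need to publish.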