So if I had more memory channels, would it be better to have, say, ollama use the CPU instead of the GPU?
hendrik@palaver.p3x.de 5 days ago
AI inference is memory-bound, so memory bus width is the main bottleneck. I also do AI on an (old) CPU, but the CPU itself is mostly idle, waiting for memory. I'd say it'll likely be very slow, like waiting 10 minutes for a longer answer. I believe that's why the AI people use Apple silicon: the unified memory and its wide bus. Or some CPU with several memory channels.
hendrik@palaver.p3x.de 5 days ago
Well, the numbers I find on Google are: an Nvidia 4090 can transfer 1008 GB/s, and an i9 does something like 90 GB/s. So you'd expect the CPU to be roughly 11 times slower than that GPU at fetching numbers from memory.
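As a back-of-the-envelope sketch of what that means for token speed (the 7 GB model size is an assumed example, roughly a 7B model at 8-bit quantization, not something from this thread): generating a token means streaming essentially all the weights from memory, so bandwidth sets an upper bound on tokens per second.

```python
# Rough upper bound: each generated token streams (roughly) the whole model's
# weights from memory, so tokens/s <= bandwidth / model_size_in_bytes.
MODEL_BYTES = 7e9  # assumed example: ~7B params at 8-bit quantization (~7 GB)

bandwidths_gb_s = {
    "RTX 4090": 1008,                # GB/s, spec number from above
    "i9 (dual-channel DDR5)": 90,    # GB/s, spec number from above
}

for name, bw in bandwidths_gb_s.items():
    tokens_per_s = bw * 1e9 / MODEL_BYTES  # ignores compute, caches, batching
    print(f"{name}: at most ~{tokens_per_s:.0f} tokens/s")
```

That comes out to roughly 144 vs. 13 tokens/s at best, the same ~11x gap as the raw bandwidth ratio.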
Solaer@lemmy.world 5 days ago
The i9-10900X has 4 channels (quad-channel DDR4-2933, PC4-23466, 93.9 GB/s). Would it be better in this regard than an i9-14xxx (dual-channel DDR5-5600, PC5-44800, 89.6 GB/s)?
Do those numbers (93.9 GB/s and 89.6 GB/s) mean the speed of a single RAM stick or the total across all channels? Maybe an old i9-10xxx with quad-channel RAM would be better than a new dual-channel one.
hendrik@palaver.p3x.de 5 days ago
Seems to mean all together:
(5600 MT/s / 1000) × 2 channels × 64 bit / 8 bits-per-byte = 89.6 GB/s
(2933 / 1000) × 4 channels × 64 bit / 8 = 93.9 GB/s
So those spec figures already count the full dual-channel (128-bit) or quad-channel (256-bit) bus width, not a single stick.
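The same arithmetic as a tiny script, in case anyone wants to plug in other configurations (64 bits per channel is the standard DDR bus width; the transfer rates and channel counts are the ones quoted above):

```python
def ddr_bandwidth_gb_s(mt_per_s: float, channels: int, bus_bits: int = 64) -> float:
    """Theoretical peak bandwidth: transfers/s x channels x bytes per transfer."""
    return mt_per_s / 1000 * channels * bus_bits / 8

print(f"{ddr_bandwidth_gb_s(5600, 2):.1f} GB/s")  # dual-channel DDR5-5600 -> 89.6
print(f"{ddr_bandwidth_gb_s(2933, 4):.1f} GB/s")  # quad-channel DDR4-2933 -> 93.9
```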