Comment on Chips aren’t improving like they used to, and it’s killing game console price cuts

Aceticon@lemmy.dbzer0.com 1 week ago

Hah, now you made me look that stuff up. I was talking anchored on my knowledge of systems with multiple CPUs and shared memory, since that’s how they did things in the past, so that was my expectation for the PS5’s system architecture.

So, for starters, I never mentioned “integrated memory”, I wrote “integrated graphics”, i.e. the CPU chip comes together with a GPU, either in the same package (as two separate dies) or even on the same die.

I think that when people talk about “integrated memory” they mean main memory that is soldered onto the motherboard rather than coming as discrete memory modules. From the point of view of systems architecture it makes no difference; from the point of view of electronics, however, soldered memory can be made to run faster, and soldered connections are much closer to perfect than the mechanical contact connections you get with memory modules inserted in slots.

(Quick explanation: at very high clock frequencies the electronics side starts to behave in funny ways. The frequency of the signal travelling on the circuit board gets so high, and hence the wavelength so small, down to centimeters or even millimeters, that it approaches the length of circuit board traces. You then start getting effects like signal reflections and interference between circuit lines, because the traces work as mini antennas and can induce effects on nearby lines, so it’s all a lot messier than if the thing were just running at a few MHz. Wave reflections happen at connections which aren’t perfect, such as the mechanical contacts of memory modules inserted into slots, so at higher clock speeds the signal integrity of the data travelling to and from the memory is worse than with soldered memory, whose connections are much closer to perfect.)
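If you want to put numbers on it, here’s a quick back-of-the-envelope sketch (the 0.5 velocity factor for a typical board material is a rough assumption on my part, not a measured figure):

```python
# Back-of-the-envelope: wavelength = signal speed / frequency.
# On a PCB the signal travels at roughly half the speed of light
# in vacuum (FR4 velocity factor ~0.5 - rough assumption).

C_VACUUM = 3.0e8          # speed of light, m/s
VELOCITY_FACTOR = 0.5     # rough figure for a typical FR4 board

def wavelength_cm(freq_hz: float) -> float:
    """Wavelength of the signal on the board, in centimeters."""
    return (C_VACUUM * VELOCITY_FACTOR / freq_hz) * 100

for freq in (10e6, 100e6, 1e9, 3e9):  # 10 MHz up to 3 GHz
    print(f"{freq / 1e6:>7.0f} MHz -> {wavelength_cm(freq):7.1f} cm")

# 10 MHz gives ~1500 cm, far longer than any trace, so the board
# behaves "normally". 3 GHz gives ~5 cm, comparable to real trace
# lengths, so reflections and crosstalk start to matter.
```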

As far as I know, nowadays the L1, L2 and L3 caches are always part of the CPU/GPU die, though I vaguely remember that in the old days (80s, 90s) the memory cache might come as dedicated SRAM modules on the motherboard.

As for integrated graphics, here’s a reference for an Intel SoC (system on a chip, in this case with the CPU and GPU together on the same die). If you look at page 5 you can see a nice architecture diagram. Notice how memory access goes via the memory controller (lower right, inside the System Agent block) and then the SoC Ring Interconnect, which is an internal bus connecting everything to everything (so quite a lot of data channels). The GPU implementation is the whole left side, the CPU is top right, and there is a cache slice (at first sight an L4 cache) shared by both.

As you can see there, with integrated graphics the memory access doesn’t go via the CPU; rather, there is a single memory controller (and, in this example, a memory cache) for both, and memory accesses from both the CPU and the GPU cores go through that single controller and share that cache. The lower level caches are not shared, though: notice how the GPU implementation contains its own L3 cache (bottom left, labelled “L3$”).
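To make that data path concrete, here’s a toy sketch of the lookup order (the dict-based “caches” and the addresses are purely illustrative inventions of mine, nothing like the real protocol):

```python
# Toy model: each device checks its own private cache first, then the
# shared cache, then falls through to the single memory controller
# that BOTH devices sit behind.

def access(device: str, addr: int, private_l3: dict, shared_l4: dict) -> str:
    if addr in private_l3[device]:
        return "private L3 hit"           # served inside the CPU or GPU block
    if addr in shared_l4:
        private_l3[device][addr] = True   # fill the private cache on the way back
        return "shared L4 hit"
    # Miss everywhere: the one memory controller handles it for both
    # devices, which is where they can queue up behind each other.
    shared_l4[addr] = True
    private_l3[device][addr] = True
    return "main memory via shared controller"

private_l3 = {"cpu": {}, "gpu": {}}
shared_l4 = {}
print(access("cpu", 0x1000, private_l3, shared_l4))  # main memory via shared controller
print(access("gpu", 0x1000, private_l3, shared_l4))  # shared L4 hit (CPU pulled it in)
print(access("gpu", 0x1000, private_l3, shared_l4))  # private L3 hit
```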

With regards to the cache dirtying and contention problems I mentioned in the previous post, at least that higher level (L4) cache is shared, so instead of cache entries being invalidated because the main memory was changed behind the cache’s back, what you get is a different performance problem: competition for cache space between the areas of memory used by the CPU and the areas used by the GPU. Since the cache is much smaller than the actual main memory, it can only hold copies of part of it, and if two devices are working on different areas of main memory, they’re both pulling those areas into the cache. The cache can’t fit both, so it’s constantly evicting entries for one area to make room for entries for the other, which massively slows it down (the little simulation further down shows this effect). There are lots of tricks to make this less of a problem, but it’s still slower than if just one processing device were using that cache.

As for contention, there are generally way more data channels in an internal interconnect like the one you see there than in the data bus to the main memory modules, plus that internal interconnect will be way faster. So contention will be lower for cached memory accesses, but cache misses (which have to go out to main memory) still suffer from two devices sharing the same number of main memory data channels.
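Here’s the promised toy simulation of that thrashing effect (the cache size and access patterns are made up just to show the effect, not real hardware numbers):

```python
# Toy LRU cache shared by two devices. Alone, each device's working
# set fits in the cache; together they don't, so they keep evicting
# each other's entries. All sizes here are invented for illustration.

from collections import OrderedDict

def hit_rate(streams: list[list[int]], cache_size: int) -> float:
    cache: OrderedDict[int, None] = OrderedDict()
    hits = total = 0
    # Interleave the devices' accesses, as a shared cache would see them.
    for accesses in zip(*streams):
        for addr in accesses:
            total += 1
            if addr in cache:
                hits += 1
                cache.move_to_end(addr)        # LRU: mark as recently used
            else:
                if len(cache) >= cache_size:
                    cache.popitem(last=False)  # evict least recently used
                cache[addr] = None
    return hits / total

cpu = [a for _ in range(100) for a in range(0, 64)]       # CPU working set
gpu = [a for _ in range(100) for a in range(1000, 1064)]  # disjoint GPU set

print(f"CPU alone: {hit_rate([cpu], 100):.0%}")       # fits in cache: ~99% hits
print(f"CPU + GPU: {hit_rate([cpu, gpu], 100):.0%}")  # mutual eviction: ~0% hits
```

The combined working set (128 addresses) only slightly exceeds the cache (100 entries), yet with a cyclic access pattern and LRU eviction the hit rate collapses from ~99% to ~0%, which is why this kind of competition hurts so much more than the size numbers suggest.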
