Comment

Comment on Chips aren’t improving like they used to, and it’s killing game console price cuts

Aceticon@lemmy.dbzer0.com ⁨6⁩ ⁨months⁩ ago

Just to add to this, the reason you only see shared memory setups on PCs with integrated graphics is because it lowers performance compared to dedicated memory which is less of a problem if your GPU is only being used in 2D mode (mainly because that uses little memory), but more of a problem when used in 3D mode as the PS5 is meant to be used most of the time.

So the PS5 having shared memory is not a good thing and actually makes it inferior compared to a PC made with a GPU and CPU of similar processing power using the dominant gaming PC architecture (separate memory).

source

Sort:hotnew top

addie@feddit.uk ⁨6⁩ ⁨months⁩ ago
You’ve got that a bit backwards. Integrated memory on a desktop computer is more “partitioned” than shared - there’s a chunk for the CPU and a chunk for the GPU, and it’s usually quite slow memory by the standards of graphics cards. The integrated memory on a console is completely shared, and very fast. The GPU works at its full speed, and the CPU is able to do a couple of things that are impossible to do with good performance on a desktop computer:

load and manipulate models which are then directly accessible by the GPU. When loading models, there’s no need to read them from disk into the CPU memory and then copy them onto the GPU - they’re just loaded and accessible.

manipulate the frame buffer using the CPU. Often used for tone mapping and things like that, and a nightmare for emulator writers. Something like RPCS3 emulating Dark Souls has to turn this off; a real PS3 can just read and adjust the output using the CPU with no frame hit, but a desktop would need to copy the frame from the GPU to main memory, adjust it, and copy it back, which would kill performance.
source
- Aceticon@lemmy.dbzer0.com ⁨6⁩ ⁨months⁩ ago
  When two processing devices try and access the same memory there are contention problems as the memory cannot be accessed by two devices at the same time (well, sorta: parallel reads are fine, it’s when one side is writing that there can be problems), so one of the devices has to wait, so it’s slower than dedicated memory but the slowness is not constant since it depends on the memory access patterns of both devices.
  
  There are ways to improve this (for example, if you have multiple channels on the same memory device then contention issues are reduced to the same memory block, which depends on the block-size, though this also means that parallel processing on the same device - i.e. multiple cores - cannot use the channels being used by a different device so it’s slower).
  
  There are also additional problems with things like memory caches in the CPU and GPU - if an area of memory cached in one device is altered by a different device that has to be detected and the cache entry removed or marked as dirty. Again, this reduces performance versus situations where there aren’t multiple processing devices sharing memory.
  
  In practice the performance impact is highly dependent on if an how the memory is partitioned between the devices, as well as by the amount of parallelism in both processing devices (this latter because of my point from above that memory modules have a limited number of memory channels so multiple parallel accesses to the same memory module from both devices can lead to stalls in cores of one or both devices since not enough channels are available for both).
  
  As for the examples you gave, they’re not exactly great:
  
  First, when loading models into the GPU memory, even with SSDs the disk read is by far the slowest part and hence the bottleneck, so as long as things are being done in parallel (i.e. whilst the data is loaded from disk to CPU memory, already loaded data is also being copied from CPU memory to GPU memory) you won’t see that much difference between loading to CPU memory and then from there GPU memory and direct loading to GPU memory. Further, the manipulation of models in shared memory by the CPU introduces the very performance problems I was explaining above, namely contention problems from both devices accessing the same memory blocks and GPU cache entries getting invalidated from the CPU having altered the main memory data.
  
  Second, if I’m not mistaken tone mapping is highly parallelizable (as pixels are independent - I think, but not sure since I haven’t actually implemented this kind of post processing), which means that best by far device at parallel processing of both of those - the GPU - should be handling it in a shader, not the CPU.
  
  I don’t think that direct access by the CPU to manipulate GPU data is at all a good thing (by the reasons given on top) and to get proper performance out of a shared memory setup at the very least the programming must done in a special way that tries to reduce collisions in memory access, or the whole thing must be setup by the OS like it’s done on PCs with integrated graphics, were a part of the main memory is reserved for the GPU by the OS itself when it starts and the CPU won’t touch that memory after that.
  
  source
  - sp3ctr4l@lemmy.dbzer0.com ⁨6⁩ ⁨months⁩ ago
    Can you explain to me what the person you are replying to meant by ‘integrated memory on a desktop pc’?
    
    I tried to explain why this phrase makes no sense, but apparently they didn’t like it.
    
    …Standard GPUs and CPUs do not share a common kind of RAM that gets balanced between space reserved for CPU-ish tasks and GPU-ish tasks… that only happens with an APU that uses LPDDR RAM… which isn’t at all a standard desktop PC.
    
    It is as you say, a hierarchy of assets being called into the DDR RAM by the CPU, then streamed or shared into the GPU and its GDDR RAM…
    
    But the GPU and CPU are not literally, directly using the actual same physical RAM hardware as a common shared pool.
    
    Yes, certain data is… shared… in the sense that it is or can be, to some extent, mirrored, parellelized, between two distinct kinds of RAM… but… not in the way they seem to think it works, with one RAM pool just being directly accessed by both the CPU and GPU at the same time.
    
    … Did they mean ‘integrated graphics’ when they … said ‘integrated memory?’
    
    L1 or L2 or L3 caches?
    
    ???
    
    I still do not understand how any standard desktop PC has ‘integrated memory’.
    
    What kind of ‘memory’ on a PC… is integrated into the MoBo, unremovable?
    
    ???
    
    source
    Aceticon@lemmy.dbzer0.com ⁨6⁩ ⁨months⁩ ago
    Hah, now you made me look that stuff up since I was talking anchored on my knowledge of systems with multiple CPUs and shared memory, since that was my expectation about the system architecture for the PS5, since in the past that’s how they did things.
    
    So, for starters I never mentioned “integrated memory”, I wrote “integrated graphics”, i.e. the CPU chip comes together with a GPU, either in the same package (in two separete dies) or even the same die.
    
    I think that when people talk about “integrated memory” what they mean is main memory which is soldered on the motherboard rather than coming as discrete memory modules. From the point of view of systems architecture it makes no difference, however from the point of view of electronics soldered memory can be made to run faster and soldered connections are much closer to perfect than the mechanical contact connections you have for memory modules inserted in slots.
    
    (Quick explanation: at very high clock frequencies the electronics side starts to behave in funny ways as the frequency of the signal travelling on the circuit board gets so high and hence the wavelength size gets so small, that it’s down to centimeters or even milimeters - near the length of circuit board lines - and you start getting effects like signal reflections and interference between circuit lines - because they’re working as mini antennas so can induce effects on nearby lines - hence it’s all a lot more messy than if the thing was just running at a few MHz. Wave reflections can happen in connections which aren’t perfect, such as the mechanical contact of memory modules inserted into slots, so at higher clock speeds the signal integrity of the data travelling to and from the memory is worse than it is with soldered memory whose connections are much closer to perfect).
    
    As far as I know nowadays L1, L2 and L3 caches are always part of the CPU/GPU die, though I vaguelly remember that in the old days (80s, 90s) memory cache might be in the form of dedicated SRAM modules on the motherboard.
    
    As for integrated graphics, here’s some reference for an Intel SoC (system on a chip, in this case with the CPU and GPU together in the same die). If you look at page 5 you can see a nice architecture diagram. Notice how memory access goes via the memory controller (lower right, inside the System Agent block) and then the SoC Ring Interconnect which is an internal bus connecting everything to everything (so quite a lot of data channels). The GPU implementation is the whole left side, the CPU is top right and there is a cache slice (at first sight an L4 cache) shared by both.
    
    As you see there, in integrated graphics the memory access doesn’t go via the CPU, rather there is a memory controller (and in this example a memory cache) for both and memory access for both the CPU and the GPU cores goes through the that single controller and shares that cache (but not at lower level caches: notice how the GPU implementation contains its own L3 cache (bottom left, labelled “L3$”)
    
    With regards to the cache dirty and contention problems I mentioned in the previous post, at least that higher level (L4) cache is shared so instead of cache entries being made invalid because of the main memory being changed outside of it, what you get is a different performance problem were there is competiton for cache usage between the areas of memory used by the CPU and areas of memory used by the GPU (as the cache is much smaller than the actual main memory, it can only contain copies of part of the main memory, and if two devices are using different areas of the main memory they’re both causing those areas to get cached but the cache can’t fit both so it’s contantly ejecting entries for one area of memory and ejecting entries for the other area of memory, which massively slows it down - there are lots of tricks to make this less of a problem but it’s still slower than if there was just on processing device using that cache). As for contention problems, there are generally way more data channels in an internal interconnect as the one you see there than in the data bus to the main memory modules, plus that internal interconnect will be way faster, so the contention in memory access will be lower for cached memory but cache misses (that have to access main memory) will still suffer from two devices sharing the same number of main memory data channels.
    
    source
    -> View More Comments
- sp3ctr4l@lemmy.dbzer0.com ⁨6⁩ ⁨months⁩ ago
  I… uh… what?
  
  Integrated memory, on a desktop PC?
  
  Genuinely: What are you talking about?
  
  Typical PCs (and still many laptops)… have a CPU that uses the DDR RAM that is… plugged into the Mobo, and can be removed. Even many laptops allow the DDR RAM to be removed and replaced, though working on a laptop can often be much, much more finnicky.
  
  GPUs have their own GDDR RAM, either built into the whole AIB in a desktop, or inside of or otherwise a part of a laptop GPU chip itself.
  
  These are totally different kinds of RAM, they are accessed via distinct busses, they are not shared, they are not partitioned, not on desktop PCs and most laptops.
  
  They are physically and design distinct, set aside, and specialized to perform with their respective processor.
  
  The kind of RAM you are talking about, that is shared, partitioned, is LPDDR RAM… and is incompatible with 99% of desktop PCs
  
  …
  
  Also… anything, on a desktop PC, that gets loaded and processed by the GPU… does at some point, have to go through the CPU and its DDR RAM first.
  
  The CPU governs the actual instructions to, and output from, the GPU.
  
  A GPU on its own cannot like, ask an SSD or HDD for a texture or 3d model or shader.
  
  Normally, compressed game assets are loaded from the SSD to RAM via the Win32 API. Once in RAM, the CPU then decompresses those assets. The decompressed game assets are then moved from RAM to the graphics card’s VRAM (ie, GDDR RAM), priming the assets for use in games proper.
  
  (addition to the quote is mine)
  
  Like… there is GPU Direct Storage… but basically nothing actually uses this.
  
  pcworld.com/…/what-happened-to-directstorage-why-…
  
  Maybe it’ll take off someday, maybe not.
  
  Nobody does dual GPU SLI anymore, but I also remember back when people thought multithreading and multicore CPUs would never take off, because coding for multiple threads is too haaaaarrrrd, lol.
  
  …
  
  Anyway, the reason that emulators have problems doing the things you describe consoles a good at… is because consoles have finetuned drivers that work with only a specific set of hardware, and emulators have to reverse engineer ways of doing the same, which will work on all possible pc hardware configurations.
  
  People who make emulators generally do not have direct access to the actual proprietary driver code used by console hardware.
  
  If they did, they would much, much more easily be able to… emulate… similar calls and instruction sets on other PC hardware.
  
  But they usually just have to make this shit up on the fly, with no actual knowledge of how the actual console drivers do it.
  
  Reverse engineering is astonishingly more difficult when you don’t have the source code, the proverbial instruction manual.
  
  Its not that desktop PC architecture … just literally cannot do it.
  
  If that were the case, all the same issues you bring up that are specific to emulators… would also be present with console games that have proper ports to PC.
  
  While occasionally yes, this is sometimes the case, for some specific games with poor quality ports… generally no, not this is not true.
  
  Try running say, an emulated Xbox version of Deus Ex: Invisible war, a game notoriously handicapped by its console centric design… try comparing the PC version of that, on a PC… to that same game, but emulating the Xbox version, on the same exact PC.
  
  You will almost certainly, for almost every console game with a PC port… find that the proper PC version runs better, often much, much better.
  
  The problem isn’t the PC’s hardware capabilities.
  
  The problem is that emulation is inefficient guesswork.
  
  Like, no shade at emulator developers whatsoever, its a miracle any of that shit works at all, reverse engineering is astonishingly difficult, but yeah, reverse engineering driver or lower level code, without any documentation or source code, is gonna be a bunch of bullshit hacks that happen to not make your PC instantly explode, lol.
  
  source
sp3ctr4l@lemmy.dbzer0.com ⁨6⁩ ⁨months⁩ ago
Basically this is true, yes, without going into an exhaustive level of detail as to very, very specific subtypes of different RAM and mobo layouts.

Shared memory setups generally are less powerful, but, they also usually end up being overall cheaper, as well as having a lower power draw… and being cooler, temperature wise.

Which are all legitimate reasons those kinds of setups are used in smaller form factor ‘computing devices’, because heat managment, airflow requirements… basically rule out using a traditional architecture.

…

Though, recently, MiniPCs are starting to take off… and I am actually considering doing a build based on the Minisforum BD795i SE… which could be quite a powerful workstation/gaming rig.

Aside about interesting non standard 'desktop' potential build
This is a Mobo with a high end integrated AMD mobile CPU (7945hx)… that all together, costs about $430. And the CPU in this thing… has a PassMark score… of about the same as an AMD 9900X… which itself, the CPU alone, MSRPs for about $400. So that is kind of bonkers, get a high end Mobo and CPU… for the price of a high end CPU. Oh, I forgot to mention: This BD795iSE board? Yeah it just has a standard PCI 16 slot. So… you can plug in any 2 slot width standard desktop GPU into it… and all of this either literally is, or basically is the ITX form factor. So, you could make a whole build out of this that would be ITX form factor, and also absurdly powerful, or a budget version with a dinky GPU. I was talking in another thread a few days ago, snd somekne said PC architecture may be headed toward… basically you have the entire PC, and the GPU, and thats the new paradigm, instead of the old school view of: you have a mobo, and you pick it based on its capability to support future cpus in the same socket type, future ram upgrades, etc… And this intrigued me, I looked into it, and yeah, this concept does have cost per performance merit at this point. So this uses a split between the GPU having its GDDR RAM and the… CPU using DDDR SODIMM (laptop form factor) RAM. But its also designed such that you can actually fit huge standard PC style cooling fans… into quite a compact form factor. From what I can vaguely tell as a non Chinese speaker… it seems like there are many more people over in China who have been making high end, custom, desktop gaming rigs out of this laptop/mobile style architecture for a decent while now, and only recently has this concept even really entered into the English speaking world/market, that you can actually build your own rig this way.

source
- lka1988@lemmy.dbzer0.com ⁨6⁩ ⁨months⁩ ago
  Fascinating discourse here. Love it.
  
  What about a Framework laptop motherboard in a mini PC case? Do they ship with AMD APUs equivalent to that?
  
  source
  - sp3ctr4l@lemmy.dbzer0.com ⁨6⁩ ⁨months⁩ ago
    Hrm uh… Framework laptops… seem to be configurable as having a mobile grade CPU with integrated graphics… and also an optional, additional mobile grade, dedicated GPU.
    
    So, not really an APU… unless you really want to haggle over definitions and say ‘technically, a CPU with pathetic integrated graphics still counts as a GPU and is thus an APU’.
    
    Framework laptop boards don’t have the PCI-E 16x slot for a traditional desktop GPU. As far as I am aware, Minisforum are the only people that do that, along with a high powered mobile CPU.
    
    Note that the Minisforum Mobo model I am talking about, the AMD chip is not really an APU, its also a CPU with integrated graphics. Its a Radeon 610M, basically the bare minimum to be able to render and output very basic 2d graphics.
    
    True APUs are … things like what more modern consoles use, what a steam deck uses. They are still usually custom specs, proprietary to their vendor.
    
    The Switch 2 will have a custom Nvidia APU, which is the first Nvidia APU of note to my knowledge, and it will be very interesting to learn more about it from teardowns and benchmarks.
    
    …
    
    Currently, the most powerful, non custom, generally publically available, compatible with standard PC mobos… arguably an APU, arguably not… is the AMD 8700G.
    
    Its about $315 bucks, is a pretty decent CPU, but as a GPU… its less powerful than a standard desktop RX 6500 from AMD… which is the absolute lowest tier AMD GPU from now two generations back from current.
    
    You… might be able to run … basically games older than 5ish years, at 1080p, medium graphics, at 60fps. I guess it would maybe be a decent option if you… wanted to build a console emulator machine, roughly for consoles … N64/PS1/Dreamcast, and older, as well as being able to play older PC games, or PC games at lower settings/no more than 1080p.
    
    I am totally just spitballing with that though, trying to figure out all that exactly would be quite complicated.
    
    …
    
    But now, back to Framework.
    
    Framework is soon to be releasing the Framework Desktop.
    
    This is a small form factor PC… which uses an actual proper APU, either the AMD AI Max 385 or 395.
    
    Its listed as MSRP of $1100, they say it can run Cyberpunk at 1440p on high settings at about 75 fps… thats with no ray tracing, no framegen… and I think also no frame upscaling being used.
    
    So, presumably, if you turned on upscaling and framegen, you’d be able to get similar fps at ultra and psycho settings, and/or some amount of raytracing.
    
    There are also other companies that offer this kind of true APU, MiniPC style architecture, such as EvoTek, though it seems like most of them are considerably more expensive.
    
    wccftech.com/amd-ryzen-ai-max-395-strix-halo-mini…
    
    … And finally, looks like Minisforum is sticking with the laptop CPU + desktop GPU design, and is soon going to be offering even more powerful CPU+Mobo models.
    
    wccftech.com/minisforum-ryzen-9-9955hx-x870m-motd…
    
    So yeah, this is actually quite an interesting time of diversification away from … what have basically been standard desktop mobo architectures… for … 2, 3? decades…
    
    …shame it all also coincides with Trump throwing a literally historically unprecedented senilic temper tantrum, and fucking up prices and logistics for… basically the whole world, though of course much, much more seriously for the US.
    
    source