On the Series X the CPU and GPU can reference data from anywhere; it's just that the maximum speed at which the data can be referenced differs (560 GB/s vs 336 GB/s). The CPU's own bandwidth ceiling is far lower than even the slower memory pool anyway.
But devs do need to be aware of where the memory they're using sits if it's frequently used by the GPU.
Entirely different story on PC with physically separate memory pools. Though some of that is slightly mitigated by Resizable BAR implementations, it's still slower than if the memory were directly accessible by both CPU and GPU.
I think what's being referred to isn't even really related to bandwidth, but to how the data for the GPU pool of memory actually gets into that pool. Since the CPU only has access to 6 GB from 6 of the 10 chips (and 2.5 GB of that is reserved for the OS anyway), I've been seeing it as basically a 3.5 GB physical window of RAM to shuffle data for GPU memory through; realistically it would be even "smaller", because some of that 3.5 GB is also going to be used for CPU code and audio data. At least, if the way I'm seeing it is close to accurate.
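Just to put the arithmetic in this post in one place, here's a back-of-the-envelope sketch using the publicly stated Series X figures (note the "CPU only sees 6 GB" premise is this post's reading, not something confirmed):

```cpp
// Back-of-the-envelope version of the numbers discussed above.
#include <cstdio>

int main() {
    const float totalGB      = 16.0f;  // 16 GB GDDR6 across 10 chips
    const float gpuOptimalGB = 10.0f;  // fast pool @ 560 GB/s
    const float standardGB   = totalGB - gpuOptimalGB;  // 6 GB @ 336 GB/s
    const float osReservedGB = 2.5f;   // OS reservation comes out of the standard pool
    const float stagingGB    = standardGB - osReservedGB;
    std::printf("standard memory left for the game: %.1f GB\n", stagingGB); // 3.5
    return 0;
}
```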
Devs are likely aware of where the data sits in memory once it's populated; it's more a question of how quickly that data can actually get where it needs to be between the two pools. Once the GPU data is in the GPU memory pool it's basically smooth sailing, but the pools differing in size and bandwidth might be creating some growing pains for devs in actually getting GPU-bound data into the GPU part of the memory pool. All that aside though, compared to PC an APU design like Series X still has a clear advantage over what PC does with things like BAR or shuffling data over a PCIe bus, since it's still essentially a hUMA design in other respects. This partitioning of memory into fast and slow blocks, which also affects the physical capacity of both, feels like it kind of virtualizes some NUMA quirks into the package though, at least IMO.
edit: ignore this reply lol btw. it's completely based on wrong information.
Right, memory that's allocated to the GPU may actually be needed by the CPU, but because of the split pool, developers can't just use it and need workarounds to reallocate it. The way memory is mapped, developers can't freely use whatever is available without planning or designing around it: instead of filling a single bucket, they now have two separate buckets to juggle. Most of the earlier discussion around the disadvantages of the split pool centered on the 'average' bandwidth between the two pools, but the size considerations were never really discussed (and honestly, without access to the documentation I wouldn't have suspected this either).
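To illustrate the "two buckets" problem, here's a purely hypothetical sketch (nothing like the actual XDK API, just the planning constraint being described):

```cpp
// Hypothetical sketch of the split-pool budgeting problem on Series X.
// Assumes an allocation must pick one pool up front.
#include <cstddef>
#include <stdexcept>

enum class Pool { GpuOptimal, Standard };   // 560 GB/s vs 336 GB/s

struct Budget {
    size_t gpuOptimalFree = 10ull << 30;    // 10 GB fast pool (game-visible)
    size_t standardFree   = 3500ull << 20;  // ~3.5 GB standard pool after OS reserve
};

// Pick a pool based on who touches the data most; the pain point is that a
// CPU-heavy allocation can fail even while plenty of fast memory sits free.
Pool choosePool(Budget& b, size_t bytes, bool gpuHot) {
    if (gpuHot && b.gpuOptimalFree >= bytes) { b.gpuOptimalFree -= bytes; return Pool::GpuOptimal; }
    if (!gpuHot && b.standardFree >= bytes)  { b.standardFree   -= bytes; return Pool::Standard; }
    throw std::runtime_error("bucket full: time for a workaround");
}
```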
If the GPU could directly access storage (in practice), would that resolve a lot of this? That's what DirectStorage is supposed to help with: directly accessing data on storage to populate VRAM, bypassing system RAM and the copy process. So we know the Series X is capable of this. However, it's also part of the Velocity Architecture, and DirectStorage won't start deploying until later this year. If the VA timescale is what you're speculating, then this feature probably won't even be leveraged for a while, even if the hardware is capable of it.
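For reference, a rough sketch of the request flow Microsoft has previewed for PC DirectStorage (names follow the preview dstorage.h headers, so treat the specifics as assumptions; the point is storage feeding a GPU buffer with no system-RAM staging copy):

```cpp
// Sketch of a DirectStorage load: file -> GPU buffer, no staging in system RAM.
#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void loadAssetDirect(ID3D12Device* device, ID3D12Resource* gpuBuffer,
                     const wchar_t* path, uint32_t sizeBytes) {
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(path, IID_PPV_ARGS(&file));

    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Device     = device;
    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    DSTORAGE_REQUEST request{};
    request.Options.SourceType      = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Source.File.Source      = file.Get();
    request.Source.File.Size        = sizeBytes;
    request.Destination.Buffer.Resource = gpuBuffer;  // lands in VRAM directly
    request.Destination.Buffer.Size     = sizeBytes;
    queue->EnqueueRequest(&request);
    queue->Submit();  // real code must keep file/queue alive and wait on a fence
}
```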
Which might be a bit of an issue going forward until it can actually be leveraged, but I guess we'll have to wait and see.
The issue is likely going to crop up the most during the transition period of games. These are games that still traditionally use the CPU for a lot of GPU functions, like culling, animation, etc. So if the CPU needs to access these memory locations, that data has to sit in the slow pool for the CPU to do its updates. I think the traditional thought process here is that the GPU needs a huge amount of memory, which it may in the future, but at this moment with cross-gen you may not see as much budget being placed towards super-high-quality assets, so the amount of VRAM required by the GPU may be lower, like 7-8 GB, and the CPU may use the rest. But with Series X|S you are locked into how much you have in both areas, so you need careful planning on how to split it, which is difficult when you also have to make considerations for last generation.
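For anyone wondering what "the CPU doing GPU functions" concretely looks like, here's a minimal illustrative sketch of CPU-side frustum culling (all names are made up for the example); because the CPU walks this array every frame, the data has to sit in the standard/slow pool:

```cpp
// CPU-side frustum culling over per-instance bounding spheres.
#include <cstdint>
#include <vector>

struct Plane  { float nx, ny, nz, d; };    // plane: n.x*x + n.y*y + n.z*z + d
struct Sphere { float x, y, z, radius; };  // per-instance bounding sphere

// Returns indices of instances at least partially inside all six frustum planes.
std::vector<uint32_t> cullOnCpu(const Plane frustum[6],
                                const std::vector<Sphere>& bounds) {
    std::vector<uint32_t> visible;
    for (uint32_t i = 0; i < bounds.size(); ++i) {
        const Sphere& s = bounds[i];
        bool inside = true;
        for (int p = 0; p < 6 && inside; ++p) {
            const Plane& pl = frustum[p];
            // signed distance of sphere center to plane
            float dist = pl.nx * s.x + pl.ny * s.y + pl.nz * s.z + pl.d;
            inside = dist > -s.radius;
        }
        if (inside) visible.push_back(i);  // feeds draw-call building later
    }
    return visible;
}
```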
Sounds about right; it also comes with some inconvenient timing for MS, I would guess. Again, I think this is a reason they scaled back so much on their own 1P cross-gen support: if some of the features critical to VA can't really be leveraged by games still coded against 8th-gen consoles as their base, then the only way to break out of that early is to cut off 8th-gen support. Which, for Microsoft in particular, I always felt was the better option for a lot of reasons.
But then they also have to consider their Xbox/PC/mobile (through streaming) cross-platform initiative, and that acts as a bit of an anchor here, because solutions like DirectStorage won't even be supported on most PCs outside of the virtually-impossible-to-buy RDNA2 and RTX 30 series GPUs, which won't make up a significant chunk of that gaming market for a very long time. So while they could push their 1P ahead to focus on 9th-gen and accelerate use of VA and its features if they'd like or need to, that still probably creates some lag on the PC side of things.
Although I don't want to sound like I'm putting all of it on VA myself; utilization of VA potentially isn't at the heart of certain performance results in various 3P games on the system at the moment.
This may explain why there are random major framerate drops on XSX with some of these launch titles. They simply ran out of room on the CPU or GPU side and needed to perform some terrible workaround to get it all to fit, i.e., relying on the SSD to stream level/animation/vertex data in for new monsters etc. while slowly unloading parts of the level.
That being said, the most critical features for GPU dispatch are present in the older generations of hardware (confirmed for Xbox One, and more or less assumed for PS4), so it's really about rewriting their rendering pipelines; PC is what's holding them back in this regard.
Yeah, and if it's a game that also needs to work on 8th-gen platforms, any sort of SSD-level streaming of data (particularly non-texture data) needs a mirrored equivalent on the older systems, whose HDD I/O is much more limited. I know games like Until Dawn got a recent update that significantly cut loading times on PS4, but loading data isn't the same thing as streaming it in, and this highlights that.
And like you also say, PC would be a limiting factor as well, partly because things like DirectStorage and GPUDirect Storage are either too niche to really program around, or simply aren't available to use yet.
In the end it's only speculation, just my thoughts on the performance of Series X|S so far. Once games move to GPU-driven pipelines, work like animation, vertex processing, culling, and ordering can all be performed by the GPU. That improves effective bandwidth by moving that data into GPU-optimal memory, removes it from the standard memory pool, and frees the CPU up to do other things: holding higher framerates, working on AI, or processing whatever else.
It's not really Velocity Architecture that needs to be adopted. I mean, that's one way to attack the issue, but it's a texturing solution. What about mesh information? Animation information? What needs to be addressed is the move to GPU-driven rendering pipelines.
So basically it comes down to cross-gen software relying on the CPU for pipelines related to rendering setup (is this another way of phrasing draw lists / draw-call instructions? XBO had support for ExecuteIndirect, which might be part of the GPU-oriented task support in 8th-gen systems you were suggesting), and in that scenario a fully unified pool is always going to win out.
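For concreteness, here's roughly what the ExecuteIndirect path looks like in D3D12 (the API names here are real D3D12; the buffer setup around it is assumed). A GPU culling pass writes the draw arguments, and the CPU just submits one call:

```cpp
// GPU-driven submission via ExecuteIndirect: a compute culling pass fills a
// buffer of draw arguments, and the CPU never touches the per-draw data.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12CommandSignature> makeDrawSignature(ID3D12Device* device) {
    D3D12_INDIRECT_ARGUMENT_DESC arg{};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC desc{};
    desc.ByteStride       = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
    desc.NumArgumentDescs = 1;
    desc.pArgumentDescs   = &arg;

    ComPtr<ID3D12CommandSignature> sig;
    device->CreateCommandSignature(&desc, nullptr, IID_PPV_ARGS(&sig));
    return sig;
}

// argsBuffer was written by the GPU culling pass; countBuffer holds how many
// draws survived culling.
void submitGpuDrivenDraws(ID3D12GraphicsCommandList* cmdList,
                          ID3D12CommandSignature* sig,
                          ID3D12Resource* argsBuffer,
                          ID3D12Resource* countBuffer,
                          UINT maxDraws) {
    cmdList->ExecuteIndirect(sig, maxDraws, argsBuffer, 0, countBuffer, 0);
}
```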
With something like Series X, there's just a large chunk of physical memory set aside exclusively for a processor component that these cross-gen games aren't even designed to leverage for that type of work. And given that the CPU speeds between Sony's and MS's systems are virtually identical in SMT mode, that's going to affect MS more, since the smaller physical RAM amount (and therefore bandwidth) dedicated to the CPU in their setup is magnified, with no additional CPU clockspeed to power through and make up the difference (again, in SMT).
I figure some or all of this also applies to BC games, but in that case games can rely on the higher non-SMT clock for the CPU, those games use less physical memory, XBO/One X games weren't running as fast on the CPU side in the first place, and there are probably several other factors whose intricate details I don't understand well enough to talk about.