Digital Foundry Article Technical Discussion [2023]

But the quote says it was in relation to unified memory.

"When we talked to the system team there were a lot of issues around the complexity of signal integrity and what-not. As you know, with the Xbox One X, we went with the 384[-bit interface] but at these incredible speeds - 14gbps with the GDDR6 - we've pushed as hard as we could and we felt that 320 was a good compromise in terms of achieving as high performance as we could while at the same time building the system that would actually work"

14gbps GDDR6 introduced signal-integrity issues that warranted a more complex memory solution. PS5's GDDR6 is also 14gbps - is the 256-bit bus what makes that workable? So is GDDR6 @ 14gbps 'unstable' above 320 bits? What do PC GPUs with higher bandwidth than the XBSX do about this?

It's a good question and I don't have the answer!

But ... whatever it is, it probably comes down to money. Perhaps a more expensive board with more metal layers or higher-quality materials, perhaps extra components like filtering caps, perhaps more logic on the chip to determine what's interference or signal degradation, and of course PC GPUs have the nuclear option of disabling a memory controller (like the 320-bit 7900 XT does), regaining integrity and still selling the GPU at a hoofing great markup.

We've seen consoles with disabled CUs, but we've never seen one with part of the memory bus intended to be redundant for yield issues... :unsure:
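For reference, the raw numbers in play here are just bus width times per-pin data rate, so the signal integrity question is really about how many 14gbps (or faster) pins you can route cleanly. A quick back-of-the-envelope sketch in Python, using the publicly quoted specs (the 7900 XT's 20gbps is its stock rate):

```python
# Peak GDDR bandwidth: bus width (bits) x per-pin data rate (Gbps) / 8 bits-per-byte
def bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8

configs = {
    "Xbox One X (384-bit GDDR5 @ 6.8Gbps)": (384, 6.8),   # ~326 GB/s
    "Series X   (320-bit GDDR6 @ 14Gbps)":  (320, 14.0),  # 560 GB/s
    "PS5        (256-bit GDDR6 @ 14Gbps)":  (256, 14.0),  # 448 GB/s
    "7900 XT    (320-bit GDDR6 @ 20Gbps)":  (320, 20.0),  # 800 GB/s
}

for name, (bits, rate) in configs.items():
    print(f"{name}: {bandwidth_gbs(bits, rate):.0f} GB/s")
```

So PC parts mostly get their higher bandwidth by pushing faster pins (plus big on-die caches) rather than by going wider than 384 bits.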
 
But the quote says it was in relation to unified memory.

"When we talked to the system team there were a lot of issues around the complexity of signal integrity and what-not. As you know, with the Xbox One X, we went with the 384[-bit interface] but at these incredible speeds - 14gbps with the GDDR6 - we've pushed as hard as we could and we felt that 320 was a good compromise in terms of achieving as high performance as we could while at the same time building the system that would actually work"

14gbps GDDR6 introduced signal-integrity issues that warranted a more complex memory solution. PS5's GDDR6 is also 14gbps - is the 256-bit bus what makes that workable? So is GDDR6 @ 14gbps 'unstable' above 320 bits? What do PC GPUs with higher bandwidth than the XBSX do about this?
Hmmm. Maybe it's worthwhile to look at what Nvidia has done with their GPUs, because I recall them discussing memory signalling at one point.
 
Besides the obvious reason of cost, were there any other reasons why Xbox engineers went with a mixed memory configuration? Were (or are) there any performance benefits to such a memory setup?

If you believe the rumours, they were forced to.

XSX was supposed to have 20GB of RAM (dev kits have 40GB, Xbox dev kits have always had double the console's RAM, and a 320-bit bus is what you'd pair with 20GB, as seen in PC GPUs).

But they dropped it for cost reasons, and they couldn't just halve the amount to 10GB, so they had to come up with some way of at least matching what PS5 has.
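Worth noting how the mixed setup falls out of the chip arrangement (as Microsoft described it at Hot Chips): ten 32-bit GDDR6 devices make the 320-bit bus, but four are 1GB and six are 2GB, so only the first 10GB can interleave across all ten chips. A rough sketch of the arithmetic:

```python
# Series X memory layout: ten 32-bit GDDR6 devices = 320-bit bus,
# but with mixed 2GB/1GB densities (6x 2GB + 4x 1GB = 16GB total).
chips = [2] * 6 + [1] * 4          # capacity in GB per chip
chip_bus_bits = 32
pin_rate_gbps = 14

total_gb     = sum(chips)                         # 16 GB
fast_pool_gb = len(chips) * min(chips)            # 10 GB striped across all 10 chips
slow_pool_gb = total_gb - fast_pool_gb            # 6 GB living only on the 2GB chips

fast_bw = len(chips) * chip_bus_bits * pin_rate_gbps / 8       # 560 GB/s
slow_bw = chips.count(2) * chip_bus_bits * pin_rate_gbps / 8   # 336 GB/s

print(f"{fast_pool_gb} GB @ {fast_bw:.0f} GB/s, {slow_pool_gb} GB @ {slow_bw:.0f} GB/s")
```

So the split pools aren't a deliberate performance feature so much as the cheapest way to get 16GB onto a 320-bit bus with the chip densities available.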
 
What's the limiting factor on 'signal noise issues' and how do other setups with higher BW deal with it?
In most high-performance systems, the components running at higher frequencies and generating electrical noise (and everything creates some noise) are relatively far apart and can be shielded from each other. Now look at Series X and PS5, with everything crammed onto one APU. Oh...
 
What exactly is async compute?

It's a means of processing additional workloads at the same time as normal synchronous workloads. The idea is that you can, potentially, use more of the resources on the GPU at once and increase overall performance.

There seems to be a lot that factors into how or when to use it best, and architecture will play into that along with what the APIs offer. It appears that it can gain performance or cost it depending on what you do.

Nvidia have some guidance for their architectures on their developer pages. Some of it is pretty high level and complex, but some of the guidance they give should make sense to folks with a more general knowledge of how GPUs work so it's worth a nosy:

https://developer.nvidia.com/blog/advanced-api-performance-async-compute-and-overlap/
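To make the "use more of the GPU at once" idea concrete, here's a deliberately toy occupancy model (nothing like a real scheduler, just the intuition): a graphics pass that only keeps part of the machine busy leaves idle capacity that an independent compute job can soak up, so the combined wall-clock time can be shorter than running the two back to back.

```python
# Toy occupancy model of async compute: not a real GPU scheduler,
# just the "fill the idle units with independent work" intuition.
GPU_UNITS = 100

def run_serial(graphics_busy_units, graphics_ms, compute_unit_ms):
    # Graphics first (leaving units idle), then compute gets the whole GPU.
    return graphics_ms + compute_unit_ms / GPU_UNITS

def run_async(graphics_busy_units, graphics_ms, compute_unit_ms):
    # Compute soaks up whatever the graphics pass leaves idle, then finishes alone.
    spare_capacity = (GPU_UNITS - graphics_busy_units) * graphics_ms
    leftover = max(0.0, compute_unit_ms - spare_capacity)
    return graphics_ms + leftover / GPU_UNITS

# A graphics pass keeping 60/100 units busy for 2 ms, plus 100 unit-ms of compute work.
print("serial:", run_serial(60, 2.0, 100))  # 3.0 ms
print("async :", run_async(60, 2.0, 100))   # 2.2 ms
```

Whether you actually see that win depends on the two workloads not fighting over the same caches, bandwidth and register space, which is essentially what the Nvidia guidance above is about.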
 
What's the limiting factor on 'signal noise issues' and how do other setups with higher BW deal with it?
This AnandTech article touches on this a bit:
The challenge in doing this, of course, is that the more you pump a memory bus, the tighter the signal integrity requirements. So while it’s simple to say “let’s just double the memory bus bandwidth”, doing it is another matter. In practice a lot of work goes into the GPU memory controller, the memory itself, and the PCB to handle these transmission speeds.

 
Yeah, I don't buy the whole lead platform shtick. If you look at the prior generation, developers had no problem flexing the XB1X's prowess over Pro, and PS4/Pro led that generation as the "lead platform". If anything, Series X seems bottlenecked (software- or hardware-wise, or a combination of both) by more than just its market position in the console space.

If the XSX vs PS5 spec gap was similar to the X1X vs Pro, the XSX would be a 14.6 TF console with 24 GB of RAM at a bandwidth of 670 GB/s. Or if the X1X had only been privy to an XSX-like spec increase, it would have been a 4.7 TF console with 8 GB of RAM, with 5 GB at 273 GB/s of bandwidth while the other 3 GB would have been limited to 163 GB/s.
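Quick sanity check on those figures, scaling PS5's paper specs by the X1X-over-Pro ratios; the baseline numbers below are the commonly quoted ones, so treat them as assumptions:

```python
# Scale PS5's paper specs by the ratios X1X held over PS4 Pro (commonly quoted figures).
pro = {"tf": 4.2,   "ram_gb": 8,  "bw_gbs": 218}
x1x = {"tf": 6.0,   "ram_gb": 12, "bw_gbs": 326}
ps5 = {"tf": 10.28, "ram_gb": 16, "bw_gbs": 448}

hypothetical_xsx = {spec: round(ps5[spec] * x1x[spec] / pro[spec], 1) for spec in ps5}
print(hypothetical_xsx)   # ~14.7 TF, 24 GB, ~670 GB/s
# versus the real Series X at 12.15 TF, 16 GB, 560/336 GB/s.
```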

A performance gap like the one between X1X and Pro is large enough to give easy access to higher framerates, better resolution, or an increase in performance across a host of other metrics. XSX/PS5, not so much.

And you are right, there is a bottleneck for the XSX, and it's DirectX 12. While the consoles sport a custom version, MS still designed it with ease of portability in mind. It probably creates inflexibilities that devs have to work around, which doesn't help if you aren't the lead platform.
 
If the XSX vs PS5 spec gap was similar to the X1X vs Pro, the XSX would be a 14.6 TF console with 24 GB of RAM at a bandwidth of 670 GB/s. Or if the X1X had only been privy to an XSX-like spec increase, it would have been a 4.7 TF console with 8 GB of RAM, with 5 GB at 273 GB/s of bandwidth while the other 3 GB would have been limited to 163 GB/s.


A performance gap like the one between X1X and Pro is large enough to give easy access to higher framerates, better resolution, or an increase in performance across a host of other metrics. XSX/PS5, not so much.
IIRC PS4 Pro had 9GB of RAM, or something like that. Games could use an extra 512 MB and the other half-gig was system-reserved for the higher-res dashboard and stuff like that. But yeah, the paper specs on PS5 and Series X are closer than PS4 Pro and X1X.
 
IIRC PS4 Pro had 9GB of RAM, or something like that. Games could use an extra 512 MB and the other half-gig was system-reserved for the higher-res dashboard and stuff like that. But yeah, the paper specs on PS5 and Series X are closer than PS4 Pro and X1X.

I remember there was an extra GB of DDR3 and a 512 MB chunk. I think 512 MB of the 8 GB could flip back and forth between being used by the system and being used by the application, with the hardware flushing to and loading from the DDR3 to create 8.5 GB virtually.
 
IIRC PS4 Pro had 9GB of RAM, or something like that. Games could use an extra 512 MB and the other half-gig was system-reserved for the higher-res dashboard and stuff like that. But yeah, the paper specs on PS5 and Series X are closer than PS4 Pro and X1X.
Yes in total, but not for games. Pro had 5.5GB available for games (while PS4 had 5GB). X1X had 9GB for games - much more than what Pro had. In many games Pro couldn't compete at max DRS resolution because of its lower amount of memory.
 
Honestly, Series X should be wiping the floor with PS5 in the vast majority of these games (if not all) when it comes to framerate and fewer torn frames
Well, it really shouldn't. Due to its potentially faster clocks, the PS5 can have faster triangle setup (primitive rate, culling, tessellation, etc.), pixel fill rate, texture filtering rate, thread scheduling (front end), and memory subsystem (caches). Also, as we've learned recently: faster async compute.

Series X is definitely faster in compute, ray tracing and memory bandwidth. So in the end, they both end up as equals in their own ways.
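The split falls straight out of which stages scale with clock alone and which scale with clock times unit count. A rough comparison in Python using the public figures (64 ROPs apiece is the commonly reported count, so treat that part as an assumption):

```python
# Clock-bound stages vs (clock x unit count)-bound stages, using public figures.
# ROP counts are the commonly reported ones, not officially confirmed.
ps5 = {"cus": 36, "clock_ghz": 2.23,  "rops": 64}
xsx = {"cus": 52, "clock_ghz": 1.825, "rops": 64}

for name, gpu in (("PS5", ps5), ("Series X", xsx)):
    tflops     = gpu["cus"] * 64 * 2 * gpu["clock_ghz"] / 1000  # 64 FP32 lanes x 2 ops (FMA)
    pixel_fill = gpu["rops"] * gpu["clock_ghz"]                 # Gpixels/s, scales with clock
    print(f"{name}: {tflops:.2f} TF compute, {pixel_fill:.0f} Gpix/s fill")

# PS5     : ~10.28 TF, ~143 Gpix/s -> fill-rate/front-end style work favours the clocks
# Series X: ~12.15 TF, ~117 Gpix/s -> ALU-heavy (and RT) work favours the extra CUs
```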
 
Well, it really shouldn't. Due to its potentially faster clocks, the PS5 can have faster triangle setup (primitive rate, culling, tessellation, etc.), pixel fill rate, texture filtering rate, thread scheduling (front end), and memory subsystem (caches). Also, as we've learned recently: faster async compute.

Series X is definitely faster in compute, ray tracing and memory bandwidth. So in the end, they both end up as equals in their own ways.
Series X at PS5 clocks would be winning easily, but Microsoft struggled with cooling.
 
What's the limiting factor on 'signal noise issues' and how do other setups with higher BW deal with it?
The number of layers in your base PCB has a lot to do with it.
More layers = more cost too, especially if you only need those extra layers for 10% of the board...

Check out the PCB layer count for modern GPUs - I bet it scales down with price,
and I bet it's also higher than a motherboard PCB's.

One of the big costs of moving to PCIe 5.0 and DDR5 was that they require more board layers,
especially with long traces to multiple PCIe 5.0 slots, like on workstation boards.
 
Well, it really shouldn't. Due to its potentially faster clocks, the PS5 can have faster triangle setup (primitive rate, culling, tessellation, etc.), pixel fill rate, texture filtering rate, thread scheduling (front end), and memory subsystem (caches). Also, as we've learned recently: faster async compute.
Why so? XSX has more unified shader units and texture units. Doesn't that make them at least on par?

Also is info here legit?
https://www.techpowerup.com/gpu-specs/xbox-series-x-gpu.c3482
https://www.techpowerup.com/gpu-specs/playstation-5-gpu.c3480
 
Why so? XSX has more unified shader units and texture units. Doesn't that make them at least on par?

Also is info here legit?
https://www.techpowerup.com/gpu-specs/xbox-series-x-gpu.c3482
https://www.techpowerup.com/gpu-specs/playstation-5-gpu.c3480
Many of the front end components of Series X and PS5 are near identical, at least in performance per clock. So having higher clocks means that those components perform better. A narrower design is easier to saturate as well.

It's sort of like the CUs are lanes of a highway with the speed limit being the clock speed. If you only need to use 30 CUs to render/compute your job, the rest of Series X's 52 go unused, while only 6 of PS5's 36 are unused. If you need 60, PS5 does that in 2 clocks (which happen up to 25% faster because it's clocked up to 25% higher) with 12 CUs going unused across those 2 clocks. Series X would also take 2 clocks but waste 44 CUs. For Series X to win in these cases, you would have to saturate more than 36 CUs but fewer than 52, and probably more in the 45 range to adjust for the higher clocks.

While I'm not 100% sold on the idea that PS5 being the lead platform for most games causes performance issues on Series consoles, you can see how it could happen if a developer was tailoring the workload to saturate PS5's CU count, and how the same could be true if efforts were made to saturate Series X's. But then you have a situation where you are using 45-52 CUs and Series S has 20 at a lower clock speed, which means it's taking 3+ clocks to do what Series X does in 1 and PS5 does in 2, and those Series S clocks are even slower.

This is all an oversimplified example that I don't believe reflects a real workload in a console game - more of an extreme case to show how a narrower design at a higher clock speed can perform better because it's easier to fill that pipeline with work.
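Putting rough numbers on the highway analogy (real GPUs schedule waves per SIMD, not whole-CU "clocks", so this is only the intuition):

```python
import math

# Toy version of the "CUs as highway lanes" analogy: clocks needed to push a job
# through, and how long those clocks take at each console's frequency.
gpus = {
    "PS5":      {"cus": 36, "clock_ghz": 2.23},
    "Series X": {"cus": 52, "clock_ghz": 1.825},
    "Series S": {"cus": 20, "clock_ghz": 1.565},
}

job_cus = 60  # the example above: a job needing 60 CU-slots of work

for name, gpu in gpus.items():
    clocks  = math.ceil(job_cus / gpu["cus"])
    time_ns = clocks / gpu["clock_ghz"]
    idle    = clocks * gpu["cus"] - job_cus
    print(f"{name}: {clocks} clocks, ~{time_ns:.2f} ns, {idle} CU-slots idle")

# PS5: 2 clocks (~0.90 ns, 12 idle); Series X: 2 clocks (~1.10 ns, 44 idle);
# Series S: 3 clocks (~1.92 ns, 0 idle, but slowest overall).
```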
 
Many of the front end components of Series X and PS5 are near identical, at least in performance per clock. So having higher clocks means that those components perform better. A narrower design is easier to saturate as well.

It's sort of like the CUs are lanes of a highway with the speed limit being the clock speed. If you only need to use 30 CUs to render/compute your job, the rest of Series X's 52 go unused, while only 6 of PS5's 36 are unused. If you need 60, PS5 does that in 2 clocks (which happen up to 25% faster because it's clocked up to 25% higher) with 12 CUs going unused across those 2 clocks. Series X would also take 2 clocks but waste 44 CUs. For Series X to win in these cases, you would have to saturate more than 36 CUs but fewer than 52, and probably more in the 45 range to adjust for the higher clocks.

While I'm not 100% sold on the idea that PS5 being the lead platform for most games causes performance issues on Series consoles, you can see how it could happen if a developer was tailoring the workload to saturate PS5's CU count, and how the same could be true if efforts were made to saturate Series X's. But then you have a situation where you are using 45-52 CUs and Series S has 20 at a lower clock speed, which means it's taking 3+ clocks to do what Series X does in 1 and PS5 does in 2, and those Series S clocks are even slower.

This is all an oversimplified example that I don't believe reflects a real workload in a console game - more of an extreme case to show how a narrower design at a higher clock speed can perform better because it's easier to fill that pipeline with work.

Workloads are never that small :) but on the trailing portion of a job, this would apply.

Bigger cards are faster because they can do more work in parallel, and that works well with GDDR because you build a bigger bus to feed them and can thus request more data at once.

Narrow means you have to hit memory more often and that’s a much larger hit to speed than running a cycle or so behind.
 
Easier to explain.

Vega 64 has more compute units than Vega 56.

And with both GPUs running at the same clock speed you would expect Vega 64 to be faster, but they're practically identical in performance.

Why?

Because there's more to GPU performance than simply adding more compute units.

It's a balancing act as each CU requires a certain amount of cache, bandwidth and other things to run optimally, and it's a balancing act that Microsoft got wrong in my opinion.
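Putting rough numbers on the Vega comparison (the equal-core-clock scenario is hypothetical, and the bandwidth figures are the stock reference-card ones): at the same core clock Vega 64 has ~14% more theoretical ALU throughput, but both parts share the same 2048-bit HBM2 interface and the same front end, so the extra CUs don't bring extra bandwidth with them.

```python
# Vega 56 vs Vega 64 at an equal, hypothetical core clock: more CUs buys theoretical
# FLOPS, but the memory interface and front end don't grow with the CU count.
core_clock_ghz = 1.5                                # hypothetical equal clock for both parts
stock_bw_gbs   = {"Vega 56": 410, "Vega 64": 484}   # GB/s at stock HBM2 clocks
cu_count       = {"Vega 56": 56,  "Vega 64": 64}

for name in cu_count:
    tflops    = cu_count[name] * 64 * 2 * core_clock_ghz / 1000  # 64 lanes x 2 ops per CU
    bw_per_cu = stock_bw_gbs[name] / cu_count[name]
    print(f"{name}: {tflops:.2f} TF theoretical, {bw_per_cu:.1f} GB/s per CU")

# Vega 56: 10.75 TF, ~7.3 GB/s per CU
# Vega 64: 12.29 TF, ~7.6 GB/s per CU -- and if the memory clocks are equalised too,
# Vega 64's per-CU share drops below Vega 56's, which is the balancing act in action.
```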
 
Last edited:
It's a balancing act as each CU requires a certain amount of cache, bandwidth and other things to run optimally, and it's a balancing act that Microsoft got wrong in my opinion.

So, then, is it also your opinion that Sony got the "balancing act" wrong in this particular case?


Series X pretty much murders PS5 in this game. So did Sony screw up their hardware design?

Or is it possible that games differ, engines differ, and development priorities differ?
 