Digital Foundry Article Technical Discussion [2021]

Hmm it really isn't?

It really is. Microsoft's entire technical stack for Windows, and now Xbox, is about abstracting software from hardware so that when you run older software on new hardware you leverage the capabilities of new hardware.

I completely understand why this doesn't happen on PlayStation 5 because Sony has long seemingly encouraged devs to "hit the metal" but I entirely expected older Xbox games to benefit more from Series S|X hardware because of the virtualisation. But as I said above, I think this is a work-in-progress and that this will change going forward.

But I wonder if there are some innate incompatibilities between GCN and RDNA that are just too cumbersome to overcome. Hopefully DF will be able to share more in future.
 
It really is. Microsoft's entire technical stack for Windows, and now Xbox, is about abstracting software from hardware so that when you run older software on new hardware you leverage the capabilities of new hardware.

I completely understand why this doesn't happen on PlayStation 5 because Sony has long seemingly encouraged devs to "hit the metal" but I entirely expected older Xbox games to benefit more from Series S|X hardware because of the virtualisation. But as I said above, I think this is a work-in-progress and that this will change going forward.

But I wonder if there are some innate incompatibilities between GCN and RDNA that are just too cumbersome to overcome. Hopefully DF will be able to share more in future.

I think it’s better to base our analysis on technical facts rather than stereotypical assumptions about a corporation. It’s very practical that API calls, which create GPU commands (NOT CU instructions) and objects, get virtualized and abstracted. But since current RDNA GPUs can execute GCN code natively (much like how x64 CPUs can run x86 code natively), naturally they just let it run. They might totally have the technical prowess to create a JIT that translates GCN binary to RDNA binary at PSO creation time, but is it really worth the effort for that 25% improvement right now?
 
I think it’s better to base our analysis on technical facts rather than stereotypical assumptions about a corporation.
I don't know what you're referring to with this. It's a 'technical fact' that devs developing for PS4 have direct access to GPU command buffers and this is how you extract performance versus Microsoft's higher-level APIs. 4A Games clarified this in their interview with Digital Foundry in 2014.

It’s very practical that API calls, which create GPU commands (NOT CU instructions) and objects, get virtualized and abstracted. But since current RDNA GPUs can execute GCN code natively (much like how x64 CPUs can run x86 code natively), naturally they just let it run. They might totally have the technical prowess to create a JIT that translates GCN binary to RDNA binary at PSO creation time, but is it really worth the effort for that 25% improvement right now?

Why do Microsoft need to translate GCN to RDNA? Games are calling APIs; I would expect an API running on a Series S|X to call the native hardware functions, not the hardware functions of last-generation hardware. What's the logic here? Perhaps this is some limitation of the virtualisation layer that Xbox games exist in. If so, this seems a bit of an own goal. Going to such lengths to make games run on future hardware but not benefiting from architectural improvements? If it's just getting the higher clocks, that's the same approach as PS5 ¯\_(ツ)_/¯
 
I think it’s better to base our analysis on technical facts rather than stereotypical assumptions about a corporation.

Unfortunately, that's unavoidable these days on (technical) forums. Everyone has their own, well, thoughts and opinions, and that's what you're seeing here.
 
I don't know what you're referring to with this. It's a 'technical fact' that devs developing for PS4 have direct access to GPU command buffers and this is how you extract performance versus Microsoft's higher-level APIs. 4A Games clarified this in their interview with Digital Foundry in 2014.



Why do Microsoft need to translate GCN to RDNA? Games are calling APIs; I would expect an API running on a Series S|X to call the native hardware functions, not the hardware functions of last-generation hardware. What's the logic here? Perhaps this is some limitation of the virtualisation layer that Xbox games exist in. If so, this seems a bit of an own goal. Going to such lengths to make games run on future hardware but not benefiting from architectural improvements? If it's just getting the higher clocks, that's the same approach as PS5 ¯\_(ツ)_/¯
It is BC like this. Just through the API layer. The hardware is compatible enough. But code that was optimized for GCN might not work that well on RDNA (with 2x the expected performance), so getting double the "performance" (xox -> xsx) might be harder to achieve than some may think.
I still think that many underestimated what GCN can do. It is just hard to get the performance on the road. Now with RDNA it is much easier to get the performance on the road, but therefore it has less "unused" potential for the future. E.g. you don't really need excessive async compute with RDNA to get almost everything out of the chip. With GCN this was the way to go, because otherwise there was always a big overhead of "power" left. With RDNA this automatically leads to less optimization being needed, and I really expect developers will also optimize much less for the architecture, because less optimization is needed and the potential gains are much smaller than with GCN before.

So back now to the GCN-optimized code on RDNA. As async compute is no longer needed as much as it was before, this may lead to a small overhead and might reduce some of the IPC gains RDNA has over GCN. So double theoretical performance might just not translate to double performance.
And then there is the memory bandwidth ....
 
Completely OT, but "fat APIs" would be an awesome name for a punk group
I would have gone techno-industrial. Then you could have a scsi beats.


This is a little surprising. But it seems clear that Microsoft's efforts on improving performance in previous generation games is a work in progress.
I never understood this "You get 12TF but no IPC improvements" statement. If TF are calculated by instructions per clock multiplied by clock, then how can you get 12TF on a 12TF machine if you aren't getting the same instructions per clock.
 
I would have gone techno-industrial. Then you could have a scsi beats.


I never understood this "You get 12TF but no IPC improvements" statement. If TF are calculated by instructions per clock multiplied by clock, then how can you get 12TF on a 12TF machine if you aren't getting the same instructions per clock.
I guess one thing with GCN BC is that there will be differences in wavefront management vs. RDNA 2, which, if the code were native, could definitely lead to more efficient utilisation of that 12 TF.
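To put rough numbers on the TF side of it: the headline figure is just ALU width × 2 FLOPs per FMA × clock, so it says nothing about how well old, wavefront-shaped GCN code actually fills those ALUs. A quick C++ sketch using the publicly quoted Series X figures (treat the constants as spec-sheet assumptions on my part, not measurements):

#include <cstdio>

int main() {
    // Publicly quoted Xbox Series X GPU figures (spec-sheet assumptions).
    const double computeUnits = 52.0;     // active CUs
    const double lanesPerCu   = 64.0;     // FP32 ALUs per CU
    const double flopsPerFma  = 2.0;      // a fused multiply-add counts as 2 FLOPs
    const double clockHz      = 1.825e9;  // sustained GPU clock

    const double tflops = computeUnits * lanesPerCu * flopsPerFma * clockHz / 1e12;
    std::printf("Peak FP32 throughput: %.2f TF\n", tflops);  // prints ~12.15
    return 0;
}

That ~12.15 TF peak is fixed by the silicon and the clock; utilisation is where the architectural/IPC question actually lives.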
 
It is BC like this. Just through the API layer. The hardware is compatible enough. But code that was optimized for GCN might not work that well on RDNA (with 2x the expected performance), so getting double the "performance" (xox -> xsx) might be harder to achieve than some may think.

The whole point of APIs and virtualisation is that the hardware doesn't need to be compatible. PS4's GNM API was really damn thin; in some cases it's not an API as you'd think of one in a traditional operating-system sense, because you're writing to the GPU command buffer directly, not asking the OS to do X, Y or Z.

I never understood this "You get 12TF but no IPC improvements" statement. If TF are calculated by instructions per clock multiplied by clock, then how can you get 12TF on a 12TF machine if you aren't getting the same instructions per clock.

You and me both. I don't understand, with their virtualised game environment and APIs, why last-gen game code isn't taking advantage of the architectural improvements of RDNA versus GCN. But the "12TF but no IPC improvements" thing is perplexing. How does it make sense? I don't get it.
 
The whole point of APIs and virtualisation is that the hardware doesn't need to be compatible. PS4's GNM API was really damn thin; in some cases it's not an API as you'd think of one in a traditional operating-system sense, because you're writing to the GPU command buffer directly, not asking the OS to do X, Y or Z.



You and me both. I don't understand, with their virtualised game environment and APIs, why last-gen game code isn't taking advantage of the architectural improvements of RDNA versus GCN. But the "12TF but no IPC improvements" thing is perplexing. How does it make sense? I don't get it.

Maybe it's a matter of lack of quality assurance time needed to guarantee the consumers will not encounter any issues whatsoever, so they took a safer approach?

Obligatory vehicle analogy with totally made-up instruction mix ratios. Both vehicles are identical in their total volume capacity and the speed at which they can travel from Point A to Point B. What's critical is how the capacity can be scheduled for use. The passenger-to-luggage mix will determine how much can be moved from Point A to Point B.

Vehicle A) A GCN vehicle with 7 capacity slots per row, where only 4 of the slots can be used by passengers, with the remaining 3 slots used for luggage.
Vehicle B) An RDNA2 vehicle with 7 capacity slots per row, where all slots can be used by passengers or luggage (this is the IPC increase?).
 
I would have gone techno-industrial. Then you could have a scsi beats.


I never understood this "You get 12TF but no IPC improvements" statement. If TF are calculated by instructions per clock multiplied by clock, then how can you get 12TF on a 12TF machine if you aren't getting the same instructions per clock.
I'm not entirely sure how much of the virtualization is having an effect here. Each game ships linked to the XDK it was built with, so as the XDK improves those improvements don't go backwards and break older titles. I don't know how much of this is an issue of complexity for just running every single type of XDK under a single banner, even X1X doesn't do this. So it's hard to say.

Games that don't run the X1X code path ultimately have to emulate ESRAM there. Some form of emulation is required eventually, so I suspect the way that MS packages titles and games largely has to do with the fact that they can't run things natively. A good example is 2013 titles, which use DX11 fast semantics, a specific Xbox One API variant that lasted only several months before the overhaul to DX12. Emulation is going to be required for those APIs, I believe.

Drivers may also be significantly more optimized for the hardware when it comes to interpretation of the APIs, which is probably the critical translation point where emulation is required.

So while I do agree that APIs should be universal as they are on PC, the reality is that the drivers are the ones that bring performance with respect to API calls. On consoles, drivers get extremely specific, only having to deal with 1 or 2 hardware profiles, so this may be an impediment to directly running the same APIs on newer hardware, i.e. the way the driver performs with respect to API calls may cause stability issues, and the most important aspect for BC is to keep legacy performance as it was coded.
 
Maybe it's a matter of lack of quality assurance time needed to guarantee the consumers will not encounter any issues whatsoever, so they took a safer approach?

Yeah, it could well be it was quicker/safer to port whatever Xbox-equivalent of the Xbox One GPU driver (Mono?) to Series S|X, and a native modern GPU driver is the work-in-progress. At some point you just expect that any old game that called a GCN function will call a better-performing RDNA function that is compatible. And by better-performing I mean other than just higher clocks. There are piles of architectural improvements between GCN1.1 and RDNA that should lead to even better performance than just clock-boosting offers.

I'm not entirely sure how much of the virtualization is having an effect here. Each game ships linked to the XDK it was built with, so as the XDK improves those improvements don't go backwards and break older titles. I don't know how much of this is an issue of complexity for just running every single type of XDK under a single banner, even X1X doesn't do this.

Better IPC is often a nascent improvement generally leveraged through better architecture. I wouldn't expect all games to suddenly take advantage of more CUs although some might, depending on how they're managing compute jobs and how many are now accessible/visible to old code running on new hardware. But internal architectural improvements of the CPU and GPU? ¯\_(ツ)_/¯
 
Yeah, it could well be it was quicker/safer to port whatever Xbox-equivalent of the Xbox One GPU driver (Mono?) to Series S|X, and a native modern GPU driver is the work-in-progress. At some point you just expect that any old game that called a GCN function will call a better-performing RDNA function that is compatible. And by better-performing I mean other than just higher clocks. There are piles of architectural improvements between GCN1.1 and RDNA that should lead to even better performance than just clock-boosting offers.



Better IPC is often a nascent improvement generally leveraged through better architecture. I wouldn't expect all games to suddenly take advantage of more CUs although some might, depending on how they're managing compute jobs and how many are now accessible/visible to old code running on new hardware. But internal architectural improvements of the CPU and GPU? ¯\_(ツ)_/¯
Certainly an interesting discussion with no real answer I can think of. I wonder if @Dictator could bring this up next time they have a chat with Ronald/the Xbox team, to find out why these BC games can't just all run natively considering how the APIs should work. I'm not expecting them to answer, but I'd be curious to see if they'd be interested in trying to clarify it.
 
You and me both. I don't understand, with their virtualised game environment and APIs, why last-gen game code isn't taking advantage of the architectural improvements of RDNA versus GCN. But the "12TF but no IPC improvements" thing is perplexing. How does it make sense? I don't get it.
It seems your confusion stems from a lack of understanding of GPUs, and particularly RDNA GPUs, so let me explain:

Using the GPU is no longer like the early days of 3D graphics APIs where it was all API calls; these days you write little programs, called shaders/kernels, that run on the GPU CUs/SMs.

RDNA GPUs have two execution modes, wave32 and wave64, where the latter imitates GCN semantics and the former is RDNA-native. In wave32 mode the SIMD lane count in a thread (wavefront, or in Nvidia terms, warp) is only 32 as opposed to GCN’s 64, and that’s the most critical part of how RDNA achieves its 25% IPC improvement. The two modes are not binary compatible; in fact the RDNA 2 ISA manual states this very thing:
https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf#page15
Both wave sizes are supported for all operations, but shader programs must be compiled for a particular wave size.
Again this is similar to (but not the same as) ARMv8 CPUs having both 64-bit mode and 32-bit mode or x64 CPUs having both x64 mode and x86 mode. Binary incompatibility is the key here.

You can always learn more about RDNA’s architectural improvements by reading the RDNA Whitepaper: https://www.amd.com/system/files/documents/rdna-whitepaper.pdf
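If it helps make the wave size point concrete, here is a toy model in C++ (my own simplification of the whitepaper’s description, not real scheduling behaviour): GCN executes a 64-lane wavefront on a 16-lane SIMD, so a vector instruction issues over four clocks, while RDNA executes a 32-lane wavefront on a 32-lane SIMD in a single clock.

#include <cstdio>

// Toy model only: the lane width a shader binary was compiled for versus the
// physical SIMD width it runs on. This is where the headline issue-rate
// difference between GCN (wave64 on SIMD16) and RDNA (wave32 on SIMD32) comes from.
struct SimdModel {
    const char* name;
    int waveSize;   // lanes per wavefront baked into the compiled shader
    int simdWidth;  // physical ALU lanes per SIMD unit
};

int clocksPerVectorInstruction(const SimdModel& m) {
    return m.waveSize / m.simdWidth;
}

int main() {
    const SimdModel gcn  {"GCN  (wave64 on SIMD16)", 64, 16};
    const SimdModel rdna {"RDNA (wave32 on SIMD32)", 32, 32};

    std::printf("%s: %d clock(s) per vector instruction\n",
                gcn.name, clocksPerVectorInstruction(gcn));   // 4
    std::printf("%s: %d clock(s) per vector instruction\n",
                rdna.name, clocksPerVectorInstruction(rdna)); // 1
    return 0;
}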
 
To add to my previous post:

GCN code running on RDNA GPU may receive some IPC improvement due to the miscellaneous architectural improvements such as the redesigned cache hierarchy, but not all of the fat 25%. Therefore it's entirely accurate for DF to state "you *aren't* seeing the architectural improvements of RDNA 2".

Why have you never heard of such a thing on PC? That's because on PC games are pretty much always shipped with shader IR (Intermediate Representation, e.g. DXIL, which is in fact LLVM bitcode) or even shader source, not binary. At load time, the game calls the user mode GPU driver through the 3D API to compile the shader IR into GPU-native instructions and possibly upload it to the GPU. For example on D3D12 and Metal, this is done by creating a Pipeline State Object (ID3D12PipelineState / MTLRenderPipelineState). To compile from source, you'll call some other D3D12 utility function(s). This is obviously necessary as on PC you have to deal with all kinds of GPUs; one example is Nvidia Pascal vs. Nvidia Volta: the latter may be perceived by the user as a Zen 1 to Zen 2 kind of architectural improvement, but actually the ISAs are totally different beasts.
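To illustrate that PSO step on D3D12, here is a heavily trimmed C++ sketch (createPso, vsDxil and psDxil are just illustrative names of mine; error handling and most pipeline state are omitted). The shipped DXIL blobs go straight into the pipeline description, and CreateGraphicsPipelineState is where the user mode driver lowers the IR to the installed GPU's native ISA:

#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Sketch: build a graphics PSO from shader IR (DXIL) blobs loaded from disk.
ComPtr<ID3D12PipelineState> createPso(ID3D12Device* device,
                                      ID3D12RootSignature* rootSig,
                                      const void* vsDxil, size_t vsSize,
                                      const void* psDxil, size_t psSize)
{
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = {};
    desc.pRootSignature = rootSig;
    desc.VS = { vsDxil, vsSize };   // vertex shader IR, not GPU-native code
    desc.PS = { psDxil, psSize };   // pixel shader IR
    desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
    desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;
    desc.BlendState.RenderTarget[0].RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;
    desc.SampleMask = 0xFFFFFFFFu;
    desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets = 1;
    desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    // Depth/stencil, input layout etc. trimmed for brevity.

    // The driver compiles the DXIL into its native ISA here.
    ComPtr<ID3D12PipelineState> pso;
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
    return pso;
}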

When Microsoft refers to its approach as virtualization, it's more of a Hyper-V / KVM / ESXi kind of virtualization, rather than QEMU kind of virtualization. FreeBSD's jail provides a Docker kind of virtualization and is used on the PS4.
 
Obligatory vehicle analogy with totally made-up instruction mix ratios. Both vehicles are identical in their total volume capacity and the speed at which they can travel from Point A to Point B. What's critical is how the capacity can be scheduled for use. The passenger-to-luggage mix will determine how much can be moved from Point A to Point B.

Vehicle A) A GCN vehicle with 7 capacity slots per row, where only 4 of the slots can be used by passengers, with the remaining 3 slots used for luggage.
Vehicle B) An RDNA2 vehicle with 7 capacity slots per row, where all slots can be used by passengers or luggage (this is the IPC increase?).
Hmm... I guess. Not sure I would count that as an IPC increase if we are talking purely theoretical performance, which would be implied if you are using the 12TF number anyway. But yeah, an increase in achieved IPC would clearly be possible.
 
GCN code running on RDNA GPU may receive some IPC improvement due to the miscellaneous architectural improvements such as the redesigned cache hierarchy, but not all of the fat 25%. Therefore it's entirely accurate for DF to state "you *aren't* seeing the architectural improvements of RDNA 2".
Thank you for both posts - really interesting stuff. I guess I'm showing my GPU programming age!
 
Thank you for both posts - really interesting stuff. I guess I'm showing my GPU programming age!
Ah I see. These days it's really more about the shader/kernel. The term "shader" is usually used in a graphics context and "kernel" in a compute context. If you code in Nvidia's CUDA, it's basically all about writing kernels; the compiler and runtime manage so much for you.
 
It does just sound more like an issue of shaders being compiled and shipped with the games, as opposed to PC where shaders are compiled at run time.

And so the reason for BC is that there’s no way to recompile shaders at run time, so you’re forced to use the shaders compiled for an older architecture and therefore forced to run in BC modes.
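For contrast, on PC the equivalent step can even happen at run time straight from HLSL source via the stock D3DCompile API. A rough C++ sketch (compilePixelShader is just an illustrative wrapper of mine); a console title instead ships the final GPU binary, so there is nothing left to retarget:

#include <cstdio>
#include <cstring>
#include <d3dcompiler.h>
#include <wrl/client.h>
#pragma comment(lib, "d3dcompiler.lib")

using Microsoft::WRL::ComPtr;

// Compile HLSL pixel-shader source to bytecode at run time; the resulting
// blob is what would feed pipeline state creation on whatever GPU is installed.
ComPtr<ID3DBlob> compilePixelShader(const char* hlslSource)
{
    ComPtr<ID3DBlob> bytecode;
    ComPtr<ID3DBlob> errors;
    HRESULT hr = D3DCompile(hlslSource, std::strlen(hlslSource),
                            nullptr,           // source name (diagnostics only)
                            nullptr, nullptr,  // no #defines, no include handler
                            "main", "ps_5_1",  // entry point, target profile
                            0, 0,              // compile flags
                            &bytecode, &errors);
    if (FAILED(hr) && errors)
        std::fprintf(stderr, "%s\n",
                     static_cast<const char*>(errors->GetBufferPointer()));
    return bytecode;
}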
 

Amid Evil Developer Stream - RTX Upgrades Tested + Discussed!

Looks amazing, nice stream/analysis.

Super cool game that I’d love to play, but I didn’t find the RT implementation super impressive. Strange; maybe it’s because of the otherwise simplistic old-school graphics? Could also be the quality of the stream.
 