Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

j^aws · Nov 29, 2020

Shortbread said:
CU count x Ops per second x ROPs x Core Frequency = total floating points operations

XBSX: 52*2*64*1.825= 12.15TF (rounded)
PS5: 36*2*64*2.23= 10.28TF (rounded)

The final numbers are correct, however, the formula is incorrect.

ROPs aren't used to calculate CU floating point capabilities as they are separate fixed function execution units.

The 2x64 component comes from:

- 64 as the number of shader cores per CU
- 2 as the two operations counted per Fused Multiply-Add instruction (a multiply and an add), aka FMAC, FMA, FMADD
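As a quick sanity check, the corrected formula can be sketched as CU count x shader cores per CU x 2 ops per FMA x clock. The function name and its defaults are my own illustration; the CU counts and clocks are the ones quoted above.

```python
def peak_fp32_tflops(compute_units, clock_ghz, lanes_per_cu=64, ops_per_fma=2):
    """Peak FP32 throughput in TFLOPS.

    lanes_per_cu: shader cores (FP32 lanes) per CU, 64 as described above.
    ops_per_fma:  a fused multiply-add counts as 2 floating point operations.
    ROPs do not appear anywhere in the formula.
    """
    gflops = compute_units * lanes_per_cu * ops_per_fma * clock_ghz
    return gflops / 1000.0


xsx = peak_fp32_tflops(52, 1.825)
ps5 = peak_fp32_tflops(36, 2.23)
print(f"XBSX: {xsx:.2f} TF")
print(f"PS5:  {ps5:.2f} TF")
```

The 64 here only happens to equal the ROP count, which is why the original formula still landed on the right totals.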

Shompola said:
So the narrative now is that XSX uses Zen 2 without modifications beyond L3 reduced to 8MB, and its GPU is an RDNA1 derivative (i.e. cache and CU setup), while PS5 uses a Zen 3-like L3 cache and its GPU is an RDNA2 derivative (i.e. cache and CU setup)?

fehu said:
No, it's wrong, it's the rdna3 that's a ps5 derivate.

We are discussing driver leaks and patents. Can you be more technical and specific?

Shompola · Nov 29, 2020

J^aws, I understand. But when you discuss these driver and patent info it sounds like the narrative now is what I summarized above. You are not the only one who believes this is the case and could explain the perf advantage ps5 currently has in multiplat games. Do you agree?

j^aws · Nov 29, 2020

Shompola said:
J^aws, I understand. But when you discuss these driver and patent info it sounds like the narrative now is what I summarized above. You are not the only one who believes this is the case and could explain the perf advantage ps5 currently has in multiplat games. Do you agree?

No, I don't agree. This thread has in its title "speculation".

I haven't seen any discussion around the driver leak and relevant patents (mostly because they are technical). But when Github revealed a simplified metric - the infamous Tera Flop numbers, everyone on the Internet ran with it.

We don't have a block diagram for PS5 yet, but we do for XSX and Navi21. There are plenty of details not confirmed for PS5.

PS5 has 22% faster GPU clocks for its fixed-function units, so there are other explanations. Also Amdahl's Law kicks in for Asynchronous Compute and fewer cores.
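A rough sketch of both points, using the clocks and CU counts quoted earlier in the thread. The parallel fractions are purely illustrative assumptions, not measurements from any game, and treating CUs as the "processors" in Amdahl's Law is a simplification.

```python
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Amdahl's Law: speedup over one unit when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


XSX_CUS, PS5_CUS = 52, 36
XSX_GHZ, PS5_GHZ = 1.825, 2.23

# Clock advantage applies to everything on the GPU clock, fixed-function units included.
clock_advantage = PS5_GHZ / XSX_GHZ - 1.0
print(f"PS5 GPU clock advantage: {clock_advantage:.1%}")

# How much of the 52 vs 36 CU gap survives for different (assumed) parallel fractions.
for p in (0.50, 0.80, 0.95, 1.00):
    cu_gain = amdahl_speedup(p, XSX_CUS) / amdahl_speedup(p, PS5_CUS)
    print(f"parallel fraction {p:.2f}: 52 CUs vs 36 CUs = {cu_gain:.3f}x")
```

Only a perfectly parallel workload sees the full 52/36 ratio; any serial portion shrinks the wider GPU's lead, while the clock advantage applies regardless.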

Shompola · Nov 29, 2020

J^aws, Thanks for taking your time and replying. Been absent from gaming sphere for almost 15 years and good to see people taking their time to explain things.

iroboto · Nov 29, 2020

j^aws said:
No, I don't agree. This thread has in its title "speculation".

I haven't seen any discussion around the driver leak and relevant patents (mostly because they are technical). But when Github revealed a simplified metric - the infamous Tera Flop numbers, everyone on the Internet ran with it.

We don't have a block diagram for PS5 yet, but we do for XSX and Navi21. There are plenty of details not confirmed for PS5.

PS5 has 22% faster GPU clocks for its fixed-function units, so there are other explanations. Also Amdahl's Law kicks in for Asynchronous Compute and fewer cores.

Post the driver leak. This may have just flown below the radar. If it can't be explained easily, most people will tune out. The Github leaks moved forward because DF was able to explain the whole story end to end and verified the information with those who obtained it. That made it an easier leak to follow. I was largely ignoring the Github leaks until then because you needed to follow all sorts of codenames and I just wasn't going to bother.

Shortbread · Nov 29, 2020

j^aws said:
The final numbers are correct, however, the formula is incorrect.

ROPs aren't used to calculate CU floating point capabilities as they are separate fixed function execution units.

The 2x64 component comes from:

- 64 as the number of shader cores per CU
- 2 as the two operations counted per Fused Multiply-Add instruction (a multiply and an add), aka FMAC, FMA, FMADD

We are discussing driver leaks and patents. Can you be more technical and specific?

Funny, that's how I always calculated TF, going by prior AMD/Nvidia web-based discussions/docs. But hey, you learn something new every day.

j^aws · Nov 29, 2020

iroboto said:
Post the driver leak. This may have just flown below the radar. If it can't be explained easily, most people will tune out. The Github leaks moved forward because DF was able to explain the whole story end to end and verified the information with those who obtained it. That made it an easier leak to follow. I was largely ignoring the Github leaks until then because you needed to follow all sorts of codenames and I just wasn't going to bother.

I posted details a few pages back.

Here's my post, where I started discussing:
https://forum.beyond3d.com/posts/2178977/

Poster, @Digidi summarised the driver leaks here:
https://forum.beyond3d.com/posts/2176653/

Poster, @tinokun made a nice table below:

Code:

                Property Navi10 Navi14 Navi12 Navi21Lite Navi21 Navi22 Navi23 Navi31
                  num_se      2      1      2          2      4      2      2      4
           num_cu_per_sh     10     12     10         14     10     10      8     10
           num_sh_per_se      2      2      2          2      2      2      2      2
           num_rb_per_se      8      8      8          4      4      4      4      4
                num_tccs     16      8     16         20     16     12      8     16
                num_gprs   1024   1024   1024       1024   1024   1024   1024   1024
         num_max_gs_thds     32     32     32         32     32     32     32     32
          gs_table_depth     32     32     32         32     32     32     32     32
       gsprim_buff_depth   1792   1792   1792       1792   1792   1792   1792   1792
   parameter_cache_depth   1024    512   1024       1024   1024   1024   1024   1024
double_offchip_lds_buffer     1      1      1          1      1      1      1      1
               wave_size     32     32     32         32     32     32     32     32
      max_waves_per_simd     20     20     20         20     16     16     16     16
max_scratch_slots_per_cu     32     32     32         32     32     32     32     32
                lds_size     64     64     64         64     64     64     64     64
           num_sc_per_sh      1      1      1          1      1      1      1      1
       num_packer_per_sc      2      2      2          2      4      4      4      4
                num_gl2a    N/A    N/A    N/A          4      4      2      2      4
                unknown0    N/A    N/A    N/A        N/A     10     10      8     10
                unknown1    N/A    N/A    N/A        N/A     16     12      8     16
                unknown2    N/A    N/A    N/A        N/A     80     40     32     80
      num_cus (computed)     40     24     40         56     80     40     32     80
                Property Navi10 Navi14 Navi12 Navi21Lite Navi21 Navi22 Navi23 Navi31

There was a Tweet by a famous leaker, Yuko Yoshida (@KityYYuko):

Code:

XSX
Front-End: RDNA 1
Render-Back-Ends: RDNA 2
Compute Units: RDNA1
RT: RDNA2

Navi21 Lite is considered to be XSX, and the driver compares it with RDNA1 and RDNA2 GPUs. The XSX front-end matches RDNA1 in its rasterisation setup, i.e. Scan Converters and Packers per Scan Converter (num_packer_per_sc is 2, versus 4 on Navi2x), and its SIMD wave limit (CUs) is also RDNA1 (max_waves_per_simd is 20, versus 16 on Navi2x). Navi21 Lite (XSX) has the same number of Render Backends per Shader Engine as RDNA2 (num_rb_per_se is 4).
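For anyone checking the table, the "num_cus (computed)" row appears to be num_se x num_sh_per_se x num_cu_per_sh. A minimal sketch that reproduces it from the three rows above (the dictionary layout and function name are mine, the values are copied from the table):

```python
# (num_se, num_sh_per_se, num_cu_per_sh) per GPU, copied from the table above.
leak = {
    "Navi10":     (2, 2, 10),
    "Navi14":     (1, 2, 12),
    "Navi12":     (2, 2, 10),
    "Navi21Lite": (2, 2, 14),
    "Navi21":     (4, 2, 10),
    "Navi22":     (2, 2, 10),
    "Navi23":     (2, 2, 8),
    "Navi31":     (4, 2, 10),
}


def total_cus(num_se, num_sh_per_se, num_cu_per_sh):
    """Shader engines x shader arrays per engine x CUs per shader array."""
    return num_se * num_sh_per_se * num_cu_per_sh


computed = {name: total_cus(*cfg) for name, cfg in leak.items()}
for name, cus in computed.items():
    print(f"{name:>10}: {cus} CUs")
```

Note the 56 for Navi21 Lite is the physical CU count; XSX ships with 52 of them enabled.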

Shortbread · Nov 29, 2020

j^aws said:

I posted details a few pages back.

Here's my post, where I started discussing:
https://forum.beyond3d.com/posts/2178977/

Poster, @Digidi summarised the driver leaks here:
https://forum.beyond3d.com/posts/2176653/

Poster, @tinokun made a nice table below:

Code:

                Property Navi10 Navi14 Navi12 Navi21Lite Navi21 Navi22 Navi23 Navi31
                  num_se      2      1      2          2      4      2      2      4
           num_cu_per_sh     10     12     10         14     10     10      8     10
           num_sh_per_se      2      2      2          2      2      2      2      2
           num_rb_per_se      8      8      8          4      4      4      4      4
                num_tccs     16      8     16         20     16     12      8     16
                num_gprs   1024   1024   1024       1024   1024   1024   1024   1024
         num_max_gs_thds     32     32     32         32     32     32     32     32
          gs_table_depth     32     32     32         32     32     32     32     32
       gsprim_buff_depth   1792   1792   1792       1792   1792   1792   1792   1792
   parameter_cache_depth   1024    512   1024       1024   1024   1024   1024   1024
double_offchip_lds_buffer     1      1      1          1      1      1      1      1
               wave_size     32     32     32         32     32     32     32     32
      max_waves_per_simd     20     20     20         20     16     16     16     16
max_scratch_slots_per_cu     32     32     32         32     32     32     32     32
                lds_size     64     64     64         64     64     64     64     64
           num_sc_per_sh      1      1      1          1      1      1      1      1
       num_packer_per_sc      2      2      2          2      4      4      4      4
                num_gl2a    N/A    N/A    N/A          4      4      2      2      4
                unknown0    N/A    N/A    N/A        N/A     10     10      8     10
                unknown1    N/A    N/A    N/A        N/A     16     12      8     16
                unknown2    N/A    N/A    N/A        N/A     80     40     32     80
      num_cus (computed)     40     24     40         56     80     40     32     80
                Property Navi10 Navi14 Navi12 Navi21Lite Navi21 Navi22 Navi23 Navi31

There was a Tweet by a famous leaker, Yuko Yoshida (@KityYYuko):

Code:

XSX
Front-End: RDNA 1
Render-Back-Ends: RDNA 2
Compute Units: RDNA1
RT: RDNA2

Navi21 Lite is considered to be XSX, and the driver compares it with RDNA1 and RDNA2 GPUs. The XSX front-end matches RDNA1 in its rasterisation setup, i.e. Scan Converters and Packers per Scan Converter (num_packer_per_sc is 2, versus 4 on Navi2x), and its SIMD wave limit (CUs) is also RDNA1 (max_waves_per_simd is 20, versus 16 on Navi2x). Navi21 Lite (XSX) has the same number of Render Backends per Shader Engine as RDNA2 (num_rb_per_se is 4).

I saw that tweet trending again.

https://twitter.com/x/status/1317054744607617025

t0mb3rt · Nov 29, 2020

How can the Xbox use RDNA 1 CUs when the ray tracing hardware is directly tied to the CUs and was non-existent in RDNA 1? Why would Microsoft pay to create an entirely new hardware block (RDNA 1 CU with ray tracing) when RDNA 2 CUs with ray tracing hardware built in had already been designed? Use some common sense.

j^aws · Nov 29, 2020

t0mb3rt said:
How can the Xbox use RDNA 1 CUs when the ray tracing hardware is directly tied to the CUs and was non-existent in RDNA 1? Why would Microsoft pay to create an entirely new hardware block (RDNA 1 CU with ray tracing) when RDNA 2 CUs with ray tracing hardware built in had already been designed? Use some common sense.

Well, CUs do SIMD and Scalar computation and are fully programmable. The Ray Accelerator is a fixed-function block that acts as the intersection engine and works alongside the TMUs (addressing and filtering); the RA and TMUs can't operate concurrently. So it acts as a separate block from the SIMD and Scalar blocks of the CU, which are responsible for BVH traversal in shaders and can operate independently of the TMUs/RA.
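To make the split concrete, here is a toy sketch, not AMD's implementation. The traversal loop stands in for the shader-side work on the CU, and the box test stands in for the fixed-function intersection step. All names (Node, ray_box_hit, traverse) are hypothetical, and leaves only report candidate primitives rather than doing a triangle test.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    box_min: Vec3
    box_max: Vec3
    children: Tuple[int, ...] = ()   # indices into the node list; empty for a leaf
    prim_id: Optional[int] = None    # payload carried by leaf nodes


def ray_box_hit(origin: Vec3, direction: Vec3, box_min: Vec3, box_max: Vec3) -> bool:
    """Slab test: the kind of work the fixed-function intersection block does."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(nodes: List[Node], origin: Vec3, direction: Vec3) -> List[int]:
    """Stack-based BVH walk: the programmable part that runs as shader code."""
    stack, hits = [0], []
    while stack:
        node = nodes[stack.pop()]
        if not ray_box_hit(origin, direction, node.box_min, node.box_max):
            continue  # prune this subtree
        if node.children:
            stack.extend(node.children)
        else:
            hits.append(node.prim_id)
    return hits


# Tiny scene: a root box with two leaf boxes inside it.
scene = [
    Node((0, 0, 0), (4, 1, 1), children=(1, 2)),
    Node((0, 0, 0), (1, 1, 1), prim_id=0),
    Node((3, 0, 0), (4, 1, 1), prim_id=1),
]
print(sorted(traverse(scene, (-1.0, 0.5, 0.5), (1.0, 0.0, 0.0))))
```

The point is only that the loop, the stack and the pruning decisions are ordinary programmable code, while the box test is the narrow, repetitive piece that makes sense to put in fixed-function hardware.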

flutter · Nov 29, 2020

t0mb3rt said:
How can the Xbox use RDNA 1 CUs when the ray tracing hardware is directly tied to the CUs and was non-existent in RDNA 1? Why would Microsoft pay to create an entirely new hardware block (RDNA 1 CU with ray tracing) when RDNA 2 CUs with ray tracing hardware built in had already been designed? Use some common sense.

The non-technical answer is that the GPU is custom to meet MS' cost and needs. Expecting something similar with PS5.

The issue is that console warring has gotten to the point where this is seen as bad and as a point to lose in internet arguments. Who's to say it is? Both are made to hit a $499 price point with a certain performance target.

Deleted member 7537 · Nov 29, 2020

Another RGT video.

I think these are the most interesting bits:

Sony doesn't want to talk too much about tech after the Road to PS5 talk backlash.
About the CPU
- PS5's CPU L3 cache is unified but it's only 8MB (confirmed by 2 sources according to him). ~~That's half what Zen2 has~~ -> Zen2 has 16MB per CCX, so it's actually 1/4 according to him. Edit: that's the desktop version; the mobile CPU does indeed have 8MB of L3, 4MB per CCX, which I believe also came up during the Hot Chips XB presentation.
- 3.5GHz is with SMT enabled; there is no option to disable multi-threading.
- One core is dedicated to operating system
The DDR4 chip is for SSD caching and OS tasks, developers will completely ignore this.
About the GPU
- RDNA2 based.
- Does a good job staying at peak frequencies, around 95% of the time, even when the CPU is peaking as well.
- PS5's compute unit architecture is pretty much the same as that in the desktop implementation of RDNA 2 and the Series X.
- Sampler Feedback is missing; he says this is what the Italian Sony engineer meant when he said "It's based on RDNA2, but it has more features and I think one less".
  - He says there are other tools/methods with similar results but harder to implement.
- About the Geometry Engine.
  - Manages geometry and shading.
  - The primitive shaders are not the same as RDNA1's.
  - They can use Mesh Shaders.
  - The GE allows for a lot of optimization like culling very early in the pipeline.
  - It's critical to achieve performance, VRS runs with "extreme precision" on GE.
  - Double-edged sword: it can be transparent for developers, but if engines are not customized to exploit its capabilities a lot of performance is left on the table.
  - Supposedly, UE5 (nanite) makes good use of the GE.
Cache scrubbers on the GPU automatically boot out invalid or old instructions, freeing up cache space very efficiently and improving performance.
The Tempest Engine is being used not only for audio but for physics as well.
OS and API are very similar, with minimal changes to include new PS5 features.
3rd parties received early devkits in Q3 2019.

Shompola · Nov 29, 2020

Another question is how big a deal it is that those parts are RDNA1 derivatives. I guess there are in-depth articles about the differences between RDNA1 and RDNA2? It also makes that tweet about the PS5 GPU being a mixture of RDNA1 and RDNA2 tech a bit more realistic. The XSX GPU seems to be that. However, I am also a bit confused. In some MS XSX presentation the L0 cache was per CU. Is that the case with RDNA1? Maybe I misread.

Deleted member 7537 · Nov 29, 2020

Shompola said:
Another question is how big a deal it is that those parts are RDNA1 derivatives. I guess there are in-depth articles about the differences between RDNA1 and RDNA2? It also makes that tweet about the PS5 GPU being a mixture of RDNA1 and RDNA2 tech a bit more realistic. The XSX GPU seems to be that. However, I am also a bit confused. In some MS XSX presentation the L0 cache was per CU. Is that the case with RDNA1? Maybe I misread.

L0 is per CU, L1 is per SA. I think in both architectures.

Shompola · Nov 29, 2020

I went back and checked. You are right.

PSman1700 · Nov 29, 2020

flutter said:
The non-technical answer is that the GPU is custom to meet MS' cost and needs. Expecting something similar with PS5.

The issue is that console warring has gotten to the point where this is seen as bad and as a point to lose in internet arguments. Who's to say it is? Both are made to hit a $499 price point with a certain performance target.

What do you mean, that both consoles are not fully RDNA2?

chris1515 · Nov 29, 2020

jayco said:
Another RGT video.

I think these are the most interesting bits:

Sony doesn't want to talk too much about tech after the Road to PS5 talk backlash.
About the CPU
- PS5's CPU L3 cache is unified but it's only 8MB (confirmed by 2 sources according to him). ~~That's half what Zen2 has~~ -> Zen2 has 16MB per CCX, so it's actually 1/4 according to him. Edit: that's the desktop version; the mobile CPU does indeed have 8MB of L3, 4MB per CCX, which I believe also came up during the Hot Chips XB presentation.
- 3.5GHz is with SMT enabled; there is no option to disable multi-threading.
- One core is dedicated to operating system
The DDR4 chip is for SSD caching and OS tasks, developers will completely ignore this.
About the GPU
- RDNA2 based.
- Does a good job staying at peak frequencies, around 95% of the time, even when the CPU is peaking as well.
- PS5's compute unit architecture is pretty much the same as that in the desktop implementation of RDNA 2 and the Series X.
- Sampler Feedback is missing; he says this is what the Italian Sony engineer meant when he said "It's based on RDNA2, but it has more features and I think one less".
  - He says there are other tools/methods with similar results but harder to implement.
- About the Geometry Engine.
  - Manages geometry and shading.
  - The primitive shaders are not the same as RDNA1's.
  - They can use Mesh Shaders.
  - The GE allows for a lot of optimization like culling very early in the pipeline.
  - It's critical to achieve performance, VRS runs with "extreme precision" on GE.
  - Double-edged sword: it can be transparent for developers, but if engines are not customized to exploit its capabilities a lot of performance is left on the table.
  - Supposedly, UE5 (nanite) makes good use of the GE.
Cache scrubbers on the GPU automatically boot out invalid or old instructions, freeing up cache space very efficiently and improving performance.
The Tempest Engine is being used not only for audio but for physics as well.
OS and API are very similar, with minimal changes to include new PS5 features.
3rd parties received early devkits in Q3 2019.

At least this is more precise. We just need to wait for a photo of the APU; if the sources are wrong, we will see two 4MB SRAM modules for the L3 on the CPU.

Deleted member 7537 · Nov 29, 2020

chris1515 said:
At least this is more precise. We just need to wait for a photo of the APU; if the sources are wrong, we will see two 4MB SRAM modules on the CPU.

I cannot imagine how you can have unified L3 with AMD's chiplet design. Seems like a massive customization of the Zen2 CCX. I guess we'll know soon enough.

_Enigma_ · Nov 29, 2020

http://www.pcmanias.com/patente-de-...cados-e-com-um-equivalente-ao-infinity-cache/

Deleted member 7537 · Nov 29, 2020

_Enigma_ said:
http://www.pcmanias.com/patente-de-...cados-e-com-um-equivalente-ao-infinity-cache/

Beyond3D -> Webpage in Portuguese -> Beyond3D
