Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

Thread Status:
Not open for further replies.
  1. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    The final numbers are correct, however, the formula is incorrect.

    ROPs aren't used to calculate CU floating point capabilities as they are separate fixed function execution units.

    The 2x64 component comes from:

    - 64 as the number of shader cores per CU
    - 2 as the two operations counted per Fused Multiply-Add instruction (a multiply and an add), aka FMAC, FMA, FMADD
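
    A minimal Python sketch of that arithmetic, for anyone following along. The CU counts and clocks are the publicly announced figures (52 CUs at 1.825 GHz for XSX, 36 CUs at up to 2.23 GHz for PS5) and are not stated in this post; the function and parameter names are purely illustrative.

```python
def peak_tflops(num_cus, clock_ghz, lanes_per_cu=64, ops_per_fma=2):
    """Theoretical peak FP32 throughput in TFLOPS.

    lanes_per_cu: 64 shader cores per CU (from the post above).
    ops_per_fma:  2, because one fused multiply-add counts as a multiply plus an add.
    ROPs do not appear anywhere: they are separate fixed-function units.
    """
    gflops = num_cus * lanes_per_cu * ops_per_fma * clock_ghz
    return gflops / 1000.0


# Assumed inputs: publicly announced console specs, not taken from this thread.
xsx_tflops = peak_tflops(num_cus=52, clock_ghz=1.825)
ps5_tflops = peak_tflops(num_cus=36, clock_ghz=2.23)

print(f"XSX: {xsx_tflops:.2f} TFLOPS")
print(f"PS5: {ps5_tflops:.2f} TFLOPS")
```

    This is a peak figure only: it assumes every lane issues an FMA on every clock, which real workloads rarely sustain.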

    We are discussing driver leaks and patents. Can you be more technical and specific?
     
    thicc_gaf, DSoup, Shortbread and 2 others like this.
  2. Shompola

    Newcomer

    Joined:
    Nov 14, 2005
    Messages:
    197
    Likes Received:
    40
    J^aws, I understand. But when you discuss this driver and patent info, it sounds like the narrative now is what I summarized above. You are not the only one who believes this is the case and that it could explain the perf advantage PS5 currently has in multiplat games. Do you agree?
     
  3. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    No, I don't agree. This thread has in its title "speculation".

    I haven't seen any discussion around the driver leak and relevant patents (mostly because they are technical). But when GitHub revealed a simplified metric, the infamous teraflop numbers, everyone on the Internet ran with it.

    We don't have a block diagram for PS5 yet, but we do for XSX and Navi21. There are plenty of details not confirmed for PS5.

    PS5 has 22% faster GPU clocks for its fixed-function units, so there are other explanations. Also Amdahl's Law kicks in for Asynchronous Compute and fewer cores.
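
    A toy Python sketch of the Amdahl's Law point, under loud assumptions: the parallel fractions are made-up illustrative values, the CU counts and clocks are the publicly announced figures (52 CUs at 1.825 GHz, 36 CUs at 2.23 GHz), and the model simply scales Amdahl speedup by clock. It is not a performance model of either console.

```python
def amdahl_speedup(parallel_fraction, num_units):
    """Amdahl's Law: speedup over one unit when only part of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_units)


def relative_throughput(parallel_fraction, num_units, clock_ghz):
    """Hypothetical figure of merit: clock speed times Amdahl speedup."""
    return clock_ghz * amdahl_speedup(parallel_fraction, num_units)


# Clock ratio behind the "22% faster" figure (assumed public clocks).
clock_ratio = 2.23 / 1.825
print(f"clock advantage: {(clock_ratio - 1) * 100:.0f}%")

# Illustrative parallel fractions only; real workloads are not characterised here.
for p in (0.90, 0.99, 1.00):
    wide = relative_throughput(p, num_units=52, clock_ghz=1.825)
    narrow = relative_throughput(p, num_units=36, clock_ghz=2.23)
    print(f"p={p:.2f}  52 CUs @ 1.825 GHz: {wide:6.2f}  36 CUs @ 2.23 GHz: {narrow:6.2f}")
```

    In this toy model the wider, slower configuration only pulls ahead when nearly all of the work scales across CUs; any serial or fixed-function portion favours the higher clock.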
     
    Shompola likes this.
  4. Shompola

    Newcomer

    Joined:
    Nov 14, 2005
    Messages:
    197
    Likes Received:
    40
    J^aws, Thanks for taking your time and replying. Been absent from gaming sphere for almost 15 years and good to see people taking their time to explain things.
     
  5. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    Post the driver leak. This may have just flown under the radar. If it can't be explained easily, most people will tune out. The GitHub leaks moved forward because DF was able to explain the whole story end to end and verified the information with those who obtained it. That made it an easier leak to follow. I was largely ignoring the GitHub leaks until then because you needed to follow all sorts of codenames and I just wasn't going to bother.
     
    thicc_gaf likes this.
  6. Shortbread

    Shortbread Island Hopper
    Legend

    Joined:
    Jul 1, 2013
    Messages:
    5,632
    Likes Received:
    4,921
    Funny, that's how I always calculated TF from prior AMD/Nvidia web-based discussions/docs. But hey, you learn something new every day.
     
    j^aws likes this.
  7. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    I posted details a few pages back.

    Here's my post, where I started discussing:
    https://forum.beyond3d.com/posts/2178977/

    Poster, @Digidi summarised the driver leaks here:
    https://forum.beyond3d.com/posts/2176653/

    Poster, @tinokun made a nice table below:
    Code:
                    Property Navi10 Navi14 Navi12 Navi21Lite Navi21 Navi22 Navi23 Navi31
                      num_se      2      1      2          2      4      2      2      4
               num_cu_per_sh     10     12     10         14     10     10      8     10
               num_sh_per_se      2      2      2          2      2      2      2      2
               num_rb_per_se      8      8      8          4      4      4      4      4
                    num_tccs     16      8     16         20     16     12      8     16
                    num_gprs   1024   1024   1024       1024   1024   1024   1024   1024
             num_max_gs_thds     32     32     32         32     32     32     32     32
              gs_table_depth     32     32     32         32     32     32     32     32
           gsprim_buff_depth   1792   1792   1792       1792   1792   1792   1792   1792
       parameter_cache_depth   1024    512   1024       1024   1024   1024   1024   1024
    double_offchip_lds_buffer     1      1      1          1      1      1      1      1
                   wave_size     32     32     32         32     32     32     32     32
          max_waves_per_simd     20     20     20         20     16     16     16     16
    max_scratch_slots_per_cu     32     32     32         32     32     32     32     32
                    lds_size     64     64     64         64     64     64     64     64
               num_sc_per_sh      1      1      1          1      1      1      1      1
           num_packer_per_sc      2      2      2          2      4      4      4      4
                    num_gl2a    N/A    N/A    N/A          4      4      2      2      4
                    unknown0    N/A    N/A    N/A        N/A     10     10      8     10
                    unknown1    N/A    N/A    N/A        N/A     16     12      8     16
                    unknown2    N/A    N/A    N/A        N/A     80     40     32     80
          num_cus (computed)     40     24     40         56     80     40     32     80
                    Property Navi10 Navi14 Navi12 Navi21Lite Navi21 Navi22 Navi23 Navi31
    There was a Tweet by a famous leaker, Yuko Yoshida (@KityYYuko):

    Code:
    XSX
    Front-End: RDNA 1
    Render-Back-Ends: RDNA 2
    Compute Units: RDNA1
    RT: RDNA2
    Navi21 Lite is considered to be XSX, and the driver compares it against RDNA1 and RDNA2 GPUs. The XSX front-end matches RDNA1 in its Scan Converters and Packers per Scan Converter (rasterisation): num_packer_per_sc is 2, versus 4 on the RDNA2 GPUs (Navi2x). The SIMD waves (CUs) are also RDNA1 for XSX and change for Navi2x: max_waves_per_simd is 20, versus 16. Navi21 Lite (XSX) does have the same Render Backends per Shader Engine as RDNA2: num_rb_per_se is 4, versus 8 on RDNA1.
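
    A short Python sketch of how the table can be read, using only values copied from it. The "num_cus (computed)" row is consistent with num_se x num_sh_per_se x num_cu_per_sh; that formula is inferred from the numbers rather than quoted from the driver, and the dictionary layout and helper names are hypothetical.

```python
# Selected rows of the table above, one tuple per GPU:
# (num_se, num_sh_per_se, num_cu_per_sh, num_rb_per_se,
#  max_waves_per_simd, num_packer_per_sc)
FIELDS = ("num_se", "num_sh_per_se", "num_cu_per_sh",
          "num_rb_per_se", "max_waves_per_simd", "num_packer_per_sc")

RAW = {
    "Navi10":     (2, 2, 10, 8, 20, 2),
    "Navi14":     (1, 2, 12, 8, 20, 2),
    "Navi12":     (2, 2, 10, 8, 20, 2),
    "Navi21Lite": (2, 2, 14, 4, 20, 2),
    "Navi21":     (4, 2, 10, 4, 16, 4),
    "Navi22":     (2, 2, 10, 4, 16, 4),
    "Navi23":     (2, 2, 8, 4, 16, 4),
    "Navi31":     (4, 2, 10, 4, 16, 4),
}
GPUS = {name: dict(zip(FIELDS, values)) for name, values in RAW.items()}


def num_cus(gpu):
    """Inferred formula: shader engines x shader arrays per SE x CUs per array."""
    return gpu["num_se"] * gpu["num_sh_per_se"] * gpu["num_cu_per_sh"]


def matching_fields(a, b, fields):
    """Return the subset of `fields` on which two GPU entries agree."""
    return [f for f in fields if a[f] == b[f]]


compare = ("num_rb_per_se", "max_waves_per_simd", "num_packer_per_sc")
xsx = GPUS["Navi21Lite"]
print("CUs:", {name: num_cus(gpu) for name, gpu in GPUS.items()})
print("matches Navi10 (RDNA1):", matching_fields(xsx, GPUS["Navi10"], compare))
print("matches Navi21 (RDNA2):", matching_fields(xsx, GPUS["Navi21"], compare))
```

    On these three properties, Navi21 Lite agrees with Navi10 on waves per SIMD and packers per scan converter, and with Navi21 on render backends per shader engine, which is the split described above.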
     
    #5347 j^aws, Nov 29, 2020
    Last edited: Nov 29, 2020
  8. Shortbread

    Shortbread Island Hopper
    Legend

    Joined:
    Jul 1, 2013
    Messages:
    5,632
    Likes Received:
    4,921
    I saw that tweet re-trending again.
     
  9. t0mb3rt

    Newcomer

    Joined:
    Jun 8, 2020
    Messages:
    59
    Likes Received:
    121
    How can the Xbox use RDNA 1 CUs when the ray tracing hardware is directly tied to the CUs and was non-existent in RDNA 1? Why would Microsoft pay to create an entirely new hardware block (RDNA 1 CU with ray tracing) when RDNA 2 CUs with ray tracing hardware built in had already been designed? Use some common sense.
     
    McHuj likes this.
  10. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Well, CUs do SIMD and scalar computation and are fully programmable. The Ray Accelerator is a fixed-function block that acts as the intersection engine and works alongside the TMUs for addressing and filtering; the RA and TMUs can't operate concurrently. So it acts as a separate block from the SIMD and scalar blocks (the CU), which are responsible for BVH traversal with shaders and can operate independently of the TMUs/RA.
     
  11. flutter

    Newcomer

    Joined:
    Apr 25, 2019
    Messages:
    20
    Likes Received:
    22
    The non-technical answer is that the GPU is custom to meet MS' cost and needs. Expecting something similar with PS5.

    The issue is that console warring has gotten to the point where this is seen as bad and as a point lost in internet arguments. Who's to say it is? Both are made to hit a 499 price point with a certain performance target.
     
    thicc_gaf and PSman1700 like this.
  12. Another RGT video.



    I think this is the most interesting bits:
    • Sony doesn't want to talk too much about tech after the Road to PS5 talk backlash.
    • About the CPU
      • PS5's CPU L3 cache is unified but it's only 8MB (confirmed by 2 sources according to him). That's half of what Zen2 has; actually Zen2 has 16MB per CCX (32MB across the two CCXs of an 8-core part), so it's 1/4 according to him. Edit: that is the desktop version. The mobile CPU does indeed have 8MB of L3, 4MB per CCX, which I believe also came up during the Hot Chips XB presentation.
      • 3.5GHz is with SMT enabled; there is no option to disable multithreading.
      • One core is dedicated to operating system
    • The DDR4 chip is for SSD caching and OS tasks, developers will completely ignore this.
    • About the GPU
      • RDNA2 based.
      • Does a good job staying at peak frequencies, around 95% of the time, even when the CPU is peaking as well.
      • PS5's compute unit architecture is pretty much the same as that in the desktop implementation of RDNA 2 and the Series X.
      • Sampler Feedback is missing; he says this is what the Italian Sony engineer meant when he said "It's based on RDNA2, but it has more features and I think one less".
        • He says there are other tools/methods with similar results but harder to implement.
      • About the Geometry Engine.
        • Manages geometry and shading.
        • The primitive shaders are not the same as RDNA1's.
        • They can use Mesh Shaders.
        • The GE allows for a lot of optimization like culling very early in the pipeline.
        • It's critical to achieve performance, VRS runs with "extreme precision" on GE.
        • Double-edged sword: it can be transparent for developers, but if engines are not customized to exploit its capabilities, a lot of performance is left on the table.
        • Supposedly, UE5 (nanite) makes good use of the GE.
    • Cache scrubbers on the GPU automatically evict invalid or old instructions, freeing up cache space very efficiently and providing improved performance.
    • The Tempest Engine is being used not only for audio but for physics as well.
    • OS and API are very similar, with minimal changes to include new PS5 features.
    • 3rd parties received early devkits in Q3 2019.
     
    #5352 Deleted member 7537, Nov 29, 2020
    Last edited by a moderator: Nov 29, 2020
    Pete, RagnarokFF, j^aws and 7 others like this.
  13. Shompola

    Newcomer

    Joined:
    Nov 14, 2005
    Messages:
    197
    Likes Received:
    40
    Another question is how big a deal it is that those parts are RDNA1 derivatives. I guess there are in-depth articles about the differences between RDNA1 and RDNA2? It also makes that tweet claiming the PS5 GPU is a mixture of RDNA1 and RDNA2 tech a bit more realistic. The XSX GPU seems to be that. However, I am a bit confused too. In some MS XSX presentation the L0 cache was per CU. Is that the case with RDNA1? Maybe I misread.
     
    thicc_gaf likes this.
  14. L0 is per CU, L1 is per SA. I think in both architectures.
     
    thicc_gaf and Shompola like this.
  15. Shompola

    Newcomer

    Joined:
    Nov 14, 2005
    Messages:
    197
    Likes Received:
    40
    I went back and checked. You are right.
     
  16. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,119
    Likes Received:
    3,093
    What do you mean, that both consoles are not fully RDNA2?
     
  17. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,158
    Likes Received:
    7,966
    Location:
    Barcelona Spain
    At least this is more precise. We just need to wait for a photo of the APU; if the sources are wrong, we will see two 4MB SRAM modules for the L3 on the CPU.
     
    #5357 chris1515, Nov 29, 2020
    Last edited: Nov 30, 2020
    function and thicc_gaf like this.
  18. I cannot imagine how you can have unified L3 with AMD's chiplet design. Seems like a massive customization of the Zen2 CCX. I guess we'll know soon enough.
     
