AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. Frenetic Pony

    Regular

    Joined:
    Nov 12, 2011
    Messages:
    807
    Likes Received:
    478
    Maybe a big departure in terms of how each ALU is built, re-arranging the gate configurations for better instruction throughput. The instructions per cycle to each compute unit have gone from up to 4 to up to 7. So one supposes either there was some limitation with instruction issue, or instructions now retire faster so the limitation can be lifted usefully. Either way that suggests an efficiency gain already.

    As for the difference in CUs per engine, well I don't know what to make of it because it could be a Series X only decision. The memory bus is unified obviously, so one assumes it's more about scheduling and whatnot; it may not have an effect on other RDNA2 dies. Though heck, maybe AMD will use GDDR6X as well and just throw 7 DCUs per engine, the bandwidth should be there if the clockspeed is kept within reason. Sidenote: thank you MS employee that agreed with me on that terminology, fucking shoutout for DCU/Double Compute Unit.

    Anyway, the only other interesting bit is the cost of the die per mm^2. The die sizes for both consoles are within the ballpark of the previous gen, and the PS4 die was estimated at $100. With inflation that's $111, but I wonder how much more it costs? This estimates that it costs a bit more than 150% relative. Do the dies cost $170 or so then? Whatever, not directly relevant to the RDNA2 arch, but it shows that with dies not getting super small or anything, RDNA2 isn't going to be any cheaper than RDNA1, but that wasn't hard to guess.
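
    The back-of-envelope die cost above can be sketched as a quick calculation. This is only an illustration of the post's arithmetic: the $100 PS4 estimate, the 1.11 inflation factor and the ~1.5x relative cost per mm^2 are the post's figures, and the function name and the "same die area" assumption are hypothetical.

```python
# Rough die-cost scaling, using only the figures quoted in the post.
# Assumption: the new die has roughly the same area as the previous-gen die,
# so total cost scales with cost per mm^2.

def estimate_die_cost(prev_cost_usd, inflation_factor, relative_cost_per_mm2):
    """Scale a previous-gen die cost by inflation and relative wafer cost."""
    return prev_cost_usd * inflation_factor * relative_cost_per_mm2

# $100 PS4 die, ~11% inflation, ~150% relative cost per mm^2 (post's numbers)
cost = estimate_die_cost(100.0, 1.11, 1.5)
print(f"estimated die cost: ${cost:.1f}")
```

    With "a bit more than 150%" the result lands in the neighbourhood of the $170 guessed in the post.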

    Most of the rest of the stuff seems to be already known or too hard to parse. "Rays per second max" versus Nvidia's arbitrary "About this many rays" is like, whatever. Even looking at this and trying to work it out made me shake my head, too many unknowns. As for some of the other bullet points, rather boring. I guess the new HDR format is neat for obsessive 120Hz reaching, but otherwise is rather useless. Please don't force correlate my color channels with a shared exponent, that's just weird. Besides, there's already usage of extended precision tricks for the 10-bit mantissa of 16-bit floats, losing another bit isn't helping.
     
  2. yuri

    Regular

    Joined:
    Jun 2, 2010
    Messages:
    283
    Likes Received:
    296
    There was a slide stating the power consumption of the SoC was equal to 2x XBox One X, resulting in 200W TDP. Of that, the 8c Zen 2 eats ~50W and other XBox-specific stuff maybe 5W.

    This gives us 52 RDNA2 CUs at 1.825GHz using less than 150W TDP. This is pretty good, given the process used is only marginally better than that of the 40CU Navi 10, which has over 200W TDP.

    What is even more weird is that the high level architecture is like RDNA1 + RT. No major changes, apparently. So is this all low level circuitry magic?
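
    The power budget reasoning in this post can be written out as a small sketch. All wattages are the post's own estimates (some are disputed further down the thread), the Navi 10 figure is taken as the post's "over 200W" lower bound, and the function name is hypothetical. SoC power and board power are not strictly comparable, so treat the per-CU numbers as a rough indication only.

```python
# Power budget left for the GPU after subtracting the other SoC consumers.
# All inputs are the post's estimates, not measured values.

def gpu_power_budget(soc_w, cpu_w, other_w):
    """Return the watts left for the GPU portion of the SoC."""
    return soc_w - cpu_w - other_w

xsx_gpu_w = gpu_power_budget(soc_w=200, cpu_w=50, other_w=5)  # 145 W
xsx_w_per_cu = xsx_gpu_w / 52      # 52 CUs on Series X
navi10_w_per_cu = 200 / 40         # 40 CUs, "over 200W" per the post

print(f"XSX GPU budget: {xsx_gpu_w} W")
print(f"XSX: {xsx_w_per_cu:.2f} W/CU, Navi 10: at least {navi10_w_per_cu:.2f} W/CU")
```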
     
  3. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    It's a bit lower than that.
    They're statically allocated to top out at 57-60W worst case.
    Not precisely.
    Not talking about them yet.
    A lot of it, yes.
     
    function likes this.
  4. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,401
    Likes Received:
    1,845
    Location:
    France
    I believe we will have to wait for the PC part. It's not out of the question that the console SoCs and the PC part are different, even if it's not by much. But to be fair, I always thought magic was a big part of GPUs. I was laughed at for thinking that.
     
  5. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    Of course they are different. RDNA2 in consoles is a crippled version. On desktop, the cache is an order of magnitude bigger. In fact, same as the CPU. That's why we can't extrapolate desktop RDNA2 from console RDNA2...
     
  6. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    No, just a bit more ALU-heavy than usual.
    It's just Renoir there.
    Yes you can.
     
    disco_ and function like this.
  7. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    You're wrong. Cache plays a big role in TDP.

    No you can't. Stop your FUD.
     
  8. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Oh lord.
    It gets worse.
     
    disco_ and function like this.
  9. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    XBX is 100W?
     
  10. Nebuchadnezzar

    Legend

    Joined:
    Feb 10, 2002
    Messages:
    1,061
    Likes Received:
    328
    Location:
    Luxembourg
    Bigger caches reduce TDP if anything.
    You provide no technical reason as to why you can't do that. It's the same silicon process, same process flows, designs, with only a minor microarchitectural unit balance change. Extensive re-use is why AMD can even put out 4 different designs within the same 6 month timespan in the first place.
     
  11. yuri

    Regular

    Joined:
    Jun 2, 2010
    Messages:
    283
    Likes Received:
    296
    My bad, the original XBox One had 95W TDP.

     
    Lightman and DegustatoR like this.
  12. Krteq

    Newcomer

    Joined:
    May 5, 2020
    Messages:
    149
    Likes Received:
    263
    1. According to those Microsoft slides, BVH traversal is running in parallel to other shader operations - so, no utilization conflicts or "slowdown" there
    2. Yes, but where in the current DXR graphics pipeline do you need to do RT operations at the same time as texturing operations?
     
  13. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    175W x2 seems a bit excessive but 250-300W looks more reasonable for XSX. That's total though so the SoC should be some 50-75W lower, I guess, which gives us what, 175-250W? So similar ballpark.
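
    The range estimate above is simple interval subtraction: the lowest SoC figure comes from the lowest total minus the largest non-SoC share, and the highest from the highest total minus the smallest share. A minimal sketch, using only the wattage ranges from the post (the helper name is hypothetical):

```python
# Interval subtraction for the console-total vs. SoC power estimate.

def subtract_range(total, part):
    """Return (low, high) of total minus part, both given as (low, high)."""
    total_lo, total_hi = total
    part_lo, part_hi = part
    return (total_lo - part_hi, total_hi - part_lo)

console_total_w = (250, 300)  # whole-console estimate from the post
non_soc_w = (50, 75)          # everything that is not the SoC

soc_w = subtract_range(console_total_w, non_soc_w)
print(soc_w)  # prints (175, 250)
```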

    Ray intersection testing is performed in parallel, BVH traversal is handled by a shader. This is different to Turing where BVH traversal is handled by RT h/w too.
     
    pharma, Krteq and DavidGraham like this.
  14. Krteq

    Newcomer

    Joined:
    May 5, 2020
    Messages:
    149
    Likes Received:
    263
    Yep, my bad

    From the AMD RT patent
     
  15. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,406
    Location:
    Wrong thread
    Apologies if I've misunderstood what you're saying here, but I think it works as following:

    XSX has a 320-bit bus, with 5 x 64-bit controllers. Each controller has its own L2 with 4 slices, so 5 x 4 makes the 20 slices. Which also fits the 20 memory channels MS described.

    5MB total L2 fits with 1MB L2 for each of the five controllers.

    Even if the L1s can't request more than 4 accesses per cycle, you'll still need the 5 controllers, each with their four L2 slices, to manage the 320-bit bus. Couldn't compute bypass the L1 and make full use of the L2 bandwidth though ... (genuine question)?

    I haven't seen anything about Big Navi L2 cache, but if it's a 384-bit bus shouldn't it be 6 controllers and therefore 24 L2 slices?
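
    The slice counting above can be checked with a few lines. This is a sketch of the post's reasoning only: the 64-bit controller width and 4 slices per controller are the post's figures for XSX, and applying them to a 384-bit Big Navi bus is the post's speculation, not a confirmed spec. The function name is hypothetical.

```python
# Derive memory controller and L2 slice counts from the bus width,
# assuming 64-bit controllers with 4 L2 slices each (as described for XSX).

def l2_layout(bus_width_bits, controller_width_bits=64, slices_per_controller=4):
    """Return (controllers, l2_slices) for a given memory bus width."""
    if bus_width_bits % controller_width_bits != 0:
        raise ValueError("bus width must be a multiple of the controller width")
    controllers = bus_width_bits // controller_width_bits
    return controllers, controllers * slices_per_controller

xsx = l2_layout(320)        # (5, 20): 5 controllers, 20 slices
big_navi = l2_layout(384)   # (6, 24) if the same layout carried over
l2_mb_per_controller = 5 / xsx[0]  # 5MB total L2 -> 1.0 MB per controller

print(xsx, big_navi, l2_mb_per_controller)
```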
     
    BRiT likes this.
  16. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    Couple other interesting points from Hot Chips presentation:

    * "CUs have 25% better perf/clock compared to last gen" - that's compared to GCN so doesn't look like there will be anything more than a single digit perf/clock gain between RDNA1 and 2.
    * VRS (tier 2) support is limited to 2x2 coarse shading. Turing supports up to 4x4.
     
  17. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    #2577 CarstenS, Aug 18, 2020
    Last edited: Aug 18, 2020
  18. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Does that mean the RT units in RDNA2 also share common data path with the TMUs, i.e. blocking each other?

    Turing's RT core apparently sits on its own bypass network:

     
    disco_, pharma and DavidGraham like this.
  19. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Is it though? XSX for something along the lines of Polaris, Vega IIRC had better IPC than Polaris and RDNA1 offered ~25% IPC over Vega, didn't it? So it could very well be compared to RDNA1 instead of GCN~4
     
  20. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    This was stated in the presentation - you can either do a texture fetch or ask the RT h/w to trace a ray through a box/scene. The TMU/RT h/w itself is likely independent but it's clearly using the same data path and is probably using the same caches. It's likely a good trade off between h/w complexity and s/w flexibility for consoles in particular.

    Vega had worse IPC than Polaris IIRC.
     
