AMD: Navi Speculation, Rumours and Discussion [2019]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

  1. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,065
    Likes Received:
    2,292
    You think just being cheaper is enough?
    It's not really helped them in the past.
     
  2. SimBy

    Regular Newcomer

    Joined:
    Jun 21, 2008
    Messages:
    613
    Likes Received:
    279
    How about just actually being able to buy one?
     
    Kej, Picao84, no-X and 2 others like this.
  3. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,710
    Likes Received:
    1,072
    Location:
    France
    I disagree here, not in 2020-2021... And with consoles having RT too.
     
  4. tEd

    tEd Casual Member
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,104
    Likes Received:
    70
    Location:
    switzerland
    Ampere 2 triangles/clock and shading concurrently.
    Turing 1 triangle/clock
    Navi 1 triangle/clock and also shading concurrently

    ...but there is much more to RT performance than just those numbers, I guess.
     
  5. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,477
    Likes Received:
    6,216
    Then your suggestion would be that Navi 21 quadruples the compute units over Navi 10, but not all the other execution units in the GPU? That would make RDNA2 a compute-centric architecture, but that's most probably not going to happen.
    AMD isn't going to focus RDNA2 on compute throughput because they already have CDNA / Arcturus for that.

    The way I see it, RDNA2 has a lot of on-chip cache to compensate for a lower bandwidth towards the VRAM, whereas CDNA focuses more die area on compute units with less on-chip cache because it uses HBM2.
     
  6. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,795
    Likes Received:
    713
    Location:
    msk.ru/spb.ru
    So about 650W for 160 CUs at 2GHz GPU then? C'mon, guys, die area hasn't been the main limiter of GPU performance for years now.
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,100
    Likes Received:
    1,186
    Location:
    London
    NVidia can build a GPU with 10752 FP32 ALU lanes but AMD can't build a GPU with 10240 ALU lanes on a better node?
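
    The two lane totals can be reproduced from unit counts quoted elsewhere in this thread (84 SMs, 160 CUs). This is only a sketch: the lanes-per-unit values below are assumptions chosen to match the post's totals, and the variable names are illustrative.

```python
# Sketch: FP32 lane totals implied by the unit counts quoted in this thread.
# Assumed (not stated in the post): 128 FP32 lanes per SM, 64 lanes per CU.
def fp32_lanes(units: int, lanes_per_unit: int) -> int:
    """Total FP32 ALU lanes for a GPU with `units` shader blocks."""
    return units * lanes_per_unit

nvidia_lanes = fp32_lanes(84, 128)  # 84 SMs
amd_lanes = fp32_lanes(160, 64)     # 160 CUs (the speculated configuration)

print(f"Nvidia: {nvidia_lanes} lanes, AMD: {amd_lanes} lanes")
```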
     
    Krteq, PSman1700 and Lightman like this.
  8. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,406
    Likes Received:
    6,030
    5700XT w/ 1755 MHz game clock
    9 TFLOPS
    112 GPixel/s

    72 CU w/ 2100 MHz game clock
    19.4 TFLOPS (2.2x 5700XT)
    268 GPixel/s (128 ROPs, 2.4x 5700XT)

    80 CU w/ 2100 MHz game clock
    21.5 TFLOPS (2.4x 5700XT)
    268 GPixel/s (128 ROPs, 2.4x 5700XT)

    5700XT benchmarks:
    Borderlands 3 4K ultra ~33 fps
    Gears 5 4K ultra ~39 fps

    Scaling from AMD's sample benchmarks:
    Borderlands 3 4K badass 61 (greater than 1.85x scaling, because this is badass and not ultra)
    Gears 5 4K ultra 73 (1.88x scaling)

    If the samples are the 72CU unit then 80CU extrapolates to the following assuming perfect scaling:
    Borderlands 3 4K badass 61 -> ~68 fps
    Modern Warfare 4K ultra 88 -> ~98 fps
    Gears 5 4K ultra 73 -> ~81 fps

    If the samples are the 80CU unit then the 72CU extrapolates to the following assuming perfect scaling:
    Borderlands 3 4K badass 61 -> ~55 fps
    Modern Warfare 4K ultra 88 -> ~79 fps
    Gears 5 4K ultra 73 -> ~66 fps

    The numbers they showed match up pretty close to a 3080, so if it's a 72CU then the 80CU will match pretty closely with a 3090. If the numbers are for the 80CU, then the 72CU will be well behind the 3080.
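
    The arithmetic behind the figures above can be sketched as follows. Several inputs are assumptions that the post does not state: 64 FP32 lanes per CU, 2 FLOPs per lane per clock (one FMA), 40 CUs and 64 ROPs on the 5700XT, and one pixel per ROP per clock. The function names are illustrative only.

```python
# Sketch of the throughput and scaling arithmetic in the post above.
# Assumed: 64 lanes/CU, 2 FLOPs/lane/clock, 5700XT = 40 CUs and 64 ROPs,
# one pixel per ROP per clock.
LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CLOCK = 2

def tflops(cus: int, clock_mhz: float) -> float:
    """Peak FP32 throughput in TFLOPS."""
    return cus * LANES_PER_CU * FLOPS_PER_LANE_PER_CLOCK * clock_mhz * 1e6 / 1e12

def gpixels_per_s(rops: int, clock_mhz: float) -> float:
    """Peak colour fill rate in GPixel/s."""
    return rops * clock_mhz * 1e6 / 1e9

def scale_fps(fps: float, from_cus: int, to_cus: int) -> float:
    """Perfect (linear) scaling with CU count at an unchanged clock."""
    return fps * to_cus / from_cus

# name -> (CUs, ROPs, game clock in MHz)
configs = {"5700XT": (40, 64, 1755), "72 CU": (72, 128, 2100), "80 CU": (80, 128, 2100)}
for name, (cus, rops, mhz) in configs.items():
    print(f"{name}: {tflops(cus, mhz):.1f} TFLOPS, {gpixels_per_s(rops, mhz):.1f} GPixel/s")

# AMD's sample 4K results quoted in the post
samples = {"Borderlands 3": 61, "Modern Warfare": 88, "Gears 5": 73}
for game, fps in samples.items():
    up = scale_fps(fps, 72, 80)    # if the samples came from the 72 CU part
    down = scale_fps(fps, 80, 72)  # if the samples came from the 80 CU part
    print(f"{game}: 72->80 CU ~{up:.0f} fps, 80->72 CU ~{down:.0f} fps")
```

    Perfect scaling is the optimistic bound here; real games rarely scale linearly with CU count, so the 80 CU extrapolation is best read as a ceiling.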
     
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,847
    Likes Received:
    1,044
    Location:
    New York
    I don’t think ALU lanes are the issue. It’s 160 CUs vs 84 SMs. Not even in the same ballpark.
     
  10. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,477
    Likes Received:
    6,216
    Arcturus has 8192 FP32 ALU lanes, probably with 1:2 FP64 throughput. And if RDNA2's clocks are any indication, it too should clock at around 2GHz for >32 TFLOPs.
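
    As a rough check on the ">32 TFLOPs" figure, here is a minimal sketch. It assumes one FMA per lane per clock, counted as 2 FLOPs, which the post does not state; the 2GHz clock and the 1:2 FP64 rate are the post's own guesses.

```python
# Sketch: peak throughput implied by the lane count and clock quoted above.
# Assumption (not in the post): one FMA per lane per clock, counted as 2 FLOPs.
FLOPS_PER_LANE_PER_CLOCK = 2

def peak_tflops(lanes: int, clock_ghz: float) -> float:
    """Peak throughput in TFLOPS for `lanes` ALU lanes at `clock_ghz`."""
    return lanes * FLOPS_PER_LANE_PER_CLOCK * clock_ghz / 1000.0

arcturus_fp32 = peak_tflops(8192, 2.0)
arcturus_fp64 = arcturus_fp32 / 2  # the post's guessed 1:2 FP64 rate

print(f"FP32: {arcturus_fp32:.1f} TFLOPS, FP64: {arcturus_fp64:.1f} TFLOPS")
```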



    I honestly don't get why gaming Ampere has so many ALUs - much more than the GA100 that is compute-oriented. It definitely doesn't translate into gaming performance.
    That said, I don't know why AMD would follow suit. They tried their hand at using chips with lots of compute units to compete in the gaming market, and the result was a chip with comparatively low power efficiency (Vega 10).

    RDNA is gaming-centric, so it shouldn't have more compute resources than what the other execution units and effective memory bandwidth can keep up with, for rasterization.
     
    BRiT likes this.
  11. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,345
    Likes Received:
    174
    Location:
    San Francisco
    Yep, I’ve seen that image posted on twitter and some of the numbers were completely wrong.
     
  12. SimBy

    Regular Newcomer

    Joined:
    Jun 21, 2008
    Messages:
    613
    Likes Received:
    279


    Like what? More like Zen? Boost as high as possible as long as you're within temp/power envelope?
     
    Lightman likes this.
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,100
    Likes Received:
    1,186
    Location:
    London
    80 WGPs with 4x SIMD-32s vs 84 SMs with 8x SIMD-16s. Hmm...

    Are you saying that 4 TMU lanes in an SM versus 8 TMU lanes in a WGP is a major factor here? Is there something else? I can't read your mind.

    There's a spec? There's a die size?

    RDNA looks like it is focussed on being bandwidth-efficient (in the CUs the focus is on minimised cache-thrashing). RDNA 2 looks like a major iteration on that concept, though perhaps the L1 and L2 papers and patent stuff is all already in RDNA.

    What other execution units? ALU:TEX in my proposal is unchanged. My hypothesis includes ALU:colour-fillrate ratio being either doubled or quadrupled. We don't know if it's 128 or 64 ROPs. ALU:zixel-rate is either the same or doubled, because my theory is that AMD will double zixel rate per ROP.

    Sounds like a gaming GPU to me. 80 WGPs is a lot of ray tracing, too.

    If you disagree with 80 WGPs, then you have to explain a massive die that's been seen in a gaming card with GDDR6. 128MB last level cache could be the answer, but it seems really unlikely to me, simply because a monster cache that is almost the same size as all of the CUs (which are about 112mm²) hasn't been seen in XSX (and PS5's die size corresponds to that). If RDNA 2 has a monster cache then that would appear to imply that the consoles have no RDNA 2 features except for ray tracing. That would make them even more horrible than I thought they were...

    Earlier I said XSX CUs are 15% of the die - that's wrong, they're about 31%. Also, I said that there's one L0 per WGP, but in the RDNA whitepaper it shows that 4x TMUs (per CU) have a dedicated L0 ("vector L0", which also caches non-textured memory reads). The instruction and scalar data caches are shared by both CUs in addition to LDS - because a workgroup is the general concept of a shared-state of computation, consisting of multiple wave64s (one to four) or wave32s (one to eight) of work-items.

    I don't know why this guy has credibility for AMD leaks, but here's a fresh one:



    236mm² is 79mm² larger than Navi 14, which is 157mm². Just in case you've not been paying attention, 40 CUs in Navi 10 take 80mm², or if you prefer, 20 CUs in Navi 14 take 40mm² (there's actually 24).

    So, how the hell does a "rumoured 32 CU" Navi 23 spend 79mm², when 8 more CUs should be about 16mm²?:
    • It has half the huge last level cache size of Navi 21?
    • It has RDNA 2 CUs (not seen in XSX or PS5) which are twice the area of RDNA 1 CUs? Damn, that ray tracing had better be godlike.
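
    The area argument above can be sketched with the figures from the post alone. The 236mm² and 32 CU values are rumours, and the per-CU area is the Navi 10 estimate given above; the variable names are illustrative.

```python
# Sketch of the die-area argument, using only figures quoted in the post.
NAVI10_CUS = 40
NAVI10_CU_AREA_MM2 = 80.0
area_per_cu = NAVI10_CU_AREA_MM2 / NAVI10_CUS  # mm² per RDNA 1 CU

navi14_die_mm2 = 157.0
navi23_die_mm2 = 236.0  # rumoured
navi14_cus = 24
navi23_cus = 32         # rumoured

die_delta = navi23_die_mm2 - navi14_die_mm2
expected_cu_delta = (navi23_cus - navi14_cus) * area_per_cu
unexplained = die_delta - expected_cu_delta

print(f"Extra die area: {die_delta:.0f} mm²")
print(f"Area expected from 8 more CUs: {expected_cu_delta:.0f} mm²")
print(f"Area not explained by CUs: {unexplained:.0f} mm²")
```

    The unexplained remainder is what the two bullet points try to account for: either a large last level cache or substantially bigger CUs.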
     
    NightAntilli, Lightman and PSman1700 like this.
  14. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    3,445
    Likes Received:
    1,364
    Indeed, they're roughly at half the raw power compared to AMD's own dGPUs at launch. Even less compared to NV's stuff.
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,100
    Likes Received:
    1,186
    Location:
    London
    I did a Navi 14 analysis:

    [IMG]

    It's interesting that 5mm² on such a small die is "edges" (perimeter of die). Capacitor ring? What else?

    Some of the "unknown interior elements" (3mm²) appear to be blank die, where there are no small rectangles of functionality that can squeeze in. Not sure...

    I've decided that "global control" is a better description than "uncore": graphics command processor, geometry processor, ACEs, HWS, DMA. I am suspicious that the real global control area is non-rectangular, "leaking" into the centre of the area that I think of as "(shader) engine common". I haven't found a Navi 10 die shot that has the stunning clarity of the Navi 14 die shot, so I can't make comparisons of function blocks...

    I haven't worked out a way to say what's MC and what's L2 in the die shot. I made some assumptions for Navi 10, but I haven't thought of a way to improve those.
     
    Lightman, PSman1700 and BRiT like this.
  16. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,795
    Likes Received:
    713
    Location:
    msk.ru/spb.ru
    It's not about how many lanes you have, it's about your power consumption. NV moved from 16 to 10nm and even with that they are well above 300W now. AMD doesn't even move anywhere with Navi2, it's the same process, and we already know their ballpark perf/watt gain. Let's be realistic here.
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  17. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    3,445
    Likes Received:
    1,364
    Yes, but I think we want, or better said need, AMD to at least somewhat compete, otherwise we will see NV upping their prices again. At least the 20+TF rumor seems very realistic. Which is great I think; before the NV Ampere unveil we were guessing 18TF tops for next gen graphics processing units.
     
  18. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,332
    Likes Received:
    3,310
    Location:
    Finland
    Actually, we don't know which process it is. We know it's 7nm, but we don't know whether it's enhanced N7P or N7+. It's a different process compared to at least the Xbox SoC, and probably the PS5 SoC too (at least for Xbox it seems quite clear that "AMD enhanced 7nm" means the same node as Zen 2 Refresh & Zen 3, which is "enhanced N7"). And I'm thinking it should be a given that it's "N7P or better", since Navi 1x were N7P already.
     
    Lightman likes this.
  19. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,330
    Likes Received:
    6,965
    If RDNA2 adopts the changes that MS requested (4x int8 and 8x int4) for Anaconda and Lockhart in their CUs then that would provide more flexibility for ML workloads.

    Regards,
    SB
     
    PSman1700 likes this.
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,100
    Likes Received:
    1,186
    Location:
    London
    I'm trying to be realistic about the use of die area. The only alternative being rumoured for the huge missing area is "massive cache". You have something better? Or do you think it's a monster cache?

    Even a 4096-bit HBM bus in addition to 256-bit GDDR6 leaves a gaping mismatch.

    All of this seems crazy. Clutching at straws, because there's no obviously "realistic" option.

    A 900MHz range in clocks (for non-idle scenarios!) tells us that the GPU will clock down massively when given a particular kind of workload. That seems very likely to be sustained compute, which games are extremely bad at, therefore game clocks will be high.
     