AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. Jay

    Jay Veteran

    You think just being cheaper is enough?
    It hasn't really helped them in the past.
     
  2. SimBy

    SimBy Regular

    How about just actually being able to buy one?
     
  3. Rootax

    Rootax Veteran

    I disagree here, not in 2020-2021... And with consoles having RT too.
     
  4. tEd

    tEd Casual Member Veteran

    Ampere 2 triangles/clock and shading concurrently.
    Turing 1 triangle/clock
    Navi 1 triangle/clock and also shading concurrently

    ...but there is much more to RT performance than just those numbers, I guess
     
  5. Then your suggestion would be that Navi 21 quadruples the compute units over Navi 10, but not all the other execution units in the GPU? That would make RDNA2 a compute-centric architecture, but that's most probably not going to happen.
    AMD isn't going to focus RDNA2 on compute throughput because they already have CDNA / Arcturus for that.

    The way I see it, RDNA2 has a lot of on-chip cache to compensate for a lower bandwidth towards the VRAM, whereas CDNA focuses more die area on compute units with less on-chip cache because it uses HBM2.
     
  6. DegustatoR

    DegustatoR Veteran

    So about 650W for a 160 CU GPU at 2GHz then? C'mon, guys, die area hasn't been the main limiter of GPU performance for years now.
     
  7. Jawed

    Jawed Legend

    NVidia can build a GPU with 10752 FP32 ALU lanes but AMD can't build a GPU with 10240 ALU lanes on a better node?
     
    Krteq, PSman1700 and Lightman like this.
  8. Scott_Arm

    Scott_Arm Legend

    5700XT w/ 1755 MHz game clock
    9 TFLOPS
    112 GPixel/s

    72 CU w/ 2100 MHz game clock
    19.4 TFLOPS (2.2x 5700XT)
    268 GPixel/s (128 rops, 2.4x 5700XT)

    80 CU w/ 2100 MHz game clock
    21.5 TFLOPS (2.4x 5700XT)
    268 GPixel/s (128 rops, 2.4x 5700XT)
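    The throughput figures above can be reproduced with a quick sketch. It assumes 64 FP32 lanes per CU, 2 FLOPs per lane per clock (one fused multiply-add), one pixel per ROP per clock, and 64 ROPs on the 5700XT. The helper names are made up for illustration, and the 72/80 CU configurations are the rumoured ones, not confirmed specs.

```python
# Rough peak-throughput estimates; helper names are hypothetical.
LANES_PER_CU = 64    # assumption: 64 FP32 lanes per RDNA CU
FLOPS_PER_LANE = 2   # assumption: one fused multiply-add per lane per clock


def tflops(cus, clock_mhz):
    """Peak FP32 TFLOPS for a given CU count and clock."""
    return cus * LANES_PER_CU * FLOPS_PER_LANE * clock_mhz * 1e6 / 1e12


def gpixels(rops, clock_mhz):
    """Peak colour fill rate in GPixel/s, assuming 1 pixel per ROP per clock."""
    return rops * clock_mhz * 1e6 / 1e9


# (name, CUs, ROPs, game clock in MHz); the 128-ROP figure is the post's guess.
configs = [
    ("5700XT", 40, 64, 1755),
    ("72 CU", 72, 128, 2100),
    ("80 CU", 80, 128, 2100),
]

base_tf = tflops(40, 1755)
base_gp = gpixels(64, 1755)
for name, cus, rops, mhz in configs:
    tf = tflops(cus, mhz)
    gp = gpixels(rops, mhz)
    print(f"{name}: {tf:.1f} TFLOPS ({tf / base_tf:.1f}x), "
          f"{gp:.0f} GPixel/s ({gp / base_gp:.1f}x)")
```

    These are peak numbers only; they say nothing about how well a game actually scales with them.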

    5700XT benchmarks:
    Borderlands 3 4k ultra ~33 fps
    Gears 5 4k ultra ~39 fps

    Scaling from AMD's sample benchmarks:
    Borderlands 3 4k badass 61 fps (greater than 1.85x scaling, because this is badass and not ultra)
    Gears 5 4k ultra 73 fps (1.87x scaling)

    If the samples are the 72CU unit then 80CU extrapolates to the following assuming perfect scaling:
    Borderlands 3 4k badass 61 -> ~68 fps
    Modern Warfare 4k ultra 88 -> ~98 fps
    Gears 5 4k ultra 73 -> ~81 fps

    If the samples are the 80CU unit then the 72CU extrapolates to the following assuming perfect scaling:
    Borderlands 3 4k badass 61 -> ~55 fps
    Modern Warfare 4k ultra 88 -> ~79 fps
    Gears 5 4k ultra 73 -> ~66 fps

    The numbers they showed match up pretty close to a 3080, so if it's a 72CU then the 80CU will match pretty closely with a 3090. If the numbers are for the 80CU, then the 72CU will be well behind the 3080.
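    The two extrapolations above are just linear scaling by CU count at equal clocks. A minimal sketch, assuming frame rate scales perfectly with CUs (which real games rarely do); the function name is made up for illustration:

```python
def scale_fps(fps, from_cus, to_cus):
    """Linear ('perfect') scaling of frame rate with CU count at equal clocks."""
    return fps * to_cus / from_cus


# Frame rates from AMD's sample benchmarks, as quoted in the post.
samples = {
    "Borderlands 3 4k badass": 61,
    "Modern Warfare 4k ultra": 88,
    "Gears 5 4k ultra": 73,
}

for game, fps in samples.items():
    up = scale_fps(fps, 72, 80)    # if the samples came from a 72 CU part
    down = scale_fps(fps, 80, 72)  # if the samples came from an 80 CU part
    print(f"{game}: {fps} -> ~{up:.0f} fps (80 CU) or ~{down:.0f} fps (72 CU)")
```

    Treat the results as an upper bound on the gap between the two parts, since bandwidth and clocks would also have to scale.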
     
  9. trinibwoy

    trinibwoy Meh Legend

    I don’t think ALU lanes are the issue. It’s 160 CUs vs 84 SMs. Not even in the same ballpark.
     
  10. Arcturus has 8192 FP32 ALU lanes, probably with 1:2 FP64 throughput. And if RDNA2's clocks are any indication, it too should clock at around 2GHz for >32 TFLOPs.



    I honestly don't get why gaming Ampere has so many ALUs - much more than the GA100 that is compute-oriented. It definitely doesn't translate into gaming performance.
    That said, I don't know why AMD would follow suit. They tried their hand at using chips with lots of compute units to compete in the gaming market, and the result was a chip with comparatively low power efficiency (Vega 10).

    RDNA is gaming-centric, so it shouldn't have more compute resources than what the other execution units and effective memory bandwidth can keep up with, for rasterization.
     
    BRiT likes this.
  11. nAo

    nAo Nutella Nutellae Veteran

    Yep, I’ve seen that image posted on Twitter and some of the numbers were completely wrong.
     
  12. SimBy

    SimBy Regular



    Like what? More like Zen? Boost as high as possible as long as you're within temp/power envelope?
     
    Lightman likes this.
  13. Jawed

    Jawed Legend

    80 WGPs with 4x SIMD-32s vs 84 SMs with 8x SIMD-16s. Hmm...
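    For reference, the lane counts behind that comparison work out as follows. This is only a sketch of the arithmetic: the 80 WGP configuration is the hypothesis under discussion, not a known spec, and the function name is made up for illustration.

```python
def fp32_lanes(units, simds_per_unit, simd_width):
    """Total FP32 ALU lanes = units x SIMDs per unit x lanes per SIMD."""
    return units * simds_per_unit * simd_width


navi_guess = fp32_lanes(80, 4, 32)  # hypothesised: 80 WGPs, 4x SIMD-32 each
ampere = fp32_lanes(84, 8, 16)      # 84 SMs, 8x SIMD-16 each

print(navi_guess, ampere)  # 10240 10752
print(f"ratio: {navi_guess / ampere:.2f}")
```

    Both totals match the 10240 and 10752 lane figures quoted earlier in the thread.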

    Are you saying that 4 TMU lanes in an SM versus 8 TMU lanes in a WGP is a major factor here? Is there something else? I can't read your mind.

    There's a spec? There's a die size?

    RDNA looks like it is focussed on being bandwidth-efficient (in the CUs the focus is on minimised cache-thrashing). RDNA 2 looks like a major iteration on that concept, though perhaps the L1 and L2 papers and patent stuff are all already in RDNA.

    What other execution units? ALU:TEX in my proposal is unchanged. My hypothesis includes ALU:colour-fillrate ratio being either doubled or quadrupled. We don't know if it's 128 or 64 ROPs. ALU:zixel-rate is either the same or doubled, because my theory is that AMD will double zixel rate per ROP.

    Sounds like a gaming GPU to me. 80 WGPs is a lot of ray tracing, too.

    If you disagree with 80 WGPs, then you have to explain a massive die that's been seen in a gaming card with GDDR6. 128MB last level cache could be the answer, but it seems really unlikely to me, simply because a monster cache that is almost the same size as all of the CUs (which are about 112mm²) hasn't been seen in XSX (and PS5's die size corresponds to that). If RDNA 2 has a monster cache then that would appear to imply that the consoles have no RDNA 2 features except for ray tracing. That would make them even more horrible than I thought they were...

    Earlier I said XSX CUs are 15% of the die - that's wrong, they're about 31%. Also, I said that there's one L0 per WGP, but in the RDNA whitepaper it shows that 4x TMUs (per CU) have a dedicated L0 ("vector L0", which also caches non-textured memory reads). The instruction and scalar data caches are shared by both CUs in addition to LDS - because a workgroup is the general concept of a shared-state of computation, consisting of multiple wave64s (one to four) or wave32s (one to eight) of work-items.

    I don't know why this guy has credibility for AMD leaks, but here's a fresh one:



    236mm² is 79mm² larger than Navi 14, which is 157mm². Just in case you've not been paying attention, 40 CUs in Navi 10 take 80mm², or if you prefer, 20 CUs in Navi 14 take 40mm² (there's actually 24).

    So, how the hell does a "rumoured 32 CU" Navi 23 spend 79mm², when 8 more CUs should be about 16mm²?:
    • It has half the huge last level cache size of Navi 21?
    • It has RDNA 2 CUs (not seen in XSX or PS5) which are twice the area of RDNA 1 CUs? Damn, that ray tracing had better be godlike.
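    The area mismatch being questioned here can be laid out as a small back-of-envelope calculation. It uses only the figures quoted in the post (the 236mm² and 32 CU numbers are rumours) and assumes CU area scales linearly at about 2mm² per CU; the function name is made up for illustration.

```python
# Back-of-envelope die-area check; all inputs are the figures quoted in the post.
MM2_PER_CU = 80 / 40  # ~2 mm² per CU, from 40 CUs taking ~80 mm² in Navi 10


def unexplained_area(big_mm2, small_mm2, big_cus, small_cus, mm2_per_cu=MM2_PER_CU):
    """Return (total growth, growth explained by extra CUs, unexplained remainder)."""
    growth = big_mm2 - small_mm2
    cu_growth = (big_cus - small_cus) * mm2_per_cu
    return growth, cu_growth, growth - cu_growth


# Rumoured Navi 23 (236 mm², 32 CUs) versus Navi 14 (157 mm², 24 CUs).
growth, cu_growth, gap = unexplained_area(236, 157, 32, 24)
print(f"growth {growth} mm², extra CUs explain {cu_growth:.0f} mm², "
      f"unexplained {gap:.0f} mm²")
```

    The remainder is the area that the two bullet points above try to account for.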
     
    NightAntilli, Lightman and PSman1700 like this.
  14. PSman1700

    PSman1700 Legend

    Indeed, they're roughly at half the raw power compared to AMD's own dGPUs at launch. Even less compared to NV's stuff.
     
  15. Jawed

    Jawed Legend

    I did a Navi 14 analysis:

    [IMG]

    It's interesting that 5mm² on such a small die is "edges" (perimeter of die). Capacitor ring? What else?

    Some of the "unknown interior elements" (3mm²) appear to be blank die, where no small rectangles of functionality can squeeze in. Not sure...

    I've decided that "global control" is a better description than "uncore": graphics command processor, geometry processor, ACEs, HWS, DMA. I am suspicious that the real global control area is non-rectangular, "leaking" into the centre of the area that I think of as "(shader) engine common". I haven't found a Navi 10 die shot that has the stunning clarity of the Navi 14 die shot, so I can't make comparisons of function blocks...

    I haven't worked out a way to say what's MC and what's L2 in the die shot. I made some assumptions for Navi 10, but I haven't thought of a way to improve those.
     
    Lightman, PSman1700 and BRiT like this.
  16. DegustatoR

    DegustatoR Veteran

    It's not about how many lanes you have, it's about your power consumption. NV moved from 16 to 10nm and even with that they are well above 300W now. AMD doesn't even move anywhere with Navi2, it's the same process, and we already know their ballpark perf/watt gain. Let's be realistic here.
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  17. PSman1700

    PSman1700 Legend

    Yes, but consider that we want, or better said need, AMD to compete at least somewhat, otherwise we will see NV upping their prices again. At least the 20+TF rumor seems very realistic. Which is great, I think; before the NV Ampere unveil we were guessing 18TF tops for next gen graphics processing units.
     
  18. Kaotik

    Kaotik Drunk Member Legend

    Actually we don't know which process it is. We know it's 7nm, but we don't know whether it's enhanced N7P or N7+. It's a different process compared to at least the Xbox SoC, probably the PS5 SoC too (at least for Xbox it seems quite clear the "AMD enhanced 7nm" means the same node as Zen 2 Refresh & Zen 3, which is "enhanced N7"). And I'm thinking it should be a given that it's "N7P or better", since Navi1x were N7P already.
     
    Lightman likes this.
  19. Silent_Buddha

    Silent_Buddha Legend

    If RDNA2 adopts the changes that MS requested (4x int8 and 8x int4) for Anaconda and Lockhart in their CUs then that would provide more flexibility for ML workloads.

    Regards,
    SB
     
    PSman1700 likes this.
  20. Jawed

    Jawed Legend

    I'm trying to be realistic about the use of die area. The only alternative being rumoured for the huge missing area is "massive cache". You have something better? Or do you think it's a monster cache?

    Even a 4096-bit HBM bus in addition to 256-bit GDDR6 leaves a gaping mismatch.

    All of this seems crazy. Clutching at straws, because there's no obviously "realistic" option.

    A 900MHz range in clocks (for non-idle scenarios!) tells us that the GPU will clock down massively when given a particular kind of workload. That seems very likely to be sustained compute, which games are extremely bad at, therefore game clocks will be high.
     