AMD: RDNA 3 Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by Jawed, Oct 28, 2020.

  1. Leoneazzurro5

    Leoneazzurro5 Regular

    The answer to that tweet referred to Nvidia first, then to all next-gen. And IIRC no one said Q1/22 for RDNA3. I'd say everyone has pointed to the 3rd/4th quarter of 2022 since the beginning.
     
  2. LordEC911

    LordEC911 Regular

    The only talk of Q4 '21 or Q1 '22 I can remember is a few months back when the regular clickbait rumor sites were posting about tapeouts.

    Edit- Example from wccftech
     
    Lightman likes this.
  3. With so many not able to buy current generation GPUs, it is going to be very hard to release a refreshed lineup. Especially if it will be just as hard to buy the new GPUs.
     
  4. Leoneazzurro5

    Leoneazzurro5 Regular



    Basically another confirmation of the specifications known so far
     
    Jawed and pjbliverpool like this.
  5. xpea

    xpea Regular

    n33 6nm 128bit gddr6 perf>6900xt
    With 128 bit bus thus half bandwidth? Hard to believe...
     
  6. Bondrewd

    Bondrewd Veteran

    Yes
    The magick of gigacache!
    Granted not much more perf than 6900XT, but the wattage is also way way lower.
     
    Man from Atlantis and Lightman like this.
  7. Leoneazzurro5

    Leoneazzurro5 Regular

    So in the same discussion the leaker says that the low end cards may appear as a N2X 6nm refresh, due in 2023. If this is true, I'd say that will include only the smallest cards.
     
  8. Bondrewd

    Bondrewd Veteran

    No such thing.
     
  9. Jawed

    Jawed Legend

    Would Navi 33 need 256MB of Infinity Cache to exceed 6900XT and compensate for half the GDDR bus width? I don't think so. 128MB, the same as Navi 21, merely clocked higher should be dominant in terms of overall memory efficiency. And if we assume this GPU is aimed at 1440p gaming (instead of 4K) then we could say that this amount of Infinity Cache is comfortable at this performance level.

    6600XT is essentially the same performance as 5700XT. The smaller bus (50%) and die (94%), along with 7% more transistors, give this performance despite using 71% of the power. Infinity Cache is somewhere in the region of 2% of the transistors if we use the 6 transistor per bit rule of thumb and add a bit for supporting hardware.

    We could conclude that the narrower memory bus is allowing Navi 2x to spend more power on die for actual graphics work and that the power consumption of Infinity Cache is practically negligible, since most of those transistors are "idle" at any one time.

    But Navi 3x won't get this Infinity Cache boost. So we come back to 128MB, on its own, as determining the performance of Navi 33.

    I think it's reasonable to expect Navi 33 to be about 320-350mm². This article about TSMC 6nm:

    TSMC Reveals 6 nm Process Technology: 7 nm with Higher Transistor Density (anandtech.com)

    implies 15% more transistors. I suppose that puts it in the region of 21-23B transistors, assuming that some of those extra transistors would be available because of a reduction in GDDR PHY size from 192-bit to 128-bit (saves about 16mm²?). That's about 5B short of the transistor count of Navi 21 (6900XT).

    So the extra transistors available for GPU work, versus Navi 22 (6700XT), is about 5B. So that's not going to take Navi 33 to the equivalent of 64 CUs - only about 52. So that implies much higher clocks, e.g. 3.2GHz+.

    I do expect the ALU:TMU ratio to get doubled in Navi 3x, but that's not going to save a huge amount of transistors. 2% of an "equivalent-CU" saving?

    It seems reasonable that ALU:RA will get halved or quartered in Navi 3x, but I wouldn't be surprised if that eats up all the savings of reduced TMU count. I'm assuming RA math doesn't use TMU math units, merely that data-paths and scheduling (queuing and coalescing memory operations) are largely common. If the TMU math units have been generalised to allow them to also do ray-box and ray-triangle tests, then I guess texturing throughput will continue rising to beyond-crazy levels, just to advance ray-acceleration throughput.

    The big unknown is still the hints of a radically different WGP design. AMD's trend with RDNA has been to increase the quantity of "uncore" per WGP, so I think that would put a strong limit on the increase in "equivalent-CU" count.

    In other words I think brute clocks are going to be more significant than brute SIMD count, in getting to 6900XT performance at 350mm² or less. Extra uncore may well unlock yet better "SIMD IPC".

    We may also see that ">6900XT" actually only refers to ray-tracing performance. In pure rasterisation workloads Navi 33 might fall far short of 6900XT when ALU-limited.
     
    Lightman likes this.
  10. LordEC911

    LordEC911 Regular

    I find it hard to believe that they would cut down the bus on N32.


    So what is the minimum price now? $199 or $299?
    We have obviously seen the last of <$150 dGPUs that are current generation and in production.
     
  11. Bondrewd

    Bondrewd Veteran

    ?
    Should be like $450 or so.
    A wee bit more.
    ?
    The ALU count is identical to N21, which is 5120.
     
  12. I'm only assuming N3x is Q2 2022 because @Bondrewd claimed AMD is on a 6 quarter cadence between GPU families. I don't follow random rumors from wccftech.



    Going by AMD's famous graph of cache usage by target resolution, it does look like 256MB would fit some >75% of memory requests at 4K.

    If 75% of the requests are done at ~2TB/s, then the remaining 25% can probably come at 256GB/s because the effective bandwidth will still end at ~1.6TB/s.
     
    Last edited by a moderator: Sep 19, 2021
    Lightman likes this.
  13. no-X

    no-X Veteran

    One thing may be the targeted cadence, the other is the real situation affected by Covid, the overload at TSMC, mining, etc. That could easily cause a 4-month delay (Navi 23 was already delayed by several months).
     
  14. Then unless nvidia is getting special treatment by TSMC they're getting Lovelace pushed to 2023.
     
    Lightman likes this.
  15. Qesa

    Qesa Newcomer

    If there's a 25% miss rate and DRAM is 256 GB/s, the effective bandwidth can be no higher than 1 TB/s, no matter how fast the bandwidth is on hits.

    If the cache is serving up 2 TB/s from hits and 0.25 TB/s from misses, by definition that's an 89% hit rate.
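
    The disagreement above comes down to which average is being taken. A minimal Python sketch, assuming a simple model in which a fixed fraction of bytes hits the cache and DRAM runs flat out on the misses (the function names are illustrative, not from any AMD material), reproduces the ~1.6 TB/s figure from the earlier post, the 1 TB/s ceiling and the 89% hit rate:

```python
def naive_average(hit_rate, cache_bw, dram_bw):
    """Request-weighted mean of the two bandwidths (the ~1.6 TB/s estimate).

    This overstates throughput because it ignores that DRAM must actually
    deliver every missed byte at its own, much lower, rate.
    """
    return hit_rate * cache_bw + (1 - hit_rate) * dram_bw


def dram_limited_ceiling(hit_rate, dram_bw):
    """Upper bound on total bandwidth when misses are the bottleneck.

    If a fraction (1 - hit_rate) of all bytes must come from DRAM, the total
    stream cannot exceed dram_bw / (1 - hit_rate), however fast hits are.
    """
    return dram_bw / (1 - hit_rate)


def implied_hit_rate(cache_bw, dram_bw):
    """Hit rate implied by cache and DRAM both running at the given rates."""
    return cache_bw / (cache_bw + dram_bw)


# All bandwidths in GB/s.
print(naive_average(0.75, 2000, 256))      # request-weighted figure, ~1564
print(dram_limited_ceiling(0.75, 256))     # ceiling at a 25% miss rate, 1024
print(implied_hit_rate(2000, 250))         # ~0.889
```

    Under these assumptions a 75% hit rate caps the total at about 1 TB/s, and sustaining 2 TB/s of hits alongside 0.25 TB/s of misses needs roughly an 89% hit rate.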
     
  16. trinibwoy

    trinibwoy Meh Legend

    128-bit gddr6 > 6900xt. That would be some impressive voodoo if it’s true at higher resolutions.
     
    DegustatoR and Lightman like this.
  17. Bondrewd

    Bondrewd Veteran

    Still has the N21 limitation of sorta dies at 4k.

    Either way you're not running higher resolutions off an 8GB framebuffer.
     
    trinibwoy likes this.
  18. trinibwoy

    trinibwoy Meh Legend

    Makes sense.
     
  19. Jawed

    Jawed Legend

    Sigh, when I was writing that I felt something was wrong, but couldn't put my finger on it. Bed resolved the problem: I hadn't accounted for bytes! So 8x 2%. ARGH.
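
    The bits-versus-bytes slip is easy to check. A rough Python sketch, assuming the 6600XT (Navi 23) carries 32MB of Infinity Cache and about 11.06B transistors (both are public spec figures, not stated in this thread) and counting only the 6-transistor SRAM cells with no supporting hardware:

```python
MIB = 2**20  # bytes per MB, taking MB in the binary sense


def sram_transistors(capacity_bytes, transistors_per_bit=6):
    """Transistors in the SRAM cells alone, using the 6T-per-bit rule of thumb."""
    return capacity_bytes * 8 * transistors_per_bit


cache_bytes = 32 * MIB        # assumed Infinity Cache size for Navi 23
total_transistors = 11.06e9   # assumed total transistor count for Navi 23

# The slip: treating the capacity as bits, i.e. forgetting the factor of 8.
share_without_bytes = cache_bytes * 6 / total_transistors
# Corrected: 8 bits per byte.
share_with_bytes = sram_transistors(cache_bytes) / total_transistors

print(f"forgetting bytes: {share_without_bytes:.1%}")
print(f"corrected:        {share_with_bytes:.1%}")
```

    With those inputs the uncorrected share is a little under 2% and the corrected share is eight times that, in the mid-teens, before any allowance for tags and control logic.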

    Well, we can subtract 32mm² for 128-bit GDDR6 and apply the ~15% density gain to translate 520mm² on 7nm to about 424mm².

    Perhaps the "new" WGP arrangement means there's only two shader engines, not four. This would reduce the count of ROPs, e.g. to 64. That would save a fair amount of die space... I reckon 32 ROPs are about the same area as a WGP, so just 64 ROPs saves in the region of 10mm². Perhaps 32mm² total saving with only two shader engines?

    So (520-64)/1.15 takes us to 396mm² (ignoring the non-scaling of 128-bit GDDR6).
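
    The two area estimates can be reproduced with a short Python sketch. It assumes, as the post does, that everything left after the subtractions shrinks uniformly by the 15% density gain; the function name is illustrative only:

```python
def shrunk_area(base_area_mm2, removed_mm2, density_gain=0.15):
    """Estimate die area after removing blocks and moving to a denser node.

    Assumes the remaining area scales uniformly with the density gain,
    which ignores analogue blocks such as GDDR6 PHYs that barely shrink.
    """
    return (base_area_mm2 - removed_mm2) / (1 + density_gain)


# 520mm² on 7nm, minus 32mm² for the narrower GDDR6 interface.
print(round(shrunk_area(520, 32), 1))
# Also dropping two shader engines: a further 32mm², so 64mm² in total.
print(round(shrunk_area(520, 64), 1))
```

    These come out at roughly 424mm² and 396.5mm², matching the figures quoted above.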

    If this is really an 8GB card then it seems as if it would need to be positioned as a 1080p card, "7600XT".

    I think power consumption is actually a bigger problem than performance, if this is really a 1080p card and around 150W.
     
    Lightman likes this.
  20. Bondrewd

    Bondrewd Veteran

    Yep!
    Sorta-kinda there.
    Unfortunately yes, clamshells are wildly impractical for anything resembling a mainstream GPU and 24Gb DRAMs are DDR5 only for now.
     