AMD: RDNA 3 Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by Jawed, Oct 28, 2020.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,286
    Likes Received:
    1,551
    Location:
    London
    So RDNA 2 has no chiplets. What about RDNA 3?

    If it's based on chiplets, will there be Infinity Cache?

    I've been theorising chiplets for a very long time. I don't want to be disappointed this time!
     
    no-X, Lightman and BRiT like this.
  2. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    922
    Likes Received:
    361
    128MB is identical to ThreadRipper's L3, make of it what you want. ;)
    That there is now an actually used (Smart Memory) IF link on the GPU must mean something for the future, I guess.
     
    Lightman and BRiT like this.
  3. ethernity

    Newcomer

    Joined:
    May 1, 2018
    Messages:
    88
    Likes Received:
    207
    I will help you with the roadmap. Looks like RDNA 3 would be coming around H1 2022.
    This time with a new node, too.

    [Attachment: upload_2020-10-28_19-41-48.png]
     
    Cyan and Jawed like this.
  4. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    382
    Likes Received:
    345
    Previously on rumor mill: GCD + MCD. AMD also touted "X3D packaging" before.

    What if an MCD is a base die of more Infinity Cache with stacked memory above it? Then multiple MCDs are connected via next-gen on-package Infinity Fabric I/O to the monolithic GCD, which now has more space to spare to pack in even more CUs.

    Then the MCD can be reused for different GPU compute dies across the stack. Mobile APUs too, pretty please?

    :runaway:
     
    #4 pTmdfx, Oct 28, 2020
    Last edited: Oct 28, 2020
    Jawed likes this.
  5. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,977
    Likes Received:
    935
    Location:
    Somewhere over the ocean
    If I've understood correctly, she implied that RDNA 3 will be preceded by a node shrink of RDNA 2, so maybe late 2022?
     
  6. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    822
    Likes Received:
    616
    With how much cache they are using, at some point they're better off using DRAM. I wonder if we will ever get eDRAM caches like IBM does. They could probably fit something like 1GB of cache on next gen if they used eDRAM.
     
  7. SimBy

    Regular Newcomer

    Joined:
    Jun 21, 2008
    Messages:
    700
    Likes Received:
    391
    I'm not sure anyone has explained why they went with 128MB? Is that the sweet spot? Is more actually better? Also, I'm guessing SRAM shrinks pretty damn well with a node shrink, much better than a memory interface does. So pretty damn forward-looking too.
     
  8. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    296
    Likes Received:
    37
    Location:
    Herwood, Tampere, Finland
    No.

    eDRAM does not scale well as manufacturing technology shrinks, and new CMOS logic mfg processes do not support it at all.

    eDRAM is a thing of the past. IBM will also not be using it in Power10.
     
    Lightman, Jawed, Erinyes and 3 others like this.
  9. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    296
    Likes Received:
    37
    Location:
    Herwood, Tampere, Finland
    No "chiplets", unless they move into 3D packaging with the memory controller/IO die below and GPU die above.

    There is no longer a good way of splitting a GPU onto multiple dies. All parts of the GPU need very high bandwidth to the memory and/or to other parts of the GPU (much higher bandwidth than CPUs need).

    The required number of wires between the dies could not reasonably or cost-efficiently be made with packaging technologies similar to what they use in Ryzen and EPYC processors. And using an interposer like Fury and Vega do is also quite expensive.

    And AFAIK AMD does not have access to any EMIB-like packaging technology.

    But even if they did move the memory controller, other IO, and the Infinity Cache to another die below the main die, they would face the dilemma of which mfg tech to use for that chip:

    eDRAM does not work at all on new mfg processes.
    SRAM wants to be made on as new a mfg tech as possible to be dense.
    PHYs want to be made on old mfg tech to be cheaper, as they do not scale well.

    Ok, theoretically there is the option of using a very old process for the IO die and eDRAM, but then they would be stuck with obsolete tech.
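    To give a rough sense of scale for the wire-count argument, here's a back-of-the-envelope sketch. All numbers (2 TB/s link, 16 Gb/s and 5 Gb/s per-wire rates) are illustrative assumptions, not actual specs of any AMD product:

```python
def wires_needed(bandwidth_gb_per_s: float, per_wire_gbit_per_s: float) -> float:
    """Signal wires needed (one direction) to carry a given bandwidth.

    bandwidth_gb_per_s: target link bandwidth in gigabytes per second (assumed).
    per_wire_gbit_per_s: signaling rate per wire in gigabits per second (assumed).
    """
    return bandwidth_gb_per_s * 8 / per_wire_gbit_per_s

# A hypothetical 2 TB/s GPU memory link at a SerDes-like 16 Gb/s per wire:
print(wires_needed(2000, 16))  # 1000.0 wires, each direction
# The same link at an organic-substrate-friendly ~5 Gb/s per wire needs far more:
print(wires_needed(2000, 5))   # 3200.0 wires
```

    The point of the toy calculation: GPU-class bandwidth at substrate-friendly per-pin rates demands thousands of wires, which is what pushes the discussion toward interposers or EMIB/LSI-style bridges.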
     
    #9 hkultala, Oct 28, 2020
    Last edited: Oct 28, 2020
  10. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    382
    Likes Received:
    345
    TSMC offers Local Si Interconnect (LSI), which seems to be an EMIB competitor.
     
    Lightman and Alexko like this.
  11. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    872
    Likes Received:
    204
    Location:
    'Zona
    My guess is it scales with the 4 SEs or with the 8 32-bit IMCs.
     
    Lightman likes this.
  12. Cyan

    Cyan orange
    Legend Veteran

    Joined:
    Apr 24, 2007
    Messages:
    9,374
    Likes Received:
    3,053
    I shall be following this thread, but it looks like my next GPU is going to be an AMD one; it's about time. I am in saving mode already.
     
    Lightman likes this.
  13. vjPiedPiper

    Newcomer

    Joined:
    Nov 23, 2005
    Messages:
    118
    Likes Received:
    72
    Location:
    Melbourne Aus.
    As soon as I saw the 128MB Infinity Cache, I thought it would be a natural precursor to a chiplet-like GPU arch.
    They claim 1.66 TB/s effective bandwidth for the 128MB, so I think that a chiplet with at least 128MB of Infinity Cache, connected to a central IO die, which is mostly the DDR controller, video enc/dec block, video output, and a bit of control/management stuff, would work quite well. Make the CPU chiplet -> IO die Infinity Fabric v3, and you're done.

    Make each chiplet 40 CUs + 128MB, and you can easily scale any design from 40 up to 160 CUs.
    And at 40 CUs per 128MB you get way more out of cache locality.

    But I'm not exactly a GPU expert, so there is probably a LOT I am missing here...
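    The "effective bandwidth" figure being discussed can be sketched as a hit-rate-weighted average of cache and DRAM bandwidth. This is a toy model with illustrative numbers (a ~58% hit rate, 2000 GB/s cache bandwidth, 512 GB/s GDDR6), not AMD's exact methodology:

```python
def effective_bandwidth(hit_rate: float, cache_bw: float, dram_bw: float) -> float:
    """Blend of cache and DRAM bandwidth (GB/s), weighted by cache hit rate."""
    return hit_rate * cache_bw + (1.0 - hit_rate) * dram_bw

# Hypothetical figures: 58% hit rate, 2000 GB/s cache, 512 GB/s GDDR6.
print(effective_bandwidth(0.58, 2000.0, 512.0))  # ≈ 1375 GB/s
```

    Under a model like this, a chiplet's local cache only helps to the extent its slice of the working set actually hits; a lower per-chiplet hit rate pulls the blended figure back toward raw DRAM bandwidth.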
     
  14. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    697
    Likes Received:
    382
    6nm refresh next year? They've got room to play with higher TDPs thanks to Nvidia's craziness.
     
  15. yuri

    Regular Newcomer

    Joined:
    Jun 2, 2010
    Messages:
    272
    Likes Received:
    280
    A shrink is quite probable given the MacOS leak suggests Navi 31 and Navi 21 configurations are identical.
     
    Lightman and SimBy like this.
  16. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    296
    Likes Received:
    37
    Location:
    Herwood, Tampere, Finland
    Yes, you are missing a lot here:

    1) Where to put the ROPs?
    2) The control logic should be close enough to the cores.
    3) This kind of architecture would mean L3 caches near the compute dies and an L4 cache on the memory controller die. But this would be VERY inefficient in terms of hit rate per total cache, for multiple reasons:
    a) Multiple cores operating on the same triangle or on nearby triangles will access the same area of the framebuffer, which wants to be in ONE cache, not multiple split ones.
    b) An L4 cache of similar size to the L3 cache would be quite useless. Unless it's a victim cache, either you hit both or you hit neither.
    4) What manufacturing tech to use for the IO die? SRAM cache wants to be made on NEW, dense mfg tech, PHYs on old, cheap mfg tech. The big gain in Zen 2 Matisse comes from the old mfg tech of the IO die, which does not make sense if you have an SRAM L3 cache there.





    Moore's law is going in exactly the OTHER direction from "chiplets". We can afford to have MORE functionality in one die. MCMs were a good idea with the Pentium Pro in 1995, and multiple chips were a good thing in Voodoo 1 and Voodoo 2 in 1997 and 1998. Since then, Moore's law has made them mostly obsolete.
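    Point 3a can be illustrated with a toy LRU simulation: four "shader engines" whose working sets overlap on a common framebuffer region get a better hit rate from one shared cache than from four private slices of the same total capacity, because the hot shared tiles must be duplicated in every slice. All sizes and ratios here are made up for illustration:

```python
import random
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache that tracks hits and total accesses."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = self.accesses = 0

    def access(self, tag) -> None:
        self.accesses += 1
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)          # refresh recency
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)   # evict least recently used
            self.lines[tag] = True

    @property
    def hit_rate(self) -> float:
        return self.hits / self.accesses

random.seed(42)
CORES, SHARED_TILES, PRIVATE_TILES, TOTAL_CAP = 4, 50, 150, 400

shared_cache = LRUCache(TOTAL_CAP)
split_caches = [LRUCache(TOTAL_CAP // CORES) for _ in range(CORES)]

for _ in range(50_000):
    core = random.randrange(CORES)
    # 25% of each core's accesses touch framebuffer tiles common to all cores.
    if random.random() < SHARED_TILES / (SHARED_TILES + PRIVATE_TILES):
        tag = ("shared", random.randrange(SHARED_TILES))
    else:
        tag = ("private", core, random.randrange(PRIVATE_TILES))
    shared_cache.access(tag)
    split_caches[core].access(tag)  # shared tiles get duplicated per slice

split_rate = sum(c.hits for c in split_caches) / sum(c.accesses for c in split_caches)
print(f"shared cache hit rate: {shared_cache.hit_rate:.2f}")
print(f"split caches hit rate: {split_rate:.2f}")
# The shared cache wins: hot shared tiles live in one copy instead of four.
```

    In this toy setup the single shared cache ends up well ahead of the four private slices, which is the "same framebuffer region wants to be in ONE cache" argument in miniature.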
     
    vjPiedPiper likes this.
  17. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    808
    Likes Received:
    276
    It's also identical to the 128MB L4 in Crystalwell but I see no relevance of that to N21/31 really. The CPU-GPU IF link is possibly used for Smart Memory. Even otherwise, it will be more relevant for CDNA.
    I was curious about this even when they first presented the roadmap, but it's odd that they haven't mentioned the node, whereas for Zen 4 they explicitly say it's on 5nm. So does that mean RDNA 3 is NOT on 5nm?
    Aside from the obvious die area concerns, more cache certainly does consume more power. So yeah, I'm sure they arrived at what is likely a sweet spot in terms of PPA. It should be enough for the foreseeable future.
    TSMC also has SoIC and a host of other packaging technologies all coming online 2021 and beyond. Interesting times for sure!
    6nm would be an easy die shrink, as it has the same design rules as 7nm DUV (assuming that N21 is on N7 and not N7+), so that is definitely a possibility. Given that they've just introduced a whole host of new tech in Navi 2x, Navi 3x could be just a die shrink while they focus R&D on the next gen. There's also the matter of 5nm yields taking time to reach levels sufficient for large GPUs, and possibly AMD deciding to use 5nm exclusively for CPUs initially.
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,286
    Likes Received:
    1,551
    Location:
    London
    Hmm, a two-year wait does seem likely... RDNA 2 is two years after RDNA was supposed to launch (and then the whole Vega 7 fiasco happened).

    Gulp, this could be very annoying: Navi 3x is refreshed RDNA 2. ARGH.
     
  19. giannhs

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    42
    Likes Received:
    46
    Correct me if I'm wrong, but can't they use that 3D mumbo jumbo from Xilinx?
     
  20. Leoneazzurro5

    Newcomer

    Joined:
    Aug 18, 2020
    Messages:
    226
    Likes Received:
    249
    They probably will, with time.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.