AMD: RDNA 3 Speculation, Rumours and Discussion

Status
Not open for further replies.

Jawed

Legend
So RDNA 2 has no chiplets. What about RDNA 3?

If it's based on chiplets, will there be Infinity Cache?

I've been theorising chiplets for a very long time. I don't want to be disappointed this time!
 
128MB is identical to Threadripper's L3; make of it what you want. ;)
That there is now an actually used (Smart Memory) IF link on the GPU must mean something for the future, I guess.
 
So RDNA 2 has no chiplets. What about RDNA 3?

If it's based on chiplets, will there be Infinity Cache?

I've been theorising chiplets for a very long time. I don't want to be disappointed this time!

I will help you with the roadmap. Looks like RDNA 3 would be coming around H1 2022.
This time with a new node too.

[Attached image: AMD GPU roadmap slide, upload_2020-10-28_19-41-48.png]
 
Previously on rumor mill: GCD + MCD. AMD also touted "X3D packaging" before.

What if an MCD is a base die of more Infinity Cache with stacked memory above it? Then multiple MCDs connect via next-gen on-package Infinity Fabric I/O to the monolithic GCD, which now has more space free to pack in even more CUs.

Then the MCD can be reused for different GPU compute dies across the stack. Mobile APUs too, pretty please?

:runaway:
 
If I've understood correctly, she implied that RDNA 3 will be preceded by a node shrink of RDNA 2, so maybe late 2022?
 
With how much cache they are using, at some point they are better off using DRAM. I wonder if we will ever get eDRAM caches like IBM uses. They could probably fit something like 1GB of cache on next gen if they used eDRAM.
 
With how much cache they are using, at some point they are better off using DRAM. I wonder if we will ever get eDRAM caches like IBM uses. They could probably fit something like 1GB of cache on next gen if they used eDRAM.

I'm not sure anyone has explained why they went with 128MB? Is that the sweet spot? Is more actually better? Also, I'm guessing SRAM shrinks pretty well with node shrinks, much better than memory interfaces do, so it's pretty forward-looking too.
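A quick way to see why the hit rate matters for that question is a weighted-average bandwidth model. The numbers below are illustrative assumptions, not confirmed specs: 512 GB/s is Navi 21's 256-bit GDDR6 figure, ~58% is the 4K hit rate AMD quoted, and the 2 TB/s on-die cache bandwidth is a round guess.

```python
def effective_bandwidth(hit_rate, cache_bw_gbs, dram_bw_gbs):
    """Weighted-average model: requests are served at cache speed on a
    hit and at DRAM speed on a miss. A simplification -- real behaviour
    depends on access patterns, latency hiding, and write traffic."""
    return hit_rate * cache_bw_gbs + (1.0 - hit_rate) * dram_bw_gbs

# Illustrative numbers only: 512 GB/s GDDR6, assumed 2000 GB/s cache.
for hit in (0.0, 0.58, 0.75):
    print(f"hit {hit:.0%}: {effective_bandwidth(hit, 2000, 512):.0f} GB/s")
```

Hit rate grows only slowly with capacity once the working set mostly fits, so past some size extra megabytes buy very little, which is one plausible reason 128MB landed where it did.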
 
With how much cache they are using, at some point they are better off using DRAM. I wonder if we will ever get eDRAM caches like IBM uses. They could probably fit something like 1GB of cache on next gen if they used eDRAM.

No.

eDRAM does not scale well as manufacturing processes shrink, and new CMOS logic processes do not support it at all.

eDRAM is a thing of the past. IBM is also not using it in POWER10.
 
So RDNA 2 has no chiplets. What about RDNA 3?

If it's based on chiplets, will there be Infinity Cache?

I've been theorising chiplets for a very long time. I don't want to be disappointed this time!

No "chiplets", unless they move into 3D packaging with the memory controller/IO die below and GPU die above.

There is no longer a good way of splitting a GPU across multiple dies. All parts of the GPU need very high bandwidth to the memory and/or to other parts of the GPU (much higher bandwidth than CPUs need).

The required number of wires between the dies could not be made (reasonably or cost-efficiently) with packaging technologies similar to those used in the Ryzen and EPYC processors. And using an interposer, as Fury and Vega do, is also quite expensive.

And AFAIK AMD does not have access to any EMIB-like packaging technology.

But even if they did move the memory controller, the other IO and the Infinity Cache to another die below the main die, they would face a dilemma over which manufacturing tech to use for that chip:

eDRAM does not work at all on new processes.
SRAM wants to be made on the newest process possible, to be dense.
PHYs want to be made on an old process to be cheap, as they do not scale well.

OK, theoretically there is the option of using a very old process for the IO die and eDRAM, but that would mean being stuck with obsolete tech.
 
I'm not sure anyone has explained why they went with 128MB? Is that the sweet spot? Is more actually better? Also, I'm guessing SRAM shrinks pretty well with node shrinks, much better than memory interfaces do, so it's pretty forward-looking too.

It scales with the 4 SEs or with the 8 32-bit IMCs, would be my guess.
 
I shall be following this thread, but it looks like my next GPU is going to be an AMD one; it's about time. I am in saving mode already.
 
As soon as I saw the 128MB Infinity Cache, I thought it would be a natural precursor to a chiplet-like GPU arch.
They claim 1.66 TB/s effective bandwidth for the 128MB, so I think a chiplet with at least 128MB of Infinity Cache, connected to a central IO die (which is mostly the DDR controller, video enc/dec block, video output, and a bit of control/management stuff) would work quite well. Make the chiplet-to-IO-die link Infinity Fabric v3, and you're done.

Make each chiplet 40 CUs + 128MB, and you can easily scale any design from 40 up to 160 CUs.
And at 40 CUs per 128MB you get far more out of cache locality.

But I'm not exactly a GPU expert, so there is probably a LOT I am missing here...
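The scaling idea reduces to simple arithmetic. A minimal sketch, assuming the poster's hypothetical 40 CU + 128 MB building block (not any confirmed AMD configuration):

```python
CUS_PER_CHIPLET = 40        # hypothetical building block from the post
CACHE_PER_CHIPLET_MB = 128  # Infinity Cache per chiplet (assumed)

def config(n_chiplets):
    """Total CUs and total Infinity Cache for an n-chiplet GPU."""
    return n_chiplets * CUS_PER_CHIPLET, n_chiplets * CACHE_PER_CHIPLET_MB

for n in range(1, 5):
    cus, cache = config(n)
    print(f"{n} chiplet(s): {cus} CUs, {cache} MB Infinity Cache")
```

Four chiplets reach the 160 CU / 512 MB top of the range mentioned above; whether the interconnect can feed that is the open question the replies dig into.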
 
As soon as I saw the 128MB Infinity Cache, I thought it would be a natural precursor to a chiplet-like GPU arch.
They claim 1.66 TB/s effective bandwidth for the 128MB, so I think a chiplet with at least 128MB of Infinity Cache, connected to a central IO die (which is mostly the DDR controller, video enc/dec block, video output, and a bit of control/management stuff) would work quite well. Make the chiplet-to-IO-die link Infinity Fabric v3, and you're done.

Make each chiplet 40 CUs + 128MB, and you can easily scale any design from 40 up to 160 CUs.
And at 40 CUs per 128MB you get far more out of cache locality.

But I'm not exactly a GPU expert, so there is probably a LOT I am missing here...

Yes, you are missing a lot here:

1) Where do you put the ROPs?
2) The control logic needs to be close to the cores.
3) This kind of architecture would mean L3 caches near the compute dies and an L4 cache on the memory controller die. But this would be VERY inefficient in terms of hit rate per total cache, for multiple reasons:
a) Multiple cores operating on the same triangle, or on nearby triangles, will access the same area of the framebuffer, which wants to live in ONE cache, not in multiple split ones.
b) An L4 cache of similar size to the L3 cache would be quite useless: unless it is a victim cache, you either hit both or hit neither.
4) Which manufacturing tech do you use for the IO die? SRAM cache wants to be made on a NEW, dense process; PHYs on an old, cheap one. The big gain in Zen 2 Matisse comes from the old process of the IO die, which makes no sense if you put an SRAM L3 cache there.

Moore's law is going in exactly the OTHER direction from "chiplets": we can afford to put MORE functionality on one die. MCMs were a good idea with the Pentium Pro in 1995, and multiple chips were a good thing in Voodoo 1 and Voodoo 2 in 1997 and 1998. Since then, Moore's law has made them mostly obsolete.
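Point 3a above can be put in numbers with a toy model: if a fraction of each split cache holds framebuffer lines that every other chiplet's cache also needs, only one logical copy of those lines adds capacity. The 50% duplication figure below is an arbitrary assumption for illustration, not a measured value.

```python
def effective_capacity_mb(n_caches, size_each_mb, shared_fraction):
    """Toy model: 'shared_fraction' of each cache holds data duplicated
    in every other cache (e.g. framebuffer tiles touched by neighbouring
    triangles), so only one copy counts toward effective capacity."""
    private = (1.0 - shared_fraction) * size_each_mb * n_caches
    shared = shared_fraction * size_each_mb  # one logical copy
    return private + shared

one_big = effective_capacity_mb(1, 128, 0.5)    # unified 128 MB cache
four_split = effective_capacity_mb(4, 32, 0.5)  # 4 x 32 MB, 50% duplicated
print(one_big, four_split)
```

Under these assumptions the four split caches behave like only 80 MB despite totalling 128 MB, which is the hit-rate inefficiency the post is pointing at.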
 
128MB is identical to Threadripper's L3; make of it what you want. ;)
That there is now an actually used (Smart Memory) IF link on the GPU must mean something for the future, I guess.

It's also identical to the 128MB L4 in Crystalwell, but I see no real relevance of that to N21/N31. The CPU-GPU IF link is possibly used for Smart Memory; even otherwise, it will be more relevant for CDNA.
I will help you with the roadmap. Looks like RDNA 3 would be coming around H1 2022.
This time with a new node too.

[Attachment 4834: AMD GPU roadmap slide]

I was curious about this even when they first presented the roadmap, but it's odd that they haven't mentioned the node, whereas for Zen 4 they explicitly say it's on 5nm. So does that mean RDNA 3 is NOT on 5nm?
I'm not sure anyone has explained why they went with 128MB? Is that the sweet spot? Is more actually better? Also, I'm guessing SRAM shrinks pretty well with node shrinks, much better than memory interfaces do, so it's pretty forward-looking too.

Aside from the obvious die-area concerns, more cache certainly consumes more power. So yeah, I'm sure they arrived at what is likely a sweet spot in terms of PPA. It should be enough for the foreseeable future.
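For the die-area side of that PPA trade-off, a back-of-envelope sketch: the 0.027 µm² figure is TSMC's reported N7 high-density 6T SRAM bitcell, while the 2x array-overhead factor (tags, sense amps, routing) is a rough assumption of mine.

```python
BITCELL_UM2_N7 = 0.027  # TSMC N7 high-density 6T SRAM bitcell (reported)
ARRAY_OVERHEAD = 2.0    # rough assumption: tags, sense amps, routing

def sram_area_mm2(megabytes):
    """Approximate die area for an SRAM array of the given capacity."""
    bits = megabytes * 1024 * 1024 * 8
    return bits * BITCELL_UM2_N7 * ARRAY_OVERHEAD / 1e6  # um^2 -> mm^2

for mb in (128, 256, 512):
    print(f"{mb} MB ~ {sram_area_mm2(mb):.0f} mm^2")
```

Even with generous overhead, 128 MB lands in the tens of mm² on N7, but doubling or quadrupling it quickly eats a large slice of a ~520 mm² die, which supports the sweet-spot reading.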
TSMC offers LSI (Local Si Interconnect), which seems to be an EMIB competitor.

TSMC also has SoIC and a host of other packaging technologies all coming online in 2021 and beyond. Interesting times for sure!
6nm refresh next year? They've got room to play with higher TDPs thanks to Nvidia's craziness.

6nm would be an easy die shrink as it has the same design rules as 7nm DUV (assuming that N21 is on N7 and not N7+) so that is definitely a possibility. Given that they've just introduced a whole host of new tech in Navi 2x, Navi 3x could be just a die shrink while they focus R&D towards the next gen. There's also the matter of 5nm yields taking time to reach sufficient levels to be used in large GPUs and possibly AMD deciding to use 5nm exclusively for CPUs initially.
 
If I've understood correctly, she implied that RDNA 3 will be preceded by a node shrink of RDNA 2, so maybe late 2022?
Hmm, a two-year wait does seem likely... RDNA 2 is two years after RDNA was supposed to launch (and then the whole Vega 7 fiasco happened).

A shrink is quite probable, given the macOS leak suggests Navi 31 and Navi 21 configurations are identical.
Gulp, this could be very annoying: Navi 3x is refreshed RDNA 2. ARGH.
 
No "chiplets", unless they move into 3D packaging with the memory controller/IO die below and GPU die above.

There is no longer a good way of splitting a GPU across multiple dies. All parts of the GPU need very high bandwidth to the memory and/or to other parts of the GPU (much higher bandwidth than CPUs need).

The required number of wires between the dies could not be made (reasonably or cost-efficiently) with packaging technologies similar to those used in the Ryzen and EPYC processors. And using an interposer, as Fury and Vega do, is also quite expensive.

And AFAIK AMD does not have access to any EMIB-like packaging technology.

But even if they did move the memory controller, the other IO and the Infinity Cache to another die below the main die, they would face a dilemma over which manufacturing tech to use for that chip:

eDRAM does not work at all on new processes.
SRAM wants to be made on the newest process possible, to be dense.
PHYs want to be made on an old process to be cheap, as they do not scale well.

OK, theoretically there is the option of using a very old process for the IO die and eDRAM, but that would mean being stuck with obsolete tech.
Correct me if I'm wrong, but can't they use that 3D mumbo jumbo from Xilinx?
 