AMD: RDNA 3 Speculation, Rumours and Discussion

More like 30% denser. Or this is specific to AMD's SRAM implementation.
At IEDM 2019, the 5nm process was quoted as having a 1.84x logic density improvement versus a 1.35x SRAM density improvement.

We only really have two mobile SoCs to go by when it comes to what this means for actual products, and their improvement in overall transistor density was roughly 50%. What this means for HP designs is not clear.
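As a quick sanity check on why the blended figure lands around ~1.5x: a die is a mix of logic, SRAM and analog, and the overall gain is just the area-weighted combination of the per-block scaling factors. A minimal back-of-envelope sketch in Python, where the logic/SRAM area split is purely an assumed mix for illustration (the 1.84x/1.35x numbers are the IEDM ones quoted above):

```python
# Back-of-envelope: blended density gain for a die that is part logic, part SRAM.
# The 1.84x (logic) and 1.35x (SRAM) factors are the quoted IEDM 2019 numbers;
# the area split below is purely an assumption for illustration.
LOGIC_GAIN = 1.84
SRAM_GAIN = 1.35

def blended_gain(logic_area_fraction: float) -> float:
    """Overall density gain for a die with the given logic/SRAM area mix."""
    sram_area_fraction = 1.0 - logic_area_fraction
    new_relative_area = logic_area_fraction / LOGIC_GAIN + sram_area_fraction / SRAM_GAIN
    return 1.0 / new_relative_area

for frac in (0.5, 0.6, 0.7):
    print(f"{frac:.0%} logic by area -> {blended_gain(frac):.2f}x overall")
# ~1.56x / ~1.61x / ~1.66x, i.e. in the ballpark of the ~1.5x seen in those
# mobile SoCs once analog/IO blocks (which scale even worse) are factored in.
```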
 
GCDs not having IMCs is pretty obvious, given that the whole point of chiplets is to put the badly scaling analog blocks on the larger, cheaper process. Furthermore, it would be a clusterfuck to have the L3 on the MCD and then have traffic from it interleave back through the GCDs to DRAM; it's utter nonsense.
 
GCDs not having IMCs is pretty obvious, given that the whole point of chiplets is to put the badly scaling analog blocks on the larger, cheaper process. Furthermore, it would be a clusterfuck to have the L3 on the MCD and then have traffic from it interleave back through the GCDs to DRAM; it's utter nonsense.
Like the 3D V-cache you mean?
 
Like the 3D V-cache you mean?

With V-cache the cache is stacked on top of the other cache, so the data paths are the shortest possible. If the GCD had its own memory interface and the stacked cache had to be positioned on the other side of the die, like in your sketch, you would add a lot of distance to the data paths, which would limit frequency and add power consumption.
 
Like the 3D V-cache you mean?
As mentioned, V-cache is an extension of the existing L3 - it's just additional banks and zero change to the data flow. That's not what's happening with the MCD. Incidentally, I believe AMD will at some point move to a giant stacked L4, because a cache that has to be centralised for coherency cannot be anything other than an L4.
 
We've discussed the active bridge chiplet before:

ACTIVE BRIDGE CHIPLET WITH INTEGRATED CACHE - ADVANCED MICRO DEVICES, INC. (freepatentsonline.com)

and we have seen other patent documents:

https://forum.beyond3d.com/posts/2212201/

that relate to distributed tasks and performing DMA operations across distributed processors and their respective PHYs.

MCD at around 300mm²:
It should be on the MCD.
I imagine AMD would take the best of both worlds: an N5P GCD for absolute logic density and performance, and an N6 MCD with HD/SRAM-optimized libraries for a lower cost per MB of IC.
The N5P SRAM density gain over N7/N6 is very mediocre.
512MB of SRAM on N6 with optimized libraries would only be 280-300mm² (figures estimated from WikiChip data, behind paywall). On N5 it's hardly any better, around 250+mm², but much costlier.
But all those logic blocks can scale very well, almost 1.48x with N5P (assuming AMD goes with N5P for GPUs, else 1.85x on plain N5).
I suppose 2x GCD + 1x MCD would be closing in on around 1000mm², or maybe even more. It will cost a pretty penny.

will not have enough perimeter for 256-bit GDDR6, all the other GPU IO, and 2x 2TB/s (guess) L3 interfaces to each GCD.
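FWIW, here is the arithmetic behind that kind of MCD ballpark, with the effective bit density (array overhead included) written out as an explicit assumption rather than the paywalled WikiChip figure - a rough sketch, not a floorplan estimate:

```python
# Rough MCD SRAM-area ballpark. The effective nm^2-per-bit values below are
# assumptions (raw cell plus array/periphery overhead), not measured figures.
MIB_BITS = 8 * 2**20                               # bits per MiB

def sram_area_mm2(capacity_mib: int, eff_bit_area_nm2: float) -> float:
    """Die area for the SRAM arrays alone, at a given effective nm^2 per bit."""
    return capacity_mib * MIB_BITS * eff_bit_area_nm2 / 1e12   # 1 mm^2 = 1e12 nm^2

for label, eff in (("optimistic", 55_000), ("V-cache-like", 67_000)):
    print(f"512 MB @ {eff:,} nm^2/bit ({label}): ~{sram_area_mm2(512, eff):.0f} mm^2")
# ~236 mm^2 and ~288 mm^2 - so the 280-300 mm^2 ballpark already implies roughly
# V-cache-class effective density, before any PHYs, fabric or other logic are added.
```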
 
Err, I was basing the "high density SRAM" off: https://www.anandtech.com/show/15219/early-tsmc-5nm-test-chip-yields-80-hvm-coming-in-h1-2020

Was this wrong? I just assumed the calculations for SRAM density at 5nm were right and never bothered to check. But 128MB of SRAM from this is what, just over 20 mm^2, so... not huge?
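For reference, the "just over 20 mm^2" figure falls straight out of the published N5 HD bit cell size (roughly 21000 nm^2) if you ignore all array overhead - a quick sketch of that back-of-envelope, nothing more:

```python
# Reproducing the "just over 20 mm^2" number: 128 MB at the published N5 HD
# SRAM bit cell size, counting raw cells only (no periphery, redundancy, routing).
bits = 128 * 8 * 2**20            # 128 MB of SRAM, in bits
cell_nm2 = 21_000                 # N5 HD bit cell, ~0.021 um^2
area_mm2 = bits * cell_nm2 / 1e12
print(f"~{area_mm2:.1f} mm^2")    # ~22.5 mm^2 of raw cell area
```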

N7 has an HD SRAM cell size of 27000 nm^2, so you're looking at a ~28% density improvement on paper.

Critically, this density is never even close to being achieved IRL. With their Zen 3 V-cache, AMD fit 64 MB of L3$ into a 36 mm^2 die. That's ~67000 nm^2 per bit, less than half the theoretical density. And, also per AMD, this was about twice the density of the L3 on the Zen 3 CCD and RDNA2.
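Spelling out that arithmetic (all the inputs are the figures quoted above):

```python
# Checking the V-cache density claim: 64 MB of L3 in a 36 mm^2 die on N7.
bits = 64 * 8 * 2**20                  # 64 MB in bits
die_nm2 = 36 * 1e12                    # 36 mm^2 expressed in nm^2
eff_bit_area = die_nm2 / bits          # effective footprint per bit, everything included
n7_hd_cell = 27_000                    # published N7 HD bit cell, nm^2
print(f"effective: {eff_bit_area:,.0f} nm^2/bit")                       # ~67,000
print(f"raw cell share of footprint: {n7_hd_cell / eff_bit_area:.0%}")  # ~40%
# The raw cell is only ~40% of the per-bit footprint, hence "less than half
# the theoretical density" once periphery, routing and other overhead are included.
```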
 
N7 has an HD SRAM cell size of 27000 nm^2, so you're looking at a ~28% density improvement on paper.

Critically, this density is never even close to being achieved IRL. With their Zen 3 V-cache, AMD fit 64 MB of L3$ into a 36 mm^2 die. That's ~67000 nm^2 per bit, less than half the theoretical density. And, also per AMD, this was about twice the density of the L3 on the Zen 3 CCD and RDNA2.

So while TSMC provides standard reference libraries, the actual implementation is a question mark. Thus any estimate of SRAM size on RDNA3 is... kind of an open question.

Well, so much for early cost estimations then. Thanks for the info.
 
Greetings to all members of beyond3d forum.
.
Ehh, ballpark ~440mm^2, but it's also less mem than N22.
Feasible for 450 bucks.
So from your post I assume Navi 33 has only a 128-bit GDDR6 bus, right?
The question is, who would want to buy Navi 33 for at least $450 with not even 12GB of VRAM next year? Even if it performs like an RX 6900 XT, with only 8GB of VRAM it's a hard sell in my opinion.
 
Greetings to all members of beyond3d forum.
.

So from your post I assume Navi 33 has only a 128-bit GDDR6 bus, right?
The question is, who would want to buy Navi 33 for at least $450 with not even 12GB of VRAM next year? Even if it performs like an RX 6900 XT, with only 8GB of VRAM it's a hard sell in my opinion.
Make it 16 gigs then? GDDR6 supports clamshelling.
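A minimal sketch of the capacity math behind that, assuming the rumoured 128-bit bus and the common 16Gb GDDR6 die size:

```python
# GDDR6 capacity on a 128-bit bus: each device normally sits on its own 32-bit
# channel; clamshell mode pairs two devices per channel, each running x16.
BUS_WIDTH_BITS = 128        # rumoured Navi 33 bus width (assumption)
DEVICE_WIDTH_BITS = 32      # one GDDR6 device per channel in normal mode
DEVICE_CAPACITY_GBIT = 16   # common GDDR6 die size, 16 Gb = 2 GB

devices = BUS_WIDTH_BITS // DEVICE_WIDTH_BITS
normal_gb = devices * DEVICE_CAPACITY_GBIT // 8
clamshell_gb = 2 * normal_gb           # twice the devices, same bus width
print(f"normal: {normal_gb} GB, clamshell: {clamshell_gb} GB")   # 8 GB vs 16 GB
```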
 
Err, I was basing the "high density SRAM" off: https://www.anandtech.com/show/15219/early-tsmc-5nm-test-chip-yields-80-hvm-coming-in-h1-2020

Was this wrong? I just assumed the calculations for SRAM density at 5nm were right and never bothered to check. But 128MB of SRAM from this is what, just over 20 mm^2, so... not huge?
I was going by this, which seems to be based on newer data:
https://en.wikichip.org/wiki/5_nm_lithography_process#N5
https://fuse.wikichip.org/news/3398/tsmc-details-5-nm/

And yes, this is the reference implementation; that's why I asked whether this 15% figure was maybe based on an AMD-specific implementation.

edit: To which, funnily enough, I've not gotten an answer from you-know-who. Apparently they haven't been cleared to leak the juicy bits yet.
 
MCD at around 300mm²:

Probably bigger. It seems to be on N6 and not on N5, and if you integrate more things than just cache, it could be over 400 mm^2.

will not have enough perimeter for 256-bit GDDR6, all the other GPU IO, and 2x 2TB/s (guess) L3 interfaces to each GCD.

The whole purpose of stacking is to increase area density and interconnect bandwidth by using vertical connections, so you are no longer limited by perimeter. That is, only the VRAM bus and I/O connections would sit on the perimeter of the MCD, while the inter-GCD bandwidth would be carried through the cache itself and the vertical interconnect paths.
 