AMD: RDNA 3 Speculation, Rumours and Discussion

Plenty of opportunities for AMD to release a more mainstream GPU to replace the 5700 XT and compete with the RTX 3060.

This thing better be good...
Well, that thing is coming and will supposedly compete with the 3060 Ti (a cut-down model against the 3060, then?), but it's unlikely to be related to this, as we've seen several Navi 2x chip codenames and none are related to Nashira Summit or otherwise.
 
AMD apparently filed two provisional patents in 2019 for ML-based chiplets:
"CHIPLET APPROACH FOR COUPLING GPU WITH MACHINE LEARNING ACCELERATION AT HIGH POWER EFFICIENCY," filed on Jul. 22, 2019
"HIGH BW INTER-CONNECTED CHIPLETS AND GPU FOR HIGH PERFORMANCE GAMING AND MACHINE LEARNING WORKLOADS," filed on Jul. 22, 2019

The full patent application is this (filed within 12 months of the provisionals):
Filed: July 20, 2020

20210026686 CHIPLET-INTEGRATED MACHINE LEARNING ACCELERATORS

Techniques for performing machine learning operations are provided. The techniques include configuring a first portion of a first chiplet as a cache; performing caching operations via the first portion; configuring at least a first sub-portion of the first portion of the chiplet as directly-accessible memory; and performing machine learning operations with the first sub-portion by a machine learning accelerator within the first chiplet.

[Attached patent figure]

Seems like an ML accelerator implemented in the Infinity Cache chiplets, and the framing is gaming oriented.

https://www.freepatentsonline.com/20210026686.pdf
 
Continuing from the patent post: the memory in the chiplet is divided into two parts, one acting as LLC and another used to share data between the CUs and the accelerator.

... the APD scheduler 136 is capable of scheduling shader programs for execution in the compute units 132 while also scheduling operations for execution on the cache/machine learning accelerator chiplets 404

the machine learning accelerators 502 are capable of, and sometimes do, perform machine learning operations such as matrix multiplications that consume data within the directly-accessible memory 508 of the same chiplet 404 and output results of the operations to the directly-accessible memory 508 of the same chiplet 404.

[Attached patent figure]
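
If I'm reading the mechanism right, here's a toy sketch of the idea; the class, sizes, and method names are all mine and purely illustrative, not anything from the patent:

```python
# Toy model of the patent's idea (my reading, not AMD's code): a chiplet's SRAM
# can be repartitioned at runtime between last-level cache and directly
# accessible memory that the chiplet's local ML accelerator consumes.
class CacheChiplet:
    def __init__(self, total_mb: int):
        self.total_mb = total_mb
        self.cache_mb = total_mb   # starts fully configured as LLC
        self.scratch_mb = 0        # directly-accessible portion, initially none

    def carve_scratchpad(self, mb: int) -> None:
        """Reconfigure part of the LLC as directly-accessible memory."""
        assert mb <= self.cache_mb, "cannot carve more than the remaining LLC"
        self.cache_mb -= mb
        self.scratch_mb += mb

chiplet = CacheChiplet(total_mb=32)
chiplet.carve_scratchpad(16)                 # half LLC, half ML scratchpad
print(chiplet.cache_mb, chiplet.scratch_mb)  # 16 16
```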
 
According to Paul at Redgamingtech, Navi 31 and 32 will use chiplets, Navi 33 will be a monolithic die. All coming in 2022.

Navi 41 is early, no real silicon yet, but progressing well.

 
According to Paul at Redgamingtech, Navi 31 and 32 will use chiplets, Navi 33 will be a monolithic die…


Like, I know the guy was right before. But isn't this the same leak from like a month ago? The one that doesn't make a lot of sense, because why would anyone make a chiplet that big with that low a yield if you could just cut it in half and see yields skyrocket, design costs plummet, and get whatever flexibility you want with binning? Besides which, they'd need to design multiple chiplets for this (rather than one and just reuse it like with Zen), cut another 25% of power just to hit 360 watts for this "160 CU" top-end chip, and run a 512-bit bus with 18 Gbps GDDR6 or HBM just to feed the thing.

I'm just going to go ahead and doubt this one a bit, at least until concrete information on how this is supposed to be supported at all emerges.
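
For a sense of why splitting a die helps yield, here's a minimal sketch using a simple Poisson defect-density model; the defect density and die areas are assumed illustrative numbers, not anything AMD or TSMC has published:

```python
import math

def poisson_yield(die_area_mm2: float, defect_density_per_cm2: float) -> float:
    """Simple Poisson model: yield = exp(-A * D0)."""
    area_cm2 = die_area_mm2 / 100.0  # mm^2 -> cm^2
    return math.exp(-area_cm2 * defect_density_per_cm2)

D0 = 0.1              # defects/cm^2, assumed for a maturing 5 nm-class node
big_chiplet = 360.0   # mm^2, hypothetical large graphics chiplet
half_chiplet = 180.0  # mm^2, the same logic split in two

print(f"360 mm^2 die yield: {poisson_yield(big_chiplet, D0):.1%}")   # ~69.8%
print(f"180 mm^2 die yield: {poisson_yield(half_chiplet, D0):.1%}")  # ~83.5%
# Two good halves are needed per GPU, but a defective half no longer kills the
# whole chip, so cost per good mm^2 still favors the smaller die.
```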
 
…they'd need to design multiple chiplets for this (rather than one and just reuse it like with Zen)…

Why would they need to make multiple chiplets? They could have an "80 CU" compute chiplet and one scalable I/O die, both with an adequate amount of cache on die, so they could cover 80-160 CUs with two chiplets and one I/O die, 40-80 CUs with one chiplet and an I/O die with a narrower memory bus, and monolithic dies for everything below. (A rough sketch of that lineup follows.)
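
Spelling it out, something like this; every configuration, CU count, and bus description here is hypothetical, just restating the idea above:

```python
# Hypothetical lineup built from one reusable compute chiplet plus one I/O die;
# all CU counts and bus widths are illustrative, not leaked specs.
lineup = [
    ("2x compute chiplet + I/O die", "80-160 CUs",   "full memory bus"),
    ("1x compute chiplet + I/O die", "40-80 CUs",    "narrower memory bus"),
    ("monolithic die",               "below 40 CUs", "smallest bus"),
]
for config, cus, bus in lineup:
    print(f"{config:<30} {cus:<13} {bus}")
```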
 
…why would anyone make a chiplet that big with that low a yield…


I was thinking the same thing about the chiplet size. But then I thought: it's their first GPU chiplet design, maybe they will keep it simple? Anyway, we'll see :)
 
GDDR bus size needs to scale with the count of graphics chiplets. An I/O chiplet providing GDDR doesn't do that.
 
GDDR bus size needs to scale with the count of graphics chiplets. An I/O chiplet providing GDDR doesn't do that.
Could an I/O chiplet be designed for the max bandwidth, then disable a proportion of the Infinity Fabric and GDDR channels for lesser designs?
 
Like, I know the guy was right before. But isn't this the same leak from like a month ago? The one that doesn't make a lot of sense because why would anyone make a chiplet that big…
7 nm Navi 21 is ~520 mm². Let's say moving the I/O (PCIe/DP/HDMI/UVD/VCE/etc.) to a separate die will reduce that by ~60 mm² (just a quick guess). That's 460 mm². At 5 nm the resulting chiplet size could be around 255 mm², maybe a bit bigger because of the interface for chiplet interconnection. Is it really that big? At the time of RV770, AMD called it the "sweet spot".
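
Making the arithmetic explicit; the ~0.55x area scaling factor is my assumption for a mixed logic/SRAM block going from N7 to N5, and real scaling varies by block type:

```python
navi21_n7 = 520.0        # mm^2, Navi 21 on N7 (approximate)
io_carved_out = 60.0     # mm^2, guessed area for PCIe/display/media moved off-die
scaling_n7_to_n5 = 0.55  # assumed average area scaling, logic + SRAM mix

compute_only = navi21_n7 - io_carved_out       # 460 mm^2
chiplet_n5 = compute_only * scaling_n7_to_n5   # ~253 mm^2
print(f"Estimated 5 nm compute chiplet: ~{chiplet_n5:.0f} mm^2")
```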
 
Could an I/O chiplet be designed for the max bandwidth, then disable a proportion of the Infinity Fabric and GDDR channels for lesser designs?
Yes, but no.
edit:
To clarify: yes, it would be possible, but it would be beyond strange and stupid.
I/O wants to sit on the edge of the chip(let), and a scalable I/O die for a GPU would need to be quite big just to accommodate a wide enough bus for the high-end parts, which would make it impractical in anything lower end.
By having memory controller(s) in each compute chiplet, the bus width would scale with GPU performance in a sensible way.
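
To illustrate that scaling argument with a tiny sketch; the per-chiplet channel count is an assumed example, not a known spec:

```python
# Hypothetical: each compute chiplet carries its own GDDR6 controllers, so the
# total bus width grows with chiplet count (numbers are illustrative).
channels_per_chiplet = 8  # 32-bit GDDR6 channels per compute chiplet (assumed)
for n_chiplets in (1, 2):
    bus_bits = n_chiplets * channels_per_chiplet * 32
    print(f"{n_chiplets} chiplet(s): {bus_bits}-bit bus")
# 1 chiplet(s): 256-bit bus
# 2 chiplet(s): 512-bit bus
```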
 
a chiplet that big
Ugh broski it's 40 gfx11 WGPs on N5 without the fancy uncore even being there.
That's small.
At the time of RV770, AMD called it the "sweet spot".
That was before costs exploded but yeah.
Could an I/O chiplet be designed for the max bandwidth, then disable a proportion of the Infinity Fabric and GDDR channels for lesser designs?
Pointless, instead they're throwing N6 tapeouts at the problem.
See: Genoa.
 
Ugh broski it's 40 gfx11 WGPs on N5 without the fancy uncore even being there.
That's small.

It would be around 160-240 mm² or so as a good guess. That's 2-3x the size of a Zen 3 chiplet, on a node that ideally shrinks things by almost half. But still, yields don't actually go up that much if you cut it in half. So you have a point.

What's more, I just found TSMC's tiny-SRAM bragging for their 5 nm node: 256 Mb of cache is tiny tiny, around 5 mm². Suddenly I can see why AMD went with SRAM cache on RDNA 2. It doesn't make a lot of sense at the moment, but as a future investment for 5 nm it seems sensible.
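
That figure roughly checks out against TSMC's published N5 high-density SRAM bitcell of ~0.021 µm²; this counts the raw array only and ignores macro overhead:

```python
bitcell_um2 = 0.021    # um^2 per bit, TSMC N5 high-density SRAM bitcell
bits = 256 * 1024**2   # 256 Mb
raw_array_mm2 = bits * bitcell_um2 / 1e6  # um^2 -> mm^2
print(f"256 Mb of raw N5 SRAM bitcells: ~{raw_array_mm2:.1f} mm^2")  # ~5.6 mm^2
# Real macros add decoders, sense amps, and redundancy, so the actual block
# ends up somewhat larger than the raw bitcell area.
```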

OK, bandwidth and power are still problems. They'd need 18 Gbps+ on a 512-bit bus, or HBM. They'd also need to increase architectural power efficiency again to get that huge chip into a reasonable TDP, as realistically the 5 nm shrink alone won't be enough. Still, it all seems more reasonable now, and a potential monster of a chip, which would explain why Nvidia is pushing their "Ada" arch out ASAP.
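
For scale, the raw bandwidth such a memory configuration would deliver; this is straightforward arithmetic, not a rumor:

```python
bus_bits = 512       # hypothetical bus width from the post above
gbps_per_pin = 18.0  # GDDR6 per-pin data rate
bandwidth_gbs = bus_bits * gbps_per_pin / 8  # bits/s -> bytes/s
print(f"{bus_bits}-bit @ {gbps_per_pin} Gbps: {bandwidth_gbs:.0f} GB/s")  # 1152 GB/s
```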

For future RDNA, though, I can see AMD going partially, or wholly, to Samsung for their GPUs. They make more money on CPUs, so it'd make more sense for them to stick with the best foundry there, even though supply from it is limited. But Samsung's GAAFET transition is coming one way or another, and it should put up a competitive fight against TSMC's 5 nm and even their first 3 nm, which looks to be enough of a disappointment for TSMC's customers that TSMC immediately rushed out an announcement that they'd be transitioning to gate-all-around soon after it as well.
 