AMD: RDNA 3 Speculation, Rumours and Discussion

I wonder how much variability there is in MCD dies vs GCD dies, especially in terms of what speed they will run at.
If there is a lot of variability in MCD dies, then they can bin a lot of those MCDs very effectively.

E.g. bundle the slower ones with the slower cards that have lower GDDR speed requirements.
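Purely to illustrate that binning idea, here's a rough Python sketch; the speed grades and test numbers are made up for the example, not anything AMD has disclosed:

```python
# Hypothetical sketch of MCD binning: sort each tested die into the fastest
# GDDR6 speed grade it can reliably drive. All numbers are illustrative.

SPEED_GRADES_GBPS = [20.0, 18.0, 16.0]  # assumed GDDR6 data rates, fastest first

def bin_mcd(max_stable_gbps: float) -> str:
    """Return the fastest speed grade this MCD qualifies for, or 'scrap'."""
    for grade in SPEED_GRADES_GBPS:
        if max_stable_gbps >= grade:
            return f"{grade:.0f} Gbps bin"
    return "scrap"

# Example: made-up results from a wafer test run
for speed in [20.3, 19.1, 17.2, 15.5, 21.0]:
    print(f"MCD stable at {speed} Gbps -> {bin_mcd(speed)}")
```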
I'm still hoping we end up getting a monster config with multiple GCD as well.

It seems kind of pointless to go for such a small GCD die if they are moving to a chiplet-style architecture. Not taking advantage of being able to make much bigger compute dies seems silly...

I'm not sure if the Navi 3x generation is the GPU version of Zen 1 or Zen 2!
It does seem a bit odd to make them so small, but I guess they're limited by their placement options and which sides of the die the interconnects are on, as well as by wanting to minimize the area of the interposer or the number of EMIBs/similar bridges needed to connect them.

Assuming you weren't packaging-constrained, 3x hypothetical 80mm2 128-bit MCDs, one on each of three edges of the compute die, would seem to make a lot more sense to me. It would still be easy to scale up and down as needed: lower-end products can just have two for a 256-bit bus, and if you really needed a chiplet SKU with a 320-bit or 192-bit bus then you've got a place for your half-broken harvested ones to be used. You'd likely be saving some money on the chiplet end but spending more on the packaging end in that case; I'm sure AMD's engineers already did the cost-benefit analysis of each approach and came to the proper conclusion.
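Just to make the granularity trade-off concrete, here's a quick sketch of the bus widths each hypothetical MCD size could reach; the 80mm2/128-bit die and the harvested half-width option are assumptions from the post above, not confirmed specs:

```python
# Bus widths reachable with each hypothetical MCD granularity. 128-bit MCDs
# need a harvested (half-disabled) die to hit 192- or 320-bit, while 64-bit
# MCDs cover every step natively. Illustrative only.

def bus_widths(mcd_width_bits: int, max_mcds: int, allow_half: bool) -> set[int]:
    widths = set()
    for n in range(1, max_mcds + 1):
        widths.add(n * mcd_width_bits)
        if allow_half:
            widths.add(n * mcd_width_bits - mcd_width_bits // 2)
    return widths

print(sorted(bus_widths(128, 3, allow_half=True)))   # [64, 128, 192, 256, 320, 384]
print(sorted(bus_widths(64, 6, allow_half=False)))   # [64, 128, 192, 256, 320, 384]
```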
 
I was going to respond saying you're all wrong, but after giving it some more thought, I think you might be correct in that the underlying substrate, e.g. the interposer, might actually be a limiting factor here. Whilst they can dictate the size and shape of the "socket", for lack of a better term, it simply may not be feasible to use a much larger interposer underneath all the various chips.

You raise some interesting points about 6 x 64-bit controllers vs 3 x 128-bit.
The 6 x 64 setup does give them more flexibility going forward, though, and it's probably a more natural granularity for GDDR, where each 64-bit controller pairs with two 32-bit devices
(yeah, I know DDR5 is the one that's split into 2 x 32-bit sub-channels internally, and GDDR6 splits each 32-bit device into two 16-bit channels).
I'm not sure how well harvested MCD dies would work,
so after all that maybe 6 x 64 MCDs is the smarter move.

After reading the initial article again, I see they do mention the possibility of a larger GCD in the future, so perhaps I was too quick to dismiss that.
 
AMD already has several 'interposer' technologies in hand that are more advanced than a huge interposer covering all the chips.
 
I also don't understand why AMD didn't go for the kill (if the rumors are right). The practical limit on N5/N4 is around 600mm2 for a gaming GPU before cost explodes into datacenter territory. So why not make a 600mm2 GCD and 6 x 40mm2 MCDs (840mm2 total) and be sure to land well above AD102? In other words, take the performance crown decisively and charge whatever you want for it. In marketing the rule is simple: people will pay whatever it takes for the best, even when it has horrible value for money. That has been Nvidia's motto forever and it has worked very well... So why not AMD?
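For scale, a quick sketch using the standard dies-per-wafer approximation; the die areas are the hypothetical ones from the post above, and the formula ignores yield, scribe lines and reticle constraints:

```python
# Back-of-the-envelope candidate-die counts on a 300mm wafer for the
# hypothetical die sizes discussed here. Illustrative only.
import math

WAFER_DIAMETER_MM = 300

def dies_per_wafer(die_area_mm2: float) -> int:
    r = WAFER_DIAMETER_MM / 2
    return int(math.pi * r * r / die_area_mm2
               - math.pi * WAFER_DIAMETER_MM / math.sqrt(2 * die_area_mm2))

print("600mm2 GCD candidates per N5 wafer:", dies_per_wafer(600))  # ~90
print("40mm2 MCD candidates per N6 wafer :", dies_per_wafer(40))   # ~1660
print("total silicon per GPU             :", 600 + 6 * 40, "mm2")  # 840
```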

AMD has the dilemma that demand for their large (RDNA) chips is very low in the consumer and business markets. Conversely, for Nvidia there is high demand from ML and professional graphics, where a large chip can help a lot. I understand AMD's intention to keep their flagship GPUs as small as possible to save wafers; it's more reasonable to focus those wafers on the more 'guaranteed and easy' markets like CPUs, APUs and semi-custom.
 
and extra IO area for inter-chiplet communication
The extra I/O area for Zen 3 chiplets to communicate with 3D V Cache is so small, everyone missed it until AMD mentioned it.

Meanwhile, I have no idea why AMD would produce a chiplet based GPU where the cheap silicon, 6nm, appears as multiple small dies for I/O and cache and the expensive silicon, 5nm, appears as the single biggest die, by far, for compute.

Imagine AMD making a Zen processor where there's multiple I/O dies and a single core complex die?...

That's how ridiculous the rumours currently seem to me.

Further, AMD had two distinct designs of Zen I/O chiplets, one for consumer and the other for server. The server design has more connectivity options (more PCI Express lanes, more DDR channels).

Why wouldn't AMD follow the Zen pattern when making a chiplet based GPU?
 

And what's so wrong or ridiculous about it?
The config so far looks like this.
You will have one 5nm N31 GCD paired with 6* IO chiplets (384bit, 192MB IC, 24GB).
A cutdown N31 GCD will be paired with 5* IO chiplets (320bit, 160MB IC, 20GB).
5nm N32 GCD will be paired with 4* IO chiplets (256bit, 128MB IC, 16GB).
A cutdown N32 GCD will be paired with 3* IO chiplets (192bit, 96MB IC, 12GB).
N33 a 6nm monolith with 128bit, 64MB and 8GB.
This way you save cost by using the 6nm process, and you use as many chiplets as are needed for the given product; the only thing you deactivate is part of the GCD for the weaker model.
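If those rumored numbers are right, everything scales linearly with the MCD count: 64 bits of bus, 32MB of Infinity Cache and 4GB of GDDR6 per chiplet. A quick sketch to check the arithmetic, using only the figures from the list above:

```python
# Derive the rumored N31/N32 configs from a per-MCD building block:
# 64-bit GDDR6 interface, 32MB Infinity Cache, 4GB of attached memory.

def config(n_mcds: int) -> str:
    bus = n_mcds * 64     # bus width in bits
    cache = n_mcds * 32   # Infinity Cache in MB
    vram = n_mcds * 4     # GDDR6 in GB
    return f"{n_mcds} MCDs -> {bus}-bit, {cache}MB IC, {vram}GB"

for n in (6, 5, 4, 3):
    print(config(n))
# 6 MCDs -> 384-bit, 192MB IC, 24GB
# 5 MCDs -> 320-bit, 160MB IC, 20GB
# 4 MCDs -> 256-bit, 128MB IC, 16GB
# 3 MCDs -> 192-bit, 96MB IC, 12GB
```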

I don't understand your comparison to Zen.
Zen 3 doesn't need more I/O, and its IOD doesn't even have any cache. You have one CCD with 8C16T plus cache; why would they design another CCD with 16C32T when it's enough to put two CCDs together to get a 16-core CPU?
If you wonder why N31 is not a pair of N32 GCDs, then OK, but that also has to have a reason. Maybe a problem with scaling or something like that, who knows. BTW, AMD also designs CDNA-based GPUs.
 
I don't understand your comparison to Zen.
Zen3 doesn't need more IO
Apparently you missed the bit where I talked about consumer and server variants: where the compute chiplets are all the same and it's the count of compute chiplets plus the size of the I/O die that varies according to SKU.

The compute chiplets are expensive, small and on the most advanced node.

The I/O chiplet is cheap, large and on a legacy node.
 
That's the ideal, but not necessarily feasible at this time. For the Radeon & Instinct lines the compute chiplets can't* remain the same though, unlike Ryzen & Epyc.

*can but won't; they're different enough already and both lines seem to be doing fine enough
 
There is the M1 Ultra from Apple; it failed to impress GPU-wise even when working with the Metal API. Scaling issues are rampant even in compute workloads, and bad frame pacing in gaming is also an issue.

The CPU isn't outperforming anything else either, at least not without the media encoders/ProRes hardware etc.
 
The extra I/O area for Zen 3 chiplets to communicate with 3D V Cache is so small, everyone missed it until AMD mentioned it.

Meanwhile, I have no idea why AMD would produce a chiplet based GPU where the cheap silicon, 6nm, appears as multiple small dies for I/O and cache and the expensive silicon, 5nm, appears as the single biggest die, by far, for compute.

Imagine AMD making a Zen processor where there's multiple I/O dies and a single core complex die?...

That's how ridiculous the rumours currently seem to me.

Further, AMD had two distinct designs of Zen I/O chiplets, one for consumer and the other for server. The server design has more connectivity options (more PCI Express lanes, more DDR channels).

Why wouldn't AMD follow the Zen pattern when making a chiplet based GPU?
The most obvious reason is that you can actually scale MCDs rather easily (remains to be seen what such physical implementation would mean for L2 coherency though) while scaling GCDs would be pretty hard, above 2 especially I'd wager.
With CPUs it is rather easy to scale CCDs while scaling IODs is more complex.
All in all it looks like an easy way of making die costs smaller, not improving performance per se. In this the approach is very similar to Zen's.
 
The extra I/O area for Zen 3 chiplets to communicate with 3D V Cache is so small, everyone missed it until AMD mentioned it.
3D V cache sits on top of the compute die. With a memory controller, RDNA's cache/IMC dies would need to sit underneath the compute die, which would also necessitate passing through power and other connections. There is also a massive difference in the bandwidth requirements.
Meanwhile, I have no idea why AMD would produce a chiplet based GPU where the cheap silicon, 6nm, appears as multiple small dies for I/O and cache and the expensive silicon, 5nm, appears as the single biggest die, by far, for compute.
Probably because it's really bloody hard to get multiple GCDs to work nicely with one another. CPUs already assume that cores are independent, and NUMA and NUCA are already solved problems, so scaling out multiple compute chiplets isn't really a problem. The same is not true of graphics workloads where a single kernel might be running across the entire chip.
 
The most obvious reason is that you can actually scale MCDs rather easily (remains to be seen what such physical implementation would mean for L2 coherency though) while scaling GCDs would be pretty hard, above 2 especially I'd wager.
With CPUs it is rather easy to scale CCDs while scaling IODs is more complex.
All in all it looks like an easy way of making die costs smaller, not improving performance per se. In this the approach is very similar to Zen's.
What is the point of scaling MCDs? How does that save money?
 
3D V cache sits on top of the compute die. With a memory controller, RDNA's cache/IMC dies would need to sit underneath the compute die, which would also necessitate passing through power and other connections. There is also a massive difference in the bandwidth requirements.

Probably because it's really bloody hard to get multiple GCDs to work nicely with one another. CPUs already assume that cores are independent, and NUMA and NUCA are already solved problems, so scaling out multiple compute chiplets isn't really a problem. The same is not true of graphics workloads where a single kernel might be running across the entire chip.
It's not about I/O dies or compute dies or whatnot dies for 3D V-Cache, it's about where your cache is. If we assume Infinity Cache is in the memory dies, that's where the 3D V-Cache will be too (assuming, of course, they use 3D V-Cache on GPUs in the first place).
 
You get 200-300mm^2 main dies and some number of small MCDs instead of two big 400-600mm^2 dies. The defect ratio is lower and you get more dies from a wafer.
Defect rates on 6/7nm are vanishingly low and MC/PHY/Cache defect sensitivity is also extremely low. So small dies for these functions provide no advantage.
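For what it's worth, here's a back-of-the-envelope Poisson yield sketch of the two positions; the defect densities and die areas are assumed round numbers, not foundry data, and the model ignores harvesting/salvage and packaging yield:

```python
# Simple Poisson yield model: yield = exp(-area * defect_density).
# Compare a hypothetical large monolithic die against a GCD + 6 MCD split.
# Defect densities and areas below are illustrative guesses.
import math

def poisson_yield(area_mm2: float, d0_per_mm2: float) -> float:
    return math.exp(-area_mm2 * d0_per_mm2)

D0_N5 = 0.0010   # assumed defects per mm^2 on N5
D0_N6 = 0.0005   # assumed (lower) defects per mm^2 on mature N6

mono = poisson_yield(550, D0_N5)          # one big N5 die
gcd  = poisson_yield(300, D0_N5)          # smaller N5 compute die
mcds = poisson_yield(40, D0_N6) ** 6      # all six N6 MCDs good

print(f"monolithic ~550mm2 : {mono:.1%}")        # ~57.7%
print(f"300mm2 GCD         : {gcd:.1%}")         # ~74.1%
print(f"GCD + 6 good MCDs  : {gcd * mcds:.1%}")  # ~65.7%
```

With these assumed defect densities the split comes out ahead even before you account for bad MCDs being discarded individually rather than sinking a whole big die; plug in the very low defect densities the previous post argues for and the advantage shrinks accordingly.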
 
3D V cache sits on top of the compute die. With a memory controller, RDNA's cache/IMC dies would need to sit underneath the compute die, which would also necessitate passing through power and other connections.
I mean, that does seem to be what AMD are planning on doing with CDNA3. It's actually what I originally expected RDNA3 to be, but I guess AMD are just not quite there yet.
 