AMD RDNA3 Specifications Discussion Thread

Instead of idiotic area calculations, you should just count candidates per wafer.
Things can get really odd with aspect ratios; see the CLX-SP XCC versus ICX-SP XCC dies-per-wafer comparison.
 
Still trying to wrap my head around how everyone here is getting $340 (must be some magic number) for a chip 2x the size of the GCD in Navi31: $100-122 * 2x (some magic happens here) = $340! :)

It's just what the die size calculator spits out: about 50 good chips. Twice the size does mean roughly half the yield, and edge and defect losses both increase as size goes up.

I really should look up the exact aspect ratios to get a more accurate assessment, since the less square everything is, the worse yield gets. But I'm too lazy to be bothered.
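For reference, here's a minimal sketch of the simple Poisson defect-yield model that die-size calculators typically use; the defect density value is an illustrative assumption, not TSMC's actual figure:

```python
import math

def poisson_yield(area_mm2: float, defect_density_per_cm2: float = 0.1) -> float:
    """Fraction of candidate dies expected to be defect-free: Y = exp(-D * A)."""
    area_cm2 = area_mm2 / 100.0  # convert mm^2 to cm^2
    return math.exp(-defect_density_per_cm2 * area_cm2)

print(poisson_yield(300))  # ~0.74 for a 300 mm^2 die
print(poisson_yield(600))  # ~0.55 for a 600 mm^2 die
```

Under this model, doubling the area drops the defect-free fraction from ~74% to ~55%, and since the bigger die also gets fewer candidates per wafer, good dies per wafer fall by well over half.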
 
Yes, and I asked this on the previous page - what is a "good" chip?
The 4090 doesn't have a perfect die with everything enabled, and I don't see where it's stated that defective dies (by these calcs) are non-functional.
So I will repeat it: all these yield calculations do not make any sense :)

Doesn't really matter. Voltage thresholding differs between chips even if you get "good" defect-free die yields; it's all an estimate, and Nvidia is still losing out. Edge loss is still significant even if you assume zero defects.

Nvidia is just eating a ton of cost here, and AMD theoretically wouldn't be if their chips were working. It's why everyone and their mum wants to go for chiplets, and why Nvidia is charging out the arse for their chips. They want their profit margins to stay high so lazy analysts keep recommending their stock even as sales plummet, "because margins!".

I wonder if Battlemage will be any good, or if AMD will show up with performant RDNA3 and force those Nvidia margins down. Either way I'm probably gonna wait longer to upgrade, so it goes.
 
Any news about N31 having an issue with the 3 GHz clock? A lot of people are saying that a respin would solve it. How long does a respin take? I think the compiler issue with the Ryzen 1800X took two months to solve.
 
A silicon respin takes months, likely many months. I would say 3-4 at best, probably 5-6 in reality, and that's all assuming they are able to find and fix the problem in good time.

On a separate topic, I'm sad to see that RDNA3 doesn't appear to have any true support for multi-chiplet GPU design.
By that I mean any features or a design that would enable multi-GCD die designs, not just splitting the MCDs off from the GCD.

I don't see them doing a 2 x GCD + 12 MCD product with the current arch :(
They're still going to need some pretty big changes to the front end before we start seeing multi-GCD designs.
 
WGPs are there, but I'm not really sure that the "CU" is a thing in RDNA3 anymore?
The whole concept of the CU in RDNA1/2 seems to have more to do with GCN compatibility for consoles than with actual RDNA h/w.
RDNA3 seems to throw the CU out completely and, from this perspective, is an enhanced RDNA2 WGP with more local memory and dual-issue SIMDs.
 
Still trying to wrap my head around how everyone here is getting $340 (must be some magic number) for a chip 2x the size of the GCD in Navi31: $100-122 * 2x (some magic happens here) = $340! :)
Cost absolutely does not increase linearly with area, mainly because yield will be lower for sure. The "magic" would be if 2 x size = 2 x price.
(Anyway, if it did increase linearly, AMD would be fools to go multi-die in this case, incurring extra packaging complexity and power consumption for little to no benefit.)
 
Cost absolutely does not scale linearly with area increase, mainly because yield will be lower for sure.
Gosh, thanks for explaining the obvious stuff, but what I am trying to tell you is different. How cost scales depends not just on the yield of perfect dies, but mostly on how you bin chips for your products and on the final products' characteristics. Isn't that obvious?
 
Costs don't increase linearly for two reasons:

- defect rate affects dies proportionally to area, so yield decreases as area increases. Of course, not every defect makes a die totally useless: you can sell a die with a defective shader engine, ROP, or group of shader processors. On the other hand, there are defects that completely prevent a die from working properly.
- candidate dies decrease as area increases, for the simple reason that the die is rectangular and the wafer is round.

The latter grows in importance the bigger the die is. It also depends on aspect ratio and wafer edge exclusion. For a 15x20 mm die (300 mm^2) we get around 180 dies per wafer; with a 30x20 mm die we get around 80 dies per wafer, so approximately 12.5% more than double the cost for the bigger die based on this alone.
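Here's a quick sketch of that candidate-die count, assuming a 300 mm wafer, a 3 mm edge exclusion, and no scribe lines (all illustrative assumptions; exact counts shift a bit with grid alignment):

```python
import math

def gross_dies_per_wafer(die_w_mm: float, die_h_mm: float,
                         wafer_d_mm: float = 300.0,
                         edge_excl_mm: float = 3.0) -> int:
    """Count rectangular dies whose four corners all fit inside the usable radius."""
    r = wafer_d_mm / 2.0 - edge_excl_mm
    nx = int(wafer_d_mm / die_w_mm) + 1
    ny = int(wafer_d_mm / die_h_mm) + 1
    count = 0
    for i in range(-nx, nx):
        for j in range(-ny, ny):
            x0, y0 = i * die_w_mm, j * die_h_mm
            corners = ((x0, y0), (x0 + die_w_mm, y0),
                       (x0, y0 + die_h_mm), (x0 + die_w_mm, y0 + die_h_mm))
            if all(math.hypot(x, y) <= r for x, y in corners):
                count += 1
    return count

print(gross_dies_per_wafer(15, 20))  # in the ballpark of the ~180 cited above
print(gross_dies_per_wafer(30, 20))  # in the ballpark of the ~80 cited above
```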
 
On the other hand, there are defects that prevent a die from working properly.
That's interesting, given that smaller chips usually have more FE, display and other logic (relative to SMs and other blocks that can be disabled) which, if defective, would make a GPU unusable.
Still, there is nothing you can't solve by adding a bit of redundancy and more flexible floorplanning (if this was ever a problem).
 
WGPs are there, but I'm not really sure that the "CU" is a thing in RDNA3 anymore? The whole concept of the CU in RDNA1/2 seems to have more to do with GCN compatibility for consoles than with actual RDNA h/w.
RDNA3 seems to throw the CU out completely and, from this perspective, is an enhanced RDNA2 WGP with more local memory and dual-issue SIMDs.

AMD's official 7900 XTX specs say 96 CUs, 96 RT units and 384 TMUs. That amounts to 2x SIMD32, 4x TMU and 1x RT accelerator per CU, just like RDNA2.
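A quick arithmetic check of that breakdown (the 6144 stream-processor figure is AMD's published spec for the 7900 XTX, taken as given here):

```python
cus = 96
stream_processors = 6144  # AMD's spec; 32 lanes per SIMD32
tmus = 384
rt_units = 96

print(stream_processors // 32 // cus)  # 2 SIMD32 units per CU
print(tmus // cus)                     # 4 TMUs per CU
print(rt_units // cus)                 # 1 RT accelerator per CU
```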

According to AMD, CUs are most definitely still a thing. It seems to me that they’ve tossed WGPs, made the SIMDs dual-issue, and doubled the local memory and cache per CU. So each CU is now as “fat” as a WGP but with lower achievable peak flops due to co-issue.

There was no mention of WGPs in AMD materials that I saw. The new blocks are “unified compute units” that fully share resources. It makes a whole lot more sense than continuing with WGPs.
 
That's interesting, given that smaller chips usually have more FE, display and other logic (relative to SMs and other blocks that can be disabled) which, if defective, would make a GPU unusable.
Still, there is nothing you can't solve by adding a bit of redundancy and more flexible floorplanning (if this was ever a problem).
This is more than offset by the increase in total area, and in the area dedicated to interconnect among the various groups.
Also, how do you plan to counteract the reduction in candidate dies? Moreover, redundancy adds area; it is a compromise.
 
But can they still act as separate units? This is what made CUs "a thing" in RDNA1/2.

An RDNA 1/2 CU doesn’t technically act as a single unit, as it shares the instruction cache and issue hardware with the 2nd CU in the WGP. However, most execution resources are local to a CU (SIMDs, TMUs, RT, L0$). Aside from the shared front end, the primary purpose of the WGP is to allow 2 CUs to share a larger, combined LDS.

The WGP is technically the “compute unit”, as it encapsulates the instruction issue hardware and LDS. For example, OpenCL reports the 6900 XT as having 40 compute units. The whole WGP/dual-compute-unit nomenclature is unnecessarily confusing IMO and doesn’t align with how compute APIs see the hardware.
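You can see this directly from a compute API; for example, with pyopencl (assuming an OpenCL runtime is installed; on a 6900 XT this should report 40, i.e. WGPs):

```python
import pyopencl as cl

# Enumerate every OpenCL device and print what it reports as "compute units".
for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(device.name, device.max_compute_units)
```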
 
An RDNA 1/2 CU doesn’t technically act as a single unit, as it shares the instruction cache and issue hardware with the 2nd CU in the WGP. However, most execution resources are local to a CU (SIMDs, TMUs, RT, L0$). Aside from the shared front end, the primary purpose of the WGP is to allow 2 CUs to share a larger, combined LDS.

The WGP is technically the “compute unit”, as it encapsulates the instruction issue hardware and LDS. For example, OpenCL reports the 6900 XT as having 40 compute units. The whole WGP/dual-compute-unit nomenclature is unnecessarily confusing IMO and doesn’t align with how compute APIs see the hardware.
Right, hence why I've said that it's there mostly for GCN compatibility (needed for consoles, for the most part), and it remains to be seen whether RDNA3 even exposes the "CU" in any capacity or just goes with the "WGP" as the basic shader processing unit now.

In comparison, Nvidia has the "SM", which consists of 4 processing blocks with 2 SIMDs each, all sharing the same L1/LDS. It is essentially the same design as RDNA now has, but with the added capability to "split" a WGP in halves to work in CU mode - which isn't really necessary for anything other than h/w-level GCN compatibility?
 
WGPs are there, but I'm not really sure that the "CU" is a thing in RDNA3 anymore?
The whole concept of the CU in RDNA1/2 seems to have more to do with GCN compatibility for consoles than with actual RDNA h/w.
RDNA3 seems to throw the CU out completely and, from this perspective, is an enhanced RDNA2 WGP with more local memory and dual-issue SIMDs.
"CU mode" exists as an operational mode (versus "WGP mode"). It determines the upper bound of how much threadgroup memory a thread group can allocate. (64KB vs 128KB).

Edit: It also determines the cap on TLP i.e., how many active wavefronts your workgroup can run in parallel (i.e., one or both CUs).
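As a toy illustration of those two caps (the 64 KB / 128 KB figures come from the post above; the helper itself is hypothetical, not any real API):

```python
def workgroup_caps(mode: str) -> dict:
    """Upper bounds implied by the execution mode, per the description above."""
    if mode == "CU":
        # Workgroup is pinned to a single CU and sees half the WGP's LDS.
        return {"max_lds_bytes": 64 * 1024, "cus_available": 1}
    # WGP mode: the workgroup can span both CUs and allocate the full LDS.
    return {"max_lds_bytes": 128 * 1024, "cus_available": 2}

print(workgroup_caps("CU"))   # {'max_lds_bytes': 65536, 'cus_available': 1}
print(workgroup_caps("WGP"))  # {'max_lds_bytes': 131072, 'cus_available': 2}
```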
 