AMD RDNA3 Specifications Discussion Thread

Where did you get that? Any proofs?

Again, the N33 info is not confirmed, so this is still highly inaccurate and speculative stuff

$5 turned into $20 would change a lot

Is there any real info on prices, rather than just rumors?
TSMC 7nm is well known to be ~$8k per wafer; TSMC N6 is a cost-optimized alternative, so (IMO) likely a bit lower.
Most pricing estimates for TSMC 5nm are ~$16-18k; assume their custom 4nm would be at the high end of that, if not higher.
True, but in that case it is even worse, then... speculating without the basic numbers.
$5 into $20 on a product with a likely BOM of ~$250-$300 is roughly 10%. And if it is $350-$400, then ~5%.
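Spelled out (just the arithmetic from the post, a $20 package against the BOM guesses):

```python
# A $20 package as a share of various BOM guesses (figures from the post)
shares = {bom: 20 / bom for bom in (250, 300, 350, 400)}
for bom, s in shares.items():
    print(f"${bom} BOM: {s:.1%}")  # 8.0%, 6.7%, 5.7%, 5.0%
```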
Packaging costs being the wily devil they are, it is (again IMO) generally better to assume a percentage markup for advanced-package mass-market products.
In most cases, actual analysts have a pretty good (~5%) MOE on their estimates. Volume discounts are usually where the big misses come from.
We had a legacy product that we had to double the price, back in 2019, because our main volume customer finally moved to an updated design... though to be fair they were buying wafers and handling packaging/assembly themselves.
 
If this is accurate, it means LDS has been practically doubled?
So besides 'no more register pressure' due to +50% VGPRs, no more occupancy drop due to high LDS usage either?
Got me thinking, though.
But maybe, now that waves execute twice as fast due to dual issue, we also need twice as many waves in flight to hide memory latency.
If so, we still have similar constraints on register and LDS usage as before.
Not sure, but that's interesting.
 
There was a term in Erle Stanley Gardner's Perry Mason novels that he used a lot and I grew quite fond of: "take a button and sew a vest on it". I think I'm seeing a lot of that in this thread. :|
 
How are they getting the cache bandwidth, btw? N21 is 16 slices of 8MB at 64B/clock each, 1024B/clock total at "up to 1.94GHz" = 1986.6GB/s. RDNA3 has 5.3TB/s bandwidth (rest of the slides) including VRAM, so 4340GB/s without VRAM, and 96MB vs 128MB (0.75x capacity), which works out to 2.91x BW per slice assuming the same slice size. Did they reduce the clocks a lot (~1.4GHz) and go to 4MB slices with double the width per slice (24 slices, 128B/clock)? Or 192B/clock across 12x 8MB slices (same size as RDNA2) at 1.88GHz? Because 2.82GHz would be needed if they only doubled one aspect, and that sounds unrealistic given N31 clocks - although if they fixed the clocking, it becomes realistic again.

*That bandwidth increase is insane, don't think it's had much attention but it's very impressive

**Also means ~2.8TB/s for N32 and ~1.4TB/s for N33 assuming similar clocks
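The candidate configurations are easy to check in a couple of lines (the bandwidth figures are from the post; the slice layouts are the guesses above, not confirmed specs):

```python
def clock_needed(slices, bytes_per_clk, target_gbs):
    # GHz required for a given slice config to hit the target bandwidth
    return target_gbs / (slices * bytes_per_clk)

n21_gbs = 16 * 64 * 1.94       # N21 baseline -> 1986.56 GB/s
target = 4340                  # RDNA3 cache-only bandwidth, GB/s

opt_a = clock_needed(24, 128, target)  # 24x 4MB, 2x width -> ~1.41 GHz
opt_b = clock_needed(12, 192, target)  # 12x 8MB, 3x width -> ~1.88 GHz
opt_c = clock_needed(12, 128, target)  # 2x width only     -> ~2.82 GHz
print(n21_gbs, opt_a, opt_b, opt_c)
```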
 

Attachments

  • rdna 2 infinity cache bandwidth.jpg
$5 into $20 on a product that has a likely BOM of ~$250-$300
I just spent a few minutes calculating this and got a 1.35x cost difference between the AD102 and Navi31 GPU packages with the 2x 5nm wafer cost and 4x packaging cost you provided, so it seems there is a good deal of wishful thinking involved in your judgment, because the numbers are again far closer to what was in this table than to some random 2-3x guesses.
Also, it will be interesting to see whether the 7900 XT physically lacks an MCD, or whether all MCDs are in the package with one disabled (which would mean the more complex chiplet packaging also affects yields).
 
You're simply saying that the first option above is not a bad thing in your opinion because (perhaps more rationally than my argument) you don't care that the Series S might have better core graphics, since you value the PC trade-off of much higher frame rates and resolution more. For my part, the minimum baseline would always be what I could get from a much cheaper piece of hardware, and the extra value from the more expensive/powerful PC comes from building on that. That almost always translates to turning all settings to max (with the occasional exception of some invisible Ultra settings), then adjusting my resolution to hit a suitable frame rate for the genre - which in many cases I define as a solid, well frame-paced 30fps or slightly above on my 1070, but I would want a minimum of 60 or thereabouts on a modern GPU.
I recently moved to a 165Hz monitor, and I found the visual improvement from higher refresh rates very significant (much more than I expected). I wouldn't want to trade that for the (most likely subtle) improvements gained from enabling Series S level RT if that means dropping back down to 60fps. For a target of ">= 120fps with whatever RT features I can get away with", it's not at all clear that the trade-offs AMD is making are worse than Nvidia's.
 
I just spent a few minutes calculating this and got a 1.35x cost difference between the AD102 and Navi31 GPU packages with the 2x 5nm wafer cost and 4x packaging cost you provided, so it seems there is a good deal of wishful thinking involved in your judgment, because the numbers are again far closer to what was in this table than to some random 2-3x guesses.
Also, it will be interesting to see whether the 7900 XT physically lacks an MCD, or whether all MCDs are in the package with one disabled (which would mean the more complex chiplet packaging also affects yields).
~$18k a wafer.
85-87 candidates a wafer.
Defect density of .07 for TSMC N5, so possibly a bit worse on 4nm since "the source" said it is only slightly better for Nvidia than the disaster that was Samsung 8LPP.
Worst case, only ~50 good die per wafer. $360 per die. I.E. my 3x comment.
Assuming you can save a good portion and keep yield at 80%, you are still looking at ~$260 per die. Hence, my 2x comment. (Edit-Whoops, guess I just thought it, looked back at my post and I didn't say anything about 2x.)
Even if you assume 100% yield, which is what they did, it is still $206 per die, or about 1.6x more than my Navi31 estimate - quite a bit off from the 1.35x for AD102 in their chart, while being extremely charitable to Nvidia.
My quick-and-dirty estimate of the Navi31 die cost (GCD and MCDs), being somewhat conservative and rounding up, was ~$130; assuming 100% yield, it would be ~$100.
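The per-die math above can be reproduced with a quick script. Note that the AD102 die size (~608 mm²) and the simple Poisson yield model are my assumptions, not figures from the thread; the output lands in the same ballpark as the $206-$360 range quoted:

```python
import math

def gross_die(die_area_mm2, wafer_d_mm=300, edge_loss_mm=4):
    # Rough candidates-per-wafer: usable circle area / die area,
    # minus a correction for partial dies along the circumference.
    r = wafer_d_mm / 2 - edge_loss_mm
    return int(math.pi * r * r / die_area_mm2
               - math.pi * 2 * r / math.sqrt(2 * die_area_mm2))

def cost_per_good_die(wafer_cost, die_area_mm2, d0_per_cm2):
    # Poisson yield model: Y = exp(-A * D0), area converted to cm^2
    y = math.exp(-(die_area_mm2 / 100) * d0_per_cm2)
    return wafer_cost / (gross_die(die_area_mm2) * y)

AD102_MM2 = 608  # assumption on my part, not from the thread
print(gross_die(AD102_MM2))                       # candidates per wafer
print(cost_per_good_die(18000, AD102_MM2, 0.07))  # $/good die at D0 = 0.07
print(18000 / gross_die(AD102_MM2))               # $/die at 100% yield
```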

It is fine if our numbers don't jibe; they are some very rough estimates.
I just don't understand how "the source" can know Nvidia has had yield issues in the past and that this node is "slightly better for them", and then assume 100% yield.
It is just amusing to me, that's all.
 
How are they getting the cache bandwidth, btw? N21 is 16 slices of 8MB at 64B/clock each, 1024B/clock total at "up to 1.94GHz" = 1986.6GB/s. RDNA3 has 5.3TB/s bandwidth (rest of the slides) including VRAM, so 4340GB/s without VRAM, and 96MB vs 128MB (0.75x capacity), which works out to 2.91x BW per slice assuming the same slice size. Did they reduce the clocks a lot (~1.4GHz) and go to 4MB slices with double the width per slice (24 slices, 128B/clock)? Or 192B/clock across 12x 8MB slices (same size as RDNA2) at 1.88GHz? Because 2.82GHz would be needed if they only doubled one aspect, and that sounds unrealistic given N31 clocks - although if they fixed the clocking, it becomes realistic again.

*That bandwidth increase is insane, don't think it's had much attention but it's very impressive

**Also means ~2.8TB/s for N32 and ~1.4TB/s for N33 assuming similar clocks
A possibility is that they used a double data rate bus or, eh, SerDes?
64B bidirectional needs 1024 pins if run as parallel... 24 channels means 24576 pins. That's... a lot.

Whereas if you run with e.g. PCIe 5.0 speed SerDes (32GT/s), you can achieve 5TB/s bi-directional with 40x16x2=1280 data pins.
This seems more in line with the pinout density that an organic interposer can achieve.
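The pin arithmetic above, spelled out (a sketch using the numbers from the post; the 40-link x16 SerDes layout is just the configuration mentioned, not a known design):

```python
# Parallel option: 64 B/clk each way per channel
parallel_pins = 64 * 8 * 2           # 1024 pins per bidirectional channel
total_parallel = parallel_pins * 24  # 24 channels -> 24,576 pins

# SerDes option: PCIe 5.0-class 32 GT/s lanes
serdes_pins = 40 * 16 * 2            # 40 links, x16 wide, x2 directions = 1280
serdes_gbs = 32 * serdes_pins / 8    # 32 Gb/s per pin -> ~5120 GB/s aggregate

print(total_parallel, serdes_pins, serdes_gbs)
```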
 
64B bidirectional needs 1024 pins if run as parallel... 24 channels means 24576 pins. That's... a lot.
Which is why they're using the industry's densest fan-out to get all those pins.
This seems more in line with the pinout density that an organic interposer can achieve
It's organic in the sense that it's not a Si slab, but it's built with extremely dense RDLs, so...
 
~$18k a wafer.
You move your goalposts with every post.

Most pricing estimates of TSMC 5nm is ~$16-18k
I simply took the lower bound of your range above for the calculations, because of your negative bias towards Nvidia in this discussion.

looked back at my post and I didn't say anything about 2x.
TSMC 7nm is well known to be ~$8k
Well, $16k is 2x $8k.

Defect density of .07 for TSMC N5, so possibly a bit worse on 4nm since "the source" said it is only slightly better for Nvidia than the disaster that was Samsung 8LPP.
Adding more speculative unknowns would obviously skew results even more than before

Worst case, only ~50 good die per wafer. $360 per die. I.E. my 3x comment.
Apparently, Navi31 is being produced on some aliens' 5nm and 6nm process with perfect yields, while AD102 is on some crappy peasants' 5nm, for the 3x to make any sense.

Assuming you can save a good portion and keep yield at 80%, you are still looking at ~$260 per die.
Why shouldn't I assume a perfect yield of 100%? And what do you mean by yield at all - working dies? Then it would be way higher than 80%.

or about 1.6x more than my Navi31 estimate which
Our numbers don't add up. With 6nm MCDs, a 5nm GCD, 2.25x wafer cost ($18K for a 5nm wafer) and 4x packaging cost for chiplets, I am getting 1.39x. With a $16K wafer price, that's 1.35x.
The 1.35x difference looks very reasonable to me since it accounts for MCD dies and for more expensive packaging.
Yields are pretty much unknown so I didn't account for them since doing so would introduce even more error.
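For what it's worth, a ratio in that neighborhood falls out of rough numbers too. The die areas here (AD102 ~608 mm², GCD ~305 mm², MCD ~37.5 mm²) and the $10/$40 packaging split are my assumptions, not figures anyone in the thread has confirmed:

```python
import math

def gross_die(die_area_mm2, wafer_d_mm=300, edge_loss_mm=4):
    # Rough candidates-per-wafer with an edge-loss ring and a
    # correction for partial dies along the circumference.
    r = wafer_d_mm / 2 - edge_loss_mm
    return int(math.pi * r * r / die_area_mm2
               - math.pi * 2 * r / math.sqrt(2 * die_area_mm2))

# Assumed inputs (mine, not from the thread):
AD102, GCD, MCD = 608, 305, 37.5   # die areas in mm^2
W5, W6 = 18000, 10000              # wafer costs: 5nm, 6nm
PKG_MONO, PKG_CHIPLET = 10, 40     # base packaging cost, 4x for chiplets

ad102_pkg = W5 / gross_die(AD102) + PKG_MONO
n31_pkg = W5 / gross_die(GCD) + 6 * W6 / gross_die(MCD) + PKG_CHIPLET
print(ad102_pkg / n31_pkg)  # lands around 1.3x with yields ignored
```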
 

As noted in the earlier blog, bump pad pitch is 25 x 35 μm, giving a potential 57,000 total pads on the die, although Apple only mentioned 10,000 I/Os (20,000 pads).

For homework, this article is spectacular:


:mrgreen:
 
Ian Cutress has a video on TSMC N5 costs here:


But to summarise, you can use the below calculator, with a defect density of 0.07 and an edge loss of 4, on a 300mm wafer. Ian suggests a ballpark cost of $17000 per wafer, based on feedback from his industry contacts. ($10,000 for 6N)

 
But to summarise, you can use the below calculator, with a defect density of 0.07 and an edge loss of 4
Yes, I used this one, but it gives the same result. Using CALY's calculator, the difference shrinks to 1.32661x with $17,000 for the 5nm wafer and $10,000 for 6N, pretty much in line with all my previous calculations.
 
Getting:
Navi 31 main die: $100-122 assuming perfect voltage yield. Remember AMD is a big, longtime customer that negotiates far in advance, so gets lower prices
MCD: $6.50 apiece
Packaging: $10-25

So, a $149-187 BOM, versus ~$340 for Nvidia's 4090. At ~$180 for the 4080, it's about the same cost, though. It's really a shame for AMD that that bug happened; a 33% performance boost would have it competing with the 4090 at around half the price. Wonder how long it'll take to iron out.
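Summing those up (the per-part figures are the guesses above, not confirmed numbers; the total comes out to roughly the quoted range):

```python
gcd_lo, gcd_hi = 100, 122   # Navi 31 main die estimate
mcd_each = 6.50             # per MCD, six of them
pkg_lo, pkg_hi = 10, 25     # packaging estimate

bom_lo = gcd_lo + 6 * mcd_each + pkg_lo   # low-end total
bom_hi = gcd_hi + 6 * mcd_each + pkg_hi   # high-end total
print(bom_lo, bom_hi)
```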
 