RDNA4

Not the same node. N4P is half a shrink over 4N; you get both increased density and a substantial perf/watt gain.

No, it's not.

TSMC promises N4 to be about 6% denser than N5, and N4P to have the same density as N4 but better performance (higher clock rates or lower power consumption).

You can see this very well on Hopper vs Blackwell.

Nvidia does not use the N4/N4P process. Nvidia uses custom "4N" and "4NP" processes, which are different from "N4" and "N4P", and comparing transistor densities of chips of different architectures is meaningless anyway.
 
Why would you expect an N4 chip to have perf/xtor more in line with N6 than N5?

All of which are on the same node. Notably, while the high-level architecture is extremely similar apart from clocking much higher, they're all worse in perf/xtor than Ampere on Samsung 8LPP.

Once again, all on the same node.

With 5nm, both Nvidia and AMD saw *huge* gains in transistor density - much more than the headline density figures from TSMC would indicate - but also big regressions in perf/transistor. To me that indicates that the ideal circuit layout for 5nm uses lots of high-density transistors, whereas the same circuit on N7 might have used fewer higher-performance transistors and/or lots of decap cells.

No. The reason for the much higher density was having much bigger caches. SRAM has much higher transistor density than logic.

Don't expect transistors to scale 1:1 with features or performance. You can lay out the same functional circuit with a hugely varying number of transistors of varying density, even on the same node, and what is optimal for one node won't be optimal for another.

Nobody expects them to scale exactly 1:1.

But when the architectures are similar, the correlation for GPUs is much closer to 1 than you think.
 
We don't really have an N5 chip from AMD with which to make this comparison because N31/N32 are MCM. The closest is Zen 3 to Zen 4, which saw a 58% transistor count increase deliver 38% more performance in SPECint nT. Some of that budget was spent on AVX-512, where the performance increase is infinite since Zen 3 does not support it, so it's not quite as cut and dried, but let's assume SPECint is similar to raster and the additional transistors get spent on RT performance improvements, just to give us a rough starting point.

That would mean scaling from N21 to a monolithic N5-class design of 42.3B transistors should provide about 38% more performance, which is in line with the 7900 XTX when scaling up from the 6950 XT. The issue is the 7900 XTX uses 58B transistors. Even if you take off the approximately 2.4B transistors used for the MCD/GCD PHYs that would not be needed in a monolithic design, you would still be at 55.6B transistors, which is more than double for just 38% more performance. It is utterly terrible. If we take the 4K delta between the 7900 XTX and the 7900 XT, then with linear transistor scaling, 7900 XT performance should be doable in 33.8B transistors if AMD can match the performance-to-transistor-increase ratio of Zen 3 to Zen 4. In 240mm² that would need 141M xtors/mm².
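To make the arithmetic explicit, here is a rough back-of-envelope sketch in Python. The only figure not already given above is the ~26.8B transistor count assumed for Navi 21, which is the commonly quoted public number; everything here is approximate and only meant for a ballpark comparison.

```python
# Back-of-envelope sketch of the scaling argument above.
# All figures are approximate public numbers, for rough comparison only.

navi21_xtors = 26.8e9      # Navi 21 (6950 XT), approx. transistor count (assumed)
zen_xtor_ratio = 1.58      # Zen 3 -> Zen 4 transistor count increase
zen_perf_ratio = 1.38      # ...delivering ~38% more SPECint nT

# If GPU raster scaled like Zen, a ~+38% part over the 6950 XT would need:
ideal_xtors = navi21_xtors * zen_xtor_ratio
print(f"ideal N5 monolithic design: {ideal_xtors / 1e9:.1f}B transistors")   # ~42.3B

navi31_xtors = 58e9        # actual 7900 XTX (GCD + MCDs)
phy_xtors = 2.4e9          # approx. GCD<->MCD PHY cost a monolithic die would not need
print(f"actual, minus PHYs: {(navi31_xtors - phy_xtors) / 1e9:.1f}B")         # ~55.6B

# Hypothetical 7900 XT-class part at the Zen-like ratio, in a 240 mm^2 die:
xt_like_xtors = 33.8e9
die_area_mm2 = 240
print(f"required density: {xt_like_xtors / die_area_mm2 / 1e6:.0f}M xtors/mm^2")  # ~141M/mm^2
```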

Trying to extrapolate GPU performance scaling from CPU performance scaling makes absolutely no sense at all. And it makes even less sense when doing so while comparing two different microarchitectures.

Your estimations are just 100% rubbish.
 
Trying to extrapolate GPU performance scaling from CPU performance scaling makes absolutely no sense at all. And it makes even less sense when doing so while comparing two different microarchitectures.

I did say as much.

I don't even know if that is a valid way to look at it given the differences between CPUs and GPUs, but without a monolithic 5nm GPU from AMD as a starting point, it is the best we have.

So it is not as though I am unaware of the huge differences.

Your estimations are just 100% rubbish.

Of course they are. We have almost zero information on RDNA4 aside from this leak of 240mm² and 64 CUs / 130mm² and 32 CUs, with performance around the 7900 XT for the former and between the 7600 XT and 7700 XT for the latter.

Until we get more concrete information from AMD, or some code commits shed a bit more light on the matter, the best we have is fuzzy guesses based on transistor density and die sizes. Even guessing from the spec sheet is pretty ropey because we don't know clock speeds, clock scaling, or CU scaling. Taking any AMD announcements at face value is also a no-no after their fabricated and twisted RDNA 3 reveal. 1.54x perf/W my foot.

If you want perfect accuracy then wait for benchmarks.
 
Until we get more concrete information from AMD, or some code commits shed a bit more light on the matter, the best we have is fuzzy guesses based on transistor density and die sizes.

We have absolutely nothing based on "transistor density". Any "transistor density" numbers of different chips do not give any meaningful figures that can be used for estimating RDNA4 specifications or performance in any way.

Logic density and SRAM density numbers between processes are things that matter, but all "transistor density" comparisons between different chips are totally bogus.


And when the chips are being developed, transistor counts are practically never discussed by the engineers. When the design is synthesized from a hardware description language into an actual implementation, the synthesis tools don't give out any transistor count numbers. The synthesis tools give out area numbers and leaf cell counts, but not transistor counts. And a single leaf cell can mean anything from a single inverter (two transistors in CMOS) to a huge block of SRAM memory.

The transistor count is just a boasting number the marketing guys want to have. In the end the backend guys come up with some figure and hand it to the marketing guys.
 
We have absolutely nothing based on "transistor density". Any "transistor density" numbers of different chips do not give any meaningful figures that can be used for estimating RDNA4 specifications or performance in any way.

I think it would be obviously absurd if a leaker were to claim that N44 would get 7900 XTX performance from a 100mm² die on N4P. There is simply not enough space to fit enough stuff to achieve it; not even an RV770 moment would get you there. That is making assumptions about logic scaling / transistor count / transistor density and deciding such a feat is impossible on that node, which it clearly is.
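As a quick sanity check of that kind of claim, here is a minimal sketch, assuming a generous ~150M transistors/mm² for an N4-class GPU die (roughly what the Navi 31 GCD achieves) and roughly 45B transistors in the Navi 31 GCD alone; both figures are approximations, not exact die data.

```python
# Rough feasibility check for "7900 XTX performance from a 100 mm^2 N4P die".
# Assumed figures: ~150M xtors/mm^2 (roughly Navi 31 GCD density) and
# ~45B transistors in the Navi 31 GCD alone (excluding MCDs). Approximate only.

density = 150e6          # transistors per mm^2, optimistic for an N4-class GPU die
die_area = 100           # mm^2, the hypothetical N44 claim
budget = density * die_area
print(f"budget: {budget / 1e9:.0f}B transistors")                 # ~15B

navi31_gcd = 45e9        # approx. transistors in the Navi 31 GCD
print(f"shortfall vs Navi 31 GCD: {navi31_gcd / budget:.1f}x")    # ~3x
```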

You won't get anything accurate with any of it, but I am not going for accurate; I am just trying to get into the same postcode, and for that I think it works okay if the underlying architecture remains fairly consistent. Also, it is less about predicting the performance or spec of N44/N48 and more about trying to decipher whether a proposed spec/performance claim is believable or not.

The reality is that AMD could give us the full spec and clock speed and we still would not be able to make accurate predictions, because we would need to make assumptions about clock scaling, per-CU performance, and CU scaling. If you tried to predict 5700 XT performance from CU count and clock speed using the Vega 64 / Radeon VII baseline, you would be entirely incorrect (although, to be fair, so would transistor scaling, given the scope of changes between the architectures).
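For example, a naive CU-count times clock estimate using approximate reference boost clocks (not measured game clocks) goes the wrong way entirely:

```python
# Naive "CU count x clock" scaling from Vega 64 to the 5700 XT.
# Boost clocks are approximate reference-spec figures.

vega64 = {"cus": 64, "clock_mhz": 1546}
rx5700xt = {"cus": 40, "clock_mhz": 1905}

naive_ratio = (rx5700xt["cus"] * rx5700xt["clock_mhz"]) / (vega64["cus"] * vega64["clock_mhz"])
print(f"naive prediction: 5700 XT at {naive_ratio:.0%} of Vega 64")   # ~77%

# In third-party reviews the 5700 XT generally matched or beat Vega 64,
# which is exactly why this kind of extrapolation breaks down across architectures.
```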

Until we have 3rd party benchmarks any method is just a wild guess based on a combo of gut feeling and what has been done before.
 
Not arguing for any specific performance level of N48, but I'd like to make two observations:


One of the more counterintuitive findings from statistics is that you should always expect to revert to the mean. If two very tall people have kids, you should expect the kids to be shorter than them. Taller than the average person, but well shorter than their parents, for the simple reason that their parents were outliers and you should expect to revert to the mean.

If a vendor that has done decent generational improvements before, generation N is decent, and N+1 is a real stinker, you should assume that N+2 is closer to a decent improvement over what N+1 should have been instead of what it was. Because if the failure was an outlier, you should expect to revert to the mean. See, for example, NV3x vs NV4x. (Or in the opposite direction, when one generation is an advance much greater than expected, you should expect the generational improvement immediately after that to be much more muted, because if the great advance was an outlier, you should, again, expect reversion to the mean. See G80 and successors.)
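A tiny toy simulation of that effect, with entirely made-up numbers purely to illustrate regression to the mean (none of this is GPU data):

```python
import random

# Toy illustration of regression to the mean: each "outcome" is a stable
# baseline plus random luck. Extreme outcomes are usually partly luck,
# so the draw that follows an extreme one tends to fall back toward the baseline.

random.seed(0)
baseline = 100.0                       # long-run average "generational improvement"
outcomes = [baseline + random.gauss(0, 15) for _ in range(10_000)]

# Pick the unusually bad draws, then look at the draw that follows each one.
bad = [i for i, x in enumerate(outcomes[:-1]) if x < baseline - 20]
followers = [outcomes[i + 1] for i in bad]

print(f"mean of the bad draws:  {sum(outcomes[i] for i in bad) / len(bad):.1f}")
print(f"mean of the next draws: {sum(followers) / len(followers):.1f}")  # back near 100
```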




Not the same node. N4P is half a shrink over 4N; you get both increased density and a substantial perf/watt gain. You can see this very well on Hopper vs Blackwell.
Except RDNA2 was the outlier, not RDNA3. So perhaps you're right and I should have never been optimistic about RDNA3 in the first place...

And no, N4P is nowhere near 'half' a full node shrink.
 
I’m really not getting what’s so bad about RDNA 3 aside from RT performance. Perf/watt is slightly behind Ada. What exactly are people looking at to determine it sucks?
 
I’m really not getting what’s so bad about RDNA 3 aside from RT performance. Perf/watt is slightly behind Ada. What exactly are people looking at to determine it sucks?

I guess it's just not cheap enough to offset the downsides. It's not that AMD is too greedy or something, of course, because it's really not cheap to manufacture these things.
A product is not just about the hardware, it's the entire package. AMD is getting close with things like FSR 3.1, but I do hope AMD keeps making efforts to make their GPUs more attractive.
 
I guess it's just not cheap enough to offset the downsides. It's not that AMD is too greedy or something, of course, because it's really not cheap to manufacture these things.
A product is not just about the hardware, it's the entire package. AMD is getting close with things like FSR 3.1, but I do hope AMD keeps making efforts to make their GPUs more attractive.

Sure but the complaints haven’t been about pricing or extra features. People seem to be disappointed in the architecture and performance. But performance seems absolutely fine to me (excluding heavy RT).
 
Sure but the complaints haven’t been about pricing or extra features. People seem to be disappointed in the architecture and performance. But performance seems absolutely fine to me (excluding heavy RT).
They drifted further behind Nvidia in both features and performance while raising prices.
 
I’m really not getting what’s so bad about RDNA 3 aside from RT performance. Perf/watt is slightly behind Ada. What exactly are people looking at to determine it sucks?
Sure but the complaints haven’t been about pricing or extra features. People seem to be disappointed in the architecture and performance. But performance seems absolutely fine to me (excluding heavy RT).
People don't know/don't care that RDNA2 achieved near-parity with Ampere through a higher bill of materials on a more advanced node, and don't know/don't care that for RDNA3/Ada that situation is reversed.
 
People don't know/don't care that RDNA2 achieved near-parity with Ampere through a higher bill of materials on a more advanced node, and don't know/don't care that for RDNA3/Ada that situation is reversed.

Yeah, exactly. The 6900 XT was an 80 CU card going up against an 82 SM 3090 on an inferior node. The 7900 XTX is a 96 CU card going up against a 128 SM 4090 on the same node. It’s baffling that people would expect the same outcome. RDNA 3 seems to be performing perfectly in line with its specs.
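Putting rough numbers on that, using the unit counts above:

```python
# Compute-unit ratios for the two matchups described above.
rdna2_vs_ampere = 80 / 82     # 6900 XT CUs vs 3090 SMs  -> ~0.98
rdna3_vs_ada    = 96 / 128    # 7900 XTX CUs vs 4090 SMs -> 0.75

print(f"6900 XT / 3090 unit ratio:  {rdna2_vs_ampere:.2f}")
print(f"7900 XTX / 4090 unit ratio: {rdna3_vs_ada:.2f}")
# With the 7900 XTX fielding ~75% of the 4090's units on the same node class,
# a sizeable gap at the top is what the spec sheet alone would predict.
```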

Is RDNA 4 expected to be on 3nm or 5nm (and equivalents)?
 
Sure but the complaints haven’t been about pricing or extra features. People seem to be disappointed in the architecture and performance. But performance seems absolutely fine to me (excluding heavy RT).
I think the issue is that the 7900 XTX BOM should put it between the 4080 and 4090, yet it is barely faster than the 4080 in rasterization. AMD made a big silicon investment with Navi 31 and didn't seem to make up any ground at all.

Also the performance per watt improvement vs. RDNA 2 is unimpressive considering N5 alone is supposed to offer 30% more efficiency at the same clocks.
 
Sure but the complaints haven’t been about pricing or extra features. People seem to be disappointed in the architecture and performance. But performance seems absolutely fine to me (excluding heavy RT).

I think it comes from the misleading RDNA 3 reveal. The performance claims and PPW claims there just did not stand up to the 3rd party review findings.
 
I’m really not getting what’s so bad about RDNA 3 aside from RT performance. Perf/watt is slightly behind Ada. What exactly are people looking at to determine it sucks?
Lovelace out-of-the-box TDP is typically a fair bit higher than it needs to be, though, for most of the range. It's much more efficient in reality.

I'm not gonna derail this too much with a ton of RDNA3 talk, but I think it's very reasonable to say it was hugely disappointing. I mean, it even fell a fair bit short of what AMD themselves said it was gonna do (and I do not think AMD was simply lying). There's literally something wrong with it.
 
Yes, but even if Strix could magically clock 3.5 GHz at lower voltage, it won't change the RDNA3 GPUs' fate, as it's obvious now that AMD is not planning any Navi 31/32/33 refresh/respin.
Yeah but I need my vindication.
I’m really not getting what’s so bad about RDNA 3 aside from RT performance. Perf/watt is slightly behind Ada. What exactly are people looking at to determine it sucks?
PPA is dogshit versus projected.
Needs that fmax juice back.
 