AMD: RDNA 3 Speculation, Rumours and Discussion

Status
Not open for further replies.
Navi22 is the right comparison to Navi10 because they have the same power consumption and the same compute configuration.
You can compare the 6700XT to the Polaris 590 card. The 6700XT is ~2.4x more efficient. And that is with a better uArch and a far better process from TSMC.
 
Same compute configuration means ZERO if it's not compared to actual performance, power consumption, features and competition. The voltage/frequency curve is a tool used to position performance against your competitors, and it drastically changes the power characteristics.

OK, let's play a game.

Card A has the same SM count as card B, but B has a different (seemingly improved) architecture.
Card B is 35%-40% more powerful than card A.
Card B consumes 30% more power than card A, even with a better process.

What is your opinion on card B, considering that in every metric it falls below the improvements seen going from RDNA1 to RDNA2?
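
A quick way to put a number on the card A / card B puzzle is to divide relative performance by relative power. This is only a sketch of that arithmetic; the function name is made up for illustration and the inputs are the percentages quoted in the post.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative perf/W of card B over card A (1.0 means no change)."""
    return perf_ratio / power_ratio

# Card B: 35%-40% faster while drawing 30% more power (figures from the post).
low = perf_per_watt_gain(1.35, 1.30)
high = perf_per_watt_gain(1.40, 1.30)
print(f"perf/W gain: {low - 1:.1%} to {high - 1:.1%}")
```

On those figures card B ends up only a few percent ahead in perf/W, which is the point of the question.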
 
Architecture is more than CUs or transistors. It's also clocks and efficiency.
When a new µarch achieves the same or better performance as an older one (yet to be verified independently), it's very valid to compare from this aspect, i.e. iso perf.

P.S.: Navi22 and Navi10 are not the same compute configuration, it's 40 vs. 36 CUs, IIRC.
edit: strike the last, I was thinking of Polaris, not Navi.
 
You can build a chip twice as big and just downclock it. It will be more efficient. The 6700XT has the same compute configuration as the 5700XT, so the performance improvement has to come from
a) process and/or
b) architecture.
The 6700XT is 30% bigger (1.7x more transistors), 30% more efficient and 30% faster overall. That is pmax between RDNA1 and 2. The 6900XT is 100% bigger (2.5x more transistors) and 45% more efficient while delivering 95% more performance overall.
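
To see what those percentages imply for board power, relative performance can be divided by relative efficiency. The snippet below is a rough sketch of that calculation using only the numbers quoted in the post, all relative to the 5700XT; the helper name is hypothetical.

```python
def implied_power_ratio(perf_ratio: float, efficiency_ratio: float) -> float:
    """Power relative to the 5700XT, given relative performance and perf/W."""
    return perf_ratio / efficiency_ratio

# Figures quoted in the post, relative to the 5700XT.
cards = {
    "6700XT": {"perf": 1.30, "eff": 1.30},
    "6900XT": {"perf": 1.95, "eff": 1.45},
}

power = {name: implied_power_ratio(c["perf"], c["eff"]) for name, c in cards.items()}
for name, ratio in power.items():
    print(f"{name}: {ratio:.2f}x the power of a 5700XT")
```

Taken at face value, the quoted figures put the 6700XT at the same power as the 5700XT and the 6900XT at roughly a third more.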

A chiplet design won't help AMD for gaming. This talk that AMD would deliver 2.7x more compute is just as baseless as the claim that Navi23 would be an Nvidia killer...
 
You are arbitrarily assuming 6700XT vs 5700XT is the max-efficiency inflection point, whereas it would be quite easy to get the same performance from an N22 chip by clocking it as low as the 5700XT while consuming far less power. Efficiency points are not immovable; they are part of a curve that depends on architecture, frequency and voltage. This is why TSMC, when comparing processes, quotes both "+frequency at ISO power" and "-power at ISO frequency", and guess what, the second percentage is always much higher than the first. The simple fact that AMD managed to get at least +30% perf/W on the same process with a very similar architecture is nothing short of great from an engineering point of view.
And again, you did not give me any answer about cards A and B. They are existing cards, could you tell me which ones?
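
The argument that efficiency is a curve rather than a fixed point can be illustrated with the textbook dynamic-power relation P ~ f * V^2. The sketch below assumes, purely for illustration, a linear voltage/frequency curve between two made-up operating points and performance that scales linearly with clock; none of these numbers come from AMD or TSMC.

```python
def voltage(freq_ghz: float) -> float:
    """Hypothetical linear V/f curve: 0.80 V at 1.8 GHz, 1.20 V at 2.6 GHz."""
    return 0.80 + (freq_ghz - 1.8) * (1.20 - 0.80) / (2.6 - 1.8)

def relative_power(freq_ghz: float) -> float:
    """Dynamic power up to a constant factor: P ~ f * V^2."""
    return freq_ghz * voltage(freq_ghz) ** 2

def relative_perf_per_watt(freq_ghz: float) -> float:
    """Assumes performance scales linearly with clock (an idealisation)."""
    return freq_ghz / relative_power(freq_ghz)

high, low = 2.6, 1.9  # made-up operating points on the same chip
clock_ratio = low / high
power_ratio = relative_power(low) / relative_power(high)
efficiency_ratio = relative_perf_per_watt(low) / relative_perf_per_watt(high)

print(f"clock:  {clock_ratio:.0%} of the high point")
print(f"power:  {power_ratio:.0%} of the high point")
print(f"perf/W: {efficiency_ratio:.2f}x the high point")
```

Under these assumptions power falls much faster than clock when you move down the curve, which is why the same chip can show very different perf/W depending on where it is clocked.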
 
Edit: made a mistake. Double-check your math, kids! Updated to reflect.

Alrighty then, cost estimate time. Using https://caly-technologies.com/die-yield-calculator/, estimating from the 6900xt die, and approximating a 120CU 5nm die as 13x15mm (who knows how big the I/O die will be or where it'll come from), we get 66 good dies per wafer for the 6900xt and 239 good dies per wafer for the 120CU 5nm die.
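
For anyone who wants to play with the inputs, here is a minimal sketch of this kind of estimate. It is not the linked calculator: it uses a common dies-per-wafer approximation and Murphy's yield model, ignores scribe lines and edge exclusion, and assumes a 300mm wafer and a defect density of 0.1 per cm^2. The 20x26mm dimensions for the 6900xt die are also an assumption (roughly 520mm^2), not a figure from this thread.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumed wafer size

def gross_dies_per_wafer(die_w_mm: float, die_h_mm: float) -> float:
    """Approximate die count: wafer area / die area minus an edge-loss term."""
    area = die_w_mm * die_h_mm
    wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
    edge_loss = math.pi * WAFER_DIAMETER_MM / math.sqrt(2 * area)
    return wafer_area / area - edge_loss

def murphy_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Murphy's model: ((1 - exp(-A*D)) / (A*D))^2 with A in cm^2."""
    ad = die_area_mm2 / 100.0 * defects_per_cm2
    return ((1 - math.exp(-ad)) / ad) ** 2

def good_dies_per_wafer(die_w_mm: float, die_h_mm: float,
                        defects_per_cm2: float = 0.1) -> float:
    """Gross dies times yield; the default defect density is an assumption."""
    area = die_w_mm * die_h_mm
    return gross_dies_per_wafer(die_w_mm, die_h_mm) * murphy_yield(area, defects_per_cm2)

small = good_dies_per_wafer(13, 15)  # the 13x15mm die from the post
big = good_dies_per_wafer(20, 26)    # assumed 6900xt die dimensions
print(f"13x15mm: ~{small:.0f} good dies, 20x26mm: ~{big:.0f} good dies")
```

With these assumed inputs the results land in the same ballpark as the calculator figures quoted above, but they will not match exactly because the models and settings differ.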

Now the cost for 5nm wafers is about double 7nm, maybe a bit more but we'll leave error bars there. Still, we've got enough dies for (almost) 120 good GPUs. Even accounting for doubling the cost, and including the I/O die... the cost for this ludicrous mode GPU doesn't look a ton higher than for a 6900xt. Behold the power of chiplets.
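
The cost argument can be written out as a ratio. This sketch uses only the numbers from the post (66 and 239 good dies per wafer, 5nm wafers at about double the 7nm price) and deliberately leaves out the I/O die, packaging and memory; the function name is hypothetical.

```python
def silicon_cost_per_gpu(wafer_cost: float, good_dies_per_wafer: int,
                         dies_per_gpu: int) -> float:
    """Compute-die silicon cost per GPU, in the same units as wafer_cost."""
    return wafer_cost * dies_per_gpu / good_dies_per_wafer

N7_WAFER = 1.0  # normalised 7nm wafer cost
N5_WAFER = 2.0  # "about double 7nm", per the post

cost_6900xt = silicon_cost_per_gpu(N7_WAFER, 66, dies_per_gpu=1)
cost_dual_chiplet = silicon_cost_per_gpu(N5_WAFER, 239, dies_per_gpu=2)

ratio = cost_dual_chiplet / cost_6900xt
print(f"dual-chiplet compute silicon: {ratio:.2f}x the 6900xt die cost")
```

That works out to roughly 10% more for the compute silicon, before adding whatever the I/O die and packaging cost.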

Still, from an engineering standpoint I'd rather see a 384bit bus than a 256bit bus again, at least for the highest-end dual-chiplet configs. "Infinity cache improvements" are bullshit: either your buffers fit or they don't, and you still need to pull textures, meshes and so on from main memory every frame. To improve that portion of your frametime you need more bandwidth, so: a 384bit bus and 24gb of 20gbps GDDR6 (there's been talk about this for years, and I swear I saw an announcement for test production this year but F* me if I can find it). Still, I'll keep this in mind and thus...

7900xt: 240CU, 500 watts, 24gb ram, $1,600-$2,000
7800xt: 200-216CU, 450 watts, 12-20gb ram? $1,000-$1,200
(note: might be 3 configs, depending on yields)

7700xt: 120CU, 300-330 watts, 16gb ram, $750
7700: 100-108CU, 275-300 watts, 16gb ram, $600.

Below:??? Different compute die.

Keep in mind the 256bit bus might be right. But as stated above, that's a fixed frametime cost you're not improving whatsoever if that's true, and that's despite moving to an I/O chiplet with much better yields. Also the 7700xt/non xt prices are just based on zero information about competition. Intel won't be any competition whatsoever in this performance category, so what Nvidia comes up with will probably heavily inform the single chiplet configs. Could easily go $500 for the 7700 if really needed.
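
For reference, the raw bandwidth difference between the bus options is simple to work out: bus width in bits times per-pin data rate, divided by 8. The sketch below compares the 384bit / 20gbps config proposed above with two 256bit options; the 16gbps entry is an assumption about the current 6900xt memory speed, not something stated in this thread.

```python
def gddr6_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bits per transfer * Gbps per pin / 8."""
    return bus_width_bits * data_rate_gbps / 8

configs = {
    "256bit @ 16gbps": (256, 16.0),  # assumed 6900xt-style config
    "256bit @ 20gbps": (256, 20.0),
    "384bit @ 20gbps": (384, 20.0),  # the config proposed in the post
}

bandwidth = {name: gddr6_bandwidth_gb_s(*cfg) for name, cfg in configs.items()}
for name, gb_s in bandwidth.items():
    print(f"{name}: {gb_s:.0f} GB/s")
```

The wider bus gives 50% more peak bandwidth than a 256bit bus at the same memory speed, which is the fixed frametime cost being argued about.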
 
Imagine the slowdown on macro polygons.
In future games there are no macro polygons any more, only micro polygons. Nanite runs in shader code, but this is inefficient: you waste a lot of die space doing something that fixed-function hardware could do faster in a smaller area. It would be nice to know what percentage of shader resources Nanite takes up. I would not be surprised if it were 20-40% of the shaders.
 
140mm^2 is a crazy low estimate for a compute die with 50% more CUs than Navi 21. Even accounting for no infinity cache, you're ~quadrupling the density from N21.

Made a slight mistake with one estimate, true. It should be about... 200ish? That's what I get for not double-checking. But it should be small. While transistor density from TSMC 7nm to 5nm isn't exactly double, it's still 95% there. Looking at die approximations of Navi 21, the actual compute unit part is maybe half the die. The SRAM takes up a lot, but TSMC has been rattling on about how ultra dense its new SRAM libraries are, ridiculously small versus 7nm, so proportionally that will shrink a lot as well. Remember, you're not quadrupling from N21, you're only tripling the CU count. Proportionally, I/O and associated blocks (like video stuff) should be smaller now, even with a 384bit bus, and could potentially be built relatively cheap, on say Global Foundries and whatever they have. It'll be a bit more expensive than the 6900xt for the die, and possibly more for ram if there's 24gb instead of 16. But the prices shouldn't be affected that much; still looking at <= $2k.
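
The "200ish" figure can be reproduced from the reasoning in the post. This is a back-of-the-envelope sketch only: the 520mm^2 Navi 21 die size and 80 CU count are assumptions not stated in the thread, "95% there" is read as a 1.9x density gain, and the half-the-die compute share is the post's own estimate.

```python
def shrunk_compute_area_mm2(old_die_mm2: float, compute_fraction: float,
                            cu_scale: float, density_gain: float) -> float:
    """Scale the compute portion of an old die by CU count, then shrink it."""
    return old_die_mm2 * compute_fraction * cu_scale / density_gain

area = shrunk_compute_area_mm2(
    old_die_mm2=520.0,        # assumed Navi 21 die size
    compute_fraction=0.5,     # "maybe half the die" per the post
    cu_scale=120 / 80,        # 120 CUs per chiplet vs an assumed 80 on Navi 21
    density_gain=0.95 * 2.0,  # "95% there" relative to doubling
)
print(f"estimated 120CU compute die: ~{area:.0f} mm^2")
```

It lands a little above 200mm^2 under these assumptions, and the estimate moves a lot if the compute share or the density gain is off.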
 
You're only counting fully good dies, ignoring harvesting and small-scale redundancies, right?
 
It should be about... 200ish?
More.
even with a 384bit bus
NEVER.
Literally never happening anymore.
on say Global Foundries and whatever they have
MCDs are N6 and they don't contain video cores or display cores or anything like that.
ridiculously small versus 7nm
?
N5 SRAM scaling is miserable, like 1.15x or so?
Still, from an engineering standpoint I'd rather see a 384bit bus than a 256bit bus again, at least for the highest-end dual-chiplet configs. "Infinity cache improvements" are bullshit: either your buffers fit or they don't, and you still need to pull textures, meshes and so on from main memory every frame. To improve that portion of your frametime you need more bandwidth, so: a 384bit bus and 24gb of 20gbps GDDR6 (there's been talk about this for years, and I swear I saw an announcement for test production this year but F* me if I can find it). Still, I'll keep this in mind and thus...
not happening.
sorry.
 
In for a penny, in for a pound:

b3da045.png
 