AMD: RDNA 3 Speculation, Rumours and Discussion

troyan · Jul 30, 2021

Navi22 is the right comparison to Navi10 because they have the same power consumption and the same compute configuration.
You can compare the 6700XT to the Polaris 590 card. The 6700XT is ~2.4x more efficient. And that is with a better uArch and a vastly better process from TSMC.

Leoneazzurro5 · Jul 30, 2021

troyan said:
Navi22 is the right comparison to Navi10 because they have the same power consumption and the same compute configuration.
You can compare the 6700XT to the Polaris 590 card. The 6700XT is ~2.4x more efficient. And that is with a better uArch and a vastly better process from TSMC.

Same compute configuration means ZERO if it's not compared to actual performance, power consumption, features and competition. Voltage/frequency curve is a tool used to adjust performance to your competitors while drastically changing the power characteristics.

Ok, let's play a game.

Card A has the same SM count as card B, but B has a different (seemingly improved) architecture.
Card B is 35%-40% more powerful than card A.
Card B consumes 30% more power than card A, even with a better process.

What is your opinion on card B, considering that in every metric it is below the improvements seen in the RDNA1 to RDNA2 change?
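
For reference, the perf/W change implied by the card A vs. card B figures can be worked out directly. This is a minimal sketch in Python; the function name is made up for illustration and the inputs are just the percentages quoted above.

```python
def perf_per_watt_gain(perf_gain, power_gain):
    """Relative perf/W change for fractional performance and power increases."""
    return (1.0 + perf_gain) / (1.0 + power_gain) - 1.0

# Card B vs. card A: 35%-40% faster at 30% more power (figures from the post).
low = perf_per_watt_gain(0.35, 0.30)
high = perf_per_watt_gain(0.40, 0.30)
print(f"perf/W gain: {low:.1%} to {high:.1%}")
```

So card B ends up only a few percent ahead in perf/W, well short of the roughly 30% figure discussed for RDNA1 to RDNA2 in this thread.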

CarstenS · Jul 30, 2021

troyan said:
Navi22 is the right comparison to Navi10 because they have the same power consumption and the same compute configuration.

Architecture is more than CUs or transistors. It's also clocks and efficiency.
When a new µarch achieves the same or better performance as an older one (yet to be verified independently), it's very valid to compare from this aspect, i.e. iso perf.

P.S.: Navi22 and Navi10 are not the same compute configuration, it's 40 vs. 36 CUs, IIRC.
edit: strike the last, I was thinking of Polaris, not Navi.

Bondrewd · Jul 30, 2021

CarstenS said:
it's 40 vs. 36 CUs, IIRC.

40 vs 40 but different amount of FF ingest and all.

troyan · Jul 30, 2021

Leoneazzurro5 said:
Same compute configuration means ZERO if it's not compared to actual performance, power consumption, features and competition. Voltage/frequency curve is a tool used to adjust performance to your competitors while drastically changing the power characteristics.

You can build a twice as big chip and just downclock it. It will be more efficient. The 6700XT has the same compute configuration as the 5700XT, so the performance improvement has to come from
a) process and/or
b) architecture.
6700XT is 30% bigger (1.7x more transistors), 30% more efficient and overall 30% faster. That is pmax between RDNA1 and 2. The 6900XT is 100% bigger (2.5x more transistors), 45% more efficient while delivering overall 95% more performance.

Chiplet design won't help AMD for gaming. This talk that AMD would deliver 2.7x more compute is just as baseless as the claim that Navi23 would be an Nvidia killer...

Bondrewd · Jul 30, 2021

troyan said:
You can build a twice as big chip and just downclock it

Well you see, N23 is smaller, less power and a bit more perf.

troyan said:
45% more efficient

what

troyan said:
Chiplet design won't help AMD for gaming

WHAT

troyan said:
This talk that AMD will deliver 2.7x more compute

No that's the actual perf.
Compute is 3x aka 6 Navi10 stacked side-by-side.

CarstenS · Jul 30, 2021

Bondrewd said:
40 vs 40 but different amount of FF ingest and all.

Ah, right. Got confused about mixing in RX 500.

Leoneazzurro5 · Jul 30, 2021

troyan said:
You can build a twice as big chip and just downclock it. It will be more efficient. The 6700XT has the same compute configuration as the 5700XT, so the performance improvement has to come from
a) process and/or
b) architecture.
6700XT is 30% bigger (1.7x more transistors), 30% more efficient and overall 30% faster. That is pmax between RDNA1 and 2. The 6900XT is 100% bigger (2.5x more transistors), 45% more efficient while delivering overall 95% more performance.

Chiplet design won't help AMD for gaming. This talk that AMD would deliver 2.7x more compute is just as baseless as the claim that Navi23 would be an Nvidia killer...

You are arbitrarily assuming 6700XT vs 5700XT is the max efficiency inflexion point, whereas it would be quite easy to get the same performance from an N22 chip by clocking it as low as the 5700XT while consuming way less power. Efficiency points are not immovable; they are part of a curve depending on architecture, frequency and voltage. This is why TSMC gives, as parameters for a comparison between processes, the values "+frequency at ISO power" and "-power at ISO frequency", and guess what, the second percentage value is always way higher than the first one. The simple fact that AMD managed to get at least +30% perf/W on the same process with a very similar architecture is nothing short of great from an engineering point of view.
And again, you did not give me any answer about cards A and B. They are existing cards, could you tell me which ones?
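
To illustrate why the operating point matters so much, here is a rough sketch using the textbook dynamic power relation P ~ C * V^2 * f. The operating points below are hypothetical numbers picked for illustration, not measurements of any Navi part, and the model ignores static (leakage) power and assumes performance scales linearly with clock.

```python
def relative_power(freq_scale, volt_scale):
    """Dynamic power relative to stock, using P ~ C * V^2 * f."""
    return volt_scale ** 2 * freq_scale

def relative_perf_per_watt(freq_scale, volt_scale):
    # Assumes performance scales linearly with clock (optimistic).
    return freq_scale / relative_power(freq_scale, volt_scale)

# Hypothetical operating points: (clock scale, voltage scale) vs. stock.
operating_points = [(1.00, 1.00), (0.90, 0.92), (0.80, 0.85)]

for f, v in operating_points:
    p = relative_power(f, v)
    e = relative_perf_per_watt(f, v)
    print(f"clock x{f:.2f}, voltage x{v:.2f}: power x{p:.2f}, perf/W x{e:.2f}")
```

Because voltage enters squared, a modest clock reduction that allows a lower voltage cuts power much faster than it cuts performance, which is the same asymmetry as the "-power at ISO frequency" vs. "+frequency at ISO power" figures.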

Subtlesnake · Jul 30, 2021

troyan said:
RDNA2 is a new architecture, too. Yet with 70% more transistors than RDNA1 it is only ~30% more efficient.

It's 30% more clockspeed at the same power level, or half the power at the same clockspeed as RDNA 1.

https://www.kitguru.net/components/graphic-cards/dominic-moass/amd-rx-6800-review/2/

Frenetic Pony · Jul 31, 2021

Edit- made a mistake, double check your math kids! Updated to reflect

Alrighty then, cost estimate time. Using: https://caly-technologies.com/die-yield-calculator/ and estimating from the 6900xt die and approximating a 120CU 5nm die as 13x15mm (who knows how big the I/O die will be or where it'll come from), we get 66 good dies per wafer for the 6900xt and 239 good dies per wafer for the 120CU 5nm die.
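
For anyone who wants to redo this kind of estimate without the web calculator, a rough sketch follows. Everything in it is an assumption: a 300 mm wafer, a common gross-dies-per-wafer approximation, a simple Poisson yield model, a defect density of 0.1 per cm^2 on both nodes, and about 520 mm^2 for the 6900xt die. The linked calculator may use a different yield model and edge handling, so the results will not match the 66 and 239 figures exactly.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Approximate gross dies per wafer (ignores scribe lines and edge exclusion)."""
    radius = wafer_diameter_mm / 2.0
    gross = (math.pi * radius ** 2 / die_area_mm2
             - math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2))
    return int(gross)

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under a Poisson defect model."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

def good_dies(die_area_mm2, defects_per_cm2):
    return int(dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2))

D0 = 0.1                     # assumed defects per cm^2, same for both nodes
big_die = 520.0              # assumed 6900xt die area in mm^2
small_die = 13.0 * 15.0      # the 13x15mm compute die estimate from the post

print("good dies, large monolithic die:", good_dies(big_die, D0))
print("good dies, small compute die:   ", good_dies(small_die, D0))
```

The point of the sketch is the trend rather than the exact counts: shrinking the die raises both the number of candidates per wafer and the fraction that come out defect-free.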

Now the cost for 5nm wafers is about double 7nm, maybe a bit more but we'll leave error bars there. Still, we've got enough dies for (almost) 120 good GPUs. Even accounting for doubling the cost, and including the I/O die... the cost for this ludicrous mode GPU doesn't look a ton higher than for a 6900xt. Behold the power of chiplets.
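
The compute silicon part of that comparison can be written out as a quick calculation. This sketch uses only the numbers from the post (66 and 239 good dies per wafer, 5nm wafers at about double the 7nm price) plus the assumption of two compute dies per top-end GPU. It leaves out the I/O die, packaging and memory, and the variable names are made up for illustration.

```python
GOOD_DIES_7NM = 66      # 6900xt dies per wafer, from the post
GOOD_DIES_5NM = 239     # 120CU compute dies per wafer, from the post
WAFER_COST_7NM = 1.0    # normalised 7nm wafer cost
WAFER_COST_5NM = 2.0    # "about double", from the post
DIES_PER_GPU = 2        # assumption: dual compute die config

gpus_per_5nm_wafer = GOOD_DIES_5NM // DIES_PER_GPU
cost_6900xt = WAFER_COST_7NM / GOOD_DIES_7NM
cost_chiplet_gpu = WAFER_COST_5NM / gpus_per_5nm_wafer
ratio = cost_chiplet_gpu / cost_6900xt

print(f"GPUs per 5nm wafer: {gpus_per_5nm_wafer}")
print(f"compute silicon cost vs. 6900xt: {ratio:.2f}x")
```

Under those assumptions the compute silicon for a dual-die GPU costs about 1.11x that of a 6900xt die, before adding the I/O die and packaging.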

Still, I'd rather support a 384bit bus from an engineering standpoint, rather than a 256bit bus again, at least for the highest end dual chiplet configs. "Infinity cache improvements" are bullshit: either your buffers fit or they don't, and you still need to pull textures and meshes and stuff from main memory every frame. Thus to improve that portion of your frametime you need more bandwidth: a 384bit bus and 24GB of 20Gbps GDDR6 (there's been talk about this for years, and I swear I saw an announcement for test production this year but F* me if I can find it). Still, I'll keep this in mind and thus...

7900xt: 240CU, 500 watts, 24gb ram, $1,600-$2,000
7800xt: 200-216CU, 450 watts, 12-20gb ram? $1,000-$1,200
(note: might be 3 configs, depending on yields)

7700xt: 120CU, 300-330 watts, 16gb ram, $750
7700: 100-108CU, 275-300 watts, 16gb ram, $600.

Below:??? Different compute die.

Keep in mind the 256bit bus might be right. But as stated above, that's a fixed frametime cost you're not improving whatsoever if that's true, and that's despite moving to an I/O chiplet with much better yields. Also the 7700xt/non xt prices are just based on zero information about competition. Intel won't be any competition whatsoever in this performance category, so what Nvidia comes up with will probably heavily inform the single chiplet configs. Could easily go $500 for the 7700 if really needed.
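
To put numbers on the bus width argument, peak GDDR6 bandwidth is just bus width times per-pin data rate divided by 8. The sketch below compares a few configurations. The 16 Gbps entry is an assumption about the current 256bit setup and the 20 Gbps entries are the hypothetical memory mentioned above. This only covers raw DRAM bandwidth and says nothing about cache hit rates.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) * per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8.0

configs = {
    "256bit @ 16Gbps": (256, 16.0),   # assumed current config
    "256bit @ 20Gbps": (256, 20.0),   # hypothetical
    "384bit @ 20Gbps": (384, 20.0),   # hypothetical
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.0f} GB/s")
```

On these figures the 384bit option offers 960 GB/s against 640 GB/s for a 256bit bus at the same data rate, a 1.5x difference in raw bandwidth.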

Digidi · Jul 31, 2021

Putas said:
Imagine the slowdown on macro polygons.

In future games there are no macro polygons any more, only micro polygons. Nanite runs in shader code, but this is inefficient: you waste a lot of die space doing something that could be done faster in a smaller area with fixed-function hardware. It would be nice to know what percentage of shader resources Nanite uses. I would not be surprised if it were 20-40% of the shaders.

Qesa · Jul 31, 2021

Frenetic Pony said:
Alrighty then, cost estimate time.

140mm^2 is a crazy low estimate for a compute die with 50% more CUs than Navi 21. Even accounting for no infinity cache, you're ~quadrupling the density from N21.

Putas · Jul 31, 2021

Digidi said:
In future games there are no macro polygons any more, only micro polygons.

I don't have such a crystal ball.

Frenetic Pony · Jul 31, 2021

Qesa said:
140mm^2 is a crazy low estimate for a compute die with 50% more CUs than Navi 21. Even accounting for no infinity cache, you're ~quadrupling the density from N21.

Made a slight mistake with one estimation, true. It should be about... 200ish? What I get for not double checking. But it should be small. While transistor density from 7nm to 5nm TSMC isn't exactly double, it's still 95% there. Looking at die approximations of Navi 21, the actual compute unit part is maybe half the die. The SRAM takes up a lot, but TSMC has been rattling on about how ultra dense its new SRAM libraries are, ridiculously small versus 7nm, so proportionally that will shrink a lot as well. Remember, you're not quadrupling from N21, you're only tripling the CU count. Proportionally, I/O and associated blocks (like video stuff) should be smaller now, even with a 384bit bus, and could potentially be built relatively cheap, on say Global Foundries and whatever they have. It'll be a bit more expensive than the 6900xt for die, and possibly more for ram if there's 24gb instead of 16. But the prices shouldn't be affected that much, still looking at <= $2k.

CarstenS · Jul 31, 2021

You're only counting fully good dies, ignoring harvesting and small-scale redundancies, right?

Bondrewd · Jul 31, 2021

Frenetic Pony said:
It should be about... 200ish?

More.

Frenetic Pony said:
even with a 384bit bus

NEVER.
Literally never happening anymore.

Frenetic Pony said:
on say Global Foundries and whatever they have

MCDs are N6 and they don't contain video cores or display cores or anything like that.

Frenetic Pony said:
ridiculously small versus 7nm

?
N5 SRAM scaling is miserable, like 1.15x or so?

Frenetic Pony said:
Still, I'd rather support a 384bit bus from an engineering standpoint, rather than a 256bit bus again, at least for the highest end dual chiplet configs. "Infinity cache improvements" are bullshit: either your buffers fit or they don't, and you still need to pull textures and meshes and stuff from main memory every frame. Thus to improve that portion of your frametime you need more bandwidth: a 384bit bus and 24GB of 20Gbps GDDR6 (there's been talk about this for years, and I swear I saw an announcement for test production this year but F* me if I can find it). Still, I'll keep this in mind and thus...

not happening.
sorry.

Jawed · Jul 31, 2021

In for a penny, in for a pound:

Bondrewd · Jul 31, 2021

Jawed said:
In for a penny, in for a pound:

a) GCDs have no IMCs or anything like that.
b) N33 and lower are single die N6 parts for volume purposes.

Jawed · Jul 31, 2021

Bondrewd said:
a) GCDs have no IMCs or anything like that.
b) N33 and lower are single die N6 parts for volume purposes.

I speculate you're wrong on both counts.

Bondrewd · Jul 31, 2021

Jawed said:
I speculate you're wrong on both counts.

Dawg this is from AMD slides.
The wonky looking internal ones at that.
