AMD: RDNA 3 Speculation, Rumours and Discussion

Comparing a laptop to a desktop SKU is not very accurate.
For example, the RX 6800M has ~45-50% better perf/W than the desktop model.
Model        Gaming Frequency   TBP      Specification
RX 6700 XT   2424 MHz           230 W    40 CU; 192-bit, 16 Gbps
RX 6800M     2300 MHz           145+ W   40 CU; 192-bit, 16 Gbps
In Cyberpunk 2077, the RX 6700 XT is only 5% faster than the RX 6800M, while the RX 6900 XT is 76% faster.
So this N32 will provide ~80% higher performance than N22 at ISO power (150 W).
If it's actually a cut-down model, that would be even better, but I wouldn't bet on it.
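
A quick sketch of how that ~80% figure falls out of the numbers above (the 6950 XT's uplift over the 6900 XT is my own assumption, not from the chart):

```python
# Relative Cyberpunk 2077 performance, RX 6800M = 1.00 (from the chart above)
rx6800m = 1.00
rx6700xt = 1.05              # +5% over the 6800M
rx6900xt = 1.76              # +76% over the 6800M
rx6950xt = rx6900xt * 1.04   # assumed ~4% uplift over the 6900 XT

# If mobile N32 lands at 6950 XT level at ~150 W, its gain over N22 (6800M) is:
print(f"N32 mobile vs. N22 at ISO power: +{(rx6950xt / rx6800m - 1) * 100:.0f}%")
# -> about +83%, i.e. roughly the ~80% figure above
```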
 
I have to wonder how high this N32 is actually clocked, because 80% higher performance at ISO power is not that much when I think about N32's specs.
N32 has 50% more WGPs than N22 (30 vs. 20), and each WGP should be significantly better.

Expected performance at ISO power (~150 W)
RX 6800M: 100%
N32 mobile: +80%

N32 has +50% WGPs, and if we say an RDNA 3 WGP is 50% better than an RDNA 2 WGP, then ideally that's 100% * 1.5 * 1.5 = 225%, or +125% higher performance, but it's actually only +80%. I have to lower the clockspeed by 20%, from 2300 MHz to 1840 MHz, to arrive at the expected performance.
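
As a sanity check on that arithmetic (the 1.5x per-WGP gain is an assumption, and performance is assumed to scale linearly with clock):

```python
# Back-of-envelope check of the scaling argument above.
n22_wgps, n32_wgps = 20, 30        # leaked WGP counts
wgp_gain = 1.5                     # assumed RDNA 3 per-WGP improvement
base_clock = 2300                  # RX 6800M gaming clock, MHz

ideal = (n32_wgps / n22_wgps) * wgp_gain   # 1.5 * 1.5 = 2.25x
observed = 1.80                            # +80% from the leak

# Clock that would reconcile the leak with the ideal scaling:
required_clock = base_clock * observed / ideal
print(f"Ideal: {ideal:.2f}x, implied clock: {required_clock:.0f} MHz")
# -> ~1840 MHz, i.e. ~20% below 2300 MHz
```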

This leak from Greymon55 could actually be a cut-down version of N32, because I don't expect a mobile N32 to be downclocked from >3 GHz to <2 GHz just to stay within 145-165 W, when N22 was downclocked by only a few percent.
 
It depends on a lot of factors. If the performance increase was measured at 1080p, for example (gaming laptops mostly use that resolution), you can be more CPU-limited than at a higher resolution. E.g., at 1080p the 4090 is not that much faster than a 6950 XT, while at 4K it is much, much faster.
 
It's true we don't know at what resolution it has 6950 XT-level performance, but N32 has a 256-bit GDDR6 bus + 64 MB IC, so it won't be bottlenecked even at higher resolutions; I don't think the test resolution matters.
If that performance were for N33 (which was already denied), then it would only hold at 1080p, because N33 has just a 128-bit GDDR6 bus + 32 MB IC.
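
For reference, the raw memory bandwidth gap between those two configurations (the memory speeds here are my assumption):

```python
def gddr6_bw(bus_width_bits: int, gbps: float) -> float:
    """Peak GDDR6 bandwidth in GB/s: bus width in bytes times per-pin speed."""
    return bus_width_bits / 8 * gbps

print(gddr6_bw(256, 18))  # N32: 576 GB/s at an assumed 18 Gbps
print(gddr6_bw(128, 18))  # N33: 288 GB/s at the same assumed speed
```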
 
Are you just assuming some of the leaks are correct, or do you actually know the specs for a fact like you claim?
 
I don't have any insider info; I am basing it on Angstronomics, and no one has disputed their leaked RDNA 3 specs as far as I know.
Link
Do you have any info that their specs are incorrect?
 
No, but considering how 'reputable leakers' have been all over the place as usual, I'm not giving credit to a new name before they've earned it.
 
Slight update to guesses, assuming more recent leaks are true:

7900xtx: 384b bus, 24gb 24gbps ram, 192mb cache, 2 (96cu, 2.2ghz) gfx chiplets. $1499
7900xt: 320b bus, 20gb 24gbps ram, 160mb cache, 2 (80cu 2.2ghz) gfx chiplets, $1199
7900: 320b bus, (20?)gb 20gbps ram, 160mb cache, 1 (96cu 3ghz) gfx chiplet, $999
7800xt: 256b bus, 16gb 20gbps ram, 128mb cache, 1 (80cu 2.8ghz) gfx chiplet $699
7800: 256b bus, 16gb 16gbps ram, 128mb cache, 1 (60cu 3ghz) gfx chiplet $549
7700xt: 192b bus, 12gb 18gbps ram, 96mb cache, 1 (54cu 2.7ghz) chiplet $449
7700: 128b bus, 8gb 20gbps ram, 64mb cache, 32cu 3ghz monolithic $349
7600: 128b bus, 8gb 16gbps ram, 32mb cache, 28cu 2.7ghz $279

Series numbers go down by how high a resolution the card is optimized for: X900 = 4K+, X800 = 1440p-4K, X700 = 1440p, X600 = 1080p. The bottom tier might skip the 64mb cache, with both 600-series parts at 32mb.
 
If it has 2x better raster performance but only 2x better RT performance, then that would mean relative RT performance is the same as with RDNA 2 and no improvement was made there.

Thinking about "raster performance vs RT performance" is the wrong way of thinking.

It's really about shader performance vs raw rasterization performance vs raw RT performance.

And shader performance is needed for both rasterization AND ray tracing.

And shader performance has always been improving faster than raw rasterization performance.

The new AMD chips will quite probably have less than 2x the raw rasterization performance, but probably slightly over 2x the shader and raw RT performance.

And as the RT units are integrated into TMUs which are integrated into shader cores, the performance scaling of raw shader power and raw RT power should be equal.
However, they might have some small tweaks in the RT units to allow higher utilization => slightly better real-world performance scaling for RT work.

Also, if the L3 cache size in the high-end model has decreased from 128 MiB to 96 MiB, rasterization performance might scale worse to 4K resolution due to cache misses increasing considerably at that resolution. Having more bandwidth does not offset this; the increased bandwidth would be needed to sustain the higher rasterization performance even with a similar-sized cache.
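
A crude way to see the effect (hit rates and bandwidths below are illustrative assumptions, not leaked figures):

```python
# Toy model: effective bandwidth as a hit-rate-weighted mix of cache and DRAM.
def effective_bw(hit_rate: float, cache_bw: float, dram_bw: float) -> float:
    """Blend cache and DRAM bandwidth by the cache hit rate."""
    return hit_rate * cache_bw + (1 - hit_rate) * dram_bw

dram_bw = 960    # GB/s, e.g. a 384-bit bus at 20 Gbps
cache_bw = 3000  # GB/s, assumed L3 bandwidth

# Hit rates drop with resolution (AMD's RDNA 2 slides showed this trend);
# assume a 96 MiB cache loses a further chunk at 4K.
for label, hit in [("4K, 128 MiB", 0.55), ("4K, 96 MiB", 0.45)]:
    print(f"{label}: ~{effective_bw(hit, cache_bw, dram_bw):.0f} GB/s effective")
```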

So I would not be surprised if typical rasterization workloads saw a speedup of around 1.8x and RT workloads around 2.4x.

But something like 2.4x RT performance would still not be enough to beat nVidia.
 
Slight update to guesses, assuming more recent leaks are true:
I'm betting instead for:

(7950XT3D: Full N31, 24GB, v-cache on MCDs)
7950XT: Full N31, 24GB
7900XT: clipped N31, 20GB
7800XT: Full N32, 16GB
7700: clipped N32, 12GB
7600: Full N33, 8GB
7500: clipped N33, 8GB

Clocks are still ???, but I'd expect higher than yours. The big differences are that N33 is the 600 and 500 series, not 700, and there's less use of suffixes, instead just going by the numbers. Also, if there is a V-Cache SKU, they are definitely going to market it heavily, including new branding that ties into the CPU V-Cache thing in some way.
 
I'm not sure I understand the point of the Vcache option for RDNA3.

Is there actually much scope for performance improvement by adding Vcache on the MCDs? If so, all it would really tell me is that the base model is a bit too bandwidth-starved and that AMD skimped too much on the L3, especially in light of the reduction from 128MB to 96MB for their flagship GPU.

I mean, we're not talking about some enormous GPU that's going to be so incredibly expensive to produce that they had to desperately save on die space. They're using an older, cheaper process for the MCDs, and the overall amount of silicon is pretty reasonable for a high-end product. There would clearly have been scope for using slightly larger MCDs if 96MB of L3/IC would still leave the product fairly bandwidth-starved.

Just not very keen on this idea that they might be 'gimping' their normal flagship part in order to charge an extra premium for a Vcache variant.
 
Supposedly AIBs have working cards. They haven't even leaked any performance numbers.

AMD seems to have tightened things up a great deal. I wonder if AMD has given AIBs special software to allow them to do stress tests for thermals and power without running any non-AMD graphics code?

Can we presume that Navi 31 uses a heatspreader, like those seen on Ryzen? A heatspreader would be required with chiplets, both to protect them from mechanical damage and to provide a known thermal solution.

If there is a heatspreader, then this means the AIBs can't even demount the heatsink and provide die (package) shots...
 
Another thing to note is that this seems to be "Infinity Cache 2.0"; that is, it's very likely they changed something and made it more efficient, so a direct comparison between the Navi21 and Navi31 IC amounts may be misleading. It also depends on how the cache will be used: in IC 1.0 the biggest selling point was bandwidth amplification, while in IC 2.0 we may see something more than that, and if it is used more for compute-related tasks, some applications could take more advantage of it. But let's see what AMD did here. If they really made a version with stacked cache, it means there should be some advantage, even if it's only a few percent of added performance.
 
Can we presume that Navi 31 uses a heatspreader, like those seen on Ryzen? The heatspreader is required to deal with the chiplets to protect them from mechanical damage and provide a known thermal solution

No; see Vega with HBM, where you also did not need any heatspreader. They are basically used on CPUs because the user has to install a cooler. GPUs come with the cooler pre-installed, so there is no risk of the user damaging the die.
 
By this school of logic, is AMD somehow "gimping" all their Zen 3 CPU SKUs so that 5800X3D can exist?
The Vcache chip for the 5800X3D represents a very significant percentage increase in silicon usage.

The alternative - adding a much larger L3 cache on the base chiplet - would be even less space efficient.

So it makes sense here. The performance potential unlocked is also very obvious.

For Navi 31, it's quite different. An extra 8MB of L3 per MCD would not represent a huge percentage increase in silicon needed. And again, it's all on the cheaper process compared to the compute die.
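
For a rough sense of scale (die areas below are commonly reported figures; the SRAM density estimate is my own back-of-envelope assumption):

```python
# Approximate die areas in mm^2 (commonly reported; treat as rough figures).
zen3_ccd = 81        # Zen 3 compute die
vcache_die = 41      # 64 MB V-cache die stacked on the 5800X3D's CCD
n31_mcd = 37         # one Navi 31 MCD, carrying 16 MB of IC

# Assumed SRAM density derived from the V-cache die: mm^2 per MB.
mm2_per_mb = vcache_die / 64
extra_8mb = 8 * mm2_per_mb   # ~5 mm^2 for 8 MB more per MCD

print(f"5800X3D: +{vcache_die / zen3_ccd * 100:.0f}% silicon on top of the CCD")
print(f"Navi 31: +{extra_8mb / n31_mcd * 100:.0f}% per MCD for 8 MB more L3")
```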

I mean, AMD has reduced the Infinity Cache on Navi 31 from 128MB to 96MB. This backstep in specs would be fine if 96MB is sufficient, but if it's not (as in the situation being discussed here), then it feels an awful lot like they're not producing as good a product as they could, out of cheapness.
 