AMD: RDNA 3 Speculation, Rumours and Discussion

True. I guess I wasn't really paying much attention to that side of the equation; I was assuming a 50% bump to the interface and a 50% bump to the Infinity Cache would have been solid for 4K.
I guess I got hung up on the Infinity Cache hit-rate chart. When I was dissecting it, it seemed like 192 MB would get up near a 75-80% hit rate at 4K, which matched the hit rate for Navi 21 at 1440p/1080p.
It seemed like moving to a 256/384 MB cache wouldn't really do that much.
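A rough way to see why the intuition above holds: effective bandwidth is a hit-rate-weighted blend of cache and DRAM bandwidth, and with the usual rule of thumb that miss rate falls roughly with the square root of capacity, each extra step of cache buys less. A minimal sketch; the ~75% anchor is the hit rate read off the chart as described above, while the bandwidth figures and the sqrt scaling are illustrative assumptions, not leaked specs:

```cpp
#include <cmath>
#include <cstdio>

// Toy model of Infinity Cache scaling at 4K. Assumptions (not leaked
// data): a 192 MB cache hits ~75% of the time at 4K, per the chart
// reading above; miss rate follows the classic sqrt-of-capacity rule
// of thumb; and both bandwidth figures are illustrative placeholders.
int main() {
    const double anchor_cap  = 192.0;  // MB
    const double anchor_miss = 0.25;   // i.e. ~75% hit rate at 192 MB
    const double cache_bw    = 2000.0; // GB/s, assumed on-die cache bandwidth
    const double dram_bw     = 600.0;  // GB/s, assumed GDDR6 bandwidth

    const double capacities[] = {128.0, 192.0, 256.0, 384.0};
    for (double cap : capacities) {
        double miss = anchor_miss * std::sqrt(anchor_cap / cap);
        double hit  = 1.0 - miss;
        double eff  = hit * cache_bw + miss * dram_bw;
        std::printf("%3.0f MB: ~%4.1f%% hits -> ~%4.0f GB/s effective\n",
                    cap, hit * 100.0, eff);
    }
    return 0;
}
```

Under these assumptions, doubling the cache from 192 MB to 384 MB only lifts effective bandwidth by a few percent, consistent with the "wouldn't really do that much" read.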

It's hypothetically useful for upscaling/running at 8K, in which case things like the FP16 accumulation target get huge. The apparent reason FSR2 runs twice as fast on Nvidia as on AMD is that the 4K target spills out of cache even at 128 MB, while Nvidia just has the raw bandwidth. I wouldn't expect this size of cache on anything other than the super-expensive top end.
 
True. I guess I wasn't really paying much attention to that side of the equation; I was assuming a 50% bump to the interface and a 50% bump to the Infinity Cache would have been solid for 4K.
I guess I got hung up on the Infinity Cache hit-rate chart. When I was dissecting it, it seemed like 192 MB would get up near a 75-80% hit rate at 4K, which matched the hit rate for Navi 21 at 1440p/1080p.
It seemed like moving to a 256/384 MB cache wouldn't really do that much.
It's a first-generation novelty. These hit rates can be an indicative baseline for how RDNA 2 hypothetically scales, but I am fairly certain there would be incremental hit-rate improvements planned beyond a hypothetical capacity increase, especially since higher hit rates also contribute to lower overall pJ/bit.

For example, GFX11 shaders will gain the ability to control Infinity Cache (MALL) write allocation behaviour on a per-instruction basis. On GFX10, this can only be controlled through a PTE (page table entry) bit, which is not as flexible.
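As a software-side illustration of per-instruction cache policy, here is a minimal HIP sketch using clang's generic __builtin_nontemporal_load / __builtin_nontemporal_store hints, which the AMDGPU backend lowers into each memory instruction's cache-policy bits. Whether a given hint maps onto the GFX11 MALL no-allocate control specifically is a backend decision, so treat that mapping as an assumption rather than confirmed behaviour:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Streaming copy kernel. The nontemporal hints mark each access as
// "use once"; the AMDGPU backend encodes such hints in the memory
// instruction's per-access cache-policy bits. That those bits would
// include the GFX11 MALL no-allocate control is assumed here purely
// for illustration.
__global__ void stream_copy(const float* __restrict__ src,
                            float* __restrict__ dst, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = __builtin_nontemporal_load(&src[i]); // hint: don't retain in cache
        __builtin_nontemporal_store(v, &dst[i]);       // hint: write around cache
    }
}

int main() {
    const size_t n = 1 << 20;
    std::vector<float> host(n, 1.0f);
    float *src = nullptr, *dst = nullptr;
    hipMalloc(&src, n * sizeof(float));
    hipMalloc(&dst, n * sizeof(float));
    hipMemcpy(src, host.data(), n * sizeof(float), hipMemcpyHostToDevice);
    unsigned blocks = (unsigned)((n + 255) / 256);
    stream_copy<<<blocks, 256>>>(src, dst, n);
    hipMemcpy(host.data(), dst, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("dst[0] = %.1f\n", host[0]);
    hipFree(src);
    hipFree(dst);
    return 0;
}
```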
 
NAVI 31 die size

The upcoming Navi 31 GPU reportedly features a 350 mm² graphics die, significantly smaller than the rumored figures from last year. With an estimated GCD size of 350 mm² and six MCDs at 40 mm² each, the full design could measure 350 + 6 × 40 = 590 mm², which is almost as big as the rumored NVIDIA AD102 GPU (~600 mm²).
 
I wonder how much variability there is in MCD dies vs. GCD dies, especially in terms of what speed they will run at. If there is a lot of variability in MCD dies, then they can bin a lot of those MCDs very well.

E.g. bundle the slower ones for slower cards with slower GDDR needs.
I'm still hoping we end up getting a monster config with multiple GCDs as well.

It seems kinda pointless to go for such a small GCD die if they are moving to a chiplet-style arch.
Not taking advantage of being able to make much bigger compute dies seems silly...

I'm not sure if the Navi 3x gen is the GPU version of Zen 1 or Zen 2!
 
Memory chips are still gonna be separate. The MCDs are just L3 cache + memory controllers.

But yes, if all this information turns out to be correct, I don't think Navi 31 is gonna be quite as competitive with Nvidia as people were thinking before. The benefits of such a chiplet approach would seem to be entirely about yields/costs rather than anything to do with performance. Perhaps AMD is worried about maximizing the number of GPUs they can produce with the limited capacity they have ordered from TSMC, given that they need to share it with their CPUs.

This would have no real equivalent with Zen. It's not monolithic like Zen 1, but there's no big compute-die performance-scaling potential like with Zen 2.
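For a feel of the yields/costs argument, here is a back-of-the-envelope Poisson yield model (yield = e^(-D0·A)) comparing a hypothetical monolithic 590 mm² die against the rumoured 350 mm² GCD plus six 40 mm² MCDs. The defect density is an assumed placeholder, not a TSMC figure:

```cpp
#include <cmath>
#include <cstdio>

// Back-of-the-envelope die-yield comparison using the simple Poisson
// model: yield = exp(-D0 * area). D0 is an assumed placeholder defect
// density, not a TSMC figure.
int main() {
    const double d0 = 0.001; // defects per mm^2 (assumed)
    auto yield = [&](double area_mm2) { return std::exp(-d0 * area_mm2); };

    // Monolithic: one 590 mm^2 die where any fatal defect kills everything.
    std::printf("monolithic 590 mm^2 yield: %.1f%%\n", yield(590.0) * 100.0);

    // Chiplet: the 350 mm^2 GCD and six 40 mm^2 MCDs yield independently,
    // so a defect in cheap cache/PHY silicon no longer scraps the GCD.
    std::printf("350 mm^2 GCD yield:        %.1f%%\n", yield(350.0) * 100.0);
    std::printf("40 mm^2 MCD yield:         %.1f%%\n", yield(40.0) * 100.0);
    return 0;
}
```

Under this toy model a large share of monolithic losses would come from defects landing in what is cheap cache/PHY silicon, which is exactly what splitting the MCDs off the expensive N5 compute die avoids.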
 
That is not really exact; the effect can be on both costs/yields and performance. It will depend on how many 5 nm transistors they are willing to dedicate to the compute part. A 350+ mm² "pure compute" 5 nm die is not exactly small, and I wonder if the transistor budgets dedicated to shaders and such are really so different between the two architectures, given that a lot of the area in AD102 (rumored 600 mm²) will be dedicated to the upgraded cache and memory system.
 
It's impossible to know with absolute certainty, but I think it's not unreasonable to believe that, if the compute die is only 350 mm², they could feasibly have built the same overall configuration, with the same (or even slightly better?) performance, as one die sitting safely within the reticle limit for TSMC 5 nm. 192 MB of L3 + twelve 32-bit memory controllers couldn't be more than what, 200-250 mm² at most?

As for competitiveness with Nvidia, I just mean in comparison to what we had been hearing earlier, where there were expectations based on the rumors/specs floating around that would have enabled AMD to likely take the performance lead. I think it may be possible Navi 31 is up near AD102, but I'd also be surprised if Nvidia doesn't hold the general performance crown.
 
Do we know what node the MCDs will be fabbed on? I doubt those need to be on the more expensive 5 nm node. Hell, do we even know if the MCDs will be fabbed at TSMC?

Regards,
SB
 
Oh, I seriously doubt they're using 5 nm for those. But that would just further prove the point that the approach here is entirely about improving yields/costs rather than anything to do with improving performance. They could have done all this monolithic just fine if they wanted to.
 
It's impossible to know with absolute certainty, but I think it's not unreasonable to believe that, if the compute die is only 350 mm², they could feasibly have built the same overall configuration, with the same (or even slightly better?) performance, as one die sitting safely within the reticle limit for TSMC 5 nm. 192 MB of L3 + twelve 32-bit memory controllers couldn't be more than what, 200-250 mm² at most?

As for competitiveness with Nvidia, I just mean in comparison to what we had been hearing earlier, where there were expectations based on the rumors/specs floating around that would have enabled AMD to likely take the performance lead. I think it may be possible Navi 31 is up near AD102, but I'd also be surprised if Nvidia doesn't hold the general performance crown.
That may or may not be true, especially if there are versions with 384 MB of IC. It may also be possible that the 350 mm² die is N32 and not N31; who knows. These are leaks, and it was not uncommon in the past for rumors to be wrong, especially after AMD went the "all secrecy" way.
 
Do we know what node the MCDs will be fabbed on? I doubt those need to be on the more expensive 5 nm node. Hell, do we even know if the MCDs will be fabbed at TSMC?

Regards,
SB
It's been rumoured forever that it'll use two different nodes; IIRC N5+N6 has been the popular guess.
 
I also don't understand why AMD didn't go for the kill (if the rumors are right). You know the practical limit on N5/4 is around 600 mm² for a gaming GPU before cost explodes into datacenter territory. So why not make a 600 mm² GCD and six 40 mm² MCDs (840 mm² total) and be sure to land well above AD102? In other words, take the performance crown decisively and charge whatever you want for it. In marketing, the rule is simple: people pay whatever it takes for the best, even when it has horrible value for money. It has been Nvidia's motto forever and it has worked very well... so why not AMD?
 
That is not really exact; the effect can be on both costs/yields and performance. It will depend on how many 5 nm transistors they are willing to dedicate to the compute part. A 350+ mm² "pure compute" 5 nm die is not exactly small, and I wonder if the transistor budgets dedicated to shaders and such are really so different between the two architectures, given that a lot of the area in AD102 (rumored 600 mm²) will be dedicated to the upgraded cache and memory system.
Between the total area being smaller, some of it being 7 nm, and the extra I/O area for inter-chiplet communication, it'd be very impressive if Navi 31 matched AD102. Assuming all of the chip-size rumours are true, naturally.
 
What kind of extra I/O area N31/N32 will have depends on how the packaging is done (stacking, etc.). I will be surprised if it is really so significant.
 
That may or may not be true, especially if there are versions with 384 MB of IC. It may also be possible that the 350 mm² die is N32 and not N31; who knows. These are leaks, and it was not uncommon in the past for rumors to be wrong, especially after AMD went the "all secrecy" way.
Of course. I'm simply talking in the context of "if" these rumors are correct.

Navi 31 may actually still be a dual-GPU-tile product, for all we really know.
 
So why not make a 600 mm² GCD and six 40 mm² MCDs (840 mm² total) and be sure to land well above AD102?
Because doing chiplets for graphics is hard?

Seeing how badly the MI250X scales in many datacenter and AI applications, it's pretty obvious that chiplets are not ideal for graphics at all, at least not yet. Compute applications are the best-case scenario for multi-chip processing, yet the outcome is far from optimal even there, so for the time being graphics is not viable at all.
 
Because doing chiplets for graphics is hard?

Seeing how badly the MI250X scales in many datacenter and AI applications, it's pretty obvious that chiplets are not ideal for graphics at all, at least not yet. Compute applications are the best-case scenario for multi-chip processing, yet the outcome is far from optimal even there, so for the time being graphics is not viable at all.
MI250X is treated as 2 completely separate GPUs, isn't it?
 