AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
The ROPs in Navi are tied to SAs. According to the drivers, the ratio of WGPs to SAs is unchanged in Navi 21, which means the number of SAs is doubled from Navi 10. Unless they cut the ROP count per SA for ??? reason, it will have 128 ROPs.
`num_rb_per_se` is halved from 8 to 4 in the info scraped from firmware binaries for all Navi 2X GPUs. So the RB:SA ratio is implicitly halved given that SE:WGP is unchanged at 1:10 (40 WGPs for Navi 21), unless SE:SA is changed from 1:2 (unlikely?).

This means Navi 21 and 22 will get 4 RBs * 4 SEs and 4 RBs * 2 SEs respectively by that metric, and 64/32 colour ROPs respectively, assuming each RBE still does four 32b pixels/clk.
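
A minimal sketch of that arithmetic, for anyone who wants to plug in other values. The function and parameter names are made up for illustration; the default of four pixels/clk per RB is the assumption stated in the post, not a confirmed figure.

```python
def colour_rops(num_se, num_rb_per_se, pixels_per_rb=4):
    """Colour ROPs = shader engines * RBs per SE * pixels per RB per clock."""
    return num_se * num_rb_per_se * pixels_per_rb

# num_rb_per_se = 4 is the value reported from the firmware binaries.
navi21 = colour_rops(num_se=4, num_rb_per_se=4)
navi22 = colour_rops(num_se=2, num_rb_per_se=4)
print(navi21, navi22)  # 64 32
```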
 
Ah, I missed that
 
Looks like AMD can't do 3 SEs after all:

https://forum.beyond3d.com/posts/2073537/

Cutting down to 72/64 CUs might be in store for bridging the gap between N21 and N22.

AMD's rasterizers divide screen space into a checkerboard pattern, where each rasterizer and its associated RBEs are solely responsible for handling geometry in a given tile. A bounding box based on the min and max xy coordinates can be used to look up responsibility with 4 rasterizers, and it's significantly simpler or trivial with 2 or 1. That's what made me question rumors positing a change that didn't fit this scheme.

Three isn't so clean, unless striping screen space into vertical regions, but that could more realistically lead to unbalanced distribution of work in scenes with a lot of vertical geometry.
It wouldn't be impossible for AMD to alter things, but there's a lot of power of two assumptions built into things like ROP tiles, rasterizer regions, caches, and exports as well.
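
To illustrate why power-of-two counts are convenient here, a rough sketch of a tile-ownership lookup. Everything in it is an assumption for illustration: the tile size, the bit assignment and the function names are hypothetical and are not AMD's documented scheme.

```python
TILE = 32  # hypothetical tile edge in pixels, not an AMD figure

def owner(tile_x, tile_y, num_rasterizers):
    """Map a screen tile to a rasterizer index using only low coordinate bits."""
    if num_rasterizers == 1:
        return 0
    if num_rasterizers == 2:
        return (tile_x + tile_y) & 1               # plain checkerboard
    if num_rasterizers == 4:
        return (tile_x & 1) | ((tile_y & 1) << 1)  # 2x2 block, one bit per axis
    raise ValueError("this sketch only covers 1, 2 or 4 rasterizers")

def rasterizers_for_bbox(xmin, ymin, xmax, ymax, num_rasterizers):
    """Return the set of rasterizers whose tiles a bounding box overlaps."""
    tx0, ty0 = xmin // TILE, ymin // TILE
    tx1, ty1 = xmax // TILE, ymax // TILE
    return {owner(tx, ty, num_rasterizers)
            for ty in range(ty0, ty1 + 1)
            for tx in range(tx0, tx1 + 1)}

print(rasterizers_for_bbox(0, 0, 10, 10, 4))  # {0}
```

With three rasterizers there is no equivalent bit trick, so the sketch simply refuses that case; a modulo-3 or striped mapping would be needed instead.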
 
Well, one is quite probably a summary of possible rumors (6 GB on a 6700 XT? Not unless they are aiming at a sub-$300 selling price: with 6 GB you start to be frame-buffer limited in various games, not to mention the regression in clock when AMD explicitly stated it would increase it); the other comes from a dump of an OS. Which one has the better chance of being right?
 
XSX does 8 so it's safe to assume Navi2x does too.
Where is this number coming from? The HC31 presentation says 116 Gpixel/sec, which is ~64 pixel/clk at 1.825 GHz.

Its block diagram does draw only one RB per shader array, but I would ehh on interpreting that as “1 new RB is the new 2 old RBs”. The diagram is not drawn for precision, especially if you look at L0$ and L2$.
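
The pixel/clk figure is just the quoted fill rate divided by the clock. A quick check, with variable names chosen for illustration:

```python
fill_rate_gpix = 116.0  # Gpixel/s, as quoted from the HC31 presentation
clock_ghz = 1.825       # GHz

pixels_per_clk = fill_rate_gpix / clock_ghz
print(round(pixels_per_clk, 1))  # 63.6
```

The slide value is rounded, so this lands just under 64 rather than exactly on it.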
 
The Hot Chips presentation shows us that the XSX has 2 Shader Engines and 64 ROPs, so that's 32 ROPs per SE. This leak shows us that Navi2x appears to have 4 RBEs per SE, which means that if the same holds true for XSX, it must have 8 ROPs per RBE for a total of 64.

So unless Navi21 has a different configuration, it stands to reason that its 4 Shader Engines will come with 128 ROPs.
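
Spelled out as arithmetic, with the big "if" that XSX shares the 4 RBEs per SE from the leak (variable names are only for illustration):

```python
xsx_rops = 64    # from the Hot Chips presentation
xsx_se = 2       # shader engines on XSX
rbe_per_se = 4   # from the Navi2x leak; assumed to apply to XSX as well

rops_per_rbe = xsx_rops // (xsx_se * rbe_per_se)

navi21_se = 4
navi21_rops = navi21_se * rbe_per_se * rops_per_rbe
print(rops_per_rbe, navi21_rops)  # 8 128
```

If each RBE instead still does 4 pixels/clk, the same product gives 64 for Navi21, which is exactly the disagreement in this thread.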
 
This is fun (I can't work out how to link an image posted in a tweet):

https://pbs.twimg.com/media/EisSihKXgAEw4ls?format=jpg&name=large
from:


I've not heard of InFO_MS before:

https://www.cieonline.co.uk/cadence...tsmc-info_ms-advanced-packaging-technologies/
I'm struggling to understand what this really is. It seems to be a "non-interposer" based chip stacking technology.

Allows heterogeneous integration of different dies
  • Improves performance
  • Reduces power consumption
  • Provides maximum functionality in a smaller form factor to support numerous applications in networking, graphics, mobile communications and networking
from: https://www.cadence.com/en_US/home/solutions/3dic-design-solutions.html
 
https://www.anandtech.com/show/16051/3dfabric-the-home-for-tsmc-2-5d-and-3d-stacking-roadmap

Explanation for most of the terms.

And for InFO_MS:
https://fuse.wikichip.org/wp-content/uploads/2019/07/semicon-2019-tsmc-info_ms.png

Both memory and logic die have fan-out integrated (InFO), and the memory sits directly on the substrate with a "back end" (side by side) connection between the logic and the memory die. No interposer, but the dies are stretched to work with the spacing of the substrate or edge to edge bonding.
 
Wow, so much out of the loop on this stuff. New respect for TSMC, too.

So, the rumoured ~500mm² Navi 21, if using InFO_MS to work with HBM, could have substantially less than ~500mm² active GPU area...
 
This is fun (I can't work out how to link an image posted in a tweet):

It is, thanks. I had a big laugh looking up if Rambo Cache is something serious, and then finding Raja in front of a flat line performance graph ... ROTFL

Edit: Rambo Cache surfaces in the tweet's responses.
 
InFO_MS does not count RAM area as logic area. Besides, the (few) pictures we have and the card renders show a package that is quite a bit bigger than Radeon VII's (which had 4 HBM stacks) and on par with the Vega 10 package (a 495 mm^2 chip with 2 HBM stacks).
 