AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
It should be 8 per RBE now for most common 32-bit and lower formats, just as most of the APUs since Stoney Ridge.

That makes the most sense given each rasterizer spits out 16 pixels per clock for a total of 128 on Navi21. Unless of course the rasterizers were scaled back to 8 pixels per clock but that seems unlikely.
 
It should be 8 per RBE now for most common 32-bit and lower formats, just as most of the APUs since Stoney Ridge.

Indeed, why would they halve the number of RBEs per shader engine from Navi10 to Navi20 without compensating elsewhere?

Also the XSX has 64 ROPs with only 2 Shader Engines = 8 ROPs / RBE.

So Navi 21 = 128 ROPs.
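
As a rough sanity check of that arithmetic, here is a minimal Python sketch. The RBE counts are assumptions for illustration: Navi 10 is taken as 8 RBEs per shader engine at 4 pixels per clock each, the XSX as 4 RBEs per shader engine at 8 pixels each, and Navi 21 as 4 shader engines with 4 RBEs each, which is speculation from this thread and not a confirmed spec.

```python
# Back-of-the-envelope ROP count: ROPs = SEs * RBEs per SE * pixels per RBE.
# Every configuration below is an assumption for illustration only.

def rop_count(shader_engines: int, rbes_per_se: int, pixels_per_rbe: int) -> int:
    """Return the total colour ROPs (pixels per clock) for a configuration."""
    return shader_engines * rbes_per_se * pixels_per_rbe

navi10 = rop_count(shader_engines=2, rbes_per_se=8, pixels_per_rbe=4)
xsx = rop_count(shader_engines=2, rbes_per_se=4, pixels_per_rbe=8)
navi21 = rop_count(shader_engines=4, rbes_per_se=4, pixels_per_rbe=8)  # speculative

print(navi10, xsx, navi21)
```

If the per-RBE rate had stayed at 4 while the RBE count per shader engine was halved, the same function would give only 64 for the Navi 21 guess, which is the "without compensating elsewhere" case.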
 
AMD may double zixel (z buffer) rate per RBE - like it did with RV770.

The rate of render target colour operations is theoretically falling, because of shader complexity (this is why NVidia doubled FP32 ALU rate). The time you need unreal fillrate is for depth pre-pass and shadow buffer rendering, which is zixel rate.

Also, game engines are moving away from deferred rendering, which is a frequently encountered use case for unreal colour fillrate (short shaders writing lots of colour bytes per pixel in the G-buffer pass).

5700XT appears to have far too high colour fillrate for its actual performance in games.

Anyway, these are just my theories.
 
AMD may double zixel (z buffer) rate per RBE - like it did with RV770.

The rate of render target colour operations is theoretically falling, because of shader complexity (this is why NVidia doubled FP32 ALU rate). The time you need unreal fillrate is for depth pre-pass and shadow buffer rendering, which is zixel rate.

Also, game engines are moving away from deferred rendering, which is a frequently encountered use case for unreal colour fillrate (short shaders writing lots of colour bytes per pixel in the G-buffer pass).

5700XT appears to have far too high colour fillrate for its actual performance in games.

Anyway, these are just my theories.

That’s true but there’s also the 4K hype to account for. That’s a pretty significant increase in fillrate requirements.
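
To put a number on that, a small Python sketch comparing pixels per frame at common resolutions. It only counts pixels; overdraw, MSAA and multiple render targets are deliberately ignored, so treat it as a lower bound on how fill requirements scale.

```python
# Pixels per frame at common resolutions, relative to 4K (3840x2160).
# Only raw pixel counts are compared; overdraw and MSAA are ignored.

RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K": (3840, 2160),
}

def pixels(name: str) -> int:
    """Return the number of pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height

for name in ("1080p", "1440p"):
    ratio = pixels("4K") / pixels(name)
    print(f"4K has {ratio:.2f}x the pixels of {name}")
```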
 
Has there ever been an AMD GPU with more than 64 ROPs?

I'd like to believe Navi 21 has 128 ROPs, but my hopes for 96 or more ROPs from AMD have been dashed, time and time again.
 
That’s true but there’s also the 4K hype to account for. That’s a pretty significant increase in fillrate requirements.

It's certainly a fill rate monster. 2.2GHz puts it at 72% more fill rate than the 3080. Now we just have to see if the memory bandwidth is enough to back it up. Given the XSX is rocking a 320-bit bus though I find it virtually inconceivable that Navi21 will be stuck on 256-bit. My money's on 384-bit with 18Gbps for 864GB/s.
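
The figures in that post can be reproduced with a short Python sketch. The Navi 21 inputs (128 ROPs, 2.2 GHz, 384-bit at 18 Gbps) are this thread's speculation; the RTX 3080 is taken as 96 ROPs at its 1.71 GHz rated boost, which is an assumption about which clock the comparison used.

```python
# Peak pixel fill rate and memory bandwidth from headline specs.
# Navi 21 numbers are speculative; the 3080 clock is the rated boost.

def pixel_fill_rate(rops: int, clock_ghz: float) -> float:
    """Peak colour fill rate in Gpixels/s (one pixel per ROP per clock)."""
    return rops * clock_ghz

def memory_bandwidth(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits * gbps_per_pin / 8

navi21_fill = pixel_fill_rate(rops=128, clock_ghz=2.2)
rtx3080_fill = pixel_fill_rate(rops=96, clock_ghz=1.71)
advantage_pct = (navi21_fill / rtx3080_fill - 1) * 100

print(f"Navi 21 (speculative): {navi21_fill:.1f} Gpix/s")
print(f"RTX 3080: {rtx3080_fill:.1f} Gpix/s")
print(f"Advantage: {advantage_pct:.0f}%")
print(f"384-bit @ 18 Gbps: {memory_bandwidth(384, 18):.0f} GB/s")
print(f"256-bit @ 16 Gbps: {memory_bandwidth(256, 16):.0f} GB/s")
```

These are theoretical peaks only; real throughput depends on format, blending and whether memory bandwidth can keep the ROPs fed.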
 
It's certainly a fill rate monster. 2.2GHz puts it at 72% more fill rate than the 3080. Now we just have to see if the memory bandwidth is enough to back it up. Given the XSX is rocking a 320-bit bus though I find it virtually inconceivable that Navi21 will be stuck on 256-bit. My money's on 384-bit with 18Gbps for 864GB/s.
All the leaks are suggesting 256-bit with 128MB of some sort of cache to compensate for the lower bandwidth. Sounds like how the X360 used its eDRAM or the XBO used its eSRAM. Hopefully, if it is like this, immature drivers/firmware don't gimp the performance of the cards.
 
Maybe on blowout sale already, but you can get 5600XT's (which is basically a 5700-ish class of card) for 250 € already. If it's with 8+ GByte and DXR hardware, then ok, fair point.
sounds fine to me. An affordable GPU with decent RT performance would be ideal. Another more of the same GPU makes no sense when you have the 5600XT at that price. In some countries a potential cheap GPU featuring RT would be a winner.

Looks like the rumors from the banned man were right.

40cu, 2.5ghz clock, lower power than 5700xt.... That's a nice bolder.....

80cu at 2.2 is impressive as well

32cu on 128bit bus, will be interesting to see

Navi 21 or Sienna Cichlid seems to have 80 CUs or 5,120 SPs, assuming that each CU still carries 64 SPs on RDNA 2. Navi21A silicon shows a boost clock up to 2,050 MHz. The Navi 21B silicon seems to have a 2,200 MHz boost clock. Most interestingly, the power limit varies from 220W to 238W?

Quite promising if you ask me.
 
sounds fine to me. An affordable GPU with decent RT performance would be ideal. Another more of the same GPU makes no sense when you have the 5600XT at that price. In some countries a potential cheap GPU featuring RT would be a winner.


Navi 21 or Sienna Cichlid seems to have 80 CUs or 5,120 SPs, assuming that each CU still carries 64 SPs on RDNA 2. Navi21A silicon shows a boost clock up to 2,050 MHz. The Navi 21B silicon seems to have a 2,200 MHz boost clock. Most interestingly, the power limit varies from 220W to 238W?

Quite promising if you ask me.
That would be the GPU only, not including the RAM, fans and whatever else needs power on the PCB, like RGB etc.
 
Has there ever been an AMD GPU with more than 64 ROPs?
It seems there hasn't. Hawaii brought a whopping 64 ROPs with 4 SEs back in 2013. After that AMD was stuck at 64 ROPs through all the following gens, counting Fiji, Vega 10, Vega 20 and Navi 10.

All the leaks are suggesting 256-bit with 128MB of some sort of cache to compensate for the lower bandwidth.
There has been only a single "leak" mentioning the 128MB "cache" so far. The 256b interface was derived indirectly from the drivers.

Using Occam's razor one would say the notation in drivers has changed and doesn't reflect Navi 2. So all the alien tech super-caches go away.
 
Also, game engines are moving away from deferred rendering, which is a frequently encountered use case for unreal colour fillrate (short shaders writing lots of colour bytes per pixel in the G-buffer pass).

Ah shit, better go tell Naughty Dog and Unity they're out of date :p

Really though, while newer weird stuff like deferred texturing and whatever UE5 does will probably show up more and more, I don't see deferred being ditched entirely. Heck now you've got all the layer mixing and deferred decals you want.

Also I read that "FRC" paper you posted. Why, why are theses written this way, why are they made to be the least understandable and most overwritten papers you can make, whyyy. But anyway, trying to skip through it, while it's a neat idea for reducing fetch latency and L2 misses at the same time, ultimately it just results in more IPC (or OPC, as the author calls it for some reason) if their modelling is correct. Which means, as their own numbers way far down in the paper suggest, that instructions are done faster on average and so even more traffic to main memory is generated, despite the increased hit rate. The absolute opposite of a magic cache that makes main memory access go away as a problem.
 
L2 cache bumping up to 128MB is not that unlikely now, having a thought about it. This is considering rumours suggesting Navi 21 being twice as large (~505 mm^2) as Navi 10, while on the other hand still sticking with a 256-bit GDDR6 bus.

It explains why there are very few clues in the drivers, and it could be a motivation for AMD reducing/eliminating L2 flushes and adding GL1 cache in the renewed RDNA memory hierarchy. GL1 cache can replace L2 in the bandwidth amplification role for texture reads, and it does seem a more natural cache level for doing this, especially with its fixed tie to a binning rasterizer and its screen space tiles (forces that contribute to locality).

Although if this were to happen, I would expect clues emerging in the shader compilers by now, with e.g. new optimization passes altering L2 cache policies of memory requests, based on shader type and resource type. Otherwise, commonality of streaming-like accesses in the conventional graphics pipeline is going to work against a way larger L2 cache, which has been small with a write combining focus throughout its GCN lineage, rather than having a decent read hit rate.

Another possible implementation of "128MB cache" is embedded SRAM banks like Xbox One. Combined with the HBCC resurrected from Vega 10, we would have 64KB (?) pages being hot migrated between the eSRAM and the GDDR6 pool. This could also explain the lack of clues in the OSS drivers, because in theory the GPU can function solely with the GDDR6 pool, and so the OSS code drop can be delayed.

Edit: More on-chip SRAM could also explain why Navi 22 was rumoured to be 320+ mm^2, despite apparently having the same number of CUs and a 25% narrower GDDR6 bus. (assuming that the intersection module in the TMU has a slim transistor budget)
 
The very same firmware tables show Navi 10 as having a clockspeed of 1400, which is 300 ish MHz under what it actually does. I wouldn't count on the Navi 2x number being an absolute limit. The power of the GPU actually going down between N22 and N10 despite having the same CU count tells me they're not even pushing it as hard.

That said N14 is 1900MHz in those tables, which is actually a reasonable number, so unless we know the exact function of the driver here we can't say what it means. There's also the likely possibility of RDNA2 itself behaving differently with the same input configuration.
 
The very same firmware tables show Navi 10 as having a clockspeed of 1400, which is 300 ish MHz under what it actually does. I wouldn't count on the Navi 2x number being an absolute limit. The power of the GPU actually going down between N22 and N10 despite having the same CU count tells me they're not even pushing it as hard.

That said N14 is 1900MHz in those tables, which is actually a reasonable number, so unless we know the exact function of the driver here we can't say what it means. There's also the likely possibility of RDNA2 itself behaving differently with the same input configuration.

Those Navi 10 clocks could coincide with the Radeon Pro GPUs:
https://www.techpowerup.com/gpu-specs/?generation=Radeon+Pro+Mac&sort=generation

Just looking at Navi 10, Apple may then use some kind of multiplier algorithm to control max/base clocks in the 5600M (MacBook Pro), 5700 and 5700 XT (iMac), and W5700X (Mac Pro).

I presume Apple are going in on Navi 21 for a Mac Pro MPX module. The plot thickens... Navi 21, if it's above 2.00GHz, is going to have some very impressive performance.
 
It seems there hasn't. Hawaii brought a whopping 64 ROPs with 4 SEs back in 2013. After that AMD was stuck at 64 ROPs through all the following gens, counting Fiji, Vega 10, Vega 20 and Navi 10.

For that reason, I find it very hard to believe Navi 21 has 128 ROPs.

It was widely reported that Radeon VII had 128 ROPs but of course that turned out not to be the case, it was just 64.

For those that are saying Navi 21 has 128, I'd like to know the reasons why, some evidence, etc. Not just rumor.
 
Newegg is listing Radeon RX 6700 XT, 6800 XT and 6900 XT specs in its blog
We have no clue as to why the information is still present at Newegg, but it is, check for yourself. The next AMD flagship would be named RX 6900 XT and would get 5,120 shading processors tied to a base clock speed of 1,500 MHz. This card gets 16 GB of GDDR6 memory on a 256-bit memory bus. That means 512 GB/s of memory bandwidth, which would in fact be equal to the bandwidth of the RTX 3070. The TDP listed is 300 watts. According to rumors, the product performance sits between the RTX 3070 and the 3080.

Second best would be the Radeon RX 6800 XT, which would get GB GDDR6 and thus a 192-bit memory bus, with a power consumption of 200 watts. The base clock speed is listed at 1,500 MHz, as it is for all three cards. This card would get 3,840 shading processors.

The smallest model, the Radeon RX 6700 XT, has 2,560 cores (similar to the RX 5700 XT) and sees a power consumption of 150 watts. This card again is tied to a 192-bit wide memory bus and gets 6GB of GDDR6 memory. As to the validity of these specs, hey, we know as much as you do. We do think that somebody contributed the content based on a speculation or two, and Newegg posted it.
https://www.guru3d.com/news-story/n...-xt6800-xt-and-6900-xt-specs-in-its-blog.html
 
For that reason, I find it very hard to believe Navi 21 has 128 ROPs.

It was widely reported that Radeon VII had 128 ROPs but of course that turned out not to be the case, it was just 64.

For those that are saying Navi 21 has 128, I'd like to know the reasons why, some evidence, etc. Not just rumor.
The ROPs in Navi are tied to SAs (shader arrays). According to the drivers the ratio of WGPs to SAs is identical in Navi 21, which means the number of SAs is doubled from Navi 10. Unless they cut the ROP count per SA for some reason, it will have 128 ROPs.
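
That scaling argument can be written out as a minimal Python sketch. Navi 10 is taken as 64 ROPs across 4 shader arrays (2 SEs x 2 SAs); the Navi 21 shader array count is the doubling inferred from the drivers in this post, and the unchanged ROPs-per-SA figure is exactly the assumption being debated, not a confirmed spec.

```python
# ROP count if ROPs scale with shader arrays (SAs).
# Navi 21 values are inferred/speculative, per the post above.

NAVI10_ROPS = 64
NAVI10_SHADER_ARRAYS = 4  # 2 shader engines x 2 shader arrays

def scaled_rops(shader_arrays: int, rops_per_sa: int) -> int:
    """Total ROPs given a shader array count and a per-SA ROP count."""
    return shader_arrays * rops_per_sa

rops_per_sa = NAVI10_ROPS // NAVI10_SHADER_ARRAYS
navi21_shader_arrays = 2 * NAVI10_SHADER_ARRAYS  # "doubled from Navi 10"

print(scaled_rops(navi21_shader_arrays, rops_per_sa))       # same ROPs per SA
print(scaled_rops(navi21_shader_arrays, rops_per_sa // 2))  # if cut in half per SA
```

The second line is the only way the 64 ROP outcome survives a doubling of shader arrays, which is why the post treats a per-SA cut as the thing that would need explaining.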
 