AMD: Navi Speculation, Rumours and Discussion [2019-2020]

tEd · Oct 18, 2020

nAo said:
If a 5700XT with 40 CUs and 1.755 GHz game clock has a TDP of 225W (https://www.techpowerup.com/gpu-specs/radeon-rx-5700-xt.c3339), a 72 CUs part with 50% perf/W improvement (at game clock?) would have a 270W TDP. But we’re talking a ~35% higher game clock here.. so perhaps the 255W figure is about the main chip only?

Those 225W are TBP not TDP according to AMD https://www.amd.com/en/products/graphics/amd-radeon-rx-5700-xt
(Scroll all the way down for the specs)
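For reference, the estimate in the quote above is plain linear scaling, and can be reproduced with a minimal sketch. The function name is hypothetical, and the assumption that power scales linearly with CU count at a fixed clock is a simplification, not anything AMD has stated.

```python
def scaled_power(base_power_w, base_cus, target_cus, perf_per_watt_gain):
    """Estimate power for a larger part running at the same clock.

    Assumes power scales linearly with CU count and that the perf/W
    improvement applies uniformly (both are simplifications).
    """
    return base_power_w * (target_cus / base_cus) / (1.0 + perf_per_watt_gain)


# 5700 XT: 40 CUs, 225 W; hypothetical 72 CU part with +50% perf/W
print(round(scaled_power(225, 40, 72, 0.5)))  # 270
```

Starting from a 180 W chip-only figure instead gives roughly 216 W. Neither number accounts for the higher game clock.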

yuri · Oct 18, 2020

Finally, clocks which make sense!

Navi21 XT

Base clock 1450-1500MHz
Game clock 2000-2100MHz
Boost clock 2200-2400MHz

Lower base clock than Navi10 but higher boost. It seems RDNA2 (design + DVFS) makes it possible to scale the frequency to fit the load much better.

https://twitter.com/x/status/1317731737347248129

Jawed · Oct 18, 2020

Theory: large range in clocks hints that big Navi is 160 CUs not 80.

Deleted member 13524 · Oct 18, 2020

There are really no rumors / insiders suggesting 160CUs..

Plus, 4x Navi10 CUs at 2.1GHz wouldn't compete with a 3080/3090, it'd be considerably faster.

trinibwoy · Oct 18, 2020

Jawed said:
Theory: large range in clocks hints that big Navi is 160 CUs not 80.

160 CUs or more SIMDs per CU? Both would be monstrous but I don’t see how the former is possible unless they’ve hit theoretical peak density numbers on 7nm.

DegustatoR · Oct 18, 2020

How much power would a 160 CU chip require?

A1xLLcqAgt0qc2RyMz0y · Oct 18, 2020

160 CU's, 2.4 GHz clocks.

Silly season is in full force.

PSman1700 · Oct 18, 2020

A1xLLcqAgt0qc2RyMz0y said:
Silly season is in full force.

Nothing compared to the RAMdisk, 24GB HBM3, Zen 3 and Navi 3 (>13.1TF) rumors for a 2020 console at 400/500 bucks. People will speculate in a speculation thread.

Rootax · Oct 18, 2020

I'm really curious how AMD will market RDNA2 on PC. If the rasterisation perf is on par with the 3080, but RT is 2 steps behind, do you sell it at the same price as the 3080? How do you communicate about RT in this situation? In 2021, I don't see myself buying a card which does not allow me to play with RT a little (Cyberpunk 2077 and the remaster of W3 are the 2 main games to come for me).

Or, a good scenario would be: yes, RT is not at Ampere level, but sits between Turing and Ampere?

trinibwoy · Oct 18, 2020

A good scenario would be RT that kicks Ampere’s ass.

Jawed · Oct 18, 2020

ToTTenTranz said:
There are really no rumors / insiders suggesting 160CUs..

True. AMD's counter-espionage has hit truly spectacular levels though. NVidia has done much the same, though it seems AMD has won this round.

My die analyses of Navi variants are trivial things for NVidia to have done too (and NVidia has had a very long time to do them). I believe this is why NVidia has marketed "value" so heavily, because it doesn't take a rocket scientist to see that GPU compute is uber-cheap now.

Honestly, I'm gobsmacked by the weak compute of the two consoles (16% of the XSX die is for CUs). I see this as a major fail. Or, built-in extreme obsolescence. Perhaps the next console gen will consist of 2 refreshes?

Plus, 4x Navi10 CUs at 2.1GHz wouldn't compete with a 3080/3090, it'd be considerably faster.

Why would 6900XTX with ~41TF (2GHz) be on a completely different level from 3090 at 36TF? Only NVidia is now allowed to have huge amounts of compute?
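The two TF figures can be sanity-checked with the usual peak-throughput arithmetic. This is a minimal sketch: the 160 CU configuration is the hypothetical from this thread, 64 FP32 lanes per CU is assumed to carry over from RDNA 1, and the 3090 numbers are its published shader count and rated boost clock.

```python
def peak_fp32_tflops(lanes, clock_ghz):
    """Peak FP32 TFLOPS, counting one FMA (2 FLOPs) per lane per clock."""
    return lanes * 2 * clock_ghz / 1000.0


# Hypothetical 160 CU part, assuming 64 FP32 lanes per CU, at 2.0 GHz
big_navi = peak_fp32_tflops(160 * 64, 2.0)

# RTX 3090: 10496 FP32 lanes at its 1.695 GHz rated boost clock
rtx_3090 = peak_fp32_tflops(10496, 1.695)

print(f"{big_navi:.1f} TF vs {rtx_3090:.1f} TF")  # 41.0 TF vs 35.6 TF
```

Peak FLOPS says nothing about how well either architecture keeps its lanes fed, so this only shows that the two paper numbers are in the same range.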

trinibwoy said:
160 CUs or more SIMDs per CU?

I've contemplated more SIMDs per CU. ALU:TEX could double, sure, but I wonder about LDS space and LDS versus VGPR/lane-mechanics too. To be honest, I have no theory for or against.

If both XSX and PS5 have RDNA 1 CUs, then yes, PC Navi could be a 4-SIMD per CU "monster", being the only "RDNA 2" GPU that has RDNA 2 CUs.

Also, I think a patent that talks about CUs sharing L1 encourages lots of CUs per L1. One thing I haven't been able to work out is whether an L1 is per shader engine or per shader array.

Because an L0 (and LDS) is shared by two CUs in a WGP, the patent should probably be read with WGPs in mind, not CUs. The WGP is the real unit of compute in RDNA, not a CU.

Additionally there are vague rumours saying that RDNA 2 is real RDNA, not the GCN/RDNA hybrid seen in Navi 1x. This could be interpreted to mean that any rumours that talk about Navi 2x CUs should be re-interpreted with WGP replacing CU.

Both would be monstrous but I don’t see how the former is possible unless they’ve hit theoretical peak density numbers on 7nm.

I don't understand what you mean by theoretical peak density numbers and why hitting them is relevant.

A Navi 10, 14 or XSX CU is ~2mm². A 5xxmm² die with ~150mm² of "cache" doesn't make sense to me. No matter how exciting the idea of a solid 128MB lump of last level cache, I can't take it seriously. I believe that a cache is a cache precisely because it's a small, efficient, block of memory.

A1xLLcqAgt0qc2RyMz0y said:
160 CU's, 2.4 GHz clocks.

Silly season is in full force.

We've seen a die that's considerably larger than 500mm². So take your pick:

massive last level cache
lots of CUs, about the same FLOPS as EDIT: GA102
HBM 4096-bit bus plus 512-bit GDDR6 bus
some combination of these

Navi 21 scaled up from Navi 10 with 80 CUs and 4 shader engines is only about 360mm².

I would expect the full die to run at lower clocks.

BRiT · Oct 18, 2020

I think it would be 160 HCUs. Not CUs. Not DCUs. ...

xEx · Oct 18, 2020

nAo said:
180W for Navi 10? If you start there you get 215W for the chip alone, without considering the much higher clock (if true) and everything else on the board. I am not saying it’s impossible but perhaps the numbers we got so far are a bit off..

OTOH if 255W is for Navi21 ‘only’ then it starts making more sense.

I was referring to the leak that said the 225W was the TGP. IIRC the 5700 XT has a 225W TGP with a 180W TDP.

Tbh if we believe all these leaks we would expect Navi to have a fully functional AI chip that can upscale an image from 240p to 4K with 99% quality while consuming 150W....

I will stay a little skeptical until I see the final product. I don't care about the top of the line, so for me the price/performance in the mid range is more important, and I know AMD will at least have that.

Rootax said:
I'm really curious how AMD will market RDNA2 on PC. If the rasterisation perf is on par with the 3080, but RT is 2 steps behind, do you sell it at the same price as the 3080? How do you communicate about RT in this situation? In 2021, I don't see myself buying a card which does not allow me to play with RT a little (Cyberpunk 2077 and the remaster of W3 are the 2 main games to come for me).

Or, a good scenario would be: yes, RT is not at Ampere level, but sits between Turing and Ampere?

I don't think AMD will price Navi based on RT. They don't have the "brand" to do so, it will be slower than NV, and only a handful of people care that much about it. Until now AMD has marketed its RT implementation as "functional", as in not losing 50% performance when enabling it, so I expect their marketing will go in that direction, something like "less brute force but more intelligent and useful". We will see. I am also waiting for CP2077 with RT.

trinibwoy · Oct 18, 2020

Rootax said:
Or, a good scenario would be: yes, RT is not at Ampere level, but sits between Turing and Ampere?

Btw I don’t think we’ve seen any data that shows Ampere RT to be any better than Turing. Relative perf hit with RT on vs off is about the same.

Kaotik · Oct 18, 2020

Rootax said:
I'm really curious how AMD will market RDNA2 on PC. If the rasterisation perf is on par with the 3080, but RT is 2 steps behind, do you sell it at the same price as the 3080? How do you communicate about RT in this situation? In 2021, I don't see myself buying a card which does not allow me to play with RT a little (Cyberpunk 2077 and the remaster of W3 are the 2 main games to come for me).

Or, a good scenario would be: yes, RT is not at Ampere level, but sits between Turing and Ampere?

IIRC someone calculated that theoretical 80 CU @ 2.x GHz (can't remember exact number) would be clearly above anything NVIDIA has in ray-box but would be slower than 3080 (or was it 2080 Ti? 2080?) in ray-triangle

andermans · Oct 18, 2020

I think for raytracing the reality is going to be that performance barely depends on the raytracing hardware itself: the raytracing HW on both sides is probably going to be fast enough that the memory/cache hierarchy and divergence become the bottleneck. Those behave very differently between raytracing and "traditional" workloads, in ways that are much harder to optimize for raytracing.

E.g. walking the BVH tree for raytracing is going to be way more latency sensitive than most GPU workloads we've seen, and both the tree walk and the rays scattering around are horrible for shader divergence.

trinibwoy · Oct 18, 2020

Kaotik said:
IIRC someone calculated that theoretical 80 CU @ 2.x GHz (can't remember exact number) would be clearly above anything NVIDIA has in ray-box but would be slower than 3080 (or was it 2080 Ti? 2080?) in ray-triangle

How would you even begin to calculate that? AMD's patent suggests each intersection engine can do 4 boxes or 1 triangle per clock. Even if you assume that’s true what numbers would you use for Turing and Ampere?
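The AMD half of such a comparison is at least easy to sketch from the patent's per-clock rates. The CU count, the clock, and the one-intersection-engine-per-CU ratio below are all assumptions for illustration; none of them are confirmed. The NVIDIA half has no equivalent inputs in this thread, which is the problem raised above.

```python
def peak_tests_per_second(engines, tests_per_clock, clock_ghz):
    """Peak intersection tests per second, in billions (G tests/s)."""
    return engines * tests_per_clock * clock_ghz


# Assumptions (unconfirmed): 80 CUs, one intersection engine per CU, 2.0 GHz
CUS, CLOCK_GHZ = 80, 2.0

box_rate = peak_tests_per_second(CUS, 4, CLOCK_GHZ)  # 4 boxes per clock
tri_rate = peak_tests_per_second(CUS, 1, CLOCK_GHZ)  # 1 triangle per clock

print(box_rate, tri_rate)  # 640.0 160.0
```

These are paper peaks only; they ignore memory latency and divergence during BVH traversal.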

Kaotik · Oct 18, 2020

trinibwoy said:
How would you even begin to calculate that? AMD's patent suggests each intersection engine can do 4 boxes or 1 triangle per clock. Even if you assume that’s true what numbers would you use for Turing and Ampere?

Couldn't find the image, I think it was explained at the time where they got their numbers for GeForces. I'll try to dig it up later today if I have the time, if someone else doesn't have it at hand (I'm pretty sure it was in this particular thread)

trinibwoy · Oct 18, 2020

Kaotik said:
Couldn't find the image, I think it was explained at the time where they got their numbers for GeForces. I'll try to dig it up later today if I have the time, if someone else doesn't have it at hand (I'm pretty sure it was in this particular thread)

If it’s the image I think you’re talking about I thought that was made up nonsense. Nvidia hasn’t shared any details about its RT units.

yuri · Oct 18, 2020

Rootax said:
I'm really curious how AMD will market RDNA2 on PC. If the rasterisation perf is on par with the 3080, but RT is 2 steps behind, do you sell it at the same price as the 3080? How do you communicate about RT in this situation? In 2021, I don't see myself buying a card which does not allow me to play with RT a little (Cyberpunk 2077 and the remaster of W3 are the 2 main games to come for me).

Or, a good scenario would be: yes, RT is not at Ampere level, but sits between Turing and Ampere?

RT perf is irrelevant. AMD simply has to be cheaper considering their weaker brand recognition and mainly their lacking SW - utterly horrid last gen driver experience, possibly broken OpenCL, lack of DLSS, CUDA, "driver utils" like Ansel, etc.
