AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Frenetic Pony · Jun 5, 2019

BoMbY said:
Not necessarily, simple RT could be implemented without many changes: http://diglib.eg.org/handle/10.2312/hpg.20141091.029-040

Of course dedicated circuits would be better. They could also do it like in the paper, and then reserve a fixed number of CUs to do RT in parallel to the normal pipeline, or whatever.

What a great paper; it could explain the rumored massive frontend redesign of Navi. Looking it over, I'm not sure Navi would even need the proposed specialty hardware, as it already has low-precision double-rate and quad-rate math built in. Nvidia already does 4-bit and there's no reason AMD can't either (relevant to the proposed BVH update scheme herein). The cache hierarchy redesign that would be needed is already confirmed.

Using rapid packed math low precision would still be slower/larger than the proposal, but would be much more flexible than the tiny fixed function units proposed. GCN has had a great run for consoles by being highly programmable for a 2013 arch, Death Stranding and Doom Eternal look great thanks in part to giving programmers flexibility. What if programmers want to use bounding spheres for better BVH building? Or bounding octahedrons for better tree traversal? Etc. etc.

Not saying that's what Navi has done at all, but it'd be fascinating if they did. The frontend redesign rumored would be a massive change, but could also help with disparate shading from a work distribution perspective. It could also be helpful with the more modern triple A requirements of the GPU, eg running physics and possibly AI and stuff as well as the usual shading operations.

Or it's all completely wrong, ah well, a few days to find out.

w0lfram · Jun 5, 2019

Frenetic Pony said:
High clock speeds should be expected. Vega was supposed to hit high clockspeeds but failed, causing some exponential power spikage that seems partially independent of silicon node. Hence why Vega 64 missed its clockspeed targets and the Vega arch gets way more efficient at the low clockspeeds found on mobile parts.

Thus any improvements to clockspeed from 7nm can't be inferred from the Radeon VII; that thing's huge power draw is due to arch, not the fmax curve of TSMC's 7nm. Going by TSMC's own numbers, 7nm vs 16nm should be a huge improvement in TDP efficiency for AMD. Thus the low quoted efficiency improvements could easily be due to maxing out clockspeed with no regard for power draw for a desktop part, a strategy which AMD just recently said sells better. Besides, the quote for 1.5x improvement was "at least", meaning it'll probably get much better for whatever laptop part they have replacing the Vega 10.

Vega is a server compute chip, cut down for gamers.

It was never going to hit high clocks, and it is now only worth using as a reference for Navi (Vega is dead for gaming). Navi, on the other hand, was specifically designed for games and high clocks. AMD's claimed improvements do allow us to speculate that Navi can reach speeds of 2,000+ MHz. I haven't heard an argument against it.

Gubbi · Jun 5, 2019

Vega might have fallen short of frequency targets, but the reason AMD had to force clocks past the optimal shmoo point was competitive pressure from Nvidia and the very high RAM prices at the time. The high-cost memory system meant AMD had to price Vega high. To price it high, Vega had to be competitive with Nvidia's offerings; to be competitive, they had to clock it high; since P ∝ f³, Vega ended up with high TDP.
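As a rough illustration of the cubic power/frequency rule of thumb above, here is a minimal sketch. It assumes power scales with the cube of the clock (voltage rising roughly in step with frequency) and that performance scales linearly with clock. The function names are made up for this example and none of the numbers are measured Vega data.

```python
def relative_power(freq_ratio: float, exponent: float = 3.0) -> float:
    """Power relative to stock, assuming P ~ f**exponent (a rule of thumb)."""
    if freq_ratio <= 0:
        raise ValueError("freq_ratio must be positive")
    return freq_ratio ** exponent


def relative_perf_per_watt(freq_ratio: float, exponent: float = 3.0) -> float:
    """Perf/W relative to stock, assuming performance scales linearly with clock."""
    return freq_ratio / relative_power(freq_ratio, exponent)


# Print the assumed trade-off for a few clock ratios around stock (1.0).
for ratio in (0.8, 0.9, 1.0, 1.1):
    print(f"clock x{ratio:.2f}: power x{relative_power(ratio):.2f}, "
          f"perf/W x{relative_perf_per_watt(ratio):.2f}")
```

Under these assumptions, a 20% lower clock needs roughly half the power, which is why pushing a chip past its sweet spot gets expensive so quickly.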

High end GPUs are entirely limited by power consumption, AMD should optimize for power, not frequency. Want more performance ? Add more CUs

Cheers

no-X · Jun 5, 2019

Gubbi said:
High end GPUs are entirely limited by power consumption

Yes.

Gubbi said:
AMD should optimize for power, not frequency.

They need both.

Gubbi said:
Want more performance ? Add more CUs

No. It doesn't work. Take Vega 56 and Vega 64, set the same clockspeed and performance will be the same despite the different number of CUs. The same applies for Polaris / Radeon RX 570 / 580. They need to process more pixels per second. One way is higher clockspeed, the other one is a wider (more parallel) configuration. There are two well-balanced GCN GPUs: Pitcairn with 20 CUs and 32 pixels per clock, and Hawaii with 44 CUs and 64 pixels per clock. Both of them had a very low CU / ppc (pixels per clock) ratio. Adding more CUs without making the design wider worked for compute, but not for gaming. Every design with a higher CU / ppc ratio was less efficient in terms of gaming FPS per CU*clock.
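To put numbers on the CU / ppc ratio, here is a minimal sketch. The Pitcairn and Hawaii figures are the ones given in the post; the Polaris and Vega entries are assumed from public spec sheets (taking pixels per clock as the ROP count), not from this thread, and the helper name is hypothetical.

```python
# (CUs, pixels per clock). Pitcairn and Hawaii come from the post above;
# the other entries are assumptions taken from public spec sheets.
GPUS = {
    "Pitcairn": (20, 32),
    "Hawaii": (44, 64),
    "Polaris 10 (RX 580)": (36, 32),
    "Vega 56": (56, 64),
    "Vega 64": (64, 64),
}


def cu_per_ppc(cus: int, ppc: int) -> float:
    """Compute units per pixel-per-clock of back-end throughput."""
    return cus / ppc


ratios = {name: cu_per_ppc(*spec) for name, spec in GPUS.items()}

# List the chips from the lowest ratio to the highest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:22s} {ratio:.3f}")
```

With these inputs Pitcairn and Hawaii sit well below 1.0, while the assumed Vega and Polaris configurations are close to or above it, which is the pattern the post describes.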

Bondrewd · Jun 5, 2019

Gubbi said:
Want more performance ? Add more CUs

Xtors cost money, in a market with a very limited amount of it.

Ike Turner · Jun 5, 2019

no-X said:
Yes.

They need both.

No. It doesn't work. Take Vega 56 and Vega 64, set the same clockspeed and performance will be the same despite the different number of CUs. The same applies for Polaris / Radeon RX 570 / 580. They need to process more pixels per second. One way is higher clockspeed, the other one is a wider (more parallel) configuration. There are two well-balanced GCN GPUs: Pitcairn with 20 CUs and 32 pixels per clock, and Hawaii with 44 CUs and 64 pixels per clock. Both of them had a very low CU / ppc (pixels per clock) ratio. Adding more CUs without making the design wider worked for compute, but not for gaming. Every design with a higher CU / ppc ratio was less efficient in terms of gaming FPS per CU*clock.

This can't be repeated enough. GCN as we know it right now is fundamentally aimed at compute workloads, where it goes head-to-head with & often edges out Nvidia's uArch at the same power efficiency. It's when you throw 3D rendering at it that shit starts to hit the fan.

yuri · Jun 5, 2019

Gubbi said:
Vega might have fallen short of frequency targets, but the reason AMD had to force clocks past the optimal shmoo point was competitive pressure from Nvidia and the very high RAM prices at the time. The high-cost memory system meant AMD had to price Vega high. To price it high, Vega had to be competitive with Nvidia's offerings; to be competitive, they had to clock it high; since P ∝ f³, Vega ended up with high TDP.

High end GPUs are entirely limited by power consumption, AMD should optimize for power, not frequency. Want more performance ? Add more CUs

Vega is efficient when you underclock it by ~300MHz. In other words, when you cap it at <80% of its normal performance. That's pretty bad from the market positioning PoV.

Strangely, Vega's fundamental architectural goal, as proclaimed by AMD, was improving the frequency scaling. They reportedly spent a majority of Vega's huge transistor budget on that.

Adding more CUs is not gonna work with the current GCN architecture. Go through a few reviews and plot the relative FPS gains of Tahiti, Hawaii, Fiji, Polaris 10 and Vega 10/20 against their FLOPS gains. FLOPS went through the roof with Fiji and the Vegas, but the FPS gains didn't follow.

Throwing more FLOPS at games doesn't work. GCN-based stuff featuring <= 4 SEs (shader engines) is behind in terms of fixed-function. It's kinda surprising they hadn't seen it when they were simulating Fiji's configuration.

RDNA featuring a massive fixed-function rework might work. Who knows.

no-X · Jun 5, 2019

Anyway, we expect 2560 SPs, 64 ROPs and 256bit bus for Navi 2019 at ~250-275 mm². Wouldn't it be possible to double the specs for Navi 2020? 5120 SPs, 128 ROPs, 512bit bus or HBM2 at 500-550 mm²? Maybe the die size could be even smaller with HBM2 (smaller interface) and if the front-end configuration doesn't scale and stays the same.

McHuj · Jun 5, 2019

I don't think you'll see a 512-bit bus. I think anything needing above a 384-bit bus will just use HBM. The question is how economical a 500+ mm² die is on 7nm. I think it's still too early for that in a consumer device. Navi 2020 will probably be +50% area (still under 400 mm²) and maybe any additional gains will come from even higher clocks.

no-X · Jun 5, 2019

+50 % would mean another die with 4096 SPs. Fourth in a row (Fiji, Vega 10, Vega 20, Navi 2020). It's possible, maybe even likely. But it won't stand against Turing. Its competitor will be Ampere. +50 % performance compared to Navi 2019 (GeForce RTX 2070 level) is exactly the same performance as the GeForce RTX 2080 Ti. It's likely that the GeForce RTX 2080 Ti will be replaced by the Ampere GeForce RTX 3080. So AMD would - again - position its biggest die against the "biggest-2" die of Nvidia. The relative position of Navi 2020 would stay the same as it was with Vega 10 (×GTX 1080) and Vega 20 (×RTX 2080). That would be quite a waste of their early adoption of the 7nm process. 2020 and another sub-400mm² die? AMD manufactured a 331mm² 7nm die in 2018. Adding ~50mm² in almost 2 years would be a really conservative plan.
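For reference, the scaling arithmetic behind these guesses can be sketched as below. It takes the rumoured Navi 2019 figures quoted in the thread (2560 SPs, ~250-275 mm²) and assumes, purely for illustration, that both area and SP count scale linearly with one factor; real dies do not scale that cleanly, and the helper names are hypothetical.

```python
NAVI_2019_AREA_MM2 = (250.0, 275.0)  # rumoured range quoted in the thread
NAVI_2019_SPS = 2560                 # rumoured SP count quoted in the thread


def scale_area(area_range, factor):
    """Scale a (low, high) die-area range, assuming area grows linearly."""
    low, high = area_range
    return (low * factor, high * factor)


def scale_sps(sps, factor):
    """Scale a stream-processor count by the same linear factor."""
    return round(sps * factor)


# Compare the +50% and doubled configurations discussed in the thread.
for factor in (1.5, 2.0):
    low, high = scale_area(NAVI_2019_AREA_MM2, factor)
    print(f"x{factor}: {scale_sps(NAVI_2019_SPS, factor)} SPs, "
          f"{low:.1f}-{high:.1f} mm^2")
```

Under this linear assumption, a strict +50% gives 3840 SPs at about 375-412 mm², so a 4096 SP die would be closer to +60%, and doubling reproduces the 5120 SP, 500-550 mm² figure mentioned earlier in the thread.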

Bondrewd · Jun 5, 2019

no-X said:
Adding ~50mm² in almost 2 years would be a really conservative plan.

A plan that's really good for their margins.

Jawed · Jun 5, 2019

no-X said:
Take Vega 56 and Vega 64, set the same clockspeed and performance will be the same despite the different number of CUs. The same applies for Polaris / Radeon RX 570 / 580. They need to process more pixels per second. One way is higher clockspeed, the other one is a wider (more parallel) configuration. There are two well-balanced GCN GPUs: Pitcairn with 20 CUs and 32 pixels per clock, and Hawaii with 44 CUs and 64 pixels per clock. Both of them had a very low CU / ppc (pixels per clock) ratio. Adding more CUs without making the design wider worked for compute, but not for gaming. Every design with a higher CU / ppc ratio was less efficient in terms of gaming FPS per CU*clock.

I wonder what happens when this type of comparison is done with the consoles, since console "refreshes" have changed ratios on both PS4 and Xbox One...

I suppose the framerate caps in console games make this a more difficult comparison to make...

Kaotik · Jun 5, 2019

no-X said:
Anyway, we expect 2560 SPs, 64 ROPs and 256bit bus for Navi 2019 at ~250-275 mm². Wouldn't it be possible to double the specs for Navi 2020? 5120 SPs, 128 ROPs, 512bit bus or HBM2 at 500-550 mm²? Maybe the die size could be even smaller with HBM2 (smaller interface) and if the front-end configuration doesn't scale and stays the same.

McHuj said:
I don't think you'll see a 512-bit bus. I think anything needing above a 384-bit bus will just use HBM. The question is how economical a 500+ mm² die is on 7nm. I think it's still too early for that in a consumer device. Navi 2020 will probably be +50% area (still under 400 mm²) and maybe any additional gains will come from even higher clocks.

While it is possible that there will be a "Navi 2020", remember that AMD has also promised "Next gen" for 2020.

Bondrewd · Jun 6, 2019

Kaotik said:
While it is possible that there will be "Navi 2020", remember that AMD also has promised Next gen for 2020.

You can have both.

del42sa · Jun 6, 2019

Bondrewd said:
A plan that's really good for their margins.

Perhaps it may be good for their margins, but it's not good for sales, unless they have some ace up their sleeve. If they won't compete, they won't sell much. With that kind of thinking they will never be able to compete with Nvidia, and they can kiss their margins goodbye. Vicious cycle...

LordEC911 · Jun 6, 2019

McHuj said:
I don't think you'll see a 512-bit bus. I think anything needing above a 384-bit bus will just use HBM. The question is how economical a 500+ mm² die is on 7nm. I think it's still too early for that in a consumer device. Navi 2020 will probably be +50% area (still under 400 mm²) and maybe any additional gains will come from even higher clocks.

Yep. That's sorta what I was thinking.

1.5x is what I was thinking as well for Navi 20. A little on the large side for a 7nm+ pipecleaner though...
Navi 20 pushes Navi 10 down the stack sometime in 1H '20.
True next-gen (meant to replace Vega 20 in HPC) comes out 2H '20 at 500+ mm² and either stacks on top of Navi 20 or, depending on competition, slots into the same price segment and pushes Navi 20 down.

Edit- Wonder if they decided to move Navi 12 to 7nm+, since we haven't heard anything about it.

Bondrewd · Jun 6, 2019

LordEC911 said:
True next-gen (meant to replace Vega 20 in HPC) comes out 2H '20 at 500+ mm² and either stacks on top of Navi 20 or, depending on competition, slots into the same price segment and pushes Navi 20 down.

Or it won't have a client offering at all.

McHuj · Jun 6, 2019

Kaotik said:
While it is possible that there will be "Navi 2020", remember that AMD also has promised Next gen for 2020.

I agree, I just expect the 2020 chip to be bigger and a high tier product. I think Navi 2019 will be AMD's mainstream consumer chip for the next 2 years at least.

glow · Jun 6, 2019

LordEC911 said:
Yep. That's sorta what I was thinking.

1.5x is what I was thinking as well for Navi 20. A little on the large side for a 7nm+ pipecleaner though...
Navi 20 pushes Navi 10 down the stack sometime in 1H '20.
True next-gen (meant to replace Vega 20 in HPC) comes out 2H '20 at 500+ mm² and either stacks on top of Navi 20 or, depending on competition, slots into the same price segment and pushes Navi 20 down.

Edit- Wonder if they decided to move Navi 12 to 7nm+, since we haven't heard anything about it.

It's a little late for a 7nm pipecleaner, isn't it? AMD already has 2 SKUs based on a relatively large 7nm chip, out since last year.

w0lfram · Jun 6, 2019

no-X said:
+50 % would mean another die with 4096 SPs. Fourth in a row (Fiji, Vega 10, Vega 20, Navi 2020). It's possible, maybe even likely. But it won't stand against Turing. Its competitor will be Ampere. +50 % performance compared to Navi 2019 (GeForce RTX 2070 level) is exactly the same performance as the GeForce RTX 2080 Ti. It's likely that the GeForce RTX 2080 Ti will be replaced by the Ampere GeForce RTX 3080. So AMD would - again - position its biggest die against the "biggest-2" die of Nvidia. The relative position of Navi 2020 would stay the same as it was with Vega 10 (×GTX 1080) and Vega 20 (×RTX 2080). That would be quite a waste of their early adoption of the 7nm process. 2020 and another sub-400mm² die? AMD manufactured a 331mm² 7nm die in 2018. Adding ~50mm² in almost 2 years would be a really conservative plan.

?
RDNA is for gaming. All the bits, all the transistors, are meant for gaming. The RTX 2080 Ti has tons of wasted transistors, meant for science and compute...

Dr. Su already said Navi is scalable. So at a minimum, big Navi will just have more CUs, but I am leaning towards big Navi also using HBM3 memory.
