AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
Sadly GPUs almost never scale 2x, especially when they get very big.

Regardless, when AMD mentioned the 50% power efficiency uplift on RDNA2 they were definitely talking about Big Navi (which they mentioned by name).
So if it's not a 300W GPU with 25% higher performance than the 2080 Ti, then it's a 250W part with ~5% higher performance.

I don't think AMD is launching what they'd call "Big Navi" with a 225W TDP like Navi 10 XT.
 
He said "over the 2080 Ti", not over nvidia's next-gen offering.
It's not hard to assume a 80 CU Navi 2x will be 20-30% over the 2080 Ti. Just do Navi 10 x2 that's where you stand.
I know what he wrote, no need to get overexcited. :)

I did this fun exercise of yours with Vega 56/64 and RX 5700/5700 XT respectively, using the TFLOPS data from TechPowerUp and the percentages from your screenshot.

Vega 56 -> 64: +20% TFLOPS, +9% performance (I say again: in the screenshot you posted)
RX 5700 -> 5700 XT: +23% TFLOPS, +13% performance (I say again: in the screenshot you posted)
edit: Don't take this as AMD bashing:
RTX 2070 -> 2080 Super: +49% TFLOPS, +32% performance (I say again: in the screenshot you posted)
please ignore the above, I meant to use the 2070 Super, as it's based on TU104:
RTX 2070 Super -> 2080 Super: +23% TFLOPS and +15% performance
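For anyone who wants to redo the arithmetic, the TFLOPS deltas above can be reproduced from boost-clock FP32 figures of the kind TechPowerUp lists (a quick sketch; the dictionary values are spec-sheet theoreticals, not measured performance):

```python
# Theoretical FP32 TFLOPS at boost clock (shaders * clock * 2 ops),
# matching TechPowerUp's spec pages.
tflops = {
    "Vega 56": 10.54, "Vega 64": 12.66,
    "RX 5700": 7.95, "RX 5700 XT": 9.75,
    "RTX 2070 Super": 9.06, "RTX 2080 Super": 11.15,
}

def delta_pct(base, upgrade):
    """Percentage TFLOPS gain going from `base` to `upgrade`."""
    return round((tflops[upgrade] / tflops[base] - 1) * 100)

print(delta_pct("Vega 56", "Vega 64"))                # 20
print(delta_pct("RX 5700", "RX 5700 XT"))             # 23
print(delta_pct("RTX 2070 Super", "RTX 2080 Super"))  # 23
```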


At the same time, assume AMD's own +50% power efficiency numbers and you get Navi 10's 225W * 2 / 1.5 = 300W.
So a 300W graphics card that beats the 2080 Ti by ~25% is just within AMD's promise.
I'll take all of these gladly, once they manifest. Hey, I pay for my graphics cards too, and I'd love stiffer competition and lower prices as much as the next guy.
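As a sanity check, the 300W figure follows directly from that arithmetic (a sketch using the thread's assumptions, not official specs):

```python
# Navi 10 (RX 5700 XT) board power, doubled performance, and AMD's
# claimed +50% perf/W for RDNA2 -- all assumptions from the discussion above.
navi10_tdp_w = 225
perf_scale = 2.0          # hypothetical "Navi 10 x2" (80 CU) part
perf_per_watt_gain = 1.5  # AMD's claimed RDNA -> RDNA2 uplift

big_navi_tdp_w = navi10_tdp_w * perf_scale / perf_per_watt_gain
print(big_navi_tdp_w)  # 300.0
```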

Ampere is the real wild card here though. GA100's clocks aren't anything to write home about, when compared to GV100 and GP100, but it might not be representative of their consumer lineup.
I couldn't care less about clock speeds as a single number on a sheet of (virtual) paper. On the contrary: usually, seemingly underwhelming clocks can indicate that an arch is not pushed to or beyond its breaking point.
 
I know what he wrote, no need to get overexcited. :)

I did this fun exercise of yours with Vega 56/64 and RX 5700/5700 XT respectively, using the TFLOPS data from TechPowerUp and the percentages from your screenshot.

Vega 56 -> 64: +20% TFLOPS, +9% performance (I say again: in the screenshot you posted)
RX 5700 -> 5700 XT: +23% TFLOPS, +13% performance (I say again: in the screenshot you posted)
edit: Don't take this as AMD bashing:
RTX 2070 -> 2080 Super: +49% TFLOPS, +32% performance (I say again: in the screenshot you posted)
It all comes down to what you decide to pick as comparison points.
For example, using TechPowerUp data:
5500 XT > 5700 XT: +87% TFLOPS, +90% performance
or, if you want an even prettier picture, you could pick 5500 XT > 5700: +53% TFLOPS, +68% performance
 
It all comes down to what you decide to pick as comparison points.
For example, using TechPowerUp data:
5500 XT > 5700 XT: +87% TFLOPS, +90% performance
or, if you want an even prettier picture, you could pick 5500 XT > 5700: +53% TFLOPS, +68% performance
I was picking1 three points with the least variables (i.e. same underlying chips), as common sense would dictate. And it was not me who brought this kind of math in here.

1 not even picking, those three were the first that came to mind.
 
I was picking1 three points with the least variables (i.e. same underlying chips), as common sense would dictate. And it was not me who brought this kind of math in here.

1 not even picking, those three were the first that came to mind.
I wasn't trying to suggest you would have "picked" them because they support some specific point or anything like that, just pointing out that they don't necessarily tell the whole story (also, RTX 2070 > 2080 Super is different chips).
Same underlying chips isn't necessarily the best option either for seeing how a specific architecture scales, at least when one is trying to guess the performance of an unreleased, different chip.
 
I wasn't trying to suggest you would have "picked" them because they support some specific point or anything like that, just pointing out that they don't necessarily tell the whole story (also, RTX 2070 > 2080 Super is different chips).

Oh, damn, you're right. I meant to use the upgraded 2070 Super. Just a second.
That's +23 % TFlops and +15 % performance

Same underlying chips isn't necessarily the best option either for seeing how a specific architecture scales, at least when one is trying to guess the performance of an unreleased, different chip.
It at least introduces the fewest variables, given the framing set out earlier in the thread. Apart from that, I'll let your point speak for itself.
 
I know what he wrote, no need to get overexcited. :)
I'm not excited. Are you?


I did this fun exercise of yours with Vega 56/64 and RX 5700/5700 XT respectively, using the TFLOPS data from TechPowerUp and the percentages from your screenshot.

Vega 56 -> 64: +20% TFLOPS, +9% performance (I say again: in the screenshot you posted)
RX 5700 -> 5700 XT: +23% TFLOPS, +13% performance (I say again: in the screenshot you posted)
edit: Don't take this as AMD bashing:
RTX 2070 -> 2080 Super: +49% TFLOPS, +32% performance (I say again: in the screenshot you posted)

This is not the fun exercise I did at all.
The data points I used were not theoretical TFLOPS vs. effective gaming performance. I used power efficiency, because that's what AMD has been using for RDNA (or ever since Raja left, at least).



[image: AMD performance-per-watt slide]




For RDNA1 Navi 10, they claimed 50% power efficiency over Vega 10, which is what they delivered:

[images: TechPowerUp performance-per-watt charts]


1/0.64 = 1.56 = 56% higher power efficiency for 5700 XT vs. Vega 64, and for 5700 vs. Vega 56
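That calculation, spelled out (using TPU's relative perf/W figure as the only input):

```python
# TPU's chart puts Vega 64 at 64% of the RX 5700 XT's performance per watt;
# inverting gives the 5700 XT's efficiency advantage.
vega64_relative_perf_per_watt = 0.64
xt_advantage = 1 / vega64_relative_perf_per_watt - 1
print(f"{xt_advantage:.0%}")  # 56%
```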


They did not lie about the power efficiency of Navi 10 over Vega 10. I'm not assuming AMD is lying about the power efficiency increase of Navi 2x over Navi 10, but you're free to, obviously.




I couldn't care less about clock speeds as a single number on a sheet of (virtual) paper. On the contrary: usually, seemingly underwhelming clocks can indicate that an arch is not pushed to or beyond its breaking point.
I wonder if you thought the same of Vega 10's underwhelming clocks vs. Pascal.
Regardless, nvidia has a new 7nm chip of roughly the same die size as its 12nm predecessor, and they decreased its clocks despite having a 33% larger power budget.
In fact, according to nvidia's own "virtual paper sheets", theoretical FP32 and FP64 throughput per watt actually decreased, which is another oddity considering the supposed process gains.
I mentioned several times that this could mean nothing to Ampere's consumer GPUs, but someone seems pretty eager to throw nvidia's own official data out the window.
 
From TechPowerUp's conclusion
In our testing, the RX 5700 XT is much more power efficient than anything we've ever seen from AMD, improving efficiency by 30%–40% over the Vega generation of GPUs. With 220 W in gaming, the card uses slightly less power than RX Vega 56 while delivering 30% higher performance. Reduced PSU requirements are not only important for builders on a budget, they also matter a lot for system integrators, to control cost. Our power consumption testing results show that the RX 5700 XT has caught up with NVIDIA's last-generation Pascal architecture in this department. Turing-based graphics cards are still up to around 20% more efficient than the RX 5700 XT. For the Radeon RX 5700 (non-XT), AMD has implemented undervolting with impressive results, reaching parity with even Turing—more on that in our RX 5700 review.
https://www.techpowerup.com/review/amd-radeon-rx-5700-xt/35.html
 
This is not the fun exercise I did at all.
Then I misinterpreted your starting point "It's not hard to assume a 80 CU Navi 2x will be 20-30% over the 2080 Ti. Just do Navi 10 x2 that's where you stand.", from which you expanded into power consumption as well.


They did not lie about the power efficiency of Navi 10 over Vega 10. I'm not assuming AMD is lying about the power efficiency increase of Navi 2x over Navi 10, but you're free to, obviously.
I am not assuming anyone is lying until I see tangible proof. But I don't buy into any single hype-building marketing slide either. I just wait until the product arrives and see.

I wonder if you thought the same of Vega 10's underwhelming clocks vs. Pascal.
To be brutally honest, this has been my conviction since the days of AMD's K5, and it has been proven time and again since then.
I even bought (yes, my own money given to a graphics card company) a Vega 56 for my gaming machine - and surely not to troll around forums about how disappointed I was, which I wasn't.

Regardless, nvidia has a new 7nm chip of roughly the same die size as its 12nm predecessor, and they decreased its clocks despite having a 33% larger power budget.
In fact, according to nvidia's own "virtual paper sheets", theoretical FP32 and FP64 throughput per watt actually decreased, which is another oddity considering the supposed process gains.
I mentioned several times that this could mean nothing to Ampere's consumer GPUs, but someone seems pretty eager to throw nvidia's own official data out the window.
Who would do that? Ampere's book still has some leaves left unturned, I guess.
 
In fact, according to nvidia's own "virtual paper sheets", theoretical FP32 and FP64 throughput per watt actually decreased, which is another oddity considering the supposed process gains.
I mentioned several times that this could mean nothing to Ampere's consumer GPUs, but someone seems pretty eager to throw nvidia's own official data out the window.

Could you point me in the right direction? Because the only official number I've seen in regards to power is the TDP itself, which says nothing at all about FP32 and FP64 efficiency.
 
Then I misinterpreted your starting point "It's not hard to assume a 80 CU Navi 2x will be 20-30% over the 2080 Ti. Just do Navi 10 x2 that's where you stand.", from which you expanded into power consumption as well.
Care to point out where in that post I mention TFLOPS numbers, which you used in your comparison?
80 CU Big Navi is simply the CU count that's been up on the rumor mill. You were the first one to bring TFLOPS to the table, and CU count alone isn't enough to determine TFLOPS.

All I wrote was that Big Navi would be up to 30% faster than a 2080 Ti, if Big Navi has a 300W TDP and AMD's claim of a 50% jump in power efficiency turns out as true as the 50% jump in efficiency they claimed for Navi 10 vs. Vega 10 (which came true).

Could you point me in the right direction? Because the only official number I've seen in regards to power is the TDP itself, which says nothing at all about FP32 and FP64 efficiency.
It says FP32 and FP64 throughput in regards to TDP. It also suggests a 1425MHz core clock for the 400W GA100, which contrasts with the 1455MHz clock of the 300W GV100.
A lower clock on a similarly sized chip with a higher TDP, despite the jump to a (supposedly) significantly improved process node, together with a modest increase in FP32 and FP64 throughput.

I could repeat the "this might have nothing to do with consumer Ampere though" disclaimer but somehow that keeps getting ignored...
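The "efficiency decreased on paper" point can be checked against NVIDIA's published peak numbers (datasheet theoreticals at boost clock, so only a rough proxy for real workloads):

```python
# Peak throughput (TFLOPS) and TDP (W) from NVIDIA's published specs.
specs = {
    # name: (fp32_tflops, fp64_tflops, tdp_w)
    "GV100 (V100 SXM2)": (15.7, 7.8, 300),
    "GA100 (A100 SXM)":  (19.5, 9.7, 400),
}
for name, (fp32, fp64, tdp) in specs.items():
    print(f"{name}: "
          f"{fp32 / tdp * 1000:.1f} FP32 GFLOPS/W, "
          f"{fp64 / tdp * 1000:.1f} FP64 GFLOPS/W")
```

On these peak numbers, GA100 indeed lands slightly below GV100 in both FP32 and FP64 throughput per watt.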
 
A lower clock on a similarly sized chip with a higher TDP, despite the jump to a (supposedly) significantly improved process node, together with a modest increase in FP32 and FP64 throughput.
New TCs are very green, and very mean.
I doubt it hits 400W workload power in generic non-GEMM FP32/64/you name it.
I could repeat the "this might have nothing to do with consumer Ampere though" disclaimer but somehow that keeps getting ignored...
Yeah, the client Volta3 is a bit less impressive than some (e.g. plebbitors) dream of.
Still a real solid product, though.
 
Care to point out where in that post I mention TFLOPS numbers, which you used in your comparison?
80 CU Big Navi is simply the CU count that's been up on the rumor mill. You were the first one to bring TFLOPS to the table, and CU count alone isn't enough to determine TFLOPS.
Yeah, my bad. By "80 CU Navi 2x will be 20-30% over the 2080 Ti. Just do Navi 10 x2 that's where you stand." you obviously meant something totally unconnected to the number of CUs and frequency (which would be TFLOPS, for instance), which is why you recommended to "just do Navi 10 x2".

All I wrote was that Big Navi would be up to 30% faster than a 2080 Ti, if Big Navi has a 300W TDP and AMD's claim of a 50% jump in power efficiency turns out as true as the 50% jump in efficiency they claimed for Navi 10 vs. Vega 10 (which came true).
You seem to mistake me for someone who is disputing the possibility of this becoming reality. Instead I wrote "After 2+ years and on a full node advantage, I'm really looking forward to your math coming true."
 
It says FP32 and FP64 throughput in regards to TDP. It also suggests a 1425MHz core clock for the 400W GA100, which contrasts with the 1455MHz clock of the 300W GV100.
A lower clock on a similarly sized chip with a higher TDP, despite the jump to a (supposedly) significantly improved process node, together with a modest increase in FP32 and FP64 throughput.
1410 MHz, FWIW. And what GA100 seemingly has done is invest a large portion of the "7nm goodness" into more transistors. They won't switch free of charge, and they won't come free in terms of clock speeds either, considering how tightly they are packed.
 
They did not lie about the power efficiency of Navi 10 over Vega 10. I'm not assuming AMD is lying about the power efficiency increase of Navi 2x over Navi 10, but you're free to, obviously.
AMD measured perf/W using a standard test of The Division 2 running at 1440p Ultra details.

Your TPU aggregate perf/W chart is not accurate; it just combines performance numbers with official TDP claims, it's not based on actual measurements. You would have to measure power consumption in each game, then build an aggregate chart.
1410 MHz, FWIW. And what GA100 seemingly has done is invest a large portion of the "7nm goodness" into more transistors. They won't switch free of charge, and they won't come free in terms of clock speeds either, considering how tightly they are packed.
Also, NVLink draws significant power.

V100 SXM2 NVLink operates @1540MHz and 900GB/s HBMs with a TDP of 300w
V100S PCIe operates @1610MHz and 1100GB/s HBM2 with a TDP of 250w

[image: Tesla V100 / V100S specification table]


The V100S PCIe shaves off 50W despite running faster core and memory clocks, just because it drops NVLink for PCIe.
 
It says FP32 and FP64 throughput in regards to TDP. It also suggests a 1425MHz core clock for the 400W GA100, which is contrasting to the 1455MHz clocks for the 300W GV100.

It says that, where? All your claims seem to reference the spec sheet, where we can see various throughput numbers and the TDP. But it is you who's making the link "19.5 TFLOPS @ 400W", or so it seems. If not, if you've seen it somewhere, that's what I'm asking for. Personally, I've yet to see that TDP linked to any specific task, but it's immediately obvious to me that it probably refers to the TDP required for the 320 TFLOPS in tensor cores, which is a 2.5x increase over Volta. There's literally no logical reason to believe that Ampere FP32 "cores" are somehow less efficient despite a new node, while at the same time Ampere tensor cores are >2x as efficient while also providing much more functionality.

EDIT: Basically, V100 did 16 FP32 TFLOPS and 130 TC FP16 TFLOPS. 130 / 16 = 8 times more.
A100 does almost 20 and 320: 16 times more. If the TCs were not the most power-hungry units in V100, they most definitely are in A100.
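Using the round numbers from the post above (not exact datasheet values), the tensor-to-FP32 ratio roughly doubles between generations:

```python
# Thread's round figures: peak FP32 TFLOPS vs. peak tensor-core FP16 TFLOPS.
v100_fp32, v100_tensor = 16.0, 130.0
a100_fp32, a100_tensor = 19.5, 320.0

print(round(v100_tensor / v100_fp32))  # 8
print(round(a100_tensor / a100_fp32))  # 16
```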

I could repeat the "this might have nothing to do with consumer Ampere though" disclaimer but somehow that keeps getting ignored...

It's probably ignored because it's irrelevant until it is resolved whether or not the AI/HPC part has anything to "fix". I don't think there's anything suspect or out of place at all with GA100's "normal FP32 cores", so why would I discuss "something" being "different" in consumer Ampere?
 
Oh wow, I would really like to see the thought pattern which leads one to compare the 5700 XT to Vega 56 rather than 64 in its conclusions. If it were the 5700, I'd understand, but it's not.
Seems reasonable if you're trying to do a perf comparison at roughly iso-power.

5700XT = 225W
Vega56 = 210W
Vega64 = 295W

5700XT vs. Vega56 is a 7% difference. Vega64 vs. 5700XT is a 31% difference.
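For reference, the iso-power gaps quoted above can be reproduced from the board-power figures in the post (a quick check, nothing more):

```python
# Board power (W) as listed in the post above.
tdp_w = {"RX 5700 XT": 225, "Vega 56": 210, "Vega 64": 295}

def gap_pct(a, b):
    """Percent by which card a's TDP exceeds card b's."""
    return round((tdp_w[a] / tdp_w[b] - 1) * 100)

print(gap_pct("RX 5700 XT", "Vega 56"))  # 7
print(gap_pct("Vega 64", "RX 5700 XT"))  # 31
```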
 