DavidGraham
Veteran
Yeah, I corrected that before your post. Well, not quite: 601mm2 for GM200 vs 471mm2 for GP102.
With the 2080Ti's size, nvidia would have issues surpassing its performance like they did with the 1080 vs. 980Ti, where the reference 980Ti ran at ~1200MHz. RT improvements should be much easier, and 4K 60fps with RTX on and no DLSS crutch could be a huge seller.
https://www.techpowerup.com/262592/...to-be-50-faster-than-turing-at-half-the-power

If utilizing the density alone, NVIDIA can extract at least 50% extra performance that is due to the use of a smaller node. However, performance should increase even further because Ampere will bring new architecture as well. Combining a new manufacturing node and new microarchitecture, Ampere will reduce power consumption in half, making for a very efficient GPU solution. We still don't know if the performance will increase mostly for ray tracing applications, or will NVIDIA put the focus on general graphics performance.
Pascal saw a 50-70% perf/W improvement over Maxwell, though the latter was not clocked as high, lacking the newer turbo boosting. 3x perf/W would be 4-6 times as good as that and highly unlikely.
With the 2080Ti's size, nvidia would have issues surpassing its performance like they did with the 1080 vs. 980Ti, where the reference 980Ti ran at ~1200MHz. RT improvements should be much easier, and 4K 60fps with RTX on and no DLSS crutch could be a huge seller.
The 980Ti was huge as well; the 1080Ti was almost 3/4 the size of the 980Ti and achieved a 75% uplift.
RT improvements don't come from fixed function units alone, they need a big uplift in compute performance as well, 4K60 RTX performance would need a huge increase in TF as well (50% more TF at least).
Turing is not really a new arch, it's an upgraded Volta with RTX, so NVIDIA might feel the incentive to push a new arch to satisfy both gaming and HPC sectors.
Pascal's perf/mm2 doubled over Maxwell. A reference GP104 at 317mm^2 is faster than a reference GTX 980Ti with 50W less power. Don't see why this wouldn't be possible going from 12(16)nm to 7nm, three years after Volta...
3x is not 4-6 times as good as that. 70% is equal to 1.7x, and 3x is just 1.76 times 1.7x.
16FF+ used in pascal offered 1.9x density at some 45-50% lower power, compared to 28nm.
7nm is 3.3x density and 65% lower power, compared to 12nm.
That means that if Pascal had enjoyed the same node improvements, then for the same die size they could have crammed in 70% more transistors at very similar power. So take the original 1.7x of Pascal multiplied by 70% more transistors: 1.7 * 1.7 = 2.89. So there you have it, a ~3x improvement. We didn't even take into account that a 70% larger chip would be ludicrously faster, so they would more likely than not clock it lower, increasing efficiency even further.
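To make the arithmetic explicit, here is a quick back-of-the-envelope sketch; the 1.7x figures are this thread's estimates, not official numbers:

```python
# Back-of-the-envelope: gains compound multiplicatively.
# The 1.7x figures are the thread's estimates, not official numbers.

pascal_gain = 1.70  # Pascal's ~70% perf/W uplift over Maxwell
node_gain = 1.70    # ~70% more transistors at similar power on 7nm

combined = pascal_gain * node_gain
print(round(combined, 2))  # 2.89, i.e. roughly a 3x improvement

# The additive (incorrect) reading would only give:
additive = 1 + 0.70 + 0.70
print(round(additive, 2))  # 2.4
```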
The calculation you're doing already factors in Pascal's improvement, and thus makes it look like nvidia wouldn't really need to do that well to see 3x perf/W.
The second part of your comment is for perf/mm2 and I'm not sure what calculations you're doing there.
3x is not 4-6 times as good as that. 70% is equal to 1.7x, and 3x is just 1.76 times 1.7x.
16FF+ used in pascal offered 1.9x density at some 45-50% lower power, compared to 28nm.
7nm is 3.3x density and 65% lower power, compared to 12nm.
That means that if Pascal had enjoyed the same node improvements, then for the same die size they could have crammed in 70% more transistors at very similar power. So take the original 1.7x of Pascal multiplied by 70% more transistors: 1.7 * 1.7 = 2.89. So there you have it, a ~3x improvement. We didn't even take into account that a 70% larger chip would be ludicrously faster, so they would more likely than not clock it lower, increasing efficiency even further.
You can clock the 1080/1080Ti/2080Ti to 2000MHz as well. Clocking the 980Ti to 1400MHz would screw its power efficiency so hard it would become a power hog, so this shouldn't be a factor. The 980Ti had the disadvantage of not boosting as high as turbo boost 3.0 allows on Pascal and newer cards. Clocked to the max, the difference should be around 50%, similar to 2080Ti vs 1080Ti.
RT acceleration is not a one-time thing. NVIDIA has shown that it relies on 3 things in Turing: acceleration through RT cores, utilization of the normal CUDA cores, and the separation of FP32 and INT32 units. You can clearly see this in the GTX Turing cards, where the 1660Ti achieves near-1080Ti performance running DXR games. RT performance will improve in Ampere through the low-hanging fruit of increased TFLOPs and a higher count of the separated FP/INT units; enhanced RT cores and increased RT core counts are also guaranteed to happen, so it's going to be a cumulative effect from all of these factors.

As for RT, that was due to my expectation of nvidia optimizing their first implementation of RTX; Ampere should improve it by more than its FLOPS uplift over Turing alone.
70% more:
70% is actually x0.70. (You can then take that .70 value and add it to original to get the summation of original + % more. = x1.70). Naturally, everyone understands x 170% (ie: 1.7 x) and instantly uses that to calculate sum of both.
But we are not calculating sum, just % more. And .70 + .70 <does not equal> 3 wholes. Thus not 3x/3 times. (70% more per node x 70% more per node) = 140% more.. So, 1.4 times, not 2.89 times.
Or am I just high..?
70% <PLUS> 70%. = 140% gain in node improvements.
You are trying to multiply them, instead of adding what each node has done to dies.
You have to multiply them...
3 is not 100% + 50% = +150% above 1.
3 is not 1x + 0.5x = +1.5x above 1; that would be 2.5, not 3.
3 is 2x * 1.5x = 3x of 1.
Seriously... Derailing a thread with this... (I'm posting this here instead of PM because there's clearly at least 2 users confused by this)
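The point of contention can be checked mechanically; a minimal sketch using the 2x and 1.5x example figures from above:

```python
# Successive improvements compose by multiplication, not addition.
gains = [2.0, 1.5]  # example figures: a 2x gain followed by a 1.5x gain

multiplicative = 1.0
for g in gains:
    multiplicative *= g
print(multiplicative)  # 3.0

# Summing the "percent above 1" parts instead gives the wrong answer:
additive = 1.0 + sum(g - 1.0 for g in gains)
print(additive)  # 2.5
```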
My friend.
You didn't closely read my original post. Please go back and do so.
You are mixing up fractions and percentages. In your formulas above, what is 3..?
I think you mean 3x some arbitrary value (3x, or 300%..?), not the number 3.
Again, simple math:
Node 1 + 70% improvement
Node 2 + 70% improvement
Node 1 + Node 2 improvements = 140% in improvement. (Hence, 1.4x improvement)
Wouldn't that be:
Let N = original performance on Node 0
Let 70% be performance increase from transition to Node 1
Let 70% be performance increase from transition to Node 2 from Node 1
Performance at Node 1 = N + .70 N = 1.7 N
Performance at Node 2 = (1.70 N) + .70 (1.70 N) = 1.70 N + 1.19 N = 2.89 N
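The derivation above can be verified numerically (same values as the post: N = 1, +70% per node):

```python
# Numeric check of the per-node compounding derived above.
N = 1.0                       # baseline performance on Node 0
node1 = N + 0.70 * N          # +70% going to Node 1
node2 = node1 + 0.70 * node1  # +70% again going to Node 2

print(round(node1, 2))  # 1.7
print(round(node2, 2))  # 2.89  (equivalently, 1.7 ** 2)
```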
I understand now what he is trying to convey, but he is failing to recognize that you do not multiply the percentages, you add them up.
Coincidentally, TSMC is not claiming a 300% increase in density over 2 nodes, or the 2.89x he is claiming above. They have said that going from 7nm to 7nm+ will yield a 20% shrink in die space. And from 12nm TSMC to 7nm TSMC is what, another 25-35% reduction in die space..? Or about a 55% node shrink from 12nm TSMC to 7nm+ TSMC.
Think about what percentage a 2.89x node shrink would be..?
Compared to its own 16-nanometer technology, TSMC claims its 7 nm node provides around 35-40% speed improvement or 65% lower power.
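Part of the disagreement here is a unit mix-up: an Nx density improvement means the same transistor count fits in 1/N of the area, i.e. a (1 - 1/N) shrink. A quick sketch using the density figures quoted earlier in the thread (not official TSMC numbers):

```python
def area_shrink(density_multiplier):
    """Fractional die-area reduction for an N-x density improvement
    at the same transistor count."""
    return 1.0 - 1.0 / density_multiplier

# Density figures quoted in the thread, not official TSMC numbers:
print(round(area_shrink(1.9), 2))   # 0.47 -> 16FF+ vs 28nm: ~47% smaller
print(round(area_shrink(3.3), 2))   # 0.7  -> 7nm vs 12nm: ~70% smaller
print(round(area_shrink(2.89), 2))  # 0.65 -> 2.89x density ~= 65% shrink, not "289%"
```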
~
That means that if Pascal had enjoyed the same node improvements, then for the same die size they could have crammed in 70% more transistors at very similar power. So take the original 1.7x of Pascal multiplied by 70% more transistors: 1.7 * 1.7 = 2.89. So there you have it, a ~3x improvement. We didn't even take into account that a 70% larger chip would be ludicrously faster, so they would more likely than not clock it lower, increasing efficiency even further.
70% is 0.70x, not 1.7x...!
.70 = 70%
1.7 = 170%