DavidGraham
Veteran
Yeah, I corrected that before your post. Well, not quite: 601mm2 for GM200 vs 471mm2 for GP102.
With the 2080Ti's size, nvidia would have issues surpassing its performance like they did with the 1080 vs. 980Ti, where the reference 980Ti ran at ~1200MHz. RT improvements should be much easier, and 4K 60fps with RTX on and no DLSS crutch could be a huge seller.
https://www.techpowerup.com/262592/...to-be-50-faster-than-turing-at-half-the-power

If utilizing the density alone, NVIDIA can extract at least 50% extra performance that is due to the use of a smaller node. However, performance should increase even further because Ampere will bring new architecture as well. Combining a new manufacturing node and new microarchitecture, Ampere will reduce power consumption in half, making for a very efficient GPU solution. We still don't know if the performance will increase mostly for ray tracing applications, or will NVIDIA put the focus on general graphics performance.
Pascal saw a 50-70% perf/W improvement over Maxwell, though the latter was not clocked as high, lacking the newer turbo boosting. 3x perf/W would be 4-6 times as good as that and highly unlikely.
With the 2080Ti's size, nvidia would have issues surpassing its performance like they did with the 1080 vs. 980Ti, where the reference 980Ti ran at ~1200MHz. RT improvements should be much easier, and 4K 60fps with RTX on and no DLSS crutch could be a huge seller.
The 980Ti was huge as well; the 1080Ti was almost 3/4 the size of the 980Ti and achieved a 75% uplift.
RT improvements don't come from fixed function units alone, they need a big uplift in compute performance as well, 4K60 RTX performance would need a huge increase in TF as well (50% more TF at least).
Turing is not really a new arch, it's an upgraded Volta with RTX, so NVIDIA might feel the incentive to push a new arch to satisfy both gaming and HPC sectors.
Pascal's perf/mm2 doubled over Maxwell. A reference GP104 at 317mm^2 is faster than a reference GTX 980Ti with 50W less power. Don't see why this wouldn't be possible going from 12(16)nm to 7nm, three years after Volta...
3x is not 4-6 times as good as that. 70% is equal to 1.7x, and 3x is just 1.76 times 1.7x.
16FF+ used in pascal offered 1.9x density at some 45-50% lower power, compared to 28nm.
7nm is 3.3x density and 65% lower power, compared to 12nm.
That means that if Pascal had enjoyed the same node improvements, then for the same die size they could have crammed in 70% more transistors at very similar power. So take the original 1.7x of Pascal multiplied by 70% more transistors: 1.7 * 1.7 = 2.89. So there you have it, a ~3x improvement. We didn't even take into account that a 70% larger chip would be ludicrously faster, so they would more likely than not clock it lower, increasing efficiency even further.
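To make the arithmetic explicit, here is a quick back-of-the-envelope sketch; the 1.7x figures are this thread's estimates, not official numbers:

```python
# Back-of-the-envelope: gains compound multiplicatively.
# The 1.7x figures are the thread's estimates, not official numbers.

pascal_gain = 1.70  # Pascal's ~70% perf/W uplift over Maxwell
node_gain = 1.70    # ~70% more transistors at similar power on 7nm

combined = pascal_gain * node_gain
print(round(combined, 2))  # 2.89, i.e. roughly a 3x improvement

# The additive (incorrect) reading would only give:
additive = 1 + 0.70 + 0.70
print(round(additive, 2))  # 2.4
```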
The calculation you're doing already factors in Pascal's improvement, and thus makes it look like nvidia wouldn't really need to do that well to see 3x perf/W.
The second part of your comment is for perf/mm2 and I'm not sure what calculations you're doing there.
3x is not 4-6 times as good as that. 70% is equal to 1.7x, and 3x is just 1.76 times 1.7x.
16FF+ used in pascal offered 1.9x density at some 45-50% lower power, compared to 28nm.
7nm is 3.3x density and 65% lower power, compared to 12nm.
That means that if Pascal had enjoyed the same node improvements, then for the same die size they could have crammed in 70% more transistors at very similar power. So take the original 1.7x of Pascal multiplied by 70% more transistors: 1.7 * 1.7 = 2.89. So there you have it, a ~3x improvement. We didn't even take into account that a 70% larger chip would be ludicrously faster, so they would more likely than not clock it lower, increasing efficiency even further.
You can clock the 1080/1080Ti/2080Ti to 2000MHz as well. Clocking the 980Ti to 1400MHz would screw its power efficiency so hard it would become a power hog, so this shouldn't be a factor. The 980Ti had the disadvantage of not boosting as high as turbo boost 3.0 allows on Pascal and newer cards. Clocked to the max, the difference should be around 50%, similar to 2080Ti vs 1080Ti.
RT acceleration is not a one-time thing. NVIDIA has shown that it relies on 3 things in Turing: acceleration through RT cores, utilization of the normal CUDA cores, and the separation of FP32 and INT32 units. You can clearly see this in the GTX Turing cards, where the 1660Ti achieves near-1080Ti performance running DXR games. RT performance will improve in Ampere through the low-hanging fruit of increased TFLOPs and a higher count of the separated FP/INT units; enhanced RT cores and increased RT core counts are also guaranteed to happen, so it's going to be a cumulative effect from all of these factors.

As for RT, that was due to my expectation of nvidia optimizing their first implementation of RTX; Ampere should improve it by more than its FLOPS uplift over Turing alone.
70% more:
70% is actually x0.70. (You can then take that .70 value and add it to original to get the summation of original + % more. = x1.70). Naturally, everyone understands x 170% (ie: 1.7 x) and instantly uses that to calculate sum of both.
But we are not calculating sum, just % more. And .70 + .70 <does not equal> 3 wholes. Thus not 3x/3 times. (70% more per node x 70% more per node) = 140% more.. So, 1.4 times, not 2.89 times.
Or am I just high..?
70% <PLUS> 70%. = 140% gain in node improvements.
You are trying to multiply them, instead of adding what each node has done to dies.
You have to multiply them...
3 is not 100% + 50% = +150% above 1.
3 is not 1x + 0.5x = +1.5x above 1; that would be 2.5, not 3.
3 is 2x * 1.5x = 3x of 1.
Seriously... Derailing a thread with this... (I'm posting this here instead of PM because there's clearly at least 2 users confused by this)
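The point of contention can be checked mechanically; a minimal sketch using the 2x and 1.5x example figures from above:

```python
# Successive improvements compose by multiplication, not addition.
gains = [2.0, 1.5]  # example figures: a 2x gain followed by a 1.5x gain

multiplicative = 1.0
for g in gains:
    multiplicative *= g
print(multiplicative)  # 3.0

# Summing the "percent above 1" parts instead gives the wrong answer:
additive = 1.0 + sum(g - 1.0 for g in gains)
print(additive)  # 2.5
```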
My friend.
You didn't closely read my original post. Please go back and do so.
You are mixing up fractions and percentages. In your formulas above, what is 3..?
I think you mean 3x some arbitrary value (3x, or 300%..?), not the number 3.
Again, simple math:
Node 1 + 70% improvement
Node 2 + 70% improvement
Node 1 + Node 2 improvements = 140% in improvement. (Hence, 1.4x improvement)
Wouldn't that be:
Let N = original performance on Node 0
Let 70% be performance increase from transition to Node 1
Let 70% be performance increase from transition to Node 2 from Node 1
Performance at Node 1 = N + .70 N = 1.7 N
Performance at Node 2 = (1.70 N) + .70 (1.70 N) = 1.70 N + 1.19 N = 2.89 N
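The derivation above can be verified numerically (same values as the post: N = 1, +70% per node):

```python
# Numeric check of the per-node compounding derived above.
N = 1.0                       # baseline performance on Node 0
node1 = N + 0.70 * N          # +70% going to Node 1
node2 = node1 + 0.70 * node1  # +70% again going to Node 2

print(round(node1, 2))  # 1.7
print(round(node2, 2))  # 2.89  (equivalently, 1.7 ** 2)
```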
I understand now what he is trying to convey, but he is failing to recognize that you do not multiply the percentages, you add them up.
Coincidentally, TSMC is not claiming a 300% increase in density over 2 nodes, or the 2.89x he is claiming above. They have said that going from 7nm to 7nm+ will yield a 20% shrink in die space. And from 12nm TSMC to 7nm TSMC is what, another 25-35% reduction in die space..? Or about a 55% node shrink from 12nm TSMC to 7nm+ TSMC.
Think about what percentage a 2.89x node shrink would be..?
Compared to its own 16-nanometer technology, TSMC claims its 7 nm node provides around 35-40% speed improvement or 65% lower power.
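Part of the disagreement here is a unit mix-up: an Nx density improvement means the same transistor count fits in 1/N of the area, i.e. a (1 - 1/N) shrink. A quick sketch using the density figures quoted earlier in the thread (not official TSMC numbers):

```python
def area_shrink(density_multiplier):
    """Fractional die-area reduction for an N-x density improvement
    at the same transistor count."""
    return 1.0 - 1.0 / density_multiplier

# Density figures quoted in the thread, not official TSMC numbers:
print(round(area_shrink(1.9), 2))   # 0.47 -> 16FF+ vs 28nm: ~47% smaller
print(round(area_shrink(3.3), 2))   # 0.7  -> 7nm vs 12nm: ~70% smaller
print(round(area_shrink(2.89), 2))  # 0.65 -> 2.89x density ~= 65% shrink, not "289%"
```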
~
That means that if Pascal had enjoyed the same node improvements, then for the same die size they could have crammed in 70% more transistors at very similar power. So take the original 1.7x of Pascal multiplied by 70% more transistors: 1.7 * 1.7 = 2.89. So there you have it, a ~3x improvement. We didn't even take into account that a 70% larger chip would be ludicrously faster, so they would more likely than not clock it lower, increasing efficiency even further.
70% is 0.70x, not 1.7x...!
.70 = 70%
1.7 = 170%