Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Original Performance is 1.0.
An increase of 70% means: 0.70 more in addition to the original performance of 1.0, so you're at 1.70.

Correct^
In that, you add them BOTH to get the SUM (= x 1.7). We are talking only about the 70%. (Not the sum of both.)
So 70% = x 0.70

Not to mention, he was trying to multiply percentages. 50% + 50% is different from 50% x 50%. This error was skewing his math and throwing off his calculations. I just wanted to bring this to his attention.


If you EVER have any doubt, just type the equation in @Wolfram|Alpha. You can just type in words if you want and ask W|A.
 
He is correct. If you have a 70% improvement on top of a 70% improvement, you multiply them: 1.7 x 1.7.
 
Of course it's already factoring in Pascal's improvements, because I'm then adding the relative benefits that 7nm/7nm+ has over 16FF+ (i.e. ~1.7x relatively more density for 7nm instead of the actual 3.3x more density). I can make it easier though:

16nm improvements: 1.9x more transistors @ 0.5x power = Pascal 1.7x improvement
7nm improvements: 3.3-3.7x more transistors @ 0.35x power = Ampere 3x improvement

In both cases we are seeing an improvement that is some 10% lower than the full potential of the node. Nothing really especially impressive or unexpected for Ampere. Both achieve the same based on what's available...
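As a quick sanity check of that "some 10% lower than the full potential" claim, here's a trivial sketch using only the figures above (the 3x for Ampere is the rumored number, not a measured one):

```python
# Back-of-the-envelope check: achieved improvement vs. the node's density uplift.
# Figures are the ones quoted above; Ampere's 3x is the rumored number.
nodes = {
    "16nm (Pascal)": {"density_uplift": 1.9, "achieved_gain": 1.7},
    "7nm (Ampere, rumored)": {"density_uplift": 3.3, "achieved_gain": 3.0},
}

for name, n in nodes.items():
    shortfall = 1 - n["achieved_gain"] / n["density_uplift"]
    print(f"{name}: {shortfall:.0%} below the density uplift")

# 16nm (Pascal): 11% below the density uplift
# 7nm (Ampere, rumored): 9% below the density uplift
```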



No, it's not for perf/mm2, because it's been calculated at iso-power.

If you're already factoring in Pascal's improvement, then firstly we're talking past each other, and secondly the rumor is not about Ampere's improvement over Maxwell, but over Turing.

What I've said is that for the previous node jump, Maxwell->Pascal, nvidia improved their perf/W by 50-70%, not some theoretical number derived from foundry figures for their new process, but for the chips that were released. If a rumor claims a 200% perf/W jump for their next node change, Turing->Ampere, then it's doing ~3-4x better than the previous one.

Too unlikely to see that in practice, which is all that should matter.

I wasn't aware of the 3.3x density figure for 7nm; that's just obscene. Interesting that AMD's 14nm->7nm transition is only 1.64x.

You can clock the 1080/1080Ti/2080Ti to 2000MHz as well. Clocking the 980Ti to 1400MHz will screw its power efficiency so hard that it becomes a power hog, so this shouldn't be a factor.


RT acceleration is not a one-time thing. NVIDIA has shown that in Turing it relies on three things: acceleration through the RT cores, utilization of the normal CUDA cores, and the separation of the FP32 and INT32 units. You can clearly see this in the GTX Turing cards, where the 1660Ti achieves near-1080Ti performance running DXR games. RT performance will improve in Ampere through the low-hanging fruit of increased TFLOPs and a higher count of the separated FP/INT units; enhanced RT cores and/or an increased RT core count are also guaranteed to happen. It's going to be a cumulative effect across all of these factors.

You're confusing two separate things: the 980Ti's size, and the 1080Ti's improvement over it, which when clocked to the max is around 50% (and I'm not just using the 980Ti's highest clocks). And there's no point in mentioning the power usage since it's not under consideration.

As for the RT, I'm not sure why you need to argue when it's obvious that RT would see far bigger improvements than FP performance.
 
Correct^
In that, you add them BOTH to get the SUM (= x 1.7). We are talking only about the 70%. (Not the sum of both.)
So 70% = x 0.70

Not to mention, he was trying to multiply percentages. 50% + 50% is different from 50% x 50%. This error was skewing his math and throwing off his calculations. I just wanted to bring this to his attention.


If you EVER have any doubt, just type the equation in @Wolfram|Alpha. You can just type in words if you want and ask W|A.

You're wrong dude.

Example:

Node 0 = 1
Node 1 = 1.70 (70% improvement over Node 0)
Node 1 now becomes the new baseline of 1, since it's 100% of itself
Node 2 = 1.70x Node 1 (70% improvement over Node 1)
Node 2's performance = 1.7 x 1.7 = 2.89x the original Node 0.

Pretty easy stuff man...
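The same arithmetic as a tiny sketch, just to make the compounding-vs-adding distinction explicit (purely illustrative):

```python
# Generational gains compound (multiply); they don't add.
def compound(*gains):
    """Each gain is a fractional improvement, e.g. 0.70 for +70%."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total

print(compound(0.70))        # 1.7   -> one +70% jump
print(compound(0.70, 0.70))  # ~2.89 -> two +70% jumps (1.7 x 1.7)
print(1.0 + 0.70 + 0.70)     # 2.4   -> the (wrong) additive version
```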
 
You're confusing two separate things: the 980Ti's size, and the 1080Ti's improvement over it, which when clocked to the max is around 50% (and I'm not just using the 980Ti's highest clocks). And there's no point in mentioning the power usage since it's not under consideration.
When you mention the 980Ti's highest clocks, you have to take into consideration the 1080Ti's highest clocks as well; you don't just pick one and discard the other. The 1080Ti is perfectly capable of doing 2000MHz as a fixed permanent clock (even 2.1GHz), and doing so will make it maintain its 70% lead even over a max-clocked 980Ti.

Besides, if you want to push the TDP of the 980Ti upwards of 350W and its temps north of 90C just to clock it at 1400MHz, be my guest. My point still stands: at the same TDP (of 250W) the 1080Ti beats the 980Ti by 70%. Bringing the 980Ti's overclocking potential into the discussion is irrelevant and pointless, as I can do the same with any card in any generation.


And there's no point in mentioning the power usage since it's not under consideration.
I am not sure what you are trying to do here exactly. Power is under heavy consideration; it's what the original rumor is fixating on, and it's why I mentioned the 3080Ti being 70% faster than the 2080Ti at the same TDP.
As for the RT, I'm not sure why you need to argue when it's obvious that RT would see far bigger improvements than FP performance.
Because the RT will not be improved without a significant improvement in FP performance, simple as that.
 
When you mention the 980Ti's highest clocks, you have to take into consideration the 1080Ti's highest clocks as well; you don't just pick one and discard the other. The 1080Ti is perfectly capable of doing 2000MHz as a fixed permanent clock (even 2.1GHz), and doing so will make it maintain its 70% lead even over a max-clocked 980Ti.

Not really commenting on how much the 1080Ti beats the 980Ti in general, it definitely is a huge upgrade, but the 980Ti was more conservatively clocked out of the box. It is one of the better overclockers among nVidia's chips, perhaps ever.

Here are links to TechPowerUp's MSI Lightning reviews for both models, which are among the highest-performing cards for both chips.

https://www.techpowerup.com/review/msi-gtx-980-ti-lightning/26.html

https://www.techpowerup.com/review/msi-gtx-1080-ti-lightning-z/33.html

Compared to the stock reference model, when max OC'd the Lightning 980Ti performs 35.9% better, whereas the same comparison for the 1080Ti gives a 19.2% increase.
 
When you mention the 980Ti's highest clocks, you have to take into consideration the 1080Ti's highest clocks as well; you don't just pick one and discard the other. The 1080Ti is perfectly capable of doing 2000MHz as a fixed permanent clock (even 2.1GHz), and doing so will make it maintain its 70% lead even over a max-clocked 980Ti.

Besides, if you want to push the TDP of the 980Ti upwards of 350W and its temps north of 90C just to clock it at 1400MHz, be my guest. My point still stands: at the same TDP (of 250W) the 1080Ti beats the 980Ti by 70%. Bringing the 980Ti's overclocking potential into the discussion is irrelevant and pointless, as I can do the same with any card in any generation.



I am not sure what you are trying to do here exactly. Power is under heavy consideration; it's what the original rumor is fixating on, and it's why I mentioned the 3080Ti being 70% faster than the 2080Ti at the same TDP.

Because the RT will not be improved without a significant improvement in FP performance, simple as that.

I'd like to reply, but I'm not seeing this conversation go anywhere. I'll just repost the comment I made, which was NOT a reply to either you or to Benetanegia:

Pascal saw a 50-70% perf/W improvement over Maxwell, though the latter was not clocked as high with turbo boosting. 3x perf/W would be 4-6 times as good as that and highly unlikely.

With the 2080Ti's size, nvidia would have issues surpassing its performance the way they did with the 1080 vs. the 980Ti, where the reference 980Ti was ~1200MHz. RT improvements should be much easier, and 4K 60fps with RTX on and no DLSS crutch could be a huge seller.

I'd correct in the above that the 3x perf/W for Ampere would be 3-4 times as good as the previous node change for nvidia.

And that I wasn't aware of the density jump from 16nm to 7nm/7nm+ for TSMC, thinking that it'd be similar to the 28nm to 16nm jump.
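For clarity, the arithmetic behind that correction, using the 50-70% Maxwell->Pascal range above and the rumored 3x (i.e. +200%) Turing->Ampere figure:

```python
# How the rumored Turing->Ampere perf/W gain compares to the Maxwell->Pascal one.
pascal_gain_low, pascal_gain_high = 0.50, 0.70  # +50% to +70% perf/W (Maxwell -> Pascal)
ampere_gain = 2.00                              # +200% perf/W, i.e. 3x (the rumor)

print(f"{ampere_gain / pascal_gain_high:.1f}x to {ampere_gain / pascal_gain_low:.1f}x "
      "the previous generational gain")
# 2.9x to 4.0x the previous generational gain
```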
 
All this math conversation is nice, but as a lurker with a bit of industry insider knowledge: Ampere will benefit from the biggest node improvement of the last decade. From 16/12nm to 7nm EUV, you can argue all night long, but when did we last get a >3x density uplift coupled with more than a 50% power decrease? I don't remember.
What I know is that Ampere had a tough gestation, with multiple goals to accomplish across graphics, compute and AI. All 3 departments were "fighting" to get more transistors allocated to their needs. Lots of internal politics, where the AI guys want their dedicated silicon so they don't get their lunch eaten by the hungry AI startups... and Intel. I even heard some fears in Nvidia management, talking of Ampere as a jack of all trades, master of none.
With the AI market maturing and dedicated silicon available, Nvidia's dominance is at risk. That's why they have an AI accelerator ready to spin in case Ampere is not competitive enough in this field.
Very interesting times ahead...
 
If you're already factoring in Pascal's improvement, then firstly we're talking past each other, and secondly the rumor is not about Ampere's improvement over Maxwell, but over Turing.

And what I did was not calculating Ampere over Maxwell. What I did was simply calculating a fictional Pascal improvement if it had had a fictional 3.3x density + 0.35x power node available, instead of the 1.9x and 0.5x. To calculate Maxwell to Ampere, or rather 28nm to 7nm, I'd have done 1.7 * 3.3 = 5.61.

The reason I'm mixing a real number (1.7x) with those of the foundry is because we are talking about potential, so some sort of reference is needed. Sure, most of the time actual improvements will be below those of the process, but there are times when they are greater (e.g. Maxwell).

What I've said is that for the previous node jump, Maxwell->Pascal, nvidia improved their perf/W by 50-70%, not some theoretical number derived from foundry figures for their new process, but for the chips that were released. If a rumor claims a 200% perf/W jump for their next node change, Turing->Ampere, then it's doing ~3-4x better than the previous one.

16nm improvements: 1.9x more transistors @ 0.5x power = Pascal 1.7x improvement
7nm improvements: 3.3-3.7x more transistors @ 0.35x power = Ampere 3x improvement

Both situations are equally close (10%) to maximizing the node's advantages. How come one (Pascal) is reality and the second is somehow impossible?
 
I get now where you're coming from, but as I had replied to pharma, my comment wasn't about what you guys were posting earlier. I'm still wary of using transistor counts as a proxy for performance.

I hadn't looked up the node changes, assuming that it'd be similar to the previous node jump. Now I'm quite curious why AMD's 7nm chips are so bad at transistor density, not even 2x that of 14nm, at which they were similar to TSMC 16nm. Is the heat density too much for higher clocks? And would nvidia need to follow suit?
 
I get now where you're coming from, but as I had replied to pharma, my comment wasn't about what you guys were posting earlier. I'm still wary of using transistor counts as a proxy for performance.

I hadn't looked up the node changes, assuming that it'd be similar to the previous node jump. Now I'm quite curious why AMD's 7nm chips are so bad at transistor density, not even 2x that of 14nm, at which they were similar to TSMC 16nm. Is the heat density too much for higher clocks? And would nvidia need to follow suit?

In regards to density, I guess you are calculating based on their reported numbers, and personally, after the "Bulldozer is 2 billion transistors, no sorry, it is actually just 1.2 billion" thing, I don't know if you can really pay too much attention to them. Who knows how they are counting them for each node/chip.

In terms of actual gains, on top of the ones you mention, AMD probably had to pay the price of implementing many things (e.g. variable rate shading, RT) that Turing already has, and maybe some that Turing doesn't have... Couple that with having to fix some "issues" the architecture had, like the lack of scaling beyond 64 CUs, if that was really a thing. And who knows what else.
 
I get now where you're coming from, but as I had replied to pharma, my comment wasn't about what you guys were posting earlier. I'm still wary of using transistor counts as a proxy for performance.

I hadn't looked up the node changes, assuming that it'd be similar to the previous node jump. Now I'm quite curious why AMD's 7nm chips are so bad at transistor density, not even 2x that of 14nm, at which they were similar to TSMC 16nm. Is the heat density too much for higher clocks? And would nvidia need to follow suit?
Both nVidia and AMD have much lower transistor density on 16/14nm than LP designs (roughly half). It's to be expected that this discrepancy will carry over to, and in all likelihood be reinforced further by, the move to 7nm.

I would avoid making statements about the densities AMD achieve on 7nm until we have other manufacturers delivering similar products on the node, giving us something real to actually compare.
 
In regards to density, I guess you are calculating based on their reported numbers, and personally, after the "Bulldozer is 2 billion transistors, no sorry, it is actually just 1.2 billion" thing, I don't know if you can really pay too much attention to them. Who knows how they are counting them for each node/chip.

In terms of actual gains, on top of the ones you mention, AMD probably had to pay the price of implementing many things (e.g. variable rate shading, RT) that Turing already has, and maybe some that Turing doesn't have... Couple that with having to fix some "issues" the architecture had, like the lack of scaling beyond 64 CUs, if that was really a thing. And who knows what else.

I'm looking at TPU's DB for chips; it's gone from around 25Mt/mm2 to 41Mt/mm2 for AMD's 7nm jump. Not even close to being 2x. Their other issue could be improving clocks.

Both nVidia and AMD have much lower transistor density on 16/14nm than LP designs (roughly half). It's to be expected that this discrepancy will carry over to, and in all likelihood be reinforced further by, the move to 7nm.

I would avoid making statements about the densities AMD achieve on 7nm until we have other manufacturers delivering similar products on the node, giving us something real to actually compare.

While you're right about that, from the numbers I'm seeing, 28nm to 16/14nm led to almost 92% more density for nvidia, and almost the same for AMD if you look at chips like Tonga vs. Vega.
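Putting those figures side by side (the 25 and 41 Mt/mm2 numbers are the TPU-derived ballparks quoted above; the 3.3x is TSMC's 16nm->7nm claim, so it's not a perfect comparison given AMD's 14nm was GlobalFoundries):

```python
# Shipped AMD density uplift vs. the foundry-quoted scaling, from the figures above.
amd_14nm_density = 25.0   # ~Mt/mm^2 (TPU database ballpark)
amd_7nm_density  = 41.0   # ~Mt/mm^2 (TPU database ballpark)
tsmc_7nm_claim   = 3.3    # quoted 16nm -> 7nm density scaling

shipped = amd_7nm_density / amd_14nm_density
print(f"shipped uplift: {shipped:.2f}x vs. foundry claim: {tsmc_7nm_claim:.1f}x")
# shipped uplift: 1.64x vs. foundry claim: 3.3x
```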
 
While you're right about that, from the numbers I'm seeing, 28nm to 16/14nm led to almost 92% more density for nvidia, and almost the same for AMD if you look at chips like Tonga vs. Vega.
Yup, the improvement in density from 14nm to 7nm is significantly smaller for AMD than what, for instance, Apple got for their SoCs. Do note though that AMD was using GF for 14nm, not TSMC, so there's an additional unknown there.

But we just don’t know if this is simply what to expect from a HP design on vanilla 7nm. Unfortunately, I suspect that it is. EUV and designing for lower power should help a bit.

If so, I daren’t make any prognosis for density gains on TSMC 5nm, as that generally seems to be pretty similar to 7nm, but taking a lot more advantage of EUV, but without (as far as I’m aware) doing anything drastic on the basic transistor level, still FF.

If there is anyone around who has a bit more insight into how the changes made to a process to make it better suited to HP applications are likely to scale going to TSMC 5nm, and more generally to GAA at 3nm, your input would be much appreciated.
 
I'm looking at TPU's DB for chips; it's gone from around 25Mt/mm2 to 41Mt/mm2 for AMD's 7nm jump. Not even close to being 2x. Their other issue could be improving clocks.

Yeah, I was not casting doubt on the numbers you provided, I was just making the point that those numbers may be based on different ways of reporting the transistor count, in a similar fashion to how they suddenly change power reporting from TBP to TDP or vice versa.
 
All this math conversation is nice, but as a lurker with a bit of industry insider knowledge: Ampere will benefit from the biggest node improvement of the last decade. From 16/12nm to 7nm EUV, you can argue all night long, but when did we last get a >3x density uplift coupled with more than a 50% power decrease? I don't remember.
What I know is that Ampere had a tough gestation, with multiple goals to accomplish across graphics, compute and AI. All 3 departments were "fighting" to get more transistors allocated to their needs. Lots of internal politics, where the AI guys want their dedicated silicon so they don't get their lunch eaten by the hungry AI startups... and Intel. I even heard some fears in Nvidia management, talking of Ampere as a jack of all trades, master of none.
With the AI market maturing and dedicated silicon available, Nvidia's dominance is at risk. That's why they have an AI accelerator ready to spin in case Ampere is not competitive enough in this field.
Very interesting times ahead...

I could see there being tension in terms of business and R&D priorities but silicon budgets for shipping products should be an easier problem to solve. AI and HPC capabilities can be scaled back significantly for lower market segments where raw performance is less important.

Either way Nvidia certainly has the resources to produce a pure gaming focused product line if they need to.
 
Chip design is always a compromise between multiple targets, not an exercise in achieving the densest chips possible. While smaller chips lend themselves more naturally to higher clocks, you can actively design around that by investing more transistors in order to shorten a critical path and the like.
 
Speculation: NVIDIA GeForce RTX 3080, RTX 3070 leaked specs: up to 20GB GDDR6 RAM
GA103 will reportedly pack 3840 stream processors, 60 SMs, and 10/20GB of GDDR6 on a 320-bit memory bus. The 10/20GB VRAM option there is an interesting one, as we haven't had a card with 20GB of VRAM before. Moving onto the GA104, we have 3072 stream processors, 48 SMs, and 8/16GB GDDR6 on a 256-bit memory bus -- this will take form in the GeForce RTX 3070.
https://www.tweaktown.com/news/7005...070-leaked-specs-up-20gb-gddr6-ram/index.html
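As context for those bus widths, here's a rough theoretical-bandwidth estimate; the 14 Gbps GDDR6 speed is an assumption for illustration only, since the leak doesn't specify memory speed:

```python
# Theoretical memory bandwidth in GB/s = (bus width in bits / 8) * per-pin data rate in Gbps.
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits / 8 * data_rate_gbps

# Assuming 14 Gbps GDDR6 (not part of the leak).
print(memory_bandwidth_gb_s(320, 14.0))  # GA103 ("RTX 3080"): 560.0 GB/s
print(memory_bandwidth_gb_s(256, 14.0))  # GA104 ("RTX 3070"): 448.0 GB/s
```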
 