Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Status
Not open for further replies.
So TU102 -> GA102:
72 SMs -> 84 SMs (+16.7%)
+15% IPC
= 34.2% more performance

Clockspeed has to make up the rest of the difference to reach 40%.
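To make the arithmetic above explicit, here is a minimal sketch. It assumes performance scales linearly with SM count, IPC and clock speed, which is an idealization and not something the rumor states. The function names are made up for illustration. Under that assumption the SM and IPC gains multiply out to about 34%, in line with the figure above, and clocks would need to rise by a bit over 4% to reach 40%.

```python
def combined_uplift(sm_old, sm_new, ipc_gain):
    """Relative gain from more SMs and higher IPC, assuming ideal linear scaling."""
    return (sm_new / sm_old) * (1.0 + ipc_gain) - 1.0


def clock_gain_needed(target_uplift, uplift_without_clocks):
    """Relative clock increase required to reach the target overall uplift."""
    return (1.0 + target_uplift) / (1.0 + uplift_without_clocks) - 1.0


# Rumored figures from the post: 72 -> 84 SMs, +15% IPC, 40% overall target.
base = combined_uplift(72, 84, 0.15)
clocks = clock_gain_needed(0.40, base)

print(f"SMs + IPC: {base:.1%}")
print(f"Clock gain needed for +40%: {clocks:.1%}")
```

Real scaling is rarely this clean (memory bandwidth and power limits get in the way), so treat these as upper-bound estimates.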

Couple that with the shift to Samsung 10nm and I'm feeling rather underwhelmed by these rumors. Hoping it's FUD.
 
That's for equivalent tiers. A 3080 being 50% faster than a 2080 Ti is a pipe dream.
The 3080Ti being only 40% faster than 2080Ti is hogwash, this is a new node we are talking about. Even AMD was able to do much more. Heck, even NVIDIA achieved that with Turing on the same node as Pascal already.

We don't need rumors and suspicious sources to tell us about next gen, we only need to look at next HPC chips and work our way from there.
 
The 3080Ti being only 40% faster than 2080Ti is hogwash, this is a new node we are talking about. Even AMD was able to do much more. Heck, even NVIDIA achieved that with Turing on the same node as Pascal already.

We don't need rumors and suspicious sources to tell us about next gen, we only need to look at next HPC chips and work our way from there.
AMD was only able to because of the huge clock jump that Nvidia has already exhausted. It seems unlikely that we will see another Maxwell to Pascal clockspeed jump.
 
The 3080Ti being only 40% faster than 2080Ti is hogwash, this is a new node we are talking about. Even AMD was able to do much more. Heck, even NVIDIA achieved that with Turing on the same node as Pascal already.
Yeah, by growing die size by 60% (yes, they added new things too, but they don't take much of it)
 
The 3080Ti being only 40% faster than 2080Ti is hogwash, this is a new node we are talking about. Even AMD was able to do much more. Heck, even NVIDIA achieved that with Turing on the same node as Pascal already.

We don't need rumors and suspicious sources to tell us about next gen, we only need to look at next HPC chips and work our way from there.

The design choices for performance gains could be driven by the economics as well. If this rumor is true, it means that they are going with a cheaper, more economical design. Samsung 10nm should be cheaper than 7FF, but that comes at the cost of density and performance, both of which will be worse than on 7FF. To me, Samsung 10nm says they're not going to push the price much higher than the 20xx series. (In fact, I think in this potential world economy it would probably be stupid to go for an even higher price.)
 
AMD was only able to because of the huge clock jump that Nvidia has already exhausted. It seems unlikely that we will see another Maxwell to Pascal clockspeed jump.
NVIDIA didn't need a clock speed increase for Turing. Also, AMD merely increased the clocks by 300MHz to 400MHz, which is anything but huge. Huge would be Maxwell to Pascal, going from 1GHz to 1.7/1.8GHz.

Yeah, by growing die size by 60% (yes, they added new things too, but they don't take much of it)
Yup, which they might do again here. Turing on 10nm/7nm will be smaller in size, and they will double it up from there if they need to.

The design choices for performance gains could be driven by the economics as well.
Maybe, but I believe the prime thing here will be to maximize performance as much as possible, especially to distinguish themselves from next gen consoles and/or RDNA2. NVIDIA said goodbye to the small-die strategy when they integrated Tensor and RT cores and the rest of the new features. They will scale those up to increase performance even further, so bye bye small dies.

Anyway, I still think the HPC chips will tell us everything we need to know.
 
NVIDIA didn't need a clock speed increase for Turing. Also, AMD merely increased the clocks by 300MHz to 400MHz, which is anything but huge. Huge would be Maxwell to Pascal, going from 1GHz to 1.7/1.8GHz.


Yup, which they might do again here. Turing on 10nm/7nm will be smaller in size, and they will double it up from there if they need to.


Maybe, but I believe the prime thing here will be to maximize performance as much as possible, especially to distinguish themselves from next gen consoles and/or RDNA2. NVIDIA said goodbye to the small-die strategy when they integrated Tensor and RT cores and the rest of the new features. They will scale those up to increase performance even further, so bye bye small dies.

Anyway, I still think the HPC chips will tell us everything we need to know.

Maxwell ran well above 1GHz. I had a 980 Ti that did 1550MHz under water.
 
Kepler ran around 1GHz. Maxwell around 1500MHz. Pascal around 2GHz, Turing same. By percentage the increase from Kepler to Maxwell was bigger than that of Maxwell to Pascal.

Erm... no. Stock-clocked Kepler boosted to well over 1100 MHz. Even advertised boost clocks were around 1100 MHz on a few models, and of course actual boost clocks could be much higher. Again, on stock cards and clocks. There was around a 100 MHz difference between Kepler and Maxwell, and this difference didn't really get much higher with OC.
 
Erm... no. Stock-clocked Kepler boosted to well over 1100 MHz. Even advertised boost clocks were around 1100 MHz on a few models, and of course actual boost clocks could be much higher. Again, on stock cards and clocks. There was around a 100 MHz difference between Kepler and Maxwell, and this difference didn't really get much higher with OC.

I owned multiple Kepler products, and multiple Maxwell products. Average clock on Kepler (680, multiple 780s) was AROUND 1GHz, as I said. Average clock on Maxwell, across 970, 980, 980 Ti was AROUND 1500MHz.

No one is saying Pascal didn't receive large clock speed boosts over Maxwell. I'm just putting that boost in perspective by providing additional context that Maxwell also saw a large clock speed bump over Kepler.
 
I owned multiple Kepler products, and multiple Maxwell products. Average clock on Kepler (680, multiple 780s) was AROUND 1GHz, as I said. Average clock on Maxwell, across 970, 980, 980 Ti was AROUND 1500MHz.

I don't care how many cards you had. You are either remembering it very badly or lying through your teeth. Stock clocks vs stock clocks, Kepler was around 1100 MHz and Maxwell around 1200 MHz. OC under water was something like 1350 MHz vs 1500 MHz.
 
My take on the clock situation is as follows; it accounts for what's usually achieved with average ASIC quality and air cooling.

Kepler - 1150 MHz
Maxwell - 1400 MHz
Pascal - 1950 MHz

Beyond that is usually reserved for higher quality chips and/or more advanced cooling but the majority of GPUs within each family should have no problem hitting the above clocks.
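Since the disagreement in this thread comes down to which generation saw the bigger relative clock jump, a small sketch of the percentage math may help. The clock figures are the ones posters quoted (the round ~1GHz / ~1500MHz / ~2GHz estimates and the air-cooled estimates just above), not measured data, and the helper name is made up for illustration.

```python
def generational_gains(clocks_mhz):
    """Relative clock increase between each consecutive pair of generations."""
    names = list(clocks_mhz)
    return {
        f"{old} -> {new}": clocks_mhz[new] / clocks_mhz[old] - 1.0
        for old, new in zip(names, names[1:])
    }


# Figures as quoted by posters in this thread (approximate, in MHz).
round_estimate = {"Kepler": 1000, "Maxwell": 1500, "Pascal": 2000}
air_cooled_estimate = {"Kepler": 1150, "Maxwell": 1400, "Pascal": 1950}

for label, clocks in [("round estimate", round_estimate),
                      ("air-cooled estimate", air_cooled_estimate)]:
    print(label)
    for step, gain in generational_gains(clocks).items():
        print(f"  {step}: {gain:+.1%}")
```

With the round figures the Kepler to Maxwell step is the larger one (50% vs about 33%); with the air-cooled figures it is the other way around (about 22% vs about 39%). So the conclusion depends entirely on whose clock numbers you accept.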
 
Very believable IMO. Hard to see overall performance being more than 50% faster as a best case.

I believe so, too. Nvidia developers already stated when they launched Turing that improved rasterization performance is becoming less important to them, so we should only expect moderate improvements from now on. Raytracing, otoh, can still achieve huge visual differences, so trying to improve on that front makes more sense now.

I'm a bit baffled about the VRAM amount though. I'd have guessed a doubling through the whole product line was in order.
 
I believe so, too. Nvidia developers already stated when they launched Turing that improved rasterization performance is becoming less important to them, so we should only expect moderate improvements from now on. Raytracing, otoh, can still achieve huge visual differences, so trying to improve on that front makes more sense now.

I'm a bit baffled about the VRAM amount though. I'd have guessed a doubling through the whole product line was in order.
You need to improve "rasterization performance" (which is what btw? general purpose FP32 SIMDs aren't tied to rasterization any more than your random memory access controller) to improve ray tracing performance since a lion's share of RT calculations do happen on your ordinary FP32 SIMD units.
 
You need to improve "rasterization performance" (which is what btw? general purpose FP32 SIMDs aren't tied to rasterization any more than your random memory access controller) to improve ray tracing performance since a lion's share of RT calculations do happen on your ordinary FP32 SIMD units.
They did add new features which can be used with rasterization. (Mesh shaders, texture shading stuff etc.)

If they reduce possible limitations of the current RT core, we might see a decent improvement with the same number of RT units and ALUs.
 
They did add new features which can be used with rasterization. (Mesh shaders, texture shading stuff etc.)

If they reduce possible limitations of the current RT core, we might see a decent improvement with the same number of RT units and ALUs.

Isn't that why there are now two FP32 units in every ALU, as this CorgiKitty claims? I'm no expert in microprocessor design, but wouldn't that mean more shading power can now be designated to help RT acceleration without the need to increase CUDA cores proportionally?
 