Nvidia Ampere Discussion [2020-05-14]

Pretty interesting video about scaling, CPU limitation, etc., for the 3080:


I sometimes don't 100% agree (but who cares), but I like what he tried to do with this video.
 
What he did makes sense. I still think it's overlooking that games will change gen to gen, so you don't necessarily want to scale the architecture based on old games. For example, look at the changes in the instruction caches on RDNA and Ampere compared to GCN. My understanding is that short shaders were in part a necessity because of the instruction cache. Longer shaders will mean different GPU behaviour and scaling. Also, I know some people keep trying to downplay it, but the new consoles are designed for wide compute-driven geometry processing instead of the old vertex shader pipeline. Primitive shaders and mesh shaders will both leverage the width of the GPU. It's very likely that we'll see scaling change this gen as more games leverage these baseline features of the new consoles.

E.g., I'd like to see how the Nvidia Asteroids demo scales across resolutions between the 2080Ti and the 3080.
 
Pretty interesting video about scaling, CPU limitation, etc., for the 3080

The data is interesting but the conclusions are really obvious. It's not clear why they thought they needed to make this video. Increasing the pixel load will hit some parts of the hardware harder than others. That's always been the case.

Much of the tech press seems to have tunnel vision when it comes to Ampere's surplus of shader power. They seem to be oblivious to the fact that bandwidth is also extremely important at higher resolutions and is not available in the same abundance.
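
As rough, illustrative arithmetic (the bytes-per-pixel figure below is an invented round number; only the relative scaling across resolutions matters), a minimal sketch:

```python
# Rough illustration of why bandwidth pressure grows with resolution.
# BYTES_PER_PIXEL is a made-up aggregate for G-buffer writes, blending and
# overdraw; the absolute GB/s numbers are not meaningful, the ratios are.

resolutions = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K":    (3840, 2160),
}

BYTES_PER_PIXEL = 64   # assumption for illustration only
TARGET_FPS = 60

base_pixels = resolutions["1080p"][0] * resolutions["1080p"][1]

for name, (w, h) in resolutions.items():
    pixels = w * h
    gb_per_s = pixels * BYTES_PER_PIXEL * TARGET_FPS / 1e9
    print(f"{name}: {pixels / 1e6:5.2f} MPix, "
          f"{pixels / base_pixels:4.2f}x 1080p pixels, "
          f"~{gb_per_s:5.1f} GB/s of per-pixel traffic at {TARGET_FPS} fps")
```

4K pushes 4x the pixels of 1080p, so the per-pixel portion of the bandwidth demand grows 4x, while the extra shader throughput only helps where the work is actually ALU bound.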
 
What he did makes sense. I still think it's overlooking that games will change gen to gen, so you don't necessarily want to scale the architecture based on old games. For example, look at the changes in the instruction caches on RDNA and Ampere compared to GCN. My understanding is that short shaders were in part a necessity because of the instruction cache. Longer shaders will mean different GPU behaviour and scaling. Also, I know some people keep trying to downplay it, but the new consoles are designed for wide compute-driven geometry processing instead of the old vertex shader pipeline. Primitive shaders and mesh shaders will both leverage the width of the GPU. It's very likely that we'll see scaling change this gen as more games leverage these baseline features of the new consoles.

E.g., I'd like to see how the Nvidia Asteroids demo scales across resolutions between the 2080Ti and the 3080.

Since UE is going in that direction, we will see wider usage of that kind of processing, yes.
 
Since UE is going in that direction, we will see wider usage of that kind of processing, yes.

I don't think it has anything to do with UE, really. Lots of games do compute culling already, but it has limitations. Mesh shaders and primitive shaders are both corrections to the limitations of the vertex shader pipeline, which has a limited threading model and some other bottlenecks caused by the input assembler and writing out index buffers. The new model should be able to read geometry into cache and feed the raster engines directly, all while leveraging the width of the GPU.
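
To make the "width" point a bit more concrete, here's a very rough CPU-side Python sketch of meshlet-style culling; the meshlet size, the clustering and the single clip plane are illustrative assumptions, not any particular API's pipeline:

```python
import numpy as np

# Illustrative meshlet culling: the mesh is pre-split into small meshlets,
# each with a bounding sphere, so every meshlet is an independent unit of
# work that can be tested (and emitted towards the rasterizer) in parallel,
# instead of being funnelled through the input assembler / index buffer path.

MESHLET_TRIS = 64  # per-meshlet triangle budget (assumed round number)

def build_meshlets(tri_centroids):
    """Group consecutive triangles into meshlets with bounding spheres."""
    meshlets = []
    for start in range(0, len(tri_centroids), MESHLET_TRIS):
        chunk = tri_centroids[start:start + MESHLET_TRIS]
        center = chunk.mean(axis=0)
        radius = np.linalg.norm(chunk - center, axis=1).max()
        meshlets.append((start, len(chunk), center, radius))
    return meshlets

def cull_meshlets(meshlets, planes):
    """Keep meshlets whose bounding sphere isn't fully behind any plane.

    planes is an (N, 4) array of (nx, ny, nz, d) with inward-facing normals.
    """
    survivors = []
    for start, count, center, radius in meshlets:
        dist = planes[:, :3] @ center + planes[:, 3]
        if np.all(dist >= -radius):
            survivors.append((start, count))
    return survivors

# Usage sketch: clustered triangle centroids (so meshlets are compact) and a
# single clip plane keeping the +x half-space.
rng = np.random.default_rng(0)
cluster_centers = rng.uniform(-10, 10, size=(160, 3))
tri_centroids = (cluster_centers[:, None, :]
                 + rng.normal(0.0, 0.3, size=(160, MESHLET_TRIS, 3))).reshape(-1, 3)

plane = np.array([[1.0, 0.0, 0.0, 0.0]])
meshlets = build_meshlets(tri_centroids)
kept = cull_meshlets(meshlets, plane)
print(f"{len(kept)} of {len(meshlets)} meshlets survive the clip plane")
```

Every meshlet test is independent of the others, which is the part that maps onto the full width of the GPU.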
 
Since UE is going in that direction, we will see wider usage of that kind of processing, yes.
UE5 is a different beast altogether, with its compute-based rasterization approach. It's hard to say how this approach will scale on Ampere at the moment, since we don't really know which part will be the main limit in its execution.
 
NVidia Ampere RTX 30 Series:
GA-102-300 = 3090 @ $1,500
GA-102-200 = 3080 @ $700

GA-104-300 = 3070 @ $500


All the benchmarking aside... how soon before nVidia has its full product stack out?
 
NVidia Ampere RTX 30 Series:
GA-102-300 = 3090 @ $1,500
GA-102-200 = 3080 @ $700

GA-104-300 = 3070 @ $500


All the benchmarking aside... how soon before nVidia has its full product stack out?

If the rumors about Turing SKUs being EOLed are correct, they need to get the stack turned over as the holiday season opens up unless they want to leave money on the table. Whether they can secure enough capacity for meaningful availability is an obvious question ATM.
 
Pretty interesting video about scaling, CPU limitation, etc., for the 3080:


I sometimes don't 100% agree (but who cares), but I like what he tried to do with this video.
I've finally watched the video, and both their methodology and conclusions are essentially wrong.
This is the second time they've taken a completely wrong approach to analysing Ampere; I wonder what's up with that?

Basically any benchmarking sequence can be both GPU and CPU limited at the same time, since it consists of different scenes. The more CPU-limited scenes it contains, the more CPU limited the benchmarking sequence will be, up to being 100% CPU limited. Actual games, though, are very rarely 100% CPU limited, even in 720p.
This, however, means that such a benchmarking sequence will get progressively more CPU limited on faster GPUs, which will affect performance scaling - but it won't result in zero scaling, as some scenes will remain GPU limited, even on a 3080, possibly even in 720p.

This is pretty much exactly what their benchmark results imply, with the 3080's 1440p performance nearing that of the 2080Ti in 1080p. Of course you'd get worse scaling from there to 4K on the 3080, as you'd be a lot more CPU limited at 1440p on it compared to the 2080Ti.
If they wanted to actually show these limitations, they should've compared framerate/frametime graphs instead of providing average fps results. Such graphs would easily show which portions of a benchmarking sequence are CPU limited and which are GPU limited, with the former being the same on the 3080 and 2080Ti and the latter showing some +30% between them. The higher the resolution, the fewer CPU-limited parts there are in a benchmarking sequence and the bigger the 3080's gain over the 2080Ti.
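
A toy model of that effect, with invented per-frame numbers rather than measurements: every frame costs max(cpu_time, gpu_time), so the CPU-limited stretches of a run cap both cards at the same framerate while the GPU-limited stretches show the full gap, and the average fps dilutes the difference more the lower the resolution is.

```python
import numpy as np

# Toy model of a mixed benchmark sequence: each frame is limited by whichever
# of the CPU or GPU takes longer. All figures are invented for illustration.

rng = np.random.default_rng(1)
n_frames = 2000

cpu_ms = rng.normal(8.0, 2.5, n_frames).clip(3.0, None)          # per-frame CPU cost
gpu_ms_slow_1080p = rng.normal(9.0, 1.0, n_frames).clip(1.0, None)
gpu_ms_fast_1080p = gpu_ms_slow_1080p / 1.3                       # hypothetical ~30% faster card

def run(gpu_ms_1080p, pixel_scale):
    """Average fps and share of CPU-bound frames at a given resolution scale."""
    gpu_ms = gpu_ms_1080p * pixel_scale      # GPU cost scales with rendered pixels
    frame_ms = np.maximum(cpu_ms, gpu_ms)    # per-frame limiter
    return 1000.0 / frame_ms.mean(), (cpu_ms > gpu_ms).mean()

for res, scale in [("1080p", 1.0), ("1440p", 1.78), ("4K", 4.0)]:
    fps_a, cpu_a = run(gpu_ms_slow_1080p, scale)
    fps_b, cpu_b = run(gpu_ms_fast_1080p, scale)
    print(f"{res}: slow {fps_a:5.1f} fps ({cpu_a:4.0%} CPU-bound) | "
          f"fast {fps_b:5.1f} fps ({cpu_b:4.0%} CPU-bound) | "
          f"gap {fps_b / fps_a - 1:+.0%}")
```

Plotting frame_ms for both cards over the run is exactly the frametime-graph view: the CPU-bound portions overlap, the GPU-bound portions sit ~30% apart, and the overlap shrinks as resolution rises.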

Blaming Ampere's frontend for low scaling at lower resolutions makes no sense - in that case, what exactly would something prior to Turing (which arguably has a similar FE setup between the 2080Ti and 3080) show there?
Would a 1080Ti with its 28 TPCs show 67% of the 3080's performance there? Does that actually happen? What happens with GCN and RDNA cards at these resolutions that would point to there actually being a frontend bottleneck?
They have no actual data to make such claims, really; they haven't shown anything which would back it up - just like with their Ampere RT h/w analysis.
 
I don't see a problem with "blaming" the front end. nVidia has not improved it with Ampere. With GA102 they just have a scaling-up problem from GA104. Does this matter? Maybe in old games or games with a "low" amount of compute workload. But it will help them immensely to scale Ampere down.
GA104 has 6 rasterizers, 96 ROPs, 24 geometry units and ~22 TFLOPs within 392.5mm^2. Going down gives nVidia more options, and a 250mm^2 die should easily match a 2070 at 150W or less.
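
A quick sanity check of those figures; the boost clock used here is an assumption for the estimate, not a spec quote:

```python
# Back-of-the-envelope check of the GA104 numbers quoted above.

sms         = 48        # full GA104: 6 GPCs x 4 TPCs x 2 SMs
tpcs        = sms // 2  # one geometry unit per TPC -> 24
fp32_per_sm = 128       # Ampere SM: 2 x 64-wide FP32 datapaths
boost_ghz   = 1.8       # assumed clock for the estimate

tflops = sms * fp32_per_sm * 2 * boost_ghz / 1000   # 2 FLOPs per FMA
print(f"{tpcs} TPCs / geometry units, ~{tflops:.1f} FP32 TFLOPs")
# -> 24 TPCs / geometry units, ~22.1 FP32 TFLOPs
```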
 
Last edited:
I don't see a problem with "blaming" the front end.
The problem is that the "blame" is likely misplaced.

With GA102 they just have a scaling-up problem from GA104.
In what way?

GA104 has 6 rasterizers, 96 ROPs, 48 geometry units
GA104 actually has 24 geometry units (one per TPC, with each TPC made up of 2 SMs). That's some 30% less than the 2080Ti (34), which it should be on par with.
So again, either they will be on par in 4K only - which is unlikely for this performance level as it's not high enough for 4K and NV's own benchmarks are for 1440p - or Ampere's frontend isn't an issue.
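
For what it's worth, the "some 30% less" checks out:

```python
# 24 geometry units (GA104) vs 34 (2080Ti, one per TPC).
ga104_tpcs, tu102_tpcs = 24, 34
print(f"GA104 has {1 - ga104_tpcs / tu102_tpcs:.0%} fewer geometry units than the 2080Ti")
# -> GA104 has 29% fewer geometry units than the 2080Ti
```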
 
I've finally watched the video, and both their methodology and conclusions are essentially wrong.
This is the second time they've taken a completely wrong approach to analysing Ampere; I wonder what's up with that?

Basically any benchmarking sequence can be both GPU and CPU limited at the same time, since it consists of different scenes. The more CPU-limited scenes it contains, the more CPU limited the benchmarking sequence will be, up to being 100% CPU limited. Actual games, though, are very rarely 100% CPU limited, even in 720p.
This, however, means that such a benchmarking sequence will get progressively more CPU limited on faster GPUs, which will affect performance scaling - but it won't result in zero scaling, as some scenes will remain GPU limited, even on a 3080, possibly even in 720p.

This is pretty much exactly what their benchmark results imply, with the 3080's 1440p performance nearing that of the 2080Ti in 1080p. Of course you'd get worse scaling from there to 4K on the 3080, as you'd be a lot more CPU limited at 1440p on it compared to the 2080Ti.
If they wanted to actually show these limitations, they should've compared framerate/frametime graphs instead of providing average fps results. Such graphs would easily show which portions of a benchmarking sequence are CPU limited and which are GPU limited, with the former being the same on the 3080 and 2080Ti and the latter showing some +30% between them. The higher the resolution, the fewer CPU-limited parts there are in a benchmarking sequence and the bigger the 3080's gain over the 2080Ti.

Blaming Ampere's frontend for low scaling at lower resolutions makes no sense - in that case, what exactly would something prior to Turing (which arguably has a similar FE setup between the 2080Ti and 3080) show there?
Would a 1080Ti with its 28 TPCs show 67% of the 3080's performance there? Does that actually happen? What happens with GCN and RDNA cards at these resolutions that would point to there actually being a frontend bottleneck?
They have no actual data to make such claims, really; they haven't shown anything which would back it up - just like with their Ampere RT h/w analysis.

I mostly agree with your analysis, but there was one game where performance on the 2080Ti was higher at lower resolutions (as in more FPS than the 3080), pointing to either some driver issue or an actual pipeline bottleneck.
I found that interesting...
 
There's also this rumor for the day:


I do wonder if switching GA10x to N7 will be enough to provide "the next generation product".
It's far more likely IMO that only GA102 will actually "switch", with GA103 being made on N7 from the start. This would allow them to make a "Super" style 30 series refresh next year - which seems unlikely on Samsung's 8N as they are thoroughly power limited in the high end.

I mostly agree with your analysis, but there was one game where performance on the 2080Ti was higher at lower resolutions (as in more FPS than the 3080), pointing to either some driver issue or an actual pipeline bottleneck.
I found that interesting...
The driver's shader compiler is likely not as good for Ampere as it is for Turing right now, so this could explain such results.
 
Proshop, one of the bigger players in northern Europe (they serve the Nordics, Germany, Austria and Poland), has released actual hard numbers:
https://www.proshop.de/RTX-30series-overview

In short, they've so far received a little over 420 RTX 30 cards total, with over 3,700 orders waiting to be filled and only 178 cards coming in from manufacturers at the moment (excluding 3070s from this because there are no 3070 orders yet).
 
Last edited: