> GCN has inherent utilization issues which prevented reaching peak throughput without using new programming paradigms.
I'm honestly getting déjà vu from the Fury X days with these arguments.
> Regarding the comparison in the white paper for the Titan RTX vs the 3090, for AI compute: in this table the Titan RTX is listed for FP16 Tensor TFLOPS with FP32 accumulate as having 65.2 TFLOPS, where it actually has 130 TFLOPS. Is this a mistake? And similarly, does the 3090 then have 142 TFLOPS rather than the listed 71 TFLOPS?
With Turing, they had reserved unconstrained Tensor throughput for the Quadro cards; not 100% sure about the Titan, though.
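For what it's worth, the two Titan RTX numbers differ by exactly the FP32-accumulate half-rate. A rough sketch of the arithmetic (the core count and boost clock are the published Titan RTX specs, the helper name is just for illustration, and whether the Titan is actually constrained like the GeForce cards is precisely the open question above):

```
# Where the two Titan RTX numbers come from arithmetically: 576 Tensor cores,
# 64 FMAs per core per clock, 2 ops per FMA, ~1.77 GHz boost. The 65.2 figure
# is simply the rated FP16 number cut in half for FP32 accumulation.

def tensor_tflops(tensor_cores, fma_per_clock, boost_ghz):
    return tensor_cores * fma_per_clock * 2 * boost_ghz / 1000.0

full_rate = tensor_tflops(576, 64, 1.77)   # ~130.5 TFLOPS FP16
half_rate = full_rate / 2.0                # ~65.2 TFLOPS with FP32 accumulate
print(f"FP16 Tensor: {full_rate:.1f} TFLOPS, half rate: {half_rate:.1f} TFLOPS")
```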
> GCN has inherent utilization issues which prevented reaching peak throughput without using new programming paradigms. Ampere doesn't have any such issues [...]
I'm curious, can you summarise precisely why GCN has utilisation issues and why Ampere doesn't?
The scaling being less than the peak figures here is mostly because FP32 isn't really doubled: you need to consider the INTs, which were on top of peak FP32 in Turing and are now part of peak FP32 in Ampere - and that's a sizeable chunk of in-game math, from 25 to 30% of all throughput.
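To put that in rough numbers, here's a toy per-SM model (the 25-30% INT share is taken from the post above; the lane counts are the Turing/Ampere per-SM figures; everything else, including the function name, is a simplification assuming ideal issue and no other bottlenecks):

```
# Toy per-SM model of the FP32/INT balance. Per clock:
#   Turing SM: 64 dedicated FP32 lanes + 64 dedicated INT32 lanes.
#   Ampere SM: 64 dedicated FP32 lanes + 64 lanes shared between FP32 and INT32.
# Assumes ideal issue, and that the FP32 stream (not INT) is the limiter on Turing.

def fp32_per_clock(int_fraction):
    turing = 64.0                          # INT runs alongside on its own lanes
    ampere = 128.0 * (1.0 - int_fraction)  # INT work occupies part of the shared path
    return turing, ampere

for f in (0.25, 0.30):
    turing, ampere = fp32_per_clock(f)
    print(f"INT share {f:.0%}: Ampere ~{ampere / turing:.2f}x Turing's FP32 rate, not 2x")
```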
> I'm curious, can you summarise precisely why GCN has utilisation issues and why Ampere doesn't?
Weak graphics frontend, low single-thread performance, and issues with state changes creating pipeline bubbles which can only be solved by running either pure compute (hello, compute-based culling and similar tricks of this console gen) or async compute (which fills bubbles that aren't there in NV h/w in the first place, which is why it doesn't benefit from it as much). None of this exists in Ampere.
> When you're explaining this, you must disentangle compute from rasterisation/bandwidth/TEX/cache-hierarchy/ROPs/work-distribution/load-balancing.
Why is that? Do all those who say that Ampere is scaling badly in games "disentangle" all these from compute while they say this?
> Careful: DX12 and Vulkan both introduced "new programming paradigms" which Ampere benefits from (thanks, AMD), so your explanation needs to be based upon DX11 or earlier APIs.
Ampere doesn't benefit from DX12 and Vulkan as much as you think (?), it just needs them to access its new h/w - but this is an API-level decision, not a h/w limitation. You can hit Ampere's peak math utilization in DX11 and OpenGL just fine, you just can't access RT and TCs from them.
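The pipeline-bubble point a couple of posts up is easy to picture with a toy timeline model (everything here is made up purely to illustrate the argument - the slot counts, fill rate and function are hypothetical, not measurements of any GPU):

```
# Toy model of the "state changes create pipeline bubbles, async compute fills
# them" argument. A frame is a number of equal time slots; some are idle
# ("bubbles"); async compute can fill a share of the idle slots with other work.
# All numbers are invented for illustration.

def utilization(busy_slots, bubble_slots, async_fill):
    """Fraction of the frame doing useful work, with async_fill in [0, 1]
    being the share of bubbles that async compute manages to fill."""
    total = busy_slots + bubble_slots
    useful = busy_slots + bubble_slots * async_fill
    return useful / total

# Hypothetical GPU that stalls a lot on state changes:
print(utilization(busy_slots=80, bubble_slots=20, async_fill=0.0))  # 0.80
print(utilization(busy_slots=80, bubble_slots=20, async_fill=0.8))  # 0.96 - big win
# Hypothetical GPU with few bubbles to begin with - little left to gain:
print(utilization(busy_slots=97, bubble_slots=3, async_fill=0.8))   # ~0.99
```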
> Weak graphics frontend, low single thread performance, issues with state changes creating pipeline bubbles [...] None of this exists in Ampere. [...] AMD has not invented a wheel here.
All of these sound like very generic statements. I doubt that in reality anyone but NVIDIA actually knows how easy or hard it will be to utilize those resources, not until devs get to grips with Ampere, and anything we say right now is wishful thinking at best. The only thing that can be said for certain is that it's a lot of ALUs to feed, and we've seen this before. How it will come out in practice is another story entirely.
> All of these sound like very generic statements. I doubt in reality anyone but NVIDIA actually knows how easy or hard it will be to utilize those resources [...]
We have the details of the Ampere SM architecture already. It's not at all different from Turing, and most paths and stores have been beefed up to accommodate the increased throughput. Can you point to anything in Ampere which looks like it may create flops utilization issues besides what I've described already (fp/int and math/bandwidth balances in current-gen s/w)?
> Armchair experts: It's the game engines that aren't prepared for this innovative architecture and developers need to optimize for it.
That's the opposite of what I'm saying, in case you didn't read. Ampere doesn't require any specific optimizations, it just needs more math than games are pushing right now. What benchmarks we have show this already. Or do you consider Borderlands 3 to be optimized for Ampere?
> In the meanwhile, actual game developers who have been optimizing for GCN for the better part of the last decade:
Who have no choice but to extract performance this way since they have to ship games on the thing? What would they say exactly?
> Ampere doesn't require any specific optimizations, it just needs more math than games are pushing right now.
The act of addressing deficiencies (underutilisation included) is called optimization, whether it is about, say, adding more optional compute-heavy effects or rethinking your VRAM/resource use to better enable ILP. You are arguing with yourself here.
> The act of addressing deficiencies (underutilisation included) is called optimization [...] You are arguing with yourself here.
Making a game with more complex shading isn't really an optimization for any particular h/w. GCN required very specific optimizations to reach its peak processing power.
He is basically arguing that some people have double standards. Putting that debate aside, you can argue that what Ampere does is an easier starting point for optimization, and existing code can coincidentally benefit from the doubled throughput. But arguing that it does not need "specific" optimization is... a bit slippery IMO.
If Ampere is 'underutilized' and already hitting a power wall, what exactly would you gain by increasing utilization?
In theory you could optimise to extract more FPS from the same throughput. While hardware capability is a known quantity, software isn't. Especially now that RT is really going to start to be used.
With no power increase? I don't see that happening. Same thing happens with consoles. Better utilization through console lifespan inevitably leads to higher power consumption. That's one of the mentioned reasons they beefed up cooling this time.
2080Ti (300W):
1080p: 5,24ms / 190,8 FPS
1440p: 6,66ms / 150,1 FPS
2160p: 8,42ms / 118,8 FPS

another 2080Ti @ 320-360W:
1080p: 161,8 (+17.9%)
1440p: 100,9 (+48.7%)
2160p: 49,4 (+140%)
56.3fps / 17.76ms +110%
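A side note on the percentages above: they line up, roughly, with the first set of FPS figures divided by the second set at the same resolution - a quick check, assuming that's what they refer to:

```
# Quick check of the quoted percentages, assuming they compare the first set of
# FPS figures against the second set at matching resolutions.
pairs = [(190.8, 161.8), (150.1, 100.9), (118.8, 49.4), (118.8, 56.3)]
for fast, slow in pairs:
    print(f"{fast} vs {slow}: +{(fast / slow - 1.0) * 100:.1f}%")
# -> +17.9%, +48.8%, +140.5%, +111.0%
#    (close to the +17.9% / +48.7% / +140% / +110% quoted above)
```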
DX12 needs more adoption; it's not that good right now. E.g. Star Wars Battlefront II can take up to 16GB of RAM for the game itself (VRAM not counted) if run on DX12, for whatever reason, and the framerate can go down to 5fps or so if your PC has 16GB of RAM.

- AMD launches a graphics card whose performance doesn't scale with TFLOPs throughput as expected:
Armchair experts: GCN has utilization issues.
- nVidia launches a graphics card whose performance doesn't scale with TFLOPs throughput as expected:
Armchair experts: It's the game engines that aren't prepared for this innovative architecture and developers need to optimize for it.
In the meanwhile, actual game developers who have been optimizing for GCN for the better part of the last decade:
¯\_(ツ)_/¯
> If Ampere is 'underutilized' and already hitting a power wall, what exactly would you gain by increasing utilization?
A couple of steps of GPU boost down seems like a fair trade for an additional 20 to 30% of performance. Power usage goes down fast when you drop clocks.
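The "power goes down fast when you drop clocks" part follows from the usual first-order model: dynamic power scales roughly with C·V²·f, and near the top of the voltage/frequency curve the voltage has to track frequency, so a small clock drop buys a disproportionate power drop. A sketch with an invented V/F curve (the clocks and the voltage formula below are illustrative, not Ampere's actual tables):

```
# First-order sketch: dynamic power ~ C * V^2 * f. The voltage curve below is
# invented purely for illustration; real V/F tables differ per chip.

def relative_power(f_ghz, f_max_ghz=1.9):
    v = 0.6 + 0.4 * (f_ghz / f_max_ghz)  # hypothetical voltage needed at this clock
    return (v ** 2) * f_ghz              # arbitrary units; compare ratios only

base = relative_power(1.9)
for f in (1.9, 1.8, 1.7):
    print(f"{f:.1f} GHz: {f / 1.9:.0%} of peak clock, ~{relative_power(f) / base:.0%} of peak power")
```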