Speculation: GPU Performance Comparisons of 2020 *Spawn*

Status
Not open for further replies.
Semantics. Gaming workloads are those that are found in the real world. At the moment, no one has written a pure FP32 gaming workload; the FP:INT ratio in actual games varies from about 1.7:1 to around 3:1.
In the future? Who knows. That does not demonstrate these gaming workloads will exist.

ModEdit: removed unnecessary descriptor

The extreme examples aren't meant to be taken literally. BTW, Dreams for PS4 is a fully compute-shader-based game; I don't know its FP:INT mix. But while not 2.7x faster as with pure FP32, Ampere is still 2x faster on compute with the current mix.

In the near future, across various games, the ratio could shift from what you mention to something like "from 2.1:1 to around 3.1:1", which is a minor shift, and on performance charts the 3080 would appear to be 15% faster than it is now. Even more so if more compute is used in general instead of relying on dozens of render targets. That's how aggregate performance works. There are already very real "gaming workloads" where Ampere is over 2x faster...
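A rough sketch of how an FP:INT mix could feed into per-SM throughput, for anyone who wants to play with the numbers. Everything here is an assumption rather than something taken from a whitepaper or a benchmark: it is an idealized issue model where Turing has one FP32-only lane set plus one INT32-only lane set, Ampere has one FP32-only lane set plus one lane set that does either FP32 or INT32, and nothing else (bandwidth, clocks, occupancy, scheduling) limits performance. The function names are made up for illustration.

```python
def fp_share_from_ratio(fp_to_int: float) -> float:
    """Convert an FP:INT ratio such as 3.0 (meaning 3:1) into the FP share of all math ops."""
    return fp_to_int / (fp_to_int + 1.0)


def relative_time_turing(fp_share: float) -> float:
    """Assumed model: separate FP32 and INT32 lanes, so the busier lane type sets the pace."""
    int_share = 1.0 - fp_share
    return max(fp_share, int_share)


def relative_time_ampere(fp_share: float) -> float:
    """Assumed model: one FP32-only lane set plus one FP32-or-INT32 lane set.

    The shared lane set has to absorb all INT work; FP work is spread over
    whatever capacity is left, so the best case is half the total work per lane set.
    """
    int_share = 1.0 - fp_share
    return max(int_share, 0.5)


def ideal_speedup(fp_share: float) -> float:
    """Per-SM, per-clock speedup of the assumed Ampere model over the assumed Turing model."""
    return relative_time_turing(fp_share) / relative_time_ampere(fp_share)


if __name__ == "__main__":
    # Ratios mentioned in this thread, plus pure FP32 as the upper bound.
    for ratio in (1.7, 2.1, 3.0, 3.1):
        share = fp_share_from_ratio(ratio)
        print(f"FP:INT {ratio}:1 -> ideal per-SM speedup {ideal_speedup(share):.2f}x")
    print(f"pure FP32 -> ideal per-SM speedup {ideal_speedup(1.0):.2f}x")
```

This only bounds the math-issue side per SM and per clock. Comparing actual cards also means accounting for SM count, clocks and bandwidth, which this toy model deliberately ignores.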
 
That's assuming that there are two separate SIMDs for FP and INT - which is unlikely. A far more likely scenario is one SIMD with two sets of ALUs - just like in RDNA or Pascal or whatever. So what are you even arguing about?
RDNA, which has 4 SIMDs in a WGP, each capable of running INT32, is wasting a lot more h/w than Ampere, for example, in games where only 25% of the math is INT32.
Without knowing what the circuitry looks like, I would refrain from drawing any "hardware is wasted" conclusion:

1. Nvidia has apparently been giving their integer SIMD (and now a 2nd FP SIMD) an actual hardware pipeline and datapath (operand forwarding/routing + RF) for dual-issue arithmetic — GCN/RDNA doesn't have that.
2. Floating point and integer arithmetic units may share circuitry, and there is no way to know unless AMD releases the details publicly. Nvidia having chosen to separate them does not imply AMD made the same choice.
3. Maybe more variety of ops in the same pipeline is cheaper than dual-issue. Maybe not. I can't tell.

This topic is likely an endless merry-go-round with the information currently available on this forum. It is kinda like arguing about which CPU cores "waste more hardware" by simply extrapolating from execution unit distributions across issue ports. :p
 
Without knowing what the circuitry looks like, I would refrain from drawing any "hardware is wasted" conclusion
Not my choice of words. Should've used quotes on "wasting" in the previous post.
I (and I presume that NV and AMD too) don't care about "wasted" h/w as long as it helps with providing competitive performance per transistor.
Which is why this whole topic of how Ampere is "wasting" h/w seems completely pointless. There are tensor cores in Ampere which are being about 99% wasted everywhere beyond games with DLSS - does this make them bad? Should they remove them and deprecate DLSS in future h/w?

1. Nvidia has apparently been giving their integer SIMD (and now a 2nd FP SIMD) an actual hardware pipeline and datapath (operand forwarding/routing + RF) for dual-issue arithmetic — GCN/RDNA doesn't have that.
This is not what is shown on the schematics and written in the whitepaper though. They clearly state that separate sets of "CUDA cores" (i.e. ALUs) are being used on the same datapath. This datapath can lead to one SIMD with two sets of ALUs - just like in all other GPUs on the market.

2. Floating point and integer arithmetic units may share circuitry, and there is no way to know unless AMD releases the details publicly. Nvidia having chosen to separate them does not imply AMD made the same choice.
What does "share circuitry" even mean here? NV can't use these ALUs in parallel, which obviously means that they "share circuitry" too. The amount of such reuse may of course differ between architectures - but again, does it even matter?

This topic is likely an endless merry-go-round with the information currently available on this forum. It is kinda like arguing about which CPU cores "waste more hardware" by simply extrapolating from execution unit distributions across issue ports.
Agreed.
 
Ugh. Talking to some of you is like talking to a rock. No offense, but that's exactly how it feels. The extreme examples aren't meant to be taken literally. BTW, Dreams for PS4 is a fully compute-shader-based game; I don't know its FP:INT mix. But while not 2.7x faster as with pure FP32, Ampere is still 2x faster on compute with the current mix.

In the near future, across various games, the ratio could shift from what you mention to something like "from 2.1:1 to around 3.1:1", which is a minor shift, and on performance charts the 3080 would appear to be 15% faster than it is now. Even more so if more compute is used in general instead of relying on dozens of render targets. That's how aggregate performance works. There are already very real "gaming workloads" where Ampere is over 2x faster...

ModEdit: Removed unnecessary bits

I not only agreed that Ampere has 2x the FP32 PEAK performance, I've written it myself. What I don't agree with is that this is representative of a typical gaming workload, and no, there is NO game out there with a 2x performance improvement over Turing at the same SM count. You are probably comparing the 3080 to the 2080, but the 3080 has the same SM count as the 2080 Ti, which has more bandwidth and lower base clocks, making the comparison useless. I will believe that 2x when it shows up in independent benchmarks, just as I would not buy into a 2x for Navi 21 over Navi 10 without any real-world benchmarks.
 
I suggest everyone take a break away from the discussion to realize this is supposed to be a Technical forum for open positive discussions. Cooler and logical heads should prevail.

This is a complete mess and is going to take a long time to sort out how to get everything back on track. Le Sigh.
 