Speculation: GPU Performance Comparisons of 2020 *Spawn*

They've had a hard time utilizing them because they've had widely known h/w design related issues which prevented them from such utilization in graphics specifically. Ampere doesn't have these, thus this comparison isn't valid. As I've already said.
Please do elaborate more.
Both GCN and Ampere can reach the performance their FLOPS suggest when given perfect software for said architectures. In both GCN's and Ampere's case, that software isn't games.
It's irrelevant if the culprit is bandwidth, shader engine design, frontend, backend or whatever in between; both architectures can spit out the full FLOPS they promise, but that requires very specific loads.
 
They've had a hard time utilizing them because they've had widely known h/w design related issues which prevented them from such utilization in graphics specifically. Ampere doesn't have these, thus this comparison isn't valid. As I've already said.

So what you're saying is Ampere has h/w design related issues but they're just not widely known yet?
 
Please do elaborate more.
Both GCN and Ampere can reach the performance their FLOPS suggest when given perfect software for said architectures. In both GCN's and Ampere's case, that software isn't games.
It's irrelevant if the culprit is bandwidth, shader engine design, frontend, backend or whatever in between; both architectures can spit out the full FLOPS they promise, but that requires very specific loads.

No developer will design games with so much compute workload that only Ampere will be useful. Crysis Remastered with (Software) Raytracing is one such case:
https://www.purepc.pl/test-wydajnosci-crysis-remastered-czy-mi-pojdzie-zalezy-co-masz?page=0,14

With 2x the shader throughput, the other units will be the limitation.
 
Again guys, the Ampere and Vega situations are completely different. Ampere has some clear "limitations" in regard to pixel and texture fillrate and memory bandwidth that Vega did not:

Vega 64 vs RX580 had:
2.05x FLOPS (12.6 vs 6.1 TFLOPS)
1.9x memory BW (484 vs 256 GB/s)
2.05x texture fillrate (396 vs 196 GT/s)
2.3x pixel fillrate (99 vs 43 GPixel/s)
1.58x performance on TPU
With every major metric being close to or exceeding 2x, it has no excuse for not being nearly 2x faster. It is only 1.58x faster according to TPU.
What's more, since everything is a consistent ~2x increase, making games use any one of those metrics more than the others is not going to make the card faster.

3080 vs 2080S:
2.7x FLOPS (30 vs 11.1 TFLOPS)
or 2x TOPS (30 vs 15 TOPS) when accounting for INT on Turing
1.53x memory BW (760 vs 496 GB/s)
1.34x texture fillrate (465 vs 348 GT/s)
1.29x pixel fillrate (150 vs 116 GPixel/s)
1.56x performance on TPU

The performance increase of 56% is higher than every other metric besides TFLOPS, and in fact probably falls right where you would expect it based on the metrics. Making games lean more on the one metric that exceeds the current performance gain (TFLOPS), and thus less on the others, will undoubtedly increase relative game performance. Whether that ultimately happens (more FMA) is a completely different discussion.
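For anyone who wants to play with those ratios themselves, here's a quick back-of-the-envelope sketch in Python using nothing but the spec-sheet figures already quoted in this post:

```python
# Spec ratios vs measured performance ratio, using the figures quoted above.

specs = {
    "Vega 64 vs RX 580": {
        "TFLOPS":              (12.6, 6.1),
        "memory BW (GB/s)":    (484, 256),
        "texture rate (GT/s)": (396, 196),
        "pixel rate (GP/s)":   (99, 43),
        "measured perf (TPU)": (1.58, 1.0),
    },
    "RTX 3080 vs 2080S": {
        "TFLOPS":              (30.0, 11.1),
        "memory BW (GB/s)":    (760, 496),
        "texture rate (GT/s)": (465, 348),
        "pixel rate (GP/s)":   (150, 116),
        "measured perf (TPU)": (1.56, 1.0),
    },
}

for pair, metrics in specs.items():
    print(pair)
    for name, (a, b) in metrics.items():
        print(f"  {name:20s} {a / b:.2f}x")
```

Run it and the pattern above falls out directly: Vega scales every metric by roughly 2x yet lands at 1.58x, while the 3080's 1.56x sits above every metric except TFLOPS (and if you count the 2080S as ~15 "TOPS", the compute gap shrinks from ~2.7x to ~2x).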
 
Again guys, the Ampere and Vega situations are completely different. Ampere has some clear "limitations" in regard to pixel and texture fillrate and memory bandwidth that Vega did not:

Vega 64 vs RX580 had:
2.05x FLOPS (12.6 vs 6.1 TFLOPS)
1.9x memory BW (484 vs 256 GB/s)
2.05x texture fillrate (396 vs 196 GT/s)
2.3x pixel fillrate (99 vs 43 GPixel/s)
1.58x performance on TPU
With every major metric being close to or exceeding 2x, it has no excuse for not being nearly 2x faster. It is only 1.58x faster according to TPU.
Exactly. Mocking the "nVidia boys" for praising the FLOPS metric is one thing, but the bottlenecked 64CU GCN designs were a different matter.
 
Vega 64 vs RX580 had:
2.05x FLOPS (12.6 vs 6.1 TFLOPS)
1.9x memory BW (484 vs 256 GB/s)
2.05x texture fillrate (396 vs 196 GT/s)
2.3x pixel fillrate (99 vs 43 GPixel/s)
1.58x performance on TPU

You're assuming maximum sustained boost clocks for the Vega 64? Its clocks in normal usage go down to 1400MHz.
Unless TPU was cooling down the card between tests, that 1.58x performance you claim is coming from 11.4 TFLOPs.

Furthermore, the Vega 64 SKU is known for having "too many CUs" for gaming. It's a card that sacrifices efficiency for maximum achievable performance, just like e.g. the RTX 3090.
If you redo those calculations on a Vega 56 that averages at 1300MHz, then you get 6.1 vs. 9.3 TFLOPs (1.52x), while getting 1.46x more performance on TPU @1440p.
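For reference, the back-of-the-envelope formula behind those clock-adjusted numbers is simply FLOPS = 2 ops (one FMA) x shader count x clock. A quick sketch, assuming the spec-sheet shader counts and the sustained clocks mentioned above:

```python
# FP32 throughput estimate: 2 ops (one FMA) per shader per clock.
# Shader counts are spec-sheet values; clocks are the sustained averages
# assumed in the post above, not the boost numbers on the box.

def tflops(shaders: int, clock_ghz: float) -> float:
    return 2 * shaders * clock_ghz / 1000.0

print(f"Vega 64 @ 1.40 GHz: {tflops(4096, 1.40):.1f} TFLOPS")  # ~11.5
print(f"Vega 56 @ 1.30 GHz: {tflops(3584, 1.30):.1f} TFLOPS")  # ~9.3
print(f"RX 580  @ 1.34 GHz: {tflops(2304, 1.34):.1f} TFLOPS")  # ~6.2
```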
 
Again guys, the Ampere and Vega situations are completely different. Ampere has some clear "limitations" in regard to pixel and texture fillrate and memory bandwidth that Vega did not:

Vega 64 vs RX580 had:
2.05x FLOPS (12.6 vs 6.1 TFLOPS)
1.9x memory BW (484 vs 256 GB/s)
2.05x texture fillrate (396 vs 196 GT/s)
2.3x pixel fillrate (99 vs 43 GPixel/s)
1.58x performance on TPU
With every major metric being close to or exceeding 2x, it has no excuse for not being nearly 2x faster. It is only 1.58x faster according to TPU.
What's more, since everything is a consistent ~2x increase, making games use any one of those metrics more than the others is not going to make the card faster.

3080 vs 2080S:
2.7x FLOPS (30 vs 11.1 TFLOPS)
or 2x TOPS (30 vs 15 TOPS) when accounting for INT on Turing
1.53x memory BW (760 vs 496 GB/s)
1.34x texture fillrate (465 vs 348 GT/s)
1.29x pixel fillrate (150 vs 116 GPixel/s)
1.56x performance on TPU

The performance increase of 56% is higher than every other metric besides TFLOPS, and in fact probably falls right where you would expect it based on the metrics. Making games lean more on the one metric that exceeds the current performance gain (TFLOPS), and thus less on the others, will undoubtedly increase relative game performance. Whether that ultimately happens (more FMA) is a completely different discussion.

Thank you for this informative post. And then we shouldn't forget that a 2080S, for example, is actually more than the stated 11.1TF.

Just keep in mind, it's almost certain AMD does have a high-TF beast in the works, probably very close to 30TF, hence the reason for this Ampere Titan product, just so NV can say they have the most powerful gaming GPU on the planet. In raw numbers, AMD probably has that ballpark-30TF Navi2 GPU coming. Remember, they clock high, and with a lot of CUs they can achieve this.
 
Please do elaborate more.
I'm sorry but I won't. GCN issues are widely known and documented. Feel free to educate yourself on the matter.

Both GCN and Ampere can reach the performance their FLOPS suggest when given perfect software for said architectures. In both GCN's and Ampere's case, that software isn't games.
It is games in Ampere's case, as has been demonstrated by dozens of independent benchmarks which you just ignore for some reason.

It's irrelevant if the culprit is bandwidth, shader engine design, frontend, backend or whatever in between
No, it's not. Frontend, backend and shader engine design specifically mean a lot in how you are able to utilize the shading h/w in graphics workloads.

both architectures can spit out the full FLOPS they promise, but that requires very specific loads.
It requires the workload to be FP32 math limited on current gen h/w (Turing) when we're talking about Ampere.
It requires specifically optimized code which will do things in very specific ways if we're talking about GCN.
They are not comparable.

So what you're saying is Ampere has h/w design related issues but they're just not widely known yet?
What I'm saying is clearly written in the post you've quoted.
 
Thank you for this informative post. And then we shouldn't forget that a 2080S, for example, is actually more than the stated 11.1TF.
Just like 3080 is actually more than the stated 30 TF.

The point was that, just like GCN, Ampere offers a lot of raw FLOPS which it can't fully utilize in games. We can go till the end of time nitpicking the ton of variables which cause this, but the end result and the point don't change: the 3080 acts nothing like a 30 TFLOPS card compared to Turing in gaming.
 
The point was that, just like GCN, Ampere offers a lot of raw FLOPS which it can't fully utilize in games. We can go till the end of time nitpicking the ton of variables which cause this, but the end result and the point don't change: the 3080 acts nothing like a 30 TFLOPS card compared to Turing in gaming.

Actually GCN GPUs perform much better than their NV counterparts; you can, as you say, come up with tons of variables as to why that is, but they aged much better. Some games even have half the performance on comparable NV GPUs.
The 3080 is already showing close to a 100% increase over a 2080 in older and current-gen games. And like DF mentioned, that's bound to increase over time with modern or next-gen game engines. Also don't forget that Turing boosts further above its rated TF figure than Ampere does.

Yes, it does act like a 30TF GPU. I have no idea why that's a problem for some though; AMD is going to come with a product in that same TF range as the 3080, and I have no doubt it will be faster at it.
 
Wasn't Vega very competitive with Pascal in a select few games? I think Doom was one such case? It's literally the same story with Ampere: very good in a select few games and nowhere near that in everything else.
 
It's irrelevant if the culprit is bandwidth, shader engine design, frontend, backend or whatever in between; both architectures can spit out the full FLOPS they promise, but that requires very specific loads.

No, it's not irrelevant, because of the comparison I posted above, and because "specific loads" are not as simple as saying "games" vs "some generic compute workload", implying they are simply different, always will be completely different, and have completely different needs and a specific mix of compute vs fillrate vs bandwidth requirements.

Current games usually have a light (pre)pass which is written to the G-Buffer; that's pixel fillrate and bandwidth consumed. That's followed by a shadows pass, which is again more fillrate and BW; followed by a GI pass & buffer, fillrate and BW; followed by an AO pass & buffer, fillrate and BW; followed by probe reflections, pass & buffer; followed by screen space reflections, pass & buffer; followed by ??? You get the drill, right?

Future games could (due to consoles having RT and with Nvidia influence for sure) combine all or many/most of that into a single RT "lighting" pass, vastly reducing the fillrate and BW requirements, which are by far Ampere's weak link.
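To make that a bit more concrete, here's a rough, purely illustrative estimate of the write traffic those full-screen passes generate at 4K. The pass list and bytes-per-pixel are made-up assumptions for the sake of the example (no particular engine), and it counts only writes; reads, overdraw, blending and shadow-map resolution multiply the real traffic well beyond this:

```python
# Illustrative only: write traffic of assumed full-screen passes at 4K.
# Buffer formats (bytes per pixel) are guesses, not any real engine's layout.

WIDTH, HEIGHT, FPS = 3840, 2160, 60
pixels = WIDTH * HEIGHT

passes_bytes_per_pixel = {
    "G-buffer (albedo/normal/depth/etc.)": 20,
    "shadow maps":                          8,
    "GI buffer":                            8,
    "AO buffer":                            4,
    "probe reflections":                    8,
    "screen-space reflections":             8,
}

total_gb = 0.0
for name, bpp in passes_bytes_per_pixel.items():
    gb_per_frame = pixels * bpp / 2**30
    total_gb += gb_per_frame
    print(f"{name:38s} {gb_per_frame:5.2f} GB written per frame")

print(f"\nTotal writes: {total_gb:.2f} GB/frame, "
      f"~{total_gb * FPS:.0f} GB/s at {FPS} fps (writes only)")
```

The point isn't the exact numbers, just that every extra full-screen pass touches every pixel again; folding most of them into one RT lighting pass trades that fillrate/BW churn for raw compute and RT work.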
 
Wasn't Vega very competitive with Pascal in a select few games? I think Doom was one such case? It's literally the same story with Ampere: very good in a select few games and nowhere near that in everything else.

GCN did age much, much better than anything NV from that time. I remember posting my findings, and people educated me that it was thanks to GCN having more compute power (I think it was programmer @JoeJ? How things change ;)
 
Actually GCN GPUs perform much better than their NV counterparts; you can, as you say, come up with tons of variables as to why that is, but they aged much better. Some games even have half the performance on comparable NV GPUs.
The 3080 is already showing close to a 100% increase over a 2080 in older and current-gen games. And like DF mentioned, that's bound to increase over time with modern or next-gen game engines. Also don't forget that Turing boosts further above its rated TF figure than Ampere does.

Yes, it does act like a 30TF GPU. I have no idea why that's a problem for some though; AMD is going to come with a product in that same TF range as the 3080, and I have no doubt it will be faster at it.

I fail to follow the logic where a "30 TF GPU" with over 1.7x more theoretical performance beating an "11 TF GPU" by under 1.0x in gaming can be "acting like a 30 TF GPU in gaming".

Also, at least our 2080S FE boosted less over its advertised boost clocks than the 3080 FE did over its advertised boost clocks (percentage of actual average load clocks vs advertised boost clocks).
 
Well, others have explained it before me already. It doesn't matter; AMD will come with a comparable product in TF range anyway. The 3090 is another market, not interesting for anyone anyway (1500 dollar price tag), but NV can claim they have the fastest. Actually, AMD could probably do the same there with some workstation product.
 
You're assuming maximum sustained boost clocks for the Vega 64? Its clocks in normal usage go down to 1400MHz.
Unless TPU was cooling down the card between tests, that 1.58x performance you claim is coming from 11.4 TFLOPs.

Sure, let's compare the RX580 at maximum boost vs a non-maximum Vega... Seems fair. And it doesn't even make a difference, because clocks affect all metrics equally, so it doesn't change a thing compared to Ampere.


All metrics on Vega 56 scale down equally, not just FP, so again that's not relevant at all. Only pixel fillrate remained the same compared to Vega 64, and that cannot be the culprit: at 2.3x that of the RX580 it is by far the strongest metric on Vega, and it doesn't show compared to the RX580. So, we are again at some fundamental weakness elsewhere.
 
Vega 64 vs RX580 had:
2.05x FLOPS (12.6 vs 6.1 TFLOPS)
1.9x memory BW (484 vs 256 GB/s)
2.05x texture fillrate (396 vs 196 GT/s)
2.3x pixel fillrate (99 vs 43 GPixel/s)
1.58x performance on TPU
With every major metric being close to or exceeding 2x, it has no excuse for not being nearly 2x faster. It is only 1.58x faster according to TPU.
What's more, since everything is a consistent ~2x increase, making games use any one of those metrics more than the others is not going to make the card faster.

The major thing not scaling is probably that they can both only cull and rasterise 4 triangles/clock; realistically only rasterising 2 triangles if half are culled from backface. All the CUs/bandwidth/ROPs in the world won't help if they don't have any threads to shade.
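Back-of-the-envelope on that, assuming 4 geometry engines on both Polaris 10 and Vega 10 and roughly the same sustained clocks as discussed above:

```python
# Rough front-end scaling estimate: both chips rasterise at most
# 4 triangles per clock, so triangle throughput scales only with clock.
# Clocks are approximate sustained values; treat this as ballpark only.

def tri_rate_gtris(tris_per_clock: int, clock_ghz: float) -> float:
    return tris_per_clock * clock_ghz

rx580  = tri_rate_gtris(4, 1.34)
vega64 = tri_rate_gtris(4, 1.40)

print(f"RX 580 front-end : {rx580:.1f} Gtris/s")
print(f"Vega 64 front-end: {vega64:.1f} Gtris/s")
print(f"Scaling          : {vega64 / rx580:.2f}x, vs ~2x in most other metrics")
```

So while compute, bandwidth and fillrate all roughly doubled, geometry throughput barely moved, which lines up with the idea that the front-end is what keeps the 1.58x from getting closer to 2x.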
 
The major thing not scaling is probably that they can both only cull and rasterise 4 triangles/clock; realistically only rasterising 2 triangles if half are culled from backface. All the CUs/bandwidth/ROPs in the world won't help if they don't have any threads to shade.

For GCN, iirc, some devs did cull and rasterise in software/compute shaders, after all. On consoles at least.
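For anyone unfamiliar with the technique being referenced: the rough idea is to run a compute pass that throws away backfacing (and offscreen) triangles and writes a compacted index buffer, so the fixed-function rasteriser only ever sees surviving geometry. A toy illustration in Python (obviously not a real compute shader, and the data is made up):

```python
# Toy backface culling, mimicking what a GPU-driven culling compute pass does:
# reject triangles facing away from the camera before the rasteriser sees them.

import numpy as np

def backface_cull(tri_positions: np.ndarray, view_dir: np.ndarray) -> np.ndarray:
    """tri_positions: (N, 3, 3) triangle vertices; returns indices of survivors."""
    e1 = tri_positions[:, 1] - tri_positions[:, 0]
    e2 = tri_positions[:, 2] - tri_positions[:, 0]
    normals = np.cross(e1, e2)
    facing = normals @ view_dir < 0.0  # normal points towards the camera
    return np.nonzero(facing)[0]

rng = np.random.default_rng(0)
tris = rng.standard_normal((100_000, 3, 3))          # random dummy geometry
visible = backface_cull(tris, np.array([0.0, 0.0, 1.0]))
print(f"{len(visible)} of {len(tris)} triangles survive backface culling")

# In the real console approach this runs as a compute shader whose compacted
# index buffer is then consumed via indirect draws.
```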
 
Which is one obvious way of both getting around its frontend weaknesses and using that compute power we're talking about.
And how is that any different from Ampere, which requires disproportionate amounts of a specific type of load to flex its muscle? That's not games; it's not "acting like a 30TF GPU" in games, which was the whole point of the debacle.
 