Speculation: GPU Performance Comparisons of 2020 *Spawn*

Discussion in 'Architecture and Products' started by eastmen, Jul 20, 2020.

Thread Status:
Not open for further replies.
  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined: Apr 16, 2003
    Messages: 9,595
    Likes Received: 3,711
    Location: Finland
    Microsoft demoed a DirectML-based super-resolution solution back in 2018 already, so clearly they've been working on theirs for quite some time.
     
  2. DegustatoR

    Veteran

    Joined: Mar 12, 2002
    Messages: 1,947
    Likes Received: 1,072
    Location: msk.ru/spb.ru
    They demoed an NV solution running through DML on NV h/w. Not sure this qualifies as MS working on their own solution for quite some time.
     
  3. Jay

    Jay
    Veteran Regular

    Joined: Aug 3, 2013
    Messages: 3,346
    Likes Received: 2,629
    Although I agree with the sentiment that MS has probably been working on it, the demo was based on Nvidia models, and it's the models that are the core of any ML solution.
     
  4. troyan

    Regular Newcomer

    Joined: Sep 1, 2015
    Messages: 297
    Likes Received: 591
    And it is not the same as DLSS. The Shield TV supports AI upscaling, and with the latest update it handles up to 1080p/60 fps -> 4K/60 fps.
     
  5. JoeJ

    Veteran Newcomer

    Joined: Apr 1, 2018
    Messages: 1,053
    Likes Received: 1,239
    I'm curious about your optimism.
    In the past, 1 AMD TF was always worth more to me than 1 NV TF, but the difference became pretty small over the years. Now I'm no longer up to date on concurrent float/int ALUs on NV.
    I wouldn't be surprised if AMD misses the goal of competing at the high end. But we'll see, and I'm not sure how justified this new 'high end' is at all for the masses.
     
    PSman1700 likes this.
  6. Bondrewd

    Veteran Newcomer

    Joined: Sep 16, 2017
    Messages: 1,042
    Likes Received: 441
    Their marketing has joined the shitlord championship too.
    We're back to good old uncharted waters now; prepare your funny marketed-FLOPS vs. actual-gaming-performance charts, because we'll all need them.
    They're here to win.
    High-end thingies are usually made for PR purposes.
    Being the best in a set of metrics is very nice, and it gives every other product in your lineup the prestigious halo of dominance.
     
    JoeJ likes this.
  7. chris1515

    Legend Regular

    Joined: Jul 24, 2005
    Messages: 5,968
    Likes Received: 6,084
    Location: Barcelona Spain
    Because in real game performance, the 30 TFLOPS 3080 is not three times more powerful than the 10.07 TFLOPS 2080, but only up to two times more performant. Maybe that is not important for your work, but this is the message from Nvidia.

    At least this is the current state, before driver improvements.
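    A quick back-of-envelope check of where those headline numbers come from (a minimal sketch in Python, assuming the reference boost clocks and shader counts; one FMA counts as 2 FLOPs):

        # Theoretical FP32 TFLOPS = ALUs x clock (GHz) x 2 FLOPs per FMA.
        # Assumed reference specs: RTX 3080 = 8704 ALUs @ 1.71 GHz,
        # RTX 2080 = 2944 ALUs @ 1.71 GHz.
        def tflops(alus, clock_ghz):
            return alus * clock_ghz * 2 / 1000

        print(f"RTX 3080: {tflops(8704, 1.71):.2f} TFLOPS")   # ~29.77
        print(f"RTX 2080: {tflops(2944, 1.71):.2f} TFLOPS")   # ~10.07
        print(f"ratio:    {tflops(8704, 1.71) / tflops(2944, 1.71):.2f}x")  # ~2.96x on paper, ~2x in games

    The gap between the ~3x paper ratio and the ~2x measured one is exactly the point being argued here.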
     
  8. JoeJ

    Veteran Newcomer

    Joined: Apr 1, 2018
    Messages: 1,053
    Likes Received: 1,239
    That's a problem of software, not hardware. Taking prev-gen games and looking at 4K 180 fps vs. 90 fps doesn't tell us much about real performance, because obviously the GPU is just bored in both cases.
    (For me, being mainly interested in compute perf, games are no benchmark anyway.)
     
    DavidGraham, PSman1700 and Jawed like this.
  9. Rootax

    Veteran Newcomer

    Joined: Jan 2, 2006
    Messages: 1,906
    Likes Received: 1,345
    Location: France
    And TF is not the only thing impacting performance...
     
    DavidGraham and PSman1700 like this.
  10. DegustatoR

    Veteran

    Joined: Mar 12, 2002
    Messages: 1,947
    Likes Received: 1,072
    Location: msk.ru/spb.ru
    That's because Ampere isn't running INTs in parallel when it runs FP32 at full speed, and those INTs gave Turing a performance boost of some 30% on average. So you get less than twice the performance, but you do get double the FP32 - which can be important when FP32 is what you're actually looking for.
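    A toy issue-slot model of that point (a sketch, assuming NVIDIA's oft-quoted figure of roughly 30 INT32 instructions per 100 FP32 ones in game shaders):

        # Turing: dedicated INT32 pipe, so INT work rides along for free.
        # Ampere: two datapaths, but one must absorb the INT work itself.
        fp_ops, int_ops = 100.0, 30.0

        turing_cycles = max(fp_ops, int_ops)     # 100: INT hides behind FP32
        ampere_cycles = (fp_ops + int_ops) / 2   # 65: 130 ops shared by 2 pipes

        print(f"Ampere speedup: {turing_cycles / ampere_cycles:.2f}x")  # ~1.54x per SM, not 2x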
     
  11. Kaotik

    Kaotik Drunk Member
    Legend

    Joined: Apr 16, 2003
    Messages: 9,595
    Likes Received: 3,711
    Location: Finland
    NVIDIA's "TFLOPS" as any sort of measurement unit compared to current HW (including XSX) went out the window with Ampere. Either they can't feed the FP32 units or there are some other major bottlenecks, since the 3080 is getting some 60-90% performance increases with nearly triple the theoretical TFLOPS of the 2080S.
     
  12. troyan

    Regular Newcomer

    Joined: Sep 1, 2015
    Messages: 297
    Likes Received: 591
    No, the number is correct. Except for Volta/Turing, no other GPU runs FP32 and INT32 concurrently. So the TFLOPS numbers from other GPUs aren't "real" either.
     
    PSman1700, xpea, trinibwoy and 2 others like this.
  13. Jawed

    Legend

    Joined: Oct 2, 2004
    Messages: 11,266
    Likes Received: 1,522
    Location: London
    Graphics Core Next is tuned to extract maximum utilisation out of the float ALUs and for the scalar ALUs to rarely be a bottleneck - all without any compiler help. Has anyone measured how successfully these goals are achieved?
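    For intuition, this is roughly what that division of labour looks like (a toy Python model, not GCN ISA; the 64-lane wave size is GCN's, everything else is illustrative):

        WAVE = 64  # GCN wavefront width

        def shade_wave(base_addr, lane_ids):
            # Uniform across the wave: computed once, on the scalar ALU.
            descriptor = base_addr + 0x100           # 1 scalar op serves all 64 lanes
            # Per-lane: one vector instruction executes all lanes in lockstep.
            results = [i * i + descriptor for i in lane_ids]  # 2 vector ops (mul, add)
            return results, 1, 2                     # (results, scalar ops, vector ops)

        _, s_ops, v_ops = shade_wave(0x8000, range(WAVE))
        print(f"scalar: {s_ops} op, vector: {v_ops} ops x {WAVE} lanes")

    With uniform work folded onto the scalar unit like this, it would take a fairly unusual shader for that unit to become the bottleneck, which is what any measurement would need to provoke.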
     
  14. DegustatoR

    Veteran

    Joined: Mar 12, 2002
    Messages: 1,947
    Likes Received: 1,072
    Location: msk.ru/spb.ru
    The 2080S is around 12 TFLOPS at actual boost clocks. Let's say NV's own estimate, that the INT SIMD handles about 30% of the math in typical gaming code, is correct - that would mean you need around 15.5 TFLOPS of pure FP32, without INT running in parallel, to reach the same performance on similar hardware. That's about half of the 3080's FLOPS, so +60-90% of performance from those seems very reasonable without the INTs running in parallel all the time.

    Your typical current-gen game isn't limited only by math either; it remains to be seen how many CPU, b/w and other limitations a 30 TFLOPS GPU hits in this gen's games. A sign of it being limited by something other than FP32 is already out there, in the form of TSE (Time Spy Extreme) results being a lot better than TS (Time Spy) ones.
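    Spelling out that arithmetic (same assumptions as the post: ~12 TFLOPS actual 2080S throughput, ~30% uplift from the parallel INT pipe, ~30 TFLOPS for the 3080):

        tflops_2080s = 12.0
        int_uplift   = 1.3
        fp32_equiv   = tflops_2080s * int_uplift   # ~15.6 TFLOPS of "pure FP32"

        tflops_3080  = 30.0
        print(f"2080S as pure FP32: {fp32_equiv:.1f} TFLOPS")
        print(f"3080 over that:     {tflops_3080 / fp32_equiv:.2f}x")  # ~1.92x, so +60-90% in games fits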
     
    PSman1700 and pharma like this.
  15. chris1515

    Legend Regular

    Joined: Jul 24, 2005
    Messages: 5,968
    Likes Received: 6,084
    Location: Barcelona Spain
    This is why I said I know that for your case compute is more important ;)* but for gaming performance that is not the case, at least for the moment. And like other people said, there are other bottleneck points than the TFLOPS.

    *Maybe they need to do a CDNA gaming version for your needs.
     
    pharma likes this.
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined: Mar 17, 2004
    Messages: 11,147
    Likes Received: 1,647
    Location: New York
    This simple point seems to be escaping many people. It does mean that the 5700 XT is quite efficient, as it manages to stay in range of the 2070 Super without the benefit of the extra INT pipeline.
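    The same lens, applied as a rough Python check (assuming reference boost clocks: 5700 XT = 2560 ALUs @ 1.905 GHz, 2070 Super = 2560 ALUs @ 1.77 GHz, and the ~30% Turing INT uplift discussed above):

        xt_tflops   = 2560 * 1.905 * 2 / 1000   # ~9.75 TFLOPS
        super_fp32  = 2560 * 1.770 * 2 / 1000   # ~9.06 TFLOPS
        super_equiv = super_fp32 * 1.3          # ~11.8 TFLOPS "pure FP32 equivalent"

        print(f"5700 XT {xt_tflops:.2f} TFLOPS vs 2070S effective ~{super_equiv:.1f} TFLOPS")

    Staying within range of an effective ~11.8 TFLOPS with ~9.75 is what makes the 5700 XT look efficient here.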
     
    DegustatoR likes this.
  17. JoeJ

    Veteran Newcomer

    Joined: Apr 1, 2018
    Messages: 1,053
    Likes Received: 1,239
    Hehe, yeah! Arcturus would be the gaming GPU of my dreams. Bye bye restrictive ROPs and RT cores :p
    But that's just a dream and won't ever happen. Luckily Vega is fast enough for my compute needs even while kept clocked at 150 MHz with the fans off. (Still unsure if I can believe this myself - the current test scene is small, though.)
     
    PSman1700 likes this.
  18. Bondrewd

    Veteran Newcomer

    Joined: Sep 16, 2017
    Messages: 1,042
    Likes Received: 441
    Bro, it has scalar units for that purpose (a whole two, one per SIMD, on Navi that is). Since GCN1, pretty sure.

    It's more of a testament to NV's prowess that they got that much out of relatively simple SMs pre-Volta.
     
  19. Jawed

    Legend

    Joined: Oct 2, 2004
    Messages: 11,266
    Likes Received: 1,522
    Location: London
    GCN introduced the dedicated INT pipeline back in, erm, 2012 and it's still there in Navi.
     
  20. Leoneazzurro5

    Newcomer

    Joined: Aug 18, 2020
    Messages: 217
    Likes Received: 239
    Judging from the technical description of the architecture, Navi 10 has very efficient shader execution. If performance did not match the theoretical output, it may be due to something outside the workgroups/CUs; in that case, the main culprits may be rasterization/primitive culling, shader scheduling (ACEs/load-distribution circuitry), limited datapaths, texture unit capability, and ROP capability. If one or more such bottlenecks indeed existed, and they managed to solve them in this new iteration of the RDNA architecture, then they may show a big performance jump. There are a lot of "ifs", though.
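    One of those suspects can at least be sanity-checked on a napkin (a sketch, assuming reference Navi 10 specs: 64 ROPs at ~1.9 GHz and 448 GB/s of GDDR6 bandwidth, plain RGBA8 colour writes, no blending or DCC):

        rops, clock_ghz, bw_gbs = 64, 1.9, 448.0

        pixel_rate = rops * clock_ghz        # ~121.6 Gpix/s peak fill
        bytes_out  = pixel_rate * 4          # 4 bytes per RGBA8 pixel -> ~486 GB/s

        print(f"fill rate wants ~{bytes_out:.0f} GB/s vs {bw_gbs:.0f} GB/s available")

    Even plain colour writes can saturate memory before the ROPs do, so that pairing at least looks plausibly balanced; the other candidates are harder to check from spec sheets alone.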
     