Speculation: GPU Performance Comparisons of 2020 *Spawn*

Well, aside from the fact that it's the only machine-learning-based resolution upscaling solution on the market from any vendor, has been for the last two years, and no one else has even announced they're working on a competing solution at this stage (outside of a few high-level patents that may or may not lead to something).
Microsoft already demoed a DirectML-based super-resolution solution in 2018, so clearly they've been working on theirs for quite some time.
 
Microsoft already demoed a DirectML-based super-resolution solution in 2018, so clearly they've been working on theirs for quite some time.
Although I agree with the sentiment that MS has probably been working on it.
The demo was based on Nvidia's models, and it's the models that are the core of any ML solution.
 
And it is not the same as DLSS. The Shield TV supports AI upscaling, and with the latest update it goes up to 1080p/60fps -> 4K/60fps.
 
I'm curious about your optimism.
In the past, 1 AMD TF was always worth more to me than 1 NV TF, but the difference became pretty small over the years. Now I'm no longer up to date on the concurrent float/int ALUs on NV.
I wouldn't be surprised if AMD misses the goal of competing at the high end. But we'll see, and I'm not sure how justified this new 'high end' is for the masses at all.
 
I'm curious about your optimism.
Their marketing has joined the shitlord championship too.
In the past, 1 AMD TF was always worth more to me than 1 NV TF
We're back to good old uncharted waters now; prepare your funny marketed-FLOPS-versus-actual-gaming charts, because we'll all need them.
I wouldn't be surprised if AMD misses the goal of competing at the high end
They're here to win.
But we'll see, and I'm not sure how justified this new 'high end' is for the masses at all.
High-end thingies are usually made for PR purposes.
Being the best in a set of metrics is very nice and it gives every other product in your lineup the prestigious halo of dominance.
 
I'm curious about your optimism.
In the past, 1 AMD TF was always worth more to me than 1 NV TF, but the difference became pretty small over the years. Now I'm no longer up to date on the concurrent float/int ALUs on NV.
I wouldn't be surprised if AMD misses the goal of competing at the high end. But we'll see, and I'm not sure how justified this new 'high end' is for the masses at all.

Because in real game performance, the 30 TFLOPS 3080 is not three times more powerful than the 10.07 TFLOPS 2080, but only up to two times faster. That may not be important for your work, but it is Nvidia's own message.

At least that is the current state, before driver improvements.
 
Because in real game performance, the 30 TFLOPS 3080 is not three times more powerful than the 2080
That's a problem of software, not hardware. Taking prev-gen games and looking at 4K 180 fps vs. 90 fps doesn't tell us much about real performance, because obviously the GPU is just bored in both cases.
(For me, always interested in compute perf, games are no benchmark anyway.)
 
Because in real game performance, the 30 TFLOPS 3080 is not three times more powerful than the 10.07 TFLOPS 2080, but only up to two times faster. That may not be important for your work, but it is Nvidia's own message.

At least that is the current state, before driver improvements.
That's because Ampere isn't running INTs in parallel when it runs FP32 at full speed, and these INTs did give Turing some 30% or so performance boost on average. So you get less than twice the performance but you do get double the FP32 - which can be important when FP32 is what you're actually looking for.
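For what it's worth, here's a back-of-the-envelope sketch of that argument as a toy throughput model. It only assumes the 64 FP32 + 64 INT32 per-SM split on Turing versus the 64 FP32 + 64 shared FP32/INT32 split on Ampere, plus the ~30% INT instruction mix quoted above, and it ignores clocks, SM counts, bandwidth and every other limit - so treat it as illustrative, not as a benchmark:

```python
# Rough per-SM, per-clock throughput model for the point above. A sketch, not
# a simulation: idealized instruction mix, no clocks, SM counts, memory, etc.

def turing_sm_throughput(int_frac):
    """Turing SM: 64 FP32 lanes plus 64 separate INT32 lanes, usable concurrently.
    Returns total math ops/clock for a mix with `int_frac` INT instructions."""
    fp_frac = 1.0 - int_frac
    # Time per op is set by whichever pipe is the bottleneck.
    time_per_op = max(fp_frac / 64, int_frac / 64)
    return 1.0 / time_per_op

def ampere_sm_throughput(int_frac):
    """Ampere SM: 64 FP32-only lanes plus 64 lanes that run FP32 *or* INT32."""
    # INT can only use the shared 64-wide pipe; total issue is capped at 128/clk.
    time_per_op = max(1.0 / 128, int_frac / 64)
    return 1.0 / time_per_op

for mix in (0.0, 0.3):
    t, a = turing_sm_throughput(mix), ampere_sm_throughput(mix)
    print(f"INT mix {mix:.0%}: Turing ~{t:.0f} ops/clk, Ampere ~{a:.0f} ops/clk, "
          f"ratio ~{a / t:.2f}x")

# INT mix 0%:  Turing ~64, Ampere ~128, ratio ~2.00x  (double the pure FP32)
# INT mix 30%: Turing ~91, Ampere ~128, ratio ~1.40x  (well short of 2x)
```

Under those assumptions the doubling only shows up on pure FP32 work; with a 30% INT mix the per-SM gain is closer to 1.4x, which is the gap the post is describing.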
 
10 TF vs. a 20 TF GPU, hell of an optimization there.
NVIDIA's "TFLOPS" as any sort of measurement unit compared to current HW (including the XSX) went out the window with Ampere. Either they can't feed the FP32 units or there are some other major bottlenecks, since the 3080 is getting some 60-90% increases in performance with nearly triple the theoretical TFLOPS compared to the 2080S.
 
Graphics Core Next is tuned to extract maximum utilisation out of the float ALUs and for the scalar ALUs to rarely be a bottleneck - all without any compiler help. Has anyone measured how successfully these goals are achieved?
 
NVIDIAs "TFLOPS" as any sort of measurement unit compared to current HW (including XSX) went out the window with Ampere. Either they can't feed the FP32 units or there's some other major bottlenecks, since 3080 is getting some 60-90% increases in performance with near triple theoretical TFLOPS compared to 2080S
The 2080S is around 12 TFLOPS on actual boost clocks. Let's say that NV's own estimate of INT SIMD taking about 30% of the math in your typical gaming code is correct - this would mean that you'd need around 15.5 TFLOPS of pure FP32, without INT running in parallel, to reach the same performance on similar hardware. That's about half of the 3080's FLOPS, and +60-90% of performance from those seems very reasonable without the INTs running in parallel all the time. Your typical current-gen game isn't limited only by math either; it remains to be seen how many CPU, bandwidth and other limitations a 30 TFLOPS GPU will hit in this gen's games - a sign of it being limited by something other than FP32 is already out there, in the form of Time Spy Extreme results being a lot better than Time Spy ones.
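To make that arithmetic explicit, here is a tiny sketch with the post's assumptions spelled out: the ~12 TFLOPS boost-clock figure for the 2080S, the ~30% INT share, and the 3080's ~29.8 TFLOPS paper spec. None of it is measured data; it just reproduces the reasoning above:

```python
# Sketch of the arithmetic in the post above. Purely illustrative, not a
# benchmark model - all inputs are the post's own assumptions or paper specs.

tflops_2080s_fp32 = 12.0   # assumed real-world boost-clock figure for the 2080S
int_share = 0.30           # NV's rough quoted INT share of gaming math
tflops_3080_peak = 29.8    # paper spec (8704 FP32 lanes x 2 x ~1.71 GHz)

# Pure-FP32 TFLOPS a design would need to match the 2080S if the concurrent
# INT work had to run on the same FP32 pipe instead of a separate one.
equiv_pure_fp32 = tflops_2080s_fp32 * (1.0 + int_share)
print(f"2080S 'pure FP32 equivalent': ~{equiv_pure_fp32:.1f} TFLOPS")   # ~15.6

# Upper bound on the 3080's gain over the 2080S under this model,
# before any CPU, bandwidth or front-end limits enter the picture.
print(f"Ceiling vs 2080S: ~{tflops_3080_peak / equiv_pure_fp32:.2f}x")  # ~1.91x
```

The observed +60-90% sits under that ~1.9x ceiling, which is exactly the point being made.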
 
That's a problem of software, not hardware. Taking prev-gen games and looking at 4K 180 fps vs. 90 fps doesn't tell us much about real performance, because obviously the GPU is just bored in both cases.
(For me, always interested in compute perf, games are no benchmark anyway.)

This is why I said I know that for your case compute is more important ;)* - but for gaming performance it is not the case, at least for the moment. And like other people said, there are other bottleneck points than the TFLOPS.

*Maybe they need to do a CDNA gaming version for your needs
 
No, the number is correct. Apart from Volta/Turing, no other GPU runs FP32 and INT32 concurrently. So the TFLOPS numbers from other GPUs aren't "real" either.

This simple point seems to be escaping many people. It does mean that the 5700 XT is quite efficient, as it manages to stay in range of the 2070 Super without the benefit of the extra INT pipeline.
 
*Maybe they need to do a CDNA gaming version for your needs
hehe, yeah! Arcturus would be the gaming GPU of my dreams. Bye bye restrictive ROPs and RT cores :p
But that's just a dream and won't ever happen. Luckily, Vega is fast enough for my compute needs even while kept clocked at 150 MHz with the fans off. (Still unsure if I can believe this myself - the current test scene is small, though.)
 
without the benefit of the extra INT pipeline.
Bro, it has scalar units for that purpose (a whole two of them, one per SIMD - for Navi, that is).
Since GCN1, pretty sure.

It's more of a testament to NV's prowess at getting that much out of relatively simple SMs pre-Volta.
 
This simple point seems to be escaping many people. It does mean that the 5700 XT is quite efficient, as it manages to stay in range of the 2070 Super without the benefit of the extra INT pipeline.
GCN introduced the dedicated INT pipeline back in, erm, 2012 and it's still there in Navi.
 
Judging from the technical description of the architecture, Navi 10 has very efficient shader execution. If performance did not match the theoretical output, it may be due to something outside the workgroups/CUs; in that case the main culprits may be rasterization/primitive culling, shader scheduling (ACEs/load-distribution circuitry), limited datapaths, texture unit capability and ROP capability. If indeed there were one or more such bottlenecks and they managed to solve them in this new iteration of the RDNA architecture, then they may show a big performance jump. There are a lot of "ifs", though.
 