AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

But isn't TFLOPS a function of the number of cores? Cores x clock speed = TFLOPS.

You also have to factor in the width of those cores. An Ampere SM has twice the FP ALUs of an RDNA CU. TFLOPS = cores x clock x FP ALUs per core x 2 (an FMA counts as two operations).
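
To put rough numbers on that formula, here is a minimal Python sketch. The function name is made up for illustration, and the SM/CU counts, ALU widths and boost clocks are approximate public figures that I'm assuming rather than anything measured in this thread. Peak FP32 is counted the usual way, with one FMA as two operations.

```python
# Peak FP32 throughput: cores x FP32 ALUs per core x 2 FLOPs per FMA x clock.
# The specs below are approximate public boost-clock figures, used as assumptions.

def peak_tflops(cores: int, fp32_alus_per_core: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 TFLOPS, counting one FMA as two operations."""
    flops_per_cycle = cores * fp32_alus_per_core * 2
    return flops_per_cycle * clock_ghz / 1000.0  # GFLOPS -> TFLOPS

# "cores" here means SMs (Nvidia) or CUs (AMD).
rtx_3080 = peak_tflops(cores=68, fp32_alus_per_core=128, clock_ghz=1.71)
rx_6800_xt = peak_tflops(cores=72, fp32_alus_per_core=64, clock_ghz=2.25)

print(f"RTX 3080:   {rtx_3080:.1f} TFLOPS")
print(f"RX 6800 XT: {rx_6800_xt:.1f} TFLOPS")
```

With similar SM/CU counts the paper TFLOPS end up far apart, which is the whole reason the "width" term matters when comparing the two.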

Based on the numbers shared so far, the core count (CU/SM) correlates more closely with actual gaming performance than peak TFLOPS does. Of course, compute workloads are a whole other story.

6900 XT (80 CUs) vs 3090 (82 SMs)
6800 XT (72 CUs) vs 3080 (68 SMs)
6800 (60 CUs) vs 3070 (46 SMs)
 
From the white paper, it's not a GPU:
"Unlike the graphics-oriented AMD RDNA™ family, the AMD CDNA family removes all of the fixed-function hardware that is designed to accelerate graphics tasks such as rasterization, tessellation, graphics caches, blending, and even the display engine."
 
This generation it seems the number of “cores” (SM/CU) will be a much better proxy for gaming performance than raw TFLOPs.
I'm not convinced. I think Ampere's TFLOPS will swing into action more and more. As the percentage of frametime spent running rasterisation falls, there should be a big bias in favour of Ampere.

It will be hard to disentangle from ray tracing, though. Perhaps in a year's time we'll see that "ray tracing off" is simply not benchmarked for new games.

I've been trying to work out why AMD has been so coy about its ray tracing performance. The "futuristic city" reflections-galore demo would appear to be a "worst case" for ray tracing performance. With a fair amount of dynamic geometry and an outdoor city, both of which should make reflections more difficult (the BVH has to be rebuilt), it seems like a tough scenario. Something doesn't add up there.
 
I'm not convinced. I think Ampere's TFLOPS will swing into action more and more. As the percentage of frametime spent running rasterisation falls, there should be a big bias in favour of Ampere.

I don't know why that would be the case. Are there upcoming algorithms that are FLOPS-heavy and cache-friendly? I imagine the memory hierarchy would be as important as raw ALU throughput, or even more so.

It will be hard to disentangle from ray tracing, though. Perhaps in a year's time we'll see that "ray tracing off" is simply not benchmarked for new games.

I've been trying to work out why AMD has been so coy about its ray tracing performance. The "futuristic city" reflections-galore demo would appear to be a "worst case" for ray tracing performance. With a fair amount of dynamic geometry and an outdoor city, both of which should make reflections more difficult (the BVH has to be rebuilt), it seems like a tough scenario. Something doesn't add up there.

Well mirror reflections aren't really worst case because the rays tend to be relatively coherent. Diffuse bounce lighting for GI is a much tougher workload.
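
A toy Python sketch of the coherence point, under loud assumptions: a flat mirror floor, an 8x8 tile of nearly parallel camera rays, and a made-up "coherence" metric (the length of the mean direction). None of this models real GPU traversal cost; it only shows how much more scattered diffuse bounce directions are than mirror reflections from neighbouring pixels.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def reflect(d, n):
    """Mirror reflection of direction d about unit normal n."""
    k = 2.0 * sum(a * b for a, b in zip(d, n))
    return tuple(a - k * b for a, b in zip(d, n))

def cosine_sample_hemisphere(rng):
    """Cosine-weighted direction around the +Z normal (a diffuse bounce)."""
    u1, u2 = rng.random(), rng.random()
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1))

def coherence(dirs):
    """Length of the mean direction: 1.0 = all parallel, lower = more scattered."""
    mean = [sum(d[i] for d in dirs) / len(dirs) for i in range(3)]
    return math.sqrt(sum(c * c for c in mean))

rng = random.Random(0)      # fixed seed so the run is repeatable
normal = (0.0, 0.0, 1.0)    # flat floor facing +Z

# An 8x8 tile of neighbouring camera rays, all heading forward and down.
view_rays = [normalize((0.01 * x, 1.0 + 0.01 * y, -1.0))
             for x in range(8) for y in range(8)]

mirror_dirs = [reflect(d, normal) for d in view_rays]
diffuse_dirs = [cosine_sample_hemisphere(rng) for _ in view_rays]

mirror_coherence = coherence(mirror_dirs)
diffuse_coherence = coherence(diffuse_dirs)
print(f"mirror:  {mirror_coherence:.3f}")
print(f"diffuse: {diffuse_coherence:.3f}")
```

The mirror rays stay almost parallel, so neighbouring rays tend to walk the same BVH nodes, while the diffuse rays fan out over the whole hemisphere.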
 
I'm not convinced. I think Ampere's TFLOPS will swing into action more and more. As the percentage of frametime spent running rasterisation falls, there should be a big bias in favour of Ampere.

It will be hard to disentangle from ray tracing, though. Perhaps in a year's time we'll see that "ray tracing off" is simply not benchmarked for new games.

I've been trying to work out why AMD has been so coy about its ray tracing performance. The "futuristic city" reflections-galore demo would appear to be a "worst case" for ray tracing performance. With a fair amount of dynamic geometry and an outdoor city, both of which should make reflections more difficult (the BVH has to be rebuilt), it seems like a tough scenario. Something doesn't add up there.


Maybe the drivers aren't 100% ready yet?

Or, yeah, the performance in real games is not where they want it to be yet... We'll have the answer soon...
 
Maybe the drivers aren't 100% ready yet?

Or, yeah, the performance in real games is not where they want it to be yet... We'll have the answer soon...

You market what you are good at, or better at than the competition. I don't know why that is so surprising.
 
You market what you are good at, or better at than the competition. I don't know why that is so surprising.

Well yeah, but having zero numbers or slides about the main "progress" in the gaming GPU world doesn't smell like "we're a little under" but "yeah, we suck". I don't know, it's a very bad look IMO, versus just showing your cards. But what is done is done, and the tests will be here soon, so no big deal now.
 
Well yeah, but having zero numbers or slides about the main "progress" in the gaming GPU world doesn't smell like "we're a little under" but "yeah, we suck". I don't know, it's a very bad look IMO, versus just showing your cards. But what is done is done, and the tests will be here soon, so no big deal now.

I mean, we have to define what "suck" is. If you are expecting anything more than matching the 3070 in RT, you will likely be disappointed.
 

Interesting, the 3090 is ~18% faster than the 3080 in Godfall. It's one of the few titles where actual performance is close to theoretical.
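
For reference, a quick Python sketch of what "theoretical" means here, using only the SM counts quoted earlier in the thread (82 vs 68) and assuming equal clocks, which is a simplification. The helper name is just for illustration.

```python
# SM counts quoted earlier in the thread; clocks are assumed equal for simplicity.
SM_3090 = 82
SM_3080 = 68

def percent_faster(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0

theoretical = percent_faster(SM_3090, SM_3080)
observed = 18.0  # the ~18% Godfall figure from the post above
efficiency = observed / theoretical

print(f"theoretical uplift from SM count: {theoretical:.1f}%")
print(f"share of that uplift realised:    {efficiency:.0%}")
```

On SM count alone the ceiling is about 20.6%, so an ~18% gap means Godfall is realising most of the extra hardware.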

Also, there was a pretty transparent hint that RDNA2 is faster than Ampere at 1080p in Watch Dogs Legion. That could indicate an advantage for AMD at lower resolutions and we can speculate why that may be the case. Could be just faster geometry throughput or Infinity Cache is more helpful at lower res.
 
I don't know why that would be the case. Are there upcoming algorithms that are FLOPS-heavy and cache-friendly? I imagine the memory hierarchy would be as important as raw ALU throughput, or even more so.
I suppose we should watch where the RTX 3070 is streaking ahead of the 2080 Ti:


12% at 4K in Godfall.

Well mirror reflections aren't really worst case because the rays tend to be relatively coherent. Diffuse bounce lighting for GI is a much tougher workload.
Diffuse GI requires very low ray density though.

To be honest, I haven't found any material that really goes into the relative costs of real time ray traced techniques.
 
To be honest, I haven't found any material that really goes into the relative costs of real time ray traced techniques.
When it was new, this slide from an Nvidia presentation gave me a broad overview. It doesn't go into detail, but maybe it can help others as well.
 

Attachment: RT techniques.PNG
This generation it seems the number of “cores” (SM/CU) will be a much better proxy for gaming performance than raw TFLOPs.

Well, for me personally, I've never been a fan of using such high-level specs (which TFLOPs is, as it's just clock speed x "cores") to compare "general performance" across uarchs, especially the more divergent they are, which is typically very much the case when looking cross-vendor (and this applies to things other than GPUs, e.g. CPUs). We wouldn't even think about distilling CPUs down to just clock speed x "cores" nowadays, so I've never thought TFLOPs (or any "OPs" for that matter) was important in that context.

To me it's more useful from an academic standpoint and as a term that makes it easier to give context to certain discussions. For instance, if we wanted to compare the 5700 and 5700 XT, using TFLOPs would be easier to digest than having to do the CU x clock speed math. As for gaming Ampere (since I think there's a lot of discussion around it with respect to this topic), its TFLOPs rating is what it is: it has high potential peak FP32 throughput relative to its other resources due to the uarch, which will show in some cases and not in others.
 
Interesting, the 3090 is ~18% faster than the 3080 in Godfall. It's one of the few titles where actual performance is close to theoretical.

Also, there was a pretty transparent hint that RDNA2 is faster than Ampere at 1080p in Watch Dogs Legion. That could indicate an advantage for AMD at lower resolutions and we can speculate why that may be the case. Could be just faster geometry throughput or Infinity Cache is more helpful at lower res.

Surprisingly, the gap shrinks to 15% at 4K. You'd expect it to have increased. Also, these numbers are with DXR off.

Yes, that seemed like what he was hinting at, and he also mentioned they are using the 3950X. I wonder why they aren't using the 5950X, which I'd expect AMD to be pushing (along with SAM).
 