AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

There are barely any RT shadows at all in Dirt 5; it uses RT sparingly and selectively, yet the hit to AMD hardware is similar to NVIDIA hardware, AMD GPUs just start off from a higher fps position. This alone speaks volumes about AMD's RT capabilities.
With RT enabled, the RTX 3080 retains about 78% of its RT-off performance, the RTX 2080 Ti 75%, the RX 6800 XT 80% and the RX 6800 83%.
The differences aren't big, but both Radeons lose slightly less performance relative to their baseline; if they were as bad as some seem to suggest, that shouldn't happen. Yes, Ampere is stronger in RT, but the difference can't be worlds apart as suggested in this thread.
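For what it's worth, a quick sketch of how those retention figures translate into frame-time cost; only the percentages come from the post above, the RT-off baselines are made-up numbers for illustration:

```python
# Frame-time cost of RT per card, using the retention percentages quoted above.
# The RT-off fps baselines are illustrative assumptions, not benchmark results.
cards = {
    "RTX 3080":    {"fps_off": 100.0, "retained": 0.78},
    "RTX 2080 Ti": {"fps_off": 80.0,  "retained": 0.75},
    "RX 6800 XT":  {"fps_off": 110.0, "retained": 0.80},
    "RX 6800":     {"fps_off": 95.0,  "retained": 0.83},
}
for name, c in cards.items():
    off_ms = 1000 / c["fps_off"]
    on_ms  = 1000 / (c["fps_off"] * c["retained"])
    print(f"{name:>11}: +{on_ms - off_ms:.1f} ms per frame with RT on "
          f"({1 - c['retained']:.0%} fps loss)")
```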
 
News from hothardware:


The answer about the rasterizer leaves me with more question marks. This is the answer from HotHardware, from AMD's Scott Herkelman:

If this is in reference to Render Backends (RB+), there are 8. If this is referring to Shader Engine, there are 4.
 
But didn't AMD beat NVIDIA this time in tiled deferred pixel shading, when they are faster than NVIDIA pretty much everywhere in normal rasterization games?
4K performance is not looking great. Being great at 1440p when your competitor is great at 4K doesn't make a halo product.

Whether it's because of a lack of FLOPS or because of a low cache hit rate, is that because the GPU is shading too many pixels that aren't seen?

Also, tiled deferred pixel shading is tied to culling, which is tied to how the GPU handles small and big polygons and whether they come as strips or lists. So many things have an influence.
I don't know how we'll ever find out, when it comes to games.

The best example is tessellation. Everybody asked why AMD ran badly in Crysis. They looked at synthetic tessellation benchmarks and saw that AMD was only weak at tessellation. Then they checked Crysis again and bingo: Crysis was totally over-tessellated. How will you find that out if you don't have any hints about where AMD's weakness is?

Of course synthetics don't show you all the information, but they give you really strong hints about what could be going wrong in games.
10 years ago those tessellation comparisons were interesting. Unigine Heaven was interesting too.

AMD defaults to "AMD optimised tessellation" these days, as I understand it. It could be having an effect on performance and IQ in current games, but there's no tech journalism these days that goes that deep as far as I can tell.

Another point of comparison is geometry shading. It turned out to be a dumb idea. NVidia scorched ahead in very specific synthetics - because the GS export data didn't leave the chip.

If you happened to watch Scott's interview with HotHardware, he said the goal of Infinity Cache is not just performance; it was a tradeoff between die area, performance and power.
He specifically said they would have needed a wider bus to get the same bandwidth for more performance, and the power needed by a wider bus and more memory chips means a higher TBP. He also added that the memory controllers + PHY would occupy a significant footprint on the chip.
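As a back-of-envelope illustration of that tradeoff (every number below is my own assumption, not AMD's figures): a large on-die cache lets a narrower GDDR6 bus behave like a much wider one, as long as the hit rate holds up.

```python
# Simplified model: if a fraction `hit_rate` of requests never leaves the die,
# DRAM only has to serve the misses. All numbers are illustrative assumptions.
gddr6_bus_width_bits = 256        # assumed Navi 21-style bus
gddr6_gbps_per_pin   = 16         # assumed GDDR6 speed
dram_bw = gddr6_bus_width_bits * gddr6_gbps_per_pin / 8   # GB/s from DRAM

hit_rate = 0.58                   # assumed cache hit rate at 4K

effective_bw = dram_bw / (1.0 - hit_rate)
print(f"DRAM bandwidth:      {dram_bw:.0f} GB/s")
print(f"Effective bandwidth: {effective_bw:.0f} GB/s at {hit_rate:.0%} hit rate")
```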
I did watch and wasn't edified.

In the end AMD beat NVidia by less than 5% in performance per watt (though there's more memory on 6800XT) and for the time being, 4K performance is not very good. 6900XT is not really going to make that better, either, since 20% more FLOPS (and other substantial advantages) in 6800XT versus 6800 brings about 11% more performance on average (varying from 4 to 19% based on Techspot).
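Plugging the quoted Techspot numbers into a quick scaling check (the 6900 XT extrapolation at the end is my assumption, not a measurement):

```python
# Scaling check using the numbers quoted above.
flops_uplift = 0.20   # 6800 XT vs 6800, per the post
perf_uplift  = 0.11   # measured average uplift, per the post

scaling_efficiency = perf_uplift / flops_uplift
print(f"Scaling efficiency: {scaling_efficiency:.0%}")       # ~55%

# If the 6900 XT's extra throughput over the 6800 XT scales similarly (assumption),
# its average gain would be correspondingly muted:
extra_flops_6900xt = 0.11                                    # assumed ~11% more FLOPS
print(f"Expected 6900 XT gain: {extra_flops_6900xt * scaling_efficiency:.0%}")
```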

A downclocked N22/N23 in mobile form would be very efficient, looking at the chart below.
[attached chart: performance-per-watt comparison]
That chart is almost as scummy as the NVidia equivalent, both making 2x performance per watt claims by cherry picking places on the curves that don't relate to the best performing cards being compared.

And according to a banned member, Navi 2x is getting a lot of interest from laptop OEMs for its efficiency, which is what Scott also mentioned.
I'm glad to see that laptops are a place where AMD can compete again, but they've plunked themselves back at Fury X versus 980 Ti in halo performance terms...
 
I've shown earlier today how mesh shading requires substantially different optimisation on Nvidia versus XSX.

Really interesting video, thanks for sharing. Sharing within a workgroup is standard fare for compute APIs, so it's surprising that Turing prefers a mesh shader workgroup size of just a single 32-wide wavefront. Is there some special sauce in RDNA that enables more efficient cross-group data sharing beyond the usual LDS stuff?

The presenter emphasized that it’s hard to write mesh shaders that beat dedicated geometry setup and culling hardware units at their own game. But mesh shaders came very close. Maybe those hardware units go away next generation and all the old geometry pipeline stuff will be emulated on general compute shaders.
 
They seem to forget that this is AMD's first-generation RT hardware, while NV is over two years ahead on their second-gen RT hardware.

Nvidia "second gen" RT have the same performance than the first one, less than 5% diff at best cases.

In today's world I think it's better to have better rasterization than RT, since the number of games with RT is very limited and games with useful RT are like unicorns. And all of the "but future proof!" talk is nonsense to me. Nvidia is barely making it in terms of performance and AMD is barely, barely making it, so I don't think today's GPUs will be able to handle full RT in next-gen games.

With that said, AMD has the advantage that games' RT implementations are being thought out with AMD's capabilities and limitations in mind, with (maybe) extra effects added for the PC ports. So I think AMD's RT will age better than Nvidia's.


I really have no idea why we have RT GPUs... *at least* we are one gen away from truly usable performance, and the space on the die could be used for faster and better GPUs capable of more "traditional" effects that would be more useful than slightly better shadows that eat 20-30-50% of the frame rate, just so you can "admire" the difference in a static pic in a side-by-side comparison. Ridiculous...
 
Or there is a problem in an RBE, or in the scheduling hardware of that SE.
Yes, anything that justifies disabling a whole SE. But taking probability into account, there's more chance of defects spreading across CUs of all 4 SEs than of crippling defects landing on just 1 SE while the other 3 SEs remain intact. Wouldn't 64 CU & 56 CU configurations be more likely?
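A quick Monte-Carlo sketch of that intuition, assuming a Navi 21-style layout of 4 SEs with 20 CUs each and defective CUs scattered uniformly at random (both assumptions on my part):

```python
import random

# How often do all defective CUs land inside a single shader engine?
N_SE, CUS_PER_SE, TRIALS = 4, 20, 200_000

def se_hits(n_defects):
    hits = [0] * N_SE
    for cu in random.sample(range(N_SE * CUS_PER_SE), n_defects):
        hits[cu // CUS_PER_SE] += 1
    return hits

for n_defects in (2, 4, 8):
    all_in_one = sum(max(se_hits(n_defects)) == n_defects for _ in range(TRIALS))
    print(f"{n_defects} defects: P(all in one SE) ~ {all_in_one / TRIALS:.3f}")
```

Unless defects cluster strongly, they rarely all land in one SE, which supports the 64/56 CU intuition.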
 
Really interesting video, thanks for sharing. Sharing within a workgroup is standard fare for compute APIs, so it's surprising that Turing prefers a mesh shader workgroup size of just a single 32-wide wavefront. Is there some special sauce in RDNA that enables more efficient cross-group data sharing beyond the usual LDS stuff?
Frankly I was gobsmacked by this. If the video was from autumn 2018, just after Turing had arrived, I'd consider it an "early code" problem.

Ampere might be a whole lot "better". Honestly, I'm suspicious, so while it's a useful example of wild divergence between two platforms, I am doubtful that it'll have a substantial impact in games.

I'm not sure how mesh shading will come to PC games, because it looks like the fallback for older hardware isn't easy (whoops, performance sucks, never mind). I haven't spent much time on this subject. Maybe incompatible hardware is simply capped at "medium" geometry?

The presenter emphasized that it’s hard to write mesh shaders that beat dedicated geometry setup and culling hardware units at their own game. But mesh shaders came very close. Maybe those hardware units go away next generation and all the old geometry pipeline stuff will be emulated on general compute shaders.
My understanding of mesh shaders is that they are compute shaders with some connectivity in hardware to link up with the rasteriser. I don't really understand the hardware model of mesh shaders, though. It's useful to start with the mental model: "a mesh shader needs no geometry to consume", so it's an entirely compute-centric view of geometry.
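To make that "no geometry to consume" mental model concrete, here is a conceptual CPU-side sketch in Python (not real shader code; the meshlet data and the cone-culling test are illustrative assumptions): each "workgroup" picks up one meshlet, decides whether to cull it, and only then emits anything toward the rasteriser.

```python
import math

def normalize(v):
    l = math.sqrt(sum(c * c for c in v)) or 1.0
    return tuple(c / l for c in v)

def backface_cone_cull(cone_axis, cone_cutoff, view_dir):
    # A meshlet whose normal cone faces entirely away from the camera can be
    # rejected before any vertex work is done.
    return sum(a * b for a, b in zip(normalize(cone_axis), normalize(view_dir))) >= cone_cutoff

meshlets = [
    {"id": 0, "cone_axis": (0, 0, -1), "cone_cutoff": 0.5},   # faces the camera
    {"id": 1, "cone_axis": (0, 0,  1), "cone_cutoff": 0.5},   # faces away
]
view_dir = (0, 0, 1)  # camera looking down +Z

for m in meshlets:                      # one "workgroup" per meshlet
    if backface_cone_cull(m["cone_axis"], m["cone_cutoff"], view_dir):
        print(f"meshlet {m['id']}: culled, nothing emitted")
    else:
        print(f"meshlet {m['id']}: emit vertices/primitives to rasteriser")
```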

Also, we're now learning that RDNA 2 features heavy primitive shader usage ("NGG") in place of traditional old geometry pipeline stages. The driver is re-writing lots of code, it seems. It's unclear to me how much like "compute shader" that is.

Will there ever be a D3D13? To remove things like VS?... Dunno.
 
...

I really have no idea why we have RT GPUs... *at least* we are one gen away from truly usable performance, and the space on the die could be used for faster and better GPUs capable of more "traditional" effects ...

You have to start somewhere. You can't allocate time and money to RT for years without testing it "in the field" on a commercial product, imo. The first T&L implementation (in a gaming card) was not great, the first shaders weren't either, etc.
 
Nvidia "second gen" RT have the same performance than the first one, less than 5% diff at best cases.
It's quite a bit better in absolute frametimes. I think the "percentage performance loss when activated" metric is junk. This isn't MSAA, this is a whole new ballgame.
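A toy example of why the percentage metric misleads (the frame times below are made up): if two GPUs pay a similar absolute RT cost per frame, the faster one shows the bigger percentage loss even though it's paying the same price in milliseconds.

```python
# Illustrative numbers only: same absolute RT cost, different baselines.
cards = {
    "faster GPU": {"base_ms": 8.0,  "rt_cost_ms": 4.0},
    "slower GPU": {"base_ms": 12.0, "rt_cost_ms": 4.0},
}
for name, c in cards.items():
    fps_off = 1000 / c["base_ms"]
    fps_on  = 1000 / (c["base_ms"] + c["rt_cost_ms"])
    loss = 1 - fps_on / fps_off
    print(f"{name}: {fps_off:.0f} -> {fps_on:.0f} fps ({loss:.0%} 'lost')")
```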

I really have no idea why we have RT GPUs... *at least* we are one gen away from truly usable performance.
I was hopeful that Ampere would be 50-100% faster than it turned out. It seems that bandwidth killed the ray (dio) star.

and the space on the die could be used for faster and better GPUs capable of more "traditional" effects that would be more useful than slightly better shadows that eat 20-30-50% of the frame rate, just so you can "admire" the difference in a static pic in a side-by-side comparison. Ridiculous...
Apparently AMD decided to spend very little die space on ray acceleration... Do we even know how much space NVidia has spent?

If bandwidth/latency really are the killers (demonstrated by Ampere?) then we're going nowhere without substantially cleverer algorithms. We're not going to get 2TB/s bandwidth in consumer cards any time soon.

Using yet more bandwidth by exporting ray query results to memory (DXR 1.0, as I understand it) to be consumed by the next shading pass seems painfully like the failed experiment that was GS.
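For a sense of scale, a back-of-envelope estimate of the traffic from round-tripping hit records through memory (every number here is an assumption):

```python
# Rough cost of exporting ray results to memory for a later shading pass.
width, height  = 3840, 2160        # 4K
rays_per_pixel = 2                 # e.g. shadow + reflection rays (assumed)
bytes_per_hit  = 32                # assumed packed hit record (t, normal, IDs...)
fps            = 60

traffic_bytes = width * height * rays_per_pixel * bytes_per_hit * fps
traffic_gb_s  = traffic_bytes * 2 / 1e9    # written once, read back once later
print(f"~{traffic_gb_s:.0f} GB/s just shuttling hit records through memory")
```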
 
You can't allocate time and money to RT for years without testing it "in the field" on a commercial product, imo.

Yes you can, and it's exactly what they did: both decided to use their user base as beta testers while making them pay an extra premium price for that "privilege".

It's quite a bit better in absolute frametimes. I think the "percentage performance loss when activated" metric is junk. This isn't MSAA, this is a whole new ballgame.

No, it's only better because it starts higher. We are talking about the performance of the RT cores here. With Ampere, at best they are less than 5% faster.

Apparently AMD decided to spend very little die space on ray acceleration... Do we even know how much space NVidia has spent?

I prefer AMD's approach for today's games. If they can get a DLSS competitor, then their RT will be "usable enough" for at least 1440p. But that is a big if.
 
You quoted me for the "Apparently AMD decided to spend very little die space on ray acceleration... Do we even know how much space NVidia has spent?", but I never said that :D

Yes you can, and it's exactly what they did: both decided to use their user base as beta testers while making them pay an extra premium price for that "privilege".

I don't get it. If you "could", then neither AMD nor Nvidia would have RT right now. Maybe my first sentence was badly formulated...

What I said was that you can't have a perfect solution from the start when it's a new field. And if you decide to wait for... the right process node, another big leap in the theoretical field, idk, then you will never release it... Devs/games will eat what you give them; it will never be enough anyway...

Now, as a consumer, yeah, you can say "it's not fast enough yet, I won't weigh RT in the balance", and that's perfectly fine. But nVidia, AMD, Intel, Imgtech, whoever... have to start somewhere...
 
Yes, anything that justifies disabling a whole SE. But taking probability into account, there's more chance of defects spreading across CUs of all 4 SEs than of crippling defects landing on just 1 SE while the other 3 SEs remain intact. Wouldn't 64 CU & 56 CU configurations be more likely?

Not when you also take into account binning for performance.
Edit- Or in this case, binning for performance differentials.
 
Yes you can, and it's exactly what they did: both decided to use their user base as beta testers while making them pay an extra premium price for that "privilege".

No matter when they chose to bring hardware-accelerated RT to market, the fact is that there would be zero game support on day one and it would be many years before there was wide market adoption. So given those facts, what percentage of die area would you suggest be dedicated to the first RT implementation? 25%? 50%?
 
[...]
Apparently AMD decided to spend very little die space on ray acceleration... Do we even know how much space NVidia has spent?
[...]

This may not be the best place, but comparing TU102 and TU116:

TU102:
A single Dispatch + BVH traversal block is ~1.4065mm2; TU102 has 36 TPCs, so the Dispatch + BVH traversal area totals ~50.6325mm2, 6.68% of the total die size (754mm2).

TU116 doesn't have RT accelerators:
A single Dispatch block is around ~0.5873mm2; TU116 has 12 TPCs, so the Dispatch area totals ~7.0471mm2, 2.4814% of the total die size (284mm2).

Subtracting TU116's Dispatch area from TU102's Dispatch + BVH traversal area leaves ~0.8192mm2 per TPC.

Finally, for TU102 that means ~29.4912mm2 of BVH traversal hardware in total, which is 3.9% of the total die size. This is only a comparison of the BVH area; there are probably other regions that need to be beefed up for RT acceleration.

TU102 die shot with annotations
TU116 die shot with annotations
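Reproducing that arithmetic as a quick script (per-TPC areas taken from the annotated die shots above; small differences versus the post's rounded percentages are just rounding):

```python
# Areas in mm^2, as measured from the annotated TU102/TU116 die shots.
tu102 = {"dispatch_bvh_per_tpc": 1.4065, "tpcs": 36, "die": 754}
tu116 = {"dispatch_per_tpc":     0.5873, "tpcs": 12, "die": 284}

tu102_total = tu102["dispatch_bvh_per_tpc"] * tu102["tpcs"]
tu116_total = tu116["dispatch_per_tpc"] * tu116["tpcs"]
bvh_per_tpc = tu102["dispatch_bvh_per_tpc"] - tu116["dispatch_per_tpc"]
bvh_total   = bvh_per_tpc * tu102["tpcs"]

print(f"TU102 dispatch+BVH: {tu102_total:.2f} mm^2 ({tu102_total / tu102['die']:.2%} of die)")
print(f"TU116 dispatch:     {tu116_total:.2f} mm^2 ({tu116_total / tu116['die']:.2%} of die)")
print(f"Estimated BVH-only: {bvh_total:.2f} mm^2 ({bvh_total / tu102['die']:.2%} of TU102)")
```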
 
Does anybody know what Scott Herkelman means by this sentence?

If this is in reference to Render Backends (RB+), there are 8. If this is referring to Shader Engine, there are 4.

P.S. Please share and like the interview. Only through these interviews do we get contact with people inside AMD.

 
but the difference can't be worlds apart as suggested in this thread.
Nothing is merely "suggested" in this thread; game benchmarks, synthetics, 3DMark, the word of developers and actual technical details all point to exactly that fact.

Dirt 5 uses a very low amount of RT, which means the game is bound by rasterization, not by RT, so it isn't the optimal case for showing RT differences. That leads me to believe AMD will use this tactic a lot in their sponsored games: fill them with low-effort RT effects just to claim they support DXR with good enough performance.

Even their marketing appears to be aware of their modest RT solution, as they stated they target 1440p, which was the target of Turing GPUs two years ago.
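A toy frame-time model (all numbers assumed) of why a lightly ray-traced game hides differences in RT throughput: when the RT slice of the frame is small, even a GPU with twice the RT speed barely pulls ahead overall.

```python
# Illustrative frame-time split between raster work and RT work.
def fps(raster_ms, rt_ms, rt_speed):
    return 1000 / (raster_ms + rt_ms / rt_speed)

raster_ms = 10.0
for rt_ms in (1.0, 6.0):                          # light vs heavy RT per frame
    slow = fps(raster_ms, rt_ms, rt_speed=1.0)    # baseline RT throughput
    fast = fps(raster_ms, rt_ms, rt_speed=2.0)    # GPU with 2x faster RT (assumed)
    print(f"RT {rt_ms} ms/frame: 2x-RT GPU is {fast / slow - 1:.0%} faster overall")
```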
 
Nvidia "second gen" RT have the same performance than the first one, less than 5% diff at best cases.

Difficult to believe this statement. For example, in the Blender benchmark below the 3080 is 11.9 s versus 19.9 s for the 2080 Ti or 26.1 s for the 2080 Super. Pure ray tracing performance in Ampere is much improved. Minecraft or Quake 2 RTX would also show how pure ray tracing performance has advanced. I'm not 100% sure where the 6800 XT falls, but I believe it's somewhere around 38 s for the same BMW Blender benchmark.

https://www.phoronix.com/scan.php?page=article&item=blender-290-rtx3080&num=3
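Putting those render times side by side (the 6800 XT figure is the poster's estimate, not a measured result):

```python
# Relative render times from the Phoronix Blender (BMW) numbers quoted above.
times_s = {
    "RTX 3080": 11.9,
    "RTX 2080 Ti": 19.9,
    "RTX 2080 Super": 26.1,
    "RX 6800 XT (est.)": 38.0,
}
baseline = times_s["RTX 3080"]
for card, t in times_s.items():
    print(f"{card:>18}: {t:5.1f} s  ({t / baseline:.2f}x the 3080's render time)")
```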
 