AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

It makes me think that NVidia developed ray tracing acceleration for the professional rendering industry and then persuaded Microsoft that it was time to add it to D3D. Same as how tensor cores were designed for the machine learning industry.

The key problem is that brute force ray tracing, as seen in Control, produces terrible performance. Brute force appears to have hit a ceiling with Ampere (if we talk about rays per unit bandwidth).


Brute force hardware is good for "professional" ray tracing.


There's little doubt that 720p video looks more realistic than 4K gameplay :)

So apparently what we need is upscaled 320p real time path tracing for gaming :)


How do you define "brute force"? Devs have a lot of work to do to improve their RT solutions too.
 
There's little doubt that 720p video looks more realistic than 4K gameplay :)

So apparently what we need is upscaled 320p real time path tracing for gaming :)

You might be on to something there.

How do you define "brute force"? Devs have a lot of work to do to improve their RT solutions too.

Yeah obviously raytracing today in games is as far from brute force as you can get. The ray counts are ridiculously low and we’re relying on stochastic accumulation and denoising to get by.

I don’t even know what brute force hardware means. Both AMD’s and Nvidia’s RT patents spend a lot of time on techniques to avoid casting unnecessary rays. So again, not brute force.
 
How do you define "brute force"? Devs have a lot of work to do to improve their RT solutions too.
Unbiased path tracing is effectively the definition of brute force: no limit on the count of bounces and the new rays produced by bounces. It's subtler than that, but that's the basic model. It's what professional rendering solutions aim to produce (though they can make more approximations). When you have seconds or minutes to spend on a single frame, you're more likely to be using (at least some) brute force :)

For real time rendering you have to make careful choices about the counts of rays and the counts of bounces.

There are other choices to make, such as the level of detail in the geometry used for ray tracing. e.g. you might use low quality trees when trying to decide how trees are reflected.

Then you can work out which parts of the scene contribute the most to the final visual quality.

Also you can decide how far rays are allowed to travel.

Luckily, temporal and spatial "averaging" (sorts of denoising) helps substantially. At typical gaming framerates, the ray tracing for one frame can substantially help other frames. Similarly, like we see with variable rate shading which is designed to allow developers to lower the pixel shading resolution in some parts of the frame (e.g. near the edge or very dark), you can vary the quality of ray tracing across the frame.

Just relying upon denoising gives you the poor performance seen in Control when trying to do lots of ray tracing techniques simultaneously.
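
To make those tradeoffs concrete, here is a minimal, hypothetical sketch (not from any actual engine; all names and thresholds are made up) of how a renderer might pick per-region ray budgets: rays per pixel, a bounce cap, and a maximum ray distance, in the spirit of varying quality across the frame like variable rate shading.

```cpp
// Hypothetical sketch of the real-time budget choices described above: ray
// count, bounce count, maximum ray distance, varied per screen region.
#include <algorithm>
#include <cstdio>

struct RtBudget {
    int   raysPerPixel;    // stochastic samples per pixel, accumulated over frames
    int   maxBounces;      // hard cap on bounces (brute force would be effectively unbounded)
    float maxRayDistance;  // rays are terminated beyond this distance
};

// Spend fewer rays in dark or peripheral regions and let temporal/spatial
// denoising fill in the rest.
RtBudget chooseBudget(float regionLuminance, float distanceFromCenter) {
    RtBudget b{1, 2, 1000.0f};                        // baseline: 1 spp, 2 bounces
    if (regionLuminance < 0.05f) b.raysPerPixel = 0;  // nearly black: skip tracing here
    if (distanceFromCenter > 0.8f) b.maxBounces = 1;  // screen edge: cheaper rays
    b.maxRayDistance = std::min(b.maxRayDistance, 200.0f + 800.0f * regionLuminance);
    return b;
}

int main() {
    RtBudget center = chooseBudget(0.6f, 0.1f);
    RtBudget edge   = chooseBudget(0.02f, 0.9f);
    std::printf("center: %d spp, %d bounces, %.0f max distance\n",
                center.raysPerPixel, center.maxBounces, center.maxRayDistance);
    std::printf("edge:   %d spp, %d bounces, %.0f max distance\n",
                edge.raysPerPixel, edge.maxBounces, edge.maxRayDistance);
    return 0;
}
```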
 
Unbiased path tracing is effectively the definition of brute force: no limit on the count of bounces and the new rays produced by bounces. It's subtler than that, but that's the basic model. It's what professional rendering solutions aim to produce (though they can make more approximations). When you have seconds or minutes to spend on a single frame, you're more likely to be using (at least some) brute force :)

For real time rendering you have to make careful choices about the counts of rays and the counts of bounces.

There are other choices to make, such as the level of detail in the geometry used for ray tracing. e.g. you might use low quality trees when trying to decide how trees are reflected.

Then you can work out which parts of the scene contribute the most to the final visual quality.

Also you can decide how far rays are allowed to travel.

Luckily, temporal and spatial "averaging" (sorts of denoising) helps substantially. At typical gaming framerates, the ray tracing for one frame can substantially help other frames. Similarly, like we see with variable rate shading which is designed to allow developers to lower the pixel shading resolution in some parts of the frame (e.g. near the edge or very dark), you can vary the quality of ray tracing across the frame.

Just relying upon denoising gives you the poor performance seen in Control when trying to do lots of ray tracing techniques simultaneously.


But from what I get from DF videos about RT, and other media, devs are already optimising where RT is done and how it's done; they are careful about the amount of rays and bounces.
 
But from what I get from DF videos about RT, and other media, devs are already optimising where RT is done and how it's done; they are careful about the amount of rays and bounces.
I'm hopeful there are many more steps along the road of real-time ray tracing; in other words, I think devs have barely started to optimise.
 
But from what I get from DF videos about RT, and other media, devs are already optimising where RT is done and how it's done; they are careful about the amount of rays and bounces.

With the data we have now, it seems Nvidia is much better at RT. For multiplatform games, PS5 and Xbox Series X|S are very important for publishers and studios, which means requirements will be built around these consoles. Outside of ray tracing this will probably help AMD, because this time they aren't so far behind, as we see with AC Valhalla and Dirt 5. I mean, if RDNA2's rasterization performance were not good, it would not have helped much.

RT will probably be used for shadows or specular reflections, and maybe in some rare titles both effects together. GI will be done using other methods.

Some studios like 4A Games or Quantic Dream want to create rendering pipelines centered around RT, but other engines like UE5 will not use RT at all.
 
Yes, and this is strange. You get only 4 primitives after culling, but you have 8 scan converters which convert the primitives to pixels. Wouldn't this be totally unbalanced? This makes no sense.



Also, this makes no sense. If both have the same geometry processor, Navi21 could not be faster than Navi10.

Maybe @CarstenS can provide some detailed information on what he has measured, with values? Thank you in advance.
I've provided answers to your questions already. Maybe I was too vague before. It's not unbalanced to have a primitive rasterizer perform coarse rasterization and feed the output to multiple fine rasterizers to output pixels. Also, don't bother comparing to Navi10; you'll only get confused. Certainly testing Navi21 is the best way to understand the performance characteristics.
 
I've provided answers to your questions already. Maybe I was too vague before. It's not unbalanced to have a primitive rasterizer perform coarse rasterization and feed the output to multiple fine rasterizers to output pixels. Also, don't bother comparing to Navi10; you'll only get confused. Certainly testing Navi21 is the best way to understand the performance characteristics.

https://patents.google.com/patent/US10062206B2/en

Not 100% sure but I think I understood that reference.

 
Yes, and this is strange. You get only 4 primitives after culling, but you have 8 scan converters which convert the primitives to pixels. Wouldn't this be totally unbalanced? This makes no sense.
Doesn't seem unbalanced to me. It looks like each Raster Unit has 2 scan converters. Each RDNA1 scan converter rasterised a triangle with 16 fragments of coverage. Now with RDNA2, we have 2 scan converters working on 1 triangle, capable of rasterising anything from small to large triangles with coverage ranging from 1 to 32 fragments per Raster Unit.

Peak triangle throughput has increased with increased GPU clocks, and triangle size variance is handled better with higher rasterisation efficiency. So geometry throughput is closer to peak throughput.
I can't find it at the moment but there have definitely been a couple of slides that confirm it is 2 pre-culled in, 1 culled out for the Primitive units.
Found the slide - the details are under "Geometry Processor" below, where Navi21 has 4 Prim Units, sends 4 triangles to rasterise and culls 8 triangles per cycle. So each Prim Unit still culls 2 triangles and sends 1 to Raster Unit - unchanged from RDNA1.

[Slide: Geometry Processor details, showing Navi21 with 4 Prim Units, culling 8 and rasterising 4 triangles per cycle]
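
As a rough back-of-envelope illustration of what that slide implies (the clock figure below is an assumption, roughly a Navi21-class game clock, not from the slide):

```latex
\text{culled}: \; 8\ \tfrac{\text{tris}}{\text{clk}} \times 2.0\ \text{GHz} = 16\ \text{Gtris/s}
\qquad
\text{rasterised}: \; 4\ \tfrac{\text{tris}}{\text{clk}} \times 2.0\ \text{GHz} = 8\ \text{Gtris/s}
```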
 
Some studios like 4A Games or Quantic Dream want to create rendering pipelines centered around RT, but other engines like UE5 will not use RT at all.

Of course UE5 will support DXR.

You are probably referring to Lumen not using HW-RT, which is a bit disappointing as it leaves fixed-function hardware unused, thus wasting performance. I think it is possible Lumen might get replaced by RTXGI, as it basically does the same (multi-bounce, dynamic GI) as Lumen but uses hardware-accelerated RT to update light probes, which makes it very efficient.

RTXGI works on any DXR capable GPU, so it should work for Xbox and AMD RDNA2 PC GPUs as well.
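
A minimal sketch of the probe-update idea (this is not the actual RTXGI/DXR API; the trace call, ray count, and blend factor are placeholders): trace a few rays per probe each frame and blend the result into the probe's stored irradiance, so the probes converge over many frames.

```cpp
// Sketch of RTXGI-style dynamic GI probes: a few rays per probe per frame,
// blended temporally. The trace call is a stand-in for a hardware ray query
// (e.g. DXR) against the scene BVH.
#include <cstdio>

struct Vec3 { float x, y, z; };

struct Probe {
    Vec3 position;
    Vec3 irradiance{0.0f, 0.0f, 0.0f};  // accumulated incoming light
};

// Placeholder for a hardware-accelerated ray trace; returns fake radiance.
Vec3 traceRadiance(const Vec3& /*origin*/, const Vec3& /*dir*/) {
    return {0.2f, 0.2f, 0.25f};
}

void updateProbe(Probe& p, int raysPerProbe, float hysteresis) {
    Vec3 sum{0.0f, 0.0f, 0.0f};
    for (int i = 0; i < raysPerProbe; ++i) {
        Vec3 dir{0.0f, 1.0f, 0.0f};  // real code would use a spherical distribution
        Vec3 r = traceRadiance(p.position, dir);
        sum = {sum.x + r.x, sum.y + r.y, sum.z + r.z};
    }
    float inv = 1.0f / static_cast<float>(raysPerProbe);
    // Temporal blend: keep most of the previous value, mix in this frame's rays.
    p.irradiance = {hysteresis * p.irradiance.x + (1.0f - hysteresis) * sum.x * inv,
                    hysteresis * p.irradiance.y + (1.0f - hysteresis) * sum.y * inv,
                    hysteresis * p.irradiance.z + (1.0f - hysteresis) * sum.z * inv};
}

int main() {
    Probe probe{{0.0f, 1.0f, 0.0f}};
    for (int frame = 0; frame < 120; ++frame) updateProbe(probe, 64, 0.97f);
    std::printf("probe irradiance: %.3f %.3f %.3f\n",
                probe.irradiance.x, probe.irradiance.y, probe.irradiance.z);
    return 0;
}
```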
 
Of course UE5 will support DXR.

You are probably referring to Lumen not using HW-RT, which is a bit disappointing as it leaves fixed-function hardware unused, thus wasting performance. I think it is possible Lumen might get replaced by RTXGI, as it basically does the same (multi-bounce, dynamic GI) as Lumen but uses hardware-accelerated RT to update light probes, which makes it very efficient.

RTXGI works on any DXR capable GPU, so it should work for Xbox and AMD RDNA2 PC GPUs as well.

The Lumen system is very different from RTXGI, and that is not what they have in mind, at least for this console generation. Lumen is probably easier to use for lighting artists because there are no probes at all.

https://www.eurogamer.net/articles/...eal-engine-5-playstation-5-tech-demo-analysis

Unreal Engine 4 uses RT but UE5 does not; maybe in the future they will use RT for specular reflections. The engine was designed around PS5 and Xbox Series X|S.

Demon's Souls uses a froxel-based GI system built on probes.
 
I've provided answers to your questions already. Maybe I was too vague before. It's not unbalanced to have a primitive rasterizer perform coarse rasterization and feed the output to multiple fine rasterizers to output pixels. Also, don't bother comparing to Navi10 you'll only get confused. Certainly testing Navi21 is the best way to understand the performance characteristics.

Doesn't seem unbalanced to me. It looks like each Raster Unit has 2 scan converters. Each RDNA1 scan converter rasterised a triangle with 16 fragments of coverage. Now with RDNA2, we have 2 scan converters working on 1 triangle, capable of rasterising anything from small to large triangles with coverage ranging from 1 to 32 fragments per Raster Unit.

But this makes no sense. These days not many triangles are bigger than 16 pixels. You have more issues rasterizing small triangles which are smaller than a pixel.

For polygons which are smaller than a pixel you can now only rasterize 4 polygons in total. That means you have 4 scan converters with work to do while the other 4 rest. As you can see, this makes no sense for future game titles. If you think about the Nanite engine with very small polygons, the new rasterizer will have no advantage...
 
It's an intermediate step, think of it as a sorting mechanism.

If I get it right, you now have the 4 coarse rasterizers at 16 pixels each, which is fine for now, and it probably wasn't worth redesigning the coarse rasterizers. This is the hard limit for coarse-grained geometry, and that's what traditional fill rate tests measure via one full-screen quad (or at least very few, very large triangles).

If the size check concludes that you would waste rasterizing performance because the triangles are too small, it skips the coarse rasterizer and sends the data to the smaller rasterizers, which operate more efficiently at smaller polygons.

edit for spelling
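
A toy sketch of that size check (the 16-fragment threshold is taken from the coarse tile size mentioned above; everything else is hypothetical):

```cpp
// Toy model of the dispatch described above: large triangles go through the
// coarse rasterizer first, small ones go straight to the fine scan converters.
#include <cstdio>

struct Triangle { float areaInPixels; };

enum class Path { CoarseThenFine, FineOnly };

Path classify(const Triangle& t) {
    // Assumed rule: a 16-fragment coarse tile only pays off if the triangle is
    // likely to cover more than one such tile.
    return (t.areaInPixels > 16.0f) ? Path::CoarseThenFine : Path::FineOnly;
}

int main() {
    Triangle big{400.0f};
    Triangle tiny{0.5f};
    std::printf("big  -> %s\n", classify(big)  == Path::CoarseThenFine ? "coarse + fine" : "fine only");
    std::printf("tiny -> %s\n", classify(tiny) == Path::CoarseThenFine ? "coarse + fine" : "fine only");
    return 0;
}
```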
 
Yes, and this is strange. You get only 4 primitives after culling, but you have 8 scan converters which convert the primitives to pixels. Wouldn't this be totally unbalanced? This makes no sense.
Culling is likely to require a small fixed number of clock cycles, while scan conversion requires a variable amount of work and time. The optimal ratio between these 2 functions likely changes significantly throughout the frame, but on average it might very well be that a 1-to-2 ratio gives better perf than 1-to-1. For instance it could help reduce pipeline bubbles post scan conversion, thus reducing the time CUs sit idle waiting for work to do.
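
As a purely illustrative toy model of that argument (all per-triangle costs are invented), compare how long it takes 1 versus 2 scan converters to drain the output of a culler that emits at most one surviving triangle per clock:

```cpp
// Toy simulation: a culler emits one surviving triangle per clock; each triangle
// takes a variable number of clocks to scan convert. With a 1-to-1 ratio the
// scan converter becomes the bottleneck; a 1-to-2 ratio drains the work sooner.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

int clocksToDrain(const std::vector<int>& triCosts, int numScanConverters) {
    std::vector<int> busyUntil(numScanConverters, 0);
    int clock = 0;
    std::size_t next = 0;
    while (next < triCosts.size()) {
        ++clock;  // culler emits one triangle this clock
        // hand it to the scan converter that frees up earliest
        int best = 0;
        for (int i = 1; i < numScanConverters; ++i)
            if (busyUntil[i] < busyUntil[best]) best = i;
        int start = std::max(clock, busyUntil[best]);
        busyUntil[best] = start + triCosts[next++];
    }
    int finish = clock;
    for (int t : busyUntil) finish = std::max(finish, t);
    return finish;
}

int main() {
    std::vector<int> costs = {1, 3, 1, 2, 1, 4, 1, 2, 1, 3};  // made-up clocks per triangle
    std::printf("1 scan converter:  %d clocks\n", clocksToDrain(costs, 1));
    std::printf("2 scan converters: %d clocks\n", clocksToDrain(costs, 2));
    return 0;
}
```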
 
https://patents.google.com/patent/US10062206B2/en

Not 100% sure but I think I understood that reference.

That's an interesting patent, but I wasn't referencing it.

Coarse rasterization before fine rasterization is not a new concept. For example, you can quickly coarse rasterize and perform a hierarchical depth test before performing fine rasterization. The reason I referenced it is that the driver define that's causing confusion is probably referring to the fine rasterizer and may be related to RB+.

But this makes no sense. These days not many triangles are bigger than 16 pixels. You have more issues rasterizing small triangles which are smaller than a pixel.

For polygons which are smaller than a pixel you can now only rasterize 4 polygons in total. That means you have 4 scan converters with work to do while the other 4 rest. As you can see, this makes no sense for future game titles. If you think about the Nanite engine with very small polygons, the new rasterizer will have no advantage...
There are still triangles > 16 pixels. Someone would need to analyze some games to see how often they occur.

Many triangles smaller than a pixel will be thrown out as part of the faster culling process. Nanite is using compute for most small triangles and it remains to be seen if other engines will follow suit. It's likely IHVs will continue to improve rasterization performance while trying to spend as few transistors as possible.
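
For reference, here is a tiny sketch of the coarse-rasterize-plus-hierarchical-depth-test idea mentioned above (tile size, depth convention, and buffer layout are assumptions): if a triangle's nearest depth over a tile is farther than the farthest depth already stored for that tile, fine rasterization of that tile can be skipped.

```cpp
// Sketch of a hierarchical depth (Hi-Z) test during coarse rasterization.
// Depth convention: smaller values are closer (less-than depth test).
#include <cstdio>
#include <vector>

struct HiZ {
    int tilesX, tilesY;
    std::vector<float> tileMaxDepth;  // farthest depth per tile of what is already drawn
    float maxDepth(int tx, int ty) const { return tileMaxDepth[ty * tilesX + tx]; }
};

// Returns true if the triangle can be rejected for this tile without fine
// rasterization: its closest point is behind everything already in the tile.
bool coarseReject(const HiZ& hiz, int tx, int ty, float triMinDepthInTile) {
    return triMinDepthInTile > hiz.maxDepth(tx, ty);
}

int main() {
    HiZ hiz{2, 1, {0.3f, 0.9f}};  // tile 0 covered by near geometry, tile 1 mostly empty
    std::printf("tile 0: %s\n", coarseReject(hiz, 0, 0, 0.5f) ? "rejected" : "fine rasterize");
    std::printf("tile 1: %s\n", coarseReject(hiz, 1, 0, 0.5f) ? "rejected" : "fine rasterize");
    return 0;
}
```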
 
It's an intermediate step, think of it as a sorting mechanism.

If I get it right, you now have the 4 coarse rasterizer á 16 pixels, which is fine for now and probably wasn't worth it to redesign the coarse rasterizers. This is the hard limit for coarse grained geometry and that's what traditional fill rate tests measure via one full screen quad (or at least very few very large triangles).

If the size check concludes that you would waste rasterizing performance because the triangles are too small, it skips the coarse rasterizer and sends the data to the smaller rasterizers, which operate more efficiently at smaller polygons.

edit for spelling

Thank you for the answer. But how can a rasterizer be more efficient? Even on GCN, triangles smaller than a pixel can be converted in 1 clock, and now you still have only 4 rasterizers which can handle 1 polygon per clock. It is the same as GCN; only the culling engine in front is new. So how do you improve rasterization with 2 kinds of rasterizer when both do the same thing?

I only see an advantage for big polygons over 4x4 pixels in size.

Edit: Also, it is strange that each array has its own rasterizer (scan converter) if you look into the Linux driver:
num_se: 4 (you have 4 shader engines)
num_sh_per_se: 2 (you have 2 shader arrays per shader engine)
num_sc_per_sh: 1 (and each shader array has 1 scan converter of its own?)
So for scan converters you get: num_se x num_sh_per_se x num_sc_per_sh = 4 x 2 x 1 = 8

But if I follow that we have 2 rasterizers per shader engine, it should look like:
num_se: 4
num_sc_per_se: 2

@CarstenS how is the 6800 XT performing against a 3090 with its 7 GPCs when you have 0% culling (list and strip polygons)? Does AMD have an advantage here?
 
There's little doubt that 720p video looks more realistic than 4K gameplay :)

So apparently what we need is upscaled 320p real time path tracing for gaming :)

Perhaps you were joking but this is far from a stupid idea. With good post-processing AA, and perhaps clever texturing tricks, it might produce very interesting results. For instance, you might render and especially sample textures at 4K, but path-trace at 320p, and just interpolate the path-tracing results for pixels where you don't path-trace.
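
A minimal sketch of that split (resolutions and buffer formats are just illustrative): keep shading and texture sampling at full resolution, trace at low resolution, and bilinearly interpolate the traced term per full-resolution pixel.

```cpp
// Sketch of decoupling path-tracing resolution from shading resolution: the
// indirect term is traced into a small buffer and bilinearly upsampled at
// full-resolution shading time. All sizes are illustrative.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Buffer {
    int w, h;
    std::vector<float> data;  // single channel for brevity
    float at(int x, int y) const { return data[y * w + x]; }
};

// Bilinear fetch from the low-res traced buffer at normalized coordinates (u, v).
float sampleBilinear(const Buffer& lo, float u, float v) {
    float x = u * (lo.w - 1), y = v * (lo.h - 1);
    int x0 = static_cast<int>(x), y0 = static_cast<int>(y);
    int x1 = std::min(x0 + 1, lo.w - 1), y1 = std::min(y0 + 1, lo.h - 1);
    float fx = x - x0, fy = y - y0;
    float top = lo.at(x0, y0) * (1 - fx) + lo.at(x1, y0) * fx;
    float bot = lo.at(x0, y1) * (1 - fx) + lo.at(x1, y1) * fx;
    return top * (1 - fy) + bot * fy;
}

int main() {
    Buffer traced{320, 180, std::vector<float>(320 * 180, 0.5f)};  // "320p" path-traced term
    // At 4K shading time, each pixel fetches the interpolated traced value and
    // combines it with full-resolution textures/materials (not shown).
    float indirect = sampleBilinear(traced, 1920.0f / 3839.0f, 1080.0f / 2159.0f);
    std::printf("interpolated indirect term: %.3f\n", indirect);
    return 0;
}
```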
 
Of course UE5 will support DXR.

You are probably referring to Lumen not using HW-RT, which is a bit disappointing as it leaves fixed-function hardware unused, thus wasting performance. I think it is possible Lumen might get replaced by RTXGI, as it basically does the same (multi-bounce, dynamic GI) as Lumen but uses hardware-accelerated RT to update light probes, which makes it very efficient.

RTXGI works on any DXR capable GPU, so it should work for Xbox and AMD RDNA2 PC GPUs as well.

This is actually an open question. With alternative geometry representations you technically have no triangle meshes and their associated UV mapping in memory, and you may not need them at all. With a few very recent advancements, you can even get detailed, sharp geometry out of indirect tracing with implicit surfaces, which is the best guess as to what UE5 is using.

Maybe they'll have it for animated meshes. As of the last presentation those were the only triangle meshes used, and afaik implicit surface animation is still a semi-open question. Nevertheless, Dreams manages to do it somehow. It's certain UE5 has its own top-level acceleration structure using SDFs, and I'd be surprised if they didn't embed any triangle meshes in a BLAS into that top level. Either way, UE5 already supports at least basic animated characters using SDFs, so that's going to be used for diffuse GI and possibly even semi-glossy reflections, at the very least. Meaning the only likely use of DXR in UE5 will be for sharp reflections, and maybe for caustics if anyone really wants that accuracy.

As to "wasting performance", it's just the opposite. Alternate tracing structures are so much faster than hardware ray tracing, regardless of hardware vendor, that it's no question which should be used. That demo was doing multi-bounce recursive diffuse GI and glossy reflections at a scale and detail level that would crush a 3090 running at 720p, while doing so on a sub-3070 equivalent at 1440p. While I do expect them to bake into probes, as it solves a lot of energy loss problems they're encountering, even then doing that with Lumen is faster than hardware ray tracing.
 
I have no doubt that post PS5 and Xbox Series, the situation will change, with RT becoming more important, but I think RT usage will be very selective for the next 5 years.

https://gfxspeak.com/2020/10/09/unreal-engine-almost/

Looking at the results, one might ask: does this make ray tracing obsolete? I asked a few of my friends about it.

Steven Parker, Mr. Ray Tracing at Nvidia said, “Ray tracing is great for high polygon count. And secondary illumination still matters, where RT is invaluable.”

Brian Savery, Mr. Ray Tracing at AMD, told me, “Much more so. The difference is that pure ray tracing is more dependent on the number of pixels times the number of ray samples. Rasterization techniques are more dependent on the number of polygons. Having these “magic” LOD’s is very nice with ray tracing as they fit well with BVH acceleration structures and diffuse rays, etc. That is, if you have a sharp reflection you might trace against a “fine” LOD vs a rough reflection or shadow might trace against the “coarse” LOD.”

David Laur, Mr. Ray Tracing at Pixar said, “In fact, this situation can be when ray tracing is most relevant. The cost of shading is the critical metric. A good ray tracer will only shade one or a couple of hit points on tiny triangles that are visible and won’t run expensive shading at all on parts of triangles that are hidden. Most scanline/gl style renderers will need to fully shade every triangle no matter what, then z-buffer it in, whereupon it might be totally hidden, or all smashed into a subpixel. In the case of a triangle much smaller than a pixel, the ray tracer can also adjust things like texture mipmap level lookups based on incident ray-cone footprint size, e.g. when far away or after a couple of bounces off of curved car body surfaces, tree leaf textures can be accessed at their coarsest level, but those same nearby leaves seen directly from the camera would shade in finer detail.”

And Tim Sweeney, the boss (and founder) of Epic, told me, “Certainly fixed-function rasterization hardware will be obsolete in less than a decade. What replaces it is a mix of raytracing in the traditional sense, and new hybrid compute shader algorithms that map pixels to primitives and shader inputs using a variety of algorithms that are optimized to particular data representations, geometry densities, and static versus dynamic tradeoffs.”
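
To illustrate the LOD-per-ray idea from the AMD quote above, here is a tiny hypothetical selector (ray kinds and the roughness threshold are made up): sharp reflections trace against a fine LOD, rough reflections and shadow rays against a coarse one.

```cpp
// Hypothetical LOD selection per ray, in the spirit of the quote above: sharp
// reflection rays use a fine BVH/LOD, rough reflections and shadow rays use a
// coarse one.
#include <cstdio>

enum class RayKind { Reflection, Shadow };
enum class Lod { Fine, Coarse };

Lod chooseLod(RayKind kind, float surfaceRoughness) {
    if (kind == RayKind::Shadow) return Lod::Coarse;  // shadows tolerate coarse geometry
    return (surfaceRoughness < 0.1f) ? Lod::Fine : Lod::Coarse;
}

int main() {
    std::printf("mirror reflection -> %s\n",
                chooseLod(RayKind::Reflection, 0.02f) == Lod::Fine ? "fine LOD" : "coarse LOD");
    std::printf("rough reflection  -> %s\n",
                chooseLod(RayKind::Reflection, 0.60f) == Lod::Fine ? "fine LOD" : "coarse LOD");
    std::printf("shadow ray        -> %s\n",
                chooseLod(RayKind::Shadow, 0.50f) == Lod::Fine ? "fine LOD" : "coarse LOD");
    return 0;
}
```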
 