AMD GPUs perform worse with DXR1.1 3D Mark test than DXR1.0.

It's not DXR 1.0 vs 1.1, it's hybrid rasterization+RT vs RT-only.
It makes me think that Nvidia developed ray tracing acceleration for the professional rendering industry and then persuaded Microsoft that it was time to add it to D3D, just as tensor cores were designed for the machine learning industry.
The key problem is that brute-force ray tracing, as seen in Control, produces terrible performance. Brute force appears to have hit a ceiling with Ampere (in terms of rays per unit of bandwidth).
Brute force hardware is good for "professional" ray tracing.
There's little doubt that 720p video looks more realistic than 4K gameplay.
So apparently what we need is upscaled 320p real-time path tracing for gaming.
How do you define "brute force"? Devs have a lot of work to do to improve their RT solutions too.
Unbiased path tracing is effectively the definition of brute force: no limit on the count of bounces and the new rays produced by bounces. It's subtler than that, but that's the basic model. It's what professional rendering solutions aim to produce (though they can make more approximations). When you have seconds or minutes to spend on a single frame, you're more likely to be using (at least some) brute force
For real-time rendering you have to make careful choices about the number of rays and the number of bounces.
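To make the contrast concrete, here is a toy sketch (not any engine's real code) of that budget choice: the same path-tracing loop, but capped at `max_bounces` so the cost per pixel is bounded. `scatter` is a hypothetical stand-in for a surface interaction.

```python
def scatter(ray, bounce):
    """Toy surface interaction: each bounce returns less light; paths end at 10 bounces."""
    emitted = 0.5 ** bounce          # contribution shrinks with every bounce
    next_ray = ray if bounce < 10 else None
    return emitted, next_ray

def trace(ray, max_bounces):
    """Accumulate radiance along one path, stopping at the bounce budget."""
    total = 0.0
    for bounce in range(max_bounces):
        emitted, ray = scatter(ray, bounce)
        total += emitted
        if ray is None:              # path absorbed or escaped the scene
            break
    return total
```

With `max_bounces=2` the per-path cost is fixed and small; "brute force" is the limit where the budget is effectively unbounded.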
There are other choices to make, such as the level of detail in the geometry used for ray tracing. e.g. you might use low quality trees when trying to decide how trees are reflected.
Then you can work out which parts of the scene contribute the most to the final visual quality.
Also you can decide how far rays are allowed to travel.
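Taken together, the choices above amount to a small set of per-effect quality knobs. A hedged sketch of what such a settings block might look like; the field names here are hypothetical, not taken from any real engine or API:

```python
from dataclasses import dataclass

@dataclass
class RayTracingQuality:
    rays_per_pixel: float     # can be fractional, e.g. one ray per 2x2 pixels
    max_bounces: int          # hard cap on secondary rays spawned per hit
    geometry_lod: int         # 0 = full detail, higher = coarser proxy meshes
    max_ray_distance: float   # hits beyond this distance are treated as misses

# e.g. reflections might tolerate coarse tree meshes and short rays,
# while shadows want full-distance rays but no bounces at all.
reflections = RayTracingQuality(0.5, 1, geometry_lod=2, max_ray_distance=200.0)
shadows = RayTracingQuality(1.0, 0, geometry_lod=1, max_ray_distance=1e9)
```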
Luckily, temporal and spatial "averaging" (forms of denoising) help substantially. At typical gaming framerates, the ray tracing done for one frame can substantially help other frames. Similarly, just as variable rate shading lets developers lower the pixel shading resolution in some parts of the frame (e.g. near the edges, or in very dark areas), you can vary the quality of ray tracing across the frame.
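A minimal sketch of that temporal "averaging" idea: each frame's noisy ray-traced result is blended into a per-pixel history buffer, so rays traced for one frame keep helping later frames. The `alpha` blend weight is a hypothetical parameter; real denoisers also reproject and reject stale history.

```python
def accumulate(history, current, alpha=0.1):
    """Exponential moving average of per-pixel radiance across frames."""
    return [(1.0 - alpha) * h + alpha * c for h, c in zip(history, current)]

history = [0.0, 0.0]
for noisy_frame in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    history = accumulate(history, noisy_frame)
# history converges toward the true per-pixel values [1.0, 2.0]
```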
Just relying upon denoising gives you the poor performance seen in Control when trying to do lots of ray tracing techniques simultaneously.
I'm hopeful there are many more steps along the road of real-time ray tracing; in other words, I think devs have barely started to optimise.
But from what I get from DF videos about RT, and other media, devs are already optimising where RT is done and how it's done; they are careful about the number of rays and bounces.
Yes, and this is strange. You get only 4 primitives after culling, but you have 8 scan converters which convert the primitives to pixels. Wouldn't this be totally unbalanced? This makes no sense.
Also, this makes no sense. If both have the same geometry processor, Navi21 could not be faster than Navi10.
Maybe @CarstenS can provide some detailed information about what he has measured, with values? Thanks in advance.
I've provided answers to your questions already. Maybe I was too vague before. It's not unbalanced to have a primitive rasterizer perform coarse rasterization and feed the output to multiple fine rasterizers to output pixels. Also, don't bother comparing to Navi10 you'll only get confused. Certainly testing Navi21 is the best way to understand the performance characteristics.
Doesn't seem unbalanced to me. It looks like each Raster Unit has 2 scan converters. Each RDNA1 scan converter rasterised a triangle with 16 fragments of coverage. Now with RDNA2, we have 2 scan converters working on 1 triangle, capable of rasterising triangles from small to large with coverage ranging from 1-32 fragments per Raster Unit.
I can't find it at the moment but there have definitely been a couple of slides that confirm it is 2 pre-culled in, 1 culled out for the Primitive units.

Found the slide - the details are under "Geometry Processor" below, where Navi21 has 4 Prim Units, sends 4 triangles to rasterise and culls 8 triangles per cycle. So each Prim Unit still culls 2 triangles and sends 1 to the Raster Unit - unchanged from RDNA1.
Some studios, like 4A Games or Quantic Dream, want to create rendering pipelines centered around RT, but other engines like UE5 will not use RT at all.
Of course UE5 will support DXR.
You are probably referring to Lumen not using HW-RT, which is a bit disappointing as it leaves fixed function hardware unused and thus wasting performance. I think it is possible Lumen might get replaced by RTXGI, as it basically does the same (multi-bounce, dynamic GI) as Lumen but using hardware accelerated RT to update light probes which makes it very efficient.
RTXGI works on any DXR capable GPU, so it should work for Xbox and AMD RDNA2 PC GPUs as well.
Culling likely requires a small fixed number of clock cycles, while scan conversion requires a variable amount of work and time. The optimal ratio between these two functions likely changes significantly throughout the frame, but on average a 1-to-2 ratio might very well give better perf than 1-to-1. For instance, it could help reduce pipeline bubbles after scan conversion, reducing the time CUs sit idle waiting for work to do.
https://patents.google.com/patent/US10062206B2/en

That's an interesting patent, but I wasn't referencing it.
Not 100% sure but I think I understood that reference.
But this makes no sense. Not many triangles these days are bigger than 16 pixels. You have more issues rasterizing small triangles which are smaller than a pixel.

There are still triangles > 16 pixels. Someone would need to analyze some games to see how often they occur.
For polygons which are smaller than a pixel, you can now only rasterize 4 polygons in total. That means 4 scan converters have work while the other 4 sit idle. This makes no sense for future game titles. If you think about the Nanite engine, with its very small polygons, the new rasterizer will have no advantage...
It's an intermediate step, think of it as a sorting mechanism.
If I get it right, you still have the 4 coarse rasterizers at 16 pixels each, which is fine for now; it probably wasn't worth redesigning the coarse rasterizers. This is the hard limit for coarse-grained geometry, and it's what traditional fill-rate tests measure via one full-screen quad (or at least very few, very large triangles).
If the size check concludes that you would waste rasterizing performance because the triangles are too small, it skips the coarse rasterizer and sends the data to the smaller rasterizers, which operate more efficiently on smaller polygons.
edit for spelling
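The size check described above can be sketched roughly like this. The 16-pixel figure mirrors the discussion; the function and the threshold name are illustrative, not AMD's actual hardware logic:

```python
COARSE_TILE_PIXELS = 16

def route_triangle(covered_pixels):
    """Pick a rasterization path based on estimated pixel coverage."""
    if covered_pixels > COARSE_TILE_PIXELS:
        return "coarse-then-fine"   # big triangle: coarse pass narrows the work
    return "fine-only"              # tiny triangle: a coarse pass would be wasted
```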
Looking at the results, one might ask: does this make ray tracing obsolete? I asked a few of my friends about it.
Steven Parker, Mr. Ray Tracing at Nvidia said, “Ray tracing is great for high polygon count. And secondary illumination still matters, where RT is invaluable.”
Brian Savery, Mr. Ray Tracing at AMD, told me, “Much more so. The difference is that pure ray tracing is more dependent on the number of pixels times the number of ray samples. Rasterization techniques are more dependent on the number of polygons. Having these “magic” LOD’s is very nice with ray tracing as they fit well with BVH acceleration structures and diffuse rays, etc. That is, if you have a sharp reflection you might trace against a “fine” LOD vs a rough reflection or shadow might trace against the “coarse” LOD.”
David Laur, Mr. Ray Tracing at Pixar said, “In fact, this situation can be when ray tracing is most relevant. The cost of shading is the critical metric. A good ray tracer will only shade one or a couple of hit points on tiny triangles that are visible and won’t run expensive shading at all on parts of triangles that are hidden. Most scanline/gl style renderers will need to fully shade every triangle no matter what, then z-buffer it in, whereupon it might be totally hidden, or all smashed into a subpixel. In the case of a triangle much smaller than a pixel, the ray tracer can also adjust things like texture mipmap level lookups based on incident ray-cone footprint size, e.g. when far away or after a couple of bounces off of curved car body surfaces, tree leaf textures can be accessed at their coarsest level, but those same nearby leaves seen directly from the camera would shade in finer detail.”
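A toy illustration of the ray-cone footprint idea in the quote above: the further the hit point (or the more a cone has spread after bounces off curved surfaces), the coarser the mip level used. This is a common textbook approximation, not Pixar's actual implementation.

```python
import math

def mip_from_ray_cone(hit_distance, spread_angle, texel_size):
    """Choose a mipmap level from the cone's radius at the hit point."""
    footprint = hit_distance * math.tan(spread_angle)   # cone radius at the hit
    return max(0, int(math.log2(max(footprint / texel_size, 1.0))))
```

Nearby hits resolve to mip 0 (finest detail), while distant or widely spread cones fall back to coarser levels, exactly the tree-leaf example in the quote.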
And Tim Sweeney, the boss (and founder) of Epic, told me, “Certainly fixed-function rasterization hardware will be obsolete in less than a decade. What replaces it is a mix of raytracing in the traditional sense, and new hybrid compute shader algorithms that map pixels to primitives and shader inputs using a variety of algorithms that are optimized to particular data representations, geometry densities, and static versus dynamic tradeoffs.”