Game development presentations - a useful reference

Epic went with DXR 1.1-style inline ray tracing, which seems to run better on XSX at least. That means no fancy hardware ray sorting, which they acknowledge. They also acknowledge the efficiency losses due to divergent threads, but it's not as bad as the DXR 1.0 dispatch overhead.
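Conceptually, the tradeoff looks something like this toy C++ sketch (all types here are hypothetical, just to show the shape of the two models): DXR 1.0 dispatches each hit through a shader table, while inline RT shades every hit in the calling shader and eats the divergence there.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

struct Hit { uint32_t shaderId; };

// DXR 1.0 style: each hit is dispatched through a shader table. The indirect
// call has scheduling overhead, but driver/HW may regroup work per shader.
void shadeViaShaderTable(const std::vector<Hit>& hits,
                         const std::vector<std::function<void(const Hit&)>>& shaderTable) {
    for (const Hit& h : hits) shaderTable[h.shaderId](h);
}

// DXR 1.1 inline style: the calling shader handles every hit itself. No
// dispatch overhead, but on the GPU this switch runs divergent whenever
// neighbouring rays hit different materials.
void shadeInline(const std::vector<Hit>& hits) {
    for (const Hit& h : hits) {
        switch (h.shaderId) {
            case 0: /* opaque material */ break;
            case 1: /* alpha-tested foliage */ break;
            default: break;
        }
    }
}
```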

Hopefully they will test DXR 1.0 on hardware that does support ray sorting for a future presentation.
It seems to me that inline is used selectively and 1.0 is used at other times.

Being able to load BVHs that have been offline-optimised for consoles is a massive deal. @JoeJ would be happy...
 
I'd prefer they go with DIY sorting in the shader for shader coherence (not really an option for ray coherence, though). Drive the hardware toward programmability.
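Roughly what I mean, as a CPU-side C++ sketch (the HitRecord layout is made up): counting-sort the hit records by shader ID before shading, so neighbouring threads end up running the same shader.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical hit record produced by a trace pass.
struct HitRecord {
    uint32_t shaderId;  // which material/hit shader should shade this hit
    uint32_t rayIndex;  // where the shading result is written back
};

// Counting sort by shaderId: afterwards, hits using the same shader are
// contiguous, so a wavefront shading them stays coherent.
std::vector<HitRecord> sortByShader(const std::vector<HitRecord>& hits,
                                    uint32_t shaderCount) {
    std::vector<uint32_t> offsets(shaderCount + 1, 0);
    for (const HitRecord& h : hits) offsets[h.shaderId + 1]++;   // histogram
    for (uint32_t s = 0; s < shaderCount; ++s)
        offsets[s + 1] += offsets[s];                            // prefix sum
    std::vector<HitRecord> sorted(hits.size());
    for (const HitRecord& h : hits)
        sorted[offsets[h.shaderId]++] = h;                       // scatter
    return sorted;
}
```

On the GPU this would be a histogram, a prefix sum, and a scatter over the hit buffer; the point is that the sort is under our control rather than the hardware's.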
 
If they are keen enough on hand-optimizing through software to be using software rasterization for Nanite, I expect them to make similar efforts with their RT code. Just a matter of time.
 
Being able to load BVHs that have been offline-optimised for consoles is a massive deal. @JoeJ would be happy...
Thanks, but I'm only happy once we can do this on PC too.
I would not mind selecting my GPU model before downloading a game... :mad:
(Edit: I would not mind baking the BVH after installing the game and waiting for that, either.)

However, I have some hope in NV's DMM: displaced micro-meshes, where the BVH is built only over the low-poly base mesh.
Building this at runtime might be acceptable, and it's a really simple way to achieve high detail, so maybe others adopt the standard.
It's no LOD solution, but it could work well enough to postpone that problem for another decade, until RT becomes flexible enough.
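To show why the BVH only needs the base mesh, here is a toy C++ sketch of the amplification step as I understand the idea (all names are made up, and the displacement callback stands in for the real micromap data):

```cpp
#include <functional>
#include <vector>

struct Vec3 { float x, y, z; };

// Barycentric interpolation across a triangle (w = 1 - u - v).
static Vec3 lerp3(Vec3 a, Vec3 b, Vec3 c, float u, float v) {
    float w = 1.0f - u - v;
    return { w * a.x + u * b.x + v * c.x,
             w * a.y + u * b.y + v * c.y,
             w * a.z + u * b.z + v * c.z };
}

// Amplify one base triangle into micro-vertices. The BVH only ever sees the
// base triangle (plus displacement bounds); all the detail is generated here.
static std::vector<Vec3> displaceTriangle(
    Vec3 p0, Vec3 p1, Vec3 p2,        // base positions
    Vec3 n0, Vec3 n1, Vec3 n2,        // base normals
    int level,                        // subdivision level
    const std::function<float(float, float)>& disp)  // stand-in for micromap
{
    std::vector<Vec3> verts;
    int res = 1 << level;             // segments per triangle edge
    for (int i = 0; i <= res; ++i) {
        for (int j = 0; j <= res - i; ++j) {
            float u = float(i) / float(res), v = float(j) / float(res);
            Vec3 p = lerp3(p0, p1, p2, u, v);
            Vec3 n = lerp3(n0, n1, n2, u, v);  // unnormalized is fine for a toy
            float d = disp(u, v);              // scalar displacement
            verts.push_back({ p.x + d * n.x, p.y + d * n.y, p.z + d * n.z });
        }
    }
    return verts;
}
```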
 
Epic's double TLAS LOD seems to work "good enough". I'm not sure you actually need a huge amount of detail in the RT scene representation with a hybrid renderer either. The Matrix Awakens demo still looks better than Cyberpunk's nearly path-traced mode and runs 4x as fast, even with that demo being severely unoptimized on PC. Given that Matrix has an extreme disparity between its Nanite models and its RT representation, while Cyberpunk presumably has none, I'd say hyper-detailed RT isn't worth caring about for hybrid rendering.
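For what it's worth, my understanding of the scheme, sketched in C++ with made-up types (I could be off on Epic's exact details): trace the detailed near TLAS up to a cutoff distance, and only continue into the coarse far TLAS on a miss, so each ray costs at most two traces.

```cpp
#include <algorithm>
#include <functional>

// Hypothetical stand-ins for the RT API: a ray with a [tMin, tMax] interval
// and an opaque "trace this TLAS" callback.
struct Ray { float tMin = 0.0f, tMax = 1e30f; /* origin/direction omitted */ };
struct Hit { bool valid = false; float t = 0.0f; };
using TraceFn = std::function<Hit(Ray)>;

// Two-level LOD: full-detail TLAS near the camera, coarse proxy TLAS beyond
// farFieldStart. Each ray costs at most two traces.
Hit traceWithFarField(const TraceFn& traceNear, const TraceFn& traceFar,
                      Ray ray, float farFieldStart) {
    Ray nearRay = ray;
    nearRay.tMax = std::min(ray.tMax, farFieldStart);
    Hit h = traceNear(nearRay);
    if (h.valid) return h;            // resolved in the detailed near field
    Ray farRay = ray;
    farRay.tMin = std::max(ray.tMin, farFieldStart);
    return traceFar(farRay);          // otherwise continue into the far field
}
```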

As for the future, for the future certainly. But the unending, labyrinthine code complexity of Lumen suggests to me that, for tracing at least, Brian Karis's "you don't need a unified representation if you can settle for good enough" maybe isn't the best strategy. Point rendering's problems seem solvable, and its speed is... well, unlimited detail with ray-traced AO and soft shadows is running at 1080p on a PS4, so.
 
Epic's double TLAS LOD seems to work "good enough". I'm not sure you actually need a huge amount of detail in the RT scene representation with a hybrid renderer either. The Matrix Awakens demo still looks better than Cyberpunk's nearly path-traced mode and runs 4x as fast, even with that demo being severely unoptimized on PC. Given that Matrix has an extreme disparity between its Nanite models and its RT representation, while Cyberpunk presumably has none, I'd say hyper-detailed RT isn't worth caring about for hybrid rendering.
Depends. If you use RT mainly for GI, geometry approximation is fine.
But I already have GI which does that. In that case we want RT for high-frequency direct lighting, which is what approximate GI solutions (RTX GI, for example) can't do. And for this we want high-detail geometry: how else should we get proper shadows and sharp reflections? Plus, those are exactly the things RT can do at ideal performance.

Thus I keep saying RT is currently pretty broken: flexibility limitations hinder its own applications.
DMM might be a workaround for detail, but I'm not sure if it's meant for rasterization as well (slow sub-pixel triangles, quality way below Nanite).
So in the worst case we might use it for RT only, doubling storage and memory costs, and we still need a shadow bias to deal with the geometry mismatch.
The day when RT actually solves more problems than it adds seems at least as far away as the day when it becomes actually affordable. : /
 
How high resolution are your meshes that you need DMM for proper performance? Everything sub-pixel triangles?
Not sure yet. Currently I use my geometry mainly for terrain, which is generated from a 3D simulation, not the usual heightmap approaches.
3D is orders of magnitude slower than 2D, so I have no hope of generating millimeter-scale data this way.
For such high detail I would need to amplify the low-res geometry on the client, e.g. using some sample-based texture synthesis approach. Not sure if I have the time to work on this.
For architecture and any human-made stuff, Nanite is generally better than my approach.

However, the problem is not tied exclusively to high detail, as LOD is mainly useful to increase efficiency. We want LOD no matter what resolution is in question.
The reason the industry mostly gets around it is that it has converged on a workflow of compositing every scene from small models, e.g. the repetitive building modules we saw in the Matrix demo. For such small models, discrete LODs work fine.
Such modules are great for many things: model a library of doors and windows, and build a whole city from it to save costs and storage.
But it does not work well for anything natural. Terrain has some global flow at all scales; every tree and rock is similar but unique. Composing this from model instances is state of the art, but very compromised. And if we try to do better, we need some continuous LOD solution.

It's interesting to remember games like Doom and Quake, because their whole levels were unique geometry (BSP). Later, support was added to place a detailed static model like a statue here and there, and after some years low-poly BSP geometry was replaced entirely with instanced model building blocks.
Now, armed with advanced procedural generation methods, we might eventually want to get back to this: something like Rage, but with unique and detailed geometry as well, not just unique texturing.

Personally, I like games like Amid Evil or Prodeus a lot. They are retro and low-poly, but I know in advance that every place is unique, so my motivation to explore is way higher than with current AAA games.
I also have serious issues with orientation in modern games, because no matter which direction I look, it looks the same.
So those are my motivations to work on this. Though I'm not happy with my progress, and might be better off stopping to focus just on GI.
 
Hopefully they will test DXR 1.0 on hardware that does support ray sorting for a future presentation.
On page 73, they specifically state:

And on PC we don’t have those intrinsics so we still need bindless.
And because bindless support is currently under development in Unreal Engine, we don’t support inline ray tracing in DXR on PC and still use ray tracing pipelines.
 
the earlier Intel paper introducing 'traversal shaders' before
According to Epic, traversal shaders have limitations on consoles:

-The BVH traversal on consoles is implemented as a regular compute shader and only AABB and triangle intersections are HW accelerated. Which means rays are not going to be reordered to improve coherence by the hardware.
-In complex scenes the difference in traversal iterations can be massive even between similar rays
-Shader might be just generating incoherent rays

in the end, it manifests as long tails in ray tracing shaders when the entire GPU is waiting for a few lingering threads that take many more iterations than others. And in some extreme cases, GPU will spend more time waiting than doing anything.

Page 80.
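The long tail is easy to see if you sketch such a compute traversal loop, here in toy C++ (node layout and intersection stubs are made up; on console the two intersection tests would map to the HW instructions):

```cpp
#include <cstdint>
#include <vector>

struct Ray { /* origin/direction omitted in this toy */ };

// Toy binary BVH node; a real one packs child bounds, triangle ranges, etc.
struct Node {
    bool isLeaf = false;
    uint32_t children[2] = { 0, 0 };
    uint32_t childCount = 0;
};

// Stubs standing in for the HW-accelerated intersection instructions;
// everything else in the loop below is ordinary compute code.
static bool intersectAABB(const Ray&, const Node&) { return false; }
static void intersectTriangles(const Ray&, const Node&) {}

// Stack-based traversal. The while-loop runs a different number of
// iterations per ray; in a wavefront every lane waits for the slowest
// ray, which is exactly the long tail described above.
static int traverse(const std::vector<Node>& nodes, const Ray& ray) {
    uint32_t stack[64];
    int top = 0;
    stack[top++] = 0;                   // start at the root
    int iterations = 0;
    while (top > 0) {
        ++iterations;
        const Node& n = nodes[stack[--top]];
        if (n.isLeaf) {
            intersectTriangles(ray, n); // HW triangle test
            continue;
        }
        for (uint32_t i = 0; i < n.childCount; ++i)
            if (intersectAABB(ray, nodes[n.children[i]]))  // HW box test
                stack[top++] = n.children[i];
    }
    return iterations;  // varies wildly even between similar rays
}
```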
 
As for the future, for the future certainly. But the unending, labyrinthine code complexity of Lumen suggests to me that, for tracing at least, Brian Karis's "you don't need a unified representation if you can settle for good enough" maybe isn't the best strategy.
Weird argument to make while at the same time arguing for hybrid rendering.

Regardless of how GI is implemented, a complete ray tracing solution for primary, shadow, and reflection rays simplifies a lot of things too. Going forward, 'complete' will mean CLOD; there are nowhere near enough rays for Monte Carlo to do it all.
 
According to Epic, traversal shaders have limitations on consoles:

-The BVH traversal on consoles is implemented as a regular compute shader and only AABB and triangle intersections are HW accelerated. Which means rays are not going to be reordered to improve coherence by the hardware.
-In complex scenes the difference in traversal iterations can be massive even between similar rays
-Shader might be just generating incoherent rays

in the end, it manifests as long tails in ray tracing shaders when the entire GPU is waiting for a few lingering threads that take many more iterations than others. And in some extreme cases, GPU will spend more time waiting than doing anything.

Page 80.
Not sure which document you're referring to. But the quote sounds more applicable to inline vs. 1.0, or just ray tracing in general.
Traversal shaders on console (RDNA2 HW) may not be as efficient as on other HW (Intel), but the point is that they are possible at all.
So calling this possibility 'limited' is not right. Instead, it's an upside of AMD's compute approach that they can implement any API changes if they decide to do so.

In contrast, NV probably cannot support traversal shaders on their fixed-function traversal units, nor implement a compute traversal fallback which has access to accelerated intersection. (Just guessing.)
So the feature may not become relevant or exposed at all until this changes, I'm afraid.

Things like this are why I have sympathized with vendor API ideas in the past. The differences between vendor implementations hold RT back much more than raster, making progress and innovation slow and hard for bad reasons.
I never understood why everybody hates Khronos APIs for their extensions and praises Microsoft's for the lack of them. Nobody is forced to use extensions.
RT would be a new and good reason to add extension mechanisms to DirectX, imo.

Coming back to traversal shaders, I'm not sure if Intel's 'material sorting' could also sort rays to programmable traversal branches. Awesome if so. Because their HW groups 8 threads rather than 32, it's easier for them to benefit from such sorting methods, and they suffer less from divergence in general.
I think such HW support is necessary to make traversal shaders really attractive. Otherwise we amplify the problem of divergent data paths just to turn discrete LOD into stochastic continuous LOD.
But if such support is given, this nice and simple LOD solution becomes attractive enough to consider replacing rasterization entirely with RT, because rasterization could achieve the same only by rendering everything twice.
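For reference, the stochastic selection itself is trivial; it's the divergence it creates that needs the HW support. A C++ sketch (the LOD metric is made up):

```cpp
#include <cmath>

// Continuous LOD value from some error metric; made up for illustration
// (here: one extra level per doubling of distance).
static float lodFromDistance(float distance) {
    return std::log2(1.0f + distance);
}

// Stochastically round the fractional LOD per ray. On average this blends
// between the two discrete levels, so transitions don't pop; but neighbouring
// rays may now take different traversal branches, which is the divergence
// that needs sorting support to stay cheap.
static int selectLod(float distance, float random01) {
    float lod = lodFromDistance(distance);
    int base = static_cast<int>(lod);
    float frac = lod - static_cast<float>(base);
    return (random01 < frac) ? base + 1 : base;
}
```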
 
According to Epic, traversal shaders have limitations on consoles:

-The BVH traversal on consoles is implemented as a regular compute shader and only AABB and triangle intersections are HW accelerated. Which means rays are not going to be reordered to improve coherence by the hardware.

That might be a godsend. It means devs will experiment way more with different approaches to dispatching ray casts, binning them, reordering them, trying different structures... Perhaps some will try to mix triangle tracing with other primitives...

The consoles will end up being a lab for experimentation that will ultimately inform the next iterations of PC RT hardware.
 
That might be a godsend. It means devs will experiment way more with different approaches to dispatching ray casts, binning them, reordering them, trying different structures... Perhaps some will try to mix triangle tracing with other primitives...
From your lips to their ears, but I guess this won't happen.
Cross-platform is increasingly important, so the weakest platform holds the others back, no matter whether we talk about performance or flexibility. :(
Hopefully I'm wrong...
 