Direct3D feature levels discussion

Mega Geometry sounds mostly like a s/w feature so yeah should work on all RTX cards.
I still wonder what happened to Ada's DMMs though, they are missing in the RTX Kit...
 
I still wonder what happened to Ada's DMMs though, they are missing in the RTX Kit...
They are missing from all of NVIDIA's projects: Portal RTX and all the path traced games lacked any DMM implementation despite implementing SER and OMMs. Even the NVRTX branch of UE5 lacks DMMs entirely despite supporting OMMs and SER. NVIDIA said DMMs and Nanite work the same way, so DMMs were dropped from UE5. It seems RTX Mega Geometry is the more potent replacement for DMMs, with vastly better performance, and it works on all RTX GPUs too.
 

So, are neural shaders Blackwell only or not? Overclock3D says they are, but Microsoft talks about bringing support to other GPU vendors as well. If Ada doesn't support Neural Rendering, then I'm pretty sure the other GPUs won't get it either, as they are far behind Nvidia in that stuff.

But maybe they are talking about future hardware from AMD, Intel and Qualcomm? I'm really confused.
 
NVIDIA said DMMs and Nanite work the same way
DMMs are on-chip geometric displacement which is then traced against inside the RT core. You feed low detail geo + a displacement map (basically a texture) into the GPU, which then generates the displaced geometry, traces against it, and outputs the results (high detail geo + tracing results on that geo). This isn't at all similar to Nanite, which does real geometry and feeds that into the GPU - even when doing tessellation/displacement it sends the already displaced geo into the GPU. So while the results can be similar, especially in the last case, the way they are achieved is different.
The MG feature seems to just build on how Nanite and other games with high / virtualized geo are handling geometry and RT against that geometry. So it's not exactly a substitute but a different approach to the same problem. It is of course more flexible, as handling the T/BLAS updates at high speed with full detail geo is obviously better than trying to "downgrade" the geo into a low poly + displacement representation.
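To make the DMM part concrete, here's a rough CUDA-style sketch of the idea (purely illustrative - names and data layout are made up, and the real decoding happens implicitly inside the RT core during traversal rather than in a user kernel):

// Sketch of the displaced micro-mesh idea: a coarse base triangle plus a scalar
// displacement per micro-vertex yields dense geometry without storing it explicitly.
#include <cuda_runtime.h>

struct BaseTriangle {
    float3 p[3];   // base vertex positions
    float3 n[3];   // base vertex normals (displacement directions)
};

__device__ float3 lerp3(float3 a, float3 b, float3 c, float u, float v) {
    float w = 1.0f - u - v;   // barycentric interpolation across the triangle
    return make_float3(w * a.x + u * b.x + v * c.x,
                       w * a.y + u * b.y + v * c.y,
                       w * a.z + u * b.z + v * c.z);
}

// One thread per micro-vertex of a barycentric grid of the given subdivision level.
__global__ void displaceMicroVertices(BaseTriangle tri,
                                      const float* dispMap,   // scalar "displacement map", row-major
                                      int level,
                                      float3* outPositions)   // (level+1)^2 slots, triangle part used
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // index along u
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // index along v
    if (i + j > level) return;                      // stay inside the triangle

    float u = (float)i / level;
    float v = (float)j / level;
    float3 basePos = lerp3(tri.p[0], tri.p[1], tri.p[2], u, v);
    float3 dir     = lerp3(tri.n[0], tri.n[1], tri.n[2], u, v);
    float  d       = dispMap[j * (level + 1) + i];  // per-micro-vertex displacement

    // High-detail micro-vertex = interpolated base position pushed along the
    // interpolated direction by the stored displacement.
    outPositions[j * (level + 1) + i] = make_float3(basePos.x + d * dir.x,
                                                    basePos.y + d * dir.y,
                                                    basePos.z + d * dir.z);
}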
 

So, are neural shaders Blackwell only or not? Overclock3D says they are, but Microsoft talks about bringing support to other GPU vendors as well. If Ada doesn't support Neural Rendering, then I'm pretty sure the other GPUs won't get it either, as they are far behind Nvidia in that stuff.

But maybe they are talking about future hardware from AMD, Intel and Qualcomm? I'm really confused.
My interpretation is that it is a Blackwell feature, as hardware is required to perform this. It sounds as though it has some relationship to leveraging the existing compute units to make a micro network for NN processing.

But we definitely could use a lot more information on this one.
 

So, are neural shaders Blackwell only or not? Overclock3D says they are, but Microsoft talks about bringing support to other GPU vendors as well. If Ada doesn't support Neural Rendering, then I'm pretty sure the other GPUs won't get it either, as they are far behind Nvidia in that stuff.

But maybe they are talking about future hardware from AMD, Intel and Qualcomm? I'm really confused.
Presumably pre-Blackwell GPUs are missing the instructions required to support this. It could probably still be implemented, but with lower efficiency (i.e. lower performance).
"Other vendors" here most likely means RDNA4, and possibly RDNA3, as it should be able to run these workloads too, but it remains to be seen if performance will be enough.
No idea if Arc A or B are capable of running something like this.
Qualcomm doesn't have any ML capabilities outside of an "NPU" which is wholly external to the GPU, so using it for that is quite unlikely I'd say.

In any case I don't think that this is going to be utilized much until the next console gen h/w. Which means that we're probably some 2-5 years away from games starting to use it en masse.

So, does that mean DMM is deprecated even before it got utilized?
Seems so. But who knows, maybe it will still be a feature in some future DXR update.
 
My interpretation is that it is a Blackwell feature, as hardware is required to perform this. It sounds as though it has some relationship to leveraging the existing compute units to make a micro network for NN processing.

But we definitely could use a lot more information on this one.
I asked Bryan Catanzaro (Nvidia DL/AI team) if coop vector support being added to DirectX will enable neural shader support for all RTX GPUs and he said "I believe it is for all RTX GPUs".
 
In any case I don't think that this is going to be utilized much until the next console gen h/w. Which means that we're probably some 2-5 years away from games starting to use it en masse.
I honestly don't think it will be that long. NVIDIA has a solid history of getting stuff from research to hardware to application quickly: all the path tracing stuff (ReSTIR, Ray Reconstruction, Neural Radiance Cache, etc.) went into games pretty quickly, not to mention SER and OMMs got deployed fast as well. RTX Hair and RTX Geometry are also getting rapid deployment in games.

I expect neural shaders/materials/textures to be deployed in select games soon enough.
 
I asked Bryan Catanzaro (Nvidia DL/AI team) if coop vector support being added to DirectX will enable neural shader support for all RTX GPUs and he said "I believe it is for all RTX GPUs".
Ah. So what I'm understanding is that the neural shaders will have access to tensor cores. And that makes sense, since all RTX GPUs have tensor cores.
 
From the paper: https://research.nvidia.com/labs/rtr/neural_texture_compression/

5.2 Decompression
Inlining the network with the material shader presents a few challenges as matrix-multiplication hardware such as tensor cores operate in a SIMD-cooperative manner, where the matrix storage is interleaved across the SIMD lanes [54, 86]. Typically, network inputs are copied into a matrix by writing them to group-shared memory and then loading them into registers using specialized matrix load intrinsics. However, access to shared memory is not available inside ray tracing shaders. Therefore, we interleave the network inputs in-registers using SIMD-wide shuffle intrinsics.

We used the Slang shading language [25] to implement our fused shader along with a modified Direct3D [44] compiler to generate NVVM [52] calls for matrix operations and shuffle intrinsics, which are currently not supported by Direct3D. These intrinsics are instead directly processed by the GPU driver. Although our implementation is based on Direct3D, it can be reproduced in Vulkan [23] without any compiler modifications, where accelerated matrix operations and SIMD-wide shuffles are supported through public vendor extensions. The NV_cooperative_matrix extension [22] provides access to matrix elements assigned to each SIMD lane. The mapping of these per-lane elements to the rows and columns of a matrix for NVIDIA tensor cores is described in the PTX ISA [54]. The KHR_shader_subgroup extension [21] enables shuffling of values across SIMD lanes in order to assign user variables to the rows and columns of the matrix and vice versa. These extensions are not restricted to any shader types, including ray tracing shaders.

5.2.1 SIMD Divergence.
In this work, we have only evaluated performance for scenes with a single compressed texture-set. However, SIMD divergence presents a challenge as matrix acceleration requires uniform network weights across all SIMD lanes. This cannot be guaranteed since we use a separately trained network for each material texture-set. For example, rays corresponding to different SIMD lanes may intersect different materials.
In such scenarios, matrix acceleration can be enabled by iterating the network evaluation over all unique texture-sets in a SIMD group. The pseudocode in Appendix A describes divergence handling. SIMD divergence can significantly impact performance and techniques like SER [53] and TSU [31] might be needed to improve SIMD occupancy. A programming model and compiler for inline networks that abstracts away the complexity of divergence handling remains an interesting problem and we leave this for future work.

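To make the excerpt above a bit more concrete, here's a rough CUDA analogue of the in-register interleaving described in 5.2 (illustrative only - the paper's actual implementation is in Slang/D3D with NVVM matrix intrinsics feeding tensor cores; here the multiply-accumulate is plain FP32 and all names are made up):

// Each lane of a warp keeps one network input in registers; warp shuffle
// intrinsics exchange lane values so the warp can cooperatively evaluate a
// small fully connected layer without touching shared memory (which, as the
// paper notes, is unavailable inside ray tracing shaders).
#include <cuda_runtime.h>

#define WARP_SIZE    32
#define IN_FEATURES  32   // one input feature per lane
#define OUT_FEATURES 32   // one output neuron per lane

// Launch with a single warp, e.g. warpCooperativeLayer<<<1, 32>>>(...).
__global__ void warpCooperativeLayer(const float* __restrict__ inputs,   // [IN_FEATURES]
                                     const float* __restrict__ weights,  // [OUT_FEATURES][IN_FEATURES]
                                     float* __restrict__ outputs)        // [OUT_FEATURES]
{
    int lane = threadIdx.x & (WARP_SIZE - 1);

    float x   = inputs[lane];  // each lane holds its own input in a register
    float acc = 0.0f;

    // Every lane computes one output neuron; inputs are redistributed lane-to-lane
    // with __shfl_sync instead of being staged through group-shared memory.
    for (int k = 0; k < IN_FEATURES; ++k) {
        float xk = __shfl_sync(0xffffffff, x, k);   // broadcast input k from lane k
        acc += weights[lane * IN_FEATURES + k] * xk;
    }
    outputs[lane] = acc;   // a real network would apply an activation here
}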
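And a similarly hedged CUDA sketch of the divergence handling from 5.2.1 - the warp loops over the unique texture-set IDs it sees and evaluates each network once with uniform weights. This mirrors the idea the paper defers to its Appendix A pseudocode; evaluateNetwork is a stand-in:

#include <cuda_runtime.h>

__device__ float evaluateNetwork(int textureSet, float input) {
    // Stand-in for "run the MLP trained for this texture-set on my inputs".
    return input + (float)textureSet;
}

__device__ float evaluateWithDivergenceHandling(int myTextureSet, float myInput) {
    float result = 0.0f;
    unsigned mask      = __activemask();  // lanes participating in this call
    unsigned remaining = mask;            // lanes whose texture-set hasn't been handled yet

    while (remaining) {
        int leader     = __ffs(remaining) - 1;                        // lowest pending lane
        int currentSet = __shfl_sync(mask, myTextureSet, leader);     // broadcast its texture-set

        if (myTextureSet == currentSet) {
            // Every lane taking this branch shares the same network weights,
            // so a matrix-accelerated path could be used uniformly here.
            result = evaluateNetwork(currentSet, myInput);
        }

        // Drop all lanes whose texture-set was just processed and loop again.
        remaining &= ~__ballot_sync(mask, myTextureSet == currentSet);
    }
    return result;
}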
 
I honestly don't think it will be that long. NVIDIA has a solid history of getting stuff from research to hardware to application quickly: all the path tracing stuff (ReSTIR, Ray Reconstruction, Neural Radiance Cache, etc.) went into games pretty quickly, not to mention SER and OMMs got deployed fast as well. RTX Hair and RTX Geometry are also getting rapid deployment in games.

I expect neural shaders/materials/textures to be deployed in select games soon enough.
Select games sure, but not in general. It is a more invasive feature for art pipelines than even RT, so expecting a lot of developers to implement it as an alternative to what other platforms/GPUs will have to use is naive. Most games won't do this until consoles get proper support - and even then it'll likely take some time to become ubiquitous. We're in year seven of RT h/w and there's still just one game which requires it, so even 2-5 years for neural materials seems optimistic.

Ah. So what I'm understanding is that the neural shaders will have access to tensor cores. And that makes sense, since all RTX GPUs have tensor cores.
Cooperative vectors *are* access to "tensor cores" (any ML h/w really; it's up to the driver to route it wherever an IHV would prefer). What's different from how things are now is that you can run AI payloads from any other shader type (compute, pixel, etc.). For that the h/w must support such launches with minimal overhead - or performance will be suboptimal. I guess we'll see if and how it will run on pre-Blackwell RTX cards.
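For illustration, this is roughly the kind of per-thread workload cooperative vectors are meant to expose: a small matrix-vector multiply per shader invocation. Written as plain CUDA arithmetic purely as a sketch; the point of the DirectX/Vulkan feature is that the driver can map exactly this pattern onto tensor cores (or whatever ML h/w an IHV has) from any shader stage. Sizes and names are made up:

#include <cuda_runtime.h>

#define IN_DIM  16
#define OUT_DIM 16

// One "neural shader" layer for one thread: y = relu(W * x + b).
// With cooperative vectors the same per-thread vectors would be handed to the
// driver/hardware instead of being multiplied out in scalar code like this.
__device__ void denseLayer(const float* __restrict__ W,   // [OUT_DIM][IN_DIM], uniform per material
                           const float* __restrict__ b,   // [OUT_DIM]
                           const float x[IN_DIM],          // this thread's input vector
                           float y[OUT_DIM])               // this thread's output vector
{
    for (int o = 0; o < OUT_DIM; ++o) {
        float acc = b[o];
        for (int i = 0; i < IN_DIM; ++i)
            acc += W[o * IN_DIM + i] * x[i];
        y[o] = fmaxf(acc, 0.0f);   // ReLU activation
    }
}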
 
Mega Geometry sounds mostly like a s/w feature so yeah should work on all RTX cards.
I still wonder what happened to Ada's DMMs though, they are missing in the RTX Kit...

Still not sure exactly what Mega Geometry does and where it fits in the pipeline. Did Nvidia share any details? All this time I thought BVH updates were already happening on the GPU so I’m not sure how this helps with CPU overhead.
 
Neural materials are significantly more ambitious than neural texture compression; the former replaces textures and shader code and is intended to enable a generational leap in material quality, the latter just replaces traditional compression for textures to save VRAM and system memory space (and presumably disk space, once developers can target a minimum spec that supports it so they don't need to include a traditionally-compressed fallback).

Neural materials would require significant changes to tooling, art pipelines, and game engines, and it would also require developers to target a much higher level of material fidelity to extract the benefit from it. The earliest we could expect to see it would be in some Nvidia-sponsored AAA technical showcase releasing years from now, like Witcher 4 or the next Metro game. It would take much longer (if ever) for it to become standard. NTC would require less work and can benefit titles at the existing level of material fidelity. It could see widespread adoption very soon, but it could also be left to languish like RTX IO/DirectStorage.
 
Still not sure exactly what Mega Geometry does and where it fits in the pipeline. Did Nvidia share any details? All this time I thought BVH updates were already happening on the GPU so I’m not sure how this helps with CPU overhead.
We have to wait for the SDK to know the details. BVH builds and updates happen on both the CPU and GPU, and MG is supposed to help with performance on both of these.
 
Neural materials would require significant changes to tooling, art pipelines, and game engines, and it would also require developers to target a much higher level of material fidelity to extract the benefit from it. The earliest we could expect to see it would be in some Nvidia-sponsored AAA technical showcase releasing years from now, like Witcher 4 or the next Metro game. It would take much longer (if ever) for it to become standard.

Yeah I don’t see this happening anytime soon even in Nvidia sponsored games and certainly not at launch. Sounds like an artist’s nightmare. Best case is probably Nvidia ecosystem stuff like Remix.

I wonder how inference works. Does the network spit out material values for 1/8/32 pixels at a time?
 
Neural materials are significantly more ambitious than neural texture compression; the former replaces textures and shader code and is intended to enable a generational leap in material quality, the latter just replaces traditional compression for textures to save VRAM and system memory space (and presumably disk space, once developers can target a minimum spec that supports it so they don't need to include a traditionally-compressed fallback).

Neural materials would require significant changes to tooling, art pipelines, and game engines, and it would also require developers to target a much higher level of material fidelity to extract the benefit from it. The earliest we could expect to see it would be in some Nvidia-sponsored AAA technical showcase releasing years from now, like Witcher 4 or the next Metro game. It would take much longer (if ever) for it to become standard. NTC would require less work and can benefit titles at the existing level of material fidelity. It could see widespread adoption very soon, but it could also be left to languish like RTX IO/DirectStorage.
If the neural compression is the same as in the earlier Nvidia paper, then I actually do not expect it will be adopted that fast. It does not work on textures individually but on a group of textures that are used for the same material. The thing is that in today's pipelines it's not always clear-cut which textures belong together. Some textures are used by multiple materials, and when this happens the gains might become less than what Nvidia is claiming. It is also possible to swap some textures on the fly. At that point it becomes less clear which textures need to be compressed together, and you will need tooling that lets artists mark textures that can be compressed together. Then there is the fact that in Nvidia's paper they point sample the textures and let DLSS filter out the noise. This might work in the examples they showed, but if you add all the other sources of noise in today's games, that could become problematic.

I'm actually rooting more for the neural materials to be adopted. And not for the performance reasons Nvidia is marketing them with, but for another reason: anti-aliasing. In today's games you already see some complex materials, and some of them exhibit aliasing. You can have just a quad with a material and you get noise when you move. I have seen this happen in Starfield and in UE5's City Sample demo. The thing is, not all artists know how to combat aliasing, and with today's shader graphs they are allowed to make really complex materials and might not even notice that their materials contribute to aliasing. In Nvidia's paper about neural appearance models they actually store different NN weights for different filter radii to combat aliasing.
 
The biggest issue with NTC is the lack of anisotropic filtering. What's the point of having highly detailed textures if they turn into a blurry mess when viewed from sharp angles?
Anyway, I don't think any of that is production ready; it should be viewed more as practical research.
With matrix math support coming to DX, a lot of these things will get more attention from actual game tech developers, and many issues will likely be solved before this tech goes into any sort of production.
Hence my estimate that we won't see it adopted for years still, and the lack of h/w support isn't even the main reason for that right now.
 
The biggest issue with NTC is the lack of anisotropic filtering. What's the point of having highly detailed textures if they turn into a blurry mess when viewed from sharp angles?
It filters textures at sharp angles and does so more accurately than HW AF when compared to the ground truth (the 1024-sample rendering), because it filters after the shading rather than before it.
It makes sense, given that HW AF typically samples an ellipsoidal texture area, while the screen space pixel jittering for STF translates into sampling a trapezoidal area in texture space - which is quite similar to what HW AF does and would provide the same results given enough frames for accumulation.
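For reference, a rough CUDA sketch of the stochastic filtering idea being described (heavily simplified: an axis-aligned box footprint instead of a proper trapezoid, and a running average standing in for DLSS/TAA accumulation; all names are illustrative):

// Instead of a hardware-filtered (e.g. anisotropic) fetch, each pixel takes one
// point sample at a jittered position inside its texture-space footprint each
// frame; temporal accumulation averages the samples, i.e. filtering happens
// after shading rather than before it.
#include <cuda_runtime.h>
#include <curand_kernel.h>

__device__ float fetchNearest(const float* tex, int w, int h, float u, float v) {
    int x = min(max(__float2int_rd(u * w), 0), w - 1);
    int y = min(max(__float2int_rd(v * h), 0), h - 1);
    return tex[y * w + x];   // nearest (point) sample, no hardware filtering
}

// One thread per pixel; 'footprint' approximates the pixel's extent in UV space
// (what HW AF would integrate over), e.g. derived from the UV derivatives.
__global__ void stochasticSample(const float* tex, int w, int h,
                                 const float2* uv, const float2* footprint,
                                 float* accum, int frame, int numPixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;

    curandState rng;
    curand_init(1234ULL, i, frame, &rng);   // per-pixel, per-frame jitter

    // Jitter the lookup inside the footprint instead of filtering in hardware.
    float ju = (curand_uniform(&rng) - 0.5f) * footprint[i].x;
    float jv = (curand_uniform(&rng) - 0.5f) * footprint[i].y;
    float sample = fetchNearest(tex, w, h, uv[i].x + ju, uv[i].y + jv);

    // Temporal accumulation stands in for DLSS/TAA: averaging the jittered
    // point samples over frames converges toward the filtered result.
    accum[i] = (accum[i] * frame + sample) / (frame + 1);
}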
 