DirectX Developer Day 2020

I think it is not so complicated.
DX12U == XboxSX
TURING/AMPERE >> DX12U
RDNA2 is where I have a doubt. It could be RDNA2 == DX12U or RDNA2 >> DX12U


Actually you can put both XSX and RDNA2 into the ">>DX12U" pile just from RT already
"[Series X] goes even further than the PC standard in offering more power and flexibility to developers," reveals Goossen. "In grand console tradition, we also support direct to the metal programming including support for offline BVH construction and optimisation. With these building blocks, we expect ray tracing to be an area of incredible visuals and great innovation by developers over the course of the console's lifetime."
https://www.eurogamer.net/articles/digitalfoundry-2020-inside-xbox-series-x-full-specs
(or at least I read it as saying that going direct to metal allows things DX12U wouldn't)

Out of curiosity, which Turing features go beyond DX12U?
 
Off the top of my head for XSX:
  • ExecuteIndirect functionality is greater
  • VRS functionality is greater (wider)
  • and as you mentioned DXR spec is likely greater (wider)

There could be more options as well, as we learn more over time.
 
I don’t know how anyone can interpret “all” to mean “baseline”. Microsoft even put the word ALL in caps. They seem to actually mean all, and any other interpretation is just wishful thinking.



Is there a practical difference between a traversal shader and an any-hit shader using inline tracing? What can you not achieve using the latter?

Edit: I suppose a traversal shader would also exercise control over TLAS node traversal. Performance is likely disastrous though.
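To make the distinction concrete, here is a rough CPU-side sketch (hypothetical types and names, not DXR API code) of the hook a programmable traversal stage would add; inline RayQuery has no equivalent hook, so you would have to emulate it by re-issuing queries or masking instances:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Ray      { float origin[3]; float dir[3]; float tMin, tMax; };
struct Instance { std::vector<uint32_t> lodBlas; float boundsCenter[3]; };  // one BLAS handle per LOD

// The hypothetical "traversal shader" hook: when the ray reaches this TLAS
// instance, decide which bottom-level structure it should descend into.
uint32_t SelectLodBlas(const Ray& ray, const Instance& inst)
{
    float dx = inst.boundsCenter[0] - ray.origin[0];
    float dy = inst.boundsCenter[1] - ray.origin[1];
    float dz = inst.boundsCenter[2] - ray.origin[2];
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    size_t lod = static_cast<size_t>(dist / 10.0f);            // arbitrary distance metric
    return inst.lodBlas[std::min(lod, inst.lodBlas.size() - 1)];
}

// An any-hit shader (or inline RayQuery) only gets to accept/ignore candidate
// hits against geometry the instance already references; it cannot redirect
// the ray into different geometry mid-traversal. The closest emulation is one
// instance per LOD plus per-ray instance masks, or detecting a proxy hit and
// re-tracing against another acceleration structure yourself.
```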

I'm a bit late to this discussion since I'm catching up on reading here, but with respect to the differences between DXR 1.1 on PC and DXR on Xbox Series X:
So yes, RT is typically associated with a drop in performance and that carries across to the console implementation, but with the benefits of a fixed console design, we should expect to see developers optimise more aggressively and also to innovate. The good news is that Microsoft allows low-level access to the RT acceleration hardware.

"[Series X] goes even further than the PC standard in offering more power and flexibility to developers," reveals Goossen. "In grand console tradition, we also support direct to the metal programming including support for offline BVH construction and optimisation. With these building blocks, we expect ray tracing to be an area of incredible visuals and great innovation by developers over the course of the console's lifetime."
 
In the RTX architecture Sampler Feedback is enabled by the Texture Space Shading hardware capability.
Is TSS a requirement for Sampler Feedback, or are there other methods available to achieve the same result?
 
Sampler feedback can improve the performance of texture space shading, but it isn't required for it. It can be used to reduce the amount of shading work.
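As a concrete (and heavily simplified) illustration of "reduce the amount of shading work", a sketch assuming you have already decoded a MIN_MIP feedback map into a readable buffer; the structs and ShadeTile here are hypothetical, not the D3D12 API:

```cpp
#include <cstdint>
#include <vector>

constexpr uint8_t kTileNotRequested = 0xFF;     // "no mip requested" after decoding

struct DecodedMinMipMap {
    uint32_t tilesX = 0, tilesY = 0;
    std::vector<uint8_t> minMip;                // finest mip requested per feedback tile
};

// Hypothetical: shade one tile of the object-space texture at the given mip.
void ShadeTile(uint32_t x, uint32_t y, uint8_t finestMip);

void ShadeTextureSpace(const DecodedMinMipMap& fb)
{
    for (uint32_t y = 0; y < fb.tilesY; ++y)
        for (uint32_t x = 0; x < fb.tilesX; ++x) {
            uint8_t mip = fb.minMip[y * fb.tilesX + x];
            if (mip == kTileNotRequested)
                continue;                        // nothing sampled here last frame -> skip
            ShadeTile(x, y, mip);                // only shade what was actually needed
        }
}
```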
 

Still don't get what the point of texture space shading is. You're just oversampling the one thing that's guaranteed to be easy to filter. LEAN mapping etc. has to be cheaper than brute forcing the same thing.
 
I'm far from an expert, but I have been reading about this a bit recently thanks to links posted here. Apparently you get less aliasing with texture space shading. I think that's because when you shade in screen space you can get large jumps across a texture for neighboring pixels, but when you shade in texture/object space you have more samples to choose from or filter across.

Performance-wise, you can decouple lighting calculations from frame rate. Maybe updating lighting at 60 fps is sufficient, but you want to rasterize as fast as the monitor's refresh rate so the game feels responsive. Maybe someone else can chime in. I haven't read about LEAN mapping in years so I can't compare it to that.
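To make the decoupling idea concrete, here's a minimal sketch assuming a persistent object-space lighting texture per object; all the helper names are hypothetical, not any real engine API. Heavy lighting is amortized over a tile budget per frame, while rasterization runs every frame and only samples the cache:

```cpp
#include <cstdint>

// Hypothetical helpers standing in for the expensive and cheap halves.
void ShadeLightingForTile(uint32_t tileIndex);   // heavy lighting into the cache texture
void RasterizeSamplingCache();                   // cheap pass: just samples the cached texels

struct ShadingCache {
    uint32_t tileCount = 0;                      // tiles in the object-space lighting texture
    uint32_t cursor    = 0;                      // next tile to refresh
};

void UpdateShadingCache(ShadingCache& cache, uint32_t tilesPerFrame)
{
    for (uint32_t i = 0; i < tilesPerFrame && cache.tileCount > 0; ++i) {
        ShadeLightingForTile(cache.cursor);
        cache.cursor = (cache.cursor + 1) % cache.tileCount;
    }
}

void RenderFrame(ShadingCache& cache)
{
    UpdateShadingCache(cache, /*tilesPerFrame=*/256);  // lighting amortized over many frames
    RasterizeSamplingCache();                          // runs every frame at display rate
}
```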
 
but when you shade in texture/object space you have more samples to choose from or filter across.
The image quality is improved for two reasons: First, the samples are temporally stable because they remain constant (e.g. a texel on a static object vs. sampling multiple texels per pixel, which can cause undersampling and so aliasing).
And secondly, we could shade both mip levels that contribute to a pixel individually in TS; combining them for the pixel then removes the final source of temporal instability.

The TS term became more confusing with Turing, because NV uses the same name for their new feature. It gives the image quality improvements, and it allows shading only once for both views in VR.
But it does not allow decoupling shading from framerate, which is the largest promise of TS.
To make this possible we need to cache and reuse the results over multiple frames, and we need a unique texture for each surface, which breaks instancing.
Probably better to call the caching idea 'Object Space lighting' from now on?
 

Ahh, ok that last one works. Though the first should be mostly taken care of by proper filtering, even if it is a bit expensive. Not to say LEAN mapping would be perfect here, but to get full stability you'd need to supersample everything, otherwise you're still missing stuff like geometry information, differing shadowing and lighting, etc.

I do wonder if there's some clever way to take care of the second without vastly increasing the shading work as well. The mips are quantized but, I dunno, something something read from both and interpolate somehow, then shade once. Then again, the shading work should be quick as all the data should be incredibly coherent and readily available in cache.
 
Not exactly certain where to post this, but I thought it applies to DX.

A new PIX release came out today -- https://devblogs.microsoft.com/pix/pix-2004-27/

Notable changes and improvements include an improved Buffer Viewer, support for CPU memory allocation data and file I/O data in Timing Captures, a new pixtool flag to help automate grabbing an unknown number of captures, and full support for Sampler Feedback.

Full support for Sampler Feedback, including:
  • Added support for Sampler Feedback maps in Pixel History.
  • MIN_MIP feedback maps are now displayed correctly when accessing only one mip level.
  • Fixed issues when replaying MIN_MIP Sampler Feedback maps that would either cause StartAnalysis to fail or the feedback map to be in the incorrect resource state.
  • Fixed issues resolving MIP_REGION_USED feedback maps that had a full mip chain.
 
I do wonder if there's some clever way to take care of the second without vastly increasing the shading work as well. The mips are quantized but, I dunno, something something read from both and interpolate somehow, then shade once. Then again, the shading work should be quick as all the data should be incredibly coherent and readily available in cache.

Yeah, I guess if we go to shade multiple LODs we probably have better reasons than just IQ. E.g. if blending discrete LODs is easier than morphing them to get continuous LOD.
If we ignore geometry and only think about cached shading in texture, it could be easier, over 5 frames, to shade one mip 0 texel of a 2x2 quad per frame, and in the fifth frame shade the single corresponding mip 1 texel. That's a stupid example, but it illustrates that 'shading twice' does not necessarily mean doubling the cost. It could be the same +25% as with the memory requirements, or less, and it could end up cheaper than the interpolation you have in mind.
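Spelling out the arithmetic behind that example (nothing new, just making the +25% explicit):

```cpp
// A 2x2 block of mip 0 texels shares a single mip 1 texel.
constexpr int   kMip0Texels     = 4;
constexpr int   kMip1Texels     = 1;
constexpr int   kTotalTexels    = kMip0Texels + kMip1Texels;                 // 5
constexpr float kRelativeCost   = float(kTotalTexels) / float(kMip0Texels);  // 1.25 -> +25%
constexpr int   kTexelsPerFrame = kTotalTexels / 5;                          // 1 texel/frame over 5 frames
```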

The larger and real problem is the geometry. Because we do not use voxels, it's not possible to adapt geometry detail as simply as texture mips do. Only after working on geometry LOD for quite some time did I realize the concept of mip mapping is totally flawed.
The issue is: if we want fine LOD transitions for geometry, we end up requiring different geometry for each detail level, otherwise the whole LOD idea makes no sense. We want to close small holes, fuse disjoint stuff together, remove noise, etc.
So the topology of the geometry has to change just as often as mip levels of a texture do. Different geometry means inconsistent UV maps, and mip maps no longer work at all.
(Ofc. we could optimize to break mip mapping only where and if topology actually changes (which is usually rare), but let's ignore this and assume the worst case.)

Even if we finally had such fine-grained geometric LOD which breaks UVs, mips would still help with filtering. So if we imagine blending 2 LODs of geometry smoothly in whatever way, each LOD would ideally have 2 mips so the texture is well filtered.
So if we had this awesome TS caching engine, we might end up needing to shade 4 textures everywhere, not just 2.
It's clear that even with TS we are still interested in 'on the fly' filtering methods like LEAN mapping, or doing it all stochastically with TAA, and whatever else...
 
Thinking about the above, I realize the filtering could be cached as well. If we keep an additional copy of the unshaded texture, we could update blocks of it even less often than the shading happens, and on update apply anisotropic filtering depending on the current camera.
The shading stage then has prefiltered texture available, and for rendering a simple bilinear lookup would be enough.
 
Yar, discontinuous meshes are totally undoable without some weird fancy stuff like microflake voxels; otherwise you need a minimum complexity of the model to make it continuous, which isn't really possible for stuff like trees. Still, volumetric impostors already work well for distant complex objects, and I'm sure they'll get better and better.

But for continuous meshes LEADR mapping does a solid job. Differing LOD levels would still be a problem, but one might imagine some scheme wherein you LEADR map the lower LOD level until it looks as much like the transition point between the two as possible. As far as a single LOD level goes though, the results can feel invisible and could probably be done in realtime on the upcoming consoles (and though LEADR mapping isn't energy preserving, some hack or two might be enough). The results wouldn't be perfect, but possibly quite good for quite a few years.


 
Yeah, agreed, and those are some nice resources.
What's missing is the bridge between those two examples: reducing the geometric detail of the dino instead of increasing it to show high-frequency details.
I remember only this single paper where they tried a complete LOD solution for this: http://multires.caltech.edu/pubs/hybrid.pdf
The difference from usual polygon reduction is the use of quadrangulation to achieve seamless texture UVs, so displacement mapping remains possible.
Though, the paper is basically a report of failure. They still need manual work for the base mesh and 3 hours of processing for that single Buddha model.
This is really hard, but unfortunately I have similar requirements for an efficient mapping from my surfel GI to visible geometry.
I'm almost done with this finally, and if I can get the tools fast enough to process fine details and large worlds, I can use it for visual LOD and TS as well, so I might give it a try with little extra work if I'm lucky.

Not sure yet how to render a smooth transition between discrete LODs that differ in topology and UVs.
Morphing patches of mesh would be an option, but very complicated.
Someone here pointed out the idea of using screen door transparency (milk or jlippo... thanks for that! :) ). This would be easy and maybe even faster, although it needs to render twice.
Not sure how messy shadow maps become with this, but... until I get there, SMs are probably replaced by RT, I guess :)

I also have some rough ideas of how DXR 1.0 without traversal shaders could work with discrete LODs, but I'm not sure yet.
Probably it works to displace a ray from the surface, similar to how displacement mapping works. It could use the same data, and could prevent self-intersection issues to hide hard discrete transitions of LOD.
 
Is MS adding any VR-centric features to DirectX?

Like all the various viewport-related items back when the Nvidia 1060/1070/1080 series was released?
 
With programmable traversal you can alter the ray without having a hit.

This Nvidia patent describes more fine-grained control over BVH traversal, with LOD as an example use case. Not exposed via DXR, of course.

Query-specific behavioral modification of tree traversal

https://patents.google.com/patent/US20200051315A1/en

this capability can be used by applications to dynamically and/or selectively enable/disable sets of objects for intersection testing versus specific sets or groups of queries, for example, to allow for different versions of models to be used when game state changes (for example, when doors open or close) or to provide different versions of a model which are selected as a function of the length of the ray to realize a form of geometric level of detail, or to allow specific sets of objects from certain classes of rays to make some layers visible or invisible in specific views. This capability also allows changing alpha to opaque and vice versa. The mode flags may enable traversal behavior to be changed in accordance with such aspects as, for example, a depth (or distance) associated with each bounding volume and/or primitive, size of a bounding volume or primitive in relation to a distance from the origin or the ray, particular instances of an object, etc.

The patent helpfully references a few other recent nvidia patents on details of their hardware RT implementation.
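Not what the patent describes, but for reference, the closest knob DXR exposes today is the 8-bit instance mask: you can put each LOD of a model in its own TLAS instance and let each ray pick a set via its InstanceInclusionMask. A rough sketch of the C++ side (BuildLodInstances and the one-bit-per-LOD scheme are just my own illustration):

```cpp
#include <d3d12.h>
#include <vector>

std::vector<D3D12_RAYTRACING_INSTANCE_DESC> BuildLodInstances(
    const std::vector<D3D12_GPU_VIRTUAL_ADDRESS>& lodBlas)   // one BLAS per LOD level
{
    std::vector<D3D12_RAYTRACING_INSTANCE_DESC> instances;
    for (size_t lod = 0; lod < lodBlas.size() && lod < 8; ++lod) {
        D3D12_RAYTRACING_INSTANCE_DESC desc = {};
        desc.Transform[0][0] = desc.Transform[1][1] = desc.Transform[2][2] = 1.0f;  // identity
        desc.InstanceID = static_cast<UINT>(lod);
        desc.InstanceMask = 1u << static_cast<UINT>(lod);    // rays select a LOD via their inclusion mask
        desc.AccelerationStructure = lodBlas[lod];
        instances.push_back(desc);
    }
    return instances;
}
// The ray-side selection happens via the InstanceInclusionMask passed to
// TraceRay / RayQuery in HLSL. Unlike the patent's mode flags, the mask is
// fixed per ray at launch, so "LOD as a function of ray length" cannot
// change mid-traversal.
```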
 