Direct3D feature levels discussion

Correct me if I am wrong, but doesn't their result align with NVIDIA's? At bin sizes 6 to 13 the Work Graph methods are slower than multi-pass ExecuteIndirect; only at 14 bins does the Work Graph carve out a ~20% win, and at 15 bins that drops to ~5%.
Both samples do very different things ...

Nvidia's sample implements a multi-BRDF deferred shading model. It starts with a broadcasting root node that performs tiled light culling and outputs a deferred shading record. Each of the broadcasting leaf nodes implements its own specialized BRDF shading model and takes the deferred shading records generated by the tiled light culling pass to compute the material colour ...
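For reference, that topology maps onto work graphs HLSL roughly like the sketch below. This is not NVIDIA's actual sample code; the record fields, node names, and tile/BRDF counts are all illustrative:

```hlsl
// Sketch of the root node: one thread group per screen tile culls the
// light list, then emits a single record to whichever specialized BRDF
// leaf node matches the tile. All names/fields here are illustrative.
#define BRDF_COUNT 4

struct TileRecord
{
    uint3 dispatchGrid : SV_DispatchGrid; // tile grid, filled in by the CPU entry record
};

struct DeferredShadingRecord
{
    uint2 tileCoord;
    uint  lightCount;
};

[Shader("node")]
[NodeLaunch("broadcasting")]
[NodeIsProgramEntry]
[NodeMaxDispatchGrid(256, 256, 1)]
[NumThreads(8, 8, 1)]
void TiledLightCulling(
    DispatchNodeInputRecord<TileRecord> input,
    [MaxRecords(1)] [NodeArraySize(BRDF_COUNT)]
    NodeOutputArray<DeferredShadingRecord> shadeTile, // one entry per BRDF leaf node
    uint3 groupId : SV_GroupID,
    uint  groupIndex : SV_GroupIndex)
{
    // ... cull the light list against this 8x8 tile ...
    uint brdfIndex = 0; // assumption: a dominant BRDF is classified per tile

    // One record per tile: only lane 0 asks for a record, but the call
    // itself has to be made group-uniformly.
    ThreadNodeOutputRecords<DeferredShadingRecord> rec =
        shadeTile[brdfIndex].GetThreadNodeOutputRecords(groupIndex == 0 ? 1 : 0);
    if (groupIndex == 0)
    {
        rec.Get().tileCoord  = groupId.xy;
        rec.Get().lightCount = 0; // would be filled in by the culling loop
    }
    rec.OutputComplete();
}
```

Each BRDF leaf node would then be a broadcasting node consuming a DeferredShadingRecord for its tile.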

AMD's sample implements a compute rasterizer. It starts by sending a workload record to the broadcasting root node, which does vertex shading and triangle bounding-box computation and outputs both split records and rasterization records. The split records are used as inputs for broadcasting nodes (one specialized for small and one for large bounding boxes) that do hierarchical bounding-box subdivision/culling, and both of these output their own rasterization records. Lastly, there are thread leaf nodes, each of which takes a rasterization record and represents a different triangle bin size for scan conversion (rasterization). One notable property of thread launch nodes is that their implementation can operate as a sort of producer-consumer queue ...
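A leaf node of that kind might look roughly like this; a minimal sketch, not AMD's actual sample code, with a made-up record layout:

```hlsl
// Sketch of one per-bin thread leaf node: thread launch means one
// thread is started per input record, so the scheduler effectively
// drains the rasterization records like a producer/consumer queue.
struct RasterizeRecord
{
    uint  triangleIndex;
    uint2 tileOrigin; // top-left pixel of the bin this triangle landed in
};

[Shader("node")]
[NodeLaunch("thread")]
void RasterizeBin8x8(ThreadNodeInputRecord<RasterizeRecord> input)
{
    RasterizeRecord rec = input.Get();
    // ... scan-convert triangle rec.triangleIndex over the 8x8-pixel
    //     bin at rec.tileOrigin, writing out depth/colour via UAVs ...
}
```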

In the AMD sample there are two different work graph implementations of the compute rasterizer. One version does a dynamic dispatch for the nodes that perform hierarchical bounding-box subdivision/culling, and it is always slower than the multi-pass ExecuteIndirect implementation in this sample. The other is a fixed dispatch version, where they found that binning triangles into tiles of up to ~16K pixels is the fastest method in this case. The concept of binning in the Nvidia sample only applies to the light culling pass, where they split the image into screen-space tiles 32 pixels in size ...
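The dynamic vs. fixed dispatch distinction boils down to how a broadcasting node declares its dispatch grid. Sketched below with made-up record fields and node names:

```hlsl
// Dynamic dispatch: the producer writes the grid size into the record
// through SV_DispatchGrid, and the node only declares an upper bound.
struct SplitRecordDynamic
{
    uint3 dispatchGrid : SV_DispatchGrid; // chosen when the record is emitted
    uint  triangleIndex;
};

[Shader("node")]
[NodeLaunch("broadcasting")]
[NodeMaxDispatchGrid(1024, 1, 1)]
[NumThreads(64, 1, 1)]
void SplitDynamic(DispatchNodeInputRecord<SplitRecordDynamic> input)
{
    // ... subdivide/cull bounding boxes ...
}

// Fixed dispatch: the grid is baked into the shader, so the scheduler
// knows it up front and the record carries no grid field.
struct SplitRecordFixed
{
    uint triangleIndex;
};

[Shader("node")]
[NodeLaunch("broadcasting")]
[NodeDispatchGrid(4, 1, 1)]
[NumThreads(64, 1, 1)]
void SplitFixed(DispatchNodeInputRecord<SplitRecordFixed> input)
{
    // ... same work, constant-sized launch ...
}
```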
 
We should probably move that discussion to a dedicated thread? Although this one looks like a future tier for work graphs:
“Mesh nodes” extend work graphs by introducing a new kind of leaf node that drives a mesh shader, and which allows a normal graphics PSO to be referenced from the work graph. And yes, you did read this right – full PSO changing can now be done as well! The feature is called mesh nodes as it allows a work graph to feed directly into a mesh shader, turning the work graph itself into an amplification shader on steroids.
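Going by the preview documentation, a mesh node is declared like a mesh shader that takes a work graph record instead of an amplification-shader payload. A rough sketch, with illustrative names and sizes:

```hlsl
struct MeshDrawRecord
{
    uint3 dispatchGrid : SV_DispatchGrid;
    uint  meshletIndex;
};

struct MeshVertex
{
    float4 pos : SV_Position;
};

[Shader("node")]
[NodeLaunch("mesh")]
[NodeMaxDispatchGrid(65535, 1, 1)]
[NumThreads(128, 1, 1)]
[OutputTopology("triangle")]
void MeshNode(
    DispatchNodeInputRecord<MeshDrawRecord> input,
    uint gtid : SV_GroupIndex,
    out indices uint3 tris[64],
    out vertices MeshVertex verts[64])
{
    // Placeholder geometry; a real node would fetch meshlet
    // input.Get().meshletIndex. The rest of the graphics PSO (raster
    // state, pixel shader) comes from the program the node references.
    SetMeshOutputCounts(3, 1);
    if (gtid < 3)  verts[gtid].pos = float4(0, 0, 0, 1);
    if (gtid == 0) tris[0] = uint3(0, 1, 2);
}
```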
 
Xbox has also had 'GPU-driven' PSO switching for a long time via its own more advanced version of ExecuteIndirect, but it's nice to see work graphs following along with mesh nodes ...
 
Xbox has also had 'GPU-driven' PSO switching for a long time via its own more advanced version of ExecuteIndirect, but it's nice to see work graphs following along with mesh nodes ...
Do you see this as a standard feature eventually coming to DirectX down the line?
 
Do you see this as a standard feature eventually coming to DirectX down the line?
One of the earliest mentions of the functionality was in one of Graham Wihlidal's presentations ... (page 22)

Work Graphs might be a more portable hardware abstraction than an extended ExecuteIndirect API for realizing fully (single-API-call) GPU-driven renderers across IHVs. Work Graphs also provide a way to safely (no deadlocks) implement persistent threads, which is functionality unrelated to ExecuteIndirect ...
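One way to see the persistent-threads angle: a node can output records to itself, bounded by a declared recursion depth, which covers the producer/consumer loop you'd otherwise hand-roll with persistent threads. A minimal sketch with made-up names and an app-specific split test:

```hlsl
struct WorkItem
{
    uint payload;
    uint depth;
};

// A self-recursive thread node: consume one item, optionally produce
// two more for itself. The work graph scheduler handles backpressure,
// avoiding the deadlock hazards of hand-rolled persistent threads.
[Shader("node")]
[NodeLaunch("thread")]
[NodeMaxRecursionDepth(8)]
void ProducerConsumer(
    ThreadNodeInputRecord<WorkItem> input,
    [MaxRecords(2)] [NodeID("ProducerConsumer")] NodeOutput<WorkItem> self)
{
    WorkItem item = input.Get();
    // ... process item.payload ...

    bool split = item.depth < 3; // assumption: app-specific split test
    ThreadNodeOutputRecords<WorkItem> outRecs =
        self.GetThreadNodeOutputRecords(split ? 2 : 0);
    if (split)
    {
        outRecs[0].payload = item.payload * 2;
        outRecs[0].depth   = item.depth + 1;
        outRecs[1].payload = item.payload * 2 + 1;
        outRecs[1].depth   = item.depth + 1;
    }
    outRecs.OutputComplete();
}
```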

You can read more in the experimental Vulkan work graph extension proposal about why AMD passed up solutions #1 and #2 for similar reasons ...
 
The major difference is that the core processing of the super-resolution upscaler is performed on the GPU driver side. For processing that relies on GPU-specific functionality, it makes more sense to rely on the latest GPU driver than on the DirectSR runtime. For example, even for the same DLSS, the GeForce RTX 40/30/20 series use different generations of Tensor Cores to process DLSS, so it is clearly better to handle it in a driver suited to each GPU model and generation.

Whether the core part of the super-resolution upscaler is processed by common code for all GPUs on the DirectSR side, or on the GPU driver side, seems to vary with the mindset of each GPU manufacturer.
In any case, to reiterate, the game side can support super-resolution upscalers that depend on specific GPU functions simply by performing the necessary pre-processing and preparing the parameters. I would be glad to see it turn out that way.
So it's a driver-side-supplied "extension" of the base FSR2 option built into the DirectSR API.
The biggest issue which comes to mind immediately is the inability to use anything but a driver-provided upscaler.
So AMD users will be limited to the base FSR2 integrated into DirectSR, NV users to DLSS and FSR2, and Intel users to XeSS and FSR2.
AMD and Nvidia users lose the XeSS option in this scenario.

But it seems that development of DirectSR has just started, and it is a test for developers; the release date has not been determined.
However, the entire DirectX development team is said to be working on it, and Microsoft's GPU debugging tool PIX is also being made compatible with DirectSR.

The timing of DirectSR's availability has not yet been determined, but we should be able to see an actual demo with video at next year's GDC. I'm looking forward to it.
That seems like a really long development time for something so trivial.
Hopefully they'll notice the issue of limiting user options and think about how to solve it.
 
So it's a driver-side-supplied "extension" of the base FSR2 option built into the DirectSR API.
The biggest issue which comes to mind immediately is the inability to use anything but a driver-provided upscaler.
So AMD users will be limited to the base FSR2 integrated into DirectSR, NV users to DLSS and FSR2, and Intel users to XeSS and FSR2.
AMD and Nvidia users lose the XeSS option in this scenario.
Sounds a bit like how I thought it would work.
I haven't read the article, but in my scenario the game would be packaged with the DLLs, while being able to use the system-provided ones if available and a higher version.
That way you could use any of the upscalers supported by the hardware.
 
I'm reading that Shader Model 6.8 is required for work graphs. Is there any possibility of back-porting to older hardware? Which devices support this today?
 
I'm reading that Shader Model 6.8 is required for work graphs. Is there any possibility of back-porting to older hardware? Which devices support this today?
Work graphs are supported only on RDNA 3 and RTX 30/40 right now. I doubt that's because of the SM 6.8 support.
 
WCCFTech is trying to make it sound bigger than it really is, I think.
At least my understanding was that FSR2(.x.x) is just one of the included scalers, with at least MS's own scaler coming along at some point too. Meaning that even if your video card driver doesn't ship with any scaler (FSR for AMD, XeSS for Intel, DLSS for NVIDIA), you'll still get to pick FSR2 (or probably the MS one in future). The whole point of the API is that devs only have to support the API and it will guarantee compatibility with all scalers; it has nothing specifically to do with FSR.
 