Direct3D feature levels discussion

Correct me if I am wrong, but doesn't their result align with NVIDIA's? At bin sizes 6 to 13 the Work Graph methods are slower than multi-pass ExecuteIndirect; only at 14 bins does the Work Graph carve out a ~20% win, and at 15 bins that drops to ~5%.
Both samples do very different things ...

Nvidia's sample implements a multi-BRDF deferred shading model. It starts with a broadcasting root node that performs tiled light culling and outputs a deferred shading record. Each of the broadcasting leaf nodes implements its own specialized BRDF shading model and takes the deferred shading records generated by the tiled light culling pass to compute the material colour ...
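For reference, that topology maps onto work graphs HLSL roughly like the sketch below. This is not NVIDIA's actual sample code; the record fields, node names, and tile/BRDF counts are all illustrative:

```hlsl
// Sketch of the root node: one thread group per screen tile culls the
// light list, then emits a single record to whichever specialized BRDF
// leaf node matches the tile. All names/fields here are illustrative.
#define BRDF_COUNT 4

struct TileRecord
{
    uint3 dispatchGrid : SV_DispatchGrid; // tile grid, filled in by the CPU entry record
};

struct DeferredShadingRecord
{
    uint2 tileCoord;
    uint  lightCount;
};

[Shader("node")]
[NodeLaunch("broadcasting")]
[NodeIsProgramEntry]
[NodeMaxDispatchGrid(256, 256, 1)]
[NumThreads(8, 8, 1)]
void TiledLightCulling(
    DispatchNodeInputRecord<TileRecord> input,
    [MaxRecords(1)] [NodeArraySize(BRDF_COUNT)]
    NodeOutputArray<DeferredShadingRecord> shadeTile, // one entry per BRDF leaf node
    uint3 groupId : SV_GroupID,
    uint  groupIndex : SV_GroupIndex)
{
    // ... cull the light list against this 8x8 tile ...
    uint brdfIndex = 0; // assumption: a dominant BRDF is classified per tile

    // One record per tile: only lane 0 asks for a record, but the call
    // itself has to be made group-uniformly.
    ThreadNodeOutputRecords<DeferredShadingRecord> rec =
        shadeTile[brdfIndex].GetThreadNodeOutputRecords(groupIndex == 0 ? 1 : 0);
    if (groupIndex == 0)
    {
        rec.Get().tileCoord  = groupId.xy;
        rec.Get().lightCount = 0; // would be filled in by the culling loop
    }
    rec.OutputComplete();
}
```

Each BRDF leaf node would then be a broadcasting node consuming a DeferredShadingRecord for its tile.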

AMD's sample implements a compute rasterizer. It starts by sending a workload record to the broadcasting root node, which does vertex shading and triangle bounding-box computation and outputs both split records and rasterization records. The split records are used as inputs for broadcasting nodes (one specialized for small and one for large bounding boxes) that do hierarchical bounding-box subdivision/culling, and both of these output their own rasterization records. Lastly, there are thread leaf nodes, each of which takes a rasterization record and represents a different triangle bin size for scan conversion (rasterization). One notable property of thread launch nodes is that their implementation can operate as a sort of producer-consumer queue ...
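A leaf node of that kind might look roughly like this; a minimal sketch, not AMD's actual sample code, with a made-up record layout:

```hlsl
// Sketch of one per-bin thread leaf node: thread launch means one
// thread is started per input record, so the scheduler effectively
// drains the rasterization records like a producer/consumer queue.
struct RasterizeRecord
{
    uint  triangleIndex;
    uint2 tileOrigin; // top-left pixel of the bin this triangle landed in
};

[Shader("node")]
[NodeLaunch("thread")]
void RasterizeBin8x8(ThreadNodeInputRecord<RasterizeRecord> input)
{
    RasterizeRecord rec = input.Get();
    // ... scan-convert triangle rec.triangleIndex over the 8x8-pixel
    //     bin at rec.tileOrigin, writing out depth/colour via UAVs ...
}
```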

In the AMD sample there are two different work graph implementations of the compute rasterizer. One version does a dynamic dispatch for the nodes that perform hierarchical bounding-box subdivision/culling, and it is always slower than the multi-pass ExecuteIndirect implementation in this sample. The other is a fixed dispatch version, where they found that binning triangles into tiles of up to ~16K pixels is the fastest method in this case. The concept of binning in the Nvidia sample only applies to the light culling pass, where they split the image into screen-space tiles 32 pixels in size ...
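The dynamic vs. fixed dispatch distinction boils down to how a broadcasting node declares its dispatch grid. Sketched below with made-up record fields and node names:

```hlsl
// Dynamic dispatch: the producer writes the grid size into the record
// through SV_DispatchGrid, and the node only declares an upper bound.
struct SplitRecordDynamic
{
    uint3 dispatchGrid : SV_DispatchGrid; // chosen when the record is emitted
    uint  triangleIndex;
};

[Shader("node")]
[NodeLaunch("broadcasting")]
[NodeMaxDispatchGrid(1024, 1, 1)]
[NumThreads(64, 1, 1)]
void SplitDynamic(DispatchNodeInputRecord<SplitRecordDynamic> input)
{
    // ... subdivide/cull bounding boxes ...
}

// Fixed dispatch: the grid is baked into the shader, so the scheduler
// knows it up front and the record carries no grid field.
struct SplitRecordFixed
{
    uint triangleIndex;
};

[Shader("node")]
[NodeLaunch("broadcasting")]
[NodeDispatchGrid(4, 1, 1)]
[NumThreads(64, 1, 1)]
void SplitFixed(DispatchNodeInputRecord<SplitRecordFixed> input)
{
    // ... same work, constant-sized launch ...
}
```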
 
We should probably move that discussion to a dedicated thread? Although this one looks like a future tier for work graphs:
“Mesh nodes” extend work graphs by introducing a new kind of leaf node that drives a mesh shader, and which allows a normal graphics PSO to be referenced from the work graph. And yes, you did read this right – full PSO changing can now be done as well! The feature is called mesh nodes as it allows a work graph to feed directly into a mesh shader, turning the work graph itself into an amplification shader on steroids.
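Going by the preview documentation, a mesh node is declared like a mesh shader that takes a work graph record instead of an amplification-shader payload. A rough sketch, with illustrative names and sizes:

```hlsl
struct MeshDrawRecord
{
    uint3 dispatchGrid : SV_DispatchGrid;
    uint  meshletIndex;
};

struct MeshVertex
{
    float4 pos : SV_Position;
};

[Shader("node")]
[NodeLaunch("mesh")]
[NodeMaxDispatchGrid(65535, 1, 1)]
[NumThreads(128, 1, 1)]
[OutputTopology("triangle")]
void MeshNode(
    DispatchNodeInputRecord<MeshDrawRecord> input,
    uint gtid : SV_GroupIndex,
    out indices uint3 tris[64],
    out vertices MeshVertex verts[64])
{
    // Placeholder geometry; a real node would fetch meshlet
    // input.Get().meshletIndex. The rest of the graphics PSO (raster
    // state, pixel shader) comes from the program the node references.
    SetMeshOutputCounts(3, 1);
    if (gtid < 3)  verts[gtid].pos = float4(0, 0, 0, 1);
    if (gtid == 0) tris[0] = uint3(0, 1, 2);
}
```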
 
Xbox has also had 'GPU-driven' PSO switching for a long time via its own more advanced version of ExecuteIndirect, but it's nice to see work graphs following along with mesh nodes ...
 
Xbox has also had 'GPU-driven' PSO switching for a long time via its own more advanced version of ExecuteIndirect, but it's nice to see work graphs following along with mesh nodes ...
Do you see this as a standard feature eventually coming to DirectX down the line?
 
Do you see this as a standard feature eventually coming to DirectX down the line?
One of the earliest mentions of the functionality was in one of Graham Wihlidal's presentations ... (page 22)

Work Graphs might be a more portable hardware abstraction than an extended ExecuteIndirect API for realizing fully (single-API-call) GPU-driven renderers across IHVs. Work Graphs also provide a way to safely (no deadlocks) implement persistent threads, which is functionality unrelated to ExecuteIndirect ...
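One way to see the persistent-threads angle: a node can output records to itself, bounded by a declared recursion depth, which covers the producer/consumer loop you'd otherwise hand-roll with persistent threads. A minimal sketch with made-up names and an app-specific split test:

```hlsl
struct WorkItem
{
    uint payload;
    uint depth;
};

// A self-recursive thread node: consume one item, optionally produce
// two more for itself. The work graph scheduler handles backpressure,
// avoiding the deadlock hazards of hand-rolled persistent threads.
[Shader("node")]
[NodeLaunch("thread")]
[NodeMaxRecursionDepth(8)]
void ProducerConsumer(
    ThreadNodeInputRecord<WorkItem> input,
    [MaxRecords(2)] [NodeID("ProducerConsumer")] NodeOutput<WorkItem> self)
{
    WorkItem item = input.Get();
    // ... process item.payload ...

    bool split = item.depth < 3; // assumption: app-specific split test
    ThreadNodeOutputRecords<WorkItem> outRecs =
        self.GetThreadNodeOutputRecords(split ? 2 : 0);
    if (split)
    {
        outRecs[0].payload = item.payload * 2;
        outRecs[0].depth   = item.depth + 1;
        outRecs[1].payload = item.payload * 2 + 1;
        outRecs[1].depth   = item.depth + 1;
    }
    outRecs.OutputComplete();
}
```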

You can read more in the experimental Vulkan work graph extension proposal about why AMD passed up solutions #1 and #2 for similar reasons ...
 
The major difference is that the core processing of the super-resolution upscaler is performed on the GPU driver side. For processing that relies on GPU-specific functionality, it makes more sense to rely on the latest GPU driver than on the DirectSR runtime. For example, even for the same DLSS, the GeForce RTX 40/30/20 series use different generations of Tensor Cores to process DLSS, so it is clearly better to handle it in a driver suited to each GPU model and generation.

Whether the core part of the super-resolution upscaler is processed by common code for all GPUs on the DirectSR side, or on the GPU driver side, seems to vary with the mindset of each GPU manufacturer.
In any case, to reiterate, the game side can support super-resolution upscalers that depend on specific GPU functions simply by performing the necessary pre-processing and preparing the parameters. I would be glad to see it turn out that way.
So it's a driver-side-supplied "extension" of the base FSR2 option built into the DirectSR API.
The biggest issue which comes to mind immediately is the inability to use anything but a driver-provided upscaler.
So AMD users will be limited to the base FSR2 integrated into DirectSR, NV users to DLSS and FSR2, and Intel users to XeSS and FSR2.
AMD and Nvidia users lose the XeSS option in this scenario.

But it seems that development of DirectSR has just started, and it is a test for developers; the release date has not been determined.
However, the entire DirectX development team is said to be working on it, and Microsoft's GPU debugging tool PIX is also being made compatible with DirectSR.

The timing of DirectSR's availability has not yet been determined, but we should be able to see an actual demo with video at next year's GDC. I'm looking forward to it.
That seems like a really long development time for something so trivial.
Hopefully they'll notice the issue of limiting user options and think about how to solve it.
 
So it's a driver-side-supplied "extension" of the base FSR2 option built into the DirectSR API.
The biggest issue which comes to mind immediately is the inability to use anything but a driver-provided upscaler.
So AMD users will be limited to the base FSR2 integrated into DirectSR, NV users to DLSS and FSR2, and Intel users to XeSS and FSR2.
AMD and Nvidia users lose the XeSS option in this scenario.
Sounds a bit like how I thought it would work.
I haven't read the article, but in my scenario the game would be packaged with the DLLs, while being able to use the system-provided ones if available and a higher version.
That way you could use any of the upscalers supported by the hardware.
 
I'm reading that Shader Model 6.8 is required for work graphs. Is there any possibility of back-porting to older hardware? Which devices support this today?
 
I'm reading that Shader Model 6.8 is required for work graphs. Is there any possibility of back-porting to older hardware? Which devices support this today?
Work graphs are supported only on RDNA 3 and RTX 30/40 right now. I doubt that's because of the SM 6.8 support.
 
WCCFTech is trying to make it sound bigger than it really is, I think.
At least my understanding was that FSR2(.x.x) is just one of the included scalers, with at least MS's own scaler coming along at some point too. Meaning that even if your video card driver doesn't ship with any scaler (FSR for AMD, XeSS for Intel, DLSS for NVIDIA), you'll still get to pick FSR2 (or probably the MS one in future). The whole point of the API is that devs only have to support the API and it will guarantee compatibility with all scalers; it has nothing specifically to do with FSR.
 