Direct3D feature levels discussion

NVIDIA said they are working with Microsoft to standardize Mega Geometry, maybe DXR1.3?

He also emphasized that since this mega geometry is a software technology, it can be supported by all RTX graphics cards going back to the RTX 20 series. He said that this is not simply a technology exclusive to Nvidia, but that they are working with Microsoft to standardize it.
 
I'm always pushing for multiple options.
But I do wonder why MS doesn't move DX to legacy and just go all in on Vulkan.
Moving forward, what are they actually getting out of whatever work they put in?

I do wonder if they are having those conversations now, with the multi-platform push etc.
That would remove another control they have over PC gaming... and only the anti-cheat kernel driver would remain (which a lot of people either don't want or don't care about at all).
 
NVIDIA said they are working with Microsoft to standardize Mega Geometry, maybe DXR1.3?


He also emphasized that since this mega geometry is a software technology, it can be supported by all RTX graphics cards going back to the RTX 20 series. He said that this is not simply a technology exclusive to Nvidia, but that they are working with Microsoft to standardize it.


if it "can" be supported in the same way that would be added for sure, RTX20 and beyond has a big share of currently working gaming pcs.
 
That would remove another control they have over PC gaming... and only the anti-cheat kernel driver would remain (which a lot of people either don't want or don't care about at all).
What control does it actually give them, though?
I could understand it in the past. I'm not sure what they get from the investment, and looking from the outside in I don't see them doing much to push things forward.

Take that investment and work on the Windows game UI instead.
 
I'm always pushing for multiple options.
But I do wonder why MS doesn't move DX to legacy and just go all in on Vulkan.
Moving forward, what are they actually getting out of whatever work they put in?

I do wonder if they are having those conversations now, with the multi-platform push etc.
Microsoft can add new features to DirectX on its own; adding features to Vulkan requires going through the Khronos Group. DX12 is a clean API focused on Windows PCs and Xbox, while Vulkan has to support everything from consoles to mobile to embedded devices to PCs. PlayStation and Apple have their own graphics APIs, so there's no reason Microsoft should get rid of theirs.

And I think it goes without saying that the DirectX 12 API team does not work on UI, so ditching DirectX 12 would do nothing to improve the Windows UI. Microsoft is perfectly capable of improving Windows UI/UX without abandoning core Windows features; if it does ditch Windows features and lay off the teams responsible for them, that would be a pure cost-cutting decision, and that money would not be redirected to improving anything else.

Also, the direction WebGPU is going in seems to be the final nail in the coffin for any hope that Vulkan could become the universal graphics API to replace all others.
 
Also, the direction WebGPU is going in seems to be the final nail in the coffin for any hope that Vulkan could become the universal graphics API to replace all others.
For now my post may be premature, but I hope you realize that just because WebGPU opted for a textual shading language like WGSL over a binary representation like SPIR-V doesn't mean Vulkan can't be a 'universal' graphics API. Why else would Microsoft choose to adopt SPIR-V, or put in the effort to create a Vulkan driver, if that weren't the case?

Also, if Microsoft is planning to "significantly diverge" DirectX SPIR-V from Khronos' Vulkan SPIR-V, what exactly would Microsoft or IHVs gain from moving away from DXIL? Reduced redundancy? (No, since vendors would effectively have to write two separate compilers depending on the SPIR-V 'variant' in question, and one of OpenCL's fundamental problems would have been resolved!) Cross-platform development? (No, because any developer planning to port their application between PC (DirectX SPIR-V) and mobile (Vulkan SPIR-V) devices will need to meet BOTH sets of additional constraints!) Having more vendors like Qualcomm participate in DirectX? (Well, no, that just dissuades other potential HW vendors from cooperating because they can't implement Microsoft's higher standards.) Does Microsoft ultimately have any intention of heavily fracturing the SPIR-V standard?

As the biggest opponent of using SPIR-V for WebGPU's binary representation, what does Apple do next with their Game Porting Toolkit (D3D/Metal translation layer)? Do they effectively grind GPTK development to a halt just to keep spiting SPIR-V, and to a lesser extent Khronos, even if it means making their own users worse off in the end? What other choice does Apple really have besides NOT implementing a SPIR-V compiler for Shader Model 7 (at which point SPIR-V becomes the only supported IR for D3D)?
 
The entire DirectX State of the Union presentation from GDC 2025.

-Work Graphs Mesh Nodes are getting new updates related to procedural generation of geometry and continuous level of detail.
-DXR 1.2 releasing at the end of April:
  • Support for Opacity Micromaps, for up to 2.3x performance gains in path-traced titles.
  • Support for Shader Execution Reordering, for up to 2x gains in path-traced titles.
  • In Alan Wake 2, both improved performance by 30% on average.
  • Hardware support is only on NVIDIA RTX for now; a runtime capability-check sketch follows below.
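For anyone wiring this up in an engine, the usual pattern is a one-time capability check at device creation. Here's a minimal sketch, assuming an existing ID3D12Device*; the specific DXR 1.2 bits (OMM/SER) are reported through newer options structs that ship with the Agility SDK, so only the baseline raytracing tier is queried here:

```cpp
#include <windows.h>
#include <d3d12.h>

// Minimal sketch: query the device's raytracing tier before enabling any
// path-traced features. Error handling is reduced to a boolean.
bool SupportsRaytracingTier11(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5,
                                           &options5, sizeof(options5))))
        return false;

    // DXR 1.1 (inline raytracing) is the current baseline; the DXR 1.2
    // capability bits for opacity micromaps and shader execution reordering
    // are exposed through newer options structs in the Agility SDK preview
    // and are not queried in this sketch.
    return options5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_1;
}
```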
 
Should we think of work graphs as a UE6 era technology that will leave lots of hardware behind? Or will devs build separate work graph and non-work-graph implementations of their engines?
 
If by "UE6 era" you mean "10th-console generation" then yes. It seems most studios/publishers with their own engine make major changes approximately once per console generation, so we're not going to see much usage of work graphs until the 10th-gen arrives. Once studios are willing to drop 9th-gen consoles, they'll be willing to drop support for PC GPUs without work graphs as well. Studios that upgraded their engines for the current-gen already dropped support for non-DX12U hardware. Only the first-generation of DX12U hardware lacks work graph support (Ampere and RDNA3 support it, I can't find anything on Intel but I would be surprised if Battlemage never gets support), so the amount of HW that would be supported before mandatory work graph implementation but dropped because of it is not too great. By that time, the RTX 20XX and RX 6XXX series cards will already be obsolete by overall performance anyways.

For Unreal Engine specifically, UE5 might get optional support considering Epic was one of the champions for work graphs and touted their usefulness with Nanite. Nanite already has optional mesh shader support, adding work graph support without making it mandatory shouldn't be too hard.

The one wrinkle is the mesh nodes extension. We don't know when it will be released, what hardware will support it, how many developers will implement work graphs without it, and how many won't bother with work graphs at all without mesh nodes.
 
Does the RTX 50-series support Mesh Nodes, or is it in the same boat as all previous RTX GPUs?

I assume all of them will receive support via a driver update in the near future.
 
Should we think of work graphs as a UE6 era technology that will leave lots of hardware behind? Or will devs build separate work graph and non-work-graph implementations of their engines?
Epic Games already uses ExecuteIndirect as their 'fallback' implementation (lock-free broker work queue) for Nanite on PC, compared to persistent threads/cooperative dispatch (lock-based circular work queue/ring buffer) on consoles. One of the intentions behind work graphs is to indirectly expose a scheduling mechanism that enables lock-based algorithms on the GPU. If we need a set of waves to cooperate with each other, we use locks for this purpose, but the danger of programming locks on the GPU is that we don't have any "forward progress guarantees" that the GPU wave scheduler will keep scheduling work for all waves after the lock/mutex is acquired. If the wave holding the lock is never scheduled again while the other waves sit waiting for the lock to eventually be released, the program can't make any further progress: that condition is a 'deadlock'. If work is scheduled for all waves/threads, the lock will eventually be released, hence the implicit forward progress guarantee ...

Lock-free algorithms can exhibit indefinite blocking (starvation) due either to contended access to a shared resource from other waves, or to waves not being given enough execution resources. Spinlocks with a single cooperative indirect dispatch can be faster (assuming a 'fair' scheduler) than barriers between indirect dispatches if the statistical distribution of execution times of these jobs/tasks in the latter scenario is right-skewed ...
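To make the failure mode concrete, here is a plain C++ analogy of the spin-acquire pattern being described (a GPU version would spin on an InterlockedCompareExchange against a buffer instead); the retry loop is exactly where execution hangs if the scheduler never runs the lock owner again:

```cpp
#include <atomic>

// CPU-side analogy of a spinlock; assumes a fair scheduler that eventually
// runs the thread holding the lock. Without that forward progress guarantee
// (as on a GPU wave scheduler), the acquire loop can spin forever: deadlock.
struct SpinLock {
    std::atomic<int> state{0};   // 0 = free, 1 = held

    void acquire() {
        int expected = 0;
        // Keep retrying until the compare-exchange wins. Progress depends
        // entirely on the current owner being scheduled so it can release.
        while (!state.compare_exchange_weak(expected, 1,
                                            std::memory_order_acquire)) {
            expected = 0;
        }
    }

    void release() {
        state.store(0, std::memory_order_release);
    }
};
```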

If developers do need to rely on the GPU-side PSO switching functionality of mesh nodes, there are very few good options for working around the lack of that capability. You can do PSO switching on the CPU, but that sort of defeats the point of *GPU-driven* rendering since you're bringing render logic back to the CPU, and you're more likely to trigger the driver compiler when applying render state changes (it always results in compilation with unique PSOs), whereas with work graphs we can let the driver exploit more optimal paths for render state changes to reduce the CPU-side hitching/spiking from generating new PSOs ...
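A rough sketch of what that CPU-side workaround looks like in practice: one SetPipelineState + ExecuteIndirect pair recorded per material bucket. The Bucket struct and its fields are hypothetical; the D3D12 calls are the standard ones:

```cpp
#include <d3d12.h>
#include <vector>

// Hypothetical per-material bucket produced by a GPU culling/classification pass.
struct Bucket {
    ID3D12PipelineState* pso;      // pipeline for this material/render state
    ID3D12Resource*      args;     // indirect draw argument records
    ID3D12Resource*      count;    // GPU-written draw count
    UINT                 maxDraws; // upper bound on draws in this bucket
};

// CPU-side PSO switching: the pipeline choice lives on the CPU timeline even
// though the draw counts are GPU-driven, which is the compromise described above.
void RecordBuckets(ID3D12GraphicsCommandList* cmdList,
                   ID3D12CommandSignature* signature,
                   const std::vector<Bucket>& buckets)
{
    for (const Bucket& b : buckets) {
        cmdList->SetPipelineState(b.pso);                 // CPU-side state change
        cmdList->ExecuteIndirect(signature, b.maxDraws,
                                 b.args, 0, b.count, 0);  // GPU decides the count
    }
}
```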

Looking back now, I guess we really didn't need to make the D3D12 API design more complex to support multi-threading for pushing higher draw counts, if the holy grail is now to move all render logic to the GPU and render the entire scene with a single indirect command ... (CPU overhead with respect to render logic becomes irrelevant)
 
Looking back now, I guess we really didn't need to make the D3D12 API design more complex to support multi-threading for pushing higher draw counts, if the holy grail is now to move all render logic to the GPU and render the entire scene with a single indirect command ... (CPU overhead with respect to render logic becomes irrelevant)
This is the great irony of DX12. It started out with tech demos like this that boasted of how many draw calls the CPU could dispatch compared to DX11, and now we have work graph demos that seek to minimize the number of CPU dispatches.

Were there any technical barriers at the time DX12 was being designed that would have obstructed something like work graphs, or was the idea simply not thought of yet?
 
Were there any technical barriers at the time DX12 was being designed that would have obstructed something like work graphs, or was the idea simply not thought of yet?
At least from what's known of AMD's side, it appears they felt they needed major enhancements to their command processor/scheduling units: they've only enabled it on RDNA 3, and *even then* only while Hardware-Accelerated Scheduling is enabled (which turns on the ability they added for it to actually run multiple concurrent queues and assign one to each process; in their profiling tools it shows up as 'MES HWS', where 'MES' is the 'micro engine scheduler' that was only added in RDNA but has been enhanced each generation).

While there's *probably* some way you could have done something like it earlier (console games have been doing persistent-threads shenanigans for a while), that approach likely doesn't cooperate well with a more general-purpose OS.
 
This is the great irony of DX12. It started out with tech demos like this that boasted of how many draw calls the CPU could dispatch compared to DX11, and now we have work graph demos that seek to minimize the number of CPU dispatches.
D3D12 has a design philosophy of there being "multiple ways" to achieve whatever it is you want, with differing sets of tradeoffs for each method (depending on the hardware in question as well), so it's inevitable that you'll have a subset of APIs "competing against each other" and eventually 'abandoned' APIs too, like the legacy geometry pipeline, multi-adapter, render passes, etc. ...
Were there any technical barriers at the time DX12 was being designed that would have obstructed something like work graphs, or was the idea simply not thought of yet?
Considering the major vendors had many implementation complexities with the "predecessor to work graphs" (ExecuteIndirect), like AMD hitting a slow path when changing shader input bindings on pre-RDNA designs, NV having implicit barriers when switching between graphics draws and compute dispatches (they call it a subchannel switch in their HW; it posed a severe issue up until Turing), and Intel emulating the GPU-side draw count parameter for a long time (until the release of Battlemage?) by implementing a loop to process more than one draw per indirect command, how could hardware vendors have realistically implemented a MORE powerful API design such as feature-complete work graphs with mesh nodes (let alone supported a PSO-swapping extension for ExecuteIndirect) performantly enough on their hardware designs at the time?
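For reference, the GPU-side draw count being discussed is the count-buffer parameter of ExecuteIndirect. A minimal sketch of the setup, with resource creation and error handling mostly omitted and the buffer names hypothetical:

```cpp
#include <windows.h>
#include <d3d12.h>

// Minimal sketch: a command signature whose records are plain indexed draws,
// executed with a GPU-written count buffer.
void RecordGpuDrivenDraws(ID3D12Device* device,
                          ID3D12GraphicsCommandList* cmdList,
                          ID3D12Resource* argumentBuffer, // D3D12_DRAW_INDEXED_ARGUMENTS records
                          ID3D12Resource* countBuffer,    // UINT draw count written by a culling pass
                          UINT maxDraws)
{
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs = &arg;

    ID3D12CommandSignature* signature = nullptr;
    if (FAILED(device->CreateCommandSignature(&sigDesc, nullptr,
                                              IID_PPV_ARGS(&signature))))
        return;

    // The runtime reads the actual draw count from countBuffer on the GPU;
    // this is the parameter some hardware ended up emulating with a loop.
    cmdList->ExecuteIndirect(signature, maxDraws,
                             argumentBuffer, 0,
                             countBuffer, 0);

    signature->Release();
}
```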
 