Direct3D feature levels discussion

Would Intel want to completely hand over XeSS at that level to MS as in ceding all control of it going forward?
Why would Intel need to do that? AMD doesn't "cede control" of FSR2; they even continue improving it in 3.1, presumably beyond the 2.2.2 spec which went into the DXSR "standard implementation". Nothing stops MS from licensing and adding the current XeSS DP4a version in exactly the same way; the only difference would be the licensing model.

Let's look at Nvidia, for instance: does Nvidia want to push DLSS as the base model?
Nvidia would want that, but they'd need to make DLSS compatible with other h/w first, and the choice of which they want more is theirs to make.

One thing to note in all this is that DXSR won't solve the problem of developers still using IHVs' APIs, because a) there's still FG, which isn't a part of that and will still require an IHV-provided SDK implementation, and b) a new upscaler which won't fit into the DXSR API (say it would need a different set of inputs from the game) will go the same route DLSS/XeSS/FSR took previously. Trying to sell DXSR as if it will solve all these issues is misleading. And that's not even mentioning the fact that there are these pesky Vulkan and Linux.
 
FSR2 is the most widely supported. While technically the XeSS DP4a edition runs on most cards, it also comes with a pretty decent performance penalty in my experience.

My hot take (at least, in hardware-enthusiast circles): FSR2 isn't THAT bad, and 95% of the time I forget I'm running it vs something like DLSS. I actually tested this on myself today in Warzone: I forgot I had switched from DLSS to FSR2 to try it out. The only way I eventually remembered was that the game doesn't expose a sharpening slider for FSR2, so I noticed my image seemed more sharpened than usual (I usually run DLSS with 0 sharpening). Granted, this is at 4K Performance on a TV about 6 feet away, so your mileage may vary if you are running 1440p on a monitor less than a foot away.

FSR 2 is widely supported because it is the most outdated one. Just because something runs on a 10-year-old GPU doesn't make it useful for the future. Compute denoisers run on anything, too, and yet Ray Reconstruction is superior and has better optimization potential. Within 6 months RR has beaten state-of-the-art denoisers.
 
One thing to note in all this is that DXSR won't solve the problem of developers still using IHVs' APIs, because a) there's still FG, which isn't a part of that and will still require an IHV-provided SDK implementation, and b) a new upscaler which won't fit into the DXSR API (say it would need a different set of inputs from the game) will go the same route DLSS/XeSS/FSR took previously. Trying to sell DXSR as if it will solve all these issues is misleading. And that's not even mentioning the fact that there are these pesky Vulkan and Linux.

That’s true, but you gotta start somewhere. DXR 1.0 didn’t solve every problem either. At least with DXSR there is an avenue to introduce improvements to the common API if the IHVs play nice with each other.
 
FSR 2 is widely supported because it is the most outdated one. Just because something runs on a 10-year-old GPU doesn't make it useful for the future. Compute denoisers run on anything, too, and yet Ray Reconstruction is superior and has better optimization potential. Within 6 months RR has beaten state-of-the-art denoisers.
Something being outdated doesn't make it useless. It is good baseline tech for those that don’t use Nvidia, as you can essentially use it on everything, including Xbox.

Something being superior doesn’t make it the default if it doesn’t have broad hardware compatibility.
 
Interesting. FSR is the default implementation, but the API doesn’t mandate all of the inputs that FSR needs to work effectively. Weird setup.
 
Interesting. FSR is the default implementation, but the API doesn’t mandate all of the inputs that FSR needs to work effectively. Weird setup.
It doesn't mandate what FSR doesn't mandate.
Not every game will benefit from the optional inputs and it's for the devs to make use of them when it's worth it.
 
It doesn't mandate what FSR doesn't mandate.
Not every game will benefit from the optional inputs and it's for the devs to make use of them when it's worth it.

Yes but that means there’s a risk that AMD will still need to nudge/help devs to do things “properly”. Not ideal. Nvidia and Intel will care less because their stuff will work with just the mandatory inputs.
 
Yes but that means there’s a risk that AMD will still need to nudge/help devs to do things “properly”. Not ideal. Nvidia and Intel will care less because their stuff will work with just the mandatory inputs.

AMD can likely mitigate this with Xbox. Going forward, devs are likely just going to put FSR in their Xbox games.
 
Yes but that means there’s a risk that AMD will still need to nudge/help devs to do things “properly”. Not ideal. Nvidia and Intel will care less because their stuff will work with just the mandatory inputs.
All of them have optional parameters, and they can be implemented badly even if they didn't.
If you don't need something but it's not optional, then the dev sending in crap won't be helpful.
I think the article's wording around that is pretty bad, as it makes it sound like it's a bad implementation if you don't use every parameter, regardless of whether it's required or not.
 
All of them have optional parameters, and they can be implemented badly even if they didn't.
If you don't need something but it's not optional, then the dev sending in crap won't be helpful.
I think the article's wording around that is pretty bad, as it makes it sound like it's a bad implementation if you don't use every parameter, regardless of whether it's required or not.

Which optional parameters are used by DLSS and XeSS?
 
Hi,
just a question for DmitryKo and NPU people:

seeing https://forum.beyond3d.com/threads/direct3d-feature-levels-discussion.56575/page-70#post-2331017
and also
seeing in https://learn.microsoft.com/en-us/windows/win32/direct3d12/core-feature-levels:
"The overall driver model for compute-only devices is the Microsoft Compute Driver Model (MCDM)"
and compute-only device == MCDM device,
shown as the new D3D12 feature level D3D_FEATURE_LEVEL_1_0_CORE

EDIT: I also now see a new DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML in addition to DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE, and also a D3D_FEATURE_LEVEL_1_0_GENERIC
here: https://github.com/microsoft/DirectML/commit/0bd9f4f0c7775a77de8104abdb66ce1f8515f30d
so I don't know whether NPUs are D3D_FEATURE_LEVEL_1_0_CORE or D3D_FEATURE_LEVEL_1_0_GENERIC, or which of the two has more or fewer restrictions..

I have two questions..
1) DmitryKo, can you update your D3D12CheckFeatureSupport tool to run on NPUs? i.e. report Direct3D core compute device information..
in case it's already supported, can somebody with a Meteor Lake share D3D12CheckFeatureSupport logs?

also interested in what metacommands they expose vs current GPUs..
it would also be nice if they exposed WMMA (tensor-core ops) metacommands for DirectML..

2) it seems interesting whether NPUs can run general "compute shader" D3D12-only apps.. if yes, is it time for a DX12peak benchmark like clpeak or vkpeak?
interested in seeing perf of NPUs on clpeak via CLon12 or vkpeak via Dozen (CLon12 and Dozen might need some changes/massaging to support D3D_FEATURE_LEVEL_1_0_CORE)..

i.e. are these devices able to run simple D3D12 "compute shader only" apps with no screen/swapchain/DXGI creation?

if yes, at least there is some general use for the TOPS provided by NPUs like Meteor Lake, and later this year by Qualcomm Elite and also AMD XDNA1 or 2, in addition to running DirectML workloads..

note there is a DirectML NPU sample now (which only filters the Direct3D devices on the system down to core devices, to avoid GPUs):
 
it seems interesting whether NPUs can run general "compute shader" D3D12-only apps.. if yes, is it time for a DX12peak benchmark like clpeak or vkpeak?

NPU devices should be able to run compute shaders according to the Microsoft Learn documentation on feature level 1_0_CORE, but you have to request this specific level and not any higher feature levels when creating the Direct3D 12 device, and also enumerate MCDM adapters using the IDXCoreAdapterFactory interface instead of IDXGIFactory.
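
Something along these lines should work for the enumeration (a minimal sketch, assuming a recent Windows SDK where dxcore.h and D3D_FEATURE_LEVEL_1_0_CORE are available):

Code:
#include <dxcore.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void EnumerateCoreComputeDevices()
{
    // Enumerate MCDM ("core compute") adapters through DXCore, not DXGI
    ComPtr<IDXCoreAdapterFactory> factory;
    DXCoreCreateAdapterFactory(IID_PPV_ARGS(&factory));

    const GUID attributes[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE };
    ComPtr<IDXCoreAdapterList> list;
    factory->CreateAdapterList(_countof(attributes), attributes, IID_PPV_ARGS(&list));

    for (uint32_t i = 0; i < list->GetAdapterCount(); ++i)
    {
        ComPtr<IDXCoreAdapter> adapter;
        list->GetAdapter(i, IID_PPV_ARGS(&adapter));

        // Must request 1_0_CORE explicitly; higher levels fail on compute-only devices
        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_1_0_CORE,
                                        IID_PPV_ARGS(&device))))
        {
            // device accepts work on a D3D12_COMMAND_LIST_TYPE_COMPUTE queue
        }
    }
}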

DmitryKo, can you update your D3D12CheckFeatureSupport tool to run on NPUs? i.e. report Direct3D core compute device information..
I'll look into this; however, DXCore is a WinRT API - unlike DXGI, which is a plain Windows API (a.k.a. Win32) with only minimal 'lightweight COM' plumbing - and it's only implemented in Windows 10 version 2004 (build 19041) or higher, whereas DXGI is available all the way back to Windows Vista.

There could be a few potential problems with that. First, if the DXCore API requires its DLL exports to be statically linked into the executable, my app wouldn't even run on earlier versions of Windows.
Second, it may require me to switch to the UWP Console app model, which would add a lot of unnecessary WinRT plumbing to my executable - and also make it unable to run on earlier versions of Windows 10 which do not support the UWP Console API. Therefore I may have to restructure my code and build a separate version of my tool just for NPUs.

It would be much easier if Microsoft just extended DXGI to support these "core" devices, but so far I've found no indications to that end; I don't have an Intel NPU processor to test with.
 
NPU devices should be able to run compute shaders according to the Microsoft Learn documentation on feature level 1_0_CORE, but you have to request this specific level only and not include any higher feature levels when creating the Direct3D 12 device, and also enumerate MCDM adapters using the IDXCoreAdapterFactory interface instead of IDXGIFactory.


I'll look into this; however, DXCore is a WinRT API - unlike DXGI, which is a plain Windows API (a.k.a. Win32) - and it's only implemented in Windows 10 version 2004 (build 19041) or higher, whereas DXGI is available all the way back to Windows Vista.

There could be a few potential problems with that. First, if the DXCore API requires its DLL exports to be statically linked into the executable, my app wouldn't even run on earlier versions of Windows.
Second, it may require me to switch to the UWP Console app model, which would add a lot of unnecessary WinRT plumbing to my executable - and also make it unable to run on earlier versions of Windows 10 which do not support the UWP Console API. Therefore I may have to restructure my code and build a separate version of my tool just for NPUs.

It would be much easier if Microsoft just extended DXGI to support these "core" devices, but so far I've found no indications to that end; I don't have an Intel NPU processor to test with.
many thanks for the detailed information..
sad to hear (and didn't know) that it requires WinRT APIs and isn't available as a Win32 API like DXGI..
in that case I wouldn't touch your original app and would perhaps "fork" it as another one, dxcore_capsviewer or something else..
anyway no pressure!, I understand it's early days for NPUs, almost nobody has hardware to test, etc..
 
sad to hear (and didn't know) that it requires WinRT APIs and isn't available as a Win32 API like DXGI..

FYI I've been experimenting with DXCore, and it's actually available to regular Win32 'desktop apps', like most WinRT APIs today - so there is no need to switch to the UWP app model, and you can avoid a hard dependency on DXCore.dll exports with LoadLibrary() or delay-load linking.
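
A sketch of the run-time binding approach (so the executable still starts on older Windows versions that lack DXCore.dll):

Code:
#include <windows.h>
#include <dxcore.h>

typedef HRESULT (WINAPI *PFN_DXCoreCreateAdapterFactory)(REFIID riid, void** ppvFactory);

// Resolve DXCoreCreateAdapterFactory at run time instead of import time,
// so the EXE loads even where DXCore.dll is absent
HMODULE hDxCore = LoadLibraryExW(L"dxcore.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32);
auto pfnCreateFactory = hDxCore
    ? reinterpret_cast<PFN_DXCoreCreateAdapterFactory>(
          GetProcAddress(hDxCore, "DXCoreCreateAdapterFactory"))
    : nullptr;
if (!pfnCreateFactory)
{
    // DXCore not present - fall back to DXGI-only enumeration
}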


It's still a typical WinRT API with no strong typing, unlike traditional 'lightweight COM' as implemented in Direct3D and DXGI. For example, adapter information is passed by reference as object 'properties' and you need to query these 'properties' with IDXCoreAdapter::IsPropertySupported() and IDXCoreAdapter::GetPropertySize() before you read them with IDXCoreAdapter::GetProperty() into a dynamically allocated typeless buffer, while the actual type information needs to be derived from WinMD metadata, as typical for .NET/CLI interfaces.
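
For illustration, reading a single string property goes something like this (a sketch, error handling omitted):

Code:
#include <dxcore.h>
#include <vector>
#include <cstdio>

// Reading the driver description: the size must be queried first, the buffer
// is typeless, and the returned string is UTF-8 rather than DXGI's wchar_t
void PrintDriverDescription(IDXCoreAdapter* adapter)
{
    if (adapter->IsPropertySupported(DXCoreAdapterProperty::DriverDescription))
    {
        size_t size = 0;
        adapter->GetPropertySize(DXCoreAdapterProperty::DriverDescription, &size);
        std::vector<char> desc(size);
        adapter->GetProperty(DXCoreAdapterProperty::DriverDescription, size, desc.data());
        printf("%s\n", desc.data());
    }
}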

So you have to use C++ features and language projections like C++/CX or C++/WinRT, since you can't infer type information for 'properties' from header files, as was standard in the C-style coding typical for DXGI. There was the C++/Win32 project, a derivative of C++/WinRT which could use WinMD files from the Win32 metadata repository to generate C++ projection interfaces for Win32 APIs, but this tool has been abandoned - like many other improvements to the desktop Windows platform announced alongside Project Reunion (aka Windows App SDK) back in 2021 - and only C# and Rust projections for Win32 are officially supported...



Anyway, I'll look into how DirectML's DXDispatch tool handles these DXCore 'property' types (in Adapter.cpp and Adapter.h). For now I will add a new command-line option to create the Direct3D 12 device with a minimum feature level of 1_0_CORE, and hopefully NPUs will also be visible to DXGI interfaces as standard graphics adapters - if they are only available through DXCore, that would require code refactoring to mix and match DXCore and DXGI adapters by their LUID identifiers.
 
I've added the necessary C++ plumbing to query DXCore adapters with IDXCoreAdapter::GetProperty(), and adapter LUIDs do match between DXGI and DXCore "core compute" adapters on the same graphics card (and the WARP12 software renderer) - so there's a good chance NPU devices will be visible as DXGI adapters as well, considering you can create a Direct3D 12 'core' device with minimum feature level 1_0_CORE on a DXGI adapter.
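
The cross-matching itself is straightforward - something like this sketch, using IDXGIFactory4::EnumAdapterByLuid on the DXGI side:

Code:
#include <dxcore.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Find the DXGI twin of a DXCore adapter by comparing LUIDs
ComPtr<IDXGIAdapter> FindDxgiTwin(IDXCoreAdapter* adapter)
{
    LUID luid = {};
    // Templated GetProperty overload infers the buffer size from the type
    adapter->GetProperty(DXCoreAdapterProperty::InstanceLuid, &luid);

    ComPtr<IDXGIFactory4> dxgiFactory;
    CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));

    ComPtr<IDXGIAdapter> dxgiAdapter;
    // Fails for adapters only visible through DXCore (possibly NPUs)
    dxgiFactory->EnumAdapterByLuid(luid, IID_PPV_ARGS(&dxgiAdapter));
    return dxgiAdapter;
}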

@oscarbg, do you have an actual NPU to test with?

DXCore defines a few 'adapter attribute' GUIDs to filter devices by type, like DXCORE_ADAPTER_ATTRIBUTE_D3D11_GRAPHICS, DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS, then D3D12_CORE_COMPUTE, D3D12_GENERIC_ML, and D3D12_GENERIC_MEDIA; the latest SDK headers define additional 'hardware attributes' like DXCORE_HARDWARE_TYPE_ATTRIBUTE_GPU, COMPUTE_ACCELERATOR, NPU, and MEDIA_ACCELERATOR, but these are not documented on Microsoft Learn yet.

I wonder how actual NPU devices report these attributes; on my RDNA3 video card, the IDXCoreAdapter::IsAttributeSupported method reports D3D11_GRAPHICS, D3D12_GRAPHICS and D3D12_CORE_COMPUTE, and the same for the WARP12 adapter.
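
For reference, each attribute is a simple boolean probe, e.g. (a sketch; the hardware-type GUIDs only compile with the newest SDK headers):

Code:
// Probe the adapter type attributes on an IDXCoreAdapter* 'adapter'
const bool d3d12Graphics = adapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS);
const bool coreCompute   = adapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE);
const bool genericML     = adapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML);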


AFAIK there is no publicly available information about feature level 1_0_GENERIC and its differences in comparison to level 1_0_CORE. Considering they also added D3D_SHADER_MODEL_NONE, maybe they're linked to the new D3D12_GENERIC_MEDIA attribute above - something like a limited Direct3D 12 device with metacommands and/or video rendering functionality, but no compute shaders and even fewer processing capabilities?


BTW, WinRT APIs use UTF-8 encoded char* strings, instead of the UTF-16 encoded wchar_t strings in Win32 APIs like DXGI and WDDM thunking. Therefore I had to set the .UTF8 locale in the C runtime, which makes legacy string functions like printf_s() / puts() correctly support non-ASCII UTF-8 encoded strings. However UTF-8 support in the UCRT was only implemented in Windows 10 version 1803 (build 17134), so my app would break in earlier versions of Windows with a non-English locale.
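
That is, a single call at startup (a sketch):

Code:
#include <locale.h>

// Switch the UCRT to the UTF-8 locale so printf_s()/puts() pass through
// DXCore's UTF-8 char* strings correctly (needs Windows 10 1803+)
setlocale(LC_ALL, ".UTF8");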

Enums are straightforward for the most part, though of course Microsoft had to employ some different enums from the Direct3D KMD (kernel-mode driver) thunking header (d3dkmdt.h), and their integer values do not match the DXGI enums for the same functionality.
But at least the reported memory sizes all use the uint64_t type, and not size_t as in DXGI, which oscillates between 64-bit and 32-bit integers depending on the target platform...
 
OK then, I will stick to DXGI for now, and use DXCore to query hardware attributes and select the minimum feature level, and also add a command-line option to set it. This will be added in the next release of my tool once a new Agility SDK version rolls out, hopefully this June after Build 2024 concludes.

If NPUs are only available from DXCore, that would need refactoring into full-blown C++, to the tune of D3DX12CheckFeatureSupport. An NPU for testing would be nice too - though Zen5 'Granite Ridge' desktop parts might include one, and I'm long due for a CPU upgrade... :unsure:
 

RDNA2 supported rendering/blending to RGB9E5 render targets several years ago. I think this is a far more useful feature than superfluous things like VRS or "Sampler Feedback" ;)
 