Comparative consideration of DirectX 12 in games *spawn

DavidGraham · Nov 17, 2023

iroboto said:
DX12 Tax?

It's the tax paid by an engine when it is transitioned to a DX12 backend for the first time. You can observe this in the Witcher 3 Definition Edition, DX12 cost 30% of performance compared to DX11 with no visual improvements whatsoever. This happens in dozens of games/engines. It is one of the sad realities of PC gaming, and it's long struggle with lower level APIs.

Sometimes the DX12 tax affects NVIDIA GPUs more heavily than AMD GPUs, which results in a performance disparity between the two, a good example of this is Borderlands 3, which has a huge DX12 tax (vs DX11), but it affects NVIDIA way more than AMD.

This is the case in Valhalla, which is the first Assassin's Creed title to use DX12, and it favored AMD heavily as a result. Far Cry 6 is the same as well. So was Borderlands 3.

Recent Call of Duty titles can considered a strong case for this as well, as ever since the engine transition to DX12, the disparity between AMD and NVIDIA became huge. This never happened before the transition to DX12. In fact, Starfield can be considered another example of this, as it's the first DX12 title for the studio, but at least they are optimizing it to perform better.

The DX12 tax is also a major reason why some console ports perform better on consoles than their equivalent PC parts. If the said ports come in DX12 flavor, and if the developer doesn't do the necessary optimizations effort to avoid this tax, then these ports perform worse on PC.

iroboto · Nov 17, 2023

DavidGraham said:
The DX12 tax is also a major reason why some console ports perform better on consoles than their equivalent PC parts. If the said ports come in DX12 flavor, and if the developer doesn't do the necessary optimizations effort to avoid this tax, then these ports perform worse on PC.

Yea I’m not sure I would call that a tax then. That’s just developers that need more time to bake their game for more IHVs.

DX11 is a tax. Developers have less control over the GPU, thus compared to optimized DX12 it will run slower. the IHVs job is to build custom drivers for these titles to speed things up on behalf of the developers. If they didn’t then games would likely run like complete garbage.

By that definition, DX11 is a tax. The profiled drivers are the credit. And DX12 puts developers in the driver seat and with that comes significantly more responsibility and costs associated with getting their games to run well on all the platforms they want to support.

What you’re describing is that the marketplace removed their tax and crediting system and developers are absorbing the full cost without assistance.

DavidGraham · Nov 18, 2023

iroboto said:
Developers have less control over the GPU, thus compared to optimized DX12 it will run slower.

Yeah, We've only observed this in a handful of titles vs the dozens of titles showing the opposite. Games losing tons of performance transitioning to DX12 because developers lack the time/resources/skills to optimize properly for DX12, especially if they invested heavily in tools that rely on DX11.

Let's just say that the "DX12 tax" term is a slang term describing the current situation.

techuse · Nov 18, 2023

However you want to describe it, DX12 has been a huge loss for PC gamers.

Remij · Nov 18, 2023

techuse said:
However you want to describe it, DX12 has been a huge loss for PC gamers.

It's been rough for gamers, but a necessary step for developers. If it didn't happen when it did, it would happen later.. and I for one am glad it happened when it did and we can more clearly see the issues now and work past them.

DX12 is/was a necessary evil.

iroboto · Nov 18, 2023

Remij said:
It's been rough for gamers, but a necessary step for developers. If it didn't happen when it did, it would happen later.. and I for one am glad it happened when it did and we can more clearly see the issues now and work past them.

DX12 is/was a necessary evil.

Agreed. This is sort of the “bad” part of having developers and engines stuck on the older API for decades, and then suddenly asking them to write a high performance general one all on their own. We know it works, looking over at consoles, but execution would have to occur over likely just as long, decades, and that wasn’t really discussed as being a con when it was being marketed.

But this (appears to) makes things more sustainable (in the long run) by moving the burden of performance away from IHVs over to developers, it doesn’t seem reasonable to have IHVs support these massive development teams to make every game run fast.

techuse · Nov 18, 2023

Except it hasnt gotten any better after almost 10 years. There isn't any reason to assume it ever will. Nothing points to anything changing. It may not even be feasible to write high performance, low level code for such a variety of architectures. Moving the burden off the IHVs seems unwise. They have far more resources to allocate than developers.

Lurkmass · Nov 18, 2023

DavidGraham said:
Yeah, We've only observed this in a handful of titles vs the dozens of titles showing the opposite. Games losing tons of performance transitioning to DX12 because developers lack the time/resources/skills to optimize properly for DX12, especially if they invested heavily in tools that rely on DX11.

Let's just say that the "DX12 tax" term is a slang term describing the current situation.

There would be no 'tax' by and large if the leading graphics hardware vendor actually focused on improving their fundamentals instead. The supposed 'tax' isn't a problem at all on their competitors since they have a more powerful hardware binding model than they do ...

Remij · Nov 18, 2023

techuse said:
Except it hasnt gotten any better after almost 10 years. There isn't any reason to assume it ever will. Nothing points to anything changing. It may not even be feasible to write high performance, low level code for such a variety of architectures. Moving the burden off the IHVs seems unwise. They have far more resources to allocate than developers.

Consider that it takes time from when an API is announced/released to penetrate the market enough to coerce developers to take up support for it. It then takes time to integrate into engines.. often during this time support is half assed anyway because they're supporting multiple APIs, and then consider it also takes time after that until games are on shelves. Then wait some more time after than until the improvements come.

So I'd say despite DX12 being a thing for so long now.. we ARE indeed seeing improvements and a lot of that can only come after years of fighting with a new API. It's just as much developers needing to come to terms with their responsibilities as it is an issue of the API design itself.

Shifty Geezer · Nov 18, 2023

Are there any flagship DX12 games that showcase the advantages?

iroboto · Nov 18, 2023

Shifty Geezer said:
Are there any flagship DX12 games that showcase the advantages?

I think technically most of all of the latest games are showcases.

It’s a bit unfair to say nothing has improved when both complexity and performance required to serve that complexity has been increasing since 2015. And our DX12 games are certainly much better performing today than it was 8 years ago.

We have had very massive changes in rendering in the last 8 years, now finally making a move towards ray tracing as well and super high geometry outputs required at super high resolution and frame rates. All the while requiring high speed IO to feed this new geometric high fidelity detail. We’re solving a lot of challenges all at the same time here. I doubt DX11 is up to the task - we would have seen it

davis.anthony · Nov 18, 2023

Shifty Geezer said:
Are there any flagship DX12 games that showcase the advantages?

Alan Wake 2 is definitely there with path tracing and mesh shaders.

Andrew Lauritzen · Nov 18, 2023

techuse said:
It may not even be feasible to write high performance, low level code for such a variety of architectures. Moving the burden off the IHVs seems unwise. They have far more resources to allocate than developers.

There's certainly still issues, but a few points:

1) Most of the architecture changes were necessary to enable more programmability. Things like Nanite and raytracing are not really something you can efficiently do in DX11, especially as they evolve.
2) Some of the pain points are becoming less relevant over time as games can assume a higher baseline of hardware. Once we can all just do things bindlessly throughout (again, requires DX12/Vulkan), a bunch of the pain involved in the cross-arch descriptor management details goes away. Hardware has had to adapt to maintain the same levels of efficiency but that has largely happened because again, needed for raytracing.
3) PSO stuff will likely still remain a moderate pain until GPU architectures evolve to be less dependent on static register/resource allocation and thus so constrained by occupancy. Ironically the latest Apple GPUs have made some progress on that front it seems - would love to see more work from the PC IHVs as it's something that has been a known issue for a decade or more. And again, raytracing really pushes this to a level that requires some movement from the IHVs. This is probably one of the major constraints that makes portable performance hard/impossible to do on PC as well.
4) The rest of the API differences in 11/12 are increasingly irrelevant going forward as we've moved such a large chunk of the pipeline to compute now. Once you do that there's much less opportunity for the IHVs to do hardware-specific graphics pipeline tweaks in drivers, so which API is doing the submit call is not important.

So yeah there was definitely transition pain, but other than PSOs I think we're mostly through that. There are of course still places that the API and hardware needs to evolve further (the shading languages still absolutely suck... people need to talk about this more, it's disgraceful that we're writing so much GPU code in a language and with tooling that has barely changed from the first days of HLSL) but the CPU submission side is much less relevant in the days of visibility buffers and GPU-driven rendering.

Andrew Lauritzen · Nov 18, 2023

Lurkmass said:
There would be no 'tax' by and large if the leading graphics hardware vendor actually focused on improving their fundamentals instead.

Good article that is a nice public summary of some of this. The usual caveat to folks of not assigning too much of these details to whatever end result you are seeing but for the more technically-minded folks who are interested in some of the low level considerations that go into these APIs, this is a pretty accurate discussion!

Of course the question of "why doesn't everyone just switch to one model" is pretty complicated when you get down to it. Obviously it would be great if everything were super flexible from an application/driver POV, but as with most things some of the more restrictive models can be more efficient on certain architectures, and the details can tie in pretty heavily to other design decisions too.

That said, I think everyone realizes we're going to be going bindless due to raytracing and other considerations at this point, so I do expect this to somewhat converge the hardware in the future. DX12's model luckily first well with, with the annoying parts mostly being on the shader side, not the CPU API.

DavidGraham · Nov 18, 2023

Lurkmass said:
There would be no 'tax' by and large if the leading graphics hardware vendor actually focused on improving their fundamentals instead. The supposed 'tax' isn't a problem at all on their competitors since they have a more powerful hardware binding model than they do ...

That would be true if the tax is exclusive to NVIDIA alone, AMD is affected a great deal as well. You should also target AMD for their abhorrent ray tracing weaknesses, which are much more severe than any possible NVIDIA "API binding" deficit.

At least NVIDIA is winning in rasterization despite the possible API binding problems, it's only a handful of games causing problems on their hardware, and those stem primarily from developers not doing the full optimization pass required for their dominant hardware.

OlegSH · Nov 18, 2023

Lurkmass said:
The supposed 'tax' isn't a problem at all on their competitors since they have a more powerful hardware binding model than they do ...

I don't see anything indicating the "more powerful hardware binding model" in the links you provided about the "competitors".
Also, the information in the link you provided is outdated in many regards, for one, NVIDIA GPUs feature the scalar datapath for uniform warps since the Volta.

iroboto · Nov 18, 2023

OlegSH said:
I don't see anything indicating the "more powerful hardware binding model" in the links you provided about the "competitors".
Also, the information in the link you provided is outdated in many regards, for one, NVIDIA GPUs feature the scalar datapath for uniform warps since the Volta.

I assume this is the reference in question:

The other major downside to the D3D12 model is that handing control of the hardware heaps to the application really ties driver writers’ hands. Any time the client does a copy or blit operation which isn’t implemented directly in the DMA hardware, the driver has to spin up the 3D hardware, set up a pipeline, and do a few draws. In order to do a blit, the pixel shader needs to be able to read from the blit source image. This means it needs a texture or UAV descriptor which needs to live in the heap which is now owned by the client. On AMD, this isn’t a problem because they can re-bind descriptor sets relatively cheaply or just use one of the high descriptor set bindings which they’re not using for heaps. On Intel, they have the very convenient back-door I mentioned above where the old binding table hardware still exists for fragment shaders.

Where this gets especially bad is on NVIDIA, which is a bit ironic given that the D3D12 model is basically exactly NVIDIA hardware. NVIDIA hardware only has one texture/image heap and switching it is expensive. How do they implement these DMA operations, then? First off, as far as I can tell, the only DMA operation in D3D12 that isn’t directly supported by NVIDIA’s DMA engine is MSAA resolves. D3D12 doesn’t have an equivalent of vkCmdBlitImage(). Applications are told to implement that themselves if they really want it. What saves them, I think (I can’t confirm), is that D3D12 exposes 106descriptors to the application but NVIDIA hardware supports 220 descriptors. That leaves about 48k descriptors for internal usage. Some of those are reserved by Microsoft for tools such as PIX but I’m guessing a few of them are reserved for the driver as well. As long as the hardware is able to copy descriptors around a bit (NVIDIA is very good at doing tiny DMA ops), they can manage their internal descriptors inside this range. It’s not ideal, but it does work

====
Hopefully more work is done here in this space for DX13.

OlegSH · Nov 18, 2023

DavidGraham said:
and those stem primarily from developers not doing the full optimization pass required for their dominant hardware.

Typical performance issues stem from developers not optimizing certain parts when porting games over to PC. That's why certain optimizations have suddenly become relevant, such as the reBAR.
For example, if you mindlessly update descriptors in every frame, even for static objects, this is certainly not something you'd want to do on a PC without unified memory, which is highlighted in almost every optimization guide and is relevant for every vendor. This practice is completely unrelated to the binding model, which seem to be just an echo of the old limitations on Pascal and older GPUs.

Andrew Lauritzen · Nov 19, 2023

descriptors article said:
That leaves about 48k descriptors for internal usage. Some of those are reserved by Microsoft for tools such as PIX but I’m guessing a few of them are reserved for the driver as well. As long as the hardware is able to copy descriptors around a bit (NVIDIA is very good at doing tiny DMA ops), they can manage their internal descriptors inside this range. It’s not ideal, but it does work

This was very much a design consideration on DX12 to leave some space for the drivers to have their own internal descriptors. I think it has worked out well in practice with no major issues arising from that part that I know of.

techuse · Nov 19, 2023

Andrew Lauritzen said:
There's certainly still issues, but a few points:

1) Most of the architecture changes were necessary to enable more programmability. Things like Nanite and raytracing are not really something you can efficiently do in DX11, especially as they evolve.
2) Some of the pain points are becoming less relevant over time as games can assume a higher baseline of hardware. Once we can all just do things bindlessly throughout (again, requires DX12/Vulkan), a bunch of the pain involved in the cross-arch descriptor management details goes away. Hardware has had to adapt to maintain the same levels of efficiency but that has largely happened because again, needed for raytracing.
3) PSO stuff will likely still remain a moderate pain until GPU architectures evolve to be less dependent on static register/resource allocation and thus so constrained by occupancy. Ironically the latest Apple GPUs have made some progress on that front it seems - would love to see more work from the PC IHVs as it's something that has been a known issue for a decade or more. And again, raytracing really pushes this to a level that requires some movement from the IHVs. This is probably one of the major constraints that makes portable performance hard/impossible to do on PC as well.
4) The rest of the API differences in 11/12 are increasingly irrelevant going forward as we've moved such a large chunk of the pipeline to compute now. Once you do that there's much less opportunity for the IHVs to do hardware-specific graphics pipeline tweaks in drivers, so which API is doing the submit call is not important.

So yeah there was definitely transition pain, but other than PSOs I think we're mostly through that. There are of course still places that the API and hardware needs to evolve further (the shading languages still absolutely suck... people need to talk about this more, it's disgraceful that we're writing so much GPU code in a language and with tooling that has barely changed from the first days of HLSL) but the CPU submission side is much less relevant in the days of visibility buffers and GPU-driven rendering.

Are you able to shed light on why Fortnite is still massively faster in DX11 even on the newest GPUs? What issues are causing such a huge degradation that Epic has been unable to address?

Comparative consideration of DirectX 12 in games *spawn

DavidGraham

iroboto

Daft Funk

DavidGraham

techuse

Remij

iroboto

Daft Funk

techuse

Lurkmass

Remij

Shifty Geezer

uber-Troll!

iroboto

Daft Funk

davis.anthony

Andrew Lauritzen

Moderator

Andrew Lauritzen

Moderator

DavidGraham

OlegSH

iroboto

Daft Funk

OlegSH

Andrew Lauritzen

Moderator

techuse

Similar threads