Comparative consideration of DirectX 12 in games *spawn

Are you able to shed light on why Fortnite is still massively faster in DX11 even on the newest GPUs? What issues are causing such a huge degradation that Epic has been unable to address?
There's a lot of variables in that statement even... which modes, which settings, what do you define as "massively", what sort of system with which hardware, etc. Even if this was an area I have looked at (it isn't), that's not as simple (and loaded) a question as you imply.

Doing some quick googling, I see some benchmarks that seem to show the opposite (and also significantly more stuttering on DX11, ironically), for instance:

Another video from chapter 4 shows them as very similar, on NVIDIA this time:

And another where they trade blows on NVIDIA:

I'm sure you can find benchmarks that show DX11 performing better, but that's kind of the point.
 
There's a lot of variables in that statement even... which modes, which settings, what do you define as "massively", what sort of system with which hardware, etc. Even if this was an area I have looked at (it isn't), that's not as simple (and loaded) a question as you imply.

Doing some quick googling, I see some benchmarks that seem to show the opposite (and also significantly more stuttering on DX11, ironically), for instance:

Another video from chapter 4 shows them as very similar, on NVIDIA this time:

And another where they trade blows on NVIDIA:

I'm sure you can find benchmarks that show DX11 performing better, but that's kind of the point.
Any time you are GPU limited, DX11 is much faster.
 
There's a lot of variables in that statement even... which modes, which settings, what do you define as "massively", what sort of system with which hardware, etc. Even if this was an area I have looked at (it isn't), that's not as simple (and loaded) a question as you imply.

Doing some quick googling, I see some benchmarks that seem to show the opposite...
Nice sync'd inputs! My take-away - I can see DX12 working harder but not making any visual difference. In the last video, DX11 even looks better although the lighting is different.

This is counter to expectations of DX12 based on announcements for its release. It was supposed to speed up the PC, with things like zillions of asteroids being enabled. Something seems to have happened between the vision and the reality. What are we looking at for real with DX12? More features and better quality eventually, but at lower performance? That is, given a game DX11 can do, DX11 will do it faster with less energy, and DX12 only comes into its own when doing something DX11 can't do?

All that said, the consolification of PCs to match consoles' low overheads may not have progressed as anticipated, but then it's not like the consoles have a huge advantage. Or do they? What does a PS5/XBSX-spec PC with a similar CPU manage on the same games via DX11 and DX12? Have graphics moved on so much that the console advantages and original targets for DX12 are no longer bottlenecked in the same way, and the limitations are now more general and similar across hardware and APIs?
 
Nice sync'd inputs! My take-away - I can see DX12 working harder but not making any visual difference. In the last video, DX11 even looks better although the lighting is different.

This is counter to expectations of DX12 based on announcements for its release. It was supposed to speed up the PC, with things like zillions of asteroids being enabled. Something seems to have happened between the vision and the reality. What are we looking at for real with DX12? More features and better quality eventually, but at lower performance? That is, given a game DX11 can do, DX11 will do it faster with less energy, and DX12 only comes into its own when doing something DX11 can't do?

All that said, the consolification of PCs to match consoles' low overheads may not have progressed as anticipated, but then it's not like the consoles have a huge advantage. Or do they? What does a PS5/XBSX-spec PC with a similar CPU manage on the same games via DX11 and DX12? Have graphics moved on so much that the console advantages and original targets for DX12 are no longer bottlenecked in the same way, and the limitations are now more general and similar across hardware and APIs?
DX12 allows all the new UE5 rendering tech, but for comparison purposes those were disabled to match the workload. When CPU limited, DX12 pulls ahead in Fortnite.
 
There were minor differences of a few % for Nvidia GPUs. AMD at the time was on the completely broken R600, so it was all over the place, which shouldn't be attributed to DX10.
More than a few %

Crysis ran consistently better under DX9 than DX10 at the same settings.

I seem to remember Far Cry 2 ran better under DX9 too.
 
I don't see anything indicating the "more powerful hardware binding model" in the links you provided about the "competitors".
Also, the information in the link you provided is outdated in many regards; for one, NVIDIA GPUs have featured a scalar datapath for uniform warps since Volta.
It's very much still relevant. If you look at any Nvidia HW's performance against non-uniform constant buffer loads (cbuffer{float4} load linear/random), they're catastrophically slower than any other buffer loads.
Cbuffer loads: Nvidia Maxwell (and newer GPUs) have a special constant buffer hardware unit. Uniform-address constant buffer loads are up to 32x faster (warp width) than standard memory loads. However, non-uniform constant buffer loads are dead slow. Nvidia's CUDA documents tell us that constant buffer loads get serialized for each unique address. Thus we can see up to a 32x performance drop compared to the best case. But in my test case (each lane = different address), we see up to a 200x slowdown. This result tells us that there's likely a small constant buffer cache on each SM, and if your access pattern is bad enough, this cache starts to thrash badly. Unfortunately Nvidia doesn't provide a public document describing best practices to avoid this pitfall.
Based on the author's notes above, the results seem to suggest that even modern Nvidia HW like the RTX 3090 still has special constant memory for speeding up constant buffer loads. Using bindless constant buffers will force their hardware/driver into a slow path since they can't use their constant memory in that case.
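To make the binding-side distinction concrete, here's a minimal sketch (my own illustration, not from the perftest code) of the two styles being contrasted in D3D12: a classic root CBV, which the driver can route through the dedicated constant-buffer hardware, versus a root constant index used with SM6.6's ResourceDescriptorHeap for the bindless case. The parameter names and root-parameter slots are made up for the example.

```cpp
#include <d3d12.h>

// Assumes a root signature with: param 0 = root CBV (b0), param 1 = one 32-bit
// root constant that the shader uses as a descriptor-heap index.
void BindConstants(ID3D12GraphicsCommandList* cl,
                   D3D12_GPU_VIRTUAL_ADDRESS sceneCB,  // classic per-frame cbuffer
                   UINT materialCbvHeapIndex)          // index for the bindless fetch
{
    // Classic path: "cbuffer Scene : register(b0)" in HLSL. Uniform loads from
    // this buffer can hit NVIDIA's constant-memory fast path.
    cl->SetGraphicsRootConstantBufferView(0, sceneCB);

    // Bindless path: the shader does something like
    //   ConstantBuffer<Material> mat = ResourceDescriptorHeap[materialCbvHeapIndex];
    // If the index (and thus the address) diverges per lane, the load is
    // non-uniform and falls into the slow path described above.
    cl->SetGraphicsRoot32BitConstant(1, materialCbvHeapIndex, 0);
}
```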
 
I'm trying to work out if this is sarcasm or not?
I believe so, which ties in with my question of whether consoles present a clear advantage. A like-for-like(ish) PC and console running DX11 and DX12 should show the console advantage, if any. But my feeling is that the limitations DX12 aimed to overcome, things like draw calls being very expensive on PC and cheap as chips on console, are no longer important, and consoles no longer gain a significant API overhead advantage.
 
It's very much still relevant. If you look at any Nvidia HW's performance against non-uniform constant buffer loads (cbuffer{float4} load linear/random), they're catastrophically slower than any other buffer loads.

Based on the author's notes above, the results seem to suggest that even modern Nvidia HW like the RTX 3090 still has special constant memory for speeding up constant buffer loads. Using bindless constant buffers will force their hardware/driver into a slow path since they can't use their constant memory in that case.

While this may be true in theory we’re missing one important ingredient in this discussion. Where are the amazing looking games with great performance on architectures with “powerful” binding models?
 
While this may be true in theory we’re missing one important ingredient in this discussion. Where are the amazing looking games with great performance on architectures with “powerful” binding models?
Is this supposed to be a facetious question?
 
🤷‍♂️ See my post. It doesn't seem to match the other results people are getting. I hesitate to suggest this since HUB is a pretty thorough and careful site, but the only notes I see on quality settings on that video are "Epic Quality"... are we sure that he doesn't have Nanite/VSM/Lumen on (and/or other DX12-exclusive settings that default to on at Epic quality...) in DX12? [Edit] I guess he tests that separately in the next test in theory, but I still wonder if this is actually apples to apples. There is no real reason for performance results to differ so much if the settings were the same, *especially* when GPU bound.

Just checked super quickly in the current season and DX12 is a bit faster than DX11 on my machine (4090, 1440p). To the limits of my ability to A/B test I'd call them functionally similar again, although ironically the shader stutter in DX11 was really bad for the first minute or so. When you look at the ground, performance is similar. When you look off into the distance, performance on DX12 is a bit better (but we're talking in the range of 3.5ms vs 3.8ms per frame, which looks impressive in % graphs but isn't really a huge difference in reality). What do you get on your machine?

So yeah, I dunno, any number of variables could be different between my test, the youtube tests and the HUB test. I'm still leaning towards something probably being a bit wonky in the HUB test, but as I said in my other post... there are a zillion variables and I'd take literally any single result with a huge grain of salt.
 
Nice sync'd inputs! My take-away - I can see DX12 working harder but not making any visual difference. In the last video, DX11 even looks better although the lighting is different.
The pixels should be effectively the same between the paths - could be glitches but assuming the UE5 stuff is disabled it should look the same. Not sure what you mean by "working harder" - to the limits of the noise in tests like this I think those results are functionally the same. Perhaps you could argue that the mins/hitches are lower under DX12 but you'd really need to run more data to even draw such a conclusion.

It was supposed to speed up the PC, with things like zillions of asteroids being enabled. Something seems to have happened between the vision and the reality. What are we looking at for real with DX12?
And it does, but games haven't really gone that direction per se, especially since they often still need to support other platforms. By the time we got to wanting to really up the geometric density, most engines had moved to GPU-driven rendering, making that point a bit moot. There are a handful of titles that really try to push multithreaded submission hard (ex. Ashes) but they are by far the minority.
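(For reference, the multithreaded submission pattern I'm talking about looks roughly like the sketch below: each worker thread records its own command list against its own allocator, and the queue gets a single ExecuteCommandLists call. This is just an illustrative outline; the actual draw recording, fencing and error handling are omitted.)

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
using Microsoft::WRL::ComPtr;

void RecordAndSubmitFrame(ID3D12Device* device, ID3D12CommandQueue* queue,
                          ID3D12PipelineState* pso, unsigned workerCount)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(workerCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(workerCount);
    std::vector<std::thread> workers;

    for (unsigned i = 0; i < workerCount; ++i)
    {
        // One allocator + command list per thread, so recording needs no locks.
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), pso,
                                  IID_PPV_ARGS(&lists[i]));

        workers.emplace_back([cl = lists[i].Get()]()
        {
            // ... record this thread's slice of the frame's draws here ...
            cl->Close();
        });
    }

    for (auto& t : workers) t.join();

    // Single, ordered submission of everything the workers recorded.
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
}
```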

What it does do is enable a lot of new stuff as mentioned. These things are not arbitrarily tied to DX12, they do effectively require a lot of the changes that DX12 makes to function. You'd better believe they would have put raytracing in DX11 too if it was straightforward.

More features and better quality eventually, but at lower performance? That is, given a game DX11 can do, DX11 will do it faster with less energy, and DX12 only comes into its own when doing something DX11 can't do?
For simple stuff, I'd expect the two APIs to be similar. Most engines have gotten to the place where they are in the same ballpark now, which is all I'd expect with content that was designed for DX11. There are of course details and outliers, but the overall discussion is getting a bit moot at this point; we need the new APIs for overall progress in other areas so they will become effectively required.
 
If you look at any Nvidia HW's performance against non-uniform constant buffer loads (cbuffer{float4} load linear/random), they're catastrophically slower than any other buffer loads.
How is this related to the "DX12 tax" or to the "more powerful hardware binding model"?
If non-uniform constant buffer loads were critical, this would have become apparent in games, similar to the overhead of Structured Buffers on Pascal and earlier architectures, which was resolved in Volta and subsequent architectures.

the results seem to suggest that even modern Nvidia HW like the RTX 3090 still has special constant memory for speeding up constant buffer loads.
How is that bad when there is also both a scalar unit and compiler optimizations for uniform address loads?

Using bindless constant buffers will force their hardware/driver into a slow path since they can't use their constant memory in that case
Structured buffers are no longer slow, so I would not consider that a problem; the general load/store path is fast on Turing+.
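As a rough sketch of what that looks like in practice (names and layout invented for the example), the same per-material constants can simply be exposed as a structured buffer SRV, so bindless access goes through that general load/store path rather than the constant-buffer unit:

```cpp
#include <d3d12.h>

// Hypothetical per-material constants; the layout is just for illustration.
struct MaterialConstants { float albedo[4]; float roughness; float padding[3]; };

void CreateMaterialSRV(ID3D12Device* device, ID3D12Resource* materialBuffer,
                       UINT materialCount, D3D12_CPU_DESCRIPTOR_HANDLE dest)
{
    D3D12_SHADER_RESOURCE_VIEW_DESC srv = {};
    srv.Format = DXGI_FORMAT_UNKNOWN;                       // structured buffer
    srv.ViewDimension = D3D12_SRV_DIMENSION_BUFFER;
    srv.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
    srv.Buffer.FirstElement = 0;
    srv.Buffer.NumElements = materialCount;
    srv.Buffer.StructureByteStride = sizeof(MaterialConstants);

    // HLSL side would be: StructuredBuffer<MaterialConstants> gMaterials;
    // with each draw/lane indexing gMaterials[materialIndex] through the
    // general load path instead of a cbuffer.
    device->CreateShaderResourceView(materialBuffer, &srv, dest);
}
```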
 
The pixels should be effectively the same between the paths - could be glitches but assuming the UE5 stuff is disabled it should look the same. Not sure what you mean by "working harder" -
CPU occupancy is higher for DX12, drawing a lot more watts, in the first video. Notably more RAM is used too in both vids with stats.
And it does, but games haven't really gone that direction per se, especially since they often still need to support other platforms. By the time we got to wanting to really up the geometric density, most engines had moved to GPU-driven rendering, making that point a bit moot.
Like how fast IO is made somewhat redundant by engines moving to super-efficient streaming architectures. Hardware design can't keep up with software!
 
CPU occupancy is higher for DX12, drawing a lot more watts, in the first video. Notably more RAM is used too in both vids with stats.
This actually seems correct. It should be higher under DX12, not lower. Bottlenecks on draw-call submission would reduce CPU occupancy due to single-threaded submission.
 
This actually seems correct. It should be higher under DX12, not lower. Bottlenecks on draw-call submission would reduce CPU occupancy due to single-threaded submission.
Yes but it's achieving the same results as the DX11 version. ;)
 