Comparative consideration of DirectX 12 in games *spawn

iroboto · Nov 19, 2023

Shifty Geezer said:
Yes but it's achieving the same results as the DX11 version.

But is it achieving the same results?

And secondly does occupancy mean more CPU saturation ?

Andrew Lauritzen · Nov 20, 2023

Shifty Geezer said:
CPU occupancy is higher for DX12 drawing a lot more watts in the first video. Notably more RAM used too on both vids with stats.

Perhaps, but we're stretching the utility of those measurements. Things like power use are easy to swing wildly with voltage or turbo tweaks, among other things. Stuff like OS yields vs. busy waits can make a large difference to power use, but also can negatively affect latency.

The reality is these systems - all the way from game down to OS - are not really being optimized for power efficiency on PC. Even Steam Deck has only the most cursory nods with things like FPS caps. Thus even "good" things like multithreading can run more CPU threads at higher frequencies and cost more power overall. Every time you add thread synchronization that's some overhead and even if the work could now be down at lower clocks, PC OSes don't tend to race to idle in games as that can cause other problems (see all the cases where people say that disabling core parking, affinitizing threads manually and using "high performance" are the ways they fix their games).

That's not to say that the delivered power efficiency is not worse here, just that it's a fickle measurement and not something that really generalizes to a measure of "goodness" of a game's path. Similarly GPU power is not a good metric for whether a game is "utilizing the GPU well". There's a lot of layers of subtlety and complexity between these measurements and the goals of a given game/OS in terms of delivered performance.

Shifty Geezer · Nov 20, 2023

iroboto said:
But is it achieving the same results?

Ah, I missed an important metric which was framerate. DX12 is running 30% faster.

Shifty Geezer · Nov 20, 2023

Andrew Lauritzen said:
Perhaps, but we're stretching the utility of those measurements. Things like power use are easy to swing wildly with voltage or turbo tweaks, among other things. Stuff like OS yields vs. busy waits can make a large difference to power use, but also can negatively affect latency.

I assume the comparison is performed fairly on the same system, not cranking up the voltage for one test versus another. Just from your first video, voltages are the same.

However, as I've just mentioned to Iroboto, I realise I missed the framerate. DX is running 20-30% faster. So it's working.

Andrew Lauritzen said:
That's not to say that the delivered power efficiency is not worse here, just that it's a fickle measurement and not something that really generalizes to a measure of "goodness" of a game's path. Similarly GPU power is not a good metric for whether a game is "utilizing the GPU well". There's a lot of layers of subtlety and complexity between these measurements and the goals of a given game/OS in terms of delivered performance.

As an end result, I think it fair to say power draw for what's on screen is a reasonable measure. If I offer your two consoles producing what looks nigh identical on screen results where one uses 100W to produce those visuals and the other 200W, regardless of how they are achieving that, the one producing the result in less power is more efficient. And if that's the only difference between them, then the lower power option is the 'better' for that game. I can appreciate that down the line, the 200W console may do more, but that doesn't invalidate the data-point on efficiency.

iroboto · Nov 20, 2023

Shifty Geezer said:
Ah, I missed an important metric which was framerate. DX12 is running 30% faster.

Oh. Well not always. I was trying to get at that the results indicated sometimes it would be way ahead, sometimes similar and sometimes ahead or behind. But the latency spikes were different, and the performance profiles appeared to be different based on the hardware benchmarked.

So I was trying to just say, we aren’t actually seeing the “same” result, it’s quite varied where dx12 lands.

Andrew Lauritzen · Nov 20, 2023

Shifty Geezer said:
As an end result, I think it fair to say power draw for what's on screen is a reasonable measure. If I offer your two consoles producing what looks nigh identical on screen results where one uses 100W to produce those visuals and the other 200W, regardless of how they are achieving that, the one producing the result in less power is more efficient. And if that's the only difference between them, then the lower power option is the 'better' for that game. I can appreciate that down the line, the 200W console may do more, but that doesn't invalidate the data-point on efficiency.

Right, but my point is that while that conclusion is valid for your specific game running on your specific console on that specific day, it just doesn't really generalize to anything useful. The issue is when people say "therefore console/API/whatever is more power efficient" in general, which is almost always the implication. But if these cases aren't even considering power efficiency as an optimization target how is that result meaningful at all? If my game runs all the cores at 100% with busy waits to reduce latency in your twitch shooter to minimal levels and burns power doing it then CPUs with higher clocks and more cores will look way worse. Is that because they are fundamentally "less efficient" to produce the "same" result? Yes in that one case, but that says literally nothing about that CPU's efficiency vs. another in any other task.

Ultimately PC gaming is designed to get from point A to B as quickly as possible, using any resources at its disposal to do it. It is expected that those resources will be used "inefficiently" in terms of power because the whole thing is designed that way. You could save a heap-load of power with minor settings tweaks if that was a goal at all, far more than the differences between PC hardware/APIs/etc. If you permit me a car analogy, it's like asking how much gas a race car used to get its 0-60 time... the same car can often run much more efficiently in other tests but that's not the target of that test. PC gaming is basically always running in the best 0-60 mode; reviewers and users pay some lip service to power use/efficiency but ultimately people don't care a whole lot on plugged-in devices.

DavidGraham · Nov 21, 2023

Shifty Geezer said:
This is counter to expectations of DX12 based on announcements for its release. It was supposed to speed up the PC, with things like zillions of asteroids being enabled. Something seems to have happened between the vision and the reality. What are we looking at for real with DX12? More features and better quality eventually, but at lower performance? That is, given a game DX11 can do, DX11 will do it faster with less energy, and DX12 only comes in to its own when doing something DX11 can't do?

DX12 -in particular- focused primarily on speed, it promised less CPU overhead (so less chance for the game to become single threaded), more multi core utilization, and vastly more draw calls count than ever before (so drawing more stuff on screen without tremendously hurting performance). So far none of that materialized in a good capacity. If not coded for carefully, DX12 actually decreases fps, increases CPU overhead, introduces frequent "Pipeline State Objects" stuttering, has VRAM management issues, and longer loading times "due to PSO compilation".

The Khronos group (makers of Vulkan) has come out and stated that Vulkan has the same problems too, it increased CPU overhead, decreased GPU performance, prevented certain GPU archs from reaching their full potential, and forced many developers to develop workarounds that behave like the old APIs (see below).

However, DX12 has indeed paved the way for two major things: Ray Tracing and Nanite. Though it's worth mentioning, that Crytek has demonstrated that they can do Ray Tracing on DX11 just fine, and successfully did so in 3 games (Crysis 1/2/3 Remastered), they remastered the 3 games in DX11 then augmented them with a VulkanRT backend. It was a hack, but it worked.

The Khronos statement. The premise is unrealistic. Here Khronos explains why (in summary Vulkan/DX12 constrain gaming dynamism).

Many of these assumptions have since proven to be unrealistic.

On the application side, many developers considering or implementing Vulkan and similar APIs found them unable to efficiently support important use cases which were easily supportable in earlier APIs. This has not been simply a matter of developers being stuck in an old way of thinking or unwilling to "rein in" an unnecessarily large number of state combinations, but a reflection of the reality that the natural design patterns of the most demanding class of applications which use graphics APIs — video games — are inherently and deeply dependent on the very "dynamism" that pipelines set out to constrain.

Khronos admits that problems Vulkan and DX12 sought to solve, are transferred to the game instead of the driver, without giving developers the necessary knowledge and tools to solve it (unlike the driver which handled it gracefully), thus directly causing the stutter problem.

As a result, renderers with a choice of API have largely chosen to avoid Vulkan and its "pipelined" contemporaries, while those without a choice have largely just devised workarounds to make these new APIs behave like the old ones — usually in the form of the now nearly ubiquitous hash-n-cache pattern. These applications set various pieces of "pipeline" state independently, then hash it all at draw time and use the hash as a key into an application-managed pipeline cache, reusing an existing pipeline if it exists or creating and caching a new one if it does not. In effect, the messy and inefficient parts of GL drivers that pipelines sought to eliminate have simply moved into applications, except without the benefits of implementation specific knowledge which might have reduced their complexity or improved their performance.

They also admit what we've all suspected from the beginning: DX12/Vulkan DO NOT reduce CPU overhead, but actually directly INCREASE it.

On the driver side, pipelines have provided some of their desired benefits for some implementations, but for others they have largely just shifted draw time overhead to pipeline bind time (while in some cases still not entirely eliminating the draw time overhead in the first place). Implementations where nearly all "pipeline" state is internally dynamic are forced to either redundantly re-bind all of this state each time a pipeline is bound, or to track what state has changed from pipeline to pipeline — either of which creates considerable overhead on CPU-constrained platforms.

Worse yet, these "low level" APIs (quotation marks are Khronos's not mine), actually limited certain GPU archs from accessing their full potential.

For certain implementations, the pipeline abstraction has also locked away a significant amount of the flexibility supported by their hardware, thereby paradoxically leaving many of their capabilities inaccessible in the newer and ostensibly "low level" API, though still accessible through older, high level ones. In effect, this is a return to the old problem of the graphics API artificially constraining applications from accessing the full capabilities of the GPU, only on a different axis.

Vulkan-Docs/proposals/VK_EXT_shader_object.adoc at main · KhronosGroup/Vulkan-Docs

The Vulkan API Specification and related tools. Contribute to KhronosGroup/Vulkan-Docs development by creating an account on GitHub.

github.com

Scott_Arm · Nov 21, 2023

This entire forum is one continuous argument about dx12.

cwjs · Nov 21, 2023

DavidGraham said:
They also admit what we've all suspected from the beginning: DX12/Vulkan DO NOT reduce CPU overhead, but actually directly INCREASE it.

This is not what's being said here. They're discussing significant overhead to pipeline binding, which brings challenges and could certainly cause a game to run slower, but is absolutely not the same as the api not providing options to reduce cpu overhead.

Andrew Lauritzen · Nov 21, 2023

Scott_Arm said:
This entire forum is one continuous argument about dx12.

It probably needs its own thread at this point TBH... no new information is being presented and it's increasingly becoming an irrelevant discussion outside of people just wanting to air various grievances.

Dictator · Nov 21, 2023

Scott_Arm said:
This entire forum is one continuous argument about dx12.

As it should be.

IMO i like DX12 for changing things kind of, making devs be more considerate, and giving us great things like RT, but from a user experience perspective it has been a net detriment even when one considers the amount of good DX12 titles. There are so many bad DX12 titles that it is overwhelming sometimes to think about.

There is a reason 2022 was one of my least favourite years of DF coverage...

Andrew Lauritzen · Nov 21, 2023

DavidGraham said:
They also admit what we've all suspected from the beginning: DX12/Vulkan DO NOT reduce CPU overhead, but actually directly INCREASE it.

Note that this entire extension discussion you pulled is talking *specifically about PSOs*. I don't think the fact that PSOs have added overhead is particularly contentious at this point, but similarly other areas of the API (submission and draw calls) are clearly better. Sadly in many games the negatives on the PSO side have outweighed other benefits in the API of course, but that doesn't mean it's fair to make a statement like the above with qualification. And frankly no statement that broad will ever be true in general for all games - it depends a lot on the specifics of the content and implementation.

Most of the stuff in that discussion is true, but you sort of have to be pretty familiar with the technical details of GPU drivers and hardware to understand the cases they are talking about. I caution against broadening the conclusions there too much if you are not.

Dictator said:
IMO i like DX12 for changing things kind of, making devs be more considerate, and giving us great things like RT, but from a user experience perspective it has been a net detriment even when one considers the amount of good DX12 titles.

You'd still call it a net detriment even when considering RT and similar features? i.e. you'd prefer to have games use DX11 even if it means not having the RT path? As I've argued, I don't think it's realistic or fair to separate the "bad stuff that is related to new APIs" from the good stuff that they enable, even if it makes the conclusions more nuanced.

Lurkmass · Nov 21, 2023

Andrew Lauritzen said:
Note that this entire extension discussion you pulled is talking *specifically about PSOs*. I don't think the fact that PSOs have added overhead is particularly contentious at this point, but similarly other areas of the API (submission and draw calls) are clearly better. Sadly in many games the negatives on the PSO side have outweighed other benefits in the API of course, but that doesn't mean it's fair to make a statement like the above with qualification. And frankly no statement that broad will ever be true in general for all games - it depends a lot on the specifics of the content and implementation.

Most of the stuff in that discussion is true, but you sort of have to be pretty familiar with the technical details of GPU drivers and hardware to understand the cases they are talking about. I caution against broadening the conclusions there too much if you are not.

PSOs aren't a completely bad idea either since they're designed to give IHVs the opportunity to remove more special graphics state and it's associated fixed function hardware without having the driver to implement complex runtime patching schemes. PSOs exist to avoid unnecessarily punishing hardware designs moving in a more "general purpose" direction where more state is implemented in software. PSOs aren't even all that controversial in the context of compute pipelines since there's far less special states as opposed to graphics pipelines.

Andrew Lauritzen · Nov 21, 2023

Lurkmass said:
PSOs aren't a completely bad idea either since they're designed to give IHVs the opportunity to remove more special graphics state and it's associated fixed function hardware without having the driver to implement complex runtime patching schemes. PSOs exist to avoid unnecessarily punishing hardware designs moving in a more "general purpose" direction where more state is implemented in software. PSOs aren't even all that controversial in the context of compute pipelines since there's far less special states as opposed to graphics pipelines.

Indeed, as the aforementioned Fortnite benchmarks show, there are definitely cases in which compile stutter can be reduced/better in the new APIs. While I broadly agree that the issue got worse in the new APIs for most users for many of the reasons discussed (the problem was legitimately simpler on the driver side; applications do not even have the data they would need to do it optimally on a given piece of hardware), it's still kind of annoying when people pretend there was never an issue in previous APIs. There has always been shader compilation stutter issues, and it has gotten worse as games use more shaders.

We really need some hardware movement to address the fundamental problem, which by and large is a consequence of the way GPUs do static resource/register allocation and occupancy. It's definitely a bigger problem in graphics than compute, but it's still a problem in both. Ex. permutations are still an issue when Nanite moves to base pass compute materials.

That said, it's not unreasonable to say that in a lot of ways PSOs have been a net negative on the user experience. Given no shortage of unreasonable statements, I don't think it's really worth arguing about that one

Lurkmass · Nov 21, 2023

Andrew Lauritzen said:
We really need some hardware movement to address the fundamental problem, which by and large is a consequence of the way GPUs do static resource/register allocation and occupancy. It's definitely a bigger problem in graphics than compute, but it's still a problem in both. Ex. permutations are still an issue when Nanite moves to base pass compute materials.

I still somewhat disagree with the idea giving graphics programmers the ability to do generalized indirect shader dispatches since that will just encourage spilling. Shader compilers don't just exist to do static register allocation to minimize register pressure/increase occupancy but they're also there to prevent spilling as much as possible too. Apple are able to more easily get away with what they can because they've effectively siloed themselves off of the rest of the industry. If anyone else had attempted to pull off a similar move in a competitive environment such as desktop or mobile graphics space, they'll either sink (hardware complexity/unsatisfactory performance) or swim (apps start taking advantage of the feature). Even Apple's dynamic register caching solution has limits where there's a *specific threshold* that just enough spilling will start cratering their performance.

Andrew Lauritzen said:
That said, it's not unreasonable to say that in a lot of ways PSOs have been a net negative on the user experience. Given no shortage of unreasonable statements, I don't think it's really worth arguing about that one

While PSOs aren't good for user experience, there's been undeniable benefits from a hardware/driver design perspective in how they implement (hardware or software) specific states ...

davis.anthony · Nov 21, 2023

I don't think supporting GPU's that were not released as DX12 GPU's helped things.

The HD7000 series is 'fully' DX12 compatible despite being a DX11 GPU and having no support for any of the new DX12 features such as RT, mesh shaders and other features.

DX12 should have represented a hard line of compatibility like DX8, 9, 10 and 11 before it.

Scott_Arm · Nov 21, 2023

Dictator said:
As it should be.
IMO i like DX12 for changing things kind of, making devs be more considerate, and giving us great things like RT, but from a user experience perspective it has been a net detriment even when one considers the amount of good DX12 titles. There are so many bad DX12 titles that it is overwhelming sometimes to think about.

There is a reason 2022 was one of my least favourite years of DF coverage...

I feel like every time dx gets brought up we end up with an inevitable derailment of the thread with a bunch of benchmark comparisons between dx11 and dx12 arguing that dx12 is worse and was a mistake without any real technical detail at all. It's just benchmarks and then a conclusion that repeats. Plus it's usually a self-selecting sample because it's normally games that have renderers that can have a dx11 equivalent. Plus all the vendor bias stuff creeps in. I just miss when a lot more devs posted on B3D and you could get bits and pieces of technical insight, and now I feel like there's a small handful of topics that every thread eventually derails into.

Lurkmass · Nov 21, 2023

davis.anthony said:
I don't think supporting GPU's that were not released as DX12 GPU's helped things.

The HD7000 series is 'fully' DX12 compatible despite being a DX11 GPU and having no support for any of the new DX12 features such as RT, mesh shaders and other features.

DX12 should have represented a hard line of compatibility like DX8, 9, 10 and 11 before it.

Technically that's not the case with D3D11. You can use DX9 hardware (feature level 9.x) with D3D11 ...

davis.anthony · Nov 21, 2023

Lurkmass said:
Technically that's not the case with D3D11. You can use DX9 hardware (feature level 9.x) with D3D11 ...

Which DX9 GPU's had tessellation hardware that was fast enough to actually use for DX11?

Lurkmass · Nov 21, 2023

davis.anthony said:
Which DX9 GPU's had tessellation hardware that was fast enough to actually use for DX11?

Doesn't really matter since your incorrect example wasn't all that helpful for your justification of hard resets and compatibility breaks. Developers could very much use D3D11 without all the new features on DX9 hardware. Hardware tessellation is a bit ironic since it's irrelevant these days when the older ways aged better so that would serve as a possible case against obsoleting perfectly functional older hardware especially when nobody knows if new feature xyz will stand the test of time to remain relevant ...

Comparative consideration of DirectX 12 in games *spawn

iroboto

Daft Funk

Andrew Lauritzen

Moderator

Shifty Geezer

uber-Troll!

Shifty Geezer

uber-Troll!

iroboto

Daft Funk

Andrew Lauritzen

Moderator

DavidGraham

Vulkan-Docs/proposals/VK_EXT_shader_object.adoc at main · KhronosGroup/Vulkan-Docs

Scott_Arm

cwjs

Andrew Lauritzen

Moderator

Dictator

Andrew Lauritzen

Moderator

Lurkmass

Andrew Lauritzen

Moderator

Lurkmass

davis.anthony

Scott_Arm

Lurkmass

davis.anthony

Lurkmass

Similar threads