Digital Foundry Article Technical Discussion [2023]

DX11 can max out a GPU more than DX12... the new APIs' features are propaganda
There is no meaningful difference on the GPU side with DX12 vs 11 other than the additional features and flexibility in 12. There's nothing super useful you can accomplish on the GPU with 11 that you can't do in a very 1:1 manner in 12 as well. When people talk about difficulties it is almost always with the CPU side.

I also don't know that it's fair to characterize the (primarily CPU) issues with the modern APIs as a problem of "understanding the complexity". It's not that DX12 is too complicated for people to use optimally; perhaps this is an issue for hobbyists but not for big game engines. It's more that some of the changes in the abstraction layer actually made problems *more difficult* in a global sense (across both the app and driver) than they were in DX11 where the driver actually had a simpler problem to solve. In the pursuit of eliminating unpredictable driver thread compilation (which absolutely happened in DX11) in a supposedly "portable" way, we expanded the permutation space of shaders by several orders of magnitude to accommodate potential future GPU or driver needs, even if those will never actually happen. So while a game can indeed pick better points to do PSO creations than the driver could do in the past, it has so many more PSOs to deal with than the whole system ever did in DX11 that the downsides can sometimes outweigh the upsides.
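To make the permutation-space point a bit more concrete, here's a toy back-of-the-envelope sketch; every count in it is invented purely for illustration, not measured from any real engine:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical per-title shader variant count (made up for illustration).
    const uint64_t shaderVariants = 200;

    // State that older APIs let the driver patch late, but that gets baked
    // into a full PSO up front (all counts are made up for illustration).
    const uint64_t vertexLayouts      = 8;  // vertex factories / input layouts
    const uint64_t blendModes         = 4;  // opaque, masked, translucent, additive
    const uint64_t depthStencilStates = 3;
    const uint64_t renderTargetSetups = 6;  // GBuffer, shadow, velocity, etc.

    const uint64_t dx11ishCompileUnits = shaderVariants;
    const uint64_t worstCasePsoCount   = shaderVariants * vertexLayouts * blendModes *
                                         depthStencilStates * renderTargetSetups;

    std::printf("Shader programs the DX11-era driver saw (roughly): %llu\n",
                static_cast<unsigned long long>(dx11ishCompileUnits));
    std::printf("Worst-case PSO permutations in the new model:      %llu\n",
                static_cast<unsigned long long>(worstCasePsoCount));
    return 0;
}
```

The exact numbers don't matter; the point is that state which used to be patched dynamically now multiplies rather than adds, which is how you go from hundreds of shader programs to hundreds of thousands of potential pipelines.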

Don't overreact here folks... the new APIs are clearly better overall and a lot of the changes were needed to support things like raytracing and evolved shader programming models and so on. Realize that when we're reflecting on unforeseen issues in the APIs we're doing it in a technical post mortem kind of manner, not trying to call out a single factor that accounts for various issues in modern games. There's no single factor at play here, there's a combination of many factors that folks have mentioned, both technical and management. The real story is always more complicated, nuanced and different from game to game.

No it's not... it's the other features that are rarely used, like mesh shaders and sampler feedback streaming, that I'm talking about
Nanite uses mesh shaders, for what it's worth... the reality is that for stuff that is disruptive to content especially, it takes a lot longer before it gains enough market penetration to be able to rely on it without having an infeasible number of paths. Hell, it was already hard enough to draw a line at DX12 for Nanite/VSM, let alone smaller features like those!

Sampler feedback in particular has always had a pretty narrow window of utility in my opinion. It's addressing a problem that mostly doesn't exist... we've been using virtual texturing just fine for a long time without sampler feedback. Maybe in the far future when it's supported everywhere and there's only one path it will make more sense, but even the theoretical benefits in the best case are pretty marginal so it just doesn't really pass the cost/benefit test. I'm not sure why anyone is expecting anything important from this feature, nor why it was ever advertised to consumers...

No one is forcing devs to use DX12 either.
Hey now, every other reply is complaining about HW raytracing. You can't have it both ways! :D
 
Sampler feedback in particular has always had a pretty narrow window of utility in my opinion. It's addressing a problem that mostly doesn't exist... we've been using virtual texturing just fine for a long time without sampler feedback. Maybe in the far future when it's supported everywhere and there's only one path it will make more sense, but even the theoretical benefits in the best case are pretty marginal so it just doesn't really pass the cost/benefit test. I'm not sure why anyone is expecting anything important from this feature, nor why it was ever advertised to consumers...
So I guess an issue you point out here is that UE5 is just NOW using virtual texturing. So, so many engines do not, UE4 included, which is still being used for tons of releases.
 
it's fair to characterize the (primarily CPU) issues with the modern APIs
These issues are decimating performance and the experience for players; we have now documented many, many cases where DX11 gives more performance than DX12/Vulkan, whether at 1080p or 2160p. That's not acceptable.

It's not that DX12 is too complicated for people to use optimally
I don't agree with this, not when we have multiple developers going on record and stating that DX12/Vulkan is more work and a lot more complex than before. Even Khronos themselves stated the same. In Khronos's own words, Vulkan increased CPU overhead, decreased GPU performance, prevented certain GPU archs from reaching their full potential, and forced many developers to develop workarounds that behave like the old APIs. Below are the highlights of their statement.

Many of these assumptions have since proven to be unrealistic.

The entire premise is unrealistic. Here Khronos explains why (in summary Vulkan/DX12 constrain gaming dynamism).

On the application side, many developers considering or implementing Vulkan and similar APIs found them unable to efficiently support important use cases which were easily supportable in earlier APIs. This has not been simply a matter of developers being stuck in an old way of thinking or unwilling to "rein in" an unnecessarily large number of state combinations, but a reflection of the reality that the natural design patterns of the most demanding class of applications which use graphics APIs — video games — are inherently and deeply dependent on the very "dynamism" that pipelines set out to constrain.

And here Khronos admits it all in one paragraph: the problems Vulkan and DX12 sought to solve were simply transferred from the driver to the game, without giving developers the necessary knowledge and tools to solve them (unlike the driver, which handled them gracefully), thus directly causing the stutter problem.

As a result, renderers with a choice of API have largely chosen to avoid Vulkan and its "pipelined" contemporaries, while those without a choice have largely just devised workarounds to make these new APIs behave like the old ones — usually in the form of the now nearly ubiquitous hash-n-cache pattern. These applications set various pieces of "pipeline" state independently, then hash it all at draw time and use the hash as a key into an application-managed pipeline cache, reusing an existing pipeline if it exists or creating and caching a new one if it does not. In effect, the messy and inefficient parts of GL drivers that pipelines sought to eliminate have simply moved into applications, except without the benefits of implementation specific knowledge which might have reduced their complexity or improved their performance.
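For anyone who hasn't seen it, the pattern Khronos is describing looks roughly like this; a minimal sketch with made-up types and names, not code from any real engine or driver:

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

// Stand-ins for real API objects; purely illustrative.
struct Pipeline {};
Pipeline* CreatePipelineFromState(uint64_t /*stateHash*/) { return new Pipeline(); }

// Pieces of "pipeline" state the app still wants to set independently,
// the way it did under the older APIs.
struct DrawState {
    uint32_t shaderId     = 0;
    uint32_t blendMode    = 0;
    uint32_t depthFunc    = 0;
    uint32_t vertexLayout = 0;
};

uint64_t HashState(const DrawState& s) {
    // Toy FNV-1a style hash; real engines use something more robust.
    uint64_t h = 1469598103934665603ull;
    auto mix = [&](uint64_t v) { h = (h ^ v) * 1099511628211ull; };
    mix(s.shaderId); mix(s.blendMode); mix(s.depthFunc); mix(s.vertexLayout);
    return h;
}

class PipelineCache {
public:
    // Called at draw time: reuse an existing pipeline if we have one,
    // otherwise create it on the spot.
    Pipeline* GetOrCreate(const DrawState& s) {
        const uint64_t key = HashState(s);
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second.get();
        Pipeline* p = CreatePipelineFromState(key);
        cache_.emplace(key, std::unique_ptr<Pipeline>(p));
        return p;
    }
private:
    std::unordered_map<uint64_t, std::unique_ptr<Pipeline>> cache_;
};

int main() {
    PipelineCache cache;
    DrawState state;
    state.shaderId = 42;
    state.blendMode = 1;                       // app sets state pieces independently...
    Pipeline* pso = cache.GetOrCreate(state);  // ...then hashes-and-caches at draw time
    return pso != nullptr ? 0 : 1;
}
```

At draw time the renderer fills in a DrawState and calls GetOrCreate; when the lookup misses, the pipeline gets created right in the middle of the frame, which is where the hitches tend to come from.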

They also admit what we've all suspected from the beginning: DX12/Vulkan DO NOT reduce CPU overhead, but actually directly INCREASE it.

On the driver side, pipelines have provided some of their desired benefits for some implementations, but for others they have largely just shifted draw time overhead to pipeline bind time (while in some cases still not entirely eliminating the draw time overhead in the first place). Implementations where nearly all "pipeline" state is internally dynamic are forced to either redundantly re-bind all of this state each time a pipeline is bound, or to track what state has changed from pipeline to pipeline — either of which creates considerable overhead on CPU-constrained platforms.

Worse yet, these "low level" APIs (the quotation marks are Khronos's, not mine) actually kept certain GPU archs from accessing their full potential.

For certain implementations, the pipeline abstraction has also locked away a significant amount of the flexibility supported by their hardware, thereby paradoxically leaving many of their capabilities inaccessible in the newer and ostensibly "low level" API, though still accessible through older, high level ones. In effect, this is a return to the old problem of the graphics API artificially constraining applications from accessing the full capabilities of the GPU, only on a different axis.

 
Sampler feedback in particular has always had a pretty narrow window of utility in my opinion. It's addressing a problem that mostly doesn't exist... we've been using virtual texturing just fine for a long time without sampler feedback. Maybe in the far future when it's supported everywhere and there's only one path it will make more sense, but even the theoretical benefits in the best case are pretty marginal so it just doesn't really pass the cost/benefit test. I'm not sure why anyone is expecting anything important from this feature, nor why it was ever advertised to consumers...

To Microsoft’s credit they never hyped sampler feedback as anything more than icing on the cake. It was marketed to consumers as an optimization on top of virtual texturing / PRT techniques which have been around a long time. The problem is that there’s no cake. Virtual texturing really should have been a basic feature of all game engines by now.
 
Sampler feedback in particular has always had a pretty narrow window of utility in my opinion. It's addressing a problem that mostly doesn't exist... we've been using virtual texturing just fine for a long time without sampler feedback. Maybe in the far future when it's supported everywhere and there's only one path it will make more sense, but even the theoretical benefits in the best case are pretty marginal so it just doesn't really pass the cost/benefit test. I'm not sure why anyone is expecting anything important from this feature, nor why it was ever advertised to consumers...
Sampler Feedback alone is not such a big deal. It's meant to be combined with high bandwidth streaming from an NVMe SSD using DirectStorage. That is Sampler Feedback Streaming, which is a very big deal. In addition to virtual texturing, it could further save VRAM by a lot. Take a look at this presentation from Intel:

https://www.intel.cn/content/dam/de...back-texture-space-shading-direct-storage.pdf Using hardware improves it a lot compared to a software approach. Slide 33 demonstrates an efficient usage of video memory that is far beyond what Unreal Engine 5 is doing right now. There is also another feature of SF, texture space shading, which improves performance. The Steam HW Survey clearly shows that DX12U is supported on a ton of GPUs now, and the Series consoles support it, so there is no reason not to implement these features into the engine.

Epic should really be more serious about these features; a next gen engine not using next gen hardware is not ideal at all. This is exactly the reason why PC gaming is in such a troubled state now: games are trying to push next gen fidelity while still relying on old rendering methods.
 
These issues are decimating performance and the experience for players; we have now documented many, many cases where DX11 gives more performance than DX12/Vulkan, whether at 1080p or 2160p. That's not acceptable.

I don't agree with this, not when we have multiple developers going on record and stating that DX12/Vulkan is more work and a lot more complex than before. Even Khronos themselves stated the same. In Khronos's own words, Vulkan increased CPU overhead, decreased GPU performance, prevented certain GPU archs from reaching their full potential, and forced many developers to develop workarounds that behave like the old APIs.
Very revealing truth about the hyped DX12 and Vulkan APIs 😁
 
Sampler Feedback alone is not such a big deal. It's meant to be combined with high bandwidth streaming from an NVMe SSD using DirectStorage. That is Sampler Feedback Streaming, which is a very big deal. In addition to virtual texturing, it could further save VRAM by a lot. Take a look at this presentation from Intel:

https://www.intel.cn/content/dam/de...back-texture-space-shading-direct-storage.pdf Using hardware improves it a lot compared to a software approach. Slide 33 demonstrates an efficient usage of video memory that is far beyond what Unreal Engine 5 is doing right now. There is also another feature of SF, texture space shading, which improves performance. The Steam HW Survey clearly shows that DX12U is supported on a ton of GPUs now, and the Series consoles support it, so there is no reason not to implement these features into the engine.

Epic should really be more serious about these features; a next gen engine not using next gen hardware is not ideal at all. This is exactly the reason why PC gaming is in such a troubled state now: games are trying to push next gen fidelity while still relying on old rendering methods.

Before saying it does better, wait for real game performance. The same UE4 games are not the future. We will see Unreal Engine 5 games, and I think that means better performance on the CPU side than UE4.

Don't forget that games are made over 4 to 5 years most of the time.
 
Sampler feedback in particular has always had a pretty narrow window of utility in my opinion. It's addressing a problem that mostly doesn't exist... we've been using virtual texturing just fine for a long time without sampler feedback. Maybe in the far future when it's supported everywhere and there's only one path it will make more sense, but even the theoretical benefits in the best case are pretty marginal so it just doesn't really pass the cost/benefit test. I'm not sure why anyone is expecting anything important from this feature, nor why it was ever advertised to consumers...

Because of Microsoft: they made such a massive deal of this back when they unveiled the Xbox Series consoles and the Velocity Architecture.
 
https://www.dexerto.com/gaming/acti...-from-pc-gamers-than-console-players-2138822/

I wouldn't say there is a crisis in PC gaming. A crisis for people like us who "wear the latest fashion"? Maybe. But that's relative. Companies want to get their AAA games out as soon as possible, while giving players a polished console gaming experience. The PC port must get out on the same day, it seems, with a polished gaming experience coming later. Moral: only buy games like Diablo 4 day one, or most Capcom games.
 
I liked John's observation that there was a golden age of console ports on PC straddling mid PS360 to mid PSbone, where you could just throw anything at a PC and it'd brute force great performance.
 
I liked John's observation that there was a golden age of console ports on PC straddling mid PS360 to mid PSbone, where you could just throw anything at a PC and it'd brute force great performance.

That's only because generational GPU performance improvements back then were excellent, not the much smaller ones we have today.

The DX10 and DX11 GPU era was awesome and for me personally is what I consider to be PC at the peak of its powers.
 
I don't agree with this, not when we have multiple developers going on record and stating that DX12/Vulkan is more work and a lot more complex than before. Even Khronos themselves stated the same. In Khronos's own words, Vulkan increased CPU overhead, decreased GPU performance, prevented certain GPU archs from reaching their full potential, and forced many developers to develop workarounds that behave like the old APIs. Below are the highlights of their statement.
David... I was highly involved in the DX12 spec. My name is on the Vulkan spec. You don't need to quote public statements from any of these people to me. Humility aside, I may be one of the most qualified to talk about these issues on the planet and I'm telling you that you are falling into the trap that I described in the last post. You are cherry picking post mortem discussions that support your agenda and not understanding the subtleties of what is being discussed. I realize I am being condescending here but I hope you can imagine how annoying it is to have such things quoted at you indignantly.

Sampler Feedback alone is not such a big deal. It's meant to be combined with high bandwidth streaming from an NVMe SSD using DirectStorage. That is Sampler Feedback Streaming, which is a very big deal.
The two features are not connected. You can happily use DirectStorage streaming with any other method of determining the resident set. The point here is to understand what the API feature actually does, which is not actually a ton: it marks some regions based on the texture addressing logic. Usually the marking part is implemented in "software" in the driver as well so there's really no magic. While it's potentially convenient to have hardware compute things like the regions touched by a big aniso kernel, it's not that hard to do something similar and conservative cheaply.
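To give a rough idea of what "something similar and conservative" can mean, here's a toy CPU-side version of the page-marking math; in a real engine this would run in a shader writing to a feedback buffer, and the page size and layout here are made-up example values:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <set>
#include <utility>

// Toy "software feedback": given a sample's UV and screen-space derivatives,
// work out which mip and which virtual-texture page it would touch and mark
// them as needed. Page size and layout are made-up example values.
struct FeedbackMap {
    std::set<std::pair<int, int>> requestedPages;  // (mip, linear page index)
};

void MarkSample(FeedbackMap& fb, float u, float v,
                float dudx, float dvdx, float dudy, float dvdy,
                int texWidth, int texHeight, int pageSize /*texels*/) {
    // Standard isotropic LOD estimate from the screen-space derivatives.
    const float px = std::sqrt(dudx * dudx + dvdx * dvdx) * static_cast<float>(texWidth);
    const float py = std::sqrt(dudy * dudy + dvdy * dvdy) * static_cast<float>(texHeight);
    const float lod = std::log2(std::max(std::max(px, py), 1.0f));

    const int maxMip = static_cast<int>(std::floor(
        std::log2(static_cast<float>(std::max(texWidth, texHeight)))));
    const int mip = std::min(static_cast<int>(lod), maxMip);
    // A conservative version would also mark mip+1 (for trilinear blends) and
    // the neighbouring pages covered by the filter footprint.

    const int mipW = std::max(1, texWidth >> mip);
    const int mipH = std::max(1, texHeight >> mip);
    const int pagesX = (mipW + pageSize - 1) / pageSize;
    const int pagesY = (mipH + pageSize - 1) / pageSize;

    const int pageX = std::clamp(static_cast<int>(u * static_cast<float>(mipW)) / pageSize, 0, pagesX - 1);
    const int pageY = std::clamp(static_cast<int>(v * static_cast<float>(mipH)) / pageSize, 0, pagesY - 1);
    fb.requestedPages.insert({mip, pageY * pagesX + pageX});
}

int main() {
    FeedbackMap fb;
    // One made-up sample: UV in the middle of a 4096x4096 texture, minified ~2x.
    MarkSample(fb, 0.5f, 0.5f, 2.0f / 4096.0f, 0.0f, 0.0f, 2.0f / 4096.0f, 4096, 4096, 128);
    std::printf("pages requested: %zu\n", fb.requestedPages.size());
    return 0;
}
```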

https://www.intel.cn/content/dam/de...back-texture-space-shading-direct-storage.pdf Using hardware improves it a lot compared to a software approach. Slide 33 demonstrates an efficient usage of video memory that is far beyond what Unreal Engine 5 is doing right now.
The cases where they are claiming big gains are not actually dependent on the sampler feedback feature. The slide you reference is comparing to no virtual texturing at all... it's also an obviously made-up case. Are you expecting people to ship a game with 350GB of texture downloads in the near future? UE4 and 5 can do this stuff just fine for a long while.

There is also another feature of SF, texture space shading, which improves performance.
Texture space shading is a separate technique and it also does not rely on sampler feedback, though there can be some more benefits from it than pure texture streaming. With pure texture streaming, the absolute best you can really gain with sampler feedback + TR is eliminating some filtering borders, which can be maybe a few % of the VRAM used by the physical page pool (which is typically maybe 500MB-1GB depending on the game). Not nothing, but hardly something that consumers need to care about.

With texture space shading the gain can be a bit more, because you're not just eliminating the VRAM but potentially reducing the over-shade work by similar single-digit percentages. That said, the majority of over-shade in texture space shading typically comes from visibility (i.e. occluded stuff), not padding. Texture space shading is neat and will likely get some use in the future, but it has tradeoffs that make it not a win in all cases. It also doesn't compare quite as well against modern visibility buffers (like Nanite uses) and VRS as it did against older deferred shading. The big advantage of texture space shading is that it gives you a separate dial to scale shading costs without adjusting the visibility sampling rate, similar to VRS. That said, if you are aiming for relatively sharp shading results it will usually end up being a similar amount of total work to a visibility buffer approach.

This is exactly the reason why PC gaming is in such a troubled state now: games are trying to push next gen fidelity while still relying on old rendering methods.
Guys I have no problem with you taking a consumer standpoint of "the end result is unacceptable and we should call out companies for it" as DF is doing. That's great and that's really all the feedback that is needed. I'm also fine with trying to outline some of the challenges that contribute to the various complexities from my perspective, which admittedly is just as one developer but who has experience in many of the areas being discussed. Where I have a problem is where you guys go all Dunning-Kruger and get indignant about some very specific feature or issue that you think is responsible for everything, which is frankly never the case. It's particularly crazy to claim that non-use of new API features is why games are having problems... that doesn't even pass the smell test.

I get that it's the mundane answer, but there are a lot of factors that contribute to things, and the causes are not even always the same from game to game. While there's some commonality that we have discussed (i.e. changes to pipelines/PSOs) for various specific issues (shader compilation), beyond that there's a large swath of things that can be happening on a case-by-case basis.

Regarding CPU stuff and multithreading I'll just say this is obviously an area where I spent a fair amount of time while at Intel. The reality is yes, modern CPUs have a lot of cores that go relatively unused in games... and in almost every other application you use. The only stuff that really scales still is the embarrassingly parallel stuff. Games are slowly getting better, but the reality is that while certain embarrassingly parallel parts of games scale well, other parts still do not and never have. It can certainly vary a bit from engine to engine, but no one has "solved" this problem... in general the games that have fewer issues are doing less stuff, in addition to doing it more efficiently. None of this is to say CPU issues and bottlenecks shouldn't be called out - they absolutely should and I'll bang that drum as loud as anyone! My point is just that this is a general computing problem that likely doesn't have a silver bullet; writing parallel code is difficult, particularly in the context of "gameplay code" that gets exposed to non-experts via scripting and so on.
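To put rough numbers on why extra cores don't automatically help, here's a quick Amdahl's-law sketch; the parallel fractions are invented for illustration, not measured from any engine:

```cpp
#include <cstdio>

// Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
// the frame that parallelises and n is the core count. The fractions below
// are invented for illustration, not measured from any engine.
int main() {
    const double parallelFractions[] = {0.50, 0.80, 0.95};
    const int coreCounts[] = {4, 8, 16, 32};

    for (double p : parallelFractions) {
        std::printf("parallel fraction %.0f%%:", p * 100.0);
        for (int n : coreCounts) {
            const double speedup = 1.0 / ((1.0 - p) + p / static_cast<double>(n));
            std::printf("  %2d cores -> %4.2fx", n, speedup);
        }
        std::printf("\n");
    }
    return 0;
}
```

Even a frame that is 80% parallelizable tops out around 3.3x on 8 cores and can never pass 5x no matter how many cores you throw at it, which is why the serial parts end up dominating.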
 