Shader Compilation on PC: About to become a bigger bottleneck?

This is already working with Steam on Linux, which also uses incremental updates for its pre-compiled shader caches.
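To illustrate what an incremental cache update means in principle, here is a minimal sketch. It assumes a hypothetical manifest that maps each cache entry ID to a content hash; it is not Valve's actual format or protocol. Only entries that are missing or whose hash has changed need to be downloaded, and entries absent from the new manifest can be dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest format: cache entry ID -> content hash of the compiled shader blob.
Manifest = Dict[str, str]


def plan_incremental_update(local: Manifest, remote: Manifest) -> Tuple[List[str], List[str]]:
    """Return (entries to download, entries to delete) to bring `local` in line with `remote`."""
    # Download anything the client lacks or whose compiled blob changed
    # (e.g. after a game patch or a driver update).
    to_download = sorted(k for k, h in remote.items() if local.get(k) != h)
    # Entries no longer listed remotely are stale and can be evicted.
    to_delete = sorted(k for k in local if k not in remote)
    return to_download, to_delete


local = {"ui_vs": "a1", "water_ps": "b7", "old_fx": "c3"}
remote = {"ui_vs": "a1", "water_ps": "b8", "fog_ps": "d2"}
print(plan_incremental_update(local, remote))  # (['fog_ps', 'water_ps'], ['old_fx'])
```

The unchanged `ui_vs` entry is never re-downloaded, which is the whole point of updating incrementally rather than shipping the full cache again.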
 
Well, it certainly doesn't help when the drivers themselves cause CPU usage spikes after you exit a game, which could lead to issues in the next game you load. So you need to reboot after every single gaming session until the fix is released.


Nvidia acknowledges CPU usage spikes in GeForce driver 531.18

Rebooting the PC or rolling back the driver can mitigate the issue until Nvidia releases its hotfix


Some users reported noticing CPU usage increased by up to 15 percent after closing out of a game. Checking the Process tab revealed that the Nvidia Container was the culprit. To be more precise, the latest version of the Nvidia Game Session Telemetry Plugin seems to be the source of the issue.
 
Epic are implementing an updated PSO precaching system for UE5.2 that improves performance, reduces the number of PSOs by excluding unnecessary ones, catches more corner cases, and can skip drawing objects whose PSOs aren't ready by draw time.

A new PSO precaching mechanism was introduced as Experimental in 5.1 to improve PSO hitching in DX12 titles. Improvements to this system in 5.2 include:
  • We've improved the performance and stability of the system. There were various corner cases that needed to be addressed.
  • We now skip drawing objects if their PSOs aren't ready yet. The system aims to have the PSO ready in time for drawing, but it will never be able to guarantee this. When it's late, it is now possible to skip drawing the object instead of waiting for compilation to finish (and hitching).
  • The number of PSOs to precache has been reduced due to improved logic that omits ones that will never be used.
  • We've improved the old (manual) PSO cache system so that it can be used alongside precaching.
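The skip-if-late behaviour described above can be sketched as follows. This is a minimal Python model, not Unreal's implementation: `PsoPrecacher`, `compile_fn`, and the string return values are hypothetical, and a real renderer would hand the compiled pipeline to the draw call rather than return a status string.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Hashable


class PsoPrecacher:
    """Hypothetical sketch: compile PSOs on worker threads and skip draws whose PSO is late."""

    def __init__(self, compile_fn: Callable[[Hashable], object], workers: int = 2) -> None:
        self._compile_fn = compile_fn
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: Dict[Hashable, Future] = {}
        self._lock = threading.Lock()

    def precache(self, key: Hashable) -> None:
        """Start compiling `key` once, e.g. when the mesh/material that needs it is loaded."""
        with self._lock:
            if key not in self._pending:
                self._pending[key] = self._pool.submit(self._compile_fn, key)

    def try_draw(self, key: Hashable) -> str:
        """Never blocks the frame: draw if the PSO is ready, otherwise skip this frame."""
        with self._lock:
            fut = self._pending.get(key)
        if fut is None:
            # A PSO the precacher missed: request it now but don't wait for it.
            self.precache(key)
            return "skipped"
        if not fut.done():
            # Still compiling: skipping here is what replaces the hitch with pop-in.
            return "skipped"
        fut.result()  # the compiled PSO; re-raises if compilation failed
        return "drawn"

    def wait_all(self) -> None:
        """Block until every requested PSO is compiled (e.g. behind a loading screen)."""
        with self._lock:
            futures = list(self._pending.values())
        for fut in futures:
            fut.result()

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

`try_draw` embodies the trade-off: a missing object for a frame or two instead of a stalled frame. `wait_all` models the complementary approach of compiling a recorded cache up front.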



Good stuff.
 
So instead of hitching, we'll have missing objects that pop in, like LOD pop-in?
Pretty much. I think that's how I'd prefer it to be.

In my mind, devs would ideally build as complete a PSO cache as possible and precompile it up front, and then the precaching system would hopefully catch any they missed. Those could be compiled in the background while we play, and ideally those corner cases would be ready in time, but if they're not, the object/effect just pops in instead.
 

Yeah, I think that's a better compromise. The effect causing the stutter is likely just one small part of the screen, so it not loading, or loading more slowly, is likely preferable to the whole image stuttering and potentially impacting gameplay.
 
Yep. I'd rather have an object in the scene pop in some milliseconds late rather than have the entire thing stutter for that amount of time.


Another interesting thing I noticed when looking at the roadmap was that with UE5.2 they have improved Garbage Collection performance and memory use.

Developer Tools
DANIEL TUTINO-GALLETTI
Posted on March 2
General updates and quality-of-life improvements for developer iteration in UE 5.2 include:

  • Visual Studio 2022 is now default
  • All enabled warnings for the Clang Static Analyzer have been addressed
  • C++ UnrealHeaderTool is deprecated and replaced with C# UnrealHeaderTool
  • Optimizations to Garbage Collection performance and memory use
  • UI and usability improvements to Unreal Insights components

I know that garbage collection/memory management is an often-cited reason for hitching in PC games alongside compilation stutter, so I'm curious about this point. It would be nice if it helps reduce traversal stutter when loading new areas/assets.
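One common way engines keep work such as garbage collection or asset post-processing from causing a single long hitch is to time-slice it: do only as much as fits in a per-frame budget and carry the rest over to later frames. A minimal sketch, assuming each task is a small callable; `FrameBudgetedQueue` and the injectable millisecond clock are hypothetical names, and this is not how UE5.2's collector is implemented.

```python
import time
from collections import deque
from typing import Callable, Deque


class FrameBudgetedQueue:
    """Hypothetical sketch: run queued work items until this frame's time budget is spent."""

    def __init__(self, budget_ms: float,
                 clock_ms: Callable[[], float] = lambda: time.perf_counter() * 1000.0) -> None:
        self.budget_ms = budget_ms
        self._clock_ms = clock_ms
        self._tasks: Deque[Callable[[], None]] = deque()

    def push(self, task: Callable[[], None]) -> None:
        self._tasks.append(task)

    def run_frame(self) -> int:
        """Call once per frame; returns how many tasks ran. Leftover work waits for the next frame."""
        start = self._clock_ms()
        ran = 0
        # The budget is checked before each task, so a frame overshoots by at most one task.
        while self._tasks and self._clock_ms() - start < self.budget_ms:
            self._tasks.popleft()()
            ran += 1
        return ran

    def pending(self) -> int:
        return len(self._tasks)
```

The design choice is to spread cost thinly across many frames: the total work is the same, but no single frame pays for all of it.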

Does anyone have any good links that can help me understand more about this? I understand that the memory setup on PC and console is very different, but I find it odd that many PC games have these traversal stutters where consoles don't, considering we have faster CPUs and far more memory. I suppose games are just coded in a general way that favors the console design, but I've always wondered why PC games can't load things in with a more fine-grained approach and get more of that data into RAM further ahead of time.
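On loading data further ahead of time: one common approach is to request everything near the player's current position and near where they are predicted to be a few seconds from now. A minimal 2D sketch under simplifying assumptions (a uniform chunk grid, straight-line motion, a square radius); `prefetch_plan`, `chunks_near`, and their parameters are hypothetical.

```python
import math
from typing import Set, Tuple

Chunk = Tuple[int, int]
Vec2 = Tuple[float, float]


def chunks_near(x: float, y: float, radius: float, chunk_size: float) -> Set[Chunk]:
    """Grid chunks overlapping the square of half-width `radius` centred on (x, y)."""
    x0, x1 = math.floor((x - radius) / chunk_size), math.floor((x + radius) / chunk_size)
    y0, y1 = math.floor((y - radius) / chunk_size), math.floor((y + radius) / chunk_size)
    return {(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)}


def prefetch_plan(pos: Vec2, vel: Vec2, lookahead_s: float, radius: float,
                  chunk_size: float, resident: Set[Chunk]) -> Set[Chunk]:
    """Chunks to start loading now: near the player or near their predicted future position."""
    # Straight-line prediction of where the player will be `lookahead_s` seconds from now.
    future = (pos[0] + vel[0] * lookahead_s, pos[1] + vel[1] * lookahead_s)
    wanted = (chunks_near(pos[0], pos[1], radius, chunk_size)
              | chunks_near(future[0], future[1], radius, chunk_size))
    # Only request what isn't already in memory.
    return wanted - resident
```

A longer lookahead hides more I/O and decompression latency but costs more RAM, which is where PC's larger memory pools could in principle be spent.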

I actually wonder if it has more to do with the decompression happening on the CPU than anything else. DirectStorage would also help alleviate this particular issue.
 

I’m guessing garbage collection is related to the engine tools and not the real-time engine? I’d be very surprised if there were any gc, unless there’s a scripting language or blueprints that use gc.
 
Yeah, that update seems to be for the engine tools, but if it doesn't apply to retail games, why do some developers blame stuttering in games on garbage collection?
 

I know they do for Unity because Unity has gc, at least if they're not using the new ECS. As far as I know UE games don't have any gc, but maybe I'm wrong and some of the scripting does.
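As far as I know, Unreal's UObject system does include a garbage collector that runs in shipped games: it finds objects reachable from a root set through tracked references (such as `UPROPERTY` members) and frees the rest. A minimal mark-and-sweep sketch of that general idea, using a hypothetical string-keyed object graph rather than Unreal's actual data structures:

```python
from typing import Dict, Iterable, List, Set


def mark_and_sweep(roots: Iterable[str], refs: Dict[str, List[str]], live: Set[str]) -> Set[str]:
    """Return the objects a collection pass would free: those in `live` not reachable from `roots`."""
    # Mark: traverse the reference graph from the roots. The cost grows with the number
    # of reachable objects, which is why a full pass over a large world can cause a hitch.
    marked: Set[str] = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(refs.get(obj, []))
    # Sweep: anything live but unmarked is garbage.
    return live - marked
```

Because reachability rather than reference counting decides what is freed, unreachable cycles are collected too.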
 
Oh, what is this?




Things are happening!
 
That extension is a disaster ... :/

I was hoping that Khronos Group wouldn't straight up bring back the D3D11/OpenGL separate shader objects model to Vulkan. Is this the start of Vulkan's descent, where it meets the same fate of accumulating cruft that OpenGL did? New generations of APIs shouldn't have to literally REPEAT the same mistakes as their predecessors ...

At least graphics pipeline libraries were a reasonable compromise between separate shader objects and pipelines, because linking monolithic pre-rasterization pipelines with separate fragment shader stages matched most graphics hardware, since their compilation/code gen wasn't interdependent. It's questionable whether separate shader objects are even a good idea on AMD hardware: ever since primitive shaders were introduced, changing the combination of vertex, tessellation, and geometry shaders changes the code generated for the hardware's hull (surface?) or geometry shader stage, and doing this at runtime may incur hidden driver shader recompilations, which is exactly what we wanted to avoid in the first place with the old model ...
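The concern about hidden recompilation can be modelled with a toy driver cache: if the hardware runs the vertex, tessellation, and geometry stages as one merged program, the driver has to key that program on the whole combination, so binding a combination for the first time forces a compile at draw time. This is a hypothetical sketch of the argument above, not AMD's driver; `MergedStageCache` and the stage names are illustrative.

```python
from typing import Dict, Optional, Tuple

StageKey = Tuple[str, Optional[str], Optional[str], Optional[str]]


class MergedStageCache:
    """Toy model of a driver that fuses pre-rasterization stages into one hardware program."""

    def __init__(self) -> None:
        self._programs: Dict[StageKey, str] = {}
        self.draw_time_compiles = 0

    def bind(self, vs: str, tcs: Optional[str] = None,
             tes: Optional[str] = None, gs: Optional[str] = None) -> str:
        key: StageKey = (vs, tcs, tes, gs)
        if key not in self._programs:
            # First time this exact combination is seen: the fused program must be
            # generated now, on the draw path, which is the hitch separate shaders invite.
            self.draw_time_compiles += 1
            self._programs[key] = "+".join(s for s in key if s is not None)
        return self._programs[key]
```

With monolithic pipelines the same combination is known when the pipeline is created, so that compile happens at creation time instead of at draw time.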

Is anyone else baffled that there are now 3(!) different compilation models? We have the default monolithic pipelines, graphics pipeline libraries, and now the dreaded separate shader objects. What other cursed model is Khronos Group going to come up with next? I thought we were supposed to embrace a more explicit future with graphics APIs, but with how much redundant functionality Khronos Group keeps adding to Vulkan, that future is very much in doubt, since it makes driver development more complex and harder ...

Not even Apple's Metal API is that backwards in terms of design (probably because of hardware limitations), and quite a few graphics programmers like to rave about the simplicity of its design ...
 
If the developers behind translation layers such as DXVK and VKD3D-Proton (who have to deal with cursed functionality all the time) have expressed opposition to that extension, then it's a sign that Vulkan is headed for the dark ages ...

I know that Microsoft has their own translation layer efforts for VK/GL, and they've made some controversial API design decisions of their own, but if they have any foresight at all, they can show it by not following the same direction Khronos has now taken ...

Performance

One natural question to ask at this point is whether all this new flexibility comes at some performance cost. After all, if pipelines as they were originally conceived needed so many more restrictions, how can those restrictions be rolled back without negative consequences?

On some implementations, there is no downside. On these implementations, unless your application calls every state setter before every draw, shader objects outperform pipelines on the CPU and perform no worse than pipelines on the GPU. Unlocking the full potential of these implementations has been one of the biggest motivating factors driving the development of this extension.

On other implementations, CPU performance improvements from simpler application code using shader object APIs can outperform equivalent application code redesigned to use pipelines by enough that the cost of extra implementation overhead is outweighed by the performance improvements in the application.

In either case, all conformant VK_EXT_shader_object implementations are tested to meet specific performance requirements:

  • Draw calls using shader objects must not take more than 150% of the CPU time of draw calls using fully static graphics pipelines
  • Draw calls using shader objects must not take more than 120% of the CPU time of draw calls using maximally dynamic graphics pipelines
  • Dispatch calls using compute shader objects must not be measurably slower than dispatch calls using compute pipelines
  • Creating a shader object from binary shader code must not take more than 150% of the CPU time of the cost of copying an equivalent amount of data into device local memory
These tests are intended to establish a minimum performance bar for VK_EXT_shader_object implementations that developers can rely on. This means that if a driver advertises support for VK_EXT_shader_object, you can depend on it to perform well.

If you’re interested in the details of this extension’s performance goals and design criteria, or just more information about some of the motivations that drove the development of this extension, please see the formal extension proposal.
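The draw-call requirements quoted above are simple ratios. As a sketch of what they mean (this is not the Vulkan CTS code; the function name and nanosecond inputs are hypothetical), a measured shader-object draw cost passes if it is within 150% of the fully static pipeline cost and within 120% of the maximally dynamic pipeline cost. Integer arithmetic avoids floating-point edge cases exactly at the limits.

```python
def meets_draw_call_bar(shader_object_ns: int, static_pipeline_ns: int,
                        dynamic_pipeline_ns: int) -> bool:
    """Check the two draw-call CPU-time limits quoted for VK_EXT_shader_object."""
    # <= 150% of fully static pipelines  <=>  2 * so <= 3 * static
    within_static = 2 * shader_object_ns <= 3 * static_pipeline_ns
    # <= 120% of maximally dynamic pipelines  <=>  5 * so <= 6 * dynamic
    within_dynamic = 5 * shader_object_ns <= 6 * dynamic_pipeline_ns
    return within_static and within_dynamic
```

For example, with static pipelines at 100 ns and dynamic pipelines at 110 ns per draw, a shader-object draw at 130 ns passes both limits, while one at 140 ns fails the 120% dynamic-pipeline limit (140 > 132).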

Why betray one of the main tenets of your own API, which was to be explicit in its design? What happened to Khronos Group? Have they gone insane? If they have doubts about performance, why introduce this functionality at all? Vulkan had a bad start but was genuinely making decent progress, and now Khronos Group is rolling back its commitment to a more explicit future and Vulkan has started losing favour. It looks like D3D12 is the only sane API left with a more explicit future ...

I think graphics programmers may seriously have to consider ditching Vulkan for D3D12 unless they're forced to make apps for mobile platforms ...
 
I think we have had enough of the overblown "explicitness" on PC; more than half of DX12 ports are flawed in major ways. Performance is not improved, and the user experience is severely compromised in more ways than one. To quote Khronos on this:

In summary, shader objects impose substantially fewer restrictions on applications compared to pipelines, and enable dynamism-heavy applications like games and game engines to avoid the explosive pipeline permutation combinatorics, which until now might have been seen as a cost of admission for access to modern graphics APIs.
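The "explosive pipeline permutation combinatorics" can be made concrete with a deliberately simplified count. With monolithic pipelines, every combination of vertex shader, fragment shader, and baked fixed-function state is its own compile; with shader objects, each shader is compiled once and state is set dynamically. This is an upper bound that assumes every combination is actually used, the variant counts are made-up illustrative numbers, and graphics pipeline libraries (which compile stages separately and then link them) fall between the two and aren't modelled.

```python
def compile_counts(vertex_shaders: int, fragment_shaders: int, state_combos: int) -> dict:
    """Simplified worst-case count of full shader compiles under two compilation models."""
    return {
        # Monolithic pipelines: each (VS, FS, baked state) combination is a separate compile.
        "monolithic_pipelines": vertex_shaders * fragment_shaders * state_combos,
        # Shader objects: one compile per shader; state is supplied at command-recording time.
        "shader_objects": vertex_shaders + fragment_shaders,
    }


print(compile_counts(50, 200, 10))  # {'monolithic_pipelines': 100000, 'shader_objects': 250}
```

In practice engines only compile the combinations they actually render, which is what PSO caches and precaching try to discover, but the multiplicative growth is the point the Khronos quote makes.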


I hope DX12 follows as well, so we can put this dark period behind us.
 