Shader Compilation on PC: About to become a bigger bottleneck?

That's the trick right? Where are they stored? Do the graphics vendors document these locations? Would a "clean" driver remove and reinstall fix it? Maybe?
Generally speaking, clearing the shader cache is easy. You can do it with any of the following steps:

1-Install a different GPU driver, either a newer or an older one. Developers should (and usually do) test multiple drivers anyway.

2-Or, wipe out the DirectX cache using the Disk Cleanup tool in Windows.

3-Or, manually delete the cached files from the driver or the game directory.
 

Yeah, I don't quite understand the concern that this would require a massive time-sink of constantly re-imaging entire development/QA testing departments. Are there examples of byzantine, simultaneously active shader cache locations that were not specified by the game itself? I get being 'thorough' and I'll defer to those with actual industry experience, but I'm curious: are there actual examples where this has been required to fully nuke a game's cache, or at least to the point where it should be expected and not an anomaly?

My understanding is that by default the vast, vast majority of shader caches, at least for Nvidia, will be stored in AppData\LocalLow\NVIDIA\PerDriverVersion\DXCache. Now, this has changed over the years (quite recently it was %localappdata%\Nvidia\DXCache), but that's neither here nor there: once the driver is updated and Nvidia has decided on a shader location, that's where they'll go. Nvidia doesn't keep multiple active locations going. This location is where the overwhelming majority of caches will be found. Close the game, delete all the files you can in there, and you've erased the game's cache. At least, that has been my experience many, many times over the years.

Now, every game can decide where it wants to put its cache too; they're not beholden to Nvidia's default location. Some have put it in a %localappdata%\<gamename> folder. Some have even put it in the game's config folder within My Documents. So yes, while the Nvidia default DXCache location covers the majority, it's not the only one. However, again with my limited understanding here, those unique locations will be specified by the developer/game engine. Something the developer would...know?

Like, if you're communicating to your testing staff, dedicated QA dept or not, that you want the results from a first-run-like experience, you would think "delete this folder beforehand" would be included as part of that communication? I've honestly never even seen a case where reinstalling the driver was required to wipe a cache. As long as the game's .exe isn't running, I just delete the game's compiled cache files in their respective folder. I have yet to see a game where this didn't suffice to create a first-run environment wrt shader caches.
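
For what it's worth, the "delete this folder beforehand" step is easy to script. Here is a minimal Python sketch, assuming the Nvidia location mentioned above is still the active one on your driver (it has moved before, so treat the path as an assumption). The helper name clear_cache_dir is made up for illustration, and any game-specific cache folders would have to be passed in by whoever knows where the game puts them.

```python
import os
from pathlib import Path

# Assumed location, taken from the post above; it has changed between driver versions.
NVIDIA_DXCACHE = (
    Path(os.environ.get("USERPROFILE", ""))
    / "AppData" / "LocalLow" / "NVIDIA" / "PerDriverVersion" / "DXCache"
)


def clear_cache_dir(cache_dir):
    """Delete every file under cache_dir and return (deleted, skipped) counts.

    Files that cannot be removed (for example because the game or driver
    still holds them open) are counted as skipped instead of raising.
    """
    root = Path(cache_dir)
    deleted, skipped = 0, 0
    if not root.is_dir():
        return deleted, skipped
    # Materialise the listing first so we are not deleting while iterating.
    for path in list(root.rglob("*")):
        if path.is_file():
            try:
                path.unlink()
                deleted += 1
            except OSError:
                skipped += 1
    return deleted, skipped
```

With the game closed, calling clear_cache_dir(NVIDIA_DXCACHE) plus one call per game-specific folder should give QA the same first-run state described above. A non-zero skipped count is a hint that something still has the cache open.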
 
PC gamers can take the time to play a game, see stutters immediately.. then play the same section over again with no stutters... then delete the driver cache, and see stutters again... to figure out that it's due to compilation.. but developers can't? Before they release to the world? This is some big ask?

Like "collecting the PSOs is a long, time consuming process"... yes and? Is my experience not worth that time?

"It still won't catch everything"... ok and? Do the best you can to get everything. A game shouldn't stutter the first time you do an attack.. or jump, or open a chest.. and have made no attempt to precompile anything.

To me that is negligence.
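
The cold-run versus warm-run comparison described here can be done on captured frame times too. A rough Python sketch, assuming you have per-frame times in milliseconds from two runs of the same section (one with the cache wiped, one right after). The function names and the 3x-median threshold are my own assumptions, not anything from a real profiling tool.

```python
from statistics import median


def find_hitches(frame_times_ms, factor=3.0):
    """Return indices of frames slower than factor x the run's median frame time."""
    base = median(frame_times_ms)
    return [i for i, t in enumerate(frame_times_ms) if t > factor * base]


def compare_runs(cold_run_ms, warm_run_ms, factor=3.0):
    """Count hitches in a cold-cache run and a warm-cache run of the same section.

    Many hitches cold and few warm is the pattern the post describes for
    compilation stutter; hitches in both runs point at something else.
    """
    return {
        "cold": len(find_hitches(cold_run_ms, factor)),
        "warm": len(find_hitches(warm_run_ms, factor)),
    }
```

For example, compare_runs([16.7] * 10 + [120.0] + [16.7] * 10, [16.7] * 21) (made-up numbers) reports one hitch in the cold run and none in the warm run.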
 
If you're a dev studio, even a small one, with technical people or QA people, then I agree. If you're some person who's maybe not even a programmer and is making a UE game for this Steam fest, then I think it's different. There are "self-taught" people making indie games with Unity or UE who have never set foot inside any kind of formal classroom and who barely understand programming (they may even just be using Blueprints). It really depends on what we're talking about. For those people you probably want engines like UE to have good/sane defaults, and it seems like they've taken steps in UE 5.3. I'm just not sure if games made with that version are showing up yet, or what the defaults are for people who need more hand-holding from the engine.
 
Yeah, I don't quite understand the concern that this would require a massive time-sink of constantly re-imaging entire development/QA testing departments. Are there examples of byzantine, simultaneously active shader cache locations that were not specified by the game itself? I get being 'thorough' and I'll defer to those with actual industry experience, but I'm curious: are there actual examples where this has been required to fully nuke a game's cache, or at least to the point where it should be expected and not an anomaly?

My understanding is that by default the vast, vast majority of shader caches, at least for Nvidia, will be stored in AppData\LocalLow\NVIDIA\PerDriverVersion\DXCache. Now, this has changed over the years (quite recently it was %localappdata%\Nvidia\DXCache), but that's neither here nor there: once the driver is updated and Nvidia has decided on a shader location, that's where they'll go. Nvidia doesn't keep multiple active locations going. This location is where the overwhelming majority of caches will be found. Close the game, delete all the files you can in there, and you've erased the game's cache. At least, that has been my experience many, many times over the years.

Now, every game can decide where it wants to put its cache too; they're not beholden to Nvidia's default location. Some have put it in a %localappdata%\<gamename> folder. Some have even put it in the game's config folder within My Documents. So yes, while the Nvidia default DXCache location covers the majority, it's not the only one. However, again with my limited understanding here, those unique locations will be specified by the developer/game engine. Something the developer would...know?

Like, if you're communicating to your testing staff, dedicated QA dept or not, that you want the results from a first-run-like experience, you would think "delete this folder beforehand" would be included as part of that communication? I've honestly never even seen a case where reinstalling the driver was required to wipe a cache. As long as the game's .exe isn't running, I just delete the game's compiled cache files in their respective folder. I have yet to see a game where this didn't suffice to create a first-run environment wrt shader caches.

In the UE documentation:
Use the -clearPSODriverCache switch consistently for all test runs that assess the smoothness of your game. Without it, hitches may be masked by the PSO cache built by the graphics driver and left over from the previous runs.
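
If the test runs are launched from a script, one way to make "consistently" hard to forget is to have the launcher add the switch itself. A small Python sketch: the switch is the one quoted from the UE documentation above, while build_test_command and the executable name are hypothetical.

```python
def build_test_command(game_exe, extra_args=()):
    """Build a launch command that always includes -clearPSODriverCache exactly once."""
    switch = "-clearPSODriverCache"
    args = [str(game_exe), switch]
    # Drop any duplicate of the switch the caller may have passed in.
    args += [a for a in extra_args if a != switch]
    return args
```

The resulting list can be handed to something like subprocess.run(cmd, check=True), for example with cmd = build_test_command("MyGame.exe", ["-windowed"]).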
 
So earlier in the week I asked a few people who work in the industry if it would be possible for Microsoft to make a "Fossilize" equivalent for DirectX PSOs on Windows, which could be used by gaming clients like Steam to precache.. and they said there's no reason that they see why it wouldn't be possible. DX12 doesn't have a layer system meant for plugins like Vulkan does and would have to be hacked (causing problems with anti-cheats).. but Microsoft themselves could do it. They say the only thing it comes down to is convincing Microsoft that it's worthwhile to build this feature.

Well, I say it's damn well worthwhile. Maybe we need to start raising this issue specifically when complaining about compilation stuttering in games, and put more pressure on MS to do what's right for issues caused by the design of their API? Bring up how Valve and Vulkan are doing this on Linux (and Windows) and that Microsoft should be doing it for their players on Windows with DirectX. Especially if they aren't planning any changes and PSOs aren't going away anytime soon.

Either that or maybe we need a Valve supported DirectX to Vulkan translation layer for Windows, which would then just allow Fossilize to work as it's already intended to work?

Wonder if @Andrew Lauritzen could maybe chime in with any insight into the likelihood that Microsoft could be convinced into doing something like this? Or what would be the best way to convince them. At this point I'm wondering if it can be done, then why isn't it? I'm sure Valve would absolutely be willing to work with Microsoft to integrate such a system into Steam.
 
PSOs aren't purely a function of API design. PSOs are a function of hardware design as well. Here's an old statement below from a Direct3D developer ...
PSOs exist so that drivers can do cross-pipeline optimizations, either incorporating fixed-function state into shader code (instead of patching) or to perform cross-stage optimizations
On some hardware, dynamic toggling of shader stages and states can lead to reduced performance and massive compilation hitches. On Intel HW, dynamically toggling Geometry/Tessellation/Compute shaders or using stream output can disable their position-only shading tiled based rendering pipeline optimization. On AMD HW, dynamically toggling Geometry/Tessellation Shaders or using stream output causes recompilations due to the combined cross-stage nature of primitive shaders ...

Microsoft aren't exposing PSOs as the mechanism for shader/state management for the laughs. PSOs are a necessary abstraction for some hardware more than others ...
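
To make the "PSO as one baked unit" idea concrete, here is a toy Python model. It assumes nothing about any real driver or about the D3D12 API: PipelineDesc and PipelineCache are made-up names, and the counter just stands in for an expensive driver compile. The point is only that when shaders and fixed-function state are compiled together, the whole description is the cache key, so flipping a single state means a new compile.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineDesc:
    """Illustrative pipeline description: shader stages plus fixed-function state."""
    vertex_shader: str
    pixel_shader: str
    blend_mode: str
    depth_test: bool


class PipelineCache:
    """Toy cache keyed on the full pipeline description."""

    def __init__(self):
        self._compiled = {}
        self.compile_count = 0

    def get(self, desc):
        if desc not in self._compiled:
            # Stands in for the driver compiling shaders and state together.
            self.compile_count += 1
            self._compiled[desc] = f"binary#{self.compile_count}"
        return self._compiled[desc]


cache = PipelineCache()
opaque = PipelineDesc("vs_main", "ps_main", "opaque", True)
cache.get(opaque)                                # first use: compile
cache.get(opaque)                                # same description: cache hit
cache.get(replace(opaque, blend_mode="alpha"))   # one state changed: compile again
```

If a game first hits that third case in the middle of gameplay with an empty cache, that is where a hitch would come from in this model.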
 
So earlier in the week I asked a few people who work in the industry if it would be possible for Microsoft to make a "Fossilize" equivalent for DirectX PSOs on Windows, which could be used by gaming clients like Steam to precache.
Isn't Fossilize just a shader cache? DX has had this for a long time now, and the issue with any download-based solution is still the same: the number of configuration permutations which will have to be supported by the system.

I'm also still completely against any system which can't work locally without an online component.
 
Fossilize, as far as I understand it, essentially records pipeline state data which can then be collected (by Valve) and downloaded along with the game package to run on any GPU+driver to create a pipeline cache for that specific hardware ahead of time, without the game even being required. This essentially allows them to "precache" without the game.

DirectX doesn't have anything like that.

Here's the thing about your issue with it though. If you don't have an internet connection.. you're not downloading the game in the first place...lol. But regardless of that.. if you don't have an internet connection.. then the game will just do what it always does, and build its own cache as you play it.

There's literally NO downsides. This isn't some replacement for developers doing the best they can to not have this issue in the first place.. it's essentially a redundancy.. because we can't depend on every studio to deal with this issue evidently, for whatever reason.

In the future I hope that the people who are creating these APIs consider creating the mechanisms to do exactly this.
 
Microsoft aren't exposing PSOs as the mechanism for shader/state management for the laughs. PSOs are a necessary abstraction for some hardware more than others ...
Please correct me if I understood this wrong.

Essentially what you mean is, DX11 with its dynamic state switching caused performance drops on AMD and Intel hardware, so DX12 had to implement PSOs to better suit AMD and Intel hardware and stop any potential drops, right?
 
Fossilize, as far as I understand it, essentially records pipeline state data which can then be collected (by Valve) and downloaded along with the game package to run on any GPU+driver to create a pipeline cache for that specific hardware ahead of time, without the game even being required. This essentially allows them to "precache" without the game.

I can't speak with any authority on how it actually works, but I can at least relate my experience watching what it does on Linux. What it seems to do is collect the pipeline data that's gathered by playthroughs from other gamers and continually updated as it's expanded upon for that particular game. Then, after you've downloaded that update, the Fossilize process goes to work, at least if you have it selected to run in the background. Up to 3 low-priority background threads are created (at least on my 12400f), which then process those pipelines into fully compiled shaders for your particular hardware. So yeah, it's not really like the existing DirectX shader caches that the Nvidia driver creates. These are not downloads of fully compiled shader caches for each and every GPU; your local system still has to do plenty of work.

You can opt out of background processing too, which means after every update (and they're frequent!), you'll have to wait while the new shaders are compiled before the game is launched. At least in this process you get the full CPU dedicated to it, whereas the background processes are less intensive by design.

The downside to this system of course is that it's store-specific, which means for non-Steam games on Linux, you have the same problem as Windows, but usually worse. For DX9/11 games at least that do JIT compiling, you can take advantage of dxvk-async which can help things greatly (truly saves some games on Windows too ime!), but that's not possible on DX12 titles, so you're at the mercy of devs precompiling just as you are on Windows unless you get the game from Steam.
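
As a rough picture of that background step, here is a Python sketch of "take the downloaded pipeline descriptions and compile them locally on a few worker threads". To be clear, this is not how Fossilize is actually implemented: compile_for_local_gpu is a placeholder (a hash standing in for a driver compile), replay_pipelines is a made-up name, and the default of 3 workers just mirrors what I observed above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def compile_for_local_gpu(pipeline_desc):
    """Placeholder for the driver compiling one pipeline for this GPU+driver."""
    return hashlib.sha256(pipeline_desc.encode("utf-8")).hexdigest()


def replay_pipelines(pipeline_descs, max_workers=3):
    """Compile every downloaded pipeline description using a small worker pool.

    Returns a dict mapping each description to its locally built 'binary'.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        binaries = list(pool.map(compile_for_local_gpu, pipeline_descs))
    return dict(zip(pipeline_descs, binaries))
```

Opting out of background processing would correspond to running the same work with more workers before launch instead of trickling it through a small pool.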
 
Yes. Essentially the idea is that as the game is played, this pipeline data is collected and uploaded to Valve's servers. That data can then in turn be bundled with the game to be downloaded as "Shader updates" and compiled by a player's GPU+driver before they play the game. It can do this before the player even has the game downloaded and installed. The shader update can be downloaded first, and compiled as the game proper is being downloaded. The pipelines the game requires will be ready to go for the first playthrough.

So ideally the developers/QA would play their game before it launches, and Fossilize would collect and build the "shader pack" ready to download for players to precache for their first playthrough, to make it as compilation-stutter-free as possible. And as you stated, the beauty of this would be that as more and more people play, even if the developers weren't able to build a complete cache for some reason, it would quickly be updated by the massive player base.

Steam already has all this stuff integrated into the client. Fossilize doesn't require Steam per se either. It's integrated into Steam, but developers can use it for other purposes. If Microsoft wanted, they could build something with this functionality which plugs into DirectX, and gaming clients could utilize it to do the same thing. What we need to do is convince Microsoft to do it. At the very least we could start asking questions about it and maybe get the ball rolling on something like this.
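
The "more players, more complete pack" part can be pictured as a merge of pipeline descriptions keyed by a content hash. This is a hedged Python sketch only: the dict-based description format, pipeline_key and merge_reports are all invented for illustration and are not Fossilize's real serialization or Valve's server logic.

```python
import hashlib
import json


def pipeline_key(desc):
    """Stable content hash of one pipeline description (a plain dict here)."""
    blob = json.dumps(desc, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def merge_reports(*reports):
    """Merge per-player lists of pipeline descriptions, dropping duplicates."""
    merged = {}
    for report in reports:
        for desc in report:
            merged.setdefault(pipeline_key(desc), desc)
    return merged


qa_run = [{"vs": "vs_main", "ps": "ps_main", "blend": "opaque"}]
player_run = [
    {"vs": "vs_main", "ps": "ps_main", "blend": "opaque"},  # already seen by QA
    {"vs": "vs_main", "ps": "ps_main", "blend": "alpha"},   # new pipeline
]
shader_pack = merge_reports(qa_run, player_run)
```

Here shader_pack ends up with two unique entries: the one QA found plus the one only a player ran into, which is the redundancy being argued for.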
 
Please correct me if I understood this wrong.

Essentially what you mean is, DX11 with its dynamic state switching caused performance drops on AMD and Intel hardware, so DX12 had to implement PSOs to better suit AMD and Intel hardware and stop any potential drops, right?
It wasn't always this way. Even older AMD/Intel HW designs used to feature more granular graphics pipelines that had separable logical shader stages and dynamic states. It wasn't that long ago that shader objects and dynamic states weren't harmful abstractions for AMD/Intel graphics architectures, but they soon had very different ideas about the future of hardware evolution after the release of D3D12. With AMD/Intel, the hardware graphics pipeline became more monolithic in design on Vega (2 years after D3D12) and Gen 11 (4 years after D3D12) respectively ...

It was only AFTER the release of D3D12 that shader objects and having tons of dynamic states became an intractable issue for them ...
 
It was an issue on all AMD h/w starting with GCN1 at least - which definitely came before DX12.
That's not even close to being true ...

With just over 1000 lines of enablement code, open source driver developers were able to easily pass the conformance test suite for separate shader objects, which consisted of well over 200k unit tests for the feature, from GFX6 (GCN1) to GFX8 (GCN3/4) ...
 
It is 100% true which is why they've made Mantle in the first place. The number of shader objects isn't the issue.
I don't think you've read the Mantle programming guide ...

Pipelines on Mantle are a lot more flexible in terms of link-time optimizations. Just like in D3D11, you create shader objects in Mantle but the difference is that you bind the shader objects for pipeline creation. In Mantle, identical shader objects can be reused to skip recompilations for multiple pipelines if compiled previously in a different pipeline ...
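
A toy Python illustration of that reuse, with the caveat that this models only the idea in the paragraph above and is not the Mantle API: ShaderObjectCache and its methods are made-up names, and string formatting stands in for compilation and linking.

```python
class ShaderObjectCache:
    """Toy model: shaders compile once each, pipelines only link compiled objects."""

    def __init__(self):
        self._objects = {}
        self.shader_compiles = 0

    def compile_shader(self, source):
        if source not in self._objects:
            self.shader_compiles += 1  # stands in for a real shader compile
            self._objects[source] = f"obj({source})"
        return self._objects[source]

    def create_pipeline(self, *shader_sources):
        # Pipeline creation binds already-compiled shader objects together.
        return tuple(self.compile_shader(s) for s in shader_sources)


shaders = ShaderObjectCache()
shaders.create_pipeline("vs_main", "ps_lit")
shaders.create_pipeline("vs_main", "ps_unlit")  # vs_main is reused, not recompiled
```

Two pipelines, four stage slots, but only three shader compiles in this model, because the shared vertex shader object is picked up from the earlier pipeline.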

Mantle just like D3D11 or other APIs that feature separate shader objects all share the ability to do runtime shader linking ...
 
Mantle just like D3D11 or other APIs that feature separate shader objects all share the ability to do runtime shader linking ...
None of this is the issue. The issue is dynamic state changes, which can't be predicted in the earlier DX11 API model and are now statically baked into shaders in DX12 and VK.
 
None of this is the issue. The issue is dynamic state changes, which can't be predicted in the earlier DX11 API model and are now statically baked into shaders in DX12 and VK.
Mantle supports several dynamic states too. Make whatever claim you want but don't go around responding to others with :misinfo: for things you don't know about ...
 
Before we start telling people what they do and don't know, let's give some examples to demonstrate the situation.
 