Shader Compilation on PC: About to become a bigger bottleneck?

An old cache is unlikely to be in use, since people don't play a thousand games all at the same time.

Sure, if you're fine with never replaying an older game, but otherwise it doesn't prevent recompilations or frametime spikes in those cases ...

Leaving unlimited space for these caches is still the best experience by far in terms of minimizing load times or frametime spikes due to recompilation, even if it does bloat the storage requirements ...
 
While the cache is an excellent solution (I left it at unlimited, by the way), it only mitigates the issue on second runs. FPS stutters and frame-time spikes will still occur during the first run or the first encounter with an effect, and those moments are still crucial; they interfere with the fluidity of gameplay.
 
Sure, if you're fine with never replaying an older game, but otherwise it doesn't prevent recompilations or frametime spikes in those cases ...

Leaving unlimited space for these caches is still the best experience by far in terms of minimizing load times or frametime spikes due to recompilation, even if it does bloat the storage requirements ...
There will be loads of changes over a long time period which will make an old cache unusable anyway - driver, game, and OS updates. Basically, the cache works if it's no more than a month old or so. Beyond that there will still be recompilation.

Also, I kinda wonder if we're even talking about the same thing here. The driver-level cache takes care of DXIL to GPU ISA compilation; the stutters people are talking about are from HLSL to DXIL compilation, which happens in the API, not the drivers. At least that's my understanding.
 
There will be loads of changes over a long time period which will make an old cache unusable anyway - driver, game, and OS updates. Basically, the cache works if it's no more than a month old or so. Beyond that there will still be recompilation.

Also, I kinda wonder if we're even talking about the same thing here. The driver-level cache takes care of DXIL to GPU ISA compilation; the stutters people are talking about are from HLSL to DXIL compilation, which happens in the API, not the drivers. At least that's my understanding.
Yeah. Still though... the less driver-side recompilation, the better.
 
I set mine to unlimited and after playing a few games I'm already at 2.6GB.

Now it would be nice if they were able to separate the shaders into different folders based on each game/application, so that we could easily delete compiled shaders for games we no longer play.

IMO this should be an option right in the driver control panel.

This. Labelling the folders with their associated .exe would really help.
 
An old cache is unlikely to be in use, since people don't play a thousand games all at the same time.
Horizon Zero Dawn's is 300+ MB (currently - IME it still generates caches during gameplay after the full optimization stage is run, and I've played it for 10 minutes). Outer Worlds' is 200+ MB and I've played it for maybe 20 minutes.

Modern games just have a heck of a lot more shaders than older ones. As Remji stated, he's already at 2.6GB. Hence Nvidia - from all indications, well past due - has allowed this option. You don't need to be playing "1000 games" - just a handful of particular ones.
 
How many games do you think you played in the last week?
This is my current installed library; about half of these were played (a few, such as DX:MD, I installed last night to see what they do with regard to shaders and haven't launched yet). It varies wildly per game. In general more modern games will need more, but not always (as mentioned in Remji's articles, one of the many ways Doom: Eternal is so optimized is also in the size of its shaders, which is probably why the average shader cache download for it over Steam is under a MB!).

Outer Worlds and HZD are the biggest offenders; just those two are consuming almost 600 MB currently.

[Screenshot: installed game library with per-game shader cache sizes]
 
While the cache is an excellent solution (I left it at unlimited, by the way), it only mitigates the issue on second runs. FPS stutters and frame-time spikes will still occur during the first run or the first encounter with an effect, and those moments are still crucial; they interfere with the fluidity of gameplay.

Yeah, I don't think many (at least on here) are arguing this is a 'solution' to the problem this thread was created to discuss; it just helps alleviate one potential source in particular situations. I still want to see that 'compile shaders before running' option for every game, no doubt. Unfortunately I see some on Reddit/Nvidia forums who believe this is a "solution" to stuttering - even on second runs that would only apply if the game you're currently playing had for some reason created 1+ GB of shader caches, purging the previous ones on every load.

Also, of course, there are more reasons for 'stutter' outside of shader compilation. Outer Worlds still stutters like a mofo on my i5-9400, even with a large shader cache and after revisiting the same spots. It's not as bad, but clearly that engine has other bottlenecks (wondering if a sidegrade to a 10600K, largely for hyperthreading, would help alleviate more of these problems, dunno).

We'll see as it goes, but two particular examples I can point to currently where this may have helped are Dishonored 2 and D2: Death of the Outsider. These are two games I like to keep around and revisit, and even in between driver changes I would so often see that 2+ minute loading screen when booting them up due to a needed shader recompile. That hasn't happened since this driver upgrade and setting the cache to 10GB, despite bouncing between games; chances are, with the 1GB limit, I would have seen them compiled again.
 
Yeah, I don't think many (at least on here) are arguing this is a 'solution' to the problem this thread was created to discuss; it just helps alleviate one potential source in particular situations. I still want to see that 'compile shaders before running' option for every game, no doubt. Unfortunately I see some on Reddit/Nvidia forums who believe this is a "solution" to stuttering - even on second runs that would only apply if the game you're currently playing had for some reason created 1+ GB of shader caches, purging the previous ones on every load.

Also, of course, there are more reasons for 'stutter' outside of shader compilation. Outer Worlds still stutters like a mofo on my i5-9400, even with a large shader cache and after revisiting the same spots. It's not as bad, but clearly that engine has other bottlenecks (wondering if a sidegrade to a 10600K, largely for hyperthreading, would help alleviate more of these problems, dunno).

We'll see as it goes, but two particular examples I can point to currently where this may have helped are Dishonored 2 and D2: Death of the Outsider. These are two games I like to keep around and revisit, and even in between driver changes I would so often see that 2+ minute loading screen when booting them up due to a needed shader recompile. That hasn't happened since this driver upgrade and setting the cache to 10GB, despite bouncing between games; chances are, with the 1GB limit, I would have seen them compiled again.
Yep, exactly. This is a very welcome addition, and definitely long overdue for Nvidia in particular; I'm not sure about the AMD side of things, but I believe they've had a large limit for a while now. Is it user-definable on Radeon cards? Anyway, yeah, this isn't addressing the root of the problem, but the ability to change the cap has its benefits, such as you stated: reducing recompilations, which in turn can reduce *potential* stutters, but more assuredly will help reduce load times upon revisiting levels/areas in games. That's one thing I think many people don't realize - when certain games pre-compile shaders upon initial load, it can go a long way toward reducing overall load times, especially in games where you're constantly travelling back and forth between levels, or "maps" in MP games. A little upfront "caching" can improve the overall experience greatly.

It does make me wonder, though, if some stuff happening behind the scenes may have pushed Nvidia to decide it was time to let us adjust our cache limit. With DirectStorage and OS file-stack improvements coming, you don't want driver-side shader recompiles happening all the time.

It's been a while since the last DirectX Dev Blog update.
 
There will be loads of changes over a long time period which will make an old cache unusable anyway - driver, game, and OS updates. Basically, the cache works if it's no more than a month old or so. Beyond that there will still be recompilation.

Sure, the longevity of these caches comes into question in the face of those factors, but that can be improved with better driver or application design. Not all driver updates have to immediately invalidate the cache, just the ones that change the shader/pipeline compiler, and old caches could still be kept usable while the updated shader/pipeline compiler builds the new cache for higher performance. As for application updates, a lot of the time many shaders/pipelines don't change at all, so we needn't invalidate all of the caches in this case either, only compiling the new shaders/pipelines ...

As for OS updates, I'm not too sure, but I imagine it's possible that changes to the graphics kernel driver might invalidate the caches as well ...
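
To make that concrete, here's a minimal sketch of the idea (the names and the version constant are hypothetical, not any actual driver's scheme): key each cached GPU binary on both the bytecode hash and the version of the driver's internal shader compiler, so that driver updates which don't touch the compiler leave the cache valid:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical driver-side cache keyed on bytecode hash + compiler version.
// Only updates that bump kCompilerVersion invalidate entries; other driver,
// game, or OS changes leave them usable.
struct CacheKey {
    uint64_t bytecodeHash;     // hash of the DXIL/DXBC blob
    uint32_t compilerVersion;  // bumped only when the ISA compiler changes
    bool operator==(const CacheKey& o) const {
        return bytecodeHash == o.bytecodeHash && compilerVersion == o.compilerVersion;
    }
};

struct CacheKeyHash {
    size_t operator()(const CacheKey& k) const {
        return static_cast<size_t>(k.bytecodeHash ^ (uint64_t(k.compilerVersion) << 32));
    }
};

constexpr uint32_t kCompilerVersion = 42;  // assumed current compiler version

class IsaCache {
    std::unordered_map<CacheKey, std::vector<uint8_t>, CacheKeyHash> entries_;
public:
    // Returns null on a miss, which is when recompilation would happen.
    const std::vector<uint8_t>* Lookup(uint64_t bytecodeHash) const {
        auto it = entries_.find({bytecodeHash, kCompilerVersion});
        return it == entries_.end() ? nullptr : &it->second;
    }
    void Store(uint64_t bytecodeHash, std::vector<uint8_t> isa) {
        entries_.emplace(CacheKey{bytecodeHash, kCompilerVersion}, std::move(isa));
    }
};
```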

Also, I kinda wonder if we're even talking about the same thing here. The driver-level cache takes care of DXIL to GPU ISA compilation; the stutters people are talking about are from HLSL to DXIL compilation, which happens in the API, not the drivers. At least that's my understanding.

I don't think developers have any reason to ship the HLSL source and do runtime processing with the FXC/DXC compiler. You can do offline pre-processing with the FXC/DXC compiler, so most games will ship with DXBC or DXIL bytecode by default. When we look at D3D translation layers as a reference, they don't do shader translation from HLSL source; they translate from either DXBC or DXIL too. You can even feed/sign illegal custom DXBC/DXIL code to the drivers directly, ever since AMD leaked the DXBC checksum algorithm ...

There's virtually no reason for end users to have to compile from the HLSL shader source, since that can be done purely on the developer side without having to worry about driver updates - the shipped DXBC/DXIL code permanently stays the same. Sure, a game could ship with the FXC/DXC compiler itself, but whatever version of those compilers shipped with the game, the bytecode codegen always remains the same regardless of the drivers, for the sake of consistency, so there's absolutely no reason for developers not to consider offline pre-processing ...
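
As a rough illustration of that workflow (the file name and entry point below are made up): the DXC invocation is a build step on the developer's machine, and the shipped game only ever loads the resulting DXIL blob and hands it to the driver as-is:

```cpp
// Build-time step, on the developer's machine (never on the player's):
//   dxc -T ps_6_0 -E MainPS -Fo scene_ps.dxil scene_ps.hlsl
// The game ships only the DXIL blob; no HLSL or runtime FXC/DXC involved.
#include <d3d12.h>
#include <fstream>
#include <iterator>
#include <vector>

std::vector<char> LoadBlob(const char* path) {
    std::ifstream f(path, std::ios::binary);
    return {std::istreambuf_iterator<char>(f), std::istreambuf_iterator<char>()};
}

D3D12_SHADER_BYTECODE PixelShaderBytecode(const std::vector<char>& blob) {
    // The bytecode is consumed as-is; the only compilation left is the
    // driver's DXIL -> GPU ISA pass, which is what the driver cache covers.
    return D3D12_SHADER_BYTECODE{ blob.data(), blob.size() };
}
```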
 
As for OS updates, I'm not too sure, but I imagine it's possible that changes to the graphics kernel driver might invalidate the caches as well ...
AFAIU the OS (the API, to be precise) compiles shaders from HLSL (GLSL) representations into DXIL (DXBC/SPIR-V), which is then fed into the IHVs' drivers, which compile it into the respective GPU binaries.

I don't think developers have any reason to ship the HLSL source and do runtime processing with the FXC/DXC compiler. You can do offline pre-processing with the FXC/DXC compiler, so most games will ship with DXBC or DXIL bytecode by default.
I'll give you three:
1. Improvements in API compilers
2. Future compatibility with possible new intermediate formats
3. A much higher degree of portability in general

Sure, a game could ship with the FXC/DXC compiler itself, but whatever version of those compilers shipped with the game, the bytecode codegen always remains the same regardless of the drivers, for the sake of consistency, so there's absolutely no reason for developers not to consider offline pre-processing ...
Many games do ship with high-level shaders and compile them on the target platform AFAIK.
Hence we have three levels of shader caches these days:
A. Apps cache compiled shaders (usually found somewhere in the game's data folder, either in the system's Documents or %APPDATA%)
B. APIs cache compiled shaders (not sure where these go, but the Windows Disk Cleanup tool has an option for deleting them)
C. Drivers cache compiled shaders (which we're discussing above)
And I'm not at all sure that all the issues you're discussing here are due to C.
AFAIU, again, most of the issues with hitching these days are due to improper PSO setup, and that's something which must be done - and possibly cached - by the renderer, in A.
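
For reference on that level-A option: D3D12 does expose a mechanism for renderers to cache driver-compiled pipelines themselves, ID3D12PipelineLibrary. A hedged sketch of how a renderer might use it (library creation, the disk I/O for the serialized blob, and error handling are omitted):

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch of an application-level (level A) pipeline cache. The library blob
// would be serialized to the game's own data folder between runs via
// ID3D12PipelineLibrary::Serialize and reloaded with CreatePipelineLibrary.
ComPtr<ID3D12PipelineState> GetOrCreatePso(
    ID3D12Device* device, ID3D12PipelineLibrary* library,
    const wchar_t* name, const D3D12_GRAPHICS_PIPELINE_STATE_DESC& desc)
{
    ComPtr<ID3D12PipelineState> pso;
    // Fast path: deserialize a previously driver-compiled pipeline.
    if (SUCCEEDED(library->LoadGraphicsPipeline(name, &desc, IID_PPV_ARGS(&pso))))
        return pso;
    // Slow path: full driver compile (bytecode -> GPU ISA), then store it
    // so later runs skip the compile entirely.
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
    library->StorePipeline(name, pso.Get());
    return pso;
}
```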

Edit: Come to think of it, what do games which "compile shaders" (like Horizon) actually compile? They certainly don't compile their shaders into a GPU binary, since that is down to the IHV driver. Do they compile/cache PSOs? Or compile DX bytecode from their HLSL sources?
 
AFAIU the OS (the API, to be precise) compiles shaders from HLSL (GLSL) representations into DXIL (DXBC/SPIR-V), which is then fed into the IHVs' drivers, which compile it into the respective GPU binaries.

I don't think the OS does shader source compilation. That's the job intended for tools like FXC/DXC compilers ...

I'll give you three:
1. Improvements in API compilers
2. Future compatibility with possible new intermediate formats
3. A much higher degree of portability in general

I think these benefits don't outweigh the drawbacks of changing compilers, which can potentially break shaders since compilation might see unexpected results between different compiler versions. Having to constantly rework some of your shaders because you changed your compiler is not sane from a development perspective ...

Edit: Come to think of it, what do games which "compile shaders" (like Horizon) actually compile? They certainly don't compile their shaders into a GPU binary, since that is down to the IHV driver. Do they compile/cache PSOs? Or compile DX bytecode from their HLSL sources?

Most of the shader/pipeline compilation time is down to the driver translating DX bytecode into GPU ISA. The reason there's a large variance between compilation times in games mostly correlates with how many PSOs a game contains. Doom, with its lean forward renderer and ubershaders, only has a few dozen PSOs to compile, while HZD on PC, with its fancy deferred renderer and shader graphs (which usually cause a combinatorial explosion in shader permutations), might have well over 10,000 PSOs to compile, which makes a profound difference in compilation times ...
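
The arithmetic behind that explosion is simple enough to sketch: independent feature toggles multiply, so each boolean in a material/shader graph doubles the permutation (and thus PSO) count. A toy illustration with invented feature names:

```cpp
#include <cstdio>

int main() {
    // Invented toggles for a deferred material shader; each one doubles the
    // number of distinct shader permutations (and thus PSOs to compile).
    const char* features[] = {
        "normal_map", "alpha_test", "skinning", "fog", "decals",
        "parallax", "emissive", "detail_map", "vertex_color",
        "two_sided", "ssr_mask", "wind", "pom_shadows", "wetness",
    };
    const unsigned toggles = sizeof(features) / sizeof(features[0]);
    // 14 toggles -> 16384 permutations for one material family alone, each
    // needing its own DXIL -> GPU ISA translation on first use.
    std::printf("%u toggles -> %llu permutations\n", toggles, 1ull << toggles);
}
```

An ubershader design like Doom's instead branches on such flags inside a handful of pipelines, which is why its PSO count (and compile time) stays tiny.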
 
I don't think the OS does shader source compilation. That's the job intended for tools like FXC/DXC compilers ...
I consider graphics APIs a part of the OS, really.

I think these benefits don't outweigh the drawbacks of changing compilers, which can potentially break shaders since compilation might see unexpected results between different compiler versions. Having to constantly rework some of your shaders because you changed your compiler is not sane from a development perspective ...
Well, there are reasons regardless.

Most of the shader/pipeline compilation time is down to the driver translating DX bytecode into GPU ISA.
I doubt that this is the case, since the driver wouldn't store such cached PSOs in an application data folder.
 

Yes, but they heavily discourage runtime source compilation, and offline pre-compilation is what happens in practice most of the time. Changing compiler versions will introduce inconsistent bytecode generation, which can result in unexpected application behaviour, so developers will usually stick with the same source compiler version for the duration of a project ...

I doubt that this is the case, since the driver wouldn't store such cached PSOs in an application data folder.

These findings are consistent with the challenges behind creating D3D translation layers or drivers. Games usually don't ship the shader source (HLSL/GLSL); they'll often ship with pre-compiled intermediate bytecode like DXBC/DXIL/SPIR-V, so translation layers and drivers only accept these formats. There's literally no advantage to keeping the shader source, since drivers are codified into accepting intermediate bytecode, so most developers will pre-process the shader source by default, using tools like FXC/DXC/glslang (source compilers) to compile it into intermediate bytecode for driver consumption, where further compilation happens in the driver's internal shader/pipeline compiler ...

The crux of the issue behind long compilation times is mostly the high number of PSOs consumed by the driver's internal shader/pipeline compiler, not shader source compilation ...
 
Default size for Nv's DX shader cache is 4GB:
https://forums.guru3d.com/posts/5957880/ said:
Default is 4GB which is a pool for all games. This means that if a new game takes say 1GB and you've hit the 4GB limit, the oldest shader cache files will get deleted. If you re-run that old game, a new shader cache will need to be re-created leading to slower boot times and possibly lower performance until the shader cache creation is completed. So increasing the amount may help if you run a lot of games on your PC.
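
A toy model of the oldest-first eviction described there (my sketch of the policy, not Nvidia's actual implementation):

```cpp
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>

// Size-capped pool with least-recently-used eviction: once the cap is
// exceeded, the oldest games' caches are dropped and those games recompile
// on their next launch.
class ShaderCachePool {
    struct Entry { std::string game; uint64_t bytes; };
    std::list<Entry> lru_;  // front = most recently played
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
    uint64_t used_ = 0, cap_;
public:
    explicit ShaderCachePool(uint64_t capBytes) : cap_(capBytes) {}

    // Called when a game writes (or rewrites) its shader cache.
    void Touch(const std::string& game, uint64_t bytes) {
        if (auto it = index_.find(game); it != index_.end()) {
            used_ -= it->second->bytes;
            lru_.erase(it->second);
        }
        lru_.push_front({game, bytes});
        index_[game] = lru_.begin();
        used_ += bytes;
        while (used_ > cap_) {  // evict the oldest caches first
            used_ -= lru_.back().bytes;
            index_.erase(lru_.back().game);
            lru_.pop_back();
        }
    }
};
```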
 
Just out of curiosity, did anyone hear about hash collisions in Nvidia's shader cache? Or is it robust enough?

Because for AMD, at least for OpenGL shaders, those were / are definitely a thing. I had to experience it several times during shader development, as the shader effectively executed was a pretty obvious mismatch to the shader submitted to the driver (interface block mismatch warnings). And then the same happened after shipping an update, on the consumer side too. Only wiping the driver's internal cache ever fixed that.
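
For what it's worth, the failure mode is easy to picture if a cache trusts a hash alone. This sketch (the hash choice and structure are illustrative, not AMD's actual cache) shows how a collision silently returns the wrong binary unless the full key is verified on a hit:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a; a real cache might use a stronger/wider hash.
uint64_t Fnv1a(const std::string& s) {
    uint64_t h = 1469598103934665603ull;
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ull; }
    return h;
}

struct DriverShaderCache {
    std::unordered_map<uint64_t, std::vector<uint8_t>> binaries;

    const std::vector<uint8_t>* Lookup(const std::string& source) const {
        auto it = binaries.find(Fnv1a(source));
        // If two different shaders collide on the same hash, this happily
        // hands back a binary compiled from the *other* shader - matching
        // the interface-block-mismatch symptom - unless the cache also
        // stores and compares the full source (or an independent second
        // hash) before trusting the hit.
        return it == binaries.end() ? nullptr : &it->second;
    }
};
```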
 
I wonder how many times I've attributed hitching/stuttering to shader compilation when it was perhaps something else, like texture decompression or some other heavy CPU task?

I also wonder if, in the future, DirectStorage and its GPU-based texture decompression will improve the situation for a lot of these stutters we see while streaming during gameplay.

I really wish they'd give us an update on DirectStorage soon. It's out in developer preview, right? I wonder what the first game to really utilize it will be, and whether developers will use it to market their games. I'm assuming it will be some Microsoft first-party game, perhaps the next Gears or Hellblade 2?
 