Shader Compilation on PC: About to become a bigger bottleneck?

Mantle supports several dynamic states too. Make whatever claim you want but don't go around responding to others with :misinfo: for things you don't know about ...
I have no idea what that emoji is or represents. A swan in front of a blue cloud?? Can we please keep B3D classy, people.
 
I have no idea what that emoji is or represents.
Which is why I don't know how to answer that statement.
I'll just say that the fact that Mantle supported some dynamic states doesn't say anything against what I'm saying.
The original Lurkmass statement was that the issues which modern AMD (and Intel? I'm not sure about these) GPUs have with the non-PSO model appeared at around GCN5/Vega, which just isn't true in the case of AMD's h/w specifically.
 
Brothers Remake is using UE 5.3 (the first game that I know of outside of Fortnite) and guess what?.. There are hitches all over the place which look an awful lot like shader compilation stutters.

The game also manages to hit ~50% of total CPU load on my 5900X doing... I'm not sure exactly what it's doing, as it really looks like a game from the PS360 era and uses neither Lumen nor Nanite.
 
I dunno, I watched a bit of a playthrough on PC and it was clearly their first time playing as achievements were popping up, and I didn't see any stuttering really.

Also, I asked someone from the DirectX team about PSOs and comp stuttering, and basically about having something like Fossilize.. and while he was not addressing anything in particular, he told me not to take their silence as them ignoring the issue. They're working on a plan to address it for the future, but have nothing to share about how at this point in time.
 
Ultimately, both APIs have their advantages and disadvantages. If you choose DirectX 12, you have to live with shader compilation stutters, for example. Even though the game pre-compiles shaders for about a minute when you first start it, that's apparently not nearly enough. The problem is significantly less pronounced than in the demo and was greatly improved with the Day 1 patch, but remains present throughout the entire game.

DirectX 11, on the other hand, has a different problem. Outcast – A New Beginning runs incredibly slowly under that API. Even with a very fast Ryzen 9 7950X3D, you immediately hit a CPU limit in GPU-demanding scenes, and in other sequences you can hardly get out of it, even in Ultra HD. The differences are really extreme, as the benchmarks show. The benchmarks were recorded in a GPU-demanding sequence; other sequences show significantly larger differences.
 
Recently been enjoying Evil West, great throwback AA title, very silly and very fun.

...let down unfortunately by having no attempt at shader precompilation whatsoever - especially considering there's no RT in this game, really no excuse for it even with UE4's imperfect PSO gathering mechanism.

However, DXVK to the rescue - albeit at the cost of DLSS (boo - although does work under Linux). So choice between stutter-free with FSR2 and stomach this with DLSS:

Edit: Welp, never mind. I had forgotten that Nvidia separates out its Vulkan/GL caches from D3D in different locations (%appdata%\locallow vs %appdata%\local\), so it wasn't an even playing field - I wasn't properly cleaning out the Vulkan shader cache before testing dxvk async like I was with native D3D. Once that was done, little difference, dxvk async can't save this title.

Edit2: They just updated the GLCache location in the latest drivers to be in sync with the D3D cache, both in LocalLow now.

Nvidia driver notes said:
  • Shader cache locations have been updated for 545.37 or newer drivers - %USERPROFILE%\AppData\LocalLow\NVIDIA\PerDriverVersion\DXCache and %USERPROFILE%\AppData\LocalLow\NVIDIA\PerDriverVersion\GLCache
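For anyone who wants to clear both caches before a stutter test, here is a minimal Python sketch. The only thing taken from the driver notes is the `AppData\LocalLow\NVIDIA\PerDriverVersion\DXCache` / `GLCache` layout; the function names `nvidia_cache_dirs` and `clear_cache_dir` are made up for illustration, and the example profile path is a placeholder. It only removes files directly inside a folder and skips anything that is locked.

```python
import os
import ntpath

# Sub-path as quoted in the driver notes; everything else here is illustrative.
NVIDIA_CACHE_SUBDIR = ("AppData", "LocalLow", "NVIDIA", "PerDriverVersion")
CACHE_NAMES = ("DXCache", "GLCache")


def nvidia_cache_dirs(user_profile):
    """Return the DXCache and GLCache paths under the given profile directory."""
    base = ntpath.join(user_profile, *NVIDIA_CACHE_SUBDIR)
    return [ntpath.join(base, name) for name in CACHE_NAMES]


def clear_cache_dir(path):
    """Delete regular files directly inside `path`; return how many were removed.

    Files that cannot be deleted (for example, still held open) are skipped.
    """
    removed = 0
    if not os.path.isdir(path):
        return removed
    for entry in os.scandir(path):
        if entry.is_file():
            try:
                os.remove(entry.path)
                removed += 1
            except OSError:
                pass  # locked by a running game or the driver
    return removed


if __name__ == "__main__":
    # Dry run: only list the folders; call clear_cache_dir() on them to wipe.
    profile = os.environ.get("USERPROFILE", r"C:\Users\example")
    for cache_dir in nvidia_cache_dirs(profile):
        print(cache_dir, "exists" if os.path.isdir(cache_dir) else "not found")
```

Close the game first, otherwise some cache files will likely be locked and left behind, which makes the comparison uneven again.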
 
This is what both Nixxes and Guerrilla games have had to say about the PSO situation. In summary it's ugly and they don't like it one bit. Turns out developers hate PSOs on PCs as much as any PC player, as they are difficult to create, compile, collect and QA.

One thing I wanted to go back to is PSOs. Interestingly, there's a longer 'burn' at the beginning of the game - 30s on a big CPU and around a minute on smaller CPU. That's reasonable - but what is the PSO process like, how are they collected?

Michiel Roza - Nixxes
: It's similar to Zero Dawn; we let QA collect all of the PSOs, collect them at the end of the week and merge them into a big database. One extra challenge in this game was that this game uses compute shaders that are mostly unique by tile to generate placements. If these shaders aren't ready in time, they cause streaming issues. To alleviate that, we front-load those specific shaders and that's the shader compilation step you see.

Are you happy with that PSO collection process for Forbidden West on PC - could automation be used as well? And how do you feel about the whole PSO issue for Windows gaming in general?

Michiel Roza - Nixxes
: We do have an automated collection, which doesn't catch everything, but does make sure the game is in a playable state and QA won't run into a stutter fest every time we change the shaders... I think PSOs are not a great solution. It's the best we have, but I think something else can be done here. I really like the way Valve is handling it on the Steam Deck... and I wonder if that can be extended to PC as well.

Jeroen Krebbers - Guerrilla: Yeah, it would need a lot of collaboration between Microsoft, hardware manufacturers and software developers to come to an understanding of how this should work. I've jokingly said like "why do we have GPUs that are so fast and yet we can't compile a shader on it?"

Patrick Den Bekker - Nixxes: Compared to consoles, it's actually a big burden on the development project. Whereas on the PS5 you can build them on a build machine and load them in, execute them and you're done, on PC we really need to rely on vendors and caches to get them compiled, and we spent a lot of trouble making sure they're compiled faster. At some point in the project we had one PSO that took 150 seconds... It's not only the collection, it's making sure they can execute fast enough.

Jeroen Krebbers - Guerrilla: As a developer, the PSO situation is very frustrating. We know that it doesn't have to be as bad as it currently is. And it is really bad. It takes a lot of time and effort, and then you compile the same shader on a million different PCs, right? So there's energy waste and things like that.
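
The collection step described in the interview (QA sessions record PSOs, which are merged weekly into one database) can be pictured roughly like the Python sketch below. This is not Nixxes' tooling: the dict-based state description, the field names, and the functions `pso_key` and `merge_captures` are all assumptions made for illustration. The idea is just to hash a canonical form of each captured state so duplicates across sessions collapse into one entry.

```python
import hashlib
import json


def pso_key(desc):
    """Stable hash of a pipeline-state description (a plain dict in this sketch)."""
    canonical = json.dumps(desc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def merge_captures(database, captures):
    """Merge per-session capture lists into `database` (key -> description).

    Returns the number of pipeline states that were not in the database before.
    """
    added = 0
    for session in captures:
        for desc in session:
            key = pso_key(desc)
            if key not in database:
                database[key] = desc
                added += 1
    return added


if __name__ == "__main__":
    # Hypothetical captures from two QA sessions; field names are invented.
    session_a = [
        {"vs": "terrain_vs", "ps": "terrain_ps", "blend": "opaque"},
        {"vs": "foliage_vs", "ps": "foliage_ps", "blend": "alpha_test"},
    ]
    session_b = [
        {"blend": "opaque", "ps": "terrain_ps", "vs": "terrain_vs"},  # duplicate
        {"vs": "water_vs", "ps": "water_ps", "blend": "alpha"},
    ]
    db = {}
    new_states = merge_captures(db, [session_a, session_b])
    print(new_states, "unique pipeline states in the merged database")
```

A merge like this only knows about states somebody actually hit while playing, which is why coverage gaps still show up as stutter for players.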

 
It actually didn't feel like Horizon Forbidden West managed to get around the stuttering issues at all, despite PSO collection.

Even in the intro cinematic scenes, there's plenty of assets just popping in, and the stutter in some of the level transitions is massive. Can't tell whether it's asset streaming or PSO compilation, but the game is certainly struggling to have both assets and PSOs ready when needed. Weirdly enough there is both popping assets AND half-second hangs as distinct issues.

It also appears that there's a couple of crash hot spots around those points where still un-cached states are hit - so that's definitely something to watch out for as a developer when you don't have a structured approach to collecting ALL required states as your QA coverage of the still remaining dynamic states worsened.

...

Did I mention that apparently both NVidia and AMD are also still not resilient to shader cache corruption in case of an application or driver crash?...
 
Michiel Roza - Nixxes : We do have an automated collection, which doesn't catch everything, but does make sure the game is in a playable state and QA won't run into a stutter fest every time we change the shaders... I think PSOs are not a great solution. It's the best we have, but I think something else can be done here. I really like the way Valve is handling it on the Steam Deck... and I wonder if that can be extended to PC as well.

Indeed as others have suggested here over the years, something like this where at least the shaders being gathered by the community during playthroughs and fed back for local compiling - at least in the interim to deal with current implementations - would be well, something. It's been made clear a different approach going forward in terms of how shaders are handled at an API level is needed no doubt, but I also want something to help with the existing problematic titles out there.

I'm wary of having anything like this tied to a particular store though, that's the problem right now if you're a Linux gamer - unless you get it from Steam, there's the chance that you essentially get a far worse version, dxvk-async can only help so much, plus it doesn't work with DX12 titles. Something like Borderlands 3, which downloads a 1GB shader file that when uncompressed/compiled comes in at 6 gigabytes (!), is required to get smooth gameplay. If you have it from another store, well then stutter away unless you use dxvk-async, which is far less consistent than an already compiled cache. So any download solution like this really needs to be directed by MS or the GPU vendors imo.

Really appreciative of more developers raising the alarm about this though. Wish other PC youtubers would do the same! This is a 10 ton albatross around the neck of the PC gaming industry that console gamers don't have to deal with, we need like twice-yearly "shader stutter updates" on how this problem is being tackled.
 
It actually didn't feel like Horizon Forbidden West managed to get around the stuttering issues at all, despite PSO collection.

Even in the intro cinematic scenes, there's plenty of assets just popping in, and the stutter in some of the level transitions is massive. Can't tell whether it's asset streaming or PSO compilation, but the game is certainly struggling to have both assets and PSOs ready when needed. Weirdly enough there is both popping assets AND half-second hangs as distinct issues.

I didn't see pop-in, but I did get audio de-sync during the intro realtime cutscene.

Did I mention that apparently both NVidia and AMD are also still not resilient to shader cache corruption in case of an application or driver crash?...

Which I theorized may have been the cause of my constant re-compiling events, quitting the game with alt-F4 may have corrupted it, dunno. But yes, verifying/validating caches is yet another issue.
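
For what "resilient to corruption" could look like, here is a minimal Python sketch. It is not how the NVIDIA or AMD drivers store their caches (I don't know their formats); the file layout (32-byte SHA-256 prefix plus payload) and the names `write_cache` and `read_cache` are assumptions for illustration. The two ideas are: write to a temp file and rename it into place so a crash never leaves a half-written cache, and verify a checksum on load so a bad entry is discarded and recompiled instead of being trusted.

```python
import hashlib
import os
import tempfile


def write_cache(path, payload):
    """Write `payload` (bytes) with a SHA-256 prefix via a temp file and atomic rename."""
    digest = hashlib.sha256(payload).digest()
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(digest + payload)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # readers see either the old file or the new one


def read_cache(path):
    """Return the payload, or None if the file is missing, truncated, or corrupted."""
    try:
        with open(path, "rb") as f:
            blob = f.read()
    except FileNotFoundError:
        return None
    if len(blob) < 32:
        return None
    digest, payload = blob[:32], blob[32:]
    if hashlib.sha256(payload).digest() != digest:
        return None  # caller should discard the entry and recompile
    return payload


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "pipeline.cache")
        write_cache(path, b"compiled shader bytes")
        print(read_cache(path))
        with open(path, "r+b") as f:  # simulate a write cut short by a crash
            f.truncate(40)
        print(read_cache(path))
```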
 
Nice to hear someone at Nixxes say the same thing I've just been saying. Maybe with enough bitching something could actually get done..

And that's just the thing.. look at Matias' response. He's absolutely right.. but notice "If you want companies to care..." and that's what pisses me off to no end. Let's be clear... PSOs work. Things CAN be a lot better than they are for gamers... with PSOs being as they are. We've known this isn't so much an insurmountable issue.. as it is a time/budget/QA issue.

Let's be realistic about this.. Gamers want a solution for PSOs so they can actually play and enjoy the game they bought without stuttering constantly. Developers want a solution to PSOs to improve their ability to work more efficiently, test, and iterate on their work quicker. Which is where the problem comes in. Dealing with PSOs is just tedious and time consuming enough... that it's often not done properly... or seemingly at all. And since it's more of a pain for developers, and a more time consuming process... it seems like devs either can't or just don't bother trying to convince the publishers that it IS important enough to spend the time to do properly.

That explains why publishers don't give a shit about stutters in games until a big stink is made where it causes bad press.... I get that fixing something like this isn't going to happen overnight, but what the hell type of pressure needs to be put on companies to start addressing it? This is 2 generations now.. DX12 and Vulkan are not new anymore. What about all the money wasted during production having to deal with exploding numbers of shader variants, and PSO compiling?

I've said it before and I'll say it again. You need a multi-faceted approach.. Something outside of the application *for players*! You cannot expect all developers to put the same care into this issue as others. It can be done... it simply has to become a priority. When people bring up something like "Fossilize" being a kind of solution... you'll never hear a developer fully agree with that... because it doesn't solve the issue for THEM as they are developing games. It doesn't help their situation during development in the same way.

We need both.
 
Some interesting developer discussion following Nixxes' comments.


Is that solution something a developer can do that uses a third party engine though (legit question, I have no idea)?

We can always look to id Tech and Doom Eternal's "300 shaders" as the gold standard for a well-managed project where the engine and art dept are under the control of experienced developers, and it is - but most developers don't have that ability as they're using an off-the-shelf engine these days. Any potential remedy has to account for that fact.
 
Is that solution something a developer can do that uses a third party engine though (legit question, I have no idea)?

We can always look to id Tech and Doom Eternal's "300 shaders" as the gold standard for a well-managed project where the engine and art dept are under the control of experienced developers, and it is - but most developers don't have that ability as they're using an off-the-shelf engine these days. Any potential remedy has to account for that fact.
Like Andrew said on this very forum... good luck trying to put that cat back in the bag.. (regarding UE anyway)

It's not going to happen.
 
[GDC 2024] Work Graph, a pipeline in which GPU draws spontaneously without using the CPU, has been officially adopted in DirectX 12

Light at the end of the tunnel?

I don't know about the rest of you guys but using Work Graphs to sort draws by state sure beats compiling potentially redundant PSOs for every draw call!
 
Is that solution something a developer can do that uses a third party engine though (legit question, I have no idea)?

We can always look to id Tech and Doom Eternal's "300 shaders" as the gold standard for a well-managed project where the engine and art dept are under the control of experienced developers, and it is - but most developers don't have that ability as they're using an off-the-shelf engine these days. Any potential remedy has to account for that fact.
Yeah, it's kind of a self-made problem that general purpose engines have created. They basically made it possible for smaller teams to make bigger games. But that comes at a cost. The ratio of artists to engineers is way different than in studios with an in-house engine. A game like Kena: Bridge of Spirits was for instance made by a team that used to focus on commercials. The number of game development teams basically exploded but there is a lack of talented people, as it takes years to amass the knowledge that teams like Guerrilla Games or Rockstar have. About a year ago I listened to a podcast with Rami Ismail where he said that studios have a hard time filling the senior roles.

As already pointed out in the Twitter discussions posted above, shader graphs are one culprit. If you let each artist make his own character he will probably make his own materials instead of reusing other materials with different textures. Also, less technical people don't always know when it's good to add a branch to an existing shader or when it's better to make a new shader permutation. I have seen some people try to cut down on materials by making some really inefficient shaders. You need good tech artists or engineers to oversee and manage this.
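
To put a number on the branch-versus-permutation trade-off, here is a small Python sketch. The feature names are invented and this is not any engine's material system; it only shows the arithmetic: every feature exposed as a compile-time switch multiplies the number of variants that have to be compiled, while turning one into a runtime branch removes it from that product (at the cost of a possibly slower shader).

```python
from itertools import product


def permutation_count(static_switches):
    """Number of compiled variants when every switch is a compile-time option."""
    count = 1
    for options in static_switches.values():
        count *= len(options)
    return count


def enumerate_permutations(static_switches):
    """Yield each variant as a dict of switch name -> chosen option."""
    names = list(static_switches)
    for combo in product(*(static_switches[n] for n in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    # Hypothetical material features; the names are invented for illustration.
    switches = {
        "normal_map": [False, True],
        "skinning": [False, True],
        "alpha_test": [False, True],
        "wind": [False, True],
    }
    print(permutation_count(switches), "variants with all four as static switches")
    # Turning one switch into a runtime branch removes it from the static set.
    del switches["wind"]
    print(permutation_count(switches), "variants with 'wind' as a runtime branch")
```

Four on/off switches already mean 16 variants for one material, and that is before multiplying by the different render states each one can be drawn with.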

Another problem that amplifies this is the asset marketplace. This is again very good for small teams. But if you buy 5 different packs with foliage and 5 different packs with rocks you get at least 10 different materials, maybe even more. Basically each pack you buy comes with its own set of materials and sometimes it's hard to combine materials from different packs into one.

And lastly, some of those small teams might lack the knowledge to make a good PSO gathering step. In UE5 this should be easier but in UE4 you had to roll your own system and only a handful of developers actually managed that.
 