Shader Compilation on PC: About to become a bigger bottleneck?

Discussion in 'Rendering Technology and APIs' started by Flappy Pannus, Aug 6, 2020.

  1. Lurkmass

    Lurkmass Regular

Sure, if you're fine with never replaying an older game, but otherwise it doesn't prevent recompilations or frame time spikes in those cases ...

Leaving unlimited space for these caches is still by far the best experience in terms of minimizing load times or frame time spikes due to recompilation, even if it does bloat the storage requirements ...
     
    Remij likes this.
  2. DavidGraham

    DavidGraham Veteran

While the cache is an excellent solution (I left it at unlimited, by the way), it only mitigates the issue on subsequent runs; fps stutters and frame time spikes will still occur during the first run or the first encounter of an effect. Those moments are still crucial, and they interfere with the fluidity of gameplay.
     
    pharma, PSman1700 and Remij like this.
  3. DegustatoR

    DegustatoR Veteran

There will be loads of changes over a long time period which will make the old cache unusable anyway - driver, game, and OS updates. Basically the cache works if it's no more than a month old or so. Beyond that there will still be recompilation.

Also I kinda wonder if we're even talking about the same thing here. The driver-level cache takes care of DXIL to GPU ISA compilation; the stutters people are talking about are from HLSL to DXIL compilation, which happens in the API, not the drivers. At least that's my understanding.
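A rough way to picture why a driver update wipes the whole driver-level cache: if the lookup key hashes the driver version together with the shader bytecode, any version bump changes every key at once. This is only a sketch of the idea - the actual key scheme and the version strings are made up, not Nvidia's real format:

```python
import hashlib

def cache_key(driver_version: str, dxil_blob: bytes) -> str:
    # Monolithic key: any change to either input misses the whole cache.
    h = hashlib.sha256()
    h.update(driver_version.encode())
    h.update(dxil_blob)
    return h.hexdigest()

blob = b"DXIL" + bytes(16)  # stand-in for a compiled shader blob
key_old = cache_key("496.13", blob)
key_new = cache_key("496.49", blob)  # same shader, after a driver update
assert key_old != key_new  # every previously cached entry now misses
```

The same shader re-hashed under the new version lands on a different key, so nothing cached before the update is ever found again.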
     
    PSman1700 and Remij like this.
  4. Remij

    Remij Regular

Yeah. Still though... anything that reduces the number of driver-side recompilations is a win.
     
    PSman1700, pharma and Flappy Pannus like this.
  5. Flappy Pannus

    Flappy Pannus Regular

    This. Labelling the folders with their associated .exe would really help.
     
  6. Flappy Pannus

    Flappy Pannus Regular

Horizon Zero Dawn's is 300+ MB (currently - it still generates caches during gameplay IME after the full optimization stage is run, and I've only played it for 10 mins). Outer Worlds is 200+ MB and I've played it for maybe 20 minutes.

Modern games just have a heck of a lot more shaders than older ones. As Remij stated, he's already at 2.6GB. Hence Nvidia - from all indications, well past due - has allowed this option. You don't need to be playing "1000 games" - just a handful of particular ones.
     
    PSman1700 and Remij like this.
  7. DegustatoR

    DegustatoR Veteran

These 10 games I've run are all very modern. Granted, I haven't played much, just did some benchmarks on the new driver.
     
  8. Flappy Pannus

    Flappy Pannus Regular

This is my current installed library; about half of these were played (a few, such as DX:MD, I installed last night to see what they do with regard to shaders and haven't launched yet). It varies wildly per game. In general more modern games will need more, but not always - as mentioned in Remij's articles, one of the many ways Doom Eternal is so optimized is also in the size of its shaders, which is probably why the average shader cache download for it over Steam is under a MB!

Outer Worlds and HZD are the biggest offenders; just those two are consuming almost 600 MB currently.

[Image: screenshot of per-game shader cache sizes for the installed library]
     
  9. Flappy Pannus

    Flappy Pannus Regular

Yeah, I don't think many here are arguing that this is a 'solution' to the problem this thread was created to discuss; it just helps alleviate one potential source in particular situations. I still want to see that 'compile shaders before running' option for every game, no doubt. Unfortunately, some on Reddit/Nvidia forums believe this is a "solution" to stuttering - even on second runs, that would only apply if the game you're currently playing somehow created 1+GB of shader caches, purging the previous ones on every load.

Also, of course, there are more reasons for 'stutter' than cache compiling. Outer Worlds still stutters like a mofo on my i5-9400, even with a large shader cache, after revisiting the same spots. It's not as bad, but clearly that engine has other bottlenecks (wondering if a sidegrade to a 10600K, largely for hyperthreading, would help alleviate more of these problems, dunno).

We'll see how it goes, but two examples I can point to currently where this may have helped are Dishonored 2 and D2: Death of the Outsider. These are two games I like to keep around and revisit, and even in between driver changes I would often see that 2+ minute loading screen when booting them up due to a shader recompile. That hasn't happened since this driver upgrade and setting the cache to 10GB, despite bouncing between games; chances are that with the 1GB limit I would have seen them compiled again.
     
  10. Remij

    Remij Regular

Yep, exactly. This is a very welcome addition, and definitely long overdue for Nvidia in particular; I'm not sure about the AMD side of things - I believe they've had a large limit for a while now. Is it user-definable on Radeon cards?

Anyway, this isn't addressing the root of the problem, but the ability to change the cap has its benefits, as you stated: reducing recompilations, which in turn can reduce *potential* stutters, but more assuredly will help reduce load times upon revisiting levels/areas in games. That's one thing I think many people don't realize. When certain games pre-compile shaders upon initial load, it can go a long way toward reducing overall load times, especially in games where you're constantly travelling back and forth between levels, or between "maps" in MP games. A little upfront "caching" can improve the overall experience greatly.

It does make me wonder if something is happening behind the scenes that pushed Nvidia to decide it was time to let us adjust the cache limit. With DirectStorage and OS file-stack improvements coming, you don't want driver-side shader recompiles happening all the time.

    It's been a while since the last DirectX Dev Blog update.
     
    Flappy Pannus and PSman1700 like this.
  11. Lurkmass

    Lurkmass Regular

Sure, the longevity of these caches comes into question in the face of these factors, but that can be improved with better driver or application design. Not all driver updates have to immediately invalidate the cache - only the ones that change the shader/pipeline compiler - and old caches could still be kept usable while the updated shader/pipeline compiler builds the new cache for higher performance. As for application updates, a lot of the time many shaders/pipelines don't change at all, so we needn't invalidate all of the cache in that case either; we could compile only the new shaders/pipelines ...
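The partial-invalidation idea can be sketched as per-entry bookkeeping: if each cached binary records which compiler version produced it, a driver update that leaves the compiler untouched keeps serving old entries, and a compiler bump only forces lazy rebuilds. The record layout below is hypothetical, not any real driver's format:

```python
# Hypothetical per-entry cache records: each compiled binary is tagged
# with the compiler version that produced it.
cache = {
    "shader_a": {"compiler": "2.1", "binary": b"isa_a"},
    "shader_b": {"compiler": "2.1", "binary": b"isa_b"},
}

def lookup(name, current_compiler):
    entry = cache.get(name)
    if entry is None:
        return None  # cold miss: compile from shipped bytecode
    if entry["compiler"] != current_compiler:
        return None  # stale: recompile with the new compiler
    return entry["binary"]

# Driver update that did NOT touch the compiler: old entries still hit.
assert lookup("shader_a", "2.1") == b"isa_a"
# Update that bumped the compiler: entries are rebuilt as encountered.
assert lookup("shader_a", "2.2") is None
```

Only the shaders actually requested under the new compiler pay the recompile cost; everything else stays valid.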

As for OS updates, I'm not too sure, but I imagine changes to the graphics kernel driver might potentially invalidate the caches as well ...

I don't think developers have any reason to ship the HLSL source and do runtime processing with the FXC/DXC compilers. You can do offline pre-processing with FXC/DXC, so most games will ship with DXBC or DXIL bytecode by default. When we look at D3D translation layers as a reference, they don't do shader translation from HLSL source either; they translate from DXBC or DXIL. You can even feed/sign illegal custom DXBC/DXIL code to the drivers directly, ever since AMD leaked the DXBC checksum algorithm ...

There's virtually no reason for end users to have to compile from HLSL source, since that can be done entirely on the developer side without worrying about driver updates - the shipped DXBC/DXIL code permanently stays the same. Sure, a game could ship with the FXC/DXC compiler itself, but whatever version of the compiler ships with the game, the bytecode codegen always remains the same regardless of the drivers, for the sake of consistency. So there's absolutely no reason for developers not to consider offline pre-processing ...
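The determinism argument can be illustrated with a stub: treat the source compiler as a pure function of (source, compiler version), which is effectively what offline FXC/DXC compilation gives you, so the shipped bytecode can never change underneath the user regardless of their driver. This is a stand-in stub, not a real compiler:

```python
import hashlib

def compile_stub(hlsl_source: str, compiler_version: str) -> bytes:
    # Stand-in for an offline FXC/DXC invocation: the output depends only
    # on the source and the compiler version, never on the user's driver.
    return hashlib.sha256((compiler_version + "\0" + hlsl_source).encode()).digest()

src = "float4 main() : SV_Target { return 1; }"
dxil_a = compile_stub(src, "dxc-1.7")
dxil_b = compile_stub(src, "dxc-1.7")
assert dxil_a == dxil_b  # same source + same compiler = identical bytecode
assert compile_stub(src, "dxc-1.8") != dxil_a  # only a compiler bump changes it
```

Pinning the compiler version at build time is what keeps the shipped DXBC/DXIL stable across every driver the game will ever run on.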
     
    Remij likes this.
  12. DegustatoR

    DegustatoR Veteran

AFAIU the OS (the API, to be precise) compiles shaders from HLSL (GLSL) representations into DXIL (DXBC/SPIR-V), which are then fed into the IHVs' drivers, which compile them into the respective GPU binaries.

    I'll give you three:
    1. Improvements in API compilers
    2. Future compatibility with possible new intermediate formats
    3. A much higher degree of portability in general

Many games do ship with high-level shaders and compile them on the target platform AFAIK.
    Hence why we have three levels of shader caches these days:
A. Apps cache compiled shaders (usually found somewhere in the game's data folder, in the system's Documents, or in %APPDATA%)
B. APIs cache compiled shaders (not sure where these go, but Windows' Disk Cleanup tool has an option for deleting them)
    C. Drivers cache compiled shaders (which we're discussing above)
    And I'm not at all sure that all the issues you're discussing here are due to C.
AFAIU again, most of the issues with hitching these days are due to improper PSO setup, and that's something which must be done - and possibly cached - by the renderer, in A.

    Edit: Come to think of it, what do games which "compile shaders" (like Horizon) actually compile? They certainly don't compile their shaders into a GPU binary since this is down to the IHV driver. Do they compile/cache PSOs? Or compile DX bytecode from their HLSL sources?
     
    Last edited: Oct 19, 2021
    Remij, PSman1700 and BRiT like this.
  13. Lurkmass

    Lurkmass Regular

I don't think the OS does shader source compilation. That's the job of tools like the FXC/DXC compilers ...

I think these benefits don't outweigh the drawbacks of changing compilers, which can potentially break shaders, since compilation might produce unexpected results between different compiler versions. Having to constantly rework some of your shaders because you changed your compiler is not sane from a development perspective ...

Most of the shader/pipeline compilation time is down to the driver translating DX bytecode into GPU ISA. The large variance in compilation times between games is mostly correlated with how many PSOs the game contains. Doom, with its lean forward renderer and ubershaders, only has a few dozen PSOs to compile, while HZD on PC, with its fancy deferred renderer and shader graphs - which usually cause a combinatorial explosion in shader permutations - might have well over 10,000 PSOs to compile, which makes a profound difference in compilation times ...
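The combinatorial explosion is just multiplication of independent feature axes: each material/lighting switch multiplies the number of distinct pipeline states. The axis counts below are made up purely for illustration:

```python
from math import prod

# Illustrative (made-up) feature axes. Each number is how many variants
# one switch contributes; the PSO count is the product of all axes.
ubershader_axes = [2, 3]                      # a couple of global toggles
material_graph_axes = [4, 5, 5, 3, 2, 2, 7]   # per-material feature switches

assert prod(ubershader_axes) == 6        # a handful of PSOs to compile
assert prod(material_graph_axes) == 8400  # thousands of PSOs to compile
```

An ubershader design collapses most axes into runtime branches, which is why its PSO count (and hence its compile time and cache size) stays tiny.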
     
    Remij and BRiT like this.
  14. DegustatoR

    DegustatoR Veteran

    I consider graphics APIs a part of an OS really.

    Well, there are reasons regardless.

I doubt that this is the case, since the driver wouldn't store such cached PSOs in an application data folder.
     
    PSman1700 likes this.
  15. Lurkmass

    Lurkmass Regular

Yes, but they heavily discourage runtime source compilation, and offline compilation is what happens in practice most of the time. Changing compiler versions will introduce inconsistent bytecode generation, which can result in unexpected application behaviour, so developers will usually stick with the same source compiler version for the duration of a project ...

These findings are consistent with the challenges behind creating D3D translation layers or drivers. Games usually don't ship the shader source (HLSL/GLSL); they'll often ship with pre-compiled intermediate bytecode like DXBC/DXIL/SPIR-V, so translation layers and drivers only accept those formats. There's literally no advantage to keeping the shader source, since drivers are codified into accepting intermediate bytecode, so most developers will pre-process the shader source by default using tools like FXC/DXC/glslang (source compilers), compiling it into intermediate bytecode for driver consumption, where further compilation happens in the driver's internal shader/pipeline compiler ...

The crux of the issue behind long compilation times is mostly the high number of PSOs consumed by the driver's internal shader/pipeline compiler, not shader source compilation ...
     
    Malo, Jawed, Remij and 1 other person like this.
  16. DegustatoR

    DegustatoR Veteran

The default size for Nvidia's DX shader cache is 4GB:
     
  17. pharma

    pharma Veteran

The previous branch was 1GB, so the default goes up with the new cache size control.
     
    Remij and PSman1700 like this.
  18. Ext3h

    Ext3h Regular

Just out of curiosity, did anyone hear about hash collisions in Nvidia's shader cache? Or is it robust enough?

Because for AMD, at least for OpenGL shaders, those were / are definitely a thing. I had to experience that several times during shader development, as the shader effectively executed was a pretty obvious mismatch to the shader submitted to the driver (interface block mismatch warnings). The same thing happened after shipping an update on the consumer side, too. Only wiping the driver-internal cache ever fixed it.
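For a sense of how plausible such collisions are, the birthday bound gives the approximate probability of at least one collision among n cache keys of b bits. The key widths and cache sizes below are illustrative guesses, not measurements of any real driver:

```python
from math import exp

def collision_probability(n_entries: int, hash_bits: int) -> float:
    # Birthday-bound approximation: P ~= 1 - exp(-n^2 / 2^(b+1)).
    return 1.0 - exp(-(n_entries ** 2) / 2.0 ** (hash_bits + 1))

# With a short 32-bit key, ~100k cached shaders make a collision likely:
p32 = collision_probability(100_000, 32)
# A 128-bit key keeps collisions negligible at any realistic cache size:
p128 = collision_probability(100_000, 128)
assert p32 > 0.5
assert p128 < 1e-18
```

So a cache keyed by a short hash with no full-content verification really can hand back the wrong shader once enough entries accumulate; a wider key or a byte-for-byte check on hit avoids it.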
     
  19. Davros

    Davros Legend

    Blasphemy....
     
    Rootax, PSman1700, BRiT and 1 other person like this.
  20. Remij

    Remij Regular

I wonder how many times I've attributed hitching/stuttering to shader compilation when it was perhaps something else, like texture decompression or some other heavy CPU task?

I also wonder if, in the future, DirectStorage and its GPU-based texture decompression will improve the situation for a lot of the stutters we see during streaming in gameplay?

I really wish they'd give us an update on DirectStorage soon. It's out in developer preview, right? I wonder what the first game to really utilize it will be, and whether developers will use it to market their games. I'm assuming it will be a Microsoft first-party game - perhaps the next Gears, or Hellblade 2?
     