Shader Compilation on PC: About to become a bigger bottleneck?

You can build a PSO cache in Unreal Engine, but even Epic Games doesn't think it's realistic to achieve 100% coverage in many instances, and there are other limitations too ...


This feature doesn't work with a popular VFX plugin or with ray tracing PSOs, so no matter how hard many developers try to follow best practices, including building super levels that contain all assets, PC remains a cursed platform in a lot of cases. The only way to permanently fix the problem for good would be for Microsoft to make Xbox D3D available on PC, so that at least one vendor, on their latest architecture, could reuse the same precompiled binaries as the consoles. That won't ever happen, since it would change the political balance between Microsoft and the HW vendors: the other HW vendors would not take kindly to one of them having a consistent performance advantage. Having an "identical experience" between PC and consoles isn't technologically possible ...
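
For anyone wondering what reusing precompiled binaries would even look like on PC: D3D12 does expose the driver-compiled binary of a PSO as a cached blob, but the runtime validates that blob against the exact adapter and driver version, which is why console-style reuse doesn't carry over. A minimal sketch, with error handling trimmed and the PSO description assumed to be filled in elsewhere:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

// First run: create the PSO normally and keep the driver-compiled blob around
// (e.g. write it to disk). The blob is specific to this GPU and driver version.
std::vector<char> SaveCachedPso(ID3D12Device* device,
                                const D3D12_GRAPHICS_PIPELINE_STATE_DESC& psoDesc)
{
    ComPtr<ID3D12PipelineState> pso;
    device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pso));

    ComPtr<ID3DBlob> blob;
    pso->GetCachedBlob(&blob);
    const char* p = static_cast<const char*>(blob->GetBufferPointer());
    return std::vector<char>(p, p + blob->GetBufferSize());
}

// Later runs: feed the blob back in. The runtime/driver rejects it if the adapter
// or driver changed, so a fallback full compile is always needed.
HRESULT LoadCachedPso(ID3D12Device* device,
                      D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc,
                      const std::vector<char>& cachedBlob,
                      ComPtr<ID3D12PipelineState>& outPso)
{
    psoDesc.CachedPSO.pCachedBlob = cachedBlob.data();
    psoDesc.CachedPSO.CachedBlobSizeInBytes = cachedBlob.size();

    HRESULT hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&outPso));
    if (hr == D3D12_ERROR_ADAPTER_NOT_FOUND || hr == D3D12_ERROR_DRIVER_VERSION_MISMATCH)
    {
        psoDesc.CachedPSO = {};  // stale cache: recompile from shader bytecode
        hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&outPso));
    }
    return hr;
}
```
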
The Ascent definitely doesn't precompile all their shaders.. and lmfao.. you need to stop acting like this is impossible...

Gears 5 is smooth as hell and uses a form of RT..

Those guys are small indie devs with a very limited budget.. I know that doesn't mean they don't know what they are talking about... but their game specifically performed worse than most UE games...
 
So is it not possible to pre-compile shaders for a game at load time in UE4 even if the developer wants to? i.e., does UE4 force this to be done in real time during gameplay?

And does anyone know how UE5 behaves in this regard?

The behavior remains unchanged ...

Does Gears 5 use hardware ray tracing? PSO caching specifically does not work with the hardware ray tracing pipeline ...

Their example just demonstrates that even if you do use the solution provided by Epic Games, there are still limitations on being able to pre-load all of the PSOs at startup in their case. 100% coverage of PSOs is the exception, not the norm, as stated by Epic Games ...
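
To make "pre-loading all of the PSOs at startup" concrete: the usual pattern is to replay a list of PSO descriptions recorded during development on a few worker threads behind the loading screen, so gameplay only ever hits already-compiled pipelines. A rough sketch, where LoadRecordedPsoDescs() is a hypothetical stand-in for however those descriptions were gathered (and assumes the shader pointers inside each description stay valid):

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <atomic>
#include <thread>
#include <vector>
using Microsoft::WRL::ComPtr;

// Hypothetical: PSO descriptions captured by a recording pass during development.
std::vector<D3D12_GRAPHICS_PIPELINE_STATE_DESC> LoadRecordedPsoDescs();

// Compile every recorded PSO during the loading screen so (assuming full coverage)
// gameplay never triggers a first-time driver compile.
void PrewarmPsos(ID3D12Device* device,
                 std::vector<ComPtr<ID3D12PipelineState>>& outCache)
{
    std::vector<D3D12_GRAPHICS_PIPELINE_STATE_DESC> descs = LoadRecordedPsoDescs();
    outCache.resize(descs.size());

    std::atomic<size_t> next{0};
    auto worker = [&]() {
        // ID3D12Device is free-threaded, so PSO creation can be spread across cores.
        for (size_t i = next++; i < descs.size(); i = next++)
            device->CreateGraphicsPipelineState(&descs[i], IID_PPV_ARGS(&outCache[i]));
    };

    unsigned hc = std::thread::hardware_concurrency();
    unsigned threadCount = hc > 1 ? hc - 1 : 1;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threadCount; ++t)
        pool.emplace_back(worker);
    for (std::thread& th : pool)
        th.join();
}
```
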
 
What's your excuse going to be when The Coalition releases Gears 6 and it's fine with hardware-based RT/Lumen?

Basically... I don't want to hear more excuses.. either Epic has to change something in their engine, or the game developers have to come up with their own solution.... or Epic needs to release a public statement telling PC gamers this is the best they can expect from Unreal Engine and they have no intention of fixing it...so people aren't out there spending thousands on hardware attempting to "fix" this problem which has nothing to do with their hardware..

Allowing games to release like this is simply negligence at this point..
 
Exceptions don't change the trend ...

As for Epic releasing a public statement: they already did this in their documentation?! Basically, don't expect 100% PSO coverage ...
 
Little side question here, but do we know why Square didn't use their own Luminous engine for FF7 Remake? It was very impressive on PC in FFXV in the end, imo.
 
What the exceptions do.. is prove that it's possible... so I don't want to hear Epic's excuses..

Besides.. you know that 99% isn't 100% right? Nobody is asking for a perfect game which doesn't ever dip a single frame ever...

And no.. something being stated in their documentation is not the same as an announcement to the public about the current state of their engine on PC. People are going out and spending REAL money on new hardware in an attempt to fix these issues... and if Epic and Developers KNOW that they aren't going to fix it... then they are knowingly screwing over their customers, and absolutely need to be called out on this. Developers LITERALLY solve issues every single day and work around things that documentation says "isn't possible"... Imagine where this industry would be if developers just simply accepted things as they were? Man, what a depressing world that would be.

I don't care who has to do the work to solve this issue... but it has to be done. If I can make a shader cache and the 2nd time through it's perfectly smooth... then these guys can figure out a way to make that happen the first time I play something. I've already stated I'm willing to wait for a process to "warm up" the game and create a cache..

This is NOT expected behavior from games and is NOT something PC gamers should EVER accept. Other developers and other engines are largely solving this... if Epic can't figure it out, and developers using their engine knowingly release products which have these issues... then we have a serious problem.
 
It is now possible to fix the stuttering issues in FF7 Remake by making the game run in DX11 mode and adding DXVK async to the game, a DX11-to-Vulkan translation layer whose async patch moves the game's shader compilation off the critical path, allowing shaders to compile asynchronously instead of stalling the frame.

STEP 1:
1) Make your game run in DX11 mode, otherwise this won't work. There are two options to do this:
a) add "-dx11" in your epic store launch options for the game
b) open epic_emu.ini from Final Fantasy VII Remake -> Engines -> Binaries -> ThirdParty -> EOS and add " -dx11" after AppName=FFVIIRemakeIntergrade, so it looks like AppName=FFVIIRemakeIntergrade -dx11, then save the .ini file.

STEP 2:
How to add DXVK async to your game (DXVK is a DX11->Vulkan wrapper; the async patch compiles shaders asynchronously so draws don't stall on compilation)
1) Go to : DXVK async github
2) Download the dxvk-async-1.9.2.tar.gz file
3) You need to copy the files d3d11.dll and dxgi.dll from x64 folder into Final Fantasy VII Remake -> End -> Binaries -> Win64
4) Run the game!

https://wccftech.com/final-fantasy-...issues-can-be-fixed-with-a-simple-workaround/
 
Don't have FF7 Remake yet but something to look into regarding the stutters - https://www.nexusmods.com/finalfantasy7remake/mods/66?tab=description

Some other random thoughts on some of the comments regarding this -

I think Square's own Luminous engine was considered hard to work with (and therefore more labor intensive) especially for the design side. They'd already shifted to UE4 for other projects, such as Kingdom Hearts 3.

My understanding is the PC version is based on the PS5 version. The PS5 version was supposedly updated specifically to take advantage of the PS5's larger memory budget and its SSD/storage framework. That creates some issues: memory, and specifically VRAM, as well as data movement (due to the non-unified nature of PC memory) can still be quite lacking on the PC side versus the PS5 (or XSX), even if the other performance metrics are higher. The analogous storage framework is also not yet available on PC.

Compounding the above are differing system configurations, the likely scalability limitations of any solution, and the relative lack of sales (revenue generation) that might result. Perhaps the solution that best fits GPUs with "low" VRAM, "low" system memory, PCIe 3.0, no NVMe storage, lower thread count/perf CPUs, etc. will also have to be a compromise that doesn't fully leverage a top of the line system either (eg. 12900K, RTX 3090, 32GB+ system RAM, PCIe 4.0, NVMe storage, etc.).

The above has also always been an interesting issue in terms of optics on the PC side. As a poster mentioned above, people expect more out of their "$3000" systems, however that (extra) money all goes to the hardware vendors. The software vendor makes the same as it does from the person with the <$1000 system. From an aggregate sales standpoint the software vendor actually makes more from the latter, as proportionally fewer people have those $3000 systems.

Circling back to the "solution" I posted earlier: judging by some of the comments, the workaround seems to have a heavy VRAM requirement (even exceeding the 10GB of an RTX 3080). Are some of the decisions being made, especially as we move towards heavier-asset "next gen" (or current gen) games, a compromise due to the limitations of the PC market base?

My other understanding is that with the lower level APIs, especially DX12, memory management is now almost entirely the responsibility of the software developer rather than the hardware vendor (driver side). Are developers essentially being conservative with this to avoid more catastrophic hard faults? What I'm actually getting at is that at some point it might be worth visiting, or revisiting, the question of how suitable lower level APIs really are for the PC space.
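
To illustrate the "onus is on the developer" point: under D3D12 the application decides what stays resident in VRAM and has to evict and re-promote memory itself when it over-commits, where a D3D11 driver would have paged behind its back. A deliberately naive illustration of that responsibility, not any particular game's strategy:

```cpp
#include <d3d12.h>

// Under D3D12 the application, not the driver, reacts to VRAM pressure
// (budgets can be queried via IDXGIAdapter3::QueryVideoMemoryInfo).
// A simplistic policy: demote a heap we think is cold, promote it before reuse.
void DemoteColdHeap(ID3D12Device* device, ID3D12Heap* coldHeap)
{
    ID3D12Pageable* pageable = coldHeap;
    device->Evict(1, &pageable);        // hand the memory back to the OS memory manager
}

void PromoteBeforeUse(ID3D12Device* device, ID3D12Heap* heap)
{
    ID3D12Pageable* pageable = heap;
    device->MakeResident(1, &pageable); // blocking; getting this wrong means hitches or device removal
}
```
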
 
It's nice people are trying to make that DXVK workaround work... but it's just laughable.. a "simple workaround"...

Besides not doing anything to fix the REAL issues.... imagine having to do all that junk to make your $70 game work properly in the first place.... and it STILL doesn't work as it should..

Perhaps if I reconfigure my flux capacitor, overclock my quantum overdrive combobulator, and then tinker with the wifi antenna so I can download some more RAM, it might straighten out that frametime a bit so that it's bearable......


Trash. No more sugar coating it. This is on SquareEnix to fix in the first place... BEFORE the game ever launches...
 
DF found out that running the game with the DX11 API is enough to eliminate almost all stutters.


That's it, FUCK DX12. The game stutters even at 30fps in DX12! Why bother compiling shaders for maximum performance when the game is only running at 30fps? Why invoke shader compilation at all? Some freaking games invoke it just by the act of changing graphics settings.

I will repeat my questions again .. I understand that compilation is a form of real time optimization (JIT, or Just In Time) that converts generic code into machine code specifically tuned to each GPU architecture, so in essence it's trying to eke out every last bit of GPU performance from the hardware to increase fps.

But then, there aren't that many GPU architectures. For a typical DX12 game (DX12 titles suffer the most from this issue), you only have these architectures to support: GCN, Polaris, Vega, RDNA1 and RDNA2 from AMD; Kepler, Maxwell, Pascal, Turing and Ampere from NVIDIA. So really not that many! What's stopping developers from precompiling the shaders for these ten architectures, while leaving real time compiling exclusively for future archs? Or do we really need to compile specifically for every individual GPU in those families?

And while we are at it, why not give the user the option of running the generic code, for a stutter free experience at slightly reduced fps?
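
For reference, the split being described looks roughly like this in D3D12: the game ships DXIL bytecode compiled offline from HLSL (e.g. with dxc), and the architecture-specific machine code only comes out of the driver when the PSO is created on the user's machine - that second, per-GPU-and-driver step is the one that hitches and the one that can't realistically be shipped for every configuration. A simplified sketch of the runtime side (the .cso file names are made up):

```cpp
#include <d3d12.h>
#include <d3dcompiler.h>   // D3DReadFileToBlob
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Stage 1 happened offline: e.g. "dxc -T vs_6_0 ..." produced DXIL .cso files.
// Stage 2 happens here: inside CreateGraphicsPipelineState the driver compiles
// that DXIL down to the GPU's actual ISA, per GPU model and driver version.
ComPtr<ID3D12PipelineState> BuildPso(ID3D12Device* device,
                                     D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc)
{
    ComPtr<ID3DBlob> vs, ps;
    D3DReadFileToBlob(L"shader.vs.cso", &vs);  // hypothetical file names
    D3DReadFileToBlob(L"shader.ps.cso", &ps);

    psoDesc.VS = { vs->GetBufferPointer(), vs->GetBufferSize() };
    psoDesc.PS = { ps->GetBufferPointer(), ps->GetBufferSize() };

    ComPtr<ID3D12PipelineState> pso;
    device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pso));  // driver "JIT" happens here
    return pso;
}
```
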
 
It's risky to force async compilation through a translation layer, since it can trigger the integrated anti-cheat kernel drivers and potentially get you banned from the game ...

Async compilation can also cause visual corruption because of how it operates. It allows the application to temporarily "skip" drawing commands, but the side effect is that content is rendered incorrectly until compilation is done. This reduces the frame time spikes caused by compilation, but the trade-off is temporarily introducing more rendering artifacts. Deferring drawing commands isn't safe either, since that could crash the application: applications might depend on GPU-side logic in their design ...
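
A sketch of that trade-off in code: the PipelineCache type and the hash are hypothetical, but this is the basic shape of an async-PSO scheme like the one DXVK's async patch implements (draws that would need a not-yet-compiled pipeline are skipped rather than stalled):

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <chrono>
#include <cstdint>
#include <future>
#include <unordered_map>
using Microsoft::WRL::ComPtr;

// Hypothetical cache keyed by a hash of the PSO description. Assumes a single
// render thread calls it, and that the pointers inside the desc stay alive.
struct PipelineCache
{
    std::unordered_map<uint64_t, std::shared_future<ComPtr<ID3D12PipelineState>>> entries;

    // Returns the PSO if it is already compiled; otherwise kicks off a background
    // compile and returns null so the caller can skip this draw (temporary
    // rendering artifacts instead of a frame time spike).
    ID3D12PipelineState* GetOrCompileAsync(ID3D12Device* device, uint64_t hash,
                                           const D3D12_GRAPHICS_PIPELINE_STATE_DESC& desc)
    {
        auto it = entries.find(hash);
        if (it == entries.end())
        {
            entries[hash] = std::async(std::launch::async, [device, desc]() {
                ComPtr<ID3D12PipelineState> pso;
                device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
                return pso;
            }).share();
            return nullptr;  // not ready yet: draw is skipped this frame
        }
        if (it->second.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
            return nullptr;  // still compiling in the background
        return it->second.get().Get();
    }
};
```
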
 
Pretty hilarious that simply running the game in DX11 mode can solve virtually all the issues. It'd be interesting to see the performance impact of that vs DX12 in terms of frame rate.

It seems like a bad situation as we clearly need DX12 (Ultimate) for its advanced features, but it appears to be too easy for inexperienced teams to screw up.
 
On a technical level, is a DX12-style API required for all the new features in DX12 Ultimate? My understanding is that DX11 was at least iterating in step with DX12 in terms of features up until DX12.1, with DX11.3 (well, D3D 11.3).

Of course, on a practical level it's not likely to change in this respect. However, I do still have the question, since the move to "low level" APIs in the PC space raises the issue of their actual suitability (and impact), given the relative shift of the onus of optimization away from hardware vendors to software vendors.
 
On the question of precompiling for each of those architectures: vendors can make sudden changes to a compiler in a driver update, which can invalidate the precompiled binaries. If vendors feel they can gain 0.1% more performance, they will try to push out an update ASAP just to win in benchmarks against their competitors, even if it means more compilation for the end user ...
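
That fragility is baked into the API contract as well: D3D12's pipeline library (the mechanism a disk PSO cache would normally be built on) explicitly fails with a driver-version-mismatch error when the serialized data came from a different driver, and the only recourse is to throw the cache away and recompile everything. Roughly:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

// Try to reopen a serialized PSO cache from disk. A driver update (or a new GPU)
// invalidates it, and the application has to start over with an empty library.
ComPtr<ID3D12PipelineLibrary> OpenPsoLibrary(ID3D12Device1* device,
                                             const std::vector<char>& diskBlob)
{
    ComPtr<ID3D12PipelineLibrary> library;
    HRESULT hr = device->CreatePipelineLibrary(diskBlob.data(), diskBlob.size(),
                                               IID_PPV_ARGS(&library));
    if (hr == D3D12_ERROR_DRIVER_VERSION_MISMATCH || hr == D3D12_ERROR_ADAPTER_NOT_FOUND)
    {
        // Cache was built by an older driver or a different adapter: discard it.
        device->CreatePipelineLibrary(nullptr, 0, IID_PPV_ARGS(&library));
    }
    return library;
}
```
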
 
Am I missing something? AFAIK you don't run binaries through a compiler.

The shaders are more of an intermediate format. Somewhere in these forums, maybe even this thread, are several extremely detailed posts that include everything you need to know to understand the process.
 
But then, there aren't that many GPU architectures. For a typical DX12 game (DX12 titles suffer the most from this issue), you only have these architectures to support: GCN, Polaris, Vega, RDNA1 and RDNA2 from AMD; Kepler, Maxwell, Pascal, Turing and Ampere from NVIDIA. So really not that many! What's stopping developers from precompiling the shaders for these ten architectures, while leaving real time compiling exclusively for future archs? Or do we really need to compile specifically for every individual GPU in those families?
Well, the fact that these games have to run on future GPUs with unknown architectures does, at the very least.
Unless you would be fine with losing what is basically the biggest advantage of PC for gaming - wide backwards and forwards software compatibility.
And shaders are compiled for a combination of GPU model (not really architecture), OS version and GPU driver version - any of these can change at any moment, which would make precompiled shaders perform badly at best and straight up fail to work at worst.

And while we are at it, why not give the user the option of running the generic code, for a stutter free experience at slightly reduced fps?
There is no "generic code". Shaders in their HLSL or API intermediate format cannot be executed by the h/w. It's a bit like saying "why not just let our CPUs run this C++ code instead of compiling it into a binary".

The bigger question here is why using DX11 solves this, while most issues seem to stem from using DX12. Is there some inherent flaw in the DX12 API design which leads to this result? Can it be fixed without losing the typical advantages that DX12 provides?

On a technical level, is a DX12-style API required for all the new features in DX12 Ultimate?
Yes. Make no mistake, even though it will take years (decades possibly), "old" APIs are on their way out. No one in the industry wants to support two different APIs, MS included. Thus the push to add new features to DX12 exclusively is an expected result, and things like D3D12on7 are an attempt at making the API singular instead of supporting a bunch of them.

Vendors can make sudden changes to a compiler in a driver update, which can invalidate the precompiled binaries. If vendors feel they can gain 0.1% more performance, they will try to push out an update ASAP just to win in benchmarks against their competitors, even if it means more compilation for the end user ...
Right, but let's not forget that these "0.1%" can also be "10%" or "20%" in many cases. I doubt that many users - or IHVs for that matter - would say no to that just because some games can't implement shader compilation properly.

Here's another question on the FF7R situation specifically, btw: if these stutters are indeed due to shader compilation, why aren't they solved by shader caches? Or are they?
 
Yes. Make no mistake, even though it will take years (decades possibly), "old" APIs are on their way out. No one in the industry wants to support two different APIs, MS included. Thus the push to add new features to DX12 exclusively is an expected result, and things like D3D12on7 are an attempt at making the API singular instead of supporting a bunch of them.

That's not what I'm referring to. I understand on a practical level why they will almost certainly make the new features restricted to DX12.

What I'm asking is whether there is an inherent technical reason that a "low level" API relative to DX11 was needed for the new features. As in, hypothetically, if they had not chosen to go in the direction of DX12.

This is also what I'm referring to with respect to the possible pitfalls of the direction of APIs in the PC space. This is to some extent revisiting an old point of contention, but I've always had some reservations about the impact of shifting more of the optimization burden away from the hardware vendor to the software vendor in the PC space.
 
What I'm asking is whether there is an inherent technical reason that a "low level" API relative to DX11 was needed for the new features. As in, hypothetically, if they had not chosen to go in the direction of DX12.
Anything which can be done in one API can be implemented in another, so the question is a bit pointless.
It's impossible to say if D3D12U features would work well had they been implemented in a D3D11 update, since we don't know how that would have been done.
Generally speaking though, all the new graphical features of DX12 closely resemble things which could have been done through GPU compute (whether DX based or CUDA/OpenCL), and compute is handled much better in the D3D12 API, if only for the ability to run it asynchronously.
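
For what "run it asynchronously" means concretely: D3D12 lets compute work be submitted on its own queue of type COMPUTE, which can overlap the graphics queue instead of being serialized into it, with fences handling synchronization. Minimal setup:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// A dedicated compute queue lets compute dispatches overlap graphics work,
// something D3D11 had no first-class way to express.
ComPtr<ID3D12CommandQueue> CreateAsyncComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;

    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;  // synchronize against the graphics queue with ID3D12Fence signals/waits
}
```
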
 
Specifically blaming on-going shader compilation during gameplay for the stuttering seems problematic.

The videos I linked on Monday show almost zero stuttering over a 6 minute timeframe, on both a 6900 XT and a 3090. The stutters appear to be asset-related, triggered by location, and are over pretty quickly.

Stutters when arriving at a game location for the first time ever may also not be due to compilation.

The lack of a deep dive into the mechanics of the problem in this game by tech sites does surprise me.
 