Shader pipelines
As we knew from our OpenGL® engine, shader compilation can take a long time on PC. During the production of the game, we generated a shader cache targeting the GPU model of our workstations. It took a whole night to generate a complete shader cache for Detroit: Become Human! This shader cache was provided to everyone each morning. But it didn’t prevent the game from stuttering, because the driver still needed to convert that code into native GPU shader assembly.
Vulkan® turned out to be much better suited than OpenGL® to tackling this issue.
Firstly, Vulkan® doesn’t directly use a high-level shading language such as HLSL, but a standard intermediate shader language called SPIR-V. SPIR-V makes shader compilation faster and easier to optimize for the driver shader compiler. In fact, it is similar in terms of performance to the OpenGL® shader cache system.
In Vulkan®, the shaders must be combined to form a VkPipeline. A VkPipeline can be made from a vertex shader and a pixel shader, for instance. It also contains some render state information (depth tests, stencil, blending, and so on) and the render target formats. This information is important for the driver to ensure it has everything it needs to compile shaders in the most efficient way possible.
In OpenGL®, the shader compilation does not know the context of shader usage. The driver still needs to wait for a draw call to generate the GPU binary, and that’s why the first draw call with a new shader can take a long time to execute on the CPU.
With Vulkan®, VkPipeline provides the context of usage, so the driver has all the information needed to generate a GPU binary, and the first draw call has no overhead. We can also update a VkPipelineCache when creating a VkPipeline.
Initially, we tried to create the VkPipelines the first time we needed them. This caused stuttering, much like the OpenGL® driver strategy. Once a pipeline has been created, the VkPipelineCache is up-to-date, and the stuttering is gone the next time that pipeline is used in a draw call.
Next, we tried creating the VkPipelines ahead of time, during loading, but this was so slow when the VkPipelineCache was not up-to-date that our background loading strategy was compromised.
In the end, we decided to generate all the VkPipelines during the first launch of the game. This completely eradicated the stuttering issue, but we were now facing a new problem: the generation of the VkPipelineCache was taking a very long time.
Detroit: Become Human has around 99,500 VkPipelines! The game uses a forward rendering approach, so material shaders contain all the lighting code. Consequently, each shader can take a long time to compile.
We found a few ideas to optimize this process:
- We optimized our data to be able to load only the SPIR-V intermediate binaries.
- We optimized our SPIR-V intermediate binaries with SPIR-V optimizer.
- We made sure that all CPU cores were spending 100% of their time on VkPipeline creation.
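The last point can be sketched as follows. This is a minimal illustration rather than the engine's actual code: `PipelineJob`, `createAllPipelines`, and the `create` callback are hypothetical names, and in a real renderer the callback would wrap a call such as `vkCreateGraphicsPipelines`. Each worker thread pulls the next index from a shared atomic counter, so no core sits idle while work remains.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one pipeline description (shaders + render state).
struct PipelineJob {
    int id;
};

// Calls `create` exactly once per job, using `workerCount` threads.
// Assumption: `create` is safe to call from several threads at once.
inline void createAllPipelines(
    const std::vector<PipelineJob>& jobs,
    const std::function<void(const PipelineJob&)>& create,
    unsigned workerCount = std::thread::hardware_concurrency()) {
    assert(create && "a creation callback is required");
    if (workerCount == 0) {
        workerCount = 1;  // hardware_concurrency() may return 0 when unknown
    }

    std::atomic<std::size_t> next{0};  // index of the next job to hand out

    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= jobs.size()) {
                return;  // nothing left to do
            }
            create(jobs[i]);  // a real engine would create the VkPipeline here
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(workerCount);
    for (unsigned t = 0; t < workerCount; ++t) {
        threads.emplace_back(worker);
    }
    for (auto& th : threads) {
        th.join();
    }
}
```

Pulling work from a shared counter, rather than splitting the list into fixed chunks up front, keeps the cores busy even when some pipelines compile much more slowly than others.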
Finally, a big optimization was suggested by Jeff Bolz from NVIDIA and has been very effective in our case.
A lot of VkPipelines are very similar. For instance, some VkPipelines can share the same vertex and pixel shaders, differing only in some render states such as stencil parameters. In this case, the driver can treat them internally as the same pipeline. But if we create them at the same time, one of the threads will just wait until the other one finishes the task. By nature, our process was submitting all the similar VkPipelines at the same time. As a solution, we simply re-sorted the VkPipelines. The “clones” were put at the end, and their creation ended up much faster.
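A minimal sketch of such a re-sort, assuming each pipeline description carries hashes identifying its vertex and pixel shaders. The `PipelineDesc` fields and the `moveClonesToEnd` function are hypothetical and not taken from the engine. The first pipeline seen for each shader pair keeps its place, and every later pipeline with the same pair is moved to the end of the list.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical pipeline description: two shader identifiers plus a render state.
struct PipelineDesc {
    std::uint64_t vertexShaderHash;
    std::uint64_t pixelShaderHash;
    std::uint32_t renderStateId;  // e.g. an index into a table of stencil/blend setups
};

// Returns the descriptions reordered so that the first pipeline for each
// (vertex, pixel) shader pair comes first and all "clones" (same shaders,
// different render state) come last. Relative order is kept within each group.
inline std::vector<PipelineDesc> moveClonesToEnd(const std::vector<PipelineDesc>& descs) {
    std::set<std::pair<std::uint64_t, std::uint64_t>> seenShaderPairs;
    std::vector<PipelineDesc> firsts;
    std::vector<PipelineDesc> clones;

    for (const PipelineDesc& d : descs) {
        const bool isFirst =
            seenShaderPairs.emplace(d.vertexShaderHash, d.pixelShaderHash).second;
        (isFirst ? firsts : clones).push_back(d);
    }

    // Clones are appended after every unique shader pair.
    firsts.insert(firsts.end(), clones.begin(), clones.end());
    assert(firsts.size() == descs.size());
    return firsts;
}
```

With this ordering, the pipeline that triggers the actual shader compilation for a given pair is likely to be finished before any of its clones is submitted, so worker threads spend less time waiting on each other.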
The time needed to create the VkPipelines varies widely. In particular, it depends greatly on the number of hardware threads available. With an AMD Ryzen™ Threadripper™ with 64 hardware threads, it can take only two minutes. But on a low-end PC, it can unfortunately take more than 20 minutes.
That last case is still too long for us. Unfortunately, the only way to improve this time further is to decrease the number of shaders. That would require changing the way we create materials so that they are shared as much as possible. It was not feasible on Detroit: Become Human because artists would have had to rework all the materials. We plan to do proper material instancing in our next game, but it is too late for Detroit: Become Human.