Digital Foundry Article Technical Discussion [2025]

Complex lighting ends up having inconsistent performance because its cost worsens with poor quad utilization. Small triangles are the primary culprit, but you are guaranteed to have some quads with imperfect utilization regardless. A deferred renderer is guaranteed to execute the lighting only once per pixel, regardless of quad coverage/utilization, because the lighting no longer has to sample material textures: the relevant texture data is written out to a G-buffer in a first pass, so mipmap selection (the part that actually needs helper lanes) happens there, and the lighting itself runs as a full-screen pass, either a full-screen quad or a compute shader (I don't fully know the details of that part). That makes lighting perform better regardless of small triangles etc.
A full-screen compute shader dispatch could (depending on the hardware configuration) be faster than drawing a full-screen quad made of 2 triangles, because no additional helper invocations are generated along the quad's interior diagonal edge ...
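
To make that concrete, here's a minimal host-side sketch (D3D12/C++; the 8x8 thread group size, the function names, and the assumption that the PSO, root signature and descriptors are already bound are all mine, not from any particular engine). The dispatch tiles the screen in thread groups directly, so no pixel quads straddle a triangle edge; the common draw-side alternative of one oversized triangle at least removes the interior diagonal of a 2-triangle quad.

Code:
#include <d3d12.h>

// Deferred lighting as a full-screen compute dispatch. Thread groups tile the
// screen, so helper invocations along triangle edges never enter the picture.
void DispatchDeferredLighting(ID3D12GraphicsCommandList* cmd,
                              unsigned width, unsigned height)
{
    const unsigned groupSize = 8;                                    // must match [numthreads(8, 8, 1)] in the shader
    const unsigned groupsX   = (width  + groupSize - 1) / groupSize; // round up so edge pixels are covered
    const unsigned groupsY   = (height + groupSize - 1) / groupSize;
    cmd->Dispatch(groupsX, groupsY, 1);
}

// Pixel-shader alternative: a single oversized triangle covering the screen,
// which avoids the helper invocations generated along a quad's interior diagonal.
void DrawFullScreenTriangle(ID3D12GraphicsCommandList* cmd)
{
    cmd->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    cmd->DrawInstanced(3, 1, 0, 0);   // 3 vertices; positions generated from SV_VertexID in the shader
}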

There are other advantages to running the lighting pass in compute shaders, as is now commonly seen, rather than pixel shaders. If you want to utilize other GPU hardware resources such as groupshared memory, you can't do that from the pixel shading pipeline, since the standardized APIs limit that capability to the compute pipeline (though there are likely console extensions that bypass that specific limitation). Running the lighting pass as a compute pass also means you can use async compute to interleave it concurrently with other types of work, like graphics work such as shadow map generation, or copying resources around on a transfer queue ...
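
As a rough illustration of the async compute point (a sketch only: error handling is omitted and all the names here are invented, not from any specific engine), in D3D12 you would record the lighting dispatch on a dedicated compute queue and gate it on a fence signalled by the graphics queue once the G-buffer work is submitted, so shadow map rendering or copy-queue work can overlap it.

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// A dedicated compute queue, separate from the DIRECT (graphics) queue.
ComPtr<ID3D12CommandQueue> CreateAsyncComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

// The graphics queue is assumed to have signalled gbufferFence with gbufferValue
// after submitting the G-buffer pass; the compute queue waits on that signal
// (a GPU-side wait) and then runs the lighting dispatch concurrently with
// whatever the graphics queue does next (e.g. shadow maps).
void SubmitLightingAsync(ID3D12CommandQueue* computeQueue,
                         ID3D12Fence* gbufferFence, UINT64 gbufferValue,
                         ID3D12CommandList* lightingCmdList)
{
    computeQueue->Wait(gbufferFence, gbufferValue);
    computeQueue->ExecuteCommandLists(1, &lightingCmdList);
}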
 
I watched the UE livestream where they talked about shader compilation stutter and the tools they have to mitigate it. I felt like it was a fairly honest stream, though I don't remember if they addressed any bugs with the tooling that were causing pain points for devs, as @Dictator brought up in the DF discussion. The point in the stream about distributing pre-compiled shaders like Valve does with Steam Deck was pretty interesting.

Is there anything that really prevents devs from creating "ubershaders" with UE5, or is it the way the material/shader editor works that prevents it? I'm assuming artists may author shaders in external tools as well?
 
Is there anything that really prevents devs from creating "ubershaders" with UE5, or is it the way the material/shader editor works that prevents it? I'm assuming artists may author shaders in external tools as well?
Here's a relevant twitter thread with responses from Wihlidal:



Basically your options boil down to ALWAYS running a worst-case (uber) shader with likely very high register pressure (which results in poor HW occupancy), or wishing away the "optimization passes" (inlining, dead code elimination, constant folding/propagation, etc.) of IHV driver compilers, in which case the generated slop will exhibit slow execution times ...

In both cases you're pessimizing performance ...
 
I watched the UE livestream where they talked about shader compilation stutter and the tools they have to mitigate it. I felt like it was a fairly honest stream, though I don't remember if they addressed any bugs with the tooling that were causing pain points for devs, as @Dictator brought up in the DF discussion. The point in the stream about distributing pre-compiled shaders like Valve does with Steam Deck was pretty interesting.

Is there anything that really prevents devs from creating "ubershaders" with UE5, or is it the way the material/shader editor works that prevents it? I'm assuming artists may author shaders in external tools as well?
Yes, you can make ubershaders with the material editor. But there can be performance implications with ubershaders that an artist does not know about. I think things like keeping track of the number of registers used and so on are better left to the engineers. And the shader graph might not be expressive enough to get optimal shader code for ubershaders. Most artists tend to use static switch nodes instead, which gets you an explosion in the number of shaders quickly. The best workflow is for some technical artists or engineers to make a set of carefully curated materials and for artists to only use those.
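
To put rough numbers on the static switch explosion (the multipliers below are made-up illustration values, not UE's actual variant counts):

Code:
#include <cstdio>

int main()
{
    const unsigned staticSwitches = 10;                    // boolean switch nodes in one material graph
    const unsigned switchVariants = 1u << staticSwitches;  // every on/off combination: 2^10 = 1024
    const unsigned engineVariants = 16;                    // hypothetical passes x vertex factories multiplier
    std::printf("%u permutations for a single material\n",
                switchVariants * engineVariants);          // 16384 shaders to compile and capture in PSO caches
    return 0;
}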

But I also have had artists just duplicate materials and change them a little for one-off things, purely out of convenience. For really lame things, like when a texture map is too bright/dark, instead of editing the map in Photoshop they duplicate the shader and add a bunch of math nodes. These are the things that get you uncontrollable shader stutter. Think of it this way: now you have one object with a custom shader sitting in a big level, and that shader is only used for that object, so you need to be lucky that the object gets loaded during QA, or else you will not collect PSOs for it.

The material system in UE is so user friendly that it is both a blessing and a curse now. It gets you really good looking scenes because there is far more material variety. But sometimes you also have to ask if that is needed. Look for instance at KCD2. CryEngine does not have a material graph, so only engineers can make new materials. This limits the number of materials that exist by a lot, and it shows. To me KCD2 looks really last gen because I pick up on this, but the general public does not care and praises it for the good graphics.
 
or wishing away the "optimization passes" (inlining, dead code elimination, constant folding/propagation, etc.) of IHV driver compilers, in which case the generated slop will exhibit slow execution times ...
Can the API just give developers the choice between a proper compilation with all the optimization passes and a fast compilation without them? That way as many shaders as possible can be properly compiled ahead of time, but whenever a shader needs to be compiled in real-time for whatever reason the fast option can be chosen to minimize stutter and CPU cost (and the PSO added to a queue for proper compilation when there's enough CPU headroom to allow it).
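
For what it's worth, Vulkan already exposes something in this spirit as a per-pipeline hint, VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT, although it is only a hint and drivers are free to ignore it. A minimal sketch (error handling omitted; the function and parameter names are mine, and the rest of the pipeline state is assumed to be filled in elsewhere):

Code:
#include <vulkan/vulkan.h>

VkPipeline CreatePipelineFastOrOptimized(VkDevice device,
                                         VkGraphicsPipelineCreateInfo info,  // assumed fully filled in by the caller
                                         bool neededThisFrame)
{
    if (neededThisFrame)
        info.flags |= VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT;  // ask for a quicker, less optimized compile

    VkPipeline pipeline = VK_NULL_HANDLE;
    vkCreateGraphicsPipelines(device, VK_NULL_HANDLE /*no pipeline cache*/,
                              1, &info, nullptr, &pipeline);
    // The fully optimized variant could then be rebuilt on a background thread
    // and swapped in later, matching the queue idea above.
    return pipeline;
}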
 
Can the API just give developers the choice between a proper compilation with all the optimization passes and a fast compilation without them? That way as many shaders as possible can be properly compiled ahead of time, but whenever a shader needs to be compiled in real-time for whatever reason the fast option can be chosen to minimize stutter and CPU cost (and the PSO added to a queue for proper compilation when there's enough CPU headroom to allow it).
Well no IHV out there wants to effectively develop/QA test two independent compiler stacks for their own drivers!

Why else are they converging on notoriously slow common compiler infrastructure like Clang/LLVM for all these past years? Why else is Microsoft interested in integrating SPIR-V with Direct3D? It's because the industry is interested in sharing more work with each other in the open, since it saves them resources, even if it makes the consumer-facing (gamer) experience in question worse!

Also, as a bit of a joke: what's the market potential for a specialized ASIC for faster optimized code generation, so we can charge more to those customers who want to get rid of perceived driver compilation spikes? (One standardized CPU to be able to do everything, and another special CPU/ASIC design for faster optimal code analysis, for anyone who dares.)
 
Well no IHV out there wants to effectively develop/QA test two independent compiler stacks for their own drivers!

Why else are they converging on notoriously slow common compiler infrastructure like Clang/LLVM for all these past years?
It was the industry-standard compilers that made me ask the question - they all have flags for picking an optimization level, with the lower levels resulting in faster compilation but less optimized output. I'm not sure how applicable the comparison is to IHV driver compilers, but I don't think it would require maintaining two entirely separate compilers. On a related note, can some of the optimization passes be moved from the DXIL/SPIR-V -> machine code step on the user's machine to the HLSL/GLSL -> DXIL/SPIR-V step on the developer's machine?
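
For reference, the developer-side HLSL -> DXIL step already has the usual optimization levels; here's a hedged sketch of an offline compile through the DXC API with -O3 requested, so as much optimization as possible happens before the driver ever sees the DXIL (the file name, entry point and target profile are placeholders, and error handling is omitted):

Code:
#include <dxcapi.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Offline HLSL -> DXIL compile at the highest optimization level. The remaining
// DXIL -> machine code step still happens in the IHV driver on the user's machine.
ComPtr<IDxcBlob> CompileLightingShader()
{
    ComPtr<IDxcUtils> utils;
    ComPtr<IDxcCompiler3> compiler;
    DxcCreateInstance(CLSID_DxcUtils, IID_PPV_ARGS(&utils));
    DxcCreateInstance(CLSID_DxcCompiler, IID_PPV_ARGS(&compiler));

    ComPtr<IDxcBlobEncoding> source;
    utils->LoadFile(L"lighting.hlsl", nullptr, &source);   // placeholder file name

    DxcBuffer buffer = {};
    buffer.Ptr      = source->GetBufferPointer();
    buffer.Size     = source->GetBufferSize();
    buffer.Encoding = DXC_CP_UTF8;

    LPCWSTR args[] = { L"-T", L"cs_6_6",   // placeholder target profile
                       L"-E", L"CSMain",   // placeholder entry point
                       L"-O3" };           // full optimization, paid once on the developer's machine
    ComPtr<IDxcResult> result;
    compiler->Compile(&buffer, args, sizeof(args) / sizeof(args[0]),
                      nullptr, IID_PPV_ARGS(&result));

    ComPtr<IDxcBlob> dxil;
    result->GetOutput(DXC_OUT_OBJECT, IID_PPV_ARGS(&dxil), nullptr);
    return dxil;                           // ship this DXIL; the driver still runs its own passes on it
}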
 
@Pjotr id software uses the ubershader approach in the Doom series. My takeaway from what you've written is that they must have engineers or technical artists who create a library of materials that the other artists can use when creating assets for the game. It's an entirely different workflow to ensure performance, but it imposes some limitations on artists, since there's a finite set of materials to work with, or maybe some restrictions on the materials themselves to keep performance from collapsing.
 
It was the industry-standard compilers that made me ask the question - they all have flags for picking an optimization level, with the lower levels resulting in faster compilation but less optimized output. I'm not sure how applicable the comparison is to IHV driver compilers, but I don't think it would require maintaining two entirely separate compilers. On a related note, can some of the optimization passes be moved from the DXIL/SPIR-V -> machine code step on the user's machine to the HLSL/GLSL -> DXIL/SPIR-V step on the developer's machine?
How is a "hardware invariant" compiler optimization model supposed to work in your proposal?

Attempting to move these optimization techniques towards a higher level representation just means that the IHV compiler now has to potentially work even harder to undo these 'supposed' optimizations ...

Take, for instance, the bit manipulation instructions PDEP/PEXT, where AMD architectures prior to Zen 3 had microcoded implementations: how is a compiler supposed to know when to generate those instructions, or to generate alternative binaries for optimal execution times, without that specific knowledge in hand?
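
To illustrate why that knowledge matters, here's a hedged sketch (the "fast PDEP available" flag would have to come from CPUID/microarchitecture detection done elsewhere, which is exactly the information a hardware-invariant offline compile doesn't have): the same operation is either one BMI2 instruction or a scalar loop, and only the target-specific detail of whether PDEP is microcoded tells you which form is the fast one.

Code:
#include <cstdint>
#include <immintrin.h>

// Portable fallback: deposit the low bits of 'src' into the set-bit positions of
// 'mask' (the same operation the PDEP instruction performs).
uint64_t pdep_scalar(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (src & bit)
            result |= mask & (0 - mask);  // lowest set bit still remaining in the mask
        mask &= mask - 1;                 // clear that bit and move on
    }
    return result;
}

uint64_t pdep_dispatch(uint64_t src, uint64_t mask, bool fastPdepAvailable)
{
#if defined(__BMI2__)
    if (fastPdepAvailable)                // e.g. Intel Haswell+ or AMD Zen 3+, detected at runtime elsewhere
        return _pdep_u64(src, mask);      // single instruction where it's fast, microcoded (slow) on older AMD parts
#endif
    return pdep_scalar(src, mask);
}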
 
The reason that Battlefield trailer gave the DF guys BF3 vibes is, I think, because some of the quick shots were from BF3 maps; I'm almost positive I saw Grand Bazaar (day), on the east side of the map with the pedestrian overpass. I got the feeling they might be going all in on that thing they did in the last BF (was it called BF Portal?), where you could select classic maps and apply whatever game rules you felt like.
 