Charlietus
Regular
I would take a performance hit on DirectX 13 if it meant that stutters were reduced. Who cares about max performance, honestly.
Charlietus said:
> I would take a performance hit on DirectX 13 if it meant that stutters were reduced. Who cares about max performance, honestly.

Depends on the hit, but yeah, if it's like 5%, who cares.
A full-screen compute shader dispatch can (depending on the hardware configuration) be faster than drawing a full-screen quad built from two triangles, because no additional helper invocations are generated along the interior diagonal edge of the quad.

Complex lighting ends up with inconsistent performance because it worsens with poor quad utilization. Small triangles are the primary culprit, but you are guaranteed to have some quads with imperfect utilization regardless. Deferred rendering is guaranteed to execute the lighting only once per pixel regardless of quad coverage/utilization, because the lighting pass does not have to sample material textures: the relevant texture data is written out to a G-buffer in a first pass, so mipmap selection is no longer relevant (and there's something about full-screen quads or compute shaders here that I don't fully know). That makes lighting perform better regardless of small triangles, etc.
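The alternative to both the quad and the compute dispatch is the classic "one big triangle" trick: a single oversized triangle covers all of clip space, so there is no interior diagonal edge spawning extra 2x2 helper quads. As a sketch, here is the usual SV_VertexID bit math (normally a one-liner in HLSL) evaluated on the CPU to show the three generated vertices:

```python
# Sketch of the full-screen-triangle vertex trick. The same bit math is
# typically written in an HLSL vertex shader against SV_VertexID; here it
# runs in Python purely to illustrate the generated clip-space positions.

def fullscreen_triangle_vertex(vid):
    u = (vid << 1) & 2          # uv.x in {0, 2}
    v = vid & 2                 # uv.y in {0, 2}
    return (u * 2.0 - 1.0, v * 2.0 - 1.0)   # clip-space xy

verts = [fullscreen_triangle_vertex(i) for i in range(3)]
print(verts)   # [(-1.0, -1.0), (3.0, -1.0), (-1.0, 3.0)]
```

The triangle (-1,-1), (3,-1), (-1,3) fully contains the [-1,1] clip-space square; the parts hanging outside are clipped for free, and rasterization sees one seamless primitive instead of two triangles meeting in the middle of the screen.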
> Depends on the hit, but yeah, if it's like 5%, who cares.

I would say closer to 90%, depending on whether your goal is to pressure every developer into opting for ubershaders, or to get IHVs to pivot toward removing their shader compilers as much as possible ...
Here's a relevant Twitter thread from responses by Wihlidal:

> Is there anything that really prevents devs from creating "ubershaders" with UE5, or is it the way the material/shader editor works that prevents it? I'm assuming artists may author shaders in external tools as well?
> Yes, you can make ubershaders with the material editor. But there could be performance implications with ubershaders that an artist does not know about. I think that things like counting the number of registers used and so on are better left up to the engineers. And the shader graph might not be expressive enough to get optimal shader code for ubershaders. Most artists tend to use static switch nodes, which quickly gets you an explosion in the number of shaders. The best workflow is that some technical artists or engineers make a set of carefully curated materials and that artists only use those.

I watched the UE livestream where they talked about shader compilation stutter and the tools they have to mitigate it. I felt like it was a fairly honest stream, though I don't remember if they addressed any bugs with the tooling that were causing pain points for devs, as @Dictator brought up in the DF discussion. The point in the stream about distributing pre-compiled shaders like Valve does with the Steam Deck was pretty interesting.
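The static-switch explosion Wihlidal mentions is easy to quantify: each independent static switch forks the material into two compiled variants, so n switches mean 2^n pipeline states, while an ubershader folds the same choices into runtime branches in one program. A sketch (the switch names are made up for illustration, not UE's actual node list):

```python
from itertools import product

# Each static switch node compiles to a separate shader variant, so n
# independent switches yield 2**n programs that all need PSO compilation.
# The switch names below are purely illustrative.
switches = ["USE_NORMAL_MAP", "USE_EMISSIVE", "USE_PARALLAX", "USE_FOG"]

variants = list(product([0, 1], repeat=len(switches)))
print(len(variants))   # 16 variants from just 4 switches

# An ubershader replaces the switches with dynamic branches: one compiled
# program, but every invocation pays the branching cost and the register
# pressure of the worst-case path, which is the trade-off an artist
# clicking switch nodes never sees.
```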
> or wish away the "optimization passes" (inlining, dead code elimination, constant folding/propagation, etc.) of IHV driver compilers, in which case the generated slop will exhibit slow execution times ...

Can the API just give developers the choice between a proper compilation with all the optimization passes and a fast compilation without them? That way as many shaders as possible can be properly compiled ahead of time, but whenever a shader needs to be compiled in real time for whatever reason, the fast option can be chosen to minimize stutter and CPU cost (and the PSO added to a queue for proper compilation when there's enough CPU headroom to allow it).
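The scheme being asked about could look something like the sketch below: hand back a quickly-compiled, unoptimized pipeline immediately and queue a fully-optimized recompile in the background. The `compile_fast`/`compile_full` entry points are hypothetical stand-ins; no current D3D/Vulkan API exposes exactly this split:

```python
import queue
import threading

# Two-tier PSO cache sketch: first use gets a fast (unoptimized) compile
# to avoid a hitch; a background thread swaps in the fully optimized
# pipeline later. compile_fast/compile_full are assumed driver hooks.

class PsoCache:
    def __init__(self, compile_fast, compile_full):
        self.compile_fast = compile_fast
        self.compile_full = compile_full
        self.cache = {}
        self.pending = queue.Queue()
        threading.Thread(target=self._optimizer, daemon=True).start()

    def get(self, desc):
        pso = self.cache.get(desc)
        if pso is None:
            pso = self.compile_fast(desc)   # cheap, low-stutter compile
            self.cache[desc] = pso
            self.pending.put(desc)          # schedule the real compile
        return pso

    def _optimizer(self):
        while True:
            desc = self.pending.get()
            self.cache[desc] = self.compile_full(desc)  # swap in when ready
            self.pending.task_done()

cache = PsoCache(lambda d: f"fast:{d}", lambda d: f"full:{d}")
print(cache.get("gbuffer_vs_ps"))   # fast:gbuffer_vs_ps on first use
cache.pending.join()                # wait for the background optimizer
print(cache.get("gbuffer_vs_ps"))   # full:gbuffer_vs_ps afterwards
```

A real driver would additionally have to guarantee that the fast and full compiles are observably identical in results, which is part of why IHVs push back on maintaining both paths.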
> Can the API just give developers the choice between a proper compilation with all the optimization passes and a fast compilation without them? ...

Well, no IHV out there wants to effectively develop and QA-test two independent compiler stacks for their own drivers!
> Well, no IHV out there wants to effectively develop and QA-test two independent compiler stacks for their own drivers!

It was the industry-standard compilers that made me ask the question: they all have flags for picking an optimization level, with the lower levels resulting in faster compilation but less optimized output. I'm not sure how applicable the comparison is to IHV driver compilers, but I don't think it would require maintaining two entirely separate compilers. On a related note, can some of the optimization passes be moved from the DXIL/SPIR-V -> machine code step on the user's machine to the HLSL/GLSL -> DXIL/SPIR-V step on the developer's machine?
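Some passes really are hardware-invariant and already run at the offline HLSL/GLSL -> DXIL/SPIR-V step (DXC and glslang do fold constants, for instance). A toy sketch of such a pass on a made-up tuple-based IR, just to illustrate why it needs no knowledge of the target GPU:

```python
# Toy constant-folding pass over a tiny illustrative IR, showing the kind
# of hardware-invariant optimization that can happen on the developer's
# machine. An expression is a number, a variable name (str), or a tuple
# (op, lhs, rhs) with op in {'+', '*'}. Not any real compiler's API.

def fold(expr):
    """Recursively collapse constant subexpressions."""
    if not isinstance(expr, tuple):
        return expr
    op, a, b = expr
    a, b = fold(a), fold(b)
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b if op == '+' else a * b
    return (op, a, b)

# (x * (2 * 3)) + (4 + 5)  ->  (x * 6) + 9
print(fold(('+', ('*', 'x', ('*', 2, 3)), ('+', 4, 5))))
```

What cannot move offline is anything that depends on the target: register allocation, instruction selection and scheduling, wave-size decisions. That residue is what the driver's DXIL/SPIR-V -> machine code step must still do per GPU, and it is the part the "hardware invariant" objection is about.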
> It was the industry-standard compilers that made me ask the question ... can some of the optimization passes be moved from the DXIL/SPIR-V -> machine code step on the user's machine to the HLSL/GLSL -> DXIL/SPIR-V step on the developer's machine?

Why else have they been converging on notoriously slow common compiler infrastructure like Clang/LLVM all these past years?

How is a "hardware invariant" compiler optimization model supposed to work in your proposal?