How is this making it worse? Separating a different workload type into its own shader type likely makes compilation less complex, i.e. faster.
At the end of the day, a new pipeline is going to introduce more potential for combinatorial explosion ...
With inline RT, you don't have to create any more new pipelines than the ones that already exist ...
It's completely orthogonal. The model itself can be flawed but that doesn't mean that there can't be better or worse practices inside the model itself.
Considering the berating that the new gfx API designs received from the very same people who were proponents of RT PSOs, I imagine a confession that they themselves were in error is in order ...
Why even be apologetic about the idea toward those who considered it anathema in the first place?
How so? Does inlining make compute shaders simpler?
No, inlining just puts more restrictions on how you can do some complex materials (i.e. you cannot). But it's also common practice with DXR 1.0 to reduce the number of hit shaders (materials) to a bare minimum, so is there any evidence that inlined compute ubershaders would do any better with compilation?
inlining -> fewer pipelines to create -> fewer pipelines to compile
Inlining doesn't necessarily impose restrictions on complex materials either. You can create an "uber-material" which can consist of all your material models ...
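To make the "uber-material" idea concrete, here is a minimal sketch, written in Python rather than a shading language purely for illustration. The `Hit` record, the three material IDs and the shading formulas are all hypothetical placeholders, not anything taken from a real engine or API. The point is only the shape: one entry point that branches on a material ID instead of one hit shader per material.

```python
from dataclasses import dataclass
from typing import Tuple

Color = Tuple[float, float, float]

# Hypothetical material IDs; a real engine would have its own set.
LAMBERT, EMISSIVE, TINTED_MIRROR = 0, 1, 2


@dataclass
class Hit:
    material_id: int
    base_color: Color
    n_dot_l: float           # cosine between surface normal and light direction
    reflected_color: Color   # stand-in for whatever a reflection ray returned


def shade_uber(hit: Hit) -> Color:
    """Single 'uber-material': every material model lives behind one branch."""
    if hit.material_id == LAMBERT:
        k = max(hit.n_dot_l, 0.0)
        return tuple(c * k for c in hit.base_color)
    if hit.material_id == EMISSIVE:
        return hit.base_color
    if hit.material_id == TINTED_MIRROR:
        return tuple(c * r for c, r in zip(hit.base_color, hit.reflected_color))
    raise ValueError(f"unknown material id {hit.material_id}")


if __name__ == "__main__":
    hit = Hit(LAMBERT, (1.0, 0.5, 0.0), 0.5, (0.0, 0.0, 0.0))
    print(shade_uber(hit))
```

The trade-off is the one argued over in this thread: there is only one shader to compile, but every material model has to be compiled into it, and lanes that hit different materials take different branches.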
Also, reducing the number of hit shaders goes against Nvidia's recommendations. They recommend using a different material model per hit shader. We've also seen applications like Quake 2 RTX which straight up creates a new PSO per bounce! I can already see it now: a new PSO per material model, per bounce, per different ray generation shader, and more ...
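The worst case described above can be put into numbers with a back-of-the-envelope sketch. It assumes, as the post does, that every combination of material model, bounce and ray generation shader gets its own PSO; the counts used are made-up inputs, not measurements from any game.

```python
from itertools import product


def rt_pso_permutations(material_models, bounces, raygen_shaders):
    """Enumerate every (material, bounce, raygen) combination.

    Assumption: one PSO per combination, i.e. the worst case from the post.
    """
    return list(product(material_models, range(bounces), raygen_shaders))


def inline_pipeline_count(existing_pipelines: int) -> int:
    """With inline RT the claim is that no new pipelines are added."""
    return existing_pipelines


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show how quickly the product grows.
    materials = [f"material_{i}" for i in range(10)]
    raygens = ["reflections", "shadows", "gi", "ao"]
    psos = rt_pso_permutations(materials, bounces=3, raygen_shaders=raygens)
    print("RT PSOs in the worst case:", len(psos))
    print("Pipelines with inline RT:", inline_pipeline_count(existing_pipelines=4))
```

Real engines can share or collapse many of these combinations, so treat the product as an upper bound on the scenario being described rather than a prediction.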
Inline is also the least efficient option when actually running non-trivial RT shaders. PSOs don’t inherently cause stutter. Lack of precompilation does.
The argument for inline is fundamentally flawed due to the thread divergence inherent to ray casting. Ironically, Intel probably handles inline divergence better than Nvidia and AMD, as their wavefront size is 8 vs 32 on the other guys. Yet they’re still spending transistors on thread sorting.
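The wavefront-size point can be illustrated with a toy model. Assume each lane in a wave independently hits one of `num_materials` materials with equal probability, and the wave has to run one serialized pass per distinct material present. Both assumptions are simplifications made for this sketch; real scenes are spatially coherent and real hardware may sort or reorder threads, which this ignores entirely.

```python
def expected_distinct_materials(num_materials: int, wave_size: int) -> float:
    """Expected number of distinct materials hit within one wave.

    Toy model: each lane picks a material uniformly and independently.
    A given material is missed by every lane with probability (1 - 1/M)^W,
    so the expected number of materials present is M * (1 - (1 - 1/M)^W).
    """
    if num_materials < 1 or wave_size < 1:
        raise ValueError("num_materials and wave_size must be >= 1")
    p_unused = (1.0 - 1.0 / num_materials) ** wave_size
    return num_materials * (1.0 - p_unused)


if __name__ == "__main__":
    # 16 materials is an arbitrary illustrative count.
    for wave_size in (8, 32):
        passes = expected_distinct_materials(16, wave_size)
        print(f"wave size {wave_size:2d}: ~{passes:.2f} serialized passes per wave")
```

Under these assumptions a narrower wave sees fewer distinct materials per wave and so serializes less, which is the intuition behind the 8 vs 32 comparison; it says nothing about total throughput.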
True precompilation will likely not be an option on PC for years to come, so compilation stutters are inevitably going to be more frequent and more severe with RT PSOs. End users might as well get a head start in embracing this future ...