Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

It's not like 1.1 solves the issue...
DXR 1.1 doesn't get rid of graphics or compute PSOs, but it doesn't introduce more PSO types such as the RT PSOs in DXR 1.0, so at least it isn't making compilation worse ...

Combining inline RT with compute pipelines may be the most powerful option in terms of minimizing compilation ...

If anyone advocates for DXR 1.0/RT PSOs while opposing the PSO model, they forfeit any claim to a principled argument. They should just flat out admit that the PSO model was the correct abstraction for the hardware all along, and drop any grievances they've had with it in spite of its flaws ...
 
DXR 1.1 doesn't get rid of graphics or compute PSOs, but it doesn't introduce more PSO types such as the RT PSOs in DXR 1.0, so at least it isn't making compilation worse ...
How is this making it worse? Separating a different workload type into its own shader type likely makes compilation less complex, i.e. faster.

Combining inline RT with compute pipelines may be the most powerful option in terms of minimizing compilation ...
Why?

If anyone advocates for DXR 1.0/RT PSOs while opposing the PSO model, they forfeit any claim to a principled argument. They should just flat out admit that the PSO model was the correct abstraction for the hardware all along, and drop any grievances they've had with it in spite of its flaws ...
It's completely orthogonal. The model itself can be flawed, but that doesn't mean there can't be better or worse practices within it.
 
NVIDIA hardware doesn't care whether the code is DXR 1.0, DXR 1.1, or Vulkan; it remains significantly faster than AMD in all of those cases according to already released games and synthetic benchmarks, so I'm not sure about the importance of that point.
DXR 1.1 doesn't get rid of graphics or compute PSOs, but it doesn't introduce more PSO types such as the RT PSOs in DXR 1.0, so at least it isn't making compilation worse ...
Most advanced RT games still use DXR 1.0 and manage to remain stutter-free: Metro Exodus, Minecraft RTX, Control, Cyberpunk, Dying Light 2, etc.
 
It's a bit ironic to see people advocating for DXR 1.0, since the API uses the EXACT same PSO compilation model as the regular graphics pipeline, which is the source of these compilation stutters in games and which many consider a failure of the model itself. If end users and developers want more compilation stutter from the combinatorial explosion of different PSOs, then I guess that's the future they deserve ...

DXR 1.0 is known as RTPSO for a reason ...
It's not a failure of the model itself. It's a failure of developers to do the work the model requires of them... They've relied on drivers doing the heavy lifting for so long that it's inevitable some developers won't take to it straight away. This 'period' of stutters in games we're going through is birthing pains we were going to have to endure at some point anyway. Better to get it over with now.
 
Combining inline RT with compute pipelines may be the most powerful option in terms of minimizing compilation ...
How so? Does inlining make compute shaders simpler?
No, inlining just puts more restrictions on how you can do some complex materials (i.e. you can not), but it's also common practice with DXR 1.0 to reduce the number of hit shaders (materials) to a bare minimum, so is there any evidence that inlined compute ubershaders would do any better with compilation?
 
Combining inline RT with compute pipelines may be the most powerful option in terms of minimizing compilation ...

Inline is also the least efficient option when actually running non-trivial RT shaders. PSOs don’t inherently cause stutter. Lack of precompilation does.

The argument for inline is fundamentally flawed due to the thread divergence inherent to ray casting. Ironically, Intel probably handles inline divergence better than Nvidia and AMD, since its wavefront size is 8 versus 32 for the other two. Yet they're still spending transistors on thread sorting.
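A back-of-the-envelope way to see the wavefront-size point: if each lane in a wave independently hits one of k equally likely materials (a pessimistic, uniform-random assumption; real hit streams are usually more coherent), the expected number of distinct material branches the wave executes is k(1 - (1 - 1/k)^N) for N lanes, and a diverged ubershader roughly pays for every distinct branch taken. A minimal C++ sketch:

```cpp
#include <cmath>
#include <cstdio>

// Expected distinct materials hit by a wave of 'lanes' threads when each
// lane independently picks one of 'materials' equally likely materials.
// (Uniform-random assumption; real scenes are usually more coherent.)
double expectedBranches(int lanes, int materials) {
    return materials * (1.0 - std::pow(1.0 - 1.0 / materials, lanes));
}

int main() {
    const int laneCounts[] = {8, 32}; // Intel SIMD8 vs the 32-wide waves elsewhere
    for (int lanes : laneCounts)
        std::printf("lanes=%2d, 16 materials -> ~%.1f branches executed\n",
                    lanes, expectedBranches(lanes, 16));
    // Prints roughly 6.5 for 8 lanes and 14.0 for 32 lanes: the narrower
    // wave executes fewer divergent paths per wave, as the post argues.
}
```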
 
How is this making it worse? Separating a different workload type into its own shader type likely makes compilation less complex, i.e. faster.
At the end of the day, a new pipeline type is going to introduce more potential for combinatorial explosion ...
With inline RT, you don't have to create any new pipelines beyond the ones that already exist ...
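For anyone unfamiliar with the host-side difference: DXR 1.0 adds a whole new state-object type that must be built and compiled on top of the existing PSOs, while DXR 1.1 inline RT runs inside whatever compute PSO you already have. A stripped-down C++/D3D12 sketch (not a complete RTPSO; a real one also needs hit groups and a global root signature, and `device`/`dxilLibrary` are assumed to exist):

```cpp
#include <d3d12.h>
#include <wrl/client.h>

// DXR 1.0: one extra pipeline kind to create and compile.
Microsoft::WRL::ComPtr<ID3D12StateObject> MakeRtpso(
    ID3D12Device5* device, D3D12_SHADER_BYTECODE dxilLibrary)
{
    D3D12_DXIL_LIBRARY_DESC lib = {};
    lib.DXILLibrary = dxilLibrary; // compiled DXIL with raygen/miss/hit exports

    D3D12_RAYTRACING_SHADER_CONFIG shaderConfig = {};
    shaderConfig.MaxPayloadSizeInBytes   = 16;
    shaderConfig.MaxAttributeSizeInBytes = 8;

    D3D12_RAYTRACING_PIPELINE_CONFIG pipelineConfig = {};
    pipelineConfig.MaxTraceRecursionDepth = 1;

    D3D12_STATE_SUBOBJECT subobjects[] = {
        { D3D12_STATE_SUBOBJECT_TYPE_DXIL_LIBRARY,               &lib },
        { D3D12_STATE_SUBOBJECT_TYPE_RAYTRACING_SHADER_CONFIG,   &shaderConfig },
        { D3D12_STATE_SUBOBJECT_TYPE_RAYTRACING_PIPELINE_CONFIG, &pipelineConfig },
    };

    D3D12_STATE_OBJECT_DESC desc = {};
    desc.Type          = D3D12_STATE_OBJECT_TYPE_RAYTRACING_PIPELINE;
    desc.NumSubobjects = sizeof(subobjects) / sizeof(subobjects[0]);
    desc.pSubobjects   = subobjects;

    Microsoft::WRL::ComPtr<ID3D12StateObject> rtpso;
    device->CreateStateObject(&desc, IID_PPV_ARGS(&rtpso));
    return rtpso;
}

// DXR 1.1 inline RT: the RayQuery lives inside an ordinary compute shader,
// so the only pipeline involved is the compute PSO you were creating anyway:
//   device->CreateComputePipelineState(&computeDesc, IID_PPV_ARGS(&pso));
```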
It's completely orthogonal. The model itself can be flawed, but that doesn't mean there can't be better or worse practices within it.
Considering the berating that the new gfx API designs received from the very same people who were proponents of RT PSOs, I imagine a confession that they themselves were in error is in order ...

Why even be apologetic about the idea to those who considered it anathema in the first place?

How so? Does inlining make compute shaders simpler?
No, inlining just puts more restrictions on how you can do some complex materials (i.e. you can not), but it's also common practice with DXR 1.0 to reduce the number of hit shaders (materials) to a bare minimum, so is there any evidence that inlined compute ubershaders would do any better with compilation?
inlining -> fewer pipelines to create -> fewer pipelines to compile

Inlining doesn't necessarily impose restrictions on complex materials either. You can create an "uber-material" which consists of all your material models ...

Also, reducing the number of hit shaders goes against Nvidia's recommendations. They recommend using a different material model per hit shader. We've also seen applications like Quake 2 RTX which straight up create a new PSO per bounce! I can already see it now: a new PSO per material model, per bounce, per ray generation shader, and so on ...
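To make the combinatorial point concrete, a trivial C++ sketch with made-up counts (purely illustrative, not from any real engine):

```cpp
// Hypothetical numbers: if the RT pipeline is specialized per material model,
// per bounce depth, and per ray generation shader, the number of pipelines
// to compile is the product of the counts, not the sum.
const int materialModels = 12;
const int bounceDepths   = 4;
const int raygenShaders  = 3;
const int rtPipelines = materialModels * bounceDepths * raygenShaders; // 144 PSOs
```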

Inline is also the least efficient option when actually running non-trivial RT shaders. PSOs don’t inherently cause stutter. Lack of precompilation does.

The argument for inline is fundamentally flawed due to the thread divergence inherent to ray casting. Ironically, Intel probably handles inline divergence better than Nvidia and AMD, since its wavefront size is 8 versus 32 for the other two. Yet they're still spending transistors on thread sorting.
True precompilation likely won't be an option on PC for years to come, so compilation stutters are inevitably going to be more frequent and more severe with RT PSOs; end users might as well get a head start on embracing this future ...
 
True precompilation likely won't be an option on PC for years to come, so compilation stutters are inevitably going to be more frequent and more severe with RT PSOs; end users might as well get a head start on embracing this future ...

Unfortunately for that prediction, it seems there are solutions to the problem that work out pretty well. Only in your dreams will things get worse ;)
 
The argument for inline is fundamentally flawed due to the thread divergence inherent to ray casting.

It's not an insurmountable problem; developers can add code that sorts rays by material mid-trace. In fact, that's what Epic and NVIDIA did when they added RTX support to Fortnite.
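A minimal C++ sketch of that mid-trace sorting idea (the `HitRecord` layout and names are made up for illustration, not Epic's or NVIDIA's actual code): after tracing a batch of rays, group the hits by material so that neighboring threads shade the same branch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One traced hit waiting to be shaded.
struct HitRecord {
    uint32_t materialId; // which shading path this hit will take
    uint32_t rayIndex;   // where the shaded result is written back
    float    t;          // hit distance along the ray
};

// Sort hits so lanes that end up side by side in a wave run the same
// material branch, restoring coherence before the divergent shading pass.
void SortHitsByMaterial(std::vector<HitRecord>& hits) {
    std::sort(hits.begin(), hits.end(),
              [](const HitRecord& a, const HitRecord& b) {
                  return a.materialId < b.materialId;
              });
}
```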
 
It's not an insurmountable problem; developers can add code that sorts rays by material mid-trace. In fact, that's what Epic and NVIDIA did when they added RTX support to Fortnite.

They can, but unfortunately they most likely won't. We've seen it over and over with these lower-level APIs: the promised revolution in developer innovation is happening in small pockets, but it's far from universal.

Sorting is only going to get harder as the number of bounces and materials increases. I have little faith developers will keep up.
 
Inlining doesn't necessarily impose restrictions on complex materials either. You can create an "uber-material" which consists of all your material models ...
"Uber-materials" cause low occupancy and divergent shading; that's why inlining imposes restrictions in practice.

inlining -> fewer pipelines to create -> fewer pipelines to compile
RT shaders still have to be compiled whether they are compute or RT, so both approaches would cause more PSOs to compile. Compiling "uber-materials" doesn't sound great either.

They recommend using a different material model per hit shader.
This guide suggests using DXR 1.0 style shaders instead of DXR 1.1 ubershaders.
It also suggests considering simplified shading and a unified shader for materials, which would result in fewer hit shaders.
 
They can, but unfortunately they most likely won't. We've seen it over and over with these lower-level APIs: the promised revolution in developer innovation is happening in small pockets, but it's far from universal.

Sorting is only going to get harder as the number of bounces and materials increases. I have little faith developers will keep up.

Will the rise of the physically based uber shader help? I don't mean the Dolphin/CS:GO uber shader that combines all the old shaders into one big branching shader, but the new one from the offline rendering world, where they just create one shader to rule them all based on physically based rendering. It's a bit heavy, but it's a single shader.
 
Will the rise of the physically based uber shader help? I don't mean the Dolphin/CS:GO uber shader that combines all the old shaders into one big branching shader, but the new one from the offline rendering world, where they just create one shader to rule them all based on physically based rendering. It's a bit heavy, but it's a single shader.

Theoretically yes, but it seems impossible. The code to render subsurface scattering in skin must be different from the code that lights a metal pipe, right? So there must be branching at some point.
 
If most developers start agreeing on a single shader and just adjust weights for artistic expression (and get rid of a lot of hacks which required special shaders; no more decals, for instance), then the engine designers, combined with the IHVs, can gift them proper shaders that re-sort for branch coherency.

If a developer is then determined to do something special, he'll have to compete with all the games which don't hand the hardware shit code.
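A sketch of what that single weighted shader could look like (the `Material` struct and lobe names are invented here; the pattern is the "weights instead of branches" idea from offline principled shaders, not any particular engine's code):

```cpp
// Every surface runs the same straight-line code: skin sets subsurfaceWeight
// high, a metal pipe sets metalWeight high. No divergent branches, just
// multiplies by zero for the lobes a given material doesn't use.
struct Material {
    float diffuseWeight;
    float specularWeight;
    float subsurfaceWeight;
    float metalWeight;
};

// 'diffuse', 'specular', etc. stand in for the evaluated lobe responses.
float Shade(const Material& m, float diffuse, float specular,
            float subsurface, float metal) {
    return m.diffuseWeight    * diffuse
         + m.specularWeight   * specular
         + m.subsurfaceWeight * subsurface
         + m.metalWeight      * metal;
}
```

The trade-off is that every lobe gets evaluated for every surface, which is the "a bit heavy" part noted above, in exchange for branch-free, coherent execution.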
 
So Intel is admitting the A770 is going to be equal to a 3060 Ti in rasterization, but only faster than a 3060 in ray tracing.

 