Intel Xe Ray Tracing

Jawed

I don't know how close to the hardware this presentation gets us:


The manually implemented stack for the shader-call hierarchy looks like a pain in the arse. But that's Intel's problem, and because of the complexity of ray tracing's function hierarchy, this functional "Continuation-Passing Style" approach may be unavoidable. (I say this having read some AMD patents on the subject of the function hierarchy for ray tracing.)
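To make the "manually implemented stack" idea concrete, here's a rough sketch of continuation-passing style in Python. It's my own illustration of the general technique, not Intel's implementation, and every name in it (run, ray_gen, closest_hit, callable_shade, the "ret" slot) is hypothetical. Each shader is split at its call site into a "resume" continuation, and a loop pushes and pops (continuation, live state) pairs on an explicit stack instead of relying on a hardware call stack.

```python
# Hypothetical sketch of continuation-passing style with a manually managed
# call stack. All names are illustrative; nothing here is Intel's actual ABI.

from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Shader = Callable[[State], tuple]


def run(entry: Shader, args: State) -> Tuple[object, int]:
    """Drive shaders until the entry shader returns; report max stack depth."""
    stack: List[Tuple[Shader, State]] = []  # (continuation, spilled live state)
    current, state = entry, args
    max_depth = 0
    while True:
        result = current(state)
        if result[0] == "call":
            # A call never nests on the host stack: the caller's remaining work
            # (its continuation) and its live values are pushed explicitly.
            _, callee, callee_args, continuation, saved = result
            stack.append((continuation, saved))
            max_depth = max(max_depth, len(stack))
            current, state = callee, callee_args
        else:
            value = result[1]
            if not stack:
                return value, max_depth
            # "Return" = pop the caller's continuation and resume it with the result.
            continuation, saved = stack.pop()
            current, state = continuation, {**saved, "ret": value}


# Example call chain: ray-gen -> closest-hit -> callable shader.
def callable_shade(state: State) -> tuple:
    return ("return", state["albedo"] * 0.5)  # leaf: makes no further calls


def closest_hit(state: State) -> tuple:
    # Only "t" is live after the call, so only "t" is saved on the stack.
    return ("call", callable_shade, {"albedo": state["albedo"]},
            closest_hit_resume, {"t": state["t"]})


def closest_hit_resume(state: State) -> tuple:
    return ("return", state["ret"] / state["t"])


def ray_gen(state: State) -> tuple:
    return ("call", closest_hit, {"albedo": 8.0, "t": 2.0},
            ray_gen_resume, {"pixel": state["pixel"]})


def ray_gen_resume(state: State) -> tuple:
    return ("return", (state["pixel"], state["ret"]))


print(run(ray_gen, {"pixel": 7}))  # ((7, 2.0), 2)
```

The pain point shows up in the "saved" dicts: the compiler (or programmer) has to work out exactly which values are live across each call and spill them, and the stack depth has to be budgeted somewhere.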

For example, one of the parameters of a shader is the SIMD width. Since Intel builds a massive pipeline of callable shaders (referred to as a "mega-pipeline"), it can compile variants of the same source shader code with different SIMD widths.

Intel's ray trace acceleration hardware does full traversal.

BVH building is hard: a massive mess of functions...

Note: I know practically nothing about Vulkan, so I can't tell whether many of the concepts here are Vulkan-specific or Intel-specific. Also note that I'm assuming this is Xe.
 
Aha, so we have shaders callable from shaders now.
The most important missing feature from gfx APIs.
But it's limited to... only raytracing?
And thus, together with conditional draws not including barriers, another missed opportunity for serious progress on the API.
Meh.
 
Progress will happen only after all the vendors support at least what we have now, not before.
 
OpExecuteCallableKHR is generic shader call support aimed specifically at ray tracing.

Vulkan® 1.2.192 - A Specification (with all registered Vulkan extensions) (khronos.org)

I think it's interesting that the name contains no reference to ray tracing concepts, so it may be primed for wider use.

At the same time, outside of the scope of ray tracing, what semantics would be available? It seems to me that compute shaders are the only possible use and would require some kind of variable argument list support.

The call stack in generic compute is "unbounded" (e.g. recursion), so managing that stack becomes a programmer-exposed problem. We can see from the Vulkan specification:

Vulkan® 1.2.192 - A Specification (with all registered Vulkan extensions) (khronos.org)

that the programmer has the option to apply their knowledge of the worst-case stack size. If the driver uses shared memory to maintain the stack, then there's a hard limit to stack depth. If the stack has to be maintained in VRAM then performance disappears, I guess.
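For reference, here's how I read the default stack-size rule in the Vulkan spec (VK_KHR_ray_tracing_pipeline): when the application doesn't call vkCmdSetRayTracingPipelineStackSizeKHR, the driver sizes the stack from the per-stage maxima (queried per shader group with vkGetRayTracingShaderGroupStackSizeKHR) and the pipeline's maxPipelineRayRecursionDepth. The Python function below is just a transcription of that formula as a sketch; the function name and the byte values in the example are made up.

```python
# Sketch of the default ray tracing pipeline stack size as described in the
# Vulkan spec, used when the app doesn't set one explicitly. Inputs are the
# per-stage maximum stack sizes across all shader groups (values hypothetical).

def default_pipeline_stack_size(*, ray_gen: int, closest_hit: int, miss: int,
                                intersection: int, any_hit: int,
                                callable_: int, max_recursion_depth: int) -> int:
    # First trace level: closest hit, miss, or intersection + any hit may run.
    first_level = min(1, max_recursion_depth) * max(
        closest_hit, miss, intersection + any_hit)
    # Each further recursion level adds a closest-hit or miss frame
    # (the hit-group stages that are allowed to trace rays).
    deeper_levels = max(0, max_recursion_depth - 1) * max(closest_hit, miss)
    # The default reserves room for two callable-shader frames.
    return ray_gen + first_level + deeper_levels + 2 * callable_


sizes = dict(ray_gen=64, closest_hit=128, miss=32, intersection=16,
             any_hit=48, callable_=96)
print(default_pipeline_stack_size(**sizes, max_recursion_depth=3))  # 640
print(default_pipeline_stack_size(**sizes, max_recursion_depth=1))  # 384
```

So if the pipeline declares a recursion depth of 3 but the application knows it never actually recurses past the first trace, it can pass the smaller figure to vkCmdSetRayTracingPipelineStackSizeKHR. That's the "knowledge of the worst-case stack size" in practice.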

Now, here's an idea for a crazy programmer: hack ray tracing to provide generic callable-shader support in a non-ray-tracing scenario. For example, create a one-node BVH (or maybe it would have to be a bit bigger?) and use a ray-generation shader whose rays always intersect that node. Then set up your callable shaders in a pipeline where they are all available to the intersection shader. Would that work? Could you get arbitrary callable shaders out of that?

It seems you won't be able to use shared memory in those shaders, though.
 
Progress will happen only after all the vendors support at least what we have now, not before.
I don't expect too much. I would already be happy if we could skip over sections of a command buffer. Conditional draws can, but the sections cannot contain memory barriers. So we can't do any coarse control flow for compute dispatches.
It's hard to believe any modern GPU couldn't support this feature. It looks more like they just forgot to think about compute and the importance of memory barriers with conditional draws, hence my rant.

Calling shaders from shaders is another topic, and a much more difficult one. Mesh amplification shaders look promising here. It would be nice to bring a similar concept to compute too, even if limited to, for example, a SIMD-wide workgroup size.
I'd guess this is also possible now on any GPU supporting mesh shaders.
 