Acceleration structures
This is the first of the two major sections in the post. It focuses on building and managing ray-tracing acceleration structures, which is the natural starting point for using ray tracing for any purpose.
- General tips
- Maximizing GPU utilization when building
- Memory allocations
- Organizing geometries into BLASes
- Build preference flags
- Dynamic BLASes
- Non-opaque geometries
- Particles
General tips
Consider async compute for AS building. Especially in hybrid rendering, where the G-buffer or shadow maps are rasterized, it’s potentially beneficial to execute AS building on async compute.
Consider worker threads for generating AS building command lists. Generating AS building commands can include a considerable amount of CPU-side work, like the culling of objects. Moving it to one or more worker threads is potentially beneficial.
Cull instances for TLAS. Typically, including the entire scene in the TLAS is not optimal. Instead, cull instances depending on the situation. For example, consider culling based on an expanded camera frustum. Maximum distance can often be less than the far plane distance in rasterization. You can also consider instance size when culling so that smaller instances are culled at a shorter distance.
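The size-dependent distance culling described above can be sketched on the CPU side as follows. This is a minimal illustration, not a definitive implementation: the `Instance` and `CullParams` types, the `keepInTlas` and `selectTlasInstances` functions, and the radius-to-distance ratio heuristic are all hypothetical names and choices made for this example. It also swaps the expanded-frustum test for a plain distance test against bounding spheres to stay short; a real renderer would likely combine both.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instance record: a world-space bounding sphere.
struct Instance {
    float cx, cy, cz;  // bounding-sphere center
    float radius;      // bounding-sphere radius
};

// Hypothetical culling parameters. The values are tuning knobs, not recommendations.
struct CullParams {
    float camX, camY, camZ;  // camera position
    float maxDistance;       // instances farther away than this are always culled
    float minSizeRatio;      // minimum radius / distance ratio required to keep an instance
};

// Returns true when the instance should be included in the TLAS.
bool keepInTlas(const Instance& inst, const CullParams& p) {
    assert(inst.radius >= 0.0f && p.maxDistance > 0.0f);
    const float dx = inst.cx - p.camX;
    const float dy = inst.cy - p.camY;
    const float dz = inst.cz - p.camZ;
    const float centerDist = std::sqrt(dx * dx + dy * dy + dz * dz);

    // Distance from the camera to the nearest point of the bounding sphere.
    const float nearest = std::max(0.0f, centerDist - inst.radius);

    if (nearest > p.maxDistance) return false;  // beyond the ray-tracing range
    if (nearest == 0.0f) return true;           // camera is inside the bounds

    // Smaller instances fail this test at a shorter distance than larger ones.
    return inst.radius / nearest >= p.minSizeRatio;
}

// Collects the indices of the instances that survive culling.
std::vector<std::size_t> selectTlasInstances(const std::vector<Instance>& instances,
                                             const CullParams& p) {
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < instances.size(); ++i) {
        if (keepInTlas(instances[i], p)) kept.push_back(i);
    }
    return kept;
}
```

The surviving indices would then drive which instance descriptors are written for the TLAS build. Because the ratio test scales with the radius, a large building stays in the TLAS much farther out than a small prop does.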
Use appropriate LOD for instances. Like in rasterization, using the most detailed geometry LOD for everything is typically suboptimal. LODs used for faraway objects can be simpler. In hybrid rendering, consider using the same LOD for rasterization and ray tracing. It’s an efficient way to avoid self-intersection artifacts, such as a surface shadowing itself. Also consider using lower-detail LODs in ray tracing, especially to reduce the update cost of dynamic BLASes. If the LODs between rasterization and ray tracing don’t match, enabling back-face culling in ray tracing is often needed to prevent self-intersections. For more discussion about LODs in ray tracing, and an explanation of how to implement stochastic LODs, see Implementing Stochastic Levels of Detail with Microsoft DirectX Raytracing.
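As a rough sketch of the LOD trade-off above, the following helper derives a ray-tracing LOD from the rasterization LOD and reports when the two differ, which is the case where back-face culling is often needed. The `RayTracingLod` type, the `chooseRayTracingLod` function, and the idea of a fixed integer `lodBias` are assumptions made for illustration, with LOD index 0 taken to be the most detailed level.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type for this sketch.
struct RayTracingLod {
    int lodIndex;               // LOD to use for the BLAS (0 = most detailed, by assumption)
    bool needsBackFaceCulling;  // true when the LOD differs from the rasterized one
};

// Picks the ray-tracing LOD as the rasterization LOD plus a bias, clamped to
// the available range. A bias of 0 keeps both LODs identical.
RayTracingLod chooseRayTracingLod(int rasterLod, int lodBias, int lodCount) {
    assert(lodCount > 0 && rasterLod >= 0 && rasterLod < lodCount);
    const int rtLod = std::min(std::max(rasterLod + lodBias, 0), lodCount - 1);

    // Mismatched LODs can self-intersect, so flag them for back-face culling.
    return RayTracingLod{rtLod, rtLod != rasterLod};
}
```

For example, `chooseRayTracingLod(1, 2, 4)` selects LOD 3 and sets the culling flag, while a bias of 0 returns the rasterization LOD with the flag cleared.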
Flag geometries or instances opaque whenever possible. Flagging instances or geometries as opaque allows uninterrupted hardware intersection search and prevents invocation of the any-hit shader. Do this whenever possible. Enable the use of any-hit shaders only for those geometries that need it; for example, to do alpha testing.
Use triangle geometries when possible. Hardware excels in performing ray-triangle intersections. Ray-box intersections are accelerated too, but you get the most out of the hardware when tracing against triangle geometries.
...
Hit shading
This section of the post focuses on the shading of ray hits. Even seasoned graphics developers may benefit from fresh ideas when they start developing ray-tracing shaders, as the optimal solutions may be different from what they are in rasterization.
- General tips
- Minimizing divergence
- Any-hit shader
- Shader resource binding
- Inline ray tracing (DXR 1.1)
- Pipeline states
General tips
Keep the ray payload small. Registers are used to hold payload values and they reduce the number of registers otherwise available to hit shaders. I recommend avoiding careless payload usage, though adding complex code to pack values is rarely beneficial.
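To make the payload advice concrete, here is a CPU-side C++ analogy of trimming a payload without elaborate packing. The `FatPayload` and `LeanPayload` layouts and the helper functions are hypothetical, and the byte sizes only approximate register pressure on a GPU. The point of the sketch is the approach: drop fields that the ray generation shader can derive or doesn't need, and fold a flag into an existing value, rather than bit-packing everything.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "careless" payload: carries data the caller may not need.
struct FatPayload {
    float color[3];
    float normal[3];
    float hitT;
    std::uint32_t hit;    // separate hit/miss flag
    std::uint32_t depth;  // recursion depth, which the caller could track itself
};

// Hypothetical trimmed payload: the hit flag is folded into hitT
// (a negative hitT means a miss), and unused fields are dropped.
struct LeanPayload {
    float color[3];
    float hitT;
};

static_assert(sizeof(LeanPayload) < sizeof(FatPayload),
              "the trimmed payload should be smaller");

// Builds a miss payload; the color is whatever the miss shader would return.
inline LeanPayload makeMiss(float r, float g, float b) {
    return LeanPayload{{r, g, b}, -1.0f};
}

// Builds a hit payload; hit distances are assumed to be non-negative.
inline LeanPayload makeHit(float r, float g, float b, float hitT) {
    assert(hitT >= 0.0f);
    return LeanPayload{{r, g, b}, hitT};
}

// Decodes the folded hit flag.
inline bool isHit(const LeanPayload& p) { return p.hitT >= 0.0f; }
```

Folding the flag into the sign of `hitT` costs one comparison to decode, which is in the spirit of avoiding complex packing code.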
Consider writing a safe default value to unused payload fields. When a shader doesn’t use all the payload fields that other shaders require, it can still be beneficial to write a safe default value to the unused fields. This allows the compiler to discard the unused input value and use the payload register for other purposes before writing to it.
Terminate rays on the first hit when possible. When resolving the correct closest hit is not required, as is typical for shadow rays, flagging rays with RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH (HLSL) or gl_RayFlagsTerminateOnFirstHitNV (GLSL) is a simple and efficient optimization.
Use face culling only when required for correctness. Unlike in rasterization, enabling back-face or front-face culling does not improve performance. Instead, it slightly slows down ray traversal. Use it only when it is required to get the correct rendering result.
Minimize live state across ray-trace calls. Variables initialized before a TraceRay or traceNV call and used after it are live state that must be preserved across the call while hit and miss shaders are invoked. The driver has a few different options for preserving it, but they all have a cost. I recommend trying to minimize the amount of live state. Identifying such variables is not always trivial. NVIDIA and Microsoft are working together on a compiler feature for automatic detection of live state.
Avoid deep recursion. Deep, non-uniform ray recursion can get expensive.