Jawed
Legend
For people who's interested:
https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf
Annoyingly, page 222 (230 of the PDF) has an incomplete sentence/paragraph:
"The image_bvh_intersect_ray and image_bvh64_intersect_ray opcode do not support all of"
I'm going to guess this is merely a reference to "texture" functionality that is not supported such as sampler mode, see the section "Restrictions on image_bvh instructions" on page 81 (89 of the PDF):
- DMASK must be set to 0xf (instruction returns all four DWORDs)
- D16 must be set to 0 (16 bit return data is not supported)
- R128 must be set to 1 (256 bit T#s are not supported)
- UNRM must be set to 1 (only unnormalized coordinates are supported)
- DIM must be set to 0 (BVH textures are 1D)
- LWE must be set to 0 (LOD warn is not supported)
- TFE must be set to 0 (no support for writing out the extra DWORD for the PRT hit status)
- SSAMP must be set to 0 (just a placeholder, since samplers are not used by the instruction)
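To make the list above easier to check at a glance, here is a minimal Python sketch that validates a set of decoded instruction fields against those restrictions. The field names and required values come straight from the list; the dict-based representation and the function name check_bvh_encoding are my own hypothetical choices, not anything from AMD's tooling.

```python
# Required field values, taken from the "Restrictions on image_bvh
# instructions" list quoted above.
REQUIRED_FIELDS = {
    "DMASK": 0xF,  # instruction returns all four DWORDs
    "D16": 0,      # 16 bit return data is not supported
    "R128": 1,     # 256 bit T#s are not supported
    "UNRM": 1,     # only unnormalized coordinates
    "DIM": 0,      # BVH textures are 1D
    "LWE": 0,      # LOD warn is not supported
    "TFE": 0,      # no extra DWORD for PRT hit status
    "SSAMP": 0,    # placeholder, samplers are not used
}


def check_bvh_encoding(fields):
    """Return (name, expected, actual) for every violated restriction.

    `fields` is a hypothetical dict of decoded instruction fields.
    A missing field is reported with actual=None.
    """
    violations = []
    for name, expected in REQUIRED_FIELDS.items():
        actual = fields.get(name)
        if actual != expected:
            violations.append((name, expected, actual))
    return violations
```

An empty list means the encoding satisfies every restriction in the list; anything else names the offending field.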
There's support for BVH data that is larger than 32GB, but the count of nodes can only be described by a 42-bit number.
There's support for float16 ray and inverse ray direction vectors, "for performance and power optimization", saving 3 VGPRs. I can imagine console developers being all over that optimisation. Though until someone captures the ray cast shaders, we won't really know much more.
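The "saving 3 VGPRs" figure is easy to sanity-check with some arithmetic. This is a sketch that assumes two float16 values are packed into each 32-bit VGPR, which is my assumption about the layout rather than something I've confirmed in the document. The helper names are made up for illustration.

```python
import struct


def dwords_needed(num_components, bytes_per_component):
    """Number of 32-bit registers needed to hold the packed components."""
    total_bytes = num_components * bytes_per_component
    return (total_bytes + 3) // 4  # round up to whole DWORDs


# Ray direction (x, y, z) plus inverse ray direction (x, y, z).
COMPONENTS = 6

vgprs_fp32 = dwords_needed(COMPONENTS, 4)  # 6
vgprs_fp16 = dwords_needed(COMPONENTS, 2)  # 3
saved = vgprs_fp32 - vgprs_fp16            # 3


def pack_fp16(values):
    """Pack floats as little-endian IEEE half precision (struct format 'e')."""
    return struct.pack("<%de" % len(values), *values)


# Example: a direction and its per-component reciprocal.
direction = (0.5, 0.25, 2.0)
inverse = tuple(1.0 / d for d in direction)
packed = pack_fp16(direction + inverse)  # 12 bytes, i.e. 3 DWORDs
```

So six 32-bit components shrink to three registers' worth of data, which matches the 3 VGPRs quoted.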
There are some interesting flags:
- Box sorting enable
- Triangle_return_mode
- big_page
So I think box sorting is merely a way to "speed up" box traversal, by allowing the ray to progress into multiple boxes per query and then to determine which of those boxes should be queried further. Or rather, start to evaluate the boxes, since a triangle hit could be in any of them. I'm struggling to come up with a fast algorithm here, so I'm not sure if it's a speed-up technique or if the 4-way result merely reflects how AMD has structured the boxes...
I can't find anything that describes what happens when there are fewer than 4 box results. If you start the query one level up from leaf nodes, then you might get fewer than 4 results?
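To make the 4-way box result a bit more concrete, here is a rough Python sketch of how I picture it: test the ray against up to four child boxes with the standard slab test, keep the ones that hit, and optionally sort them nearest first. This is a guess at the behaviour, not AMD's algorithm; the function names, the (min, max) tuple layout and the "return only the hits" convention are all assumptions of mine. It also shows naturally how a node can yield fewer than 4 results.

```python
def slab_test(origin, inv_dir, box_min, box_max, t_max=float("inf")):
    """Return the ray's entry distance into the box, or None on a miss.

    Uses the precomputed inverse direction. Zero direction components
    (infinite inverse) are not handled in this sketch.
    """
    t_near, t_far = 0.0, t_max
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t0 = (lo - o) * inv
        t1 = (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    return t_near if t_near <= t_far else None


def intersect_box_node(origin, inv_dir, children, sort=True):
    """Return indices of the child boxes the ray hits.

    `children` is a list of up to four (box_min, box_max) pairs.
    With sort=True the hits are ordered nearest entry first.
    """
    hits = []
    for index, (box_min, box_max) in enumerate(children):
        t = slab_test(origin, inv_dir, box_min, box_max)
        if t is not None:
            hits.append((t, index))
    if sort:
        hits.sort()
    return [index for _, index in hits]


children = [
    ((5.0, 5.0, 5.0), (6.0, 6.0, 6.0)),
    ((1.0, 1.0, 1.0), (2.0, 2.0, 2.0)),
    ((1.0, -3.0, 1.0), (2.0, -2.0, 2.0)),  # off to the side, missed
    ((3.0, 3.0, 3.0), (4.0, 4.0, 4.0)),
]
origin = (0.0, 0.0, 0.0)
inv_dir = (1.0, 1.0, 1.0)  # direction (1, 1, 1)
sorted_hits = intersect_box_node(origin, inv_dir, children)
unsorted_hits = intersect_box_node(origin, inv_dir, children, sort=False)
```

With sorting, the traversal loop can visit the nearest child first and potentially skip farther boxes once a triangle hit closer than their entry distance is found.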
Triangle return mode returns either a pair of triangle ID and hit status or a pair of i/j coordinates. I'm not sure how this relates to DXR's concept of instancing, but I presume the i/j pair is the barycentrics for the triangle. The first pair of floats seems to be "intersection time". I can't find any description of the use that intersection time might be put to.
It seems that if you want both the triangle ID and the barycentrics you have to run the query twice?
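For reference, this is what "intersection time" and barycentrics mean in a generic software ray-triangle test. It's the textbook Möller-Trumbore algorithm in Python, not a description of what the RDNA 2 hardware does, and mapping its u/v outputs onto the i/j pair is only my presumption. In ray tracers generally, t is the distance along the ray: it gives the hit point as origin + t * direction and is what you compare to keep the closest hit.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test. Returns (t, u, v) on a hit, else None.

    t is the distance along the ray; u and v are barycentric weights
    of v1 and v2 (the weight of v0 is 1 - u - v).
    """
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:  # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    if t < 0.0:  # triangle is behind the ray origin
        return None
    return t, u, v


hit = ray_triangle(
    (0.25, 0.25, -1.0), (0.0, 0.0, 1.0),
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
)
```

Here the ray starts one unit in front of the triangle, so t comes out as 1.0 and the barycentrics locate the hit at (0.25, 0.25, 0) inside the triangle.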
I can't find any explanation for the use of the big_page flag (either no override or pages that are >= 64KB in size), but I suppose this could be an optimisation that suits the total size of the BVH.
Box growing amount is an 8-bit description of "number of ULPs to be added during ray-box test, encoded as unsigned integer", so it allows some fuzziness and control of the fuzziness. I've been wondering whether fuzziness is a direct technique to speed up ray queries: lots of fuzziness at the start of traversal, with reduced fuzziness as the depth increases. I'm not sure how traversal can evaluate depth though, except for a fuzzy depth derived from the count of queries issued for a ray direction. So the overall algorithm would use a single fuzzy ray instead of a group of, say, 32 rays and then increase the ray count with depth and reduced fuzziness.
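To get a feel for what "adding ULPs" does, here is a small Python sketch that pushes each box bound outward by a given number of representable floats. Big caveats: I'm assuming the growth is applied to the box bounds themselves (the document only says it happens "during ray-box test"), Python floats are double precision so one ULP here is far smaller than a float32 ULP, and the function names are hypothetical. It needs Python 3.9+ for math.nextafter.

```python
import math


def grow_by_ulps(value, ulps, direction):
    """Step `value` by `ulps` representable floats toward `direction`."""
    for _ in range(ulps):
        value = math.nextafter(value, direction)
    return value


def grow_box(box_min, box_max, ulps):
    """Expand a box outward by `ulps` ULPs on every face.

    `ulps` must fit in 8 bits, matching the unsigned 8-bit field.
    """
    if not 0 <= ulps <= 255:
        raise ValueError("box growing amount must fit in 8 bits")
    grown_min = tuple(grow_by_ulps(v, ulps, -math.inf) for v in box_min)
    grown_max = tuple(grow_by_ulps(v, ulps, math.inf) for v in box_max)
    return grown_min, grown_max


box_min = (1.0, 1.0, 1.0)
box_max = (2.0, 2.0, 2.0)
grown_min, grown_max = grow_box(box_min, box_max, 255)
```

Even at the maximum of 255 ULPs the box grows by a tiny relative amount, which makes me think the field is more about conservative rounding than about large-scale fuzziness.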
I suppose it would also allow boxes to implicitly overlap each other. So far I've thought of overlapping boxes as being a design decision for the IHV, in terms of a fixed choice for how the hardware works. BVH data structures are way more complex than I first thought (irregular octree), so I'm lost.
I'm intrigued by the idea of a single traversal shader having access to multiple "BVH" textures concurrently. There's just a base address, so seemingly multiple BVHs could be available.
Along the way I've found this page:
DirectX Raytracing (DXR) Functional Spec | DirectX-Specs (microsoft.github.io)
which is quite detailed. Searching the page for "AABB" is the best way to understand bounding boxes/volumes; the term "BVH" isn't really used.
There's a concept of a degenerate AABB, which is worrying!:
"During traversal, degenerate AABBs may still report possible (false positive) intersections and invoke the intersection shader. The shader may check the validity of the hit by, for example, inspecting the bounds."
I've not heard of bounds being inspectable before. There's nothing in the RDNA 2 documentation which would appear to support the idea of inspecting bounds. It sounds to me like wishful thinking...
Overall, pretty interesting, but less conclusive than I was hoping to find.