Both samples perform very different things ... Correct me if I am wrong, but doesn't their result align with NVIDIA? At bin sizes 6 to 13, the Work Graph methods are slower than Multi Pass ExecuteIndirect; only at 14 bins does the Work Graph carve out a ~20% win, and at 15 bins it goes down to ~5%.
Nvidia's sample implements a multi-BRDF deferred shading model. It starts with a broadcasting root node that does tiled light culling and outputs a deferred shading record. Each of the broadcasting leaf nodes represents its own specialized BRDF shading model and takes the deferred shading record generated by the tiled light culling pass to compute the material colour ...
AMD's sample implements a compute rasterizer. It starts by sending a workload record to the broadcasting root node, which does vertex shading and triangle bounding box computation and outputs both split records and rasterization records. The split records are used as inputs for broadcasting nodes (one specialized for small bounding boxes and one for large ones) that do hierarchical bounding box subdivision/culling, and both output their own rasterization records. Lastly, there are thread launch leaf nodes, each of which takes a rasterization record and represents one of the triangle bin sizes for scan conversion (rasterization). One notable property of thread launch nodes is that their implementation can operate as some sort of producer-consumer queue ...
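To make the binning idea concrete, here is a rough CPU-side sketch of sorting triangles into bins by the pixel area of their screen-space bounding box. It is not AMD's code: the power-of-two bin rule (bin i holds boxes of up to 2**i pixels) and the names `bbox_pixel_area`, `bin_index` and `bin_triangles` are my own assumptions for illustration.

```python
import math

def bbox_pixel_area(tri):
    """Pixel area of the axis-aligned screen-space bounding box of a triangle.

    tri is a sequence of three (x, y) screen-space positions.
    """
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    width = math.floor(max(xs)) - math.floor(min(xs)) + 1
    height = math.floor(max(ys)) - math.floor(min(ys)) + 1
    return width * height

def bin_index(area, num_bins):
    """Assumed rule: bin i holds boxes of up to 2**i pixels; the last bin takes the rest."""
    i = max(area - 1, 0).bit_length()  # equals ceil(log2(area)) for area >= 1
    return min(i, num_bins - 1)

def bin_triangles(triangles, num_bins):
    """Group triangles per bin, standing in for one rasterization node per bin size."""
    bins = [[] for _ in range(num_bins)]
    for tri in triangles:
        bins[bin_index(bbox_pixel_area(tri), num_bins)].append(tri)
    return bins
```

Under this assumed rule a bounding box of ~16K pixels (2**14) would land in bin 14, and anything larger would be clamped into the last available bin.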
In the AMD sample, there are two different work graph implementations of the compute rasterizer. One version does a dynamic dispatch for the nodes that do hierarchical bounding box subdivision/culling, and it is always slower than the multi-pass ExecuteIndirect implementation in this sample. The other is a fixed dispatch version, where they found that triangle bins with tile sizes up to ~16K pixels are the fastest method in this case. The concept of binning in the Nvidia sample only applies to the light culling pass, where they split up the image into 32-pixel screen space tiles ...
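For comparison, screen-space tiling for light culling is a much simpler kind of binning. A minimal sketch, assuming the tiles are 32 pixels on each edge (the post only says "32 pixels", so the square tile shape is my assumption) and using hypothetical helper names:

```python
TILE_SIZE = 32  # assumed tile edge length in pixels

def tile_grid(width, height, tile_size=TILE_SIZE):
    """Number of tiles needed to cover the image, rounding up at the edges."""
    tiles_x = -(-width // tile_size)   # ceiling division
    tiles_y = -(-height // tile_size)
    return tiles_x, tiles_y

def tile_of_pixel(x, y, tile_size=TILE_SIZE):
    """Tile coordinates that contain the pixel (x, y)."""
    return x // tile_size, y // tile_size
```

Here every pixel maps to exactly one tile by position alone, whereas the rasterizer's bins depend on the size of each triangle.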