IIRC, research on advanced reordering in GPU HW gave a speedup of about 2x for incoherent test cases. As long as we are mostly interested in coherent rays (shadows, sharp reflections), I think that's not worth it yet. But as time moves on we might get it, and then inline tracing becomes deprecated.

At about 1 trillion rays/s, how much sorting, and over what proportion of the rays, does it take for "shuffling" to be meaningful? Seems to me that the cache hierarchy is the right place to worry about coherence...
However, the really interesting thought behind "AMD prefers inline tracing", and the assumption that "inline tracing has been added for AMD / consoles", is this:
It fits the TMU patent, where the outer loop of traversal is handled by the regular shader cores. And if that's what they finally used, traversal shaders would be possible.
So my hope that AMD exposes it through extensions is not totally dead yet.
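
To make the "outer loop on regular shader cores" idea a bit more concrete, here is a rough CPU-side sketch in C++. Everything in it is an assumption for illustration only: the `Node` layout, `hw_intersect_box` (a stand-in for whatever the fixed-function unit would do), and the `should_descend` callback, which plays the role a traversal shader might play. It is not based on AMD's actual hardware or on any real API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node layout, invented for this sketch.
struct Node {
    float lo[3], hi[3];  // axis-aligned bounding box
    int left, right;     // child indices, or -1 for a leaf
    int prim;            // primitive id for leaves, -1 otherwise
};

struct Ray {
    float org[3];
    float inv_dir[3];    // 1 / direction, assumed finite here
    float t_max;
};

// Stand-in for the fixed-function box test (plain slab method).
bool hw_intersect_box(const Node& n, const Ray& r) {
    float t0 = 0.0f, t1 = r.t_max;
    for (int a = 0; a < 3; ++a) {
        float ta = (n.lo[a] - r.org[a]) * r.inv_dir[a];
        float tb = (n.hi[a] - r.org[a]) * r.inv_dir[a];
        t0 = std::max(t0, std::min(ta, tb));
        t1 = std::min(t1, std::max(ta, tb));
    }
    return t0 <= t1;
}

// The outer loop and the stack live in ordinary "shader" code.
// Because of that, a user callback can veto any node before descent,
// which is roughly what a traversal shader would allow.
// Returns the number of leaves whose boxes the ray reaches.
template <class Visit>
int traverse(const std::vector<Node>& bvh, const Ray& r, Visit should_descend) {
    int stack[64];
    int sp = 0;
    int hits = 0;
    stack[sp++] = 0;  // start at the root
    while (sp > 0) {
        int idx = stack[--sp];
        const Node& n = bvh[idx];
        if (!hw_intersect_box(n, r) || !should_descend(idx)) continue;
        if (n.left < 0) {  // leaf
            ++hits;
            continue;
        }
        assert(sp + 2 <= 64);  // guard the fixed-size stack
        stack[sp++] = n.left;
        stack[sp++] = n.right;
    }
    return hits;
}
```

Passing a callback that always returns `true` gives ordinary traversal; a callback that rejects some node index skips that whole subtree, which is the kind of control a fully fixed-function traversal loop would not offer.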