The geometry shader is a lesson from history: it is essentially a complete waste of time, and it already was when it was introduced.
Also, the triangle intersection test rate (which RDNA 2 seems to be bad at) is difficult to isolate as a bottleneck separate from incoherent BVH traversal. BVH traversal is, in general, bottlenecked by memory (just like texture filtering), which means compressed BVH nodes and a rambo (i.e. huge) cache are vital.
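To make "compressed BVH nodes" a bit more concrete, here is a toy Python sketch of one published idea, storing child boxes as 8-bit offsets relative to the parent box instead of as 32-bit floats (in the spirit of the compressed wide BVH of Ylitie, Karras and Laine, HPG 2017). It is not NVidia's actual node format, which isn't documented. The names (`AABB`, `node_scale`, `quantize_child`, `dequantize_child`, `bounds_bytes`), the 8-bit width and the simplified byte accounting (a float origin and float scale per node) are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AABB:
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]


def node_scale(parent: AABB, bits: int = 8) -> list[float]:
    """Per-axis step size so that 2**bits - 1 steps span the parent box."""
    levels = (1 << bits) - 1
    extents = [parent.hi[a] - parent.lo[a] for a in range(3)]
    return [e / levels if e > 0 else 1.0 for e in extents]


def _clamp(v: int, hi: int) -> int:
    return max(0, min(hi, v))


def quantize_child(parent: AABB, scale: list[float], child: AABB, bits: int = 8):
    """Encode a child box as small integers relative to the parent box.
    lo is rounded down and hi rounded up, so the decoded box is conservative:
    it encloses the original child (up to float rounding)."""
    levels = (1 << bits) - 1
    q_lo = tuple(_clamp(math.floor((child.lo[a] - parent.lo[a]) / scale[a]), levels)
                 for a in range(3))
    q_hi = tuple(_clamp(math.ceil((child.hi[a] - parent.lo[a]) / scale[a]), levels)
                 for a in range(3))
    return q_lo, q_hi


def dequantize_child(parent: AABB, scale: list[float], q) -> AABB:
    """Rebuild a (slightly larger) float box from the quantized form."""
    q_lo, q_hi = q
    return AABB(
        tuple(parent.lo[a] + q_lo[a] * scale[a] for a in range(3)),
        tuple(parent.lo[a] + q_hi[a] * scale[a] for a in range(3)),
    )


def bounds_bytes(num_children: int, quantized: bool) -> int:
    """Bytes spent on child bounds only (child pointers and flags ignored)."""
    if not quantized:
        return num_children * 6 * 4      # six float32 values per child
    return 6 * 4 + num_children * 6      # float32 origin + scale, six uint8 per child


parent = AABB((0.0, 0.0, 0.0), (255.0, 255.0, 255.0))
child = AABB((1.5, 2.25, 3.0), (10.5, 20.0, 30.75))
scale = node_scale(parent)
q = quantize_child(parent, scale, child)
decoded = dequantize_child(parent, scale, q)
print(q)                                              # ((1, 2, 3), (11, 20, 31))
print(bounds_bytes(8, False), bounds_bytes(8, True))  # 192 72
```

For an 8-wide node the child bounds shrink from 192 to 72 bytes in this simplified accounting, at the cost of slightly looser boxes (a few extra false-positive box hits). That trade is worth making precisely because traversal is limited by memory rather than by ALU.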
In my opinion NVidia has extremely good node compression, building on its years of research into BVH formats.
It's worth remembering that a workgroup of, say, 1024 work items intrinsically offers a speed-up for dynamic branching on current hardware: when an entire hardware thread goes dark (all of its work items, e.g. 32 on RDNA 2, have left the branch), that subset of work items no longer occupies any execution slots in that branch. During traversal, if the hardware is running an uber-ray-shader that combines traversal with the closest-/any-/miss-shaders, those freed execution slots can run the closest-/any-/miss-shader for those work items while the rest of the workgroup is still traversing.
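A toy Python model of the point above, assuming wave32 and an idealized scheduler: a wave keeps its traversal slots as long as any of its lanes is still traversing, so what matters is how the finished rays are distributed across waves, not just how many there are. The names (`waves`, `traversal_waves`, `lane_utilization`) and the two masks are hypothetical, made up for illustration.

```python
WORKGROUP = 1024   # work items per workgroup (the example size above)
WAVE = 32          # lanes per hardware thread, e.g. wave32 on RDNA 2


def waves(active: list[bool], wave_size: int = WAVE) -> list[list[bool]]:
    """Split a per-work-item 'still traversing' mask into per-wave slices."""
    return [active[i:i + wave_size] for i in range(0, len(active), wave_size)]


def traversal_waves(active: list[bool], wave_size: int = WAVE) -> int:
    """Waves that still occupy execution slots for traversal (any lane active).
    A wave whose lanes have all finished goes dark and frees its slots."""
    return sum(any(w) for w in waves(active, wave_size))


def lane_utilization(active: list[bool], wave_size: int = WAVE) -> float:
    """Fraction of the occupied traversal lanes that are doing useful work."""
    busy = traversal_waves(active, wave_size)
    return sum(active) / (busy * wave_size) if busy else 0.0


# Half the rays have finished traversal in both cases, just arranged differently.
clustered = [i >= WORKGROUP // 2 for i in range(WORKGROUP)]  # finished rays share waves
scattered = [i % 2 == 1 for i in range(WORKGROUP)]          # finished rays spread out

for name, mask in (("clustered", clustered), ("scattered", scattered)):
    busy = traversal_waves(mask)
    free = WORKGROUP // WAVE - busy
    print(f"{name}: {busy} waves traversing, {free} free for hit/miss shading, "
          f"utilization {lane_utilization(mask):.2f}")
```

In the clustered case 16 of the 32 waves go dark and can move on to hit/miss shading; in the scattered case no wave goes dark and half of every traversal wave idles. Real hardware won't be this clean, but it shows why a big workgroup gives the scheduler something to work with.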
Finally, if you can build a compute unit that can do conditional routing (regrouping work items according to the branch they take) to mitigate the slow-down from incoherent branching, then hardware traversal is entirely pointless. Even in the worst-case scenarios RDNA 2 is not 10x slower (or worse), so the 2-4x speed-up such a solution could offer would be more than enough.
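A rough Python cost model of what conditional routing buys, under loudly simplified assumptions: every branch body costs the same, a SIMD wave without routing runs each distinct branch its lanes take one after another, and an idealized router regroups work items so each wave holds a single branch. The function names and the round-robin branch pattern are hypothetical; this is not a description of any shipping hardware.

```python
import math
from collections import Counter

WAVE = 32  # lanes per hardware thread (assumed wave32)


def divergent_cost(targets: list[int], wave_size: int = WAVE) -> int:
    """Wave-passes without routing: each wave executes every distinct branch
    present among its lanes, serially, with the other lanes masked off."""
    return sum(len(set(targets[i:i + wave_size]))
               for i in range(0, len(targets), wave_size))


def routed_cost(targets: list[int], wave_size: int = WAVE) -> int:
    """Wave-passes with idealized routing: work items are regrouped by branch,
    so each branch needs ceil(count / wave_size) full waves."""
    return sum(math.ceil(n / wave_size) for n in Counter(targets).values())


# 1024 work items, each taking one of 4 branches in a maximally interleaved pattern.
targets = [i % 4 for i in range(1024)]
print(divergent_cost(targets), routed_cost(targets))  # 128 32
```

With four equally likely, fully interleaved branches the model gives a 4x reduction in wave-passes, which is the kind of gain the argument above relies on. Real routing has its own cost (moving work-item state between lanes), which this sketch ignores.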
Intel's variable-width hardware threads, if Arc has them, will be interesting in their own right as a solution to dynamic branching in shaders, and even more so in combination with hardware BVH traversal.
Intel's hardware traversal may not be MIMD.