I am not sure I follow you here, but intersecting a triangle or an AABB should not be any worse, latency-wise, than issuing a texture sampler instruction. BTW, up to a point. You can't separate them out. A highly variable latency instruction for a shader ... feels wrong.
PS. Oops, I guess you could just detect leaf nodes in the shader and limit the triangle count per leaf during the build. So it doesn't really matter.
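To make the "limit triangle count during build" idea concrete, here is a minimal sketch of a builder that keeps splitting until every leaf holds at most a fixed number of triangles, so the work behind one leaf intersection request is bounded. Everything here is an assumption for illustration: `MAX_LEAF_TRIS`, the `Node` layout, and the median split on centroids are hypothetical and not taken from any real driver or hardware. Bounding boxes are left out to keep the focus on the leaf cap.

```python
from dataclasses import dataclass, field

MAX_LEAF_TRIS = 4  # hypothetical cap; real hardware limits are not assumed here


@dataclass
class Node:
    tris: list = field(default_factory=list)      # filled only for leaves
    children: list = field(default_factory=list)  # filled only for inner nodes

    @property
    def is_leaf(self):
        return not self.children


def centroid(tri, axis):
    # Average of the three vertex coordinates along one axis.
    return sum(v[axis] for v in tri) / 3.0


def build(tris, depth=0):
    # Stop splitting once the leaf is small enough, so that every leaf
    # costs at most MAX_LEAF_TRIS triangle tests.
    if len(tris) <= MAX_LEAF_TRIS:
        return Node(tris=list(tris))
    axis = depth % 3  # cycle through x, y, z
    ordered = sorted(tris, key=lambda t: centroid(t, axis))
    mid = len(ordered) // 2  # median split: both halves are non-empty
    return Node(children=[build(ordered[:mid], depth + 1),
                          build(ordered[mid:], depth + 1)])


def leaves(node):
    # Walk the tree and yield every leaf node.
    if node.is_leaf:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)
```

With a cap like this, a shader that detects a leaf knows the upper bound on the triangle tests it is about to request, which is the point made above about latency variability.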
AFAIR RDNA2 sends the ray and AABB/triangle data to the HW intersectors, so it's not like the latter have to fetch anything from memory anyway (unlike when we are sampling a texture).
The main problems with this approach are that traversal doesn't like SIMD/SIMT and that there is a fair amount of state that needs to be moved around from the shader core to the intersector.
When you keep traversal in one place (as in a dedicated unit) there is not as much state being moved around and you can go MIMD on traversal.
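A toy model of why traversal and SIMD/SIMT don't get along, assuming a shader-style loop that issues one box test per step and keeps a per-ray stack. This is not how RDNA2 or any other hardware actually works; the node layout, `traverse_steps`, and `lockstep_utilization` are hypothetical names made up for the sketch. The idea is only that rays in the same wave need very different step counts, and in lockstep every lane waits for the slowest one.

```python
def slab_hit(origin, direction, lo, hi):
    # Standard slab test: does the ray hit the box [lo, hi] for some t >= 0?
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < l or o > h:
                return False  # parallel to this slab and outside it
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far


def build(boxes):
    # Tiny BVH over a list of (lo, hi) boxes, split in the middle of the list.
    lo = tuple(min(b[0][a] for b in boxes) for a in range(3))
    hi = tuple(max(b[1][a] for b in boxes) for a in range(3))
    if len(boxes) == 1:
        return {"lo": lo, "hi": hi, "children": []}
    mid = len(boxes) // 2
    return {"lo": lo, "hi": hi,
            "children": [build(boxes[:mid]), build(boxes[mid:])]}


def traverse_steps(root, origin, direction):
    # Count the box tests one ray issues. The stack is the per-ray state
    # that has to live somewhere (shader registers or a dedicated unit).
    stack, steps = [root], 0
    while stack:
        node = stack.pop()
        steps += 1  # one intersection request for this node
        if slab_hit(origin, direction, node["lo"], node["hi"]):
            stack.extend(node["children"])
    return steps


def lockstep_utilization(step_counts):
    # In lockstep, every lane runs for max(step_counts) iterations;
    # only sum(step_counts) of those lane-iterations do useful work.
    return sum(step_counts) / (len(step_counts) * max(step_counts))
```

Feeding a wave with one ray that walks the whole tree and a few that miss the root gives a low utilization figure, whereas a MIMD traversal unit would, in this model, simply retire the short rays early.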
It's not the only way to do it though. I believe the latest RT material from IMG advocates for a different approach, where IIRC traversal is constantly re-converged and SIMDfied, so that it might not require MIMD HW to be efficient.
OTOH that re-ordering might need to shuffle more state around.