Indeed! Should not take that long and might explain... Maybe when we get the ISA for RDNA 2 we'll find out what an "accelerated ray query (box or triangle)" instruction actually looks like.
One of the questions I have about BVHs is whether it's possible to set up multiple independent BVHs simultaneously and use them ad hoc.
The split between top-level and bottom-level acceleration structures might provide some clues here.
I still lack DXR experience, but the usual concept behind the top / bottom level AS is:
Build a high quality tree per model. (Usually only once and offline, but DXR does it on client CPU on model load.)
Build a low quality top level per frame over the set of active models. (DXR does this on GPU - even for thousands of sub trees this should cause no perf problems.)
DXR seems fine here? It only lacks the option to cache bottom AS to disk, which could be added.
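To make the top / bottom split concrete, here is a rough Python sketch. It is not how DXR or any driver does it, and every name in it (AABB, Node, build, get_blas, build_tlas, blas_cache) is made up for illustration. Assumptions: instances are translated only (no rotation or scale), input lists are non-empty, and both levels use the same simple median split, where a real system would spend more time on the bottom level (e.g. a SAH builder) and use something fast for the top level.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class AABB:
    lo: Vec3
    hi: Vec3

    def union(self, other: "AABB") -> "AABB":
        return AABB(tuple(map(min, self.lo, other.lo)),
                    tuple(map(max, self.hi, other.hi)))

    def translated(self, t: Vec3) -> "AABB":
        return AABB(tuple(a + b for a, b in zip(self.lo, t)),
                    tuple(a + b for a, b in zip(self.hi, t)))

    def centroid(self, axis: int) -> float:
        return 0.5 * (self.lo[axis] + self.hi[axis])

@dataclass
class Node:
    box: AABB
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    item: object = None  # payload, set on leaves only

def build(items, boxes):
    """Median-split BVH over parallel lists of items and their boxes.
    Hypothetical stand-in for whatever builder is actually used."""
    if len(items) == 1:
        return Node(boxes[0], item=items[0])
    total = boxes[0]
    for b in boxes[1:]:
        total = total.union(b)
    # Split along the longest axis at the median centroid.
    axis = max(range(3), key=lambda a: total.hi[a] - total.lo[a])
    order = sorted(range(len(items)), key=lambda i: boxes[i].centroid(axis))
    mid = len(order) // 2
    l, r = order[:mid], order[mid:]
    return Node(total,
                build([items[i] for i in l], [boxes[i] for i in l]),
                build([items[i] for i in r], [boxes[i] for i in r]))

blas_cache = {}  # model name -> bottom-level tree, built once

def get_blas(model_name, triangle_boxes):
    """Bottom level: built once per model, then reused every frame."""
    if model_name not in blas_cache:
        ids = list(range(len(triangle_boxes)))
        blas_cache[model_name] = build(ids, triangle_boxes)
    return blas_cache[model_name]

def build_tlas(instances):
    """Top level: rebuilt per frame over (blas_root, translation) pairs.
    Only the root box of each bottom-level tree is touched here."""
    boxes = [blas.box.translated(t) for blas, t in instances]
    return build(instances, boxes)

# One model (two triangle boxes), instanced three times in a frame.
crate = [AABB((0, 0, 0), (1, 1, 1)), AABB((1, 0, 0), (2, 1, 1))]
blas = get_blas("crate", crate)
tlas = build_tlas([(blas, (0, 0, 0)), (blas, (10, 0, 0)), (blas, (0, 5, 0))])
print(tlas.box)
```

The per-frame cost depends only on the number of instances, not on triangle counts, which is the whole point of the split. In this sketch nothing stops you from calling build_tlas several times with different instance lists to get independent top levels sharing the same cached bottom levels; whether a given API exposes that is a separate question.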
One thing is quite ironic to me:
When many people said 'realtime RT is not possible because building AS takes too long', I laughed because I knew the top / bottom split idea had already solved this problem ages ago.
Then when RTX came out, it took me quite some time to realize the problem was suddenly back for open world games.