What other RT solutions could AMD entertain? Does the BVH or equivalent have to be on the GPU? It seems not to me; Nvidia puts it there because the GPU is their only part in a PC. AMD, however, could put an acceleration unit in the CPU, or indeed elsewhere in the system. What would the ideal be?
BVH acceleration can make sense because it's useful for other things like physics too. But of course the nodes have to be exposed as an accessible data structure. Also, building the tree should be left to the user, and a flexible branching factor (max number of children per node) would be nice if possible.
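To make "exposed nodes, user-built tree, flexible branching factor" concrete, here is a minimal CPU-side sketch in Python. Everything in it is hypothetical (`Node`, `build_bvh`, the field names, the naive sorted-chunk split along the longest axis); it only illustrates a flat node array that user code can read and build itself, not any vendor's actual layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
AABB = Tuple[Vec3, Vec3]  # (min corner, max corner)

def merge(boxes: List[AABB]) -> AABB:
    """Smallest box enclosing all given boxes."""
    lo = tuple(min(b[0][a] for b in boxes) for a in range(3))
    hi = tuple(max(b[1][a] for b in boxes) for a in range(3))
    return (lo, hi)

@dataclass
class Node:
    bounds: AABB
    first_child: int = -1      # index of first child in the flat array
    child_count: int = 0       # 0 means leaf
    prims: List[int] = field(default_factory=list)

def build_bvh(prim_boxes: List[AABB], branching: int = 4, leaf_size: int = 2) -> List[Node]:
    """Naive user-side build: sort along the longest axis, cut into <= branching chunks."""
    nodes = [Node(bounds=merge(prim_boxes), prims=list(range(len(prim_boxes))))]
    stack = [0]
    while stack:
        node = nodes[stack.pop()]
        ids = node.prims
        if len(ids) <= leaf_size:
            continue  # small enough, stays a leaf
        lo, hi = node.bounds
        axis = max(range(3), key=lambda a: hi[a] - lo[a])
        ids.sort(key=lambda i: prim_boxes[i][0][axis] + prim_boxes[i][1][axis])
        k = min(branching, len(ids))
        chunk = -(-len(ids) // k)  # ceiling division
        groups = [ids[s:s + chunk] for s in range(0, len(ids), chunk)]
        node.first_child = len(nodes)   # children are stored contiguously
        node.child_count = len(groups)
        node.prims = []
        for g in groups:
            nodes.append(Node(bounds=merge([prim_boxes[i] for i in g]), prims=g))
            stack.append(len(nodes) - 1)
    return nodes

boxes = [((float(i), 0.0, 0.0), (i + 0.5, 1.0, 1.0)) for i in range(10)]
nodes = build_bvh(boxes, branching=4, leaf_size=2)
```

Because the nodes are plain data in one array, the same tree could be reused for physics queries or rebuilt with a different `branching` value without touching the traversal side.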
Shooting rays against the BVH would be useful, but I'd prefer to keep control of ray batching. So I want to shoot a compute wavefront of 64 rays against the BVH, not single-threaded isolated rays.
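A rough sketch of what tracing a whole batch could look like, written as a plain Python simulation rather than real GPU code. The node tuple layout `(lo, hi, children, prims)`, `trace_wave`, and the per-node list of active lanes are all assumptions made for illustration: the 64 rays share one traversal stack, and a node is skipped only when no lane in the batch hits its box.

```python
WAVE = 64  # wavefront size used in this sketch

def ray_hits_box(origin, direction, lo, hi):
    """Slab test for a ray (t >= 0) against an axis-aligned box."""
    tmin, tmax = 0.0, float("inf")
    for a in range(3):
        if direction[a] == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if not (lo[a] <= origin[a] <= hi[a]):
                return False
            continue
        t0 = (lo[a] - origin[a]) / direction[a]
        t1 = (hi[a] - origin[a]) / direction[a]
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return tmin <= tmax

def trace_wave(nodes, origins, dirs):
    """Traverse once for the whole batch; returns candidate prim ids per ray."""
    candidates = [[] for _ in origins]
    stack = [(0, list(range(len(origins))))]  # (node index, active lanes)
    while stack:
        idx, lanes = stack.pop()
        lo, hi, children, prims = nodes[idx]
        active = [l for l in lanes if ray_hits_box(origins[l], dirs[l], lo, hi)]
        if not active:
            continue  # no lane in the batch touches this node
        for l in active:
            candidates[l].extend(prims)
        for c in children:
            stack.append((c, active))
    return candidates

# Hand-built toy tree: root with two leaf children holding prims 0 and 1.
nodes = [
    ((0.0, 0.0, 0.0), (4.0, 1.0, 1.0), [1, 2], []),
    ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), [], [0]),
    ((3.0, 0.0, 0.0), (4.0, 1.0, 1.0), [], [1]),
]
origins = [(i / 16.0, 0.5, -1.0) for i in range(WAVE)]
dirs = [(0.0, 0.0, 1.0)] * WAVE
candidates = trace_wave(nodes, origins, dirs)
```

The point is only that the batching decision (which 64 rays travel together) stays with the caller; coherent rays keep the active lane list dense, incoherent ones do not.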
Further, I would be fine with getting the results later in some form of query. I do not want the wavefront to be stalled while waiting on results of a process that has a long, non-constant runtime.
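The "results later via a query" idea could look roughly like the submit/poll pattern below. This is a toy Python model with made-up names (`TraceQueue`, `submit`, `pump`, `poll`), not an existing API: the caller gets a ticket back immediately and checks for results at a later point instead of blocking.

```python
from collections import deque

class TraceQueue:
    """Toy model of deferred ray queries (hypothetical interface)."""

    def __init__(self, trace_fn):
        self._trace_fn = trace_fn    # stands in for the slow traversal
        self._pending = deque()
        self._done = {}
        self._next_ticket = 0

    def submit(self, batch):
        """Queue a batch and return a ticket without waiting."""
        ticket = self._next_ticket
        self._next_ticket += 1
        self._pending.append((ticket, batch))
        return ticket

    def pump(self, max_batches=1):
        """Stand-in for the hardware making progress in the background."""
        for _ in range(min(max_batches, len(self._pending))):
            ticket, batch = self._pending.popleft()
            self._done[ticket] = self._trace_fn(batch)

    def poll(self, ticket):
        """Return results if ready, else None; never blocks."""
        return self._done.pop(ticket, None)

queue = TraceQueue(lambda batch: [x * 2 for x in batch])
ticket = queue.submit([1, 2, 3])
early = queue.poll(ticket)    # nothing processed yet
queue.pump()
late = queue.poll(ticket)     # results are now available
```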
I would be fine with just the BVH. Triangles, or whatever other form of geometry we use, should be implemented by users instead, again within regular compute to utilize parallel algorithms. No single-threaded custom intersection shader crap!
To help with this, some special instructions to accelerate the ray-triangle test would be worth it.
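For reference, this is the kind of test such instructions would accelerate. The sketch below uses the well-known Möller-Trumbore algorithm in plain Python; the choice of algorithm and the helper names are mine, not something the post specifies.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(orig, direction, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore: return hit distance t along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None            # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv        # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri)
miss = ray_triangle((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *tri)
```

It is a handful of cross and dot products per pair, which is why a dedicated instruction looks attractive while the surrounding loop stays in ordinary compute.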
I do not want to call surface shaders per thread, execute them and return the results. All this makes it easy but it's not how GPUs work. Instead leave it to the users to batch materials and to limit divergence. More work for us, but more potential for best performance.
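One simple way users could batch materials themselves is to sort the hit records by material ID and shade each contiguous run together. A minimal Python sketch under that assumption follows; the `(material_id, ray_id)` record layout and `batch_by_material` are hypothetical.

```python
from itertools import groupby

def batch_by_material(hits):
    """hits: list of (material_id, ray_id). Returns [(material_id, [hit indices])]."""
    # Stable sort keeps hits of the same material in their original order.
    order = sorted(range(len(hits)), key=lambda i: hits[i][0])
    return [(mat, list(group))
            for mat, group in groupby(order, key=lambda i: hits[i][0])]

hits = [(2, 0), (0, 1), (2, 2), (1, 3), (0, 4)]
batches = batch_by_material(hits)
# Each batch can then be shaded by one uniform kernel launch per material.
```

On a GPU the sort would be a parallel radix sort or a binning pass, but the idea is the same: every thread in a batch runs the same material code, so divergence is limited by construction.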
Additionally, we need options to generate work directly on the GPU (!!!!), so all the above can be done without the inflexible requirement of command buffer generation on the CPU.
Mantle already had support for conditional command buffer execution and loops. I guess consoles have this already and it might be good enough. But it needs to be extended to async compute as well.
EDIT: of course generating commands directly from compute would be much better.
Most of this could be done very easily by AMD. (BVH traversal in HW is not a requirement either.)