I wonder how much die area AMD's RT implementation will take. With Navi, which is a "small" chip on 7nm, it seems they have a hard time competing with NVIDIA's big TU104 / TU106, which are still on 12nm, efficiency / power wise (based on what they said, but I concede we need to wait for the reviews to be sure of that). 7nm won't save them if the implementation is not efficient...
We can make some assumptions from the AMD RT patent, and from what we can guess about NV:
AMD uses the TMUs to process one iteration of the RT loop, which means the shader program issues an instruction to intersect one level of the BVH (box / triangle intersection). A compute shader would look like this (simplified):
queue.push(BVH_root_node)
while (!queue.empty())
{
    // TMU tests the ray against the next node; box hits may push child nodes
    // onto the queue (the queue could be implemented in LDS memory)
    intersection_info = TMU.intersect(ray, queue)
    // keep the nearest triangle hit found so far
    if (intersection_info.hitType == triangle && intersection_info.t < closestHit.t)
        closestHit = intersection_info
}
store closest intersection for later processing...
This means the shader stays busy while raytracing, but it also gives flexibility in programming (the loop could terminate early if traversal takes too long, and maybe enable other really interesting things...)
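To illustrate that flexibility, here is a minimal CPU-side C++ sketch of such a programmable traversal loop with an early-out budget. This is not AMD's implementation; the BVHNode layout, the intersect helpers and the nodeBudget cutoff are my own assumptions, and on the GPU the node tests would be the TMU instruction from the patent:

#include <limits>
#include <vector>

struct Ray { float origin[3]; float dir[3]; };
struct Hit { float t = std::numeric_limits<float>::max(); int triangleID = -1; };

struct BVHNode {
    bool isLeaf;
    int  triangleID;   // valid when isLeaf
    int  child[2];     // valid when !isLeaf, -1 if absent
};

// Placeholder tests -- in the patent these would be done by the TMU.
bool intersectBox(const Ray&, const BVHNode&) { return true; }
bool intersectTriangle(const Ray&, int /*triID*/, float& t) { t = 1.0f; return true; }

// Shader-style traversal loop with an explicit node budget: because the loop is
// ordinary code, nothing stops us from giving up early, e.g. for unimportant rays.
Hit traceWithBudget(const Ray& ray, const std::vector<BVHNode>& nodes, int nodeBudget)
{
    Hit closest;
    std::vector<int> queue = { 0 };            // start at the root (index 0)

    while (!queue.empty() && nodeBudget-- > 0) // early-out once the budget is spent
    {
        const BVHNode node = nodes[queue.back()];
        queue.pop_back();

        if (node.isLeaf) {
            float t;
            if (intersectTriangle(ray, node.triangleID, t) && t < closest.t)
                closest = { t, node.triangleID };   // keep the nearest hit
        } else if (intersectBox(ray, node)) {
            for (int c : node.child)
                if (c >= 0) queue.push_back(c);     // push children for later iterations
        }
    }
    return closest;   // possibly an approximate result if the budget ran out
}

The point is simply that the loop body is ordinary shader code, so termination criteria like this budget (and the ray payload handling) stay under the programmer's control.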
On NV it more likely looks just like this:
intersection_info = RTCore.FindClosestIntersection(ray);
Which means the shader core likely becomes available for other pending tasks after this command (like hit point shading, or async compute, ...).
Also, we have no indication that NV's RT cores would use the TMUs or share the cache to access the BVH.
The conclusion is that NV's RT is likely faster but takes more chip area, while AMD likely again offers more general compute performance, which could compensate for this.
But it could also happen that AMD adds a FF unit to process the outer loop I have written above; the patent mentions this as optional. Still, fetching textures while raytracing would compromise perf more than on NV - maybe. (The patent mentions that the advantage of sharing the TMUs / VGPRs is avoiding the need for specialized large buffers to hold BVH or ray payload data.)
It will become very interesting to compare performance, and to see what programming flexibility (if any) can add...
That assumes the competition would rest on their laurels and do nothing to improve their current RT solution, which likely won't happen, as NVIDIA will push their RT angle to the extreme.
My bet (or rather my hope) is that the next logical step would be to make BVH generation more flexible.
For example, if they want to be compatible with mesh shaders, they just have to make this dynamic.
This would be awesome because it solves the LOD limitation (I would not even care if BVH generation became FF too).
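To show why static BVH builds are the LOD limitation, here is a small hypothetical C++ sketch (the MeshLODs struct, the distance thresholds and selectLod are made up for illustration): every time the chosen LOD changes, today's black-box acceleration structure has to be rebuilt, whereas a dynamic / mesh-shader-friendly BVH path could pick the detail level during the build itself:

#include <cstdio>
#include <vector>

// Hypothetical per-object data: one mesh stored at several pre-built LOD levels.
struct MeshLODs {
    std::vector<int> lodTriangleCounts;  // triangle count per LOD level
    int   currentLod = -1;
    float distance   = 0.0f;             // distance to the camera this frame
};

// Pick an LOD from the camera distance (thresholds are made up for illustration).
int selectLod(const MeshLODs& m)
{
    if (m.distance < 10.0f) return 0;    // full detail
    if (m.distance < 50.0f) return 1;
    return 2;
}

int main()
{
    std::vector<MeshLODs> scene = {
        { { 100000, 20000, 2000 }, -1,  5.0f },
        { { 100000, 20000, 2000 }, -1, 80.0f },
    };

    // Per frame: with a black-box FF BVH build, every LOD switch below means
    // rebuilding that object's acceleration structure. If BVH generation were
    // dynamic (mesh-shader style), the detail level could be chosen as part of
    // the build itself and this cost cliff would go away.
    for (auto& mesh : scene) {
        int lod = selectLod(mesh);
        if (lod != mesh.currentLod) {
            std::printf("LOD changed to %d (%d tris) -> BVH rebuild required\n",
                        lod, mesh.lodTriangleCounts[lod]);
            mesh.currentLod = lod;
        }
    }
    return 0;
}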
After that it would make sense to decrease ROPs and increase RT cores, up to the point where rasterization is implemented only with compute (texture filtering remains, ofc).
And only after that would I see a need for ray reordering (which I initially thought to be FF already now, and whose assumed complexity was a main reason to question RTX).