Frenetic Pony
Veteran
Not necessarily, simple RT could be implemented without many changes: http://diglib.eg.org/handle/10.2312/hpg.20141091.029-040
Of course dedicated circuits would be better. They could also do it like in the paper, and then reserve a fixed number of CUs to do RT in parallel with the normal pipeline, or whatever.
What a great paper, it could explain the rumored massive frontend redesign of Navi. Looking it over, I'm not sure Navi would even need the proposed specialty hardware, as it already has double-rate and quad-rate low-precision math built in. Nvidia already does 4-bit and there's no reason AMD can't either (relevant to the BVH update scheme proposed in the paper). The cache hierarchy redesign that would be needed is already confirmed.
Using low-precision rapid packed math would still be slower and take more die area than the proposal, but would be much more flexible than the tiny fixed-function units proposed. GCN has had a great run on consoles by being highly programmable for a 2013 arch; Death Stranding and Doom Eternal look great thanks in part to the flexibility it gives programmers. What if programmers want to use bounding spheres for better BVH building? Or bounding octahedrons for better tree traversal? Etc. etc.
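To make the flexibility point concrete, here's a rough sketch of what "swap the bounding volume" means in software: the traversal just calls a different intersection test. This is plain Python for illustration only, not anything from the paper or from AMD; the function names `ray_hits_aabb` and `ray_hits_sphere` are made up for this example, and a real implementation would be shader code running at whatever precision the hardware offers.

```python
import math

def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: does the ray origin + t*direction (t >= 0) touch the box?"""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of slabs: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def ray_hits_sphere(origin, direction, center, radius):
    """Closest-point test: does the ray pass within `radius` of `center`?"""
    to_center = [c - o for o, c in zip(origin, center)]
    dir_len2 = sum(d * d for d in direction)
    # Parameter of the closest point on the ray, clamped so t >= 0.
    t = max(0.0, sum(a * b for a, b in zip(to_center, direction)) / dir_len2)
    closest = [o + t * d for o, d in zip(origin, direction)]
    dist2 = sum((c - p) ** 2 for c, p in zip(center, closest))
    return dist2 <= radius * radius
```

On programmable hardware either test is just a few multiplies and compares, so picking one over the other is a software decision. A fixed-function box tester would only ever do the first one.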
Not saying that's what Navi has done at all, but it'd be fascinating if they did. The rumored frontend redesign would be a massive change, but could also help with disparate shading from a work distribution perspective. It could also help with what modern triple-A games ask of the GPU, e.g. running physics and possibly AI and stuff alongside the usual shading operations.
Or it's all completely wrong, ah well, a few days to find out.