PowerVR Kyro 3 please.
Coherency gathering in ray tracing: the benefits of hardware ray tracking
11 February 2020 - Rys Sommefeldt
https://www.imgtec.com/blog/coheren...racing-the-benefits-of-hardware-ray-tracking/
IMG's ray tracing media go back a number of years, so questions like whether this latest article applies to a coherency engine today versus a block of that name in the 2014 Wizard GPU might illuminate why this seems to apply to upcoming IP.
Hmmm... grouping rays by direction and origin into clusters of the BVH is pretty much the standard reordering idea, and it has existed for a long time.
The fact that NV did not implement it (yet), although they did a lot of related research, and the fact that ImgTec's licensing lists the coherency engine as optional, led me to the conclusion that it is not worth it yet.
That's contrary to what I thought for all those years before, when I assumed reordering would be the only way to make realtime RT practical at all.
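To make the reordering idea above concrete, here is a minimal sketch of grouping rays by direction and origin. It only illustrates the general technique and is not ImgTec's coherency engine or anything NV ships; the names `coherence_key` and `reorder_rays`, the uniform grid over ray origins, and the `cell_size` parameter are all assumptions made for the example.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]  # (origin, direction)


def coherence_key(ray: Ray, cell_size: float = 1.0) -> Tuple[int, int, int, int]:
    """Bucket a ray by direction octant and by the grid cell of its origin.

    Hypothetical helper: a real implementation would more likely bucket
    against BVH nodes than against a uniform grid.
    """
    (ox, oy, oz), (dx, dy, dz) = ray
    # Direction octant: one bit per axis, set when that component is negative.
    octant = int(dx < 0) | (int(dy < 0) << 1) | (int(dz < 0) << 2)
    # Quantize the origin to a coarse grid cell.
    cell = (
        math.floor(ox / cell_size),
        math.floor(oy / cell_size),
        math.floor(oz / cell_size),
    )
    return (octant, cell[0], cell[1], cell[2])


def reorder_rays(rays: List[Ray], cell_size: float = 1.0) -> List[Ray]:
    """Sort rays so that rays with the same octant and origin cell are adjacent."""
    return sorted(rays, key=lambda r: coherence_key(r, cell_size))


if __name__ == "__main__":
    rays = [
        ((0.2, 0.0, 0.0), (1.0, 1.0, 1.0)),
        ((5.5, 0.0, 0.0), (-1.0, 1.0, 1.0)),
        ((0.7, 0.1, 0.0), (1.0, 0.5, 1.0)),
        ((5.1, 0.0, 0.0), (-1.0, 2.0, 1.0)),
    ]
    for ray in reorder_rays(rays):
        print(coherence_key(ray), ray)
```

After sorting, rays that start near each other and point into the same octant sit next to each other, so they can be dispatched together and are more likely to touch the same BVH nodes. Whether the sort pays for itself is exactly the open question here.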
Do you think ImgTec continues to offer their RT blocks for other GPU makers? The question bothering me is how difficult this is to integrate. (Same question as when considering non-AMD RT in PS5.)
Or are their licensing offers only about a whole ImgTec GPU for SOC makers?
> IMG's A-Series
Which has 2 TF at most.
Can we assume practical compute performance could be similar to a 2 TF desktop GPU or PS4?
I ask in the context of comparing any brand of mobile GPU against desktop GPUs in general, so if anyone can share some experience or an educated guess, I'd appreciate it.
It's hard to get any impression of mobile compute perf from the sparse specs given and from gfx benchmarks. For example, I do not know whether mobile GPUs have reserved on-chip LDS memory at all.
Even if they did offer blocks, they would still need to be heavily customized for this to work. As you noted, the previous GPU their RT worked on was a mobile GPU. They would need to scale this up significantly while working with AMD tech.
> Do you think ImgTec continues to offer their RT blocks for other GPU makers? The question bothering me is how difficult this is to integrate. (Same question as when considering non-AMD RT in PS5.)
> Or are their licensing offers only about a whole ImgTec GPU for SOC makers?
I've only seen references to licensing the GPU to SOC makers, although the pool of those that don't make their own is smaller these days.
> Which has 2 TF at most.
> Can we assume practical compute performance could be similar to a 2 TF desktop GPU or PS4?
I haven't found a good comparison point. The few benchmarks I've found are old and were run on implementations far below that range.
> It's hard to get any impression of mobile compute perf from the sparse specs given and from gfx benchmarks. For example, I do not know whether mobile GPUs have reserved on-chip LDS memory at all.
PowerVR's Rogue architecture has a Common Store per shading cluster that holds workgroup shared memory. Details aren't as well documented for the A-Series (or the B-Series that was listed under 2020 in the A-Series announcement roadmap).
> (Among others) they come to the conclusion that RTX does use some sorting
Not backed by the measured results though; the conclusions are just wishful thinking. The boost they measured is indistinguishable from coincidental cache coherency, in an implementation which is balanced so well that (for common use cases) there is only a factor of 4-10x between being ALU bound (best case) and being memory throughput bound (worst case).
As for "reordering", there is no indicator for that either. The traversal unit may just be marching over all threads (rays) of a warp in SIMD fashion, as far as we know. So far there is no indicator that the scheduling is any finer than the usual "per-warp" tracking of in-flight memory accesses, which stalls a full warp until the memory dependencies of all its threads are satisfied.
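A toy model of that per-warp behaviour, purely as an illustration: assume each ray needs some number of traversal steps and a warp cannot retire until its slowest ray is done. The function names, the step counts and the lockstep assumption itself are hypothetical; nothing here is measured from, or known about, actual RTX hardware.

```python
from typing import List


def warp_cost(steps_per_ray: List[int], warp_size: int = 32) -> int:
    """Total steps issued when every warp waits for its slowest ray."""
    total = 0
    for i in range(0, len(steps_per_ray), warp_size):
        # The whole warp is occupied for as long as its longest ray runs.
        total += max(steps_per_ray[i:i + warp_size])
    return total


def simd_utilization(steps_per_ray: List[int], warp_size: int = 32) -> float:
    """Fraction of issued lane-steps that did useful work under the same model."""
    useful = sum(steps_per_ray)
    issued = 0
    for i in range(0, len(steps_per_ray), warp_size):
        warp = steps_per_ray[i:i + warp_size]
        issued += max(warp) * len(warp)
    return useful / issued


if __name__ == "__main__":
    # Same four rays, tiny warps of two lanes: grouped by cost vs. interleaved.
    grouped = [10, 10, 40, 40]
    interleaved = [10, 40, 10, 40]
    print(warp_cost(grouped, 2), simd_utilization(grouped, 2))          # 50 1.0
    print(warp_cost(interleaved, 2), simd_utilization(interleaved, 2))  # 80 0.625
```

In this model the order in which rays happen to land in warps changes the cost even though no hardware reordering takes place, which is one way a measured "boost" could come from incidental coherence in the test setup alone.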
> As i understood it, they saw speed up when making paths like: generation->hit->hit shaders, but slower when doing just a loop inside a single generation shader.
And they saw that "speedup" even for primary rays only, too. So no loop was involved, which indicates something bogus about their test setup. There were also some really weird speedups when going from secondary to tertiary rays, once again reaching the performance level of fully coherent primary rays?!
I'd say area lights and shadows have highest priority
> because ray tracing does little to make soft shadows more efficient.
Huh? Why do you think so?