Thanks for all the backing, guys! Like I said, I don't take anything personally, and I see that what I've said seems to be quite polarizing, which is not bad. But all of this has also led to some self-doubt.
So I looked up some more RadeonRays kernels, but my impression does not change. Basically I rule them all out early because they use a binary tree, which results in jumping around in memory like crazy, and GCN performs badly with that. NV is much more forgiving of bad memory access patterns.
(Unrelated info: NV is also more forgiving of unoptimized code. Or, if you prefer: AMD rewards optimization much more.)
I don't say the code here is unoptimized or bad, but in my opinion a binary tree is the worst choice. Using a tree with a larger branching factor (e.g. 8 or 16 children per node) allows reading the child nodes from coherent memory, and also processing them in parallel if desired. The tree also has far fewer levels, which limits divergence. (A rough sketch of what I mean follows below.)
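To illustrate, here is a minimal sketch of my own (not RadeonRays code; the node size and field layout are made-up assumptions): with a branching factor of 8, all child bounds sit in one contiguous block, so they can be fetched with coherent reads and tested in parallel.

```cpp
#include <cstdint>

// Hypothetical 8-wide BVH node: all child AABBs live in one contiguous block,
// stored structure-of-arrays style, so a single coherent fetch covers them all
// and the 8 ray/box tests can run in parallel.
struct BVHNode8
{
    float minX[8], minY[8], minZ[8];
    float maxX[8], maxY[8], maxZ[8];
    uint32_t child[8];    // child node index, or a leaf / primitive reference
    uint32_t childCount;  // number of valid children (<= 8)
    uint32_t pad[7];      // keep the node a power-of-two size for aligned loads
};

// With 8 children per node the tree has roughly log8(N) levels instead of
// log2(N), so traversal needs fewer dependent memory fetches and rays in a
// wavefront diverge less.
```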
Add this to my previous suggestion (which can reduce bandwidth by a factor of up to 64!), and you see why I am not impressed if we are talking about realtime RT. I would say RadeonRays is 'high performance', but I would not say it is 'realtime'.
That's just my personal opinion. And my criticism is only a response to you mentioning RadeonRays in the context of realtime RT or even hardware acceleration, which makes no sense to me. RR is perfectly fine for content creation, because it does not require Cuda.
My personal pessimism and doubt about AMD's raytracing experience may be similarly out of place! It's not a fact, just a personal guess! AMD has managed to surprise by beating their competitors more than once.
The second point, vendor compute performance, is something I can't prove, but I see it appears exaggerated even to many experienced programmers.
Still, I'll repeat some points: the most recent GPUs are not included, and GCN needs more optimization work and careful design of memory access patterns (some pitfalls easily go unnoticed).
Further, some personal impressions: the game industry has still not learned to utilize compute - they think in triangles and pixel shaders. Other industries have already been won over by NV's Cuda and have no need to optimize.
This is why we see the insane compute power of GCN so rarely. But it's there, and my numbers are real. Notice that all my optimizations work for NV too. I do not optimize exclusively for AMD, and I maintain different codepaths for both vendors where the best choices differ. (Which luckily seems no longer necessary with more recent GPUs.)
If I wanted to criticize myself, I would really pick other points, most likely:
Accusing NV of going black box and fixed function to protect their RT lead, at the cost of limiting general progress and innovation. <- Why did nobody react to this? That may be a really exaggerated insinuation. But instead you go wild over my performance analysis, which is real (though you have to average it with other people's numbers).
Also, my apologies to Bruce Bell. That was really out of place.
... I'm probably wrong about other things too. I'm often wrong, like everybody else.
I was not aware that what I had seen was not real. It showed Tensor and RT cores taking the same area as the shader cores.
So you're right, and I may have drawn wrong conclusions.
Agreed about triangles, but not because they are state of the art - they are simply the most efficient way to approximate geometry in practice. (The exception is something diffuse like a branchy bush with many leaves.)
But I disagree with the optimism in the rest of your comment.
You are just wrong: the core of rasterization (the ROPs) is still fixed function. Can you draw a curved triangle, or do occlusion culling while rendering back to front like Quake does? No, you can't. All you can do is early Z and occlusion queries, and both require rasterizing the entire triangle.
Now you can argue that's no problem - today we cull stuff at larger granularity etc. - and you are right.
But raytracing is different. Rasterizing a triangle is simple enough that you can select one of two possible options, make it fixed function, and be done with it. Raytracing, however, is still an open problem, on both CPU and GPU. Now all research on this open problem is entirely in the hands of a profit-oriented minority.
Maybe that's just the kind of specialization our time demands, but maybe it is simply too early to close this topic to public research.
In any case, I doubt the core will ever become programmable. The harm may have already happened and may be irreversible. We can not be sure about that.
About SDF, well... we can't compare it to anything else discussed here. If we want RT GI, I personally think we have to rule it out, together with Voxel Cone Tracing, Light Propagation Volumes, etc.
The problem is that if we talk about lighting surfaces, doing it with a volume data structure requires more memory and more samples, no matter how good your compression is.
Also, volume data appears attractive because it lets you implement simple algorithms, and simple is good. But the truth is that it's just brute force. Sphere tracing is brute force, and memory is limited and slow. That's no good choice if we need to relate every point in space to every other point and perform a visibility test in between. We can not solve an O(n^3) problem with brute force. And even if we could, we should choose the better approach just to save energy.
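To make the 'brute force' point concrete, here is a minimal sphere tracing sketch (my own generic illustration, not code from Claybook or any project discussed here; the hard-coded sphere SDF and step limit are placeholder assumptions). Every visibility query marches the distance field step by step, and in a real volume each of those steps is another slow memory fetch:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3 mul(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }

// Placeholder scene: signed distance to a unit sphere at the origin.
// In practice this would be a lookup into a (limited, slow) volume texture.
static float sceneSDF(Vec3 p)
{
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
}

// Returns the hit distance along the ray, or tMax if nothing was hit.
static float sphereTrace(Vec3 origin, Vec3 dir, float tMax)
{
    float t = 0.0f;
    for (int i = 0; i < 128; ++i)   // cost grows with the number of samples per ray
    {
        float d = sceneSDF(add(origin, mul(dir, t)));
        if (d < 0.001f)
            return t;               // close enough to the surface: hit
        t += d;                     // step forward by the safe distance
        if (t > tMax)
            break;
    }
    return tMax;                    // treated as a miss
}
```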
That's a personal opinion - I've failed with volume data approaches, while others are still working on them and achieving results. Personal, and not meant as criticism of Claybook or SDF in general. I'm only talking about their application to full GI.
For example, I like this work here, which seems to be a volume based diffusion approach:
I've experimented with this too, years ago, but the problem is: with reduced volume resolution, light leaks like crazy. Volume data is not a good approximation at low resolution (you can not express multiple walls within a single voxel - not even a single wall well). Voxel Cone Tracing has the same problem, and SDF too.
Surfels can be tuned to cause overocclusion instead, which is acceptable, and there is no global spatial limitation like a grid. At some point every approach breaks down, hopefully at a distance far enough from the camera. (See the Many LODs paper I've mentioned in the other thread if you're interested:
)
So, that's it. I need to continue working now - doing introductions to GI costs too much time...
... see ya!