> I fail to see how research just using compute is blocked.
> Turing supports every feature of Volta.
In my particular case I already have a BVH of surfels in place, which works fine for GI and rough reflections (for the latter I store 4x4 or 8x8 env maps for each texel of a lightmap and update them in real time, similar to Many LoDs; SH etc. would work as well).
For GI this is all fine: fully dynamic large scenes, infinite bounces, area lights from emitting surfaces, and area shadows, all for free. I calculate visibility with RT in compute efficiently, and there is no need for RTX here. The algorithm has a time complexity of pretty much O(n), which is remarkable for GI.
But surfels have a spatial maximum of detail (targeting 10 cm for current gen), and env maps have limited angular precision because of their low resolution.
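For concreteness, the data layout could look roughly like this; names, sizes, and packing here are just illustrative assumptions, not the actual implementation:

```cpp
// Hypothetical layout, for illustration only -- names and sizes are assumptions.
#include <cstdint>

// One surfel: a disc-like surface sample used for GI gathering.
struct Surfel {
    float position[3];      // world-space center
    float normal[3];        // orientation
    float radius;           // spatial footprint (target: ~10 cm)
    uint32_t lightmapTexel; // index of the lightmap texel this surfel feeds
};

// Per lightmap texel: a tiny environment map of incident radiance.
// 4x4 or 8x8 directions are enough for GI and rough reflections,
// but too coarse for sharp ones -- hence the limited angular precision.
constexpr int ENV_RES = 8;
struct TexelEnvMap {
    float radiance[ENV_RES * ENV_RES][3]; // RGB per direction
};
```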
So I want some classical RT as well to add high-frequency details, mainly sharp reflections, and maybe some hard shadows. (Shadow maps, or just living with the soft GI shadows, are options too.)
I'm optimistic at this point that a very big step towards photorealism is possible with current hardware. RTX can do well all the things that I cannot, and the other way around.
In the future one might want to add some local high-frequency GI; eventually my stuff could still serve as a fallback after the 2nd or 3rd hit within a path tracer, and after that my work would no longer be needed.
The natural way to add sharp reflections now would be to use classical RT against triangles close to the ray origin, fall back to my surfels at some distance (because there we could solve the LOD problem as well), and fall back to the env map at large distance.
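Sketched in code, the cascade could look like this; the distance thresholds and helper functions are placeholders, not real code:

```cpp
// Illustrative sketch of the three-level fallback; thresholds and helpers
// are hypothetical stand-ins.
#include <cstdio>

struct Ray { float origin[3]; float dir[3]; };

// Stubs standing in for the real tracers.
bool traceTriangles(const Ray&, float /*tMax*/) { return false; } // classical RT near the origin
bool traceSurfels(const Ray&, float /*tMax*/)   { return false; } // surfel BVH, LOD for free
void sampleEnvMap(const Ray&) { std::puts("env map fallback"); }  // per-texel env map, coarse angular res

void traceSharpReflection(const Ray& ray) {
    constexpr float kTriangleRange = 5.0f;   // assumed: exact triangles up to here
    constexpr float kSurfelRange   = 50.0f;  // assumed: surfels up to here
    if (traceTriangles(ray, kTriangleRange)) return; // sharp, exact hit
    if (traceSurfels(ray, kSurfelRange))     return; // coarser surfel hit
    sampleEnvMap(ray);                               // distant: limited angular detail
}
```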
As it is now, to do so I have to maintain two BVHs, and the fallback to surfels cannot utilize sharing of my custom data in LDS across multiple rays within a DXR custom intersection shader.
It is thus possible that a compute-only solution could handle the tracing more efficiently, even in comparison to RTX (accepting the limitation of 'no custom shaders for triangle hits', which makes sense in my case, where low-res shaded textures are already available).
But likely I won't try this, because competing with a hardware solution just sounds dumb. So (after swallowing my anger) I will just use RTX as you intend, and I will discontinue my 'research' in this direction.
In contrast, if there were no RT cores I had to utilize, experimenting with all this would still make sense, and I might end up with a solution that is more efficient.
Now, seeing RT cores struggle a bit to deliver, I'm frustrated, and the whole RTX thing appears wrong to me again... it's an emotional issue.
I feel guilty and sorry for the noise. But because game developers take smaller steps while making games, I also feel the need to bring up such topics right now, assuming I am indeed ahead with GI. (Not sure yet; my need for seamless global parametrization is a big downside. Making an automated tool for it is hard, and only after that can I test my stuff on real game geometry.)
So I hope the above example makes sense to you. Getting heard is all I want.
But now to another point, and I think this one is more important, not only for me:
We finally need a way for GPUs to generate their own work! Please bring this to game APIs. I'm fine with vendor extensions.
Stacking up large brute-force workloads, as is common in games, does not work for everything.
My command buffers are full of indirect dispatches and barriers that do zero work at runtime, and the cost sums up.
Async compute can compensate, but the sync between multiple queues is too expensive to make it a win for small planning workloads. (On GCN; I never tried it on NV.)
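To illustrate the pattern (Vulkan host code; the pipeline handles and buffer offsets are placeholders): a pre-recorded chain of indirect dispatches, each separated by a barrier, where every dispatch and barrier is paid for even when the GPU-written arguments turn out to be (0,0,0) at runtime.

```cpp
#include <vulkan/vulkan.h>

// Chain of small planning passes: each pass writes the dispatch size of the
// next one into 'indirectArgs'. Even empty passes stay in the command buffer.
void recordPlanningPasses(VkCommandBuffer cmd, VkBuffer indirectArgs,
                          const VkPipeline* pipelines, uint32_t passCount) {
    VkMemoryBarrier barrier = {};
    barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT |
                            VK_ACCESS_INDIRECT_COMMAND_READ_BIT;

    for (uint32_t i = 0; i < passCount; ++i) {
        vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipelines[i]);
        // Work size was written by the previous pass; it may well be zero,
        // but the dispatch is recorded and executed regardless.
        vkCmdDispatchIndirect(cmd, indirectArgs,
                              i * sizeof(VkDispatchIndirectCommand));
        // Each pass must wait on the previous pass's output and indirect
        // arguments -- this barrier is also paid for empty work.
        vkCmdPipelineBarrier(cmd,
                             VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                             VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT |
                             VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,
                             0, 1, &barrier, 0, nullptr, 0, nullptr);
    }
}
```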
Launching compute shaders from compute shaders is what we need.
So whatever you have, please expose it. VK device generated command buffers are nice, but with no barriers it's useless for me.
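For comparison, CUDA's dynamic parallelism already offers exactly this: a kernel can size and launch a child kernel entirely on the device, so an empty pass simply launches nothing. A minimal sketch (the kernels and work-counting logic are made up; compile with nvcc -rdc=true):

```cuda
// Parent kernel inspects the amount of work it produced and launches a
// child kernel sized to match -- no host round-trip, no zero-work dispatch.
__global__ void childKernel(int workItems) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < workItems) {
        // ... process work item i ...
    }
}

__global__ void parentKernel(const int* workCount) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        int n = *workCount;                       // work decided on the GPU
        if (n > 0) {                              // zero work: launch nothing at all
            int threads = 256;
            int blocks = (n + threads - 1) / threads;
            childKernel<<<blocks, threads>>>(n);  // device-side launch
        }
    }
}
```

This is the shape of functionality that would need exposing to compute in the game APIs.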
Seeing one additional shader stage introduced every year, but no progress here, really is frustrating. (Or did I miss something? I did no GPU work during the last year.)
> Why do you keep repeating this?
> DX8, 9, and 10 are different pipelines.
> DX RT has the DX12 binding model, and it's the first time GPGPU controls work creation, data expansion, work scheduling / distribution / termination, etc. for fixed function units.
I repeat myself because I never know if people understand what I mean. I'm new here and don't know anyone or what they do, and I'm no hardware expert or enthusiast either.
What I want is this functionality exposed to compute shaders in Vulkan; I would switch to DX12 as well to get it. But CUDA is no option, and neither is OpenCL 2.0.
RTX shows there is really fine-grained work scheduling under the hood, but I need this to be available in compute.