Next gen lighting technologies - voxelised, traced, and everything else *spawn*

We don't get what we need, which is first and foremost solutions for unsolved problems in primary casts (area integrals). It's bad to propose an algorithm to end-users which is so processing-hungry that it devours the wallets of generations of consumers to come. Maybe you've heard of Jevons' paradox: the technological advance in this case isn't used to let better algorithms run faster (or become practical), it's used to run more of the inferior algorithms. Fixed-function hardware can be much faster and much more energy efficient. Please Nvidia, just give us some generalized fixed-function compute blocks for notorious real-world problems: look at the math and the code, generalize it, pick the winner, and ship it. Raytracing ain't that, IMHO.

Exactly. What I criticize about current RT hardware and API is mainly the black-boxed BVH traversal. I would prefer a programmable approach, so it could be used for tasks other than RT and adapted to other rendering algorithms as well; examples would be broadphase collision detection or point-based GI methods. Have fun implementing your own BVH in compute even on a GTX 20X0 if you want to do something like that. Result: two BVH implementations, one in software and one in hardware, running on the same chip. It just makes no sense.

As it is now, it's primarily to the advantage of NV, not ours. It opens up new possibilities, yes. But it locks out a lot of others by spending chip area that could have gone to more flexible GPGPU.

Personally I hope AMD and Intel won't follow, and will implement RT on top of compute instead. In the long run this would result in better hardware and software.
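To make the reuse argument concrete, here is a minimal host-side sketch (hypothetical types and node layout, illustrative only) of one BVH serving two different queries - a ray cast and a broadphase overlap search - purely by swapping the traversal predicates. This is exactly the kind of flexibility a black-boxed traversal unit cannot expose:

```cuda
// Host-side sketch: one BVH, two different traversals. With a programmable
// traversal stage this is trivial; with a black-boxed BVH the overlap query
// has to be rebuilt in software against a second tree.
#include <functional>
#include <vector>

struct AABB { float lo[3], hi[3]; };

struct BvhNode {
    AABB box;
    int  left  = -1;   // child indices, -1 if absent
    int  right = -1;
    int  prim  = -1;   // primitive index if this is a leaf
};

// Box vs. box, for broadphase pair searches.
inline bool overlaps(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i)
        if (a.hi[i] < b.lo[i] || b.hi[i] < a.lo[i]) return false;
    return true;
}

// Ray (slab test) vs. box, for ray casts.
inline bool hitsBox(const AABB& b, const float o[3], const float invDir[3], float tMax) {
    float t0 = 0.f, t1 = tMax;
    for (int i = 0; i < 3; ++i) {
        float tA = (b.lo[i] - o[i]) * invDir[i];
        float tB = (b.hi[i] - o[i]) * invDir[i];
        if (tA > tB) { float tmp = tA; tA = tB; tB = tmp; }
        if (tA > t0) t0 = tA;
        if (tB < t1) t1 = tB;
    }
    return t0 <= t1;
}

// Generic traversal: 'enter' decides which subtrees to descend into,
// 'leaf' does the per-primitive work. Ray casting, broadphase collision,
// point-based GI gathers and frustum culling all fit this one shape.
inline void traverse(const std::vector<BvhNode>& nodes,
                     const std::function<bool(const BvhNode&)>& enter,
                     const std::function<void(int)>& leaf) {
    int stack[64]; int sp = 0;          // fixed stack depth, fine for a sketch
    stack[sp++] = 0;                    // node 0 is the root
    while (sp > 0) {
        const BvhNode& n = nodes[stack[--sp]];
        if (!enter(n)) continue;
        if (n.prim >= 0) { leaf(n.prim); continue; }
        if (n.left  >= 0) stack[sp++] = n.left;
        if (n.right >= 0) stack[sp++] = n.right;
    }
}

// Ray query:       traverse(nodes, [&](const BvhNode& n){ return hitsBox(n.box, org, invDir, tMax); },
//                                  [&](int prim){ /* exact ray/triangle test */ });
// Broadphase pair: traverse(nodes, [&](const BvhNode& n){ return overlaps(n.box, queryBox); },
//                                  [&](int prim){ /* record potential collider pair */ });
```

Same tree, same walk - only the predicates change.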
 
I had floated the idea of a more generic memory-traversal processor in one of these discussions. I've no idea how one would be architected, if it's possible at all, but that would open up a far wider range of spatial representations, including cone tracing.
 
I only compare RT against what we currently have in rasterizer space. Proposing raytracing as a "superior" alternative to existing practical or impractical algorithms which are analytic and/or integral based is poor. Calling raytracing hardware innovative in a sector where you can't smash the problem with your own scale is poor.

If you come forward and say that you solved the geometry submission problem (overdraw, z-buffer, etc.) by using a BVH, that's honest and fine. If you say, hey, now you can have irregular z-buffers, that's fine. And then also saying, well, we're at 20% (or whatever the exact number is, because of economy of silicon, concurrency and non-coherence) of the speed of what a rasterizer can do for the screen, and you're free to ask for any other rays which are not raster-based - that's very nice. But because it's 20% of the speed, it's kind of academic at the moment. They put a BVH in there and it's not going to get faster anytime soon (they already pulled their trump card, the highly optimized OptiX implementation), and you can't easily give the chip 5x more silicon. Non-coherent memory access is the specter that haunts memory hierarchies - pointer chasing, of all things - and then GDDR6, which is optimized for long bursts of continuous I/O, with horrible latency.

Where is the scaling supposed to come from, just to get to parity with rasterization? I'll immediately drop my skepticism if I'm shown a way around some really hard-to-beat universal problems in computing.
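To put a rough number on the incoherence concern - every constant below is an illustrative assumption, not a measurement - just the BVH node traffic of one incoherent ray per pixel already eats a large slice of a GDDR6 card's bandwidth:

```cuda
// Back-of-envelope host-side arithmetic. Every constant is an assumption
// chosen for illustration, not a measured value.
#include <cstdio>

int main() {
    const double rays_per_frame = 1920.0 * 1080.0;  // one ray per pixel at 1080p
    const double nodes_per_ray  = 30.0;             // assumed BVH nodes visited per ray
    const double bytes_per_node = 64.0;             // assumed node size (one cache line)
    const double fps            = 60.0;

    const double bytes_per_frame = rays_per_frame * nodes_per_ray * bytes_per_node;
    const double gb_per_second   = bytes_per_frame * fps / 1e9;

    // Roughly 4 GB of node traffic per frame, ~240 GB/s at 60 fps: a large
    // fraction of a GDDR6 card's total bandwidth, before shading, textures,
    // triangle tests or secondary rays are counted - and caches only help
    // while the rays stay coherent.
    std::printf("node traffic: %.1f GB/frame, %.0f GB/s at 60 fps\n",
                bytes_per_frame / 1e9, gb_per_second);
    return 0;
}
```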

Even in limited form it already achieves much better results than rasterization alone.

I'll take blurry but accurate reflections over the mess that is cubemaps+SSR. Then again BFV's RT reflections look pretty good.

Also, this isn't just naive RT. Algorithms tailored specifically for real time will be different than offline ones.

RT is here to stay, I suggest you just accept it.
 
We don't get what we need, which is first and foremost solutions for unsolved problems in primary casts (area integrals). [...] Please Nvidia, just give us some generalized fixed-function compute blocks for notorious real-world problems: look at the math and the code, generalize it, pick the winner, and ship it. [...]

Evolution does not work like this. There is no solution which would solve everything on the first try (btw, the final gaming resolution is not the same for everyone). Intermediate steps are needed. Even the rasterizer still gets new techniques like tile caching or mesh shading. It's a constant exchange between software and hardware developers: on the one hand, the way game developers use the features influences the hardware; on the other hand, the existing hardware features make new things possible for the software developers. Neither can do without the other, but in the end only the hardware manufacturer can get this process started. DXR didn't come from nowhere. Hardware and software developers have known about it for a long time, and this had an influence on API design etc.

Every technique is an intermediate step, because everything is in motion. See the rasterizer examples above. The usefulness of a technique is decided by the developers, and over time we will see how they use these things. But without having something, they can't improve it either.
 

Yeah, and the world is all good and everything is beautiful and shiny. :)

AFAIK evolution happens through interaction and selection among all the individual species.
Here we have just a single company dictating the future of graphics development. (I don't blame them for being successful.)
Did they ask you, or game developers, or other vendors about your / their visions for RT? I doubt it. They say they have a decade of experience with RT, so they know enough - that's all. And with their current lead in game benchmarks they can afford to spend some chip area on the higher goal of securing that lead, or better yet, winning the cloud gaming market which will supersede consoles.
That's not evolution, that's capitalism. But I want to discuss neither biology nor politics here.

What I agree with is that, in the best case, after other vendors implement RT too, it will open up over the years and become programmable. We have seen this already with fixed function -> vertex and pixel shaders, and finally something really useful: GPGPU. (Thanks for that at least, NV! ;)
But how many years did it take? How many years until you saw bump mapping again after the software renderer in the game Outcast? Same question: did early GPUs accelerate or decelerate innovation and progress?
The price of hardware acceleration is high if it is this restricted and tuned to just one application.
But to see this you have to go beyond just implementing the stuff NV has shown you at GDC. You have to progress beyond known or classic algorithms as well. Only then do you see what's missing.

To be clear: I don't have anything against raytracing or hardware acceleration - it's welcome. I only dislike the extremely restricted implementation.
Also, it makes me angry that rays can launch rays, but compute shaders still can't launch compute shaders - although other APIs have supported this for years. I hope this will change soon now, at least.
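(For reference on that last point: CUDA has allowed device-side kernel launches - "dynamic parallelism" - since compute capability 3.5. A minimal sketch, with hypothetical kernel names, compiled with -rdc=true:)

```cuda
#include <cstdio>

// Hypothetical child kernel doing finer-grained work for one tile.
__global__ void refineTile(const float* data, int tile) {
    printf("refining tile %d, thread %u\n", tile, threadIdx.x);
    (void)data;
}

// Parent kernel: a device-side condition decides whether more work is
// needed and launches the child kernel itself, with no CPU round trip.
// This is the compute analogue of a ray shader spawning more rays.
__global__ void classifyTiles(const float* data, int numTiles) {
    int tile = blockIdx.x * blockDim.x + threadIdx.x;
    if (tile >= numTiles) return;

    bool needsRefinement = (tile % 4 == 0);   // placeholder criterion
    if (needsRefinement)
        refineTile<<<1, 32>>>(data, tile);
}

// Host side: classifyTiles<<<(numTiles + 63) / 64, 64>>>(d_data, numTiles);
// Build with: nvcc -rdc=true -arch=sm_70 launch_from_device.cu
```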
 
Can you provide concrete examples of how DXR is an "extremely restricted implementation" of ray tracing?
 

I've already mentioned the inability to use the BVH for other purposes, like searching for broadphase pairs of potential colliders (physics acceleration),
or point-based GI (notice that there the tree and the geometry are the same thing - very efficient).
Trees have other applications as well, so it would make sense to offer a programmable data structure and traversal.

Another point is the custom intersection shader: basically this is just a fallback to make alternative geometry (voxels, SDFs, point hierarchies) possible at all, but there is zero acceleration.
Instead, your implementation is even more restricted than a compute shader would be for the same purpose. AFAIK intersection shaders have no inter-thread communication or access to LDS, so they are as dumb as pixel shaders.
Which means you probably have to implement your own mini BVH, but you can't do it efficiently, because there is no way to share work between threads, which is key to parallel raytracing.

This point applies to the whole implementation to some degree, because ray batching is left entirely to the vendor's implementation. Even if your situation would allow better batching than their generic approach (which is very likely), you have no control.

Notice that RTX only offers to use raytracing; it does not allow your own custom implementation. So there is no reason to think about improvements at all - they are just not possible. There is no feedback from developer to hardware vendor, or at least no contribution. Any progress in the field of raytracing becomes exclusive to the hardware vendor.
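To illustrate what "sharing work between threads" buys you, here is a simplified CUDA sketch (a flat box batch, not a full traversal; the layout and batch size are assumptions) where a whole thread block stages nodes in shared memory - the LDS - once, and every ray in the block reuses them. An intersection shader running one ray per thread in isolation cannot express this kind of cooperation:

```cuda
// Simplified sketch: a batch of node boxes is staged in shared memory once
// per block and reused by every ray in the block. Not a full BVH traversal.
struct Box { float lo[3], hi[3]; };

__device__ bool rayHitsBox(const Box& b, const float o[3], const float inv[3], float tMax) {
    float t0 = 0.f, t1 = tMax;
    for (int i = 0; i < 3; ++i) {
        float a = (b.lo[i] - o[i]) * inv[i];
        float c = (b.hi[i] - o[i]) * inv[i];
        t0 = fmaxf(t0, fminf(a, c));
        t1 = fminf(t1, fmaxf(a, c));
    }
    return t0 <= t1;
}

__global__ void countBoxHits(const Box* nodes, int numNodes,
                             const float* rayOrg, const float* rayInvDir,
                             int numRays, int* hitCount) {
    __shared__ Box cache[128];                     // the "LDS" part
    const int  ray    = blockIdx.x * blockDim.x + threadIdx.x;
    const bool active = ray < numRays;

    float o[3] = {0, 0, 0}, inv[3] = {1, 1, 1};
    if (active)
        for (int i = 0; i < 3; ++i) {
            o[i]   = rayOrg[3 * ray + i];
            inv[i] = rayInvDir[3 * ray + i];
        }

    int hits = 0;
    for (int base = 0; base < numNodes; base += 128) {
        const int n = min(128, numNodes - base);
        // The whole block cooperates on loading one batch of boxes...
        for (int i = threadIdx.x; i < n; i += blockDim.x)
            cache[i] = nodes[base + i];
        __syncthreads();

        // ...and every ray reuses the staged data instead of refetching it.
        if (active)
            for (int i = 0; i < n; ++i)
                hits += rayHitsBox(cache[i], o, inv, 1e30f) ? 1 : 0;
        __syncthreads();
    }
    if (active) hitCount[ray] = hits;
}
```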
 

Can you provide proof of these statements? My understanding is that you can use ray-box testing as a (quick) first pass and then follow up with custom intersections. The idea is to reject rays as quickly as possible before doing more expensive intersection calculations. It's not clear to me that this technique would be "slower" than using just compute shaders (and even if that were the case... you can just use compute shaders and use DXR where it makes sense! That's the best selling point of DXR: it integrates really well with "vanilla DX12"). I'll be honest, I don't really understand what you're saying (I don't mean that negatively; perhaps the issue is me :)).
 
RT is here to stay, I suggest you just accept it.

RT has been around for 40 years, and its complexity has been studied very well.
I totally accept RT as practical brute-forcing. And like many, I don't accept it as the favorite algorithm to bake into realtime silicon in the presented way.

The usefulness of a technique is decided by the developers ...

This is not right. Usefulness is not willed into existence; it's proven by performance. In the realtime domain you pick a point along the cost-benefit curve, also called the Pareto frontier, so you get the best result for the least cost. That is the logic by which screen-space methods emerged and prevailed.
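As a trivial sketch of that selection logic - with completely made-up quality scores and costs, purely for illustration - you keep the best result that still fits the frame budget:

```cuda
#include <cstdio>

// Purely illustrative host-side sketch; the scores and costs are invented
// placeholders, not measurements of any real renderer.
struct Technique { const char* name; float quality; float cost_ms; };

int main() {
    const Technique options[] = {
        { "cubemaps + SSR",         0.55f,  1.5f },
        { "raytraced reflections",  0.85f,  6.0f },
        { "path traced everything", 1.00f, 40.0f },
    };
    const float budget_ms = 4.0f;   // the reflection slice of a 16.6 ms frame

    const Technique* pick = nullptr;
    for (const Technique& t : options) {
        if (t.cost_ms > budget_ms) continue;          // off the affordable frontier
        if (!pick || t.quality > pick->quality) pick = &t;
    }
    std::printf("picked: %s\n", pick ? pick->name : "nothing fits");
    return 0;
}
```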
 
I've already mentioned the inability to use the BVH for other purposes, like searching for broadphase pairs of potential colliders (physics acceleration), or point-based GI. [...] Notice that RTX only offers to use raytracing; it does not allow your own custom implementation. [...]
The Turing whitepaper says the hardware acceleration can be used for collision detection and audio reverb simulation. Also, PICA PICA already supports point-based diffuse GI.

Also, it makes sense that RTX only supports triangle acceleration. Besides the hardware cost, 99.99% of games use triangles for rendering, which makes DXR relatively easy for developers to adopt. If it were voxels or something else, that would require extra developer time and testing.

I totally accept RT as practical brute-forcing. And like many, I don't accept it as the favorite algorithm to bake into realtime silicon in the presented way. [...]
RTRT isn't brute force. Hybrid rendering, AI importance sampling and denoising already show this. While quality is king in offline rendering, performance is the priority in realtime graphics. Devs can afford to cut corners here and there which allows for algorithms that would be unacceptable in the offline world due to artifacts, something gamers are already accustomed to.
 
What I criticize about current RT hardware and API is mainly the black-boxed BVH traversal. I would prefer a programmable approach [...] Personally I hope AMD and Intel won't follow, and will implement RT on top of compute instead. In the long run this would result in better hardware and software.

I do. Because grand solutions don't drop out of the sky. Flexibility in terms of programmability comes at a cost, in the form of more silicon and more ingenuity, if you want performance. The faster all three get on board, the faster they can all start iterating toward and competing with better designs.

There would be no point in AMD and Intel sitting around trying to build the perfect Swiss Army knife if Nvidia is going to swoop in and bludgeon them both to death with a rock tied to a stick.

LOL.
 
My understanding is that you can use ray-box testing as a (quick) first pass and then follow up with custom intersections. [...]

Yes, for the box test. DXR does accelerate traversing the bounding boxes (which often takes 90% of the time needed to find intersections in practice, so the boxes are the most expensive part and "quick" only relative to the work they save).
But as soon as you hit the box, there is no way to proceed with compute shaders, because the custom intersection shader is called from, and returns to, the raytracing pipeline, not the compute pipeline.
The intersection shader then runs one ray per thread in isolation. This rules out any kind of parallel algorithm within the custom box, and rendering the custom data to the frame buffer with compute shaders instead would exclude it from reflections or raytraced shadows.

Although without control over batching, parallel algorithms make little sense anyway, so this is a minor limitation compared to the black-boxed BVH data and traversal.
Efficient tree traversal is very difficult on a GPU. Having it but not being able to use it really hurts.

"Integrating well with the vanilla API" more likely means that DXR lets you reuse your material shaders for the raytracing almost without changes. That does make DXR very easy to integrate.
 
The Turing whitepaper says the hardware acceleration can be used for collision detection and audio reverb simulation. Also, PICA PICA already supports point-based diffuse GI. [...] If it were voxels or something else, that would require extra developer time and testing.

Collision detection is two things: first, building pairs of potential colliders (neighbour searching, the broadphase - impossible with RTX); second, finding exact intersections, where raytracing is involved (RTX can be utilized, but in practice only for particles, I would say - not rigid bodies). For audio, yes.

PBGI in PICA PICA is very coarse and brute force. Nice, but not what I'm talking about - I mean something like what was seen in the Pirates of the Caribbean movies. I aim for path-traced quality and will likely use RTX 'just' for reflections and shadows, which is fine.

For games it is worth spending extra developer time to solve GI; for movies it's not. There you use path tracing, which is slow but simple and can do it all - it's cheaper to spend your money on more hardware.
Actually, maintaining RTX and non-RTX paths costs additional dev time for the next 5 years at least, so this argument would not hold even if RTX delivered photorealism right now. We won't see in games anytime soon the great simplification that path tracing brought to offline rendering.
 
There would be no point in AMD and Intel sitting around trying to build the perfect Swiss Army knife if Nvidia is going to swoop in and bludgeon them both to death with a rock tied to a stick.

They could also decide to boycott it for some years until they are able to compete.
Also, I've read somewhere that a GTX 20X0 is in practice only 4 times faster at RT than the previous generation, and AMD's compute performance beats NV's by a factor of two. So a software implementation on compute is all they can do for the coming years anyway, and it might improve better than expected.

We have some time to push for improvements. Better to start early with that than to be fanboys and accept everything they praise.
 
PBGI in PICA PICA is very coarse and brute force. Nice, but not what I'm talking about - I mean something like what was seen in the Pirates of the Caribbean movies. I aim for path-traced quality [...]
Doesn't have to be as good as movies. Denoised path tracing at 1spp already looks 10 times better than current game graphics.
 
According to whom?
Take a large compute project with dozens of different shaders, optimize for both vendors, test on multiple GPUs, write down the milliseconds, sum them up, divide, and calculate perf per dollar.
My last testing dates back to GTX 1070 vs. Fury, so 20X0, Polaris and Vega are excluded. Public benchmarks vary wildly, but for me the factor is quite consistent across all shaders.
Traditionally AMD has better compute performance and NV has better rasterization performance. The older the GPU, the larger the difference in compute: early GCN beats Kepler by a hard-to-believe factor of five.

No better proof than that, but... is it really necessary to "prove" every word? I couldn't even post a link yet. I'm used to talking to other developers on forums, and the kind of doubt here is really new to me.
Again: my goal is to improve at least the API through criticism, not to start useless vendor flame wars.
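For what it's worth, the comparison I describe is nothing more than this (host-side sketch, with placeholder timings and prices, not benchmark results):

```cuda
#include <cstdio>
#include <vector>

// Host-side sketch of the comparison described above. Timings and prices
// are placeholder values, not benchmark results.
struct GpuRun {
    const char*         name;
    double              price_usd;
    std::vector<double> shader_ms;   // one measured time per compute shader
};

int main() {
    const std::vector<GpuRun> runs = {
        { "vendor A card", 400.0, { 1.2, 3.4, 0.8 /* ...dozens more... */ } },
        { "vendor B card", 450.0, { 1.0, 2.9, 1.1 /* ... */ } },
    };

    for (const GpuRun& r : runs) {
        double total_ms = 0.0;
        for (double ms : r.shader_ms) total_ms += ms;          // sum up
        const double perf = 1.0 / total_ms;                    // higher is better
        std::printf("%s: %.2f ms total, perf/$ = %.6f\n",
                    r.name, total_ms, perf / r.price_usd);
    }
    return 0;
}
```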
 