Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Many people still think RTX is too immature / too slow to be worth it yet.

Many also think it's worth it, and fast all things considered. Being able to run DXR in modern, very good-looking games, at 1440p or even higher, at frame rates above AAA titles on consoles, is quite acceptable. The only real problem so far is price. While that has gotten better, I still think prices are too high. No one would mind a 2080 Ti at more reasonable prices. Say what you want about RTX, but I personally think Nvidia created a GPU with a true rasterization performance boost over Pascal, with the addition of RT hardware among other things.
Next year is going to be interesting, with RTX 3000, AMD's Navi 2.0, and possibly Intel GPUs competing with each other.
 

What he's trying to explain there is a fundamental limitation of RTX: there is no LOD support, because it is not flexible enough. The Intel proposal is to add a programmable step. If AMD's ray tracing is anything like what's in the AMD patent, it will be more flexible for traversal.

What @JoeJ is trying to explain is that because of this RTX weakness, it will be difficult to use traversal shaders, since they will not be available on one of the RT architectures. It means writing one ray tracing path for the next-generation consoles and probably AMD and Intel GPUs, a totally different solution for RTX, and, at least for a few years, a third solution for GPUs without RT hardware.

At least studios making next-generation console exclusives will not have this problem.
 
If AMD's ray tracing is anything like what's in the AMD patent, it will be more flexible for traversal.
What exactly in AMD's patent tells you that such h/w would be more flexible for traversal calculations than what NV has?
What exactly stops NV from making such calculations in the same way it is proposed to be done in this AMD patent (on general FP SIMD processors which any RTX card has plenty of available)?
 
Are traversal shaders tied to the 32-bit snorm format?

I am asking because, according to an obscure discussion thread on Anandtech's forums, a member who is known to be a developer made some posts about the hardware RT in consoles. He said that consoles will support a 32-bit snorm format and have a more programmable traversal stage; he claims current DXR hardware lacks both, and so it will be at a disadvantage moving forward, requiring new DXR hardware. His points are contested in the thread, though, and he never explained the usefulness of the snorm format, so I don't know how accurate his argument is.

There are three problems with the current implementation.
- It doesn't support the 32-bit snorm format, because the fixed-function hardware is not designed around it. This is a huge limitation; it can cost too much performance and memory, and the supported 32-bit float format doesn't really give you better results.
- The acceleration structures used are not public, so performance can vary wildly depending on the scene. This needs to be solved.
- The ray traversal stage is extremely limited. It should be programmable.
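As an aside on the first point: he never explained why snorm matters, but my best guess is uniform fixed-point precision for storing geometry inside a known bounding box. A minimal C++ sketch of that idea, all names hypothetical; this is my illustration, not anything from his posts or any actual console format:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical illustration of a 32-bit snorm position format: coordinates
// are stored as signed fixed point relative to a known bounding box.
// Unlike IEEE float, the precision is uniform across the whole box, which
// is one plausible reason to prefer it for BVH/geometry storage.

// Encode one coordinate: [lo,hi] mapped to [-1,1], then to the full
// signed 32-bit integer range.
int32_t encodeSnorm32(float v, float lo, float hi)
{
    float n = 2.0f * (v - lo) / (hi - lo) - 1.0f;   // normalize to [-1,1]
    n = std::clamp(n, -1.0f, 1.0f);
    return (int32_t)std::lround(n * 2147483647.0);  // scale to int range
}

float decodeSnorm32(int32_t q, float lo, float hi)
{
    float n = (float)q / 2147483647.0f;             // back to [-1,1]
    return lo + (n * 0.5f + 0.5f) * (hi - lo);      // back to [lo,hi]
}
```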

He also claims that the Xbox and PS5 will have different pipelines for RT.
The PS5 has a very different RT solution compared to what is implemented in DXR. Even the Xbox has an upgraded pipeline.

https://forums.anandtech.com/threads/ray-tracing-is-in-all-next-gen-consoles.2571546/#post-39954331

He also says that consoles will do custom BVHs.

I can't say too much about this, but the next step will be the custom BVHs.
 
What exactly in AMD's patent tells you that such h/w would be more flexible for traversal calculations than what NV has?
What exactly stops NV from making such calculations in the same way it is proposed to be done in this AMD patent (on general FP SIMD processors which any RTX card has plenty of available)?

Nvidia themselves know RTX needs to improve with BVH construction and traversal.


From page 41 in this document.

Before joining Nvidia he was a researcher, and he wrote these two papers:

https://neil3d.github.io/assets/img/s2018/a169-viitanen.pdf

https://tutcris.tut.fi/portal/files/15830142/viitanen_1551.pdf

And he notes in the Nvidia paper that there is one architecture with BVH construction hardware: Imgtec's ray tracing technology. ;)

In the AMD patent it seems RT intersection is fixed function, but traversal is a shader pass.
 
Nvidia themselves know RTX needs to improve with BVH traversal.
P41 and on in the document you've linked talks about BVH building, which is done on the CPU and CUDA cores on RTX h/w at the moment. They are looking at their options for accelerating this with FF h/w in future RTX updates. Nothing to do with traversal, unless I've misunderstood you here.

In the AMD patent it seems RT intersection is fixed function, but traversal is a shader pass.
DXR has intersection shaders which are "less efficient" than the FF h/w (for triangles), but they are available. You can use them to control ray traversal on current DXR h/w. Everything in DXR is a shader pass.

What the AMD patent suggests is that instead of performing all the testing of a ray vs. the BLAS inside the RT core (Turing), they opt to perform the testing of a ray against each BVH node separately, controlling the overall traversal with the general SIMD processor. Turing h/w can handle such an approach too, I think; it's just not needed for the majority of ray traversals.
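To make that concrete, here's a rough C++ sketch of the split as I read the patent (all structures and names are hypothetical, not AMD's actual design): the shader owns the stack and the outer loop, and only the ray/box test stands in for the fixed-function unit.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <utility>

struct Ray     { float org[3], dir[3], tMax; };
struct BvhNode { float bmin[3], bmax[3]; uint32_t child[2]; bool leaf; };

// Stand-in for the FF ray/box intersection the patent would do in hardware
// (a plain slab test here).
static bool intersectBox(const Ray& r, const BvhNode& n)
{
    float t0 = 0.0f, t1 = r.tMax;
    for (int a = 0; a < 3; ++a) {
        float inv   = 1.0f / r.dir[a];
        float tNear = (n.bmin[a] - r.org[a]) * inv;
        float tFar  = (n.bmax[a] - r.org[a]) * inv;
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);
        t1 = std::min(t1, tFar);
    }
    return t0 <= t1;
}

// The programmable outer loop: because it is ordinary shader code, a
// "traversal shader" could change course at every node (e.g. enter a
// different LOD subtree) instead of blindly pushing both children.
void traverse(const Ray& ray, const BvhNode* nodes,
              const std::function<void(uint32_t)>& onLeaf)
{
    uint32_t stack[64];
    int sp = 0;
    stack[sp++] = 0u;                     // root node index
    while (sp > 0) {                      // outer loop on general SIMDs
        const uint32_t idx = stack[--sp];
        const BvhNode& node = nodes[idx];
        if (!intersectBox(ray, node))     // FF unit does the box test
            continue;
        if (node.leaf)
            onLeaf(idx);                  // FF triangle tests would go here
        else {
            stack[sp++] = node.child[0];
            stack[sp++] = node.child[1];
        }
    }
}
```

On Turing, as I understand it, this whole loop sits inside the RT core instead, which is exactly what the flexibility argument is about.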
 
P41 and on in the document you've linked talks about BVH building, which is done on the CPU and CUDA cores on RTX h/w at the moment. They are looking at their options for accelerating this with FF h/w in future RTX updates. Nothing to do with traversal, unless I've misunderstood you here.


DXR has intersection shaders which are "less efficient" than the FF h/w (for triangles), but they are available. You can use them to control ray traversal on current DXR h/w. Everything in DXR is a shader pass.

Being able to rebuild the BVH more often, and having access to the way it is generated, can help with LOD and animation too, or with corner cases that give ray traversal problems.

What makes you think AMD's RT hardware in PS5 will be different from what's in AMD dGPUs?

I never said it is different. If RTX is really not as flexible as AMD's or Intel's RT tech, it means exclusive games will not have to be coded for every RT architecture, only AMD's technology.
 
Being able to rebuild the BVH more often, and having access to the way it is generated, can help with LOD and animation too, or with corner cases that give ray traversal problems.
Well, yeah, but to be able to do that you'd need a FF h/w BVH builder - the opposite of the "flexibility" you're talking about. I'm also unsure why you think AMD's approach to RT acceleration as described in their patent would be any different in this regard than NV's. The patent doesn't seem to mention BVH building at all, only ray traversal through the BVH structure.
 
About Dreams: I see it has changed a lot. There are no tiny cubes but big ones to get the SDF, and raster HW is utilized even for culling. Something like that... meaning I was wrong here.
 
I'm also unsure why you think AMD's approach to RT acceleration as described in their patent would be any different in this regard than NV's.
The most interesting difference seems to be support for traversal shaders for LOD, because compute handles the outer traversal loop.
They also mention another optional FF unit to handle the outer loop without a need to utilize compute. This would surely be faster, but traversal shaders would not be possible.
Maybe it becomes standard across vendors to offer both.
 
The most interesting difference seems to be support for traversal shaders for LOD, because compute handles the outer traversal loop.
What's stopping current RTX h/w from handling BVH traversal in a similar fashion if such a need presents itself? It would be a bit of a waste of the RT cores' capabilities and would likely result in lower RT performance, but I don't see why it would be impossible.
 
What exactly stops NV from making such calculations in the same way it is proposed to be done in this AMD patent (on general FP SIMD processors which any RTX card has plenty of available)?
The assumption is that NV traversal and intersection are closed FF, with no option for external control over traversal from compute. Of course, we do not know.
I assume and hope they will offer it with Ampere.
Are traversal shaders tied to the 32-bit snorm format?
Could it be he means low-precision optimizations with 'snorm'? Somebody here on the forum said RTX is often criticized for using full precision, causing too much bandwidth. Maybe it's related to that, but it's likely unrelated to traversal shaders.
DXR has intersection shaders which are "less efficient" than the FF h/w (for triangles), but they are available. You can use them to control ray traversal on current DXR h/w. Everything in DXR is a shader pass.
Does not work for LOD, because you need to switch between LODs based on distance, so during traversal.
To emulate it on current RTX, the only option I see is to trace at fixed intervals like raymarching, and randomly move the ray to another space representing another LOD. The problem is you need to use short intervals because you do not know where the intersection happens - not practical, I guess. (EDIT: needs more thinking - the random LOD transition should be independent from the scene and intersections? Could work then... maybe even with practical perf. - not sure.)
In any case: stochastic LOD will hurt caching even more, because rays from neighbouring pixels no longer traverse the same LOD.
In the long run, progressive meshes could be better, which is why I want access to the BVH format, having the option for custom build and refit. That would help with any dynamic geometry, and having a HW BVH builder might close this door.
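For what it's worth, a refit (as opposed to a full rebuild) is the cheap part that access would expose: keep the tree topology and recompute bounds bottom-up after vertices move. A minimal CPU-side C++ sketch, with hypothetical structures:

```cpp
#include <algorithm>
#include <vector>

// Minimal sketch of a BVH refit: the topology is untouched, only bounds
// are recomputed bottom-up after animation. Names are hypothetical.

struct Aabb { float mn[3], mx[3]; };
struct Node { Aabb box; int left, right; int prim; };  // prim >= 0 => leaf

static Aabb merge(const Aabb& a, const Aabb& b)
{
    Aabb r;
    for (int i = 0; i < 3; ++i) {
        r.mn[i] = std::min(a.mn[i], b.mn[i]);
        r.mx[i] = std::max(a.mx[i], b.mx[i]);
    }
    return r;
}

// primBounds holds the current (animated) bounds of each primitive.
Aabb refit(std::vector<Node>& nodes, int idx, const std::vector<Aabb>& primBounds)
{
    Node& n = nodes[idx];
    n.box = (n.prim >= 0)
        ? primBounds[n.prim]                           // leaf: new prim box
        : merge(refit(nodes, n.left,  primBounds),     // inner: merge children
                refit(nodes, n.right, primBounds));
    return n.box;
}
```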
 
The assumption is that NV traversal and intersection are closed FF, with no option for external control over traversal from compute.
Nv's RT core traces a ray through the AS until it has a hit, which is returned to the RT shader. AMD's patent describes a system in which the RT shader controls the traversal through the acceleration structure by submitting a ray for traversal through each BVH node it encounters (?). Both systems have "closed off FF" traversal h/w and differ only in how this traversal through the AS is handled - AMD controls the traversal with general SIMDs in WGPs, NV controls this traversal with the RT core itself, freeing up SIMDs to do other work while it happens. Thus far it seems unclear whether AMD's approach would offer any kind of higher flexibility in controlling the traversal or is in fact a pure h/w complexity optimization (a "ray traversal core" without BVH selection logic can be simpler than NV's "RT core", at the cost of general compute performance).

Does not work for LOD, because you need to switch between LODs based on distance, so during traversal.
I'm not sure how AMD's described approach to controlling the BVH traversal would be better at this.

In either case, I'd expect whatever new RT h/w updates we're likely to get next year (meaning RDNA2 and Ampere) to be added into Win10 20H1 API updates already. And all of them seem to be fully compatible with Turing h/w so far.
 
AMD controls the traversal with general SIMDs in WGPs,
Which means a programmable shader can do this process, and the necessary flexibility is very likely possible.

To emulate it on current RTX, the only option I see is to trace at fixed intervals like raymarching, and randomly move the ray to another space representing another LOD. The problem is you need to use short intervals because you do not know where the intersection happens - not practical, I guess. (EDIT: needs more thinking - the random LOD transition should be independent from the scene and intersections? Could work then... maybe even with practical perf. - not sure.)

Thought about it - I'm pretty sure now it works.
Imagine a top-down view of a terrain with LODs, like the clipmaps approach.
To support LOD, cut the ray at a random point across each LOD transition and transform the ray into the proper space.
If the ray has no hit, you can even compact the survivors, solving the caching issue.

Likely the DXR 1.1 ability to launch rays from compute is helpful here, and so first-gen RTX can do stochastic LOD, maybe just as well as a full traversal shader would allow :D
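Roughly like this in C++ (my untested sketch of the above; names are hypothetical, and traceSegment() stands in for an actual trace against the per-LOD acceleration structure):

```cpp
#include <random>

// Sketch of stochastic LOD on fixed h/w traversal: trace the ray piecewise,
// one segment per LOD band, jittering each transition distance per ray so
// the switch is independent of the scene and of where intersections happen.

struct Ray { float org[3], dir[3]; };
struct Hit { bool valid; float t; };

// Placeholder: a real implementation would trace [tMin,tMax] of the ray
// against the BVH built for this LOD (e.g. via a DXR 1.1 ray query from
// compute).
static Hit traceSegment(const Ray&, float /*tMin*/, float /*tMax*/, int /*lod*/)
{
    return {false, 0.0f};
}

// trans[lod] is the nominal distance where LOD 'lod' hands over to lod+1.
Hit traceStochasticLod(const Ray& ray, const float* trans, int numLods,
                       std::mt19937& rng)
{
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float t = 0.0f;
    for (int lod = 0; lod + 1 < numLods; ++lod) {
        float tEnd = trans[lod] * (0.75f + 0.5f * u01(rng)); // jitter +-25%
        Hit hit = traceSegment(ray, t, tEnd, lod);
        if (hit.valid)
            return hit;   // hit within this LOD's band: done
        t = tEnd;         // miss: continue the ray in the next LOD's space
    }
    return traceSegment(ray, t, 1e30f, numLods - 1); // tail in coarsest LOD
}
```

Compacting the surviving rays between segments would go between loop iterations; omitted here.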
 
Which means a programmable shader can do this process, and the necessary flexibility is very likely possible.
What process are we talking about? You're selecting the BVH nodes for traversal until you've ended up at a bottom-level node which contains triangles and can test them all against the ray you're tracing. At what point can you use this process for LOD selection?

Likely the DXR 1.1 ability to launch rays from compute is helpful here, and so first-gen RTX can do stochastic LOD, maybe just as well as a full traversal shader would allow
Rays launched from another shader type would still need to be traced through the BVH in the same way as rays from RT shaders, no? So from a h/w perspective this doesn't change anything.
 
What process are we talking about? You're selecting the BVH nodes for traversal until you've ended up at a bottom-level node which contains triangles and can test them all against the ray you're tracing. At what point can you use this process for LOD selection?
At each internal node, you could decide to switch LOD based on distance and a random number.
Switching LOD would mean either moving the ray to another space containing another LOD,
or better: terminating traversal and storing the ray, binned by LOD for later processing.
Then the remaining rays are already sorted by LOD, so no caching issues, as I said before.
Additionally, you could even reorder the rays to improve caching and traversal speeds (rough sketch below).
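Something like this binning, sketched in C++ (hypothetical; on the GPU the queues would be append buffers written from the traversal, not std::vector):

```cpp
#include <cstdint>
#include <vector>

// Sketch of "terminate and bin": when a ray wants to switch LOD, park it in
// a queue for the target LOD instead of following it immediately. Each queue
// is then retraced as a coherent batch against one BVH, so neighbouring rays
// share nodes again. All names are hypothetical.

struct Ray    { float org[3], dir[3]; float tMin; };
struct Binned { Ray ray; uint32_t pixel; };

struct LodQueues {
    std::vector<std::vector<Binned>> q;   // one queue per LOD
    explicit LodQueues(int numLods) : q(numLods) {}

    // Called from traversal when a ray crosses into LOD 'newLod'.
    void park(const Ray& r, uint32_t pixel, int newLod) {
        q[newLod].push_back({r, pixel});
    }
};

// Later pass: drain one queue; every ray here traverses the same BVH.
// Rays that cross out of this LOD again get parked into the next queue.
void flush(LodQueues& queues, int lod)
{
    for (const Binned& b : queues.q[lod]) {
        // trace b.ray against the BVH for 'lod', starting at b.ray.tMin;
        // write hits to b.pixel, or queues.park(...) for the next LOD.
        (void)b; // placeholder for the actual trace
    }
    queues.q[lod].clear();
}
```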

So the additional cost might even turn into an optimization in any case. (Although storing and reordering so many rays was exactly the point where I gave up my compute RT plans... very unsure it's worth it.)
The alternative is to just accept worse memory access and continue with the ray in another LOD space until done.
(AMD's patent does not confirm all this would be possible, but it's very likely.)


However, as said above, all this should be possible without traversal shaders. Or did I miss something?
Correct me if I'm wrong, but it seems I have just destroyed my own largest argument against HW RT.
Shame on everybody who did not catch me earlier, hehe :)
 