Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Not sure if this has been noticed yet, but RTX is doing a piss-poor job on human skin sometimes; it looks dramatically worse than rasterization here.
(image: d8fMHPY.jpg)

Is this with the latest updates installed? This bug, for example, could affect the result negatively.

 
I thought it was understood already that the RT cores in Turing are accelerating BVH traversal?
Correct.
You specify the DXR command and pass in a shader (and denoiser?) as a parameter. Once the ray/triangle intersections are identified, the shader runs on those hit triangles, all of which is done on compute.

How vendors choose to handle intersection is what varies from one IHV to the next, but the intersection, as we know it, is handled in the drivers.
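
For reference, "specifying the DXR command and passing in shaders as parameters" looks roughly like this on the host side. A minimal C++ sketch, assuming the raytracing pipeline state and shader tables have already been created; the variable names are illustrative, though the D3D12/DXR types and calls are real:

    // Record a DXR dispatch. 'cmdList' is an ID3D12GraphicsCommandList4*,
    // 'rtPipelineState' an ID3D12StateObject*; the shader tables already
    // live in GPU memory (illustrative names).
    D3D12_DISPATCH_RAYS_DESC desc = {};
    desc.RayGenerationShaderRecord.StartAddress = rayGenTable->GetGPUVirtualAddress();
    desc.RayGenerationShaderRecord.SizeInBytes  = rayGenTableSize;
    desc.MissShaderTable.StartAddress  = missTable->GetGPUVirtualAddress();
    desc.MissShaderTable.SizeInBytes   = missTableSize;
    desc.MissShaderTable.StrideInBytes = missRecordStride;
    desc.HitGroupTable.StartAddress    = hitGroupTable->GetGPUVirtualAddress();
    desc.HitGroupTable.SizeInBytes     = hitGroupTableSize;
    desc.HitGroupTable.StrideInBytes   = hitGroupRecordStride;
    desc.Width  = renderWidth;    // e.g. one ray per pixel
    desc.Height = renderHeight;
    desc.Depth  = 1;

    cmdList->SetPipelineState1(rtPipelineState);  // state object holding raygen/miss/hit shaders
    cmdList->DispatchRays(&desc);                 // driver + hardware handle traversal and intersection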
 
We know raytracing can be done with compute, and we don't really know how Nvidia's RT cores work. There could be quite a bit of compute being used.

What kind of fixed-function hardware would benefit raytracing and could be added to existing shader core architectures relatively cheaply in order to help (mainly compute-based) raytracing? Ray/triangle intersection? AABB bounding box generation for triangle strips, sets, and meshes? Support for hierarchical tree-like structures like BVHs?
These questions have been asked in the Impact of Turing on Consoles thread. Not many answers, but JoeJ reckons improvements in executing code on compute are all that will be needed.
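
For concreteness, the kind of work such fixed-function units would offload amounts to small, float-heavy kernels like the two below; a minimal, standalone C++ sketch of a ray/AABB slab test and Möller-Trumbore ray/triangle intersection (textbook versions, not any vendor's actual implementation):

    #include <algorithm>
    #include <cmath>

    struct Vec3 { float x, y, z; };

    static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    static Vec3  cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
    static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

    // Slab test: does the ray hit the axis-aligned box [bmin, bmax] within [tmin, tmax]?
    bool rayAabb(Vec3 o, Vec3 invDir, Vec3 bmin, Vec3 bmax, float tmin, float tmax)
    {
        float t0 = (bmin.x - o.x) * invDir.x, t1 = (bmax.x - o.x) * invDir.x;
        tmin = std::max(tmin, std::min(t0, t1)); tmax = std::min(tmax, std::max(t0, t1));
        t0 = (bmin.y - o.y) * invDir.y; t1 = (bmax.y - o.y) * invDir.y;
        tmin = std::max(tmin, std::min(t0, t1)); tmax = std::min(tmax, std::max(t0, t1));
        t0 = (bmin.z - o.z) * invDir.z; t1 = (bmax.z - o.z) * invDir.z;
        tmin = std::max(tmin, std::min(t0, t1)); tmax = std::min(tmax, std::max(t0, t1));
        return tmin <= tmax;
    }

    // Möller-Trumbore: ray/triangle intersection, returns the hit distance in *t.
    bool rayTriangle(Vec3 o, Vec3 d, Vec3 v0, Vec3 v1, Vec3 v2, float* t)
    {
        const float eps = 1e-7f;
        Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
        Vec3 p = cross(d, e2);
        float det = dot(e1, p);
        if (std::fabs(det) < eps) return false;   // ray parallel to triangle plane
        float inv = 1.0f / det;
        Vec3 s = sub(o, v0);
        float u = dot(s, p) * inv;
        if (u < 0.0f || u > 1.0f) return false;
        Vec3 q = cross(s, e1);
        float v = dot(d, q) * inv;
        if (v < 0.0f || u + v > 1.0f) return false;
        *t = dot(e2, q) * inv;
        return *t > eps;                          // hit in front of the ray origin
    }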
 
Correct.
You specify the DXR command and pass in a shader (and denoiser?) as a parameter. Once the ray/triangle intersections are identified, the shader runs on those hit triangles, all of which is done on compute.

How vendors choose to handle intersection is what varies from one IHV to the next, but the intersection, as we know it, is handled in the drivers.
But do we really know that everything is handled in hardware? It could mostly be an elaborate compute shader utilizing some fixed-function hardware for the heavy lifting. The same goes for BVH generation.
 
But do we really know that everything is handled in hardware? It could mostly be an elaborate compute shader utilizing some fixed-function hardware for the heavy lifting. The same goes for BVH generation.
I'll try making this clearer.

DX12 -> DXR
When the code path runs and sees that your hardware supports DXR (which is a driver flag), it runs the DXR code.
The API handles all your ray needs for the most part: it makes a call to your driver, and the driver handles the intersections between your ray command and the data structure holding all your data.

It can be done on compute, and drivers can do this portion via compute; developers are only required to provide a pointer to the memory location in which the acceleration structure will be held. I don't think developers can write their own intersection handler and override what the driver is doing.

That being said, in this way it's up to the IHV to determine which hardware they want to support and how to support it. It's entirely possible to do the intersection handling through their current compute pipeline, perhaps with some modifications that would let them speed up an acceleration structure.

Nvidia does this through their RT cores, which, from what we understand, accelerate the build-up, tear-down, and modification of the BVH. And they do it fairly precisely as well.
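
As a concrete example of the "driver flag" mentioned above, a minimal C++ sketch of how an application would query DXR support before picking a code path ('device' is assumed to be an already-created ID3D12Device5*; the capability query itself is the real D3D12 one):

    #include <d3d12.h>

    // Query the DXR capability bit exposed by the driver and pick a path.
    bool SupportsDxr(ID3D12Device5* device)
    {
        D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
        if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5,
                                               &options5, sizeof(options5))))
            return false;
        // TIER_1_0 or higher means the driver implements DXR (on RT cores,
        // on compute, or some mix; the API does not say which).
        return options5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_0;
    }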
 
I thought it was understood already that the RT cores in Turing are accelerating BVH traversal?
No, there's no BVH hardware in Volta. RT is entirely done in compute. But Volta has advanced compute options, which probably means fine-grained work scheduling directly from compute without needing to rely on CPU commands.
For RT you could use this to batch similar rays, batch ray hits to the same material, build the BVH faster, etc. This is what we want for compute anyway; it goes by names like 'dynamic dispatch', 'device-side enqueue', etc.
Mantle exposed support via conditional command buffer execution, so even the oldest GCN can do this to some degree already.
Unfortunately I have no source for the exact Volta options (they're not exposed to game APIs, so I did not bother...).
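
To illustrate "batch ray hits to the same material": a small, standalone C++ sketch that bins hit records by material ID so each material can be shaded as one coherent batch (illustrative data layout; on a GPU this would be a sort/compaction pass rather than a hash map):

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct RayHit {
        uint32_t rayIndex;    // which ray produced this hit
        uint32_t materialId;  // material of the hit triangle
        float    t;           // hit distance along the ray
    };

    // Group hits by material so each bucket can be shaded in one coherent batch.
    std::unordered_map<uint32_t, std::vector<RayHit>>
    binHitsByMaterial(const std::vector<RayHit>& hits)
    {
        std::unordered_map<uint32_t, std::vector<RayHit>> buckets;
        for (const RayHit& h : hits)
            buckets[h.materialId].push_back(h);
        return buckets;
    }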
 
Nvidia does this through their RT cores, which, from what we understand, accelerate the build-up, tear-down, and modification of the BVH. And they do it fairly precisely as well.
It has never been mentioned that RT cores help build the BVH, so likely this is done with compute. I have only heard mention of BVH traversal and triangle intersection, nothing else.
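
For reference, the "BVH traversal and triangle intersection" inner loop is roughly the following; a minimal C++ sketch of stack-based traversal over a flat array of nodes, reusing the rayAabb/rayTriangle helpers sketched earlier (illustrative node layout, not how any RT core actually works internally):

    #include <vector>
    // Uses Vec3, rayAabb() and rayTriangle() from the earlier sketch.

    struct BvhNode {
        Vec3 bmin, bmax;          // node bounding box
        int  left, right;         // child node indices, -1 for leaves
        int  firstTri, triCount;  // triangle range for leaves
    };

    struct Tri { Vec3 v0, v1, v2; };

    // Find the closest triangle hit along the ray, if any.
    bool traverseBvh(const std::vector<BvhNode>& nodes, const std::vector<Tri>& tris,
                     Vec3 o, Vec3 d, Vec3 invDir, float& closestT, int& hitTri)
    {
        closestT = 1e30f; hitTri = -1;
        int stack[64]; int top = 0;
        stack[top++] = 0;                                   // start at the root node
        while (top > 0) {
            const BvhNode& n = nodes[stack[--top]];
            if (!rayAabb(o, invDir, n.bmin, n.bmax, 0.0f, closestT))
                continue;                                   // ray misses this subtree
            if (n.left < 0) {                               // leaf: test its triangles
                for (int i = 0; i < n.triCount; ++i) {
                    float t;
                    const Tri& tri = tris[n.firstTri + i];
                    if (rayTriangle(o, d, tri.v0, tri.v1, tri.v2, &t) && t < closestT) {
                        closestT = t; hitTri = n.firstTri + i;
                    }
                }
            } else {                                        // inner node: push children
                stack[top++] = n.left;
                stack[top++] = n.right;
            }
        }
        return hitTri >= 0;
    }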
It can be done on compute, and drivers can do this portion via compute,
Yes, but this restricts RT to the API. If RT runs on compute, we would likely want to implement it ourselves as efficiently as possible, and the API does not help at all here. This is the main reason we cannot draw final conclusions about Volta vs. Turing based on BFV or something like that (just to mention).
 
It has never been mentioned that RT cores help build the BVH, so likely this is done with compute. I have only heard mention of BVH traversal and triangle intersection, nothing else.

Yes, but this restricts RT to the API. If RT runs on compute, we would likely want to implement it ourselves as efficiently as possible, and the API does not help at all here. This is the main reason we cannot draw final conclusions about Volta vs. Turing based on BFV or something like that (just to mention).
You might be right; it's unclear how the BVH is built up or torn down. Perhaps CUDA could have better access to the structure than, say, DXR, which won't let you access it.

It does restrict RT to the API. You're trading off pure optimization for a level of abstraction that lets you deploy your code on multiple IHVs without the headache, improving adoption and scaling the platform to a variety of programmers and not just the few.
 
It does restrict RT to the API. You're trading off pure optimization for a level of abstraction that lets you deploy your code on multiple IHVs without the headache, improving adoption and scaling the platform to a variety of programmers and not just the few.
Yeah, and that's likely where we are heading. I just don't like it. One more argument is the effort of an RT implementation: of course it's much easier and faster to just use DXR.
I see a chance that at least one next-gen console might not aid devs, so they have to do it themselves, and they might be happy about it. But even then, and assuming the same GPUs arrive for PCs, PC will adopt DXR, I guess.
But I'm not sure. If we can make RT games that work for everyone, that's attractive for PC as well, considering how many years it will take until RT hardware can be assumed. And it couldn't be any more difficult from the business perspective either.
 
... which we cannot do, because work generation is not yet exposed anywhere in game APIs, of course.
See another failed request of mine: https://community.amd.com/thread/236715 - zero response. :( I'll try again another time... I seriously want vendor APIs just for those reasons.
It's on Xbox, I think.
The DXR fallback layer is apparently running on Radeon VII, though the results are not stellar: on one demo the Radeon achieved 10 fps, while the 2080 Ti achieved 320 fps (yes, over 300 fps)! Of course this could just be token support from AMD with no substantial optimizations.

Uhh, does that say 10 fps vs 300 fps?
 
Uhh, does that say 10 fps vs 300 fps?
I wouldn't read anything into it. I doubt there are any optimizations whatsoever, and the fallback path isn't even a thing anymore. Not that it wouldn't be considerably slower; it's just not a very good comparison of what's possible with AMD.
 
I wouldn't read anything into it. I doubt there are any optimizations whatsoever, and the fallback path isn't even a thing anymore. Not that it wouldn't be considerably slower; it's just not a very good comparison of what's possible with AMD.
That's true. It's only reasonable to compare when AMD says, 'hey, this is our RT card'; then the comparisons make sense.

But yes, as of this moment, the performance of emulation is low on AMD.
 
The DXR fallback layer is apparently running on Radeon VII, though the results are not stellar: on one demo the Radeon achieved 10 fps, while the 2080 Ti achieved 320 fps (yes, over 300 fps)! Of course this could just be token support from AMD with no substantial optimizations.

The DXR fallback layer was deprecated by Microsoft 4 months ago, so those benchmarks aren't even worth the bandwidth they're consuming on the net.
 
A comparison of the 2080 Ti vs Titan V vs 1080 Ti in some OptiX workloads. The 2080 Ti is 3 to 6 times faster than the Titan V depending on the workload, and faster still compared to the 1080 Ti.

Because Nvidia is going out of its way to make sure that RTX on non-Turing GPUs is borked (contrary to their own recommendations... RTX supposedly being the "optimal" path across GPU architectures in OptiX 6).

 
A comparison of the 2080 Ti vs Titan V vs 1080 Ti in some OptiX workloads. The 2080 Ti is 3 to 6 times faster than the Titan V depending on the workload, and faster still compared to the 1080 Ti.
Why are results from the same people far lower than the OptiX 5 results?

Previous benchmark, OptiX 5:
Titan V gets 108 M samples/s.
1080 Ti gets 55.

(chart: fermat_bathroom.png)

Latest benchmark, OptiX 6:
Titan V gets 67 M samples/s.
1080 Ti gets 27.

(chart: fermat_bathroom.png)

The only difference I can see is 'custom settings: 512 pass', whatever that means.
 