Not sure if this has been noticed yet, but RTX is doing a piss poor job on human skins sometimes; it looks dramatically worse than rasterization here.
Is this with the latest updates installed? This bug, for example, could affect the result negatively.
> I thought it was understood already that the RT cores in Turing are accelerating BVH traversal?
Correct.
These questions have been asked in the Impact of Turing on Consoles thread. Not many answers, but JoeJ reckons improvements in executing code on compute are all that'll be needed. We know raytracing can be done with compute, and we don't really know how nVidia's RT cores are working. There could be quite a bit of compute being used.
What kind of fixed-function hardware would benefit raytracing and could be added to existing shader-core architectures relatively cheaply in order to help (mainly compute-based) raytracing? Ray-triangle intersection? AABB bounding-box generation for triangle strips, sets and meshes? Support for hierarchical tree-like structures like BVHs?
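To make the per-ray work concrete, here is a minimal standalone sketch of the two tests mentioned above: the slab AABB test and a Möller-Trumbore triangle test. Purely illustrative CPU code in the usual textbook form, not a claim about any vendor's hardware path:

```cpp
// Illustrative only: the two per-ray tests discussed above, in plain C++.
// Textbook math, not how any RT hardware necessarily implements it.
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static float comp(const Vec3& a, int i) { return i == 0 ? a.x : (i == 1 ? a.y : a.z); }
static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  cross(Vec3 a, Vec3 b) { return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x }; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Slab test: does the ray (origin o, reciprocal direction invDir) cross the box within [tMin, tMax]?
bool rayAabb(Vec3 o, Vec3 invDir, Vec3 boxMin, Vec3 boxMax, float tMin, float tMax)
{
    for (int i = 0; i < 3; ++i) {
        float t0 = (comp(boxMin, i) - comp(o, i)) * comp(invDir, i);
        float t1 = (comp(boxMax, i) - comp(o, i)) * comp(invDir, i);
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMax < tMin) return false;   // slabs no longer overlap: miss
    }
    return true;
}

// Moller-Trumbore: returns true and writes the hit distance t if the ray hits the triangle.
bool rayTriangle(Vec3 o, Vec3 d, Vec3 v0, Vec3 v1, Vec3 v2, float& t)
{
    const float eps = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(d, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;   // ray is parallel to the triangle plane
    float inv = 1.0f / det;
    Vec3 s = sub(o, v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;   // outside barycentric range
    Vec3 q = cross(s, e1);
    float v = dot(d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * inv;
    return t > eps;                           // hit must lie in front of the origin
}
```

The arithmetic itself is tiny; the cost comes from running these tests enormous numbers of times per frame, which is where dedicated units or better scheduling would pay off.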
> Correct.
But do we really know that everything is handled in hardware? It could mostly be an elaborate compute shader utilizing some fixed-function hardware for the heavy lifting. Same for BVH generation.
You specify the DXR command and put in a shader and denoiser (?) as a parameter. Once the ray/triangle intersections are identified, the shader runs on those hit triangles, which is all done on compute.
How the vendors choose to handle intersection is what will vary from one IHV to the next, but the intersection as we know it is done in the drivers.
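Roughly, the host side of that flow looks like the sketch below. Only the dispatch is shown; the names (rtStateObject, shaderTable, recordStride) are placeholders for objects assumed to have been created earlier, so treat it as a sketch of the call rather than a complete sample:

```cpp
#include <d3d12.h>

// Sketch of the host-side DXR dispatch described above. The RT pipeline state
// and the shader-table buffer (raygen, miss and hit-group records packed back
// to back) are assumed to exist already; recordStride is assumed to satisfy
// the 64-byte shader-table alignment rules.
void dispatchRays(ID3D12GraphicsCommandList4* cmdList,
                  ID3D12StateObject*          rtStateObject, // holds the raygen/miss/hit shaders
                  ID3D12Resource*             shaderTable,
                  UINT64                      recordStride,
                  UINT                        width,
                  UINT                        height)
{
    const D3D12_GPU_VIRTUAL_ADDRESS base = shaderTable->GetGPUVirtualAddress();

    D3D12_DISPATCH_RAYS_DESC desc = {};
    desc.RayGenerationShaderRecord.StartAddress = base;                    // record 0: raygen
    desc.RayGenerationShaderRecord.SizeInBytes  = recordStride;
    desc.MissShaderTable.StartAddress           = base + recordStride;     // record 1: miss
    desc.MissShaderTable.SizeInBytes            = recordStride;
    desc.MissShaderTable.StrideInBytes          = recordStride;
    desc.HitGroupTable.StartAddress             = base + 2 * recordStride; // record 2: hit group
    desc.HitGroupTable.SizeInBytes              = recordStride;
    desc.HitGroupTable.StrideInBytes            = recordStride;
    desc.Width  = width;
    desc.Height = height;
    desc.Depth  = 1;

    cmdList->SetPipelineState1(rtStateObject);
    // How the intersections behind this call are found (fixed-function units,
    // compute, or a mix) is up to the driver; the app only sees its hit and
    // miss shaders invoked with the results. Denoising is a separate pass,
    // not a parameter of the dispatch.
    cmdList->DispatchRays(&desc);
}
```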
> But do we really know that everything is handled in hardware? It could mostly be an elaborate compute shader utilizing some fixed-function hardware for the heavy lifting. Same for BVH generation.
I'll try making this clearer.
> I thought it was understood already that the RT cores in Turing are accelerating BVH traversal?
No, there's no BVH hardware in Volta. RT is entirely done in compute. But Volta has advanced compute options, which probably means fine-grained work scheduling directly from compute without a need to rely on CPU commands.
> Nvidia does this through their RT cores, and it accelerates the build up, take down, and modification of BVH from what we understand. And it does it fairly precisely as well.
It has never been mentioned that the RT cores help to build the BVH, so likely this is done with compute. I have only heard mention of BVH traversal and triangle intersection, nothing else.
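To pin down the terminology: "traversal" is essentially the loop below (generic textbook form, nothing vendor-specific), while "build" is the separate step that sorts triangles into such nodes in the first place:

```cpp
// Generic illustration of what "BVH traversal" means in this discussion: walk a
// tree of bounding boxes, prune subtrees the ray misses, and only run the real
// triangle tests on the leaves that are reached. Not any RT core's actual logic.
#include <algorithm>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

struct BvhNode {
    float   boundsMin[3], boundsMax[3]; // AABB enclosing everything below this node
    int32_t left = -1, right = -1;      // child node indices; -1 means this is a leaf
    int32_t firstTri = 0, triCount = 0; // leaf payload: a range of triangle indices
};

// hitBox(node)          -> does the ray touch this node's AABB?
// hitTris(first, count) -> test that triangle range, return closest hit distance (or +inf)
float traverse(const std::vector<BvhNode>& nodes,
               const std::function<bool(const BvhNode&)>& hitBox,
               const std::function<float(int32_t, int32_t)>& hitTris)
{
    float closest = std::numeric_limits<float>::infinity();
    if (nodes.empty()) return closest;
    std::vector<int32_t> stack = { 0 };          // start at the root node
    while (!stack.empty()) {
        const BvhNode& n = nodes[stack.back()];
        stack.pop_back();
        if (!hitBox(n)) continue;                // ray misses the box: skip the whole subtree
        if (n.left < 0) {                        // leaf: run the expensive triangle tests
            closest = std::min(closest, hitTris(n.firstTri, n.triCount));
        } else {                                 // inner node: descend into both children
            stack.push_back(n.left);
            stack.push_back(n.right);
        }
    }
    return closest;                              // +inf means nothing was hit
}
```

Building the tree (choosing those bounds and triangle ranges) is a separate, sort-like pass, which is the part suspected above of staying on compute regardless of what the RT cores accelerate.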
> It can be done on compute, and drivers can do this portion via compute,
Yes, but this restricts RT to the API. If RT runs on compute, we likely want to implement it ourselves most efficiently and the API does not help at all here. This is the main reason why we can not draw final conclusions from Volta vs. Turing based on BFV or something like that. (just to mention)
> we likely want to implement it ourselves most efficiently and the API does not help at all here
... which we can not do, because work generation is not exposed anywhere yet to game APIs, of course.
> It has never been mentioned that the RT cores help to build the BVH, so likely this is done with compute. I have only heard mention of BVH traversal and triangle intersection, nothing else.
You might be right; it's unclear how the BVH is built up or torn down. Perhaps CUDA could have better access to the structure than, say, DXR, which won't let you access it.
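On the "DXR won't let you access it" point, the build call really does treat the result as opaque: the app hands over triangles plus a destination buffer and never gets to read whatever BVH layout the driver writes. A rough sketch (buffer creation and sizing via GetRaytracingAccelerationStructurePrebuildInfo omitted):

```cpp
#include <d3d12.h>

// Rough sketch of a DXR bottom-level acceleration structure build. The app
// supplies triangle buffers plus pre-sized destination/scratch buffers; the
// BVH layout written into `blas` is opaque and cannot be inspected via DXR.
void buildBlas(ID3D12GraphicsCommandList4* cmdList,
               D3D12_GPU_VIRTUAL_ADDRESS   vertexBuffer, UINT vertexCount,
               D3D12_GPU_VIRTUAL_ADDRESS   indexBuffer,  UINT indexCount,
               ID3D12Resource*             blas,
               ID3D12Resource*             scratch)
{
    D3D12_RAYTRACING_GEOMETRY_DESC geom = {};
    geom.Type  = D3D12_RAYTRACING_GEOMETRY_TYPE_TRIANGLES;
    geom.Flags = D3D12_RAYTRACING_GEOMETRY_FLAG_OPAQUE;
    geom.Triangles.VertexBuffer.StartAddress  = vertexBuffer;
    geom.Triangles.VertexBuffer.StrideInBytes = 3 * sizeof(float);
    geom.Triangles.VertexFormat = DXGI_FORMAT_R32G32B32_FLOAT;
    geom.Triangles.VertexCount  = vertexCount;
    geom.Triangles.IndexBuffer  = indexBuffer;
    geom.Triangles.IndexFormat  = DXGI_FORMAT_R32_UINT;
    geom.Triangles.IndexCount   = indexCount;

    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC build = {};
    build.Inputs.Type           = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL;
    build.Inputs.Flags          = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_TRACE;
    build.Inputs.DescsLayout    = D3D12_ELEMENTS_LAYOUT_ARRAY;
    build.Inputs.NumDescs       = 1;
    build.Inputs.pGeometryDescs = &geom;
    build.DestAccelerationStructureData    = blas->GetGPUVirtualAddress();
    build.ScratchAccelerationStructureData = scratch->GetGPUVirtualAddress();

    // Whether this build runs on compute shaders, fixed-function hardware, or
    // a mix is entirely the driver's business; the API only promises an opaque
    // result that later DispatchRays calls can consume.
    cmdList->BuildRaytracingAccelerationStructure(&build, 0, nullptr);
}
```

That opacity is what the CUDA comparison above is getting at: a vendor API could expose or accept its own structure, whereas DXR keeps it a black box.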
> It does restrict RT to the API. You're trading off pure optimization for a level of abstraction to deploy your code on multiple IHVs without the headache, improving adoption and scaling the platform to a variety of programmers and not just the few.
Yeah, and likely that's where we are heading. I just don't like it. One more argument is the effort for an RT implementation. Of course it's much easier and faster to just use DXR.
> ... which we can not do, because work generation is not exposed anywhere yet to game APIs, of course.
It's on Xbox, I think.
See another failed request of mine: https://community.amd.com/thread/236715 (zero response). I'll try again another time... I seriously want vendor APIs just for those reasons.
> DXR fallback layer is apparently running on Radeon VII, results are not stellar though, on one demo the Radeon achieved 10fps, while the 2080Ti achieved 320fps (yes 300fps)! Of course this could just be a token support from AMD with no substantial optimizations.
uhh does that say 10fps vs 300 fps?
> uhh does that say 10fps vs 300 fps?
I wouldn't read anything into it. I doubt there are any optimizations whatsoever, and the fallback path isn't even a thing anymore. Not that it wouldn't be considerably slower, just that it's not a very good comparison or indication of what's possible with AMD.
> I wouldn't read anything into it. I doubt there are any optimizations whatsoever, and the fallback path isn't even a thing anymore. Not that it wouldn't be considerably slower, just that it's not a very good comparison or indication of what's possible with AMD.
That's true. It's only reasonable to compare when AMD says, "hey, this is our RT card"; then the comparisons make sense.
> DXR fallback layer is apparently running on Radeon VII, results are not stellar though, on one demo the Radeon achieved 10fps, while the 2080Ti achieved 320fps (yes 300fps)! Of course this could just be a token support from AMD with no substantial optimizations.
The DXR fallback layer was deprecated by Microsoft 4 months ago, so those benchmarks aren't even worth the bandwidth they are consuming on the net.
> uhh does that say 10fps vs 300 fps?
Yes.
> A comparison between a 2080Ti vs Titan V vs 1080Ti in some OptiX workloads. The 2080Ti is 3 to 6 times faster than TitanV depending on the workload, and much faster than that compared to the 1080Ti.
Because Nvidia is going out of its way to make sure that RTX on non-Turing GPUs is borked (contrary to what their recommendations are... RTX supposedly being the "optimal" path for those GPU architectures in OptiX 6).
> A comparison between a 2080Ti vs Titan V vs 1080Ti in some OptiX workloads. The 2080Ti is 3 to 6 times faster than TitanV depending on the workload, and much faster than that compared to the 1080Ti.
Why are results from the same people far lower than the OptiX 5 results?