Nvidia Turing Speculation thread [2018]

According to OctaneRender, Turing is providing them 8x RT performance compared to Pascal on the same workload. That's comparing equivalent Quadro cards.

Didn't they switch from their CUDA backend to the Nvidia OptiX backend?
The latter includes 'hacks' like the AI denoiser.
 
If current performance numbers are correct, raytracing isn't really fast on Turing... how could it be on Pascal without dedicated RT cores?

For hybrid rendering on Turing, the memory system does take a beating from having to serve both the RT and CUDA cores, so ray tracing may appear slower than it really is in that scenario.


According to OctaneRender, Turing is providing them 8x RT performance compared to Pascal on the same workload. That's comparing equivalent Quadro cards.

Octane offers two rendering modes, local and cloud. I'm told by colleagues that cloud rendering is significantly slower. It would be interesting to know which of these modes the 8x figure applies to.
 
I wonder if Nvidia will come up with a decently fast, optimized DXR driver for Pascal, or if instead the DXR fallback layer (known to be slow, more of a reference implementation) will be used.
I'm also wondering whether the Turing vs Pascal ray tracing comparisons shown were based on the DXR fallback.
DXR driver support is apparently up to the GPU vendor. So, if current cards never get it, it would either be because of a lack of resources (AMD maybe not having enough resources to dedicate to adding this feature versus the potential gains that would come from it) or for marketing reasons (Nvidia wanting to sell RTX boards).
So far, we have yet to see any performance numbers for non-Turing GPUs doing RT with DXR-enabled drivers (besides Volta, which doesn't have RT cores but has had experimental DXR drivers publicly available since April, IIRC). We don't know yet whether the quoted Pascal numbers use the fallback layer or DXR-enabled drivers.

http://forums.directxtech.com/index.php?topic=5892.msg29731#msg29731
You will have to wait for any announcements from GPU vendors as they come through or directly ask them the question about their future DXR support plans. We are working with GPU vendors on enabling more and more developers to take advantage of DXR via broader HW support, but we're not at liberty discussing their future plans.
 
Didn't they switch from their CUDA backend to the Nvidia OptiX backend?
The latter includes 'hacks' like the AI denoiser.
Octane offers two rendering modes, local and cloud. I'm told by colleagues that cloud rendering is significantly slower. It would be interesting to know which of these modes the 8x figure applies to.
No idea; it was pretty much a marketing statement, and you know how those are with details and accuracy.
 
DXR driver support is apparently up to the GPU vendor. So, if current cards never get it, it would either be because of a lack of resources ... or for marketing reasons (Nvidia wanting to sell RTX boards).
... or because the performance would be too low for it to be worth doing.

Isn’t that the most straightforward reason?

Nvidia has had OptiX for years. And it was never good enough for anything close to real time. If they had been able to make it run fast enough, they would have done it.
 
... or because the performance would be too low for it to be worth doing.

Isn’t that the most straightforward reason?

Nvidia has had OptiX for years. And it was never good enough for anything close to real time. If they had been able to make it run fast enough, they would have done it.
One would assume that having a real driver path would in most cases be faster than a fallback emulation path. Nvidia already did it for Volta, which doesn't have "RT cores". It will be more of a resources consideration (money, time, coding effort) IMO.
 
One would assume that having a real driver path would in most cases be faster than a fallback emulation path.
Ray tracing is sufficiently slow for driver overhead to be a small and irrelevant part of the whole equation.

To make real-time ray tracing possible, a speedup by a large integer factor was needed, not a percentage-level improvement.
 
Ray tracing is sufficiently slow for driver overhead to be a small and irrelevant part of the whole equation.

To make real-time ray tracing possible, a speedup by a large integer factor was needed, not a percentage-level improvement.

For ray tracing, all the heavy lifting (intersecting the scene with rays via BVHs, BVH rebuilds, and so on) is done 'by the driver', some of it assisted by RT hardware if present. If all of that is instead done above the driver with generic compute shaders, as in the DXR fallback, you cannot expect the same performance as GPU-specific, low-level optimized RT code (e.g. using PTX) running behind the driver and making the best possible use of the compute resources even without RT hardware, as on Pascal or current AMD.
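To make concrete what that 'heavy lifting' involves, here is a minimal CPU-side sketch in plain C++ (hypothetical structures, not real driver, DXR, or PTX code) of the stack-based BVH traversal with ray/AABB tests that the RT cores, an optimized driver path, or the compute-shader fallback ultimately has to perform for every ray:

```cpp
#include <algorithm>
#include <array>
#include <cstdio>
#include <vector>

struct Ray  { std::array<float,3> o, invDir; float tMax; };
struct AABB { std::array<float,3> lo, hi; };

struct BVHNode {
    AABB bounds;
    int  left;   // index of left child, or -1 if this is a leaf
    int  right;  // index of right child, or -1 if this is a leaf
    int  prim;   // primitive index if leaf, else -1
};

// Classic slab test: does the ray enter the box before tMax?
static bool hitAABB(const Ray& r, const AABB& b) {
    float tmin = 0.0f, tmax = r.tMax;
    for (int a = 0; a < 3; ++a) {
        float t0 = (b.lo[a] - r.o[a]) * r.invDir[a];
        float t1 = (b.hi[a] - r.o[a]) * r.invDir[a];
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmax < tmin) return false;
    }
    return true;
}

// Stack-based traversal: visit every node whose bounds the ray touches and
// collect the leaf primitives that would still need an exact intersection test.
static std::vector<int> traverse(const std::vector<BVHNode>& nodes, const Ray& r) {
    std::vector<int> candidates;
    std::vector<int> stack = {0};  // start at the root node
    while (!stack.empty()) {
        int idx = stack.back();
        stack.pop_back();
        const BVHNode& n = nodes[idx];
        if (!hitAABB(r, n.bounds)) continue;
        if (n.prim >= 0) { candidates.push_back(n.prim); continue; }
        stack.push_back(n.left);
        stack.push_back(n.right);
    }
    return candidates;
}

int main() {
    // Tiny hand-built BVH: a root spanning two leaf boxes holding primitives 0 and 1.
    std::vector<BVHNode> bvh = {
        { { {-1,-1,-1}, {3, 1, 1} },  1,  2, -1 },
        { { {-1,-1,-1}, {1, 1, 1} }, -1, -1,  0 },
        { { { 2,-1,-1}, {3, 1, 1} }, -1, -1,  1 },
    };
    // Ray along +x; invDir holds 1/direction (large values stand in for 1/0).
    Ray r{ {{-2, 0, 0}}, {{1.0f, 1e8f, 1e8f}}, 100.0f };
    for (int p : traverse(bvh, r))
        std::printf("ray may hit primitive %d\n", p);
    return 0;
}
```

A production traversal adds exact triangle tests, node ordering, and per-frame refit/rebuild of the tree, which is exactly the work that benefits from either dedicated hardware or hand-tuned GPU code.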
 
For ray tracing, all the heavy lifting (intersecting the scene with rays via BVHs, BVH rebuilds, and so on) is done 'by the driver', some of it assisted by RT hardware if present. If all of that is instead done above the driver with generic compute shaders, as in the DXR fallback, you cannot expect the same performance as GPU-specific, low-level optimized RT code (e.g. using PTX) running behind the driver and making the best possible use of the compute resources even without RT hardware, as on Pascal or current AMD.

Isn’t that exactly what libraries like Optix have been doing for a long time? The point is that no amount of optimization of a general compute implementation is going to approach the speed of dedicated hardware. If that was the case these raytracing “gimmicks” would’ve been possible years ago on Maxwell and Pascal cards.

The current BFV implementation does its denoising on the shader cores, not the tensor cores, so all of the acceleration is due to the RT hardware.
 
Isn’t that exactly what libraries like Optix have been doing for a long time? The point is that no amount of optimization of a general compute implementation is going to approach the speed of dedicated hardware.
Sure, dedicated RT hardware will always be faster than general GPU code. The point, however, was that optimized RT code in the driver could be a lot faster than the DXR fallback. It's up to the GPU vendors to provide such drivers, so it will be interesting to see what happens.

If that was the case these raytracing “gimmicks” would’ve been possible years ago on Maxwell and Pascal cards.

Real-time, game-oriented ray tracing engines have existed for years; see for example the Brigade video from 2014. With proper denoising that would have been quite good (no need for AI to get good denoising, as BFV seems to prove).
 
For ray tracing, all the heavy lifting (intersecting the scene with rays via BVHs, BVH rebuilds, and so on) is done 'by the driver', some of it assisted by RT hardware if present. If all of that is instead done above the driver with generic compute shaders, as in the DXR fallback, you cannot expect the same performance as GPU-specific, low-level optimized RT code (e.g. using PTX) running behind the driver and making the best possible use of the compute resources even without RT hardware, as on Pascal or current AMD.
Are you arguing that an Nvidia non-RT core driver solution will be slower than OptiX? While Ike is arguing that OptiX will be slower than an Nvidia non-RT core driver solution? And I’m arguing that they’d both be similar in performance?

Or am I just very confused?
 
Are you arguing that an Nvidia non-RT core driver solution will be slower than OptiX? While Ike is arguing that OptiX will be slower than an Nvidia non-RT core driver solution? And I’m arguing that they’d both be similar in performance?

Or am I just very confused?

I also have no clue what these two are arguing about...
[Image: Fig3_NVIDIA_raytracing_hierarchy]

Optix / Vulkan / DirectX all sit at the same point
DirectX and Vulkan are nothing more than hardware-agnostic layers that extend down into the GPU.
OptiX, meanwhile, is native and hardware-specific to Nvidia/CUDA. OptiX has the same, if not better, performance than DirectX in that it doesn't have to traverse any hardware abstraction layers. If the diagram were truly accurate, there would be a green section at the bottom of Vulkan and DXR denoting Nvidia's driver; in the case of any other hardware company, a section denoting their proprietary driver providing an interface to DXR/Vulkan.

Ray tracing is possible right now on Pascal with all of the fancy features you saw demo'd.
The demos Jensen ran on stage run on Pascal right now via OptiX 5.1; they're just slower than on Turing because there is no dedicated hardware acceleration (RT cores and tensor cores).
As far as I understand it, the tensor cores are used for AI-accelerated denoising or DLSS,
the RT cores do the ray intersection tests,
and the BVH generation/traversal/etc. is done on the CUDA cores/other areas and mapped onto the SMs through an improved caching mechanism shared with the rasterizer pipeline.
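As a purely illustrative outline (hypothetical function names, no real APIs), the division of labor described above for one hybrid frame might look like this:

```cpp
#include <cstdio>

// Illustrative stubs only, mirroring the split of work described in the post.
void rasterizeGBuffer() { std::puts("raster pipeline: G-buffer"); }
void buildOrRefitBVH()  { std::puts("CUDA cores: BVH build/refit"); }
void traceRays()        { std::puts("RT cores: ray/box and ray/triangle tests"); }
void shadeHits()        { std::puts("CUDA cores: shading of ray hits"); }
void denoise()          { std::puts("tensor cores (or shader cores): denoising"); }
void composite()        { std::puts("raster pipeline: composite and post"); }

int main() {
    // One hybrid frame: all of these stages share the same GPU and memory
    // system, which is why the RT workload can starve the rest, as noted
    // earlier in the thread.
    rasterizeGBuffer();
    buildOrRefitBVH();
    traceRays();
    shadeHits();
    denoise();
    composite();
    return 0;
}
```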

I have no clue what is being discussed w.r.t. 'drivers'. The driver for ray tracing is what everyone is already using on current-gen hardware. Neither DirectX nor Vulkan is needed for this. Each company has its own proprietary "driver" and API/SDK. All DirectX and Vulkan do is provide a higher-level API that interfaces to it so that developers don't have to worry about hardware-specific implementations. I'd expect Vulkan/DirectX to be slower than OptiX or any other company's native software. What Microsoft means by the 'fallback' path is probably some janky, generic, 'OpenCL-like' implementation that can run on all cards without any optimizations.

It's important to distinguish between hardware/drivers and APIs.
DirectX is not a driver. It's an API:

https://en.wikipedia.org/wiki/DirectX :
Microsoft DirectX is a collection of application programming interfaces (APIs) for handling tasks related to multimedia, especially game programming and video, on Microsoft platforms.

Nothing of value is lost without it. The hardware already has to be capable, and the manufacturer has to provide a driver along with an API/SDK for DirectX to hook into.
DirectX 12 (DXR) doesn't enable ray tracing for Nvidia; that has existed for years via OptiX. All DXR provides is a high-level, easy-to-use, hardware-agnostic API for developers.
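A rough sketch of that layering in plain C++ (all names hypothetical, no real OptiX or D3D12 calls): the hardware-agnostic API is just an interface, each vendor supplies its own optimized backend underneath it, and a generic compute fallback fills in when no vendor backend exists.

```cpp
#include <cstdio>
#include <memory>

// Hypothetical stand-ins for a hardware-agnostic ray tracing API (the "DXR/Vulkan" layer).
struct Ray { float origin[3]; float dir[3]; };
struct Hit { bool valid; float t; };

// The API only defines the interface; it does no tracing itself.
class RayTracingBackend {
public:
    virtual ~RayTracingBackend() = default;
    virtual Hit traceRay(const Ray& r) = 0;
};

// Vendor-specific backend: conceptually the driver / OptiX-like path,
// free to use PTX, RT cores, or whatever the hardware offers.
class VendorDriverBackend : public RayTracingBackend {
public:
    Hit traceRay(const Ray&) override {
        // Placeholder: pretend the driver resolved the ray with hardware help.
        return {true, 1.0f};
    }
};

// Generic fallback: portable compute-shader-style path, no vendor-specific tuning.
class ComputeFallbackBackend : public RayTracingBackend {
public:
    Hit traceRay(const Ray&) override {
        // Placeholder: same result, but imagine it being much slower.
        return {true, 1.0f};
    }
};

int main() {
    // An application written against the abstract API doesn't care which backend it gets.
    std::unique_ptr<RayTracingBackend> rt = std::make_unique<VendorDriverBackend>();
    Hit h = rt->traceRay({{0, 0, 0}, {0, 0, 1}});
    std::printf("hit=%d t=%.2f\n", h.valid ? 1 : 0, h.t);
    return 0;
}
```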

Cut Vulkan/DirectX 12 out of the picture and you'd still have the same real-time ray tracing functionality on Turing via OptiX. Because such a horrid job is done of explaining 'real-time' ray tracing at the conferences, here's a simplified walkthrough of how it all works without any marketing nonsense:
https://developer.apple.com/videos/play/wwdc2018/606/
Yes, that uber demo Jensen showed in the box room of 'real-time' ray tracing and denoising can run on an iPad.
 
Yes, that uber demo Jensen showed in the box room of 'real-time' ray tracing and denoising can run on an iPad.

Also note, from the Apple video starting at about 22 min, that the rays/s metric largely depends on the scene.
An iPad can do the box demo at 176 million rays/s.
This drops to 20 million rays/s in a more complicated scene, and it drops even more with 'reflective' surfaces (secondary, divergent rays), which is why Jensen doesn't have many of them in his demos.

For reference, Pascal is generally said to average around 400-500 Mrays/s... but in what?
This is the key. Jensen has provided no standard benchmark scene(s) showing how he arrives at this incredible 8-10 gigarays/s figure. I'll be flat-out honest: I think the real-world numbers will be much lower and the Pascal cards will be in the gigaray range themselves in the same scene.
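As a back-of-the-envelope illustration (my own numbers, not Nvidia's), here is how a quoted rays/s figure translates into a per-pixel ray budget at a given resolution and frame rate, which is why the same hardware can look anywhere from 'real time' to hopeless depending on the scene:

```cpp
#include <cstdio>

int main() {
    // Hypothetical inputs: a quoted throughput and a target resolution / frame rate.
    const double raysPerSecond = 10e9;         // e.g. a "10 gigarays/s" marketing figure
    const double width = 1920, height = 1080;  // 1080p
    const double fps = 60.0;

    const double pixelsPerFrame = width * height;
    const double raysPerFrame   = raysPerSecond / fps;
    const double raysPerPixel   = raysPerFrame / pixelsPerFrame;

    std::printf("Rays per frame : %.1f million\n", raysPerFrame / 1e6);
    std::printf("Rays per pixel : %.1f\n", raysPerPixel);
    // At 10 Grays/s, 1080p60 leaves roughly 80 rays per pixel;
    // at 500 Mrays/s (a commonly quoted Pascal ballpark) it is about 4.
    return 0;
}
```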

If Nvidia weren't pulling any shenanigans, they would have shown the rays/s count, like Apple did, on a well-known ray tracing benchmark scene.
Much like with FPS, they still haven't shown live demos... The only figure I've heard of has the 2080 Ti at around 3.2 gigarays/s, but I'm unaware of what Pascal does in comparison.
This is concerning, but it won't last much longer, as people will be able to figure out exactly what the performance is when they get their hands on this hardware.

I truly hope Jensen wasn't pulling a fast one with these gigaray measures and that it's a number formed from a range of industry-standard ray tracing scenes.
 
If Nvidia weren't pulling any shenanigans, they would have shown the rays/s count, like Apple did, on a well-known ray tracing benchmark scene.
Much like with FPS, they still haven't shown live demos... The only figure I've heard of has the 2080 Ti at around 3.2 gigarays/s, but I'm unaware of what Pascal does in comparison.
This is concerning, but it won't last much longer, as people will be able to figure out exactly what the performance is when they get their hands on this hardware.

I truly hope Jensen wasn't pulling a fast one with these gigaray measures and that it's a number formed from a range of industry-standard ray tracing scenes.
I think during the "live stream" yesterday with Tom Petersen he mentioned approximately 1 gigaray/s for Pascal (1080 Ti).
 
From a Reddit user, I don't know if this claim is true or not:
T4CFantasy said:
its also confirmed 2070 will use TU106 because of the device ID
TU106 in the 2070 matches the information in the AdoredTV Turing rumor from about three weeks ago. So I'm wondering if the 7 GB GDDR6 for the 2070 mentioned in that rumor might not be entirely wrong. Is it possible that the RTX 2070 uses a 7 GB + 1 GB memory configuration similar to the GTX 970?
 
Lastly:
http://on-demand.gputechconf.com/gtc/2017/presentation/s7455-martin-stich-optix.pdf
See page 24 (the same box demo the iPad can render at 176 million rays/s; meanwhile, the iPad could only do about 20 million rays/s in a more complicated scene).
Again, to stress the point: the scene matters. Properly comparing cards means showing how many rays/s is achievable in the same scene and detailing what rays the metric is composed of (primary rays, secondary rays, shadow rays, etc.).
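For what it's worth, a reproducible comparison would report something like the sketch below (hypothetical counters, plain C++): per-frame ray counts broken down by type for a fixed scene, divided by the measured frame time, so a "gigarays/s" number can actually be checked.

```cpp
#include <cstdio>

// Hypothetical per-frame ray counters for a fixed benchmark scene.
struct RayStats {
    long long primary, shadow, secondary;
    long long total() const { return primary + shadow + secondary; }
};

int main() {
    // Stand-in numbers for one 1080p frame; a real harness would count these
    // inside the tracer and measure the actual frame time.
    RayStats stats{ 2'073'600, 2'073'600, 1'000'000 };
    double frameSeconds = 1.0 / 60.0;

    std::printf("primary=%lld shadow=%lld secondary=%lld total=%lld\n",
                stats.primary, stats.shadow, stats.secondary, stats.total());
    std::printf("throughput: %.2f Grays/s\n",
                stats.total() / frameSeconds / 1e9);
    return 0;
}
```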

See page 35 (a range of performance metrics across various ray tracing scenes for the Titan X (Pascal)). Jensen's lauded gigaray figures had better have been composed using the same kind of measures.
 
I truly hope Jensen wasn't pulling a fast one with these gigaray measures and that it's a number formed from a range of industry-standard ray tracing scenes.

I think the gigarays/s metrics they put out are mostly junk, but so far the software devs are saying the performance improvement from Turing is large.
 