Nvidia Turing Speculation thread [2018]

If that's so, then why do you say

[my bold]
I actually read the white paper you linked to, and I did not find tensor cores mentioned. Maybe you can help me out here? Note that I'm not disputing that tensor cores are used for denoising or DLSS; it's just that the ATAA example you criticized so vigorously seems to do without them.

You can read and digest something without making proper reference to every detail, especially things that are insignificant to what you're trying to discuss. I'm not discussing the later portions of the hybrid ray tracing pipeline. I'm discussing this figure:
[Image: fig5_optix_proprietary-625x302.png]

Specifically, where Nvidia got their rays/sec measure, and how it compares in performance, apples to apples, to comparable Pascal cards:
1080 -> 2080
1080ti -> 2080ti

If I were in the market for a Quadro card that costs $6,000, I'd want a comparison to an equally expensive Pascal-generation Quadro, or to a professional card with cost/performance scaled accordingly. Comparing a Quadro 6000 ($6,000) card with 24 GB of RAM to a $400 consumer GeForce 1080 with 8 GB of RAM renders zero value or information in any respect. Titan V comparisons are sensible only if the referenced algorithm involves the tensor cores purely, since that would be tensor-core hardware acceleration on Volta vs. Turing. DLSS/ATAA have nothing to do with my considerations, which is why I only referenced them loosely.

At the moment, I'm codifying my own test suites for benchmarking and determining the return policy on these cards. I'll pre-order one and do my own benching. If it is not up to the value they have marketed it at for my use case, I'll return it. This could have been avoided if they had simply been transparent and clear about performance from the beginning.

Oh well... no loss to me.
 
Yes, it's annoying, but it's really not worth getting worked up over Nvidia's marketing claims. Obviously the numbers are comparing performance of various Turing parts in some as-yet-unknown random workload. In the end it doesn't matter at all.

What we need is an unbiased benchmark that clearly showcases the performance of each step of the pipeline - BVH construction, ray traversal and denoising. Only then can we begin to have a useful conversation.

It seems over the last few years feature tests have gone the way of the dodo. I miss the early days of 3DMark's fillrate, texturing, instancing and shading tests.
 
Agreed. All is not lost... The "marketing" caused me to dig a lot deeper into how this all functions, and I learned an incredible amount. This helps me make a much more informed decision. Without this blackout period, I possibly would have bought it blindly; now I can thoroughly second-guess such a purchase. I have downloaded OptiX myself, and it's quite interesting what Pascal GPUs and existing denoising algorithms are already capable of, and how much memory and other resources the various denoisers can take up.

It's an incredible achievement from Nvidia, which I must acknowledge. However, as an intelligent consumer, I must scrutinize a purchase as much as possible. Now I know what to scrutinize and look out for beyond the headline slides. Thought I'd pass that along to others in case they too were caught up in the marketed figures and terminology.
 
I think during the "live stream" yesterday with Tom Petersen, he mentioned approx. 1 Gigaray/s for Pascal (1080 Ti).

Just got the raytracing samples running on a Titan XP (using the fallback of course, no driver yet supporting DXR)
It indeed looks, as speculated before, like the NV Grays/s claim is for raytracing just a single triangle.
The "Hello World" sample runs at 1 Grays/s on Pascal GP102.
The fallback layer is open source.
It shouldn't be too hard for someone to improve it and get some more Grays/s out of it, for the 1 triangle case I guess :)
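To put that number in perspective (assuming the sample renders at 1920x1080 with one primary ray per pixel, which is an assumption on my part): 1 Gray/s divided by 1920 x 1080 = 2,073,600 rays per frame works out to roughly 1,000,000,000 / 2,073,600 ≈ 480 frames per second for that single-triangle scene, so it really is a best-case primary-ray throughput figure.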
 

I was sure this is what was being referenced with the 10 Gigarays figure.
So they are most likely referring to primary ray generation, traversal, miss and closest-hit throughput:
[Image: fig5_optix_proprietary-625x302.png]

The trouble and performance issues begin when divergence and deeper processing in the pipeline occur. So it will be interesting to see a range of more detailed benchmarks showing how Nvidia's new microarchitecture handles this. From a raw primary-ray throughput standpoint, though, it already seems they have made good progress, achieving single-digit multiples of what current Pascal can achieve. Exciting achievements, but brought back to a more sensible reality.
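For anyone who hasn't poked at DXR yet, those stages map directly onto the shader types you write yourself. Below is a minimal sketch of a one-primary-ray-per-pixel setup; this is my own illustration rather than the Microsoft sample, and the camera setup and names are made up:

RaytracingAccelerationStructure Scene : register(t0);
RWTexture2D<float4> Output : register(u0);

struct Payload
{
    float3 color;
};

[shader("raygeneration")]
void RayGen()
{
    uint2 pixel = DispatchRaysIndex().xy;
    float2 uv = (pixel + 0.5f) / float2(DispatchRaysDimensions().xy);

    // Primary ray generation: one ray per pixel from a fixed placeholder camera.
    RayDesc ray;
    ray.Origin    = float3(uv.x * 2.0f - 1.0f, 1.0f - uv.y * 2.0f, -1.0f);
    ray.Direction = float3(0, 0, 1);
    ray.TMin = 0.001f;
    ray.TMax = 1000.0f;

    Payload p;
    p.color = float3(0, 0, 0);

    // Traversal happens inside TraceRay; the miss / closest-hit shaders below get invoked.
    TraceRay(Scene, RAY_FLAG_NONE, 0xFF, 0, 1, 0, ray, p);

    Output[pixel] = float4(p.color, 1.0f);
}

[shader("miss")]
void Miss(inout Payload p)
{
    p.color = float3(0.0f, 0.0f, 0.2f);   // background color
}

[shader("closesthit")]
void ClosestHit(inout Payload p, in BuiltInTriangleIntersectionAttributes attr)
{
    // Visualize the barycentrics of the hit triangle.
    p.color = float3(1.0f - attr.barycentrics.x - attr.barycentrics.y,
                     attr.barycentrics.x, attr.barycentrics.y);
}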
 
I'm hopeful the upcoming reviews will use the Microsoft Raytracing Sample Test as one test for Grays/s on Nvidia and AMD cards, indicating fallback vs. DXR modes.

Thanks @Voxilla for setting up and running the Microsoft Raytracing Sample test. It's good to have context and confirmation of Nvidia's statement regarding Pascal GP102 @ 1 Gigaray per second.
 
Something just crossed my mind (sorry if someone already mentioned it):
Since ray tracing, in theory, should make a lot of off-screen buffer rendering tricks redundant (such as shadow maps, reflection cube maps, etc.), is it going to make SLI or other multi-GPU solutions easier to implement?
Maybe there can finally be some real multi-GPU solution not based on AFR.
 
I saw this link on Reddit:
http://on-demand.gputechconf.com/si...rgan-mcguire-ray-tracing-research-update.html

It’s a pretty short presentation that shows the kind of ray tracing acceleration you could see when going from Pascal to Turing.

He describes in a bit more detail a few techniques that are pretty fascinating.

E.g., they use traditional TAA for the whole scene, then calculate which pixels will be blurry, and then shoot multiple RT rays at only those pixels to increase the quality there with SSAA.
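A rough sketch of what the "find the pixels that will be blurry" step could look like in HLSL (this is my own guess at a simple heuristic, not McGuire's actual implementation; the texture names and threshold are invented):

Texture2D<float4>  CurrentColor : register(t0);   // this frame's TAA input
Texture2D<float4>  HistoryColor : register(t1);   // reprojected TAA history
RWTexture2D<uint>  NeedsRays    : register(u0);   // 1 = send extra RT SSAA rays here

[numthreads(8, 8, 1)]
void MarkComplexPixels(uint3 id : SV_DispatchThreadID)
{
    float3 cur  = CurrentColor[id.xy].rgb;
    float3 hist = HistoryColor[id.xy].rgb;

    // Where current frame and history disagree a lot, TAA will either ghost
    // or blur, so flag the pixel for ray-traced supersampling.
    float diff = dot(abs(cur - hist), float3(0.299f, 0.587f, 0.114f));
    NeedsRays[id.xy] = (diff > 0.1f) ? 1u : 0u;   // threshold is a guess
}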
 

Does raytracing actually reduce the number of buffers that get shuffled around? It seems we still have lots of buffers; they're just raytraced instead.

The last slide of the SEED presentation proposes a multi-GPU solution where ray generation (I assume this means g-buffer rasterization) happens on the primary GPU, and then the other GPUs split the raytracing work and send the results back to the primary for compositing and filtering.

Basically SFR, and it still requires a lot of work from devs. I don't think we'll see multi-GPU really shine again until we get to 100% world-space processing where each pixel really is independent.

https://media.contentapi.ea.com/con...-raytracing-in-hybrid-real-time-rendering.pdf
 

Out of curiosity I delved a little deeper into the DXR fallback source code, as it seems rather slow. (Thanks, Microsoft, for making it open source.)
After a couple of hours of digging and changing code, I could get it significantly faster.
Now the "Hello World" sample runs at 1.7 Gray/s instead of 1 Gray/s.
Even my old Maxwell GM200 can now do 1 Gray/s :)

For the technically interested, the core of the DXR fallback raytracing is done in TraverseFunction.hlsli.
There is a stack implemented as:
static uint stack[TRAVERSAL_MAX_STACK_DEPTH];
This hurts a lot; allocating the stack in shared memory instead with:
groupshared uint stack[TRAVERSAL_MAX_STACK_DEPTH*WAVE_SIZE];
results in an immediate 70% raytracing speedup of the sample.
(There are a couple more changes needed in StatePush/Pop, such as stack[stackIndex*WAVE_SIZE + tidInWave] = value; etc.)
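To make that concrete, the push/pop helpers end up looking roughly like this (my paraphrase, not the exact fallback source; the constants and function names are illustrative):

#define TRAVERSAL_MAX_STACK_DEPTH 32   // value assumed for illustration
#define WAVE_SIZE 32                   // threads per wave, as used above

// One interleaved stack per lane in shared memory, instead of a per-thread
// "static" array that the compiler spills to slow local memory.
groupshared uint stack[TRAVERSAL_MAX_STACK_DEPTH * WAVE_SIZE];

void StackPush(inout uint stackIndex, uint value, uint tidInWave)
{
    // Stride by WAVE_SIZE so neighbouring lanes land in different banks.
    stack[stackIndex * WAVE_SIZE + tidInWave] = value;
    stackIndex++;
}

uint StackPop(inout uint stackIndex, uint tidInWave)
{
    stackIndex--;
    return stack[stackIndex * WAVE_SIZE + tidInWave];
}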

There is still a huge amount of optimizations possible, I think, to get the fallback even faster.
 

When the optimizations are complete, will you submit your changes?
 
GeForce RTX owners should get the option to turn ray tracing off. However, there is no DXR (DirectX Ray Tracing) fallback path for emulating the technology in software on non-RTX graphics cards. And when AMD comes up with its own DXR-capable GPU, DICE will need to go back and re-tune Battlefield V to support it.

Holmquist clarifies, “…we only talk with DXR. Because we have been running only Nvidia hardware, we know that we have optimized for that hardware. We’re also using certain features in the compiler with intrinsics, so there is a dependency. That can be resolved as we get hardware from another potential manufacturer. But as we tune for a specific piece of hardware, dependencies do start to go in, and we’d need another piece of hardware in order to re-tune.”

https://www.tomshardware.com/news/battlefield-v-ray-tracing,37732.html
 

Does the DXR fallback run on AMD VEGA yet?
 
In case we don't get a Pascal driver with good DXR support, the motivation might be higher though :)

I see absolutely 0 reason for Nvidia to improve Pascal DXR performance. If anything, that's one thing they'd want to avoid to make Turing more appealing.
 

It's a chicken-and-egg conundrum. Considering that Pascal makes up a huge chunk of the current installed base, for DXR to keep getting traction and dev support it would need to be at least somewhat useful on such a large percentage of gaming hardware.
 