Nvidia Turing Speculation thread [2018]

dirtyb1t · Aug 30, 2018

CarstenS said:
If that's so then why do you say

[my bold]
I actually read the white paper you linked to and I did not find tensor cores being mentioned. Maybe you can help me out here? Note that I'm not arguing tensor cores being used for denoising or DLSS, just the ATAA example you criticized to vigorously seems to do without them.

You can read and digest something without making the proper reference to every detail especially things that are insignificant to something you're trying to discuss. I'm not discussing the later portions of the hybrid ray tracing pipeline. I'm discussing

And where Nvidia got their Ray/sec measure. How it compares in performance apples to apples to comparable pascal cards :
1080 -> 2080
1080ti -> 2080ti

If I was in the market for a Quadro card that cost $6000, I'd want a comparison to an equally expensive quadro card of Pascal grade or a professional card w/ scaled cost/performance. Comparing a Quadro 6000 ($6000) card with 24GB of ram to a $400 Geforce consumer 1080 with 8GB ram in any respect renders zero value or information. Titan V comparison are sensible if the referenced algorithm involves the tensor cores purely as it would be tensor core hw accel on volta vs turing. DLSS/ATAA have nothing to do w/ my considerations which is why I loosely referenced them. At the moment, i'm codifying my own test suites for benchmarking and determining the return policy on these cards. I'll pre-order one and do my own benching. If it is not up to the value they have marketed it as for my use case, I'll return it. This could have been avoided if they simply were transparent and clear about performance from the beginning.

O'well.. No loss to me

trinibwoy · Aug 30, 2018

Yes it’s annoying but it’s really not worth getting worked up over nvidia’s marketing claims. Obviously the numbers are comparing performance of various Turing parts in some yet unknown random workload. In the end it doesn’t matter at all.

What we need is an unbiased benchmark that clearly showcases performance of each step of the pipeline - BVH construction, ray traversal and denoising. Only then we can begin to have a useful conversation.

It seems over the last few years feature tests have gone the way of the dodo. I miss the early days of 3dmark’s fillrate, texturing, instancing and shading tests.

dirtyb1t · Aug 30, 2018

trinibwoy said:
Yes it’s annoying but it’s really not worth getting worked up over nvidia’s marketing claims. Obviously the numbers are comparing performance of various Turing parts in some yet unknown random workload. In the end it doesn’t matter at all.

What we need is an unbiased benchmark that clearly showcases performance of each step of the pipeline - BVH construction, ray traversal and denoising. Only then we can begin to have a useful conversation.

It seems over the last few years feature tests have gone the way of the dodo. I miss the early days of 3dmark’s fillrate, texturing, instancing and shading tests.

Agreed. All is not lost... The "marketing" caused me to dig a lot deeper into how this all functions and I learned an incredible amount. This helps me make a much more informed decision. Without this blackout period, I possible would have bought it blindly. Now I can deeply second guess such a purchase. I have downloaded Optix myself and its quite interesting what Pascal GPUs and existing denoising algorithms are already capable of. The varied amount of memory/resources various denoising can take up/etc.

Incredible achievement from Nvidia which I must acknowledge. However, as an intelligent consumer, I must scrutinize a purchase as much as possible. Now I know what to scrutinize and look out for beyond the headline slides. Thought I'd pass that along to others in case they too were caught up in the marketed figures and terminology.

Voxilla · Aug 31, 2018

pharma said:
I think during the "live stream" yesterday with Tom Petersen he mentioned approx. 1 Gigaray for Pascal (1080Ti).

Just got the raytracing samples running on a Titan XP (using the fallback of course, no driver yet supporting DXR)
It looks indeed as speculated before, the NV Grays/s claim is for raytracing just a single triangle.
The "Hello World" sample runs at 1 Grays/s on Pascal GP102.
The fallback layer is open source.
It shouldn't be too hard for someone to improve it and get some more Grays/s out of it, for the 1 triangle case I guess

dirtyb1t · Sep 1, 2018

Voxilla said:
Just got the raytracing samples running on a Titan XP (using the fallback of course, no driver yet supporting DXR)
It looks indeed as speculated before, the NV Grays/s claim is for raytracing just a single triangle.
The "Hello World" sample runs at 1 Grays/s on Pascal GP102.
The fallback layer is open source.
It shouldn't be too hard for someone to improve it and get some more Grays/s out of it, for the 1 triangle case I guess

I was sure this is what was referenced with : 10 Gigarays.
So, they most likely are referring primary ray generation, traversal, miss and closest hit throughput.

When divergence and deeper processing in the pipeline occur is when the troubles and performance issues begin. So, it will be interesting to see a range of more detailed benchmarks as to how Nvidia's new micro-architecture handles this. However, from a raw throughput of primary rays, it already seems they have made some good progress in achieving multiples single digits above what current pascal can achieve. Exciting achievements but brought to a more sensible reality.

Deleted member 2197 · Sep 2, 2018

Voxilla said:
Just got the raytracing samples running on a Titan XP (using the fallback of course, no driver yet supporting DXR)
It looks indeed as speculated before, the NV Gigarays/s claim is for raytracing just a single triangle.
The "Hello World" sample runs at 1 Grays/s on Pascal GP102.

I'm hopeful the upcoming reviews will use the the Microsoft Raytracing Sample Test as one test for Grays/s on Nvidia and AMD cards indicating fallback vs DXR modes.

Thanks @Voxilla for setting up and running the Microsoft Raytracing Sample test. It's good to have context and confirmation of Nvidia's statement regarding Pascal GP102 @ 1 Gigaray per second.

pcchen · Sep 2, 2018

Something just crossed my mind (sorry if someone already mentioned it):
Since by using ray tracing, in theory, should make a lot of off buffer rendering tricks redundant (such as shadow maps, reflection cube maps, etc.), is it going to make SLI or other multi-GPU solutions easier to make?
Maybe there can finally be some real multi-GPU solution not based on AFR.

silent_guy · Sep 3, 2018

I saw this link on Reddit:
http://on-demand.gputechconf.com/si...rgan-mcguire-ray-tracing-research-update.html

It’s a pretty short presentation that shows the kind of ray tracing acceleration you could see when going from Pascal to Turing.

He describes in a bit more detail a few techniques that are pretty fascinating.

E.g. they use traditional TAA for the whole scene, then calculate which pixels will be blurry, and then shoot multiple RT rays to only those pixels to increase the quality there with SSAA.

trinibwoy · Sep 3, 2018

pcchen said:
Something just crossed my mind (sorry if someone already mentioned it):
Since by using ray tracing, in theory, should make a lot of off buffer rendering tricks redundant (such as shadow maps, reflection cube maps, etc.), is it going to make SLI or other multi-GPU solutions easier to make?
Maybe there can finally be some real multi-GPU solution not based on AFR.

Does raytracing actually reduce the number of buffers that get shuffled around? Seems we still have lots of buffers just that they’re raytraced instead.

The last slide of the SEED presentation proposes a multi GPU solution where ray generation (I assume this means g-buffer rasterization) happens on the primary GPU and then the other GPUs split the raytracing work and send the results back to primary for compositing and filtering.

Basically SFR and still requires a lot of work from devs. I don’t think we’ll see multi GPU really shine again until we get to 100% world space processing where each pixel really is independent.

https://media.contentapi.ea.com/con...-raytracing-in-hybrid-real-time-rendering.pdf

Voxilla · Sep 4, 2018

pharma said:
I'm hopeful the upcoming reviews will use the the Microsoft Raytracing Sample Test as one test for Grays/s on Nvidia and AMD cards indicating fallback vs DXR modes.

Thanks @Voxilla for setting up and running the Microsoft Raytracing Sample test. It's good to have context and confirmation of Nvidia's statement regarding Pascal GP102 @ 1 Gigaray per second.

Out of curiosity I delved a little deeper into the DXR fallback source code, as it seems rather slow. (Thanks Microsoft for making it open source)
After a couple of hours digging and changing code, I could get it faster significantly.
Now the "Hello World " sample runs at 1.7 Gray/s instead of 1 Gray/s.
Even my old Maxwell GM200 now can do 1Gray/s

For the technical interested, the core of the DXR fallback raytracing is done in TraverseFunction.hlsli.
There is a stack implemented as:

static uint stack[TRAVERSAL_MAX_STACK_DEPTH];

This hurts a lot, allocating the stack in shared memory with:

groupshared uint stack[TRAVERSAL_MAX_STACK_DEPTH*WAVE_SIZE];

results in an immediate 70% raytracing speedup of the sample.
(There are a couple more changes needed in StatePush/Pop, such as stack[stackIndex*WAVE_SIZE + tidInWave] = value; etc)

There is still a huge amount of optimizations possible, I think, to get the fallback even faster.

A1xLLcqAgt0qc2RyMz0y · Sep 4, 2018

Voxilla said:
Out of curiosity I delved a little deeper into the DXR fallback source code, as it seems rather slow. (Thanks Microsoft for making it open source)
After a couple of hours digging and changing code, I could get it faster significantly.
Now the "Hello World " sample runs at 1.7 Gray/s instead of 1 Gray/s.
Even my old Maxwell GM200 now can do 1Gray/s

For the technical interested, the core of the DXR fallback raytracing is done in TraverseFunction.hlsli.
There is a stack implemented as:

static uint stack[TRAVERSAL_MAX_STACK_DEPTH];
This hurts a lot, allocating the stack in shared memory with:

groupshared uint stack[TRAVERSAL_MAX_STACK_DEPTH*WAVE_SIZE];
results in an immediate 70% raytracing speedup of the sample.
(There are a couple more changes needed in StatePush/Pop, such as stack[stackIndex*WAVE_SIZE + tidInWave] = value; etc)

There is still a huge amount of optimizations possible, I think, to get the fallback even faster.

When the optimizations are complete will you submit your changes

giannhs · Sep 5, 2018

GeForce RTX owners should get the option to turn ray tracing off. However, there is no DXR (DirectX Ray Tracing) fallback path for emulating the technology in software on non-RTX graphics cards. And when AMD comes up with its own DXR-capable GPU, DICE will need to go back and re-tune Battlefield V to support it.

Holmquist clarifies, “…we only talk with DXR. Because we have been running only Nvidia hardware, we know that we have optimized for that hardware. We’re also using certain features in the compiler with intrinsics, so there is a dependency. That can be resolved as we get hardware from another potential manufacturer. But as we tune for a specific piece of hardware, dependencies do start to go in, and we’d need another piece of hardware in order to re-tune.”

https://www.tomshardware.com/news/battlefield-v-ray-tracing,37732.html

Pressure · Sep 5, 2018

Voxilla said:
Out of curiosity I delved a little deeper into the DXR fallback source code, as it seems rather slow. (Thanks Microsoft for making it open source)
After a couple of hours digging and changing code, I could get it faster significantly.
Now the "Hello World " sample runs at 1.7 Gray/s instead of 1 Gray/s.
Even my old Maxwell GM200 now can do 1Gray/s

For the technical interested, the core of the DXR fallback raytracing is done in TraverseFunction.hlsli.
There is a stack implemented as:

static uint stack[TRAVERSAL_MAX_STACK_DEPTH];
This hurts a lot, allocating the stack in shared memory with:

groupshared uint stack[TRAVERSAL_MAX_STACK_DEPTH*WAVE_SIZE];
results in an immediate 70% raytracing speedup of the sample.
(There are a couple more changes needed in StatePush/Pop, such as stack[stackIndex*WAVE_SIZE + tidInWave] = value; etc)

There is still a huge amount of optimizations possible, I think, to get the fallback even faster.

Does the DXR fallback run on AMD VEGA yet?

Ike Turner · Sep 5, 2018

Pressure said:
Does the DXR fallback run on AMD VEGA yet?

It should AFAIK. Given that it already does on Polaris and Fiji.

Pressure · Sep 5, 2018

Ike Turner said:
It should AFAIK. Given that it already does on Polaris and Fiji.

Guess I should try the "Hello World" on VEGA FE then.

Deleted member 13524 · Sep 5, 2018

Looks like the Metro Exodus demo ran at 1080p 40FPS, on a 2080 Ti.

https://www.tweaktown.com/news/63052/metro-exodus-rtx-demo-ran-1080p-40fps-2080-ti/index.html

Meaning the 2070 is probably not able to run the game with RTX at all, and even the 2080 may have trouble keeping above 30FPS.

Deleted member 2197 · Sep 5, 2018

ToTTenTranz said:
Looks like the Metro Exodus demo ran at 1080p 40FPS, on a 2080 Ti.

Isn't that old news from the GamesCon conference?

Voxilla · Sep 5, 2018

A1xLLcqAgt0qc2RyMz0y said:
When the optimizations are complete will you submit your changes

No promise for a whole bunch more optimizations.
In case we don't get a Pascal driver with good DXR support, the motivation might be higher though

Clukos · Sep 9, 2018

Voxilla said:
In case we don't get a Pascal driver with good DXR support, the motivation might be higher though

I see absolutely 0 reason for Nvidia to improve Pascal DXR performance. If anything, that's one thing they'd want to avoid to make Turing more appealing.

Geeforcer · Sep 9, 2018

Clukos said:
I see absolutely 0 reason for Nvidia to improve Pascal DXR performance. If anything, that's one thing they'd want to avoid to make Turing more appealing.

It’s chicken/egg conundrum. Considering that Pascal has a huge chunk of current installment base, for DXR to keep getting traction and dev support it would need to be at least somewhat useful on such a large percentage of gaming hardware.

Nvidia Turing Speculation thread [2018]

dirtyb1t

trinibwoy

Meh

dirtyb1t

Voxilla

dirtyb1t

Deleted member 2197

Guest

pcchen

Moderator

silent_guy

trinibwoy

Meh

Voxilla

A1xLLcqAgt0qc2RyMz0y

giannhs

Pressure

Ike Turner

Pressure

Deleted member 13524

Guest

Deleted member 2197

Guest

Voxilla

Clukos

Bloodborne 2 when?

Geeforcer

Harmlessly Evil

Similar threads