Nvidia Turing Speculation thread [2018]

Discussion in 'Architecture and Products' started by Voxilla, Apr 22, 2018.

Thread Status:
Not open for further replies.
  1. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,405
    Likes Received:
    401
    Location:
    New York
    As others have already mentioned, those optimized drivers already exist in the form of OptiX. We already know it's not fast enough, so the whole notion of optimized DXR drivers is really a moot point.

    Well, firstly, that's not real-time. And secondly, those are all static scenes, no particles, etc. Not exactly a relevant point of comparison.
     
  2. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Yes, I have viewed it. Simply put: the fact that he gave a 1 Gigaray figure for Pascal (1080 Ti) in an unknown ray tracing scene, whereas the more commonly held figure is around 400-500 Mrays/s, illustrates the very skepticism I have about Turing. They can easily inflate a more reasonable number by 2x or so for marketing purposes. When he gave a gigaray figure for the 1080 Ti, I knew immediately that they're likely using some hocus-pocus benchmarks for these 10/8/6 Gigaray figures for Turing. These tests aren't hard or complicated. You can run them right now on Pascal on a slew of programs. So the skepticism comes from why they haven't shown them. Turing does 10/8/6 Gigarays? Yeah? In what scene? Show me.
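    For reference, a rays-per-second figure can be derived from frame statistics with back-of-the-envelope arithmetic. A minimal sketch follows; all inputs are hypothetical examples, not Nvidia's methodology, and real figures depend entirely on the scene:

```python
# Back-of-the-envelope rays/s estimate from frame statistics.
# All inputs here are hypothetical; actual throughput depends on the scene.

def rays_per_second(width, height, rays_per_pixel, fps):
    """Total rays traced per second for a full-screen pass."""
    return width * height * rays_per_pixel * fps

# 1080p, 2 rays per pixel (say, one primary + one shadow ray), 60 fps:
rate = rays_per_second(1920, 1080, 2, 60)
print(f"{rate / 1e9:.2f} Gigarays/s")  # prints "0.25 Gigarays/s"
```

    This is why a bare "Gigarays/s" claim is meaningless without the scene: the same card hits very different rates depending on ray count per pixel, coherence, and geometry.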
     
    trinibwoy likes this.
  3. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,734
    Likes Received:
    1,467
    We should know soon enough once the reviews come out.
     
  4. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,103
    Likes Received:
    3,403
    @dirtyb1t The gigarays numbers are dumb, but if you look at the video I posted, Morgan McGuire is suggesting 4-10x improvements in frame times for ray tracing algorithms with incoherent rays on Turing vs Pascal.
     
    pharma likes this.
  5. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    702
    Likes Received:
    272
    The Gigarays/s metric has been called into question before.
     
  6. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    702
    Likes Received:
    272
    Maybe the thread on this forum can enlighten you. Also this one.
    "The goal of the compute based Fallback Layer (FL) is to provide the same C++ and shader API interface as DXR. This isn't completely possible, due to various reasons and design differences in DirectX Compute, but in the end the APIs are almost identical outside of few corner cases. I’ll refer you to the implementation details and limitations in the FL developer guide."

    And another link
    "The Fallback Layer uses DXR if a driver and OS supports it. Otherwise, it falls back to the compute pipeline to emulate raytracing. Developers aiming for wider HW support should target the Fallback Layer."
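    The selection logic described in that quote amounts to a simple capability check: use native DXR when the driver and OS support it, otherwise emulate on compute. A minimal sketch of the idea; the function and return values are hypothetical illustrations, not the real D3D12 Fallback Layer API:

```python
# Sketch of the Fallback Layer's device-selection logic quoted above.
# Names and return values are hypothetical, not the actual FL API.

def create_raytracing_device(driver_supports_dxr, os_supports_dxr):
    """Prefer native DXR; otherwise emulate ray tracing on compute."""
    if driver_supports_dxr and os_supports_dxr:
        return "native-dxr"       # hardware/driver path
    return "compute-emulation"    # compute-shader fallback path

print(create_raytracing_device(True, True))    # native-dxr
print(create_raytracing_device(True, False))   # compute-emulation
```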
     
    #586 Voxilla, Aug 30, 2018
    Last edited: Aug 30, 2018
  7. Malo

    Malo YakTribe.games
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,674
    Likes Received:
    2,711
    Location:
    Pennsylvania
    Why does a device ID confirm what chip it's using when there's no existing TU106 to compare?
     
  8. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    This is more handwaving, and it's actually scary how many tricks are in each slide.
    First off, ray tracing is not path tracing. They have completely different performance.
    Second, there's raw ray tracing rays/s, and then there's upsampling, filtering/denoising/AA/sub-sampling. Each and every one of the performance slides contains some convoluted gimmick. In some slides they compare a Titan V to a Quadro RTX 6000. In others they compare a $6,000 24GB Quadro RTX 6000 to not even a 1080 Ti but a 1080. Do I have "stupid" written on my forehead, Nvidia?
    Then they carefully slide "denoised" into the comparison once a Pascal consumer GPU appears. Again, this has nothing to do w/ ray tracing capability and everything to do w/ the upsampling tensor cores. A 5x speedup in denoising? I'd hope so when you have 3x the memory and tensor cores. This was already known with the Titan V, which also has tensor cores. The speedups in their own slide deck are all over the place. In some cases it's 50%+, 2x, 3x, 4x, 5x, 4-10x, 15x.

    They compare a Titan V to a Quadro RTX 6000 and there's a 50% speedup in one place, a 300% speedup in another, and then comes some convoluted "Algorithm Speedup" measure, which you might as well use to con someone, because it is benchmarked against the older algorithmic approach. So, if you don't read it properly, the Quadro RTX 6000 looks like a 7.9x speedup vs a Titan V. It isn't, because the Titan V is 2.4x and the RTX 6000 is 5.4x. So the Quadro RTX 6000 is only about 2x faster than a Titan V.
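    The arithmetic behind that point can be checked directly. Since both cards are measured against the same old algorithm, the card-vs-card ratio is the quotient of the two speedup factors (figures as quoted above, not independently verified):

```python
# Both cards are benchmarked against the same old algorithm, so the
# card-vs-card ratio is the quotient of the two speedup factors.
# Figures are those quoted in the post, not independently verified.

titan_v_speedup = 2.4     # Titan V, new algorithm vs old
rtx_6000_speedup = 5.4    # Quadro RTX 6000, new algorithm vs old

card_vs_card = rtx_6000_speedup / titan_v_speedup
print(f"{card_vs_card:.2f}x")  # prints "2.25x", not 7.9x
```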

    The whole slide deck is filled with this. This is why you let engineers develop such slide decks and not manipulative marketing teams. What is this nonsense? It would literally insult anyone's intelligence in the room.

    The last column claims a speedup due to an AA (anti-aliasing) algorithm? This has nothing to do w/ ray tracing and everything to do w/ the tensor core upsampling/anti-aliasing/denoising.
    This is why they compared a Titan V to a Quadro: because you need tensor cores for this. It's a joke to compare this to Pascal.

    Nvidia: stop jerking people around and show a series of 10 progressively complex ray tracing scenes and the performance, in rays/s, that the core ray tracing algorithm achieves on various cards. Compare a 1080 to a 2080, a 1080 Ti to a 2080 Ti, a comparable Quadro to a comparable RTX Quadro. Then show a separate performance demo of the speedup you get using an AI-enabled denoising/sampling algorithm running on tensor cores vs without on Pascal.

    LOL, I want to punch a hole in the wall after going through this presentation. This makes me question these lauded Gigaray figures 100%.
     
  9. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Real-time is a joke and a marketing term in this context. Given the insane variability in quality and FPS, there's no such thing.
    Furthermore, real-time is only achievable because of the tensor core based AI denoising/upsampling.
    The ray trace portion is still anything but real-time and ready for prime time.
    All of the magic to make a frame have reasonable quality after a small amount of ray tracing involves the tensor cores that upsample, filter, and fill in all of the details across the super-noisy image.
    The rightful comparison would be to implement this "AI" algorithm in CUDA cores and reflect the real difference. If 70% of the work is done by the AI algorithm interpolating and only 30% by actual ray tracing, it's a bit dishonest to wash this all over as real-time ray tracing. More correctly, they've named it hybrid ray tracing, because they use a slew of other pipeline components to produce a result. Compared to the past it's an apples-to-oranges comparison, because more correct ray tracing involves actual ray tracing to fill in the detail. There is little to no denoising.
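    The 70/30 split above is the poster's hypothetical, but the general point, that accelerating only one stage of a pipeline bounds the whole-frame speedup, is just Amdahl's-law arithmetic, sketched here with those hypothetical numbers:

```python
# Amdahl's-law sketch: if only the denoising fraction of frame time is
# accelerated, the whole-frame speedup is limited by the unaccelerated
# ray-trace fraction. The 70/30 split is the post's hypothetical.

def overall_speedup(accelerated_fraction, stage_speedup):
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / stage_speedup)

# 70% of frame time in denoising sped up 5x; ray tracing unchanged:
print(f"{overall_speedup(0.70, 5.0):.2f}x")  # prints "2.27x"
```

    So a large quoted speedup on the denoising stage alone says little about raw ray-tracing throughput, which is the poster's complaint.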
     
  10. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    DirectX is an API. It is not a driver. DXR is a component of DirectX's API; it is the component that serves ray tracing. It too is an API.
    There's nothing to discuss or dig into further here. Drivers are written by the manufacturer of the hardware so that higher-level APIs can access it. The links you reference for enlightenment state the very thing I stated.
    Your first link:
    Translation: Nvidia will provide a driver that supports DXR. Again, Nvidia driver. DXR = API.

    Translation: they state themselves that all they're providing is an API interface. It could be C++, OpenCL, Arubifiednodejsklobernet.
    It's an API, not a driver.


    Real-time ray tracing is a simple algorithm that can run on just about any GPU. How it runs on a particular GPU is specific to the hardware driver for that GPU. How this is mapped to DXR (Microsoft's API) is where the support for DXR comes in, which the hardware company has to author. Why is it that Microsoft could create a fallback layer that's a C++/shader API? Because you don't need anything complicated to compute ray tracing. I can write a program that does this in a day. It will just run like garbage. So like I said, the fallback layer is a generic mess that simply runs on all GPUs. It will run like garbage because it doesn't utilize specific hardware features. OptiX runs "real-time" ray tracing and is specific to Nvidia. It will run faster than Microsoft's generic fallback layer. Download it from Nvidia's site and run it for yourself on a Pascal card. I can run the same box demo Jensen ran within 5 minutes of downloading and properly installing OptiX on Linux, w/o DirectX. I'm not using DXR.
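    To the point that basic ray tracing needs nothing exotic: the core primitive of any toy tracer, a ray-sphere intersection, fits in a few lines. This is a generic textbook sketch, unrelated to OptiX, DXR, or any real API:

```python
import math

# Minimal ray-sphere intersection: the core primitive of a toy ray tracer.
# Generic textbook illustration; no relation to OptiX, DXR, or any real API.

def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive hit distance t, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c          # discriminant of the quadratic
    if disc < 0:
        return None                   # ray misses the sphere
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t > 0 else None

# Ray from the origin straight down -z at a unit sphere centered at z = -5:
print(hit_sphere((0, 0, 0), (0, 0, -1), (0, 0, -5), 1.0))  # prints 4.0
```

    Making this fast on a given GPU, with acceleration structures and hardware-specific scheduling, is the part that lives in the driver, which is the distinction being argued here.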

    Please slow down and understand what is being said.
     
  11. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,758
    Likes Received:
    1,994
    Location:
    Germany
    McGuire's presentation was about ATAA, not DLSS in the part in question.
     
  12. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Suggests comparing a Titan V ($3,000) to a Quadro RTX 6000, which costs $6,000 and has 24GB of RAM.
    Then gets even more creative and compares a Quadro RTX 6000 to a $400 8GB 1080 Pascal.

    Speedups are anywhere from 0.5x to 15x and they're all timing related, w/ hilarious footnotes detailing that the timings come from upsampling/denoising. Every other slide does this to throw you off. This is flat-out manipulative nonsense. To sell the card they claim X gigarays/sec, not time (msec). So show me how that number is arrived at. The msec measures are dumb as well, because there is no benchmark scene or series of scenes, and there is no apples-to-apples comparison of similar cards. They're comparing a $6,000 card with 3x the memory of a $400 card and claiming 15x performance.
    Guess what: $400 x 15 = $6,000. I'd sure hope it has 15x performance. 1080 to 2080. 1080 Ti to 2080 Ti. Compare Quadros to similarly priced Quadros. Cut out the shenanigans. People buying cards this expensive aren't dumb, and it looks quite foolish of Nvidia to try to market to them like they are.

    Nothing is answered by comparing the performance of a $400 8GB card to a $6,000 professional Quadro w/ 3x its memory.
    This is what's known as a marketing slide. Show me a 1080 compared to a 2080. A 1080 Ti compared to a 2080 Ti. Show me 10 varied ray tracing scenes w/ increasing complexity, w/ no denoising or other tensor core gimmicks. Show me the rays/s and how that number is arrived at (shadow rays, primary rays, secondary rays, etc). Draw an average line through them. And cut out the gimmicks. These are $800/$1,000+ cards. Nvidia has left the neighborhood of being able to get by with such marketing gimmicks. They need to put up the orange-to-orange comparisons and shut up. The 2080 does X gigarays/sec? You want to sell me on that figure when it is held that Pascal does 400 Mrays/sec? Show me how it does X gigarays. They haven't answered a single thing.
     
  13. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,103
    Likes Received:
    3,403
    Looks like I have a new member on my block list.

    Morgan McGuire is an engineer, not a marketing person. There are no claims that the cards being compared are equivalent. There is no deception. What's being compared is shown on the slides. De-noising is not a gimmick. Whether using tensor cores or not, de-noising will be fundamental to pretty much all real-time ray tracing algorithms, so it's actually a valid aspect of ray-tracing performance.

    Also, Morgan McGuire has more info about performance numbers in this Twitter thread



     
    #593 Scott_Arm, Aug 30, 2018
    Last edited: Aug 30, 2018
    pharma likes this.
  14. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Which has nothing to do w/ the raw performance of ray tracing and gigarays/sec. It's a completely different topic and aspect of the pipeline.
    NVIDIA DLSS: AI-powered anti-aliasing
    Adaptive Temporal Antialiasing (ATAA)
    Spot the difference? Both use tensor cores to get their speedup. Both can run w/o tensor cores on Pascal at a lower rate.
    Both are among the many image cleanup algorithms used to hide the noisy and incomplete ray tracing results that are produced in "real-time" (less than 16ms).

    Rasterizer + Ray tracing -> Image cleanup -> What you see
    Rasterizer pipeline + Ray trace cores -> Tensor cores -> What you see
    [muh giga rays] -> [Muh magic pixie dust cleanup algorithm] -> what you see

    Let's not get hung up on the meme-level algorithms.
    Performance in ray tracing [gigarays/sec] has nothing to do w/ the latest state-of-the-art technique to clean up/fake image quality after the fact. At least I hope they didn't pull a number out of their behind that combines the two pipelines.
     
  15. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,758
    Likes Received:
    1,994
    Location:
    Germany
    Yet it's (part of) what seemingly got you wound up so much.
    For a second I thought you got it, when you asked if I spotted the difference. But then … I suggest you re-watch the video from about 7:15, where McGuire explains what they did with ATAA:
    http://on-demand.gputechconf.com/si...rgan-mcguire-ray-tracing-research-update.html
     
    pharma and Scott_Arm like this.
  16. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Do as you please. I'm an engineer, and I am not allowed to do public exhibitions without my slide decks being revised and approved by a slew of people from marketing, as anyone would expect at any well-formed company. My commentary has nothing to do w/ Morgan McGuire and everything to do w/ the clear issues with how the data is presented in the slide deck.

    Then what is the nature of your reply? I pointed out this fact as it seemed to be one that was glossed over when his presentation was referenced. You'd have to take me for an absolute fool to compare a $6,000 professional card to a $400 consumer GPU that isn't even the higher end of its class. What I claim is that this is not by accident. I also make the claim that, to be honest and fair, one should compare apples to apples when making bold declarations about the performance of a new micro-architecture. I shouldn't feel like I want to punch a hole through the wall when I pause your presentation, read the fine print at the bottom of the slide deck, and look closely at how manipulative the numbers are, when a comparison that looks like a 7.9x speedup at first glance is more like 2x. Anyone in engineering and tech knows exactly where such slides come from, so my claims are substantiated.

    [QUOTE="Scott_Arm, post: 2041451, member: 2873"]
    There is no deception. What's being compared is shown on the slides
    [/QUOTE]
    Fine print:
    Quadro RTX 6000 [professional card], 24GB RAM ($6,000)
    GeForce GTX 1080 [consumer card], 8GB RAM (~$400)
    Yes, viewers, this is what we're comparing on our slides. At the higher end, in special measures, you get an amazing 15x speedup. If you're keen on basic math, $400 x 15 = $6,000. Quadro RTX 6000... $6,000.
    Get it?


    Yes, there clearly is no deception. This is the first thing that stood out to me and I wasn't impressed.

    Denoising is denoising
    Ray tracing is ray tracing.

    When a word describes a chain of processes, you are best and most honestly served by piecing them out when discussing performance.
    If you are trying to be honest, you do so. When you interleave and interweave them every other slide, and change graphics cards every other slide when doing comparisons, this is either a glaring mistake or it results in confusion. Confusion that coincidentally and significantly serves to overstate what you have accomplished.

    Let's stick to the facts/data. This has nothing to do w/ Morgan McGuire and everything to do w/ how manipulative the data about these cards has been. If you want to block me for pointing that out, so be it.
     
  17. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    I've already read the white paper some time ago, found here:
    https://research.nvidia.com/publication/2018-08_Adaptive-Temporal-Antialiasing
    I get it exactly. I also get what is involved with ray tracing, and where gigaray/sec figures come from, which has nothing to do w/ post-processing. I understand that the overall pipeline toward presenting a quality image involves other pieces. These later pieces are where various things like ATAA come in.

    So, if you want to talk about AA/denoising/shaders/temporal AA and/or NVIDIA DLSS AI-powered anti-aliasing, or any other part of the pipeline and the speedups therein, we can talk about that. If you want to talk about how new state-of-the-art algorithms have replaced older ones to make traditional aspects of the quality of the ray trace output more performant, we can talk about that too, as a separate topic. I'm concerned with the above and gigarays/sec. There might be cases where I want zero interpolation. What's the performance? Gigarays/sec.

    Some may be interested in the interpolation aspects of the pipeline; others are interested in the raw performance of the ray tracing portion only. When someone has enough wherewithal and understanding to distinguish and detail the various components separately, but chooses not to in a way that largely makes performance numbers look better, there are time-tested reasons for it. They've achieved something great; there's no question about that. It's taken years of hard work and research; there's no question about that. It is a hybrid ray tracing solution; there's no question about that. There is a question about the performance of this generation's hardware compared to the past generation's. That question is best answered with an apples-to-apples comparison. A series of new AA algorithms were invented. Great, how do they run on Turing? How do they run on Pascal? What's the difference in performance?

    Apples to apples. No shenanigans. No sleight of hand.
    I got everything I need to get, which is why I can pick apart the shenanigans.
     
    #597 dirtyb1t, Aug 30, 2018
    Last edited: Aug 30, 2018
  18. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,103
    Likes Received:
    3,403
    @dirtyb1t

    ATAA casts rays in places where the TAA algorithm detects that the temporal information will blur/fail. It's a regular raster program without ray traced lighting, but it uses ray tracing to supplement failure cases and improve image quality. It's well explained. There isn't any deception here. This is a valid ray-tracing algorithm with performance compared between two non-equal but clearly identified cards. The algorithm speedup number refers to the frame times of ATAA vs SSAA. The 7.9x number compares ATAA on the Quadro to SSAA on Volta. That is also clearly stated. I don't think tensor cores have any factor in this part of the presentation.

    The spatio-temporal guidance filtering part is pretty straightforward. It's Quadro vs Titan Xp. Clearly stated. The path tracing portion runs 5.3x faster. These are shaded pixels, so this would include differences in shading performance between the two cards. The de-noising is shown as a separate metric, but again, reconstruction/de-noising will be essential to real-time ray-tracing performance.

    The corrected de-noised area lights slide clearly states the hardware. It also tells you what parts of the final output are included/excluded from the metrics shown. In the example with 4 area lights per pixel, he says there is a filtering portion that is twice as fast, but the overall average is 5x as fast because of ray tracing performance. The notes below say it does not include the de-noising pass, so I'm curious as to what the filtering is, but if I went and referenced the paper he's talking about I'm sure I could figure it out.

    There's nothing deceptive about this talk. It may not be the metrics you want, but calm down.
     
    #598 Scott_Arm, Aug 30, 2018
    Last edited: Aug 30, 2018
  19. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    0.5x, 1x, 2x, 3-10x, 15x, with a million and one variables involved.
    10 Gigarays?
    Pascal: ~1 Gigaray. 5.3x = 5.3 Gigarays. That doesn't come up to 10 Gigarays, and a 2080 Ti is supposed to be capable of 10 Gigarays.
    So, here's the thing about all the handwaving and technical details, which I do understand:
    ultimately you give a figure, and that figure sells the cards, which is why you give it.
    This figure happens to be measurable.
    I take 10 ray tracing scenes of varied complexity. I run them on cards 1, 2, 3, 4, 5, 6, 7.
    I run them using the same set of core algorithms implemented in your driver.
    I take the results and I compare the gigarays/sec throughput.
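    That measurement reduces to timing a fixed ray workload per scene. A hedged sketch of such a harness follows; `trace_scene` is a hypothetical stand-in for whatever renderer is under test, not a real API:

```python
import time

# Sketch of the benchmark loop proposed above: fixed scenes, fixed ray
# budget, report Gigarays/s per scene. `trace_scene` is a hypothetical
# stand-in for the renderer under test, not a real API.

def benchmark(trace_scene, scenes, rays_per_scene):
    results = {}
    for scene in scenes:
        start = time.perf_counter()
        trace_scene(scene, rays_per_scene)       # do the ray work
        elapsed = time.perf_counter() - start
        results[scene] = rays_per_scene / elapsed / 1e9  # Gigarays/s
    return results

# Dummy tracer so the sketch runs end to end:
rates = benchmark(lambda s, n: sum(range(1000)),
                  ["scene-01", "scene-02"], 10**6)
for scene, gigarays in sorted(rates.items()):
    print(f"{scene}: {gigarays:.3f} Gigarays/s")
```

    Run the same harness on each card and the per-scene throughput comparison fits on one slide, which is the poster's point.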

    This can be put on one slide. There are in fact zero slides of this, whereas there are libraries of technical presentations talking about every other feature (mainly the denoising/quality portions), meanwhile ignoring that the performance of those varies on a per-scene basis. In some scenes the speedup could be 0.5x, in others 10x.
    In your infinite wisdom, you know why this is. We don't need to pass white paper references back and forth to understand it.

    Random sampling. 10 ray tracing scenes of varied complexity.
    1080ti -> 2080ti
    1080 -> 2080
    What's the Ray/sec throughput.
    Honest details... Honest Gigaray/sec.

    Variability is all over the place, which is why you standardize and produce honest, clear, and concise numbers. In very complex scenes w/ very detailed static elements, they already suggest baking certain aspects, for the very reason that performance (if that's all you want to focus on) comes to a grinding halt. Interpolation algorithms aren't a holy grail. They work when they work. If you desire a more professional and accurate image in a timely fashion, you'd want the GPU to actually meet its lauded rays/sec throughput figure. If you're instead trying to cram in 100 FPS w/ a sprinkle of ray tracing, loads of interpolation, and AI-assisted denoising, where accuracy matters less, then come the suite of speedup tools. Quadros are meant for the former, GeForce cards the latter, which is why you don't compare the two. Their Quadro line has the Pascal micro-architecture too, right?

    Apples-and-oranges comparisons of this kind are done for a time-tested, marketing-based reason.
    There's really no more I think I can productively say until very detailed (non-populist) benchmarks come out. Having dug through a slew of information on this subject, including much better presentations from Apple, and having played with the OptiX tooling myself, I'm not sold that I actually need an RTX card, whereas the way it's been marketed is that RTX cards have a blanket 25x the rays/sec throughput of Pascal. They don't. Nowhere near it.
     
    #599 dirtyb1t, Aug 30, 2018
    Last edited: Aug 30, 2018
  20. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,758
    Likes Received:
    1,994
    Location:
    Germany
    If that's so, then why do you say
    [my bold]
    I actually read the white paper you linked to and I did not find tensor cores mentioned. Maybe you can help me out here? Note that I'm not arguing against tensor cores being used for denoising or DLSS; just the ATAA example you criticized so vigorously seems to do without them.
     