Nvidia Turing Speculation thread [2018]

Discussion in 'Architecture and Products' started by Voxilla, Apr 22, 2018.

Thread Status:
Not open for further replies.
  1. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,405
    Likes Received:
    401
    Location:
    New York
    As others have already mentioned, those optimized drivers already exist in the form of OptiX. We already know it's not fast enough, so the whole notion of optimized DXR drivers is really a moot point.

    Well, firstly, that's not real-time. And secondly, those are all static scenes, no particles, etc. Not exactly a relevant point of comparison.
     
  2. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Yes, I have viewed it. Simply put: the fact that he gave a 1 Gigaray figure for Pascal (1080 Ti) in an unknown ray tracing scene, whereas the more commonly held figure is around 400-500 Mrays/s, illustrates the very skepticism I have about Turing. They can easily inflate a more reasonable number by 2x or so for marketing purposes. When he gave a gigaray figure for the 1080 Ti, I knew immediately that they're likely using some hocus-pocus benchmarks for these 10/8/6 Gigaray figures for Turing. These tests aren't hard or complicated. You can run them right now on Pascal on a slew of programs. So the skepticism comes from why they haven't shown them. Turing does 10/8/6 Gigarays? Yeah? In what scene? Show me.
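    For reference, a rays-per-second figure can be derived from frame statistics with back-of-the-envelope arithmetic. A minimal sketch follows; all inputs are hypothetical examples, not Nvidia's methodology, and real figures depend entirely on the scene:

```python
# Back-of-the-envelope rays/s estimate from frame statistics.
# All inputs here are hypothetical; actual throughput depends on the scene.

def rays_per_second(width, height, rays_per_pixel, fps):
    """Total rays traced per second for a full-screen pass."""
    return width * height * rays_per_pixel * fps

# 1080p, 2 rays per pixel (say, one primary + one shadow ray), 60 fps:
rate = rays_per_second(1920, 1080, 2, 60)
print(f"{rate / 1e9:.2f} Gigarays/s")  # prints "0.25 Gigarays/s"
```

    This is why a bare "Gigarays/s" claim is meaningless without the scene: the same card hits very different rates depending on ray count per pixel, coherence, and geometry.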
     
    trinibwoy likes this.
  3. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,734
    Likes Received:
    1,467
    We should know soon enough once the reviews come out.
     
  4. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,103
    Likes Received:
    3,403
    @dirtyb1t The gigarays numbers are dumb, but if you look at the video I posted, Morgan McGuire is suggesting 4-10x improvements in frame times for ray tracing algorithms with incoherent rays on Turing vs Pascal.
     
    pharma likes this.
  5. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    702
    Likes Received:
    272
    The Gigarays/s metric has been called into question before.
     
  6. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    702
    Likes Received:
    272
    Maybe the thread on this forum can enlighten you. Also this one.
    "The goal of the compute based Fallback Layer (FL) is to provide the same C++ and shader API interface as DXR. This isn't completely possible, due to various reasons and design differences in DirectX Compute, but in the end the APIs are almost identical outside of few corner cases. I’ll refer you to the implementation details and limitations in the FL developer guide."

    And another link
    "The Fallback Layer uses DXR if a driver and OS supports it. Otherwise, it falls back to the compute pipeline to emulate raytracing. Developers aiming for wider HW support should target the Fallback Layer."
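    The selection logic described in that quote amounts to a simple capability check: use native DXR when the driver and OS support it, otherwise emulate on compute. A minimal sketch of the idea; the function and return values are hypothetical illustrations, not the real D3D12 Fallback Layer API:

```python
# Sketch of the Fallback Layer's device-selection logic quoted above.
# Names and return values are hypothetical, not the actual FL API.

def create_raytracing_device(driver_supports_dxr, os_supports_dxr):
    """Prefer native DXR; otherwise emulate ray tracing on compute."""
    if driver_supports_dxr and os_supports_dxr:
        return "native-dxr"       # hardware/driver path
    return "compute-emulation"    # compute-shader fallback path

print(create_raytracing_device(True, True))    # native-dxr
print(create_raytracing_device(True, False))   # compute-emulation
```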
     
    #586 Voxilla, Aug 30, 2018
    Last edited: Aug 30, 2018
  7. Malo

    Malo YakTribe.games
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,674
    Likes Received:
    2,711
    Location:
    Pennsylvania
    Why does a device ID confirm what chip it's using when there's no existing TU106 to compare?
     
  8. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    This is more handwaving, and it's actually scary how many tricks are in each slide.
    First off, ray tracing is not path tracing. They have completely different performance.
    Second, there's raw ray tracing rays/s, and then there's upsampling, filtering/denoising/AA/sub-sampling. Each and every one of the performance slides contains some convoluted gimmick. In some slides they compare a Titan V to a Quadro RTX 6000. In others they compare a $6,000 24GB Quadro RTX 6000 to not even a 1080 Ti but a 1080. Do I have "stupid" written on my forehead, Nvidia?
    Then they carefully slide "denoised" into the comparison once a Pascal consumer GPU appears. Again, this has nothing to do w/ ray tracing capability and everything to do w/ the upsampling tensor cores. A 5x speedup in denoising? I'd hope so when you have 3x the memory and tensor cores. This was already known with the Titan V, which also has tensor cores. The speedups in their own slide deck are all over the place. In some cases it's 50%+, 2x, 3x, 4x, 5x, 4-10x, 15x.

    They compare a Titan V to a Quadro RTX 6000 and there's a 50% speedup in one place, a 300% speedup in another, and then comes some convoluted "Algorithm Speedup" measure, which you might as well use to con someone, because it is benchmarked against the older algorithmic approach. So, if you don't read it properly, the Quadro RTX 6000 looks like a 7.9x speedup vs a Titan V. It isn't, because the Titan V is 2.4x and the RTX 6000 is 5.4x. So the Quadro RTX 6000 is only about 2x faster than a Titan V.
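    The arithmetic behind that point can be checked directly. Since both cards are measured against the same old algorithm, the card-vs-card ratio is the quotient of the two speedup factors (figures as quoted above, not independently verified):

```python
# Both cards are benchmarked against the same old algorithm, so the
# card-vs-card ratio is the quotient of the two speedup factors.
# Figures are those quoted in the post, not independently verified.

titan_v_speedup = 2.4     # Titan V, new algorithm vs old
rtx_6000_speedup = 5.4    # Quadro RTX 6000, new algorithm vs old

card_vs_card = rtx_6000_speedup / titan_v_speedup
print(f"{card_vs_card:.2f}x")  # prints "2.25x", not 7.9x
```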

    The whole slide deck is filled with this. This is why you let engineers develop such slide decks and not manipulative marketing teams. What is this nonsense? It would literally insult anyone's intelligence in the room.

    The last column claims a speedup due to an AA (anti-aliasing) algorithm? This has nothing to do w/ ray tracing and everything to do w/ the tensor core upsampling/anti-aliasing/denoising.
    This is why they compared a Titan V to a Quadro: because you need tensor cores for this. It's a joke to compare this to Pascal.

    Nvidia: stop jerking people around and show a series of 10 progressively complex ray tracing scenes and the performance, in rays/s, that the core ray tracing algorithm achieves on various cards. Compare a 1080 to a 2080, a 1080 Ti to a 2080 Ti, a comparable Quadro to a comparable RTX Quadro. Then show a separate performance demo of the speedup you get using an AI-enabled denoising/sampling algorithm running on tensor cores vs without on Pascal.

    LOL, I want to punch a hole in the wall after going through this presentation. This makes me question these lauded Gigaray figures 100%.
     
  9. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Real-time is a joke and a marketing term in this context. Given the insane variability in quality and FPS, there's no such thing.
    Furthermore, real-time is only achievable because of the tensor core based AI denoising/upsampling.
    The ray trace portion is still anything but real-time and ready for prime time.
    All of the magic to make a frame have reasonable quality after a small amount of ray tracing involves the tensor cores that upsample, filter, and fill in all of the details across the super-noisy image.
    The rightful comparison would be to implement this "AI" algorithm in CUDA cores and reflect the real difference. If 70% of the work is done by the AI algorithm interpolating and only 30% by actual ray tracing, it's a bit dishonest to wash this all over as real-time ray tracing. More correctly, they've named it hybrid ray tracing, because they use a slew of other pipeline components to produce a result. Compared to the past it's an apples-to-oranges comparison, because more correct ray tracing involves actual ray tracing to fill in the detail. There is little to no denoising.
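    The 70/30 split above is the poster's hypothetical, but the general point, that accelerating only one stage of a pipeline bounds the whole-frame speedup, is just Amdahl's-law arithmetic, sketched here with those hypothetical numbers:

```python
# Amdahl's-law sketch: if only the denoising fraction of frame time is
# accelerated, the whole-frame speedup is limited by the unaccelerated
# ray-trace fraction. The 70/30 split is the post's hypothetical.

def overall_speedup(accelerated_fraction, stage_speedup):
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / stage_speedup)

# 70% of frame time in denoising sped up 5x; ray tracing unchanged:
print(f"{overall_speedup(0.70, 5.0):.2f}x")  # prints "2.27x"
```

    So a large quoted speedup on the denoising stage alone says little about raw ray-tracing throughput, which is the poster's complaint.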
     
  10. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    DirectX is an API. It is not a driver. DXR is a component of DirectX's API; it is the component that serves ray tracing. It too is an API.
    There's nothing to discuss or dig into further here. Drivers are written by the manufacturer of the hardware so that higher-level APIs can access it. The links you reference for enlightenment state the very thing I stated.
    Your first link:
    Translation: Nvidia will provide a driver that supports DXR. Again, Nvidia driver. DXR = API.

    Translation: they state themselves that all they're providing is an API interface. It could be C++, OpenCL, Arubifiednodejsklobernet.
    It's an API, not a driver.


    Real-time ray tracing is a simple algorithm that can run on just about any GPU. How it runs on a particular GPU is specific to the hardware driver for that GPU. How this is mapped to DXR (Microsoft's API) is where the support for DXR comes in, which the hardware company has to author. Why is it that Microsoft could create a fallback layer that's a C++/shader API? Because you don't need anything complicated to compute ray tracing. I can write a program that does this in a day. It will just run like garbage. So like I said, the fallback layer is a generic mess that simply runs on all GPUs. It will run like garbage because it doesn't utilize specific hardware features. OptiX runs "real-time" ray tracing and is specific to Nvidia. It will run faster than Microsoft's generic fallback layer. Download it from Nvidia's site and run it for yourself on a Pascal card. I can run the same box demo Jensen ran within 5 minutes of downloading and properly installing OptiX on Linux, w/o DirectX. I'm not using DXR.
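    To the point that basic ray tracing needs nothing exotic: the core primitive of any toy tracer, a ray-sphere intersection, fits in a few lines. This is a generic textbook sketch, unrelated to OptiX, DXR, or any real API:

```python
import math

# Minimal ray-sphere intersection: the core primitive of a toy ray tracer.
# Generic textbook illustration; no relation to OptiX, DXR, or any real API.

def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive hit distance t, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c          # discriminant of the quadratic
    if disc < 0:
        return None                   # ray misses the sphere
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t > 0 else None

# Ray from the origin straight down -z at a unit sphere centered at z = -5:
print(hit_sphere((0, 0, 0), (0, 0, -1), (0, 0, -5), 1.0))  # prints 4.0
```

    Making this fast on a given GPU, with acceleration structures and hardware-specific scheduling, is the part that lives in the driver, which is the distinction being argued here.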

    Please slow down and understand what is being said.
     
  11. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,758
    Likes Received:
    1,994
    Location:
    Germany
    McGuire's presentation was about ATAA, not DLSS in the part in question.
     
  12. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Suggests comparing a Titan V ($3,000) to a Quadro RTX 6000, which costs $6,000 and has 24GB of RAM.
    Then gets even more creative and compares a Quadro RTX 6000 to a $400 8GB 1080 Pascal.

    Speedups are anywhere from 0.5x to 15x and they're all timing related, w/ hilarious footnotes detailing that the timings come from upsampling/denoising. Every other slide does this to throw you off. This is flat-out manipulative nonsense. To sell the card they claim X gigarays/sec, not time (msec). So show me how that number is arrived at. The msec measures are dumb as well, because there is no benchmark scene or series of scenes, and there is no apples-to-apples comparison of similar cards. They're comparing a $6,000 card with 3x the memory of a $400 card and claiming 15x performance.
    Guess what: $400 x 15 = $6,000. I'd sure hope it has 15x performance. 1080 to 2080. 1080 Ti to 2080 Ti. Compare Quadros to similarly priced Quadros. Cut out the shenanigans. People buying cards this expensive aren't dumb, and it looks quite foolish of Nvidia to try to market to them like they are.

    Nothing is answered by comparing the performance of a $400 8GB card to a $6,000 professional Quadro w/ 3x its memory.
    This is what's known as a marketing slide. Show me a 1080 compared to a 2080. A 1080 Ti compared to a 2080 Ti. Show me 10 varied ray tracing scenes w/ increasing complexity, w/ no denoising or other tensor core gimmicks. Show me the rays/s and how that number is arrived at (shadow rays, primary rays, secondary rays, etc). Draw an average line through them. And cut out the gimmicks. These are $800/$1,000+ cards. Nvidia has left the neighborhood of being able to get by with such marketing gimmicks. They need to put up the orange-to-orange comparisons and shut up. The 2080 does X gigarays/sec? You want to sell me on that figure when it is held that Pascal does 400 Mrays/sec? Show me how it does X gigarays. They haven't answered a single thing.
     
  13. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,103
    Likes Received:
    3,403
    Looks like I have a new member on my block list.

    Morgan McGuire is an engineer, not a marketing person. There are no claims that the cards being compared are equivalent. There is no deception. What's being compared is shown on the slides. De-noising is not a gimmick. Whether using tensor cores or not, de-noising will be fundamental to pretty much all real-time ray tracing algorithms, so it's actually a valid aspect of ray-tracing performance.

    Also, Morgan McGuire has more info about performance numbers in this Twitter thread



     
    #593 Scott_Arm, Aug 30, 2018
    Last edited: Aug 30, 2018
    pharma likes this.
  14. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Which has nothing to do w/ the raw performance of ray tracing and gigarays/sec. It's a completely different topic and aspect of the pipeline.
    NVIDIA DLSS: AI-powered anti-aliasing
    Adaptive Temporal Antialiasing (ATAA)
    Spot the difference? Both use tensor cores to get their speedup. Both can run w/o tensor cores on Pascal at a lower rate.
    Both are among the many image cleanup algorithms used to hide the noisy and incomplete ray tracing results that are produced in "real-time" (less than 16ms).

    Rasterizer + Ray tracing -> Image cleanup -> What you see
    Rasterizer pipeline + Ray trace cores -> Tensor cores -> What you see
    [muh giga rays] -> [Muh magic pixie dust cleanup algorithm] -> what you see

    Let's not get hung up on the meme-level algorithms.
    Performance in ray tracing [gigarays/sec] has nothing to do w/ the latest state-of-the-art technique to clean up/fake image quality after the fact. At least I hope they didn't pull a number out of their behind that combines the two pipelines.
     
  15. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,758
    Likes Received:
    1,994
    Location:
    Germany
    Yet it's (part of) what seemingly got you wound up so much.
    For a second I thought you got it, when you asked if I spotted the difference. But then … I suggest you re-watch the video from about 7:15, where McGuire explains what they did with ATAA:
    http://on-demand.gputechconf.com/si...rgan-mcguire-ray-tracing-research-update.html
     
    pharma and Scott_Arm like this.
  16. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    Do as you please. I'm an engineer, and I am not allowed to do public exhibitions without my slide decks being revised and approved by a slew of people from marketing, as anyone would expect at any well-formed company. My commentary has nothing to do w/ Morgan McGuire and everything to do w/ the clear issues with how the data is presented in the slide deck.

    Then what is the nature of your reply? I pointed out this fact as it seemed to be one that was glossed over when his presentation was referenced. You'd have to take me for an absolute fool to compare a $6,000 professional card to a $400 consumer GPU that isn't even the higher end of its class. What I claim is that this is not by accident. I also make the claim that, to be honest and fair, one should compare apples to apples when making bold declarations about the performance of a new micro-architecture. I shouldn't feel like I want to punch a hole through the wall when I pause your presentation, read the fine print at the bottom of the slide deck, and look closely at how manipulative the numbers are, when a comparison that looks like a 7.9x speedup at first glance is more like 2x. Anyone in engineering and tech knows exactly where such slides come from, so my claims are substantiated.

    [QUOTE="Scott_Arm, post: 2041451, member: 2873"]
    There is no deception. What's being compared is shown on the slides
    [/QUOTE]
    Fine print:
    Quadro RTX 6000 [professional card], 24GB RAM ($6,000)
    GeForce GTX 1080 [consumer card], 8GB RAM (~$400)
    Yes, viewers, this is what we're comparing on our slides. At the higher end, in special measures, you get an amazing 15x speedup. If you're keen on basic math, $400 x 15 = $6,000. Quadro RTX 6000... $6,000.
    Get it?


    Yes, there clearly is no deception. This is the first thing that stood out to me and I wasn't impressed.

    Denoising is denoising
    Ray tracing is ray tracing.

    When a word describes a chain of processes, you are best and most honestly served by piecing them out when discussing performance.
    If you are trying to be honest, you do so. When you interleave and interweave them every other slide, and change graphics cards every other slide when doing comparisons, this is either a glaring mistake or it results in confusion. Confusion that coincidentally and significantly serves to overstate what you have accomplished.

    Let's stick to the facts/data. This has nothing to do w/ Morgan McGuire and everything to do w/ how manipulative the data about these cards has been. If you want to block me for pointing that out, so be it.
     
  17. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    I've already read the white paper some time ago, found here:
    https://research.nvidia.com/publication/2018-08_Adaptive-Temporal-Antialiasing
    I get it exactly. I also get what is involved with ray tracing, and where gigaray/sec figures come from, which has nothing to do w/ post-processing. I understand that the overall pipeline toward presenting a quality image involves other pieces. These later pieces are where various things like ATAA come in.

    So, if you want to talk about AA/denoising/shaders/temporal AA and/or NVIDIA DLSS AI-powered anti-aliasing, or any other part of the pipeline and the speedups therein, we can talk about that. If you want to talk about how new state-of-the-art algorithms have replaced older ones to make traditional aspects of the quality of the ray trace output more performant, we can talk about that too, as a separate topic. I'm concerned with the above and gigarays/sec. There might be cases where I want zero interpolation. What's the performance? Gigarays/sec.

    Some may be interested in the interpolation aspects of the pipeline; others are interested in the raw performance of the ray tracing portion only. When someone has enough wherewithal and understanding to distinguish and detail the various components separately, but chooses not to in a way that largely makes performance numbers look better, there are time-tested reasons for it. They've achieved something great; there's no question about that. It's taken years of hard work and research; there's no question about that. It is a hybrid ray tracing solution; there's no question about that. There is a question about the performance of this generation's hardware compared to the past generation's. That question is best answered with an apples-to-apples comparison. A series of new AA algorithms were invented. Great, how do they run on Turing? How do they run on Pascal? What's the difference in performance?

    Apples to apples. No shenanigans. No sleight of hand.
    I got everything I need to get, which is why I can pick apart the shenanigans.
     
    #597 dirtyb1t, Aug 30, 2018
    Last edited: Aug 30, 2018
  18. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,103
    Likes Received:
    3,403
    @dirtyb1t

    ATAA casts rays in places where the TAA algorithm detects that the temporal information will blur/fail. It's a regular raster program without ray traced lighting, but it uses ray tracing to supplement failure cases and improve image quality. It's well explained. There isn't any deception here. This is a valid ray-tracing algorithm with performance compared between two non-equal but clearly identified cards. The algorithm speedup number refers to the frame times of ATAA vs SSAA. The 7.9x number compares ATAA on the Quadro to SSAA on Volta. That is also clearly stated. I don't think tensor cores have any factor in this part of the presentation.

    The spatio-temporal guidance filtering part is pretty straightforward. It's Quadro vs Titan Xp. Clearly stated. The path tracing portion runs 5.3x faster. These are shaded pixels, so this would include differences in shading performance between the two cards. The de-noising is shown as a separate metric, but again, reconstruction/de-noising will be essential to real-time ray-tracing performance.

    The corrected de-noised area lights slide clearly states the hardware. It also tells you what parts of the final output are included/excluded from the metrics shown. In the example with 4 area lights per pixel, he says there is a filtering portion that is twice as fast, but the overall average is 5x as fast because of ray tracing performance. The notes below say it does not include the de-noising pass, so I'm curious as to what the filtering is, but if I went and referenced the paper he's talking about I'm sure I could figure it out.

    There's nothing deceptive about this talk. It may not be the metrics you want, but calm down.
     
    #598 Scott_Arm, Aug 30, 2018
    Last edited: Aug 30, 2018
  19. dirtyb1t

    Newcomer

    Joined:
    Aug 28, 2017
    Messages:
    31
    Likes Received:
    27
    0.5x, 1x, 2x, 3-10x, 15x, with a million and one variables involved.
    10 Gigarays?
    Pascal: ~1 Gigaray. 5.3x = 5.3 Gigarays. That doesn't come up to 10 Gigarays, and a 2080 Ti is supposed to be capable of 10 Gigarays.
    So, here's the thing about all the handwaving and technical details, which I do understand:
    ultimately you give a figure, and that figure sells the cards, which is why you give it.
    This figure happens to be measurable.
    I take 10 ray tracing scenes of varied complexity. I run them on cards 1, 2, 3, 4, 5, 6, 7.
    I run them using the same set of core algorithms implemented in your driver.
    I take the results and I compare the gigarays/sec throughput.
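    That measurement reduces to timing a fixed ray workload per scene. A hedged sketch of such a harness follows; `trace_scene` is a hypothetical stand-in for whatever renderer is under test, not a real API:

```python
import time

# Sketch of the benchmark loop proposed above: fixed scenes, fixed ray
# budget, report Gigarays/s per scene. `trace_scene` is a hypothetical
# stand-in for the renderer under test, not a real API.

def benchmark(trace_scene, scenes, rays_per_scene):
    results = {}
    for scene in scenes:
        start = time.perf_counter()
        trace_scene(scene, rays_per_scene)       # do the ray work
        elapsed = time.perf_counter() - start
        results[scene] = rays_per_scene / elapsed / 1e9  # Gigarays/s
    return results

# Dummy tracer so the sketch runs end to end:
rates = benchmark(lambda s, n: sum(range(1000)),
                  ["scene-01", "scene-02"], 10**6)
for scene, gigarays in sorted(rates.items()):
    print(f"{scene}: {gigarays:.3f} Gigarays/s")
```

    Run the same harness on each card and the per-scene throughput comparison fits on one slide, which is the poster's point.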

    This can be put on one slide. There are in fact zero slides of this, whereas there are libraries of technical presentations talking about every other feature (mainly the denoising/quality portions), meanwhile ignoring that the performance of those varies on a per-scene basis. In some scenes the speedup could be 0.5x, in others 10x.
    In your infinite wisdom, you know why this is. We don't need to pass white paper references back and forth to understand it.

    Random sampling. 10 ray tracing scenes of varied complexity.
    1080ti -> 2080ti
    1080 -> 2080
    What's the Ray/sec throughput.
    Honest details... Honest Gigaray/sec.

    Variability is all over the place, which is why you standardize and produce honest, clear, and concise numbers. In very complex scenes w/ very detailed static elements, they already suggest baking certain aspects, for the very reason that performance (if that's all you want to focus on) comes to a grinding halt. Interpolation algorithms aren't a holy grail. They work when they work. If you desire a more professional and accurate image in a timely fashion, you'd want the GPU to actually meet its lauded rays/sec throughput figure. If you're instead trying to cram in 100 FPS w/ a sprinkle of ray tracing, loads of interpolation, and AI-assisted denoising, where accuracy matters less, then come the suite of speedup tools. Quadros are meant for the former, GeForce cards the latter, which is why you don't compare the two. Their Quadro line has the Pascal micro-architecture too, right?

    Apples-and-oranges comparisons of this kind are done for a time-tested, marketing-based reason.
    There's really no more I think I can productively say until very detailed (non-populist) benchmarks come out. Having dug through a slew of information on this subject, including much better presentations from Apple, and having played with the OptiX tooling myself, I'm not sold that I actually need an RTX card, whereas the way it's been marketed is that RTX cards have a blanket 25x the rays/sec throughput of Pascal. They don't. Nowhere near it.
     
    #599 dirtyb1t, Aug 30, 2018
    Last edited: Aug 30, 2018
  20. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,758
    Likes Received:
    1,994
    Location:
    Germany
    If that's so, then why do you say
    [my bold]
    I actually read the white paper you linked to and I did not find tensor cores mentioned. Maybe you can help me out here? Note that I'm not arguing against tensor cores being used for denoising or DLSS; just the ATAA example you criticized so vigorously seems to do without them.
     