Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Discussion in 'Rendering Technology and APIs' started by Scott_Arm, Aug 21, 2018.

  1. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    348
    Likes Received:
    219
    Sure, I took 15% out of total 47%, that's (7 ms/47)*15 = 2.23 ms in my calculations
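
    A quick check of that arithmetic, as a sketch only: the 7 ms total and the 47% / 15% shares are the figures quoted in this post, and the variable names are made up for illustration.

```python
# Figures quoted in the post; names are illustrative, not from any tool.
total_ms = 7.0           # time attributed to the whole 47% share
total_share_pct = 47.0   # share of the frame that the 7 ms corresponds to
trace_share_pct = 15.0   # portion of that share spent on tracing

# Scale the 7 ms down to the 15% slice.
trace_ms = total_ms / total_share_pct * trace_share_pct
print(f"tracing cost ~ {trace_ms:.2f} ms")
```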

    That's the issue: without knowing the test scene, settings, etc., we cannot compare these results with the Turing ones.
    I remember there were press drivers with RTX enabled on the GTX 1080 Ti (or was it the fallback layer?); the difference was drastic - https://www.ixbt.com/img/x780x600/r30/00/02/11/02/swreflectionsdemo.png
    Pretty sure Volta would still lag by at least 2x in heavy regimes.

    There is a power wall: it would be problematic to achieve more flops with more shader cores on the same process node. GV100 is way wider, yet it achieves fewer flops than high-end TU102 SKUs because both chips are power limited.
    Pretty sure more general SMs would cost way more transistors, frequencies would be lower because of the power limitation mentioned above, and even with special SM instructions to accelerate ray-triangle intersection tests, it would still be much slower in RTX games. C'mon, NVIDIA has been working on OptiX for a decade, do you really think they cannot model and calculate such simple things? Give them a little bit of credit.

    Nope - https://www.ixbt.com/img/r30/00/02/14/58/swreflectionsdemo.png
     
    #621 OlegSH, Dec 31, 2018
    Last edited: Dec 31, 2018
    OCASM likes this.
  2. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,104
    Likes Received:
    3,409
    I never really considered the power limits of the GPU before. Are RT cores and tensor cores a way to add relatively low-power transistors?

    Is that Star Wars thing available freely as a benchmark? Is it customizable at all?
     
  3. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    348
    Likes Received:
    219
    Sure, TCs provide much more flops at the same power due to higher data reuse, and RT Cores are much more area and power efficient at ray-triangle intersection tests, as these are specialized processors (it would have been insane to make them as wide as standard SMs).
     
    pharma and BRiT like this.
  4. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    Yes, I feel like without controlled like-for-like scenes, the comparison with a Titan V is not really useful. There is also the possibility that the Titan V is not really running "proper" DXR in BFV.
    Could you provide that link again? I searched for it and I couldn't find anything on Titan V. However it was known that the Star Wars demo required 4 Titan Vs to run, but only one RTX 2080Ti.

    Yeah I read that the new driver fixed this problem, can't find the link though right now.
     
    BRiT likes this.
  5. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    He gave quite accurate specs in the following post and assured us the comparison is fair, other than the differing CPUs. He says he is able to tell the difference between RTX and SSR.
    I hope for some more guys with Titans and BFV. Maybe it's a New Year's joke :)

    https://forum.beyond3d.com/posts/2050012/
    Note that DLSS is available only on Turing; older GPUs render at full 4K. And check my math. Maybe i did it wrong, and then it would fit the png @OlegSH has posted.
     
  6. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    #626 DavidGraham, Jan 1, 2019
    Last edited: Jan 1, 2019
    OCASM likes this.
  7. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    I disagree and get 7 ms of RTX time in total, and 15% tracing is only 1 ms.
    But no matter: if we take a 2.23 ms tracing cost on Turing, and Turing is 10 times faster, then the tracing cost alone would be 22 ms without RT cores. Adding 7 ms of raster makes it even worse, just about 30 FPS. But if the 10 times speedup were true, we should get back to the initial 68 fps (or only 15 ms).
    Maybe we differ because you used RTX and i used Titan, but in any case: Turing does not have 10 times faster tracing. Otherwise both GPUs would not end up equally fast. We don't need any math for this - it's just obvious.
    It seems the Titan, with a few hundred more compute threads (?), compensates for the missing RT cores, or not? I don't see how power targets are related here, other than distracting from the obvious.
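
    The sanity check above can be written out as a rough sketch. The 2.23 ms and 7 ms figures, the 10x factor and the 68 fps baseline come from the posts; treating a frame as simply "tracing plus raster" is a simplification made here, not something measured.

```python
# Numbers quoted in the thread; the additive frame-time model is an assumption.
trace_ms_turing = 2.23    # tracing cost estimated for Turing
raster_ms = 7.0           # cost the post adds on top for raster
claimed_speedup = 10.0    # hypothetical "Turing traces 10x faster" claim

# If Turing really traced 10x faster, a GPU without RT cores would need:
trace_ms_no_rt = trace_ms_turing * claimed_speedup
frame_ms_no_rt = trace_ms_no_rt + raster_ms
fps_no_rt = 1000.0 / frame_ms_no_rt

# Frame time of the 68 fps result the post refers to.
baseline_ms = 1000.0 / 68.0

print(f"predicted without RT cores: {frame_ms_no_rt:.1f} ms ({fps_no_rt:.0f} fps)")
print(f"68 fps baseline:            {baseline_ms:.1f} ms")
```

    Under these assumptions the prediction lands in the low 30s of FPS, which the post rounds to about 30, roughly half of the 68 fps baseline.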

    So you think you owe them credit because they are.. awesome??
    No! They want my money, and i want to know what i get for it.

    A decade of experience... i have that myself, and so do many others. That's the bare minimum. And from that i know efficient raytracing is NOT a simple thing. If you think it's about a fast triangle test, you are just wrong.
    We know GCN has issues utilizing its CUs with rasterization. (some insiders know it has insane compute power instead, hihi)
    There is a rumor that Vega has a broken binning rasterizer.
    Is it so unbelievable that awesome NV could make a 'mistake' too?

    Well... could be all fake. I hope it turns out.
     
  8. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    No Titan, but here we can compare GTX 10X0 GPUs vs RTX.
    I see the tests you posted show similar results.
    Now please convert FPS to time, and also consider that GTX can NOT do DLSS, so it traces at a much higher resolution than Turing. PCGH pointed this out in the text, otherwise i would not have spotted it myself.
    You should end up with the same result as i did (work score = real performance). I have ignored the DLSS cost here, as well as the fact that the BVH update is resolution independent, etc. So it's not quite that bad, but still shocking.

    You really need to do the math yourself. FPS can be extremely misleading, and with the resolution mismatch on top, the benchmark bars look unremarkable at a glance, but they are not. (assuming i did it correctly - i tend to mix up divisions and multiplications :) )

    Oh, i'm so sorry. Yes this was my mistake:

    2070: 19.8 fps = 50ms per frame x (3.8 x 2.1 res) = 399 work score
    1080ti: 10.1 fps = 100ms per frame x (1.4 x 2.5 res) = 350 work score

    should be:

    2070: 19.8 fps = (3.8 x 2.1 res) / 50ms per frame = 0.23 work score
    1080ti: 10.1 fps = (1.4 x 2.5 res) / 100ms per frame = 0.035 work score

    So RTX is 6.5 times faster here. Makes much more sense! Embarrassing... :)

    Apologies for the noise, guys! Feel free to clean up some of that BS...

    Sigh - both are wrong! All wrong... the 1080 renders at 4K, not the other way around.
    I end up with both being equally fast, but i do not dare to post the math... please help! Too many drinks already, better get some sleep... Happy new year! :)
     
    #628 JoeJ, Jan 1, 2019
    Last edited by a moderator: Jan 1, 2019
    Shifty Geezer likes this.
  9. Malo

    Malo YakTribe.games
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,683
    Likes Received:
    2,728
    Location:
    Pennsylvania
    I think the biggest difference between Pascal and Volta is the Tensor cores, which means not only DLSS but also denoising, and it is the latter, not DLSS, that is a significant factor in RTRT performance. I don't think Pascal gives us any hints as to Titan V performance in the Star Wars demo.
     
    pharma and DavidGraham like this.
  10. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    Titan V can't be directly compared to Pascal, because Titan V has DLSS and the denoising part of RTX.

    In general, a 2080Ti is 35% faster than a 1080Ti. At 4K the 1080Ti scores 4 fps, which means the 2080Ti without DLSS and RT would score 6 fps at best; however, it does 33 fps with them. That is a 5.5-fold increase directly from using DLSS + RT acceleration.

    [IMG]

    If you don't like that math, you can compare the 1080Ti at 1440p to the 2080Ti at 4K DLSS. These results are not entirely representative, as DLSS still incurs a performance hit on the 2080Ti and only improves performance by 25~30%, but we will do it anyway:

    [IMG]

    1080Ti @1440p: 10fps
    2080Ti @4K DLSS: 33 fps

    Now add 10 fps or more to the 2080Ti to offset the DLSS hit (which is about 30% or more), and you end up with 40 fps at the very least, which still means a minimum 4-fold increase purely from the RT cores.
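
    Both estimates in this post can be reproduced as a small sketch. All fps values and the 35% uplift are the ones quoted above; rounding 5.4 fps up to 6 and using 40 fps as the DLSS-adjusted figure follow the post, and the variable names are made up here.

```python
# Estimate 1: both cards rendering at 4K.
gtx1080ti_4k_fps = 4.0
raster_uplift = 1.35               # "2080Ti is 35% faster than a 1080Ti"
rtx2080ti_4k_dlss_rt_fps = 33.0

expected_no_rt_fps = gtx1080ti_4k_fps * raster_uplift  # about 5.4
rounded_no_rt_fps = 6.0            # the post rounds up to "6 fps at best"
speedup_4k = rtx2080ti_4k_dlss_rt_fps / rounded_no_rt_fps

# Estimate 2: 1080Ti at 1440p vs 2080Ti at 4K DLSS, corrected for the DLSS hit.
gtx1080ti_1440p_fps = 10.0
rtx2080ti_adjusted_fps = 40.0      # "40 fps at the very least" after the offset
speedup_1440p = rtx2080ti_adjusted_fps / gtx1080ti_1440p_fps

print(f"expected 2080Ti without DLSS/RT: {expected_no_rt_fps:.1f} fps")
print(f"estimate 1: {speedup_4k:.1f}x, estimate 2: {speedup_1440p:.1f}x")
```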
     
    #630 DavidGraham, Jan 1, 2019
    Last edited: Jan 1, 2019
    pharma and BRiT like this.
  11. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    Happy new year. Unfortunately PCGH jumped the gun and made a lot of fuss prematurely.
    The latest numbers, as well as different users comparing on the Rotterdam map, show improved perf for Turing.
     
    pharma likes this.
  12. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,710
    Likes Received:
    9,765
    Location:
    Under my bridge
    Isn't BFV using Frostbite's own denoising though? Or was that only the other demos like SW?
     
    iroboto likes this.
  13. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    Just woke up, head hurts, but...

    2070: 19.8 fps = 50ms per frame x (3.8 x 2.1 res) = 399 work score
    1080ti: 10.1 fps = 100ms per frame x (1.4 x 2.5 res) = 350 work score

    should be simply

    2070: 19.8 fps = (1.4 x 2.5 res) / 50ms per frame = 0.07 RT work per frame
    1080ti: 10.1 fps = (3.8 x 2.1 res) / 100ms per frame = 0.079 RT work per frame
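
    The same "work per frame time" idea as a sketch with exact pixel counts. It assumes, as the posts describe, that the 2070 traces at 2560x1440 (with DLSS upscaling to 4K) and the 1080 Ti traces at native 3840x2160; DLSS cost and resolution-independent work such as BVH updates are ignored, and the function name is made up for illustration.

```python
def work_per_ms(width, height, fps):
    """Megapixels rendered per millisecond of frame time."""
    megapixels = width * height / 1e6
    frame_ms = 1000.0 / fps
    return megapixels / frame_ms

# fps values from the benchmark quoted in the thread.
work_2070 = work_per_ms(2560, 1440, 19.8)    # internal 1440p, DLSS to 4K
work_1080ti = work_per_ms(3840, 2160, 10.1)  # native 4K, no DLSS

print(f"2070:   {work_2070:.3f} MP/ms")
print(f"1080ti: {work_1080ti:.3f} MP/ms")
print(f"ratio 2070 / 1080ti: {work_2070 / work_1080ti:.2f}")
```

    With unrounded inputs the two scores come out close to the 0.07 and 0.079 above, so the conclusion of this rough metric does not change.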

    My goal here is to remove DLSS from the equation, because we don't need tensor cores to upscale reflections. Bilateral filter and TAA will do. Likely as flickery as we saw in the GT demo, but this can be fixed with some extra work which is not expensive.
    We could start the obvious 'Do we need tensor cores for games' discussion now, but my long answer is: 'I can't use tensors for games at all yet - neither Cuda nor GameWorks is an option for me.' And my short answer is just no. You don't need NNs to upscale.

    Thanks NV, for showing me RT cores are not necessary. I think i understand why those demos are not public, and especially why GTX GPUs have been blocked from running BFV. Presumably those loyal customers should rather get a new RTX - surely they do not want to see upscaled reflections on their already too high res screens.
    I now know enough. Targeting consoles anyway, i will focus on compute traced reflections. You showed me it will work, as i had assumed anyway.
    Also thanks for showing people who are willing to see that i'm right with my claim 'we don't need fixed function for RT'. I do not need to prove it to them at all. You already did.

    Draw your own conclusions. I'll remove RTX from my todo list for now. (And work on my math skills instead :D )
     
    Voxilla likes this.
  14. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    I just showed you a minimum 4-fold (400%) increase in performance just due to RT cores! How are they not needed?!

    Your math here is wrong! You are assuming DLSS is cost free. It's not: it incurs a performance hit even when upscaling from 1440p. The hit varies depending on the original resolution; some have compared it to running the game at 1800p.
    Yes they use an in-house denoising solution.
     
    OCASM, vipa899 and pharma like this.
  15. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    11,370
    Likes Received:
    7,174
    Location:
    Cleveland
    I'm getting a bit confused by the terminology and the discussion now. I thought the following were distinct and separate items in hardware, but some of what's written here uses them interchangeably ( @JoeJ ):
    1. Tensor Cores
    2. RayTracing Cores
    Here is what NVIDIA says about it: https://developer.nvidia.com/rtx

    @DavidGraham , are those two cores physically implemented as separate sections of hardware?

    For my sanity and for clarity in discussions can we please keep the terms straight?
     
  16. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    Yeah, separate specific fixed hardware for both.

    Volta has only Tensor cores, which can be used for AI upscale and for AI denoising for ray tracing, which means Volta can do DLSS and denoising at the hardware level for ray tracing, so it can theoretically run DXR although no acceleration will happen, except for the denoising part.

    Turing has Tensor cores and Ray Tracing cores (RT), which means it can accelerate BVH, and do DLSS and denoising at the hardware level.
     
    #636 DavidGraham, Jan 1, 2019
    Last edited: Jan 1, 2019
    OCASM, vipa899, iroboto and 2 others like this.
  17. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    OCASM, Scott_Arm, vipa899 and 2 others like this.
  18. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    Yes, i assume it is cost free because upscaling can be done cheaply. I also assume the BVH building to be related to pixel count, which is wrong too.
    But i am only interested in a coarse ratio to draw my conclusions. As a dev i am interested in speedups of orders of magnitude, not percentages. I have achieved such speedups multiple times through the years - it's common in software development.

    Yes they are separate. Someone here on the forum pointed out they cannot be programmed with game APIs, only with CUDA, which rules them out for me, so i never brought them up.
    I'm no AI guy anyways and leave it to others to discuss tensors.

    I haven't seen any of this. But what we are talking about here are real world end results, with all influences included, rather than individual speedups looked at in isolation.

    I do not assume those cores to be useless, but for now it is just a vendor extension.
    RT is the future, or part of it. I agree with this, but with my target platforms i am happy to first make lower res stuff which works on everything. (I may change my mind...)
     
  19. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    Unfortunately it's not: the latest testing on Final Fantasy 15 reveals a hit of about 35% compared with native 1440p, though it gives 25% more performance than native 4K with TAA.

    [IMG]
    http://i68.tinypic.com/20fure9.jpg
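
    To see what those two percentages imply together, here is a sketch under an explicit assumption: both are read as frame-rate ratios (DLSS fps = 0.65 x native 1440p fps, and DLSS fps = 1.25 x native 4K fps). The 100 fps baseline is a made-up example value, not a measurement.

```python
# Hypothetical baseline, chosen only to make the ratios easy to read.
native_1440p_fps = 100.0

dlss_hit_vs_1440p = 0.35   # "hit of about 35% from native 1440p"
dlss_gain_vs_4k = 0.25     # "25% more performance than native 4K with TAA"

# 4K DLSS output, rendered internally at 1440p.
dlss_4k_fps = native_1440p_fps * (1.0 - dlss_hit_vs_1440p)
# Native 4K frame rate implied by the second ratio.
native_4k_fps = dlss_4k_fps / (1.0 + dlss_gain_vs_4k)

print(f"4K DLSS:   {dlss_4k_fps:.0f} fps")
print(f"native 4K: {native_4k_fps:.0f} fps")
```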
     
    pharma, OlegSH and vipa899 like this.
  20. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    348
    Likes Received:
    219
    That matches quite well with 29 FPS I calculated (+- for rounding error)

    I was talking about 2560x1440 resolution (I mentioned this several times), while 80 FPS with RTX Ultra can be achieved on Titan V only at 1080p and below; that's 1.78x fewer rays to trace.

    Moreover, there is a geometry processing cost (skinning, other transformations, etc., both in raster and RT), which is mostly constant across all resolutions (except for adaptive LODs, tessellation and early CS-based backface/frustum/subpixel triangle culling in raster). Hence Titan V will lose more as resolution grows (since the tracing fraction grows with resolution), so Titan V can easily be 1.7x slower at 1440p depending on how heavy the ray-triangle intersection part is.

    2080 Ti is just 1.26x slower going from 1080p to 1440p and 1.62x slower going from 1440p to 2160p; that's a 2x slowdown going from 1080p to 4K.

    As for Titan V, there can easily be a 4x slowdown because the ray-triangle intersection part will dominate at higher resolutions, which will result in a 2x difference between 2080 Ti and Titan V at higher resolutions.

    So all this comes down to a simple thing: how heavy the ray-triangle intersection part is. Obviously, the heavier it is, the more RTX GPUs will win, and the acceleration factor ("x-times faster framerate"), as I mentioned at the very beginning, comes down to Amdahl's law applied to the accelerated part. So no surprises here at all.
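
    A minimal sketch of that Amdahl's law argument, plus the resolution ratios used above. The 10x acceleration factor and the example fractions are hypothetical values picked for illustration; only the resolutions and the 1.26x / 1.62x slowdowns come from this post.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when only a fraction of the frame is sped up by `factor`."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

# Ray count scales with pixel count: 1440p vs 1080p.
pixel_ratio_1440_vs_1080 = (2560 * 1440) / (1920 * 1080)

# Chaining the two 2080 Ti slowdowns quoted above (1080p -> 1440p -> 2160p).
slowdown_1080_to_2160 = 1.26 * 1.62

print(f"1440p / 1080p pixels: {pixel_ratio_1440_vs_1080:.2f}x")
print(f"1080p -> 2160p slowdown: {slowdown_1080_to_2160:.2f}x")

# Hypothetical: intersection tests run 10x faster on RT cores.
for fraction in (0.15, 0.5, 0.8):
    print(f"intersection share {fraction:.0%}: "
          f"{amdahl_speedup(fraction, 10.0):.2f}x faster frame")
```

    The larger the share of the frame spent on intersection tests, the closer the frame-rate gain gets to the raw acceleration factor, which is the point made above.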
     
    vipa899 and pharma like this.