Ray-Tracing, meaningful performance metrics and alternatives? *spawn*

Discussion in 'Rendering Technology and APIs' started by Scott_Arm, Aug 21, 2018.

  1. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    94
    Likes Received:
    134
    Turing supports every feature of Volta: https://devblogs.nvidia.com/cuda-turing-new-gpu-compute-possibilities/
     
  2. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    @JoeJ: I fail to see how research using just compute is blocked. But I also disagree with you that a 2x gain or more is not worth putting in hw.

    As for API design, here I think it's totally valid to criticize the exact abstraction chosen. But you have to start somewhere.
    If you start too low, it's not ideal to make changes later due to backwards compatibility. Too high can leave some performance/multi-use on the table. So this will always be tough. However, imo going from high to low is a lot easier. But I agree that over time it shouldn't stay as high as it is. The experience from now and the feedback from sw and the other hw vendors will help. It's hard; it needs multiple iterations. Even rasterization isn't "done", and even there we don't expose low-level rasterization details, despite twelve DX versions ;)

    Developers are already using the API differently; we have extensions for texture-space shading etc.

    Now yes, it costs a shitload of money to make all that happen, so being first, and a lot of the internals being protected by NDAs and IP frameworks like Khronos etc., is natural. But that is business as usual.
     
  3. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    338
    Likes Received:
    184

    Funny story: these were added to Turing and are listed under Uniform Datapath Instructions.
    But I am pretty sure uniform instructions were added to save power and vector register space, because the integer SIMDs are already decoupled in Turing; hence the same math can be done on the more general-purpose integer SIMD units concurrently with FP ops.
    DX RT should be quite stressful on resource handling, so both the vector and uniform integer pipelines should be useful and can likely execute concurrently with some overlap.

    Why do you keep repeating this?
    DX8, 9 and 10 are different pipelines.
    DX RT uses the DX12 binding model, and it's the first time GPGPU controls work creation, data expansion, work scheduling / distribution / termination, etc. for fixed-function units.
    In rasterisation, FFP hardware controls all these aspects (with the partial exception of Mesh Shaders); pixel and other shaders are fed by FFP blocks and are slaves to the raster graphics pipeline (they can't manage data expansion, work scheduling / distribution / termination, etc.).
    In DX RT, you can write your own intersection shaders and probably even overlap these with the standard accelerated intersection tests, which is actually much more flexible than the current raster pipeline, where the whole thing is strictly serialized and controlled by FFP hardware.
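
    To make that distinction concrete, here is a minimal conceptual model in plain C++ (not the actual DXR API; all names are illustrative): traversal and the triangle test stay fixed-function, while procedural primitives call back into application code, so custom intersection logic runs inside the same trace.

    ```cpp
    // Conceptual sketch only: real hardware walks a BVH and hides traversal
    // order; here a flat loop stands in for it. Illustrative names throughout.
    #include <cstddef>
    #include <functional>

    struct Ray { float org[3]; float dir[3]; float tMax; };
    struct Hit { float t = 1e30f; int prim = -1; };

    enum class PrimKind { Triangle, ProceduralAabb };
    struct Primitive { PrimKind kind; int index; };

    // Application-supplied test, playing the role of a DXR intersection shader.
    using CustomIntersect = std::function<bool(const Ray&, int index, float& t)>;

    // Stand-in for the fixed-function (hardware) triangle test.
    static bool HwTriangleTest(const Ray&, int /*index*/, float& t) { t = 10.0f; return true; }

    Hit Trace(const Ray& ray, const Primitive* prims, std::size_t count,
              const CustomIntersect& custom)
    {
        Hit closest;
        for (std::size_t i = 0; i < count; ++i)
        {
            float t = 0.0f;
            const bool hit = (prims[i].kind == PrimKind::Triangle)
                                 ? HwTriangleTest(ray, prims[i].index, t) // fixed function
                                 : custom(ray, prims[i].index, t);        // app-defined code
            if (hit && t < closest.t && t < ray.tMax)
                closest = { t, prims[i].index };
        }
        return closest;
    }
    ```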
     
    #663 OlegSH, Jan 2, 2019
    Last edited: Jan 2, 2019
  4. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,427
    Likes Received:
    1,811
    The jury is still out on this one, we need to see controlled testing first, preferably with IQ validation as well.
     
    pharma and iroboto like this.
  5. JoeJ

    Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    133
    Likes Received:
    163
    In my particular case I already have a BVH of surfels in place, which works fine for GI and rough reflections (for the latter I store 4x4 or 8x8 env maps for each texel of a lightmap and update them in real time, similar to Many LoDs; SH etc. would work as well).
    For GI this is all fine: fully dynamic large scenes, infinite bounces, area lights from emitting surfaces and area shadows, all for free. I calculate visibility with RT in compute efficiently, and there is no need for RTX here. The algorithm's time complexity is pretty much O(n), which is remarkable for GI.
    But surfels have a spatial maximum of detail (targeting 10cm for current gen), and the env maps have limited angular precision because of their low resolution.

    So I want some classical RT as well to add high-frequency details, mainly sharp reflections, and maybe some hard shadows. (Shadow maps, or just living with the soft GI shadows, are options too.)
    I'm optimistic that at this point a very big step towards photorealism is possible with current hardware. RTX can do well all the things I cannot, and the other way around.
    In the future one might want to add some local high-frequency GI; eventually my stuff could still serve as a fallback after the 2nd or 3rd hit within a path tracer, and after that my work would no longer be needed.

    The natural way to add sharp reflections now would be to use classical RT on triangles close to the ray origin, at some distance fall back to my surfels (because there we could solve the LOD problem as well), and at large distance fall back to the env map.
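
    Roughly, the selection I have in mind would look like this (just a sketch; the distance cutoffs and names are placeholders, not my actual code):

    ```cpp
    // Pick which representation answers a ray, based on how far it has travelled.
    enum class Representation { Triangles, Surfels, EnvMap };

    Representation PickRepresentation(float hitDistance)
    {
        const float kTriangleRange = 5.0f;   // classical RT: sharp detail near the origin
        const float kSurfelRange   = 50.0f;  // ~10cm surfels, also solves LOD mid-range
        if (hitDistance < kTriangleRange) return Representation::Triangles;
        if (hitDistance < kSurfelRange)   return Representation::Surfels;
        return Representation::EnvMap;       // low angular resolution, fine far away
    }
    ```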

    As it is now, to do so I have to maintain two BVHs, and the fallback to surfels cannot utilize sharing of my custom data in LDS across multiple rays within a DXR custom intersection shader.
    It is thus possible that a compute-only solution could handle the tracing more efficiently, even in comparison to RTX (accepting the limitation of 'no custom shaders for triangle hits', which in my case of already-available low-res shaded textures makes sense).

    But I likely won't try this, because competing against a hardware solution just sounds dumb. So (after swallowing my anger :) ) I will just use RTX as you intend, and I will discontinue my 'research' in this direction.
    In contrast, assuming there were no RT cores I had to utilize, experimenting with all this would still make sense and I might end up with a solution that is more efficient.

    Now, seeing the RT cores struggle a bit to deliver, I'm frustrated and the whole RTX thing appears wrong to me again... it's an emotional issue.
    I feel guilty and sorry for the noise. But because game developers take smaller steps while making games, I also feel the need to bring up such topics right now, assuming I am indeed ahead with GI. (Not sure yet - my need for a seamless global parametrization is a big downside; making an automated tool is hard, and only after that can I test my stuff on real game geometry.)

    So I hope the above example makes sense to you. Getting heard is all I want.



    But now to another point, and I think this one is more important, not only for me:

    We finally need a way for GPUs to generate their own work! Please bring this to game APIs. I'm fine with vendor extensions.
    Stacking up large brute-force workloads, as is common in games, does not work for everything.
    My command buffers are full of indirect dispatches and barriers that do zero work at runtime, and the cost adds up.
    Async compute can compensate, but the sync between multiple queues is too expensive to make it a win for small planning workloads. (On GCN; never tried it on any NV.)
    Launching compute shaders from compute shaders is what we need.

    So whatever you have, please expose it. VK device-generated command buffers are nice, but there are no barriers, so they're useless for me.
    Seeing one additional shader stage introduced every year but no progress here is really frustrating. (Or did I miss something? I did no GPU work during the last year.)


    I repeat myself because I never know if people understand what I mean. I'm new here and don't know anyone or what they do, and I'm no hardware expert or enthusiast either.
    What I want is this functionality exposed to compute shaders in Vulkan - I would switch to DX12 as well to get it. But CUDA is no option, and neither is OpenCL 2.0.
    RTX shows there is really fine-grained work scheduling under the hood, but I need this to be available in compute.
     
  6. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,350
    Likes Received:
    9,323
    Location:
    Under my bridge
    It's going to be hard to be heard if you aren't showcasing actual results. ;) If you have demonstrations showing what you're doing, what you're wanting, how current compute copes with it, and how future developments could handle it, the discussion would be more than theoretical foot-dragging.
     
    vipa899 and Ike Turner like this.
  7. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    Thanks. Things are clearer now.
    Not sure what this is for, but assuming it's mostly a personal project, do "both" and experiment like crazy. You can record your custom hits, shade later in compute, and compare.

    "It depends" is universally true, so it's unlikely you'll get a "do this" reply. The benefit of it being new: there is no right or wrong atm ;) just people experimenting, gaining experience, and repeating.
    Also, the work you invested so far, even if it turns out slower, is not wasted; you learn a lot by doing it all manually...

    If that level of graphics is your passion, go for it; but if the goal of the project is something else and the raytracing aspect is just one part, I would go with whatever is the least amount of work and focus on the other stuff more important to the project.

    And don't pressure yourself too much about rights and wrongs. Those other developer teams are pretty big, with lots of veterans, and even for them it takes time to gain experience.

    BTW, I've implemented most of the device-generated commands extension, so I am happy you like that route. I guess you would prefer something like the GL NV command list with indirect tokens for barriers?
    But do you really need a lot of runtime-generated barrier points?
     
    pharma, OCASM and JoeJ like this.
  8. JoeJ

    Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    133
    Likes Received:
    163
    But it would be a lot harder to sell after it has been shown in public: someone might have figured out how it works, and the value decreases ;)
    AFAIK this even makes it harder to get a patent, if whoever is (hopefully) interested wants one.

    However, just imagine a low-poly Quake level with moving boxes and metal balls, with lightmaps updating in real time... not so impressive anyway.
    I don't even have a render framework - I need to start from scratch with this, after finishing the damn geometry processing work...

    So I'm just average Joe, not John Carmack :)
     
  9. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,350
    Likes Received:
    9,323
    Location:
    Under my bridge
    Understandable, but then you have to accept that discussion will be limited. "I wish nVidia didn't do it this way but I can't talk about it," ain't the subject of great conversation.

    Although just a YT video of realtime visuals without any explanations shouldn't give anyone any clues as to how it's done.
     
    vipa899, pharma and eloyc like this.
  10. JoeJ

    Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    133
    Likes Received:
    163
    Yes, sounds good!
    I need the barriers mostly for cases like this:
    Process one level of a tree
    barrier
    process the next level (depending on parent data)
    barrier, and continue with levels deeper down

    Mostly the processing does zero work, so when generating on the GPU the count would already be reduced a lot, but I definitely need them (I guess maybe still about 30-50 per frame).
    Another, simpler option would be to skip over commands (including the barriers) in the command lists on the GPU side. Not so advanced, but it would already fix that case.
    OpenCL 2.0 is more flexible here, but with the above I would already be happy. CL smells too high level to be hardware friendly anyway.
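
    For reference, this is roughly what that pattern costs today as CPU-recorded Vulkan commands (a hedged sketch; names like RecordTreeLevels and dispatchArgs are illustrative):

    ```cpp
    #include <vulkan/vulkan.h>

    // One indirect dispatch per tree level, with a compute-to-compute barrier in
    // between. The group counts in dispatchArgs are written by the GPU, so many
    // of these dispatches do zero work at runtime, yet they (and the barriers)
    // are still recorded up front and still cost time.
    void RecordTreeLevels(VkCommandBuffer cmd, VkBuffer dispatchArgs, uint32_t levelCount)
    {
        for (uint32_t level = 0; level < levelCount; ++level)
        {
            vkCmdDispatchIndirect(cmd, dispatchArgs,
                                  level * sizeof(VkDispatchIndirectCommand));

            // Make level N's writes visible to level N+1.
            VkMemoryBarrier barrier{};
            barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
            barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
            barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
            vkCmdPipelineBarrier(cmd,
                                 VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                                 VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                                 0, 1, &barrier, 0, nullptr, 0, nullptr);
        }
    }
    ```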

    I don't know if the GL NV command list supports addressing multiple queues. If that's easily possible it would be nice too - but not so much of a requirement.


    Many Thanks! :) :) :)
     
  11. OCASM

    Regular Newcomer

    Joined:
    Nov 12, 2016
    Messages:
    649
    Likes Received:
    608
    The Dreams approach certainly has benefits, but performance isn't one of them. You say huge worlds, but so far we've only seen small scenes (AFAIK). As you say, it's noisy and doesn't support the standard PBR workflow. It works for what it's trying to accomplish, but I wouldn't recommend it for the majority of games.

    I do agree on the point of softness vs sharpness. Too sharp looks fake. In addition to avoiding ultra-high resolutions, games should support light diffusion effects as well.
     
  12. milk

    Veteran Regular

    Joined:
    Jun 6, 2012
    Messages:
    2,469
    Likes Received:
    1,875
    Joe Jarmack?
     
    JoeJ, vipa899, Scott_Arm and 2 others like this.
  13. Dictator

    Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    54
    Likes Received:
    53
    Just want to say that the idea of the Titan V being the same, performance-wise, as an RTX 2080 Ti in BFV is a bit problematic right from the start. It goes against all the information we have about the performance advantage the RT core provides for ray-triangle intersection - it seems like sensationalist reporting that didn't look into the specifics before claiming the contrary.

    I could imagine the following, without testing it since I do not have a Titan V: the fallback layer for Volta is programmatically and visually different, so a direct comparison of its performance is troubled. When talking with DICE render devs in August, they mentioned how much faster the game was with RTX, and that they had been used to developing at much lower framerates and worse quality on the Titan V (optimisations to the BVH structure and more excessive snapping were in the Gamescom build at the time as a hold-over from developing on Volta).

    Also, we know from a number of presentations now that ray/triangle intersection is just one part of the expense of using RT - and it differs from scene to scene and effect to effect. Surely we can all imagine a scenario where the greatest performance bottleneck in a given scene is not the ray-triangle intersection bit of RT but the surface shading and denoising, which reduces how much the RT core can change overall performance.

    I won't be testing this sadly, but I think GamersNexus might! I hope they look at the visuals on the Titan V as well in comparison to RTX; it should be neat to see if they really are the same.

    EDIT - should I have posted this in the BFV RT thread? :D
     
    DavidGraham, pharma and iroboto like this.
  14. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    108
    Likes Received:
    123
    I have a question about the Titan V vs RTX thing in BFV, and RT in general. Does the number of rays have a big impact on shading requirements with raytracing, or is it mostly independent? Shading costs grow because rays hit objects outside screen space, but more rays don't hit that many more objects outside screen space, so they don't add that much shading cost?

    In BFV, RT runs at a max of 0.4 rays per pixel, and in the test cases from that user it might even be 0.1 rays per pixel if there aren't many reflections. So it's not really surprising the RT cores aren't helping much. If we look at Remedy's Control numbers, it's 5 ms on the Titan V vs 1 ms with RTX at 1 ray per pixel. As a crude simplification: 0.5 ms on the Titan V vs 0.1 ms with RTX at 0.1 rays per pixel. That leads to numbers where the impact of the RT cores is negligible. But if the shading costs don't change that much, they might increase the ray count a lot for RTX.
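
    As a quick sanity check on that simplification (assuming trace cost scales roughly linearly with rays per pixel and ignoring shading, denoising and divergence; the numbers are just the ones quoted above):

    ```cpp
    #include <cstdio>

    int main()
    {
        const float titanV_msPerRpp = 5.0f;  // Remedy's Control figure: ~5 ms at 1 ray/pixel
        const float rtx_msPerRpp    = 1.0f;  // ~1 ms at 1 ray/pixel with RT cores
        const float raysPerPixel    = 0.1f;  // roughly BFV's low-reflection case

        std::printf("Titan V: %.2f ms, RTX: %.2f ms, gap: %.2f ms\n",
                    titanV_msPerRpp * raysPerPixel,
                    rtx_msPerRpp * raysPerPixel,
                    (titanV_msPerRpp - rtx_msPerRpp) * raysPerPixel);
        // ~0.50 ms vs ~0.10 ms: a 0.4 ms gap that disappears into frame-time noise,
        // which is why the RT cores barely show up at such low ray counts.
        return 0;
    }
    ```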
     
    pharma likes this.
  15. Dictator

    Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    54
    Likes Received:
    53
    A nice way to look at it, I think, and a good point. Also, that would mean pumping up the resolution would increase the ray count and thereby increase the importance of the RT core. Maybe there are similarities in performance at lower resolutions/lower ray counts.
     
  16. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,020
    Likes Received:
    3,281
    https://www.3dcenter.org/news/raytr...uft-mit-guten-frameraten-auch-auf-der-titan-v

    Looks to me like the Titan RTX beats the Titan V quite easily in the reflection-heavy maps (Rotterdam). It's about 40-50% faster, though these benchmarks are pretty simple.

    I guess that goes back to the question of what BFV is actually doing. How much of the % difference is from shading performance? It would be nice to see both cards with RTX off to get a baseline performance difference.

    Spec-wise, the Titan RTX does not have any major performance advantages for general gaming except a huge amount of memory, which I'm not sure BFV could even take advantage of. TFLOPs and bandwidth are roughly equal.

    Edit: In a roundabout way, this may actually explain what BFV is doing. If a map were primarily casting screen-space rays, you would expect performance to be roughly equal. The performance divergence suggests that there is a significant load of DXR rays on-screen in the Rotterdam map, even after the patch.
     
    #676 Scott_Arm, Jan 2, 2019
    Last edited: Jan 2, 2019
    pharma, vipa899, iroboto and 2 others like this.
  17. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,427
    Likes Received:
    1,811
    These are still user-generated tests, and they only use High DXR. I believe at Ultra DXR, and at resolutions above 1080p, the difference would be higher.

    Also notice the Titan V in question is running OC'ed to 2.0GHz, which gives it 20 TFLOPs; a Titan RTX @2.0GHz gets you 18.4 TFLOPs.
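
    For reference, those figures follow directly from the published FP32 core counts (counting an FMA as 2 FLOPs per core per clock):

    ```cpp
    #include <cstdio>

    int main()
    {
        const double clockGHz       = 2.0;
        const int    titanV_cores   = 5120;  // GV100 as shipped in the Titan V
        const int    titanRTX_cores = 4608;  // full TU102 in the Titan RTX

        std::printf("Titan V   @2.0 GHz: %.1f TFLOPs\n", titanV_cores   * 2 * clockGHz / 1000.0); // ~20.5
        std::printf("Titan RTX @2.0 GHz: %.1f TFLOPs\n", titanRTX_cores * 2 * clockGHz / 1000.0); // ~18.4
        return 0;
    }
    ```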
     
    #677 DavidGraham, Jan 2, 2019
    Last edited: Jan 2, 2019
    pharma and BRiT like this.
  18. JoeJ

    Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    133
    Likes Received:
    163
    That's not quite accurate. The performance of splatting is totally dynamic. You can think of it as a form of rasterization, but with a built-in LOD mechanism.
    In Dreams they do this by generating points on the surface with a uniform but irregular distribution. At distance they just skip over a subset of the points. (Think of it as: instead of drawing triangles, draw only the texels of their textures as points, and at distance go up one mip level so the count drops to a quarter.)
    They have videos where people copy-paste parts of the scene to build a city in a very short time, and there seems to be little FPS drop. (I'm puzzled by that myself, because I doubt they use any form of hidden surface removal - just insane brute-force compute power?)
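
    As a quick illustration of how fast that mip-style decimation thins out the splat count (the starting count is made up):

    ```cpp
    #include <cstdio>

    int main()
    {
        unsigned points = 1u << 20;          // ~1M surface points at the finest level
        for (int lod = 0; lod <= 4; ++lod)
        {
            std::printf("LOD %d: %u points\n", lod, points);
            points /= 4;                     // one "mip level" up: a quarter of the points
        }
        return 0;
    }
    ```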

    Having no support for LOD is really the main limitation of triangle-based rasterization and RT. Computer graphics is mainly about two problems: visibility and LOD. The rise of GPUs has pushed the latter out of our attention a bit, but it is still important, especially if we aim for GI in real time.
    This is the major reason why I am not so convinced about fixed-function hardware: LOD is an open problem everywhere, and solving it always conflicts with FF HW.

    PBR works with splatting without any issue. I only meant that for Dreams they did not do it, because the content is made by the players, who would not want to place environment probes manually, and the devs had very different goals for the art style anyway (painterly).
    If you read the paper, the programmer experimented with all kinds of awesome high tech, but it was the artist who pushed him towards 'boring' splatting, which finally gave the artistic results as intended.


    I agree it's not meant to replace triangles for games yet, but at increasing detail levels it would, without any doubt, beat triangle rasterization at some point. (A graph of triangles is a very complex data structure in comparison to a point hierarchy, but both are just approximations.)
    I expected rasterization HW to become deprecated and finally removed from GPUs - just compute would remain, and texture filters of course. No limitations. I still think it will happen this way, and that RT cores will disappear again... but I see it will take much longer than I hoped for :)
    I might sound unrealistic here, but we made raster HW to put triangles on screen fast. Now that is the smallest problem we have. For GI we need to 'render' the scene from any point, not just for the eye. We have a very different problem now than we had 20 years ago.
     
  19. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    3,150
    Likes Received:
    1,708
    I can't find the tweet now, but it seems Dreams' rendering has evolved: point splatting had too many holes, so now they mix point splatting and raymarched cubes (it was a tweet by Alex Evans).

    And it seems they use hybrid raytraced shadows / shadow maps.

     
    milk, pharma, BRiT and 1 other person like this.
  20. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,350
    Likes Received:
    9,323
    Location:
    Under my bridge
    No-one should use Dreams as a reference until it's released in a matter of weeks, followed by a Siggraph/Devcon presentation.
     
    OCASM, Malo and pharma like this.