Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Discussion in 'Rendering Technology and APIs' started by Scott_Arm, Aug 21, 2018.

  1. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    Personally I am more on the rasterization side of things, but I think having an alternative to 3D textures for spatial lookups is great, and I see DXR as just the beginning of something more generic after a bit of evolution.
     
  2. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    I do not doubt their expertise. They know much more about RT than I do, and about hardware in any case.

    Although I see serious issues with bringing the performance to the street, I guess we will see RT cores take off with the following GPUs.
    But as it is now, the hardware seems far from optimal, with a questionable need for RT cores at all, and TitanV == 2080 Ti in BFV really proves this, doesn't it? Due to the lack of competition, though, their success IS guaranteed anyway.

    And to make adoption easy and hinder competition, they protect their 10 years of experience with an API that is more like OpenGL 1.0 than DX12, and they stamp questionable fixed-function hardware into silicon that can only be underutilized as long as we are hybrid.
    I'm afraid we will stay hybrid much longer than necessary.

    But the research is now entirely in their hands, locking out all the other experts who have 10 years of experience themselves. They become mere consumers, with further invention prevented.
    A more open and flexible approach would have been better, and also possible, as we see now. I would LOVE a TitanV kind of GPU without RT cores and with fine-grained scheduling exposed to GPGPU as well - it would have been f###ing perfect!!!
    It's 2019 - why do you think we need to go through DX8, 9, 10... again, just because it's about rays?

    I do not get why everybody just follows without any doubt or critique. Adopting and moving on seems the only option, but I'm just not happy with that.
     
  3. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,710
    Likes Received:
    9,764
    Location:
    Under my bridge
    Titan V is bigger and more expensive.

    Because they're more willing to accept the compromises needed to get realtime raytracing into affordable hardware. ;) I haven't followed the numbers in this thread closely enough to know how RTX compares to non-RTX, other than that it accelerates aspects by a significant margin, but the lack of clear consensus shows it's not at all obvious, as you imply, that everyone should be aware of the disadvantage RT cores bring. The data would have to say "RT cores don't accelerate anything, so what's the point?" for your argument to make sense. Your last numbers show a 2.4x speed increase with RT cores. That's significant.

    At this point though, it's all wild speculation. Until someone (AMD) brings non-RT-core raytracing to the GPU to compare, it's only theoretical improvements one could make in software versus theoretical performance from a fixed-function unit and a couple of GPU comparisons. It's far from conclusive data.
     
    OCASM, pharma and DavidGraham like this.
  4. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    119
    Likes Received:
    179
    Turing supports every feature of Volta: https://devblogs.nvidia.com/cuda-turing-new-gpu-compute-possibilities/
     
  5. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    @JoeJ I fail to see how research using just compute is blocked. But I also disagree with you that a 2x or more speedup is not worth putting into hardware.

    As for API design, here I think it's totally valid to criticize the exact abstraction chosen. But you have to start somewhere.
    If you start too low, it's not ideal to make changes later due to backwards compatibility; too high can leave some performance/multi-use on the table. So this will always be tough. However, IMO going from high to low is a lot easier. But I agree that over time it shouldn't stay as high-level as it is. The experience from now, and the feedback from software and the other hardware vendors, will help. It's hard; it needs multiple iterations. Even rasterization isn't "done", and even there we don't expose low-level rasterization details, despite twelve DX versions ;)

    Developers are already using the API differently; we have extensions for texture-space shading etc.

    Now, yes, it costs a shitload of money to make all that happen, so being first, and a lot of the internals being protected by NDAs and IP frameworks like Khronos etc., is natural. But that is business as usual.
     
  6. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    348
    Likes Received:
    219

    Funny story: these were added to Turing and are listed under Uniform Datapath Instructions.
    But I am pretty sure uniform instructions were added to save power and vector register space, because the integer SIMDs are already decoupled in Turing; hence the same math can be done via the more general-purpose integer SIMD units concurrently with FP ops.
    DXR should be quite stressful on resource handling, so both the vector and uniform integer pipelines should be useful, and they can likely execute concurrently with some overlap.

    Why do you keep repeating this?
    DX8, 9, 10 are different pipelines.
    DXR has the DX12 binding model, and it's the first time GPGPU controls work creation, data expansion, work scheduling/distribution/termination, etc. for fixed-function units.
    In rasterisation, fixed-function hardware controls all these aspects (with the partial exception of Mesh Shaders); pixel and other shaders are fed by fixed-function blocks and are slaves to the raster graphics pipeline (they can't manage data expansion, work scheduling/distribution/termination, etc.).
    In DXR, you can write your own intersection shaders and probably even overlap these with the standard accelerated intersection tests. That's actually much more flexible than the current raster pipeline, where the whole pipeline is strictly serialized and controlled by fixed-function hardware.
     
    #666 OlegSH, Jan 2, 2019
    Last edited: Jan 2, 2019
  7. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    The jury is still out on this one; we need to see controlled testing first, preferably with IQ validation as well.
     
    pharma and iroboto like this.
  8. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    In my particular case I already have a BVH of surfels in place, which works fine for GI and rough reflections (for the latter I store 4x4 or 8x8 env maps for each texel of a lightmap and update them in real time, similar to Many LoDs; SH etc. would work as well).
    For GI this is all fine: fully dynamic large scenes, infinite bounces, area lights from emitting surfaces, and area shadows, all for free. I calculate visibility with ray tracing in compute efficiently, and there is no need for RTX here. The algorithm has a time complexity of pretty much O(n), which is remarkable for GI.
    But surfels have a spatial maximum of detail (targeting 10 cm for the current generation), and the env maps have limited angular precision because of their low resolution.

    So I want some classical RT as well to add high-frequency details, mainly sharp reflections, and maybe some hard shadows. (Shadow maps, or just living with the soft GI shadows, are options too.)
    I'm optimistic at this point that a very big step towards photorealism is possible with current hardware. RTX can do well all the things I cannot, and the other way around.
    In the future one might want to add some local high-frequency GI; eventually my stuff could still serve as a fallback after the 2nd or 3rd hit within a path tracer, and after that my work would no longer be needed.

    The natural way to add sharp reflections now would be to use classical RT on triangles close to the ray origin, fall back to my surfels at some distance (where we could solve the LOD problem as well), and fall back to the env map at large distance.

    As it is now, to do so I have to maintain two BVHs, and the fallback to surfels cannot share my custom data in LDS across multiple rays within a DXR custom intersection shader.
    It is thus possible that a compute-only solution could handle the tracing more efficiently, even in comparison to RTX (accepting the limitation of 'no custom shaders for triangle hits', which makes sense in my case of already available low-res shaded textures).
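    The distance-banded fallback described above can be sketched as follows. This is only an illustration; the function name and the cutoff distances are invented for the example:

```python
def pick_representation(hit_distance_m: float,
                        near_cutoff_m: float = 2.0,
                        far_cutoff_m: float = 30.0) -> str:
    """Choose which scene representation a ray should intersect,
    based on distance from the ray origin. Cutoffs are illustrative only."""
    if hit_distance_m < near_cutoff_m:
        return "triangles"  # classical RT: full geometric detail, sharp hits
    if hit_distance_m < far_cutoff_m:
        return "surfels"    # surfel BVH: coarser, but solves LOD as well
    return "envmap"         # per-texel env map: cheapest, lowest angular precision
```

    The point of the complaint is that with two BVHs (DXR triangles plus a custom surfel structure), a single traversal cannot serve all three bands, and the surfel path cannot reuse LDS data across rays inside a DXR custom intersection shader.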

    But I likely won't try this, because competing with a hardware solution sounds just dumb. So (after swallowing my anger :) ) I will just use RTX as you intend, and I will discontinue my 'research' in this direction.
    In contrast, if there were no RT cores I had to utilize, experimenting with all this would still make sense, and I might end up with a solution that is more efficient.

    Now, seeing RT cores struggle a bit to deliver, I'm frustrated, and the whole RTX thing appears wrong to me again... it's an emotional issue.
    I feel guilty and sorry for the noise. But because game developers make smaller steps while making games, I also feel the need to bring up such topics right now, assuming I am indeed ahead with GI. (Not sure yet: my need for a seamless global parametrization is a big downside; making an automated tool is hard, and only after that can I test my stuff on real game geometry.)

    So I hope the above example makes sense to you. Getting heard is all I want.



    But now to another point, and I think this one is more important, not only for me:

    We finally need a way for GPUs to generate their own work! Please bring this to game APIs. I'm fine with vendor extensions.
    Stacking up large brute-force workloads, as is common in games, does not work for everything.
    My command buffers are full of indirect dispatches and barriers that do zero work at runtime, and the cost adds up.
    Async compute can compensate, but the sync between multiple queues is too expensive to make it a win for small planning workloads. (On GCN; I never tried it on any NV hardware.)
    Launching compute shaders from compute shaders is what we need.

    So whatever you have, please expose it. VK device-generated command buffers are nice, but there are no barriers, so it's useless for me.
    Seeing one additional shader stage introduced every year, but no progress here, is really frustrating. (Or did I miss something? I did no GPU work during the last year.)


    I repeat myself because I never know if people understand what I mean. I'm new here and don't know anyone or what they do, and I'm no hardware expert or enthusiast either.
    What I want is this functionality exposed to compute shaders in Vulkan; I would switch to DX12 as well to get it. But CUDA is no option, and neither is OpenCL 2.0.
    RTX shows there really is fine-grained work scheduling under the hood, but I need it to be available in compute.
     
  9. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,710
    Likes Received:
    9,764
    Location:
    Under my bridge
    It's going to be hard to be heard if you aren't showcasing actual results. ;) If you had demonstrations showing what you're doing, what you want, how current compute copes with it, and how future developments could handle it, the discussion would be more than theoretical feet-dragging.
     
    vipa899 and Ike Turner like this.
  10. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    Thanks. Things are clearer now.
    Not sure what this is for, but assuming it's mostly a personal project, do "both" and experiment like crazy. You can record your custom hits, shade later in compute, and compare.

    "It depends" is universally true, so it's unlikely you'll get a "do this" reply. The benefit of it being new is that there is no right or wrong atm ;) just people experimenting, gaining experience, and repeating.
    Also, the work you have invested so far, even if slower, is not wasted; you learn a lot by doing it all manually...

    If that level of graphics is your passion, go for it; if the goal of the project is something else and the raytracing aspect is just a part, I would go with whatever is the least amount of work and focus on the other stuff more important to the project.

    And don't pressure yourself too much about rights and wrongs. These other developer teams are pretty big, with lots of veterans, and yet even for them it takes time to gain experience.

    BTW, I've implemented most of the device-generated commands extension, so I am happy you like that route. I guess you would prefer something like GL NV_command_list with indirect tokens for barriers?
    But do you really need a lot of runtime-generated barrier points?
     
    pharma, OCASM and JoeJ like this.
  11. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    But it would be a lot harder to sell after it has been shown in public: someone might figure out how it works, and the value decreases ;)
    AFAIK this even makes it harder to get a patent, if whoever is (hopefully) interested wants one.

    However, just imagine a low-poly Quake level with moving boxes and metal balls, with lightmaps updating in real time... not so impressive anyway.
    I don't even have a render framework; I need to start from scratch on this, after finishing the damn geometry-processing work...

    So I'm just average Joe, not John Carmack :)
     
  12. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,710
    Likes Received:
    9,764
    Location:
    Under my bridge
    Understandable, but then you have to accept that discussion will be limited. "I wish nVidia didn't do it this way, but I can't talk about it" ain't the subject of great conversation.

    Although a YT video of just the realtime visuals, without any explanations, shouldn't give anyone any clues as to how it's done.
     
    vipa899, pharma and eloyc like this.
  13. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    Yes, sounds good!
    I need the barriers mostly for cases like this:
    process one level of a tree,
    barrier,
    process the next level (which depends on parent data),
    barrier, and continue with levels deeper down.

    Mostly the processing does zero work, so generating the commands on the GPU would already reduce the count a lot, but I definitely need them (I guess still about 30-50 per frame).
    Another, simpler option would be to skip over commands (including the barriers) in the command lists on the GPU side. Not so advanced, but it would already fix that case.
    OpenCL 2.0 is more flexible here, but with the above I would already be happy. CL smells too high-level to be hardware friendly anyway.
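    The level-by-level pattern above can be sketched like this; `dispatch` and `barrier` are stand-ins for an indirect compute dispatch and an execution/memory barrier, and the whole thing is only an illustration of why empty levels waste recorded commands:

```python
def process_tree_levels(level_node_counts, dispatch, barrier):
    """Process a tree top-down, one level per pass. Each level reads its
    parents' results, so a barrier is required between levels. Returns
    the number of dispatches that actually had work."""
    useful = 0
    for count in level_node_counts:
        if count > 0:
            dispatch(count)  # GPU-side generation could skip empty levels...
            useful += 1
        barrier()            # ...but the ordering barrier is still needed
    return useful
```

    With CPU-recorded command buffers, every dispatch and barrier must be recorded up front, even for levels whose indirect count turns out to be zero at runtime; that recording and barrier overhead is the cost being described.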

    I don't know if GL NV_command_list supports addressing multiple queues. If that is easily possible, it would be nice too, but it's not so much of a requirement.


    Many Thanks! :) :) :)
     
  14. OCASM

    Regular Newcomer

    Joined:
    Nov 12, 2016
    Messages:
    804
    Likes Received:
    782
    The Dreams approach certainly has benefits, but performance isn't one of them. You say huge worlds, but so far we've only seen small scenes (AFAIK). As you say, it's noisy and doesn't support the standard PBR workflow. It works for what it's trying to accomplish, but I wouldn't recommend it for the majority of games.

    I do agree on the point of softness vs. sharpness. Too sharp looks fake. In addition to avoiding ultra-high resolutions, games should support light diffusion effects as well.
     
  15. milk

    Veteran Regular

    Joined:
    Jun 6, 2012
    Messages:
    2,654
    Likes Received:
    2,130
    Joe Jarmack?
     
    JoeJ, vipa899, Scott_Arm and 2 others like this.
  16. Dictator

    Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    84
    Likes Received:
    170
    Just want to say that the idea of the Titan V being the same, performance-wise, as an RTX 2080 Ti in BFV is a bit problematic right from the start. It goes against all the information we have about the performance advantage the RT core provides for ray-triangle intersection; it seems like sensationalist reporting that did not look into the specifics before claiming the contrary.

    I could imagine the following, without having tested it since I do not have a Titan V: the fallback layer for Volta is programmatically and visually different, so a direct comparison of its performance is troubled. When talking with DICE render devs in August, they mentioned how much faster the game was with RTX, and that they had been used to developing at much lower framerates and worse quality on the Titan V (optimisations to the BVH structure and more excessive snapping were still present in the Gamescom build at the time as a hold-over from developing on Volta).

    Also, we know from a number of presentations now that the ray/triangle intersection is just one part of the expense of using RT, and it differs from scene to scene and effect to effect. Surely we can all imagine a scenario where the greatest performance bottleneck in a given scene is not the ray-triangle intersection bit of RT but the surface shading and denoising, shifting importance away from the RT core's contribution to performance.

    I won't be testing this, sadly, but I think GamersNexus might! I hope they also look at the visuals on the Titan V in comparison to RTX; it should be neat to see if they really are the same.

    EDIT - should I have posted this in the BFV RT thread? :D
     
    DavidGraham, pharma and iroboto like this.
  17. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    109
    Likes Received:
    123
    I have a question about the Titan V vs. RTX thing in BFV, and RT in general. Does the number of rays have a big impact on shading requirements with raytracing, or is it mostly independent? Shading costs grow because rays hit objects outside screen space, but more rays don't hit that many more objects outside of screen space, so they don't add much shading cost?

    In BFV, RT runs at a maximum of 0.4 rays per pixel; in the user's test cases it might even be at 0.1 rays per pixel if there aren't many reflections. So it's not really surprising the RT cores aren't helping much. If we look at Remedy's Control numbers, it's 5 ms on the Titan V vs. 1 ms on RTX at 1 ray per pixel. A crude simplification: 0.5 ms on the Titan V vs. 0.1 ms on RTX at 0.1 rays per pixel. That would lead to numbers where the impact of the RT cores is negligible. But if the shading costs don't change much, they might increase the number of rays a lot for RTX.
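    Written out, the back-of-envelope scaling looks like this. The 5 ms / 1 ms figures are the quoted Control numbers; linear scaling of trace cost with ray count is the simplifying assumption being made:

```python
def trace_cost_ms(rays_per_pixel: float, cost_at_1_rpp_ms: float) -> float:
    """Assume traversal/intersection cost scales linearly with ray count."""
    return rays_per_pixel * cost_at_1_rpp_ms

TITAN_V_MS, RTX_MS = 5.0, 1.0  # quoted Control trace cost at 1 ray per pixel

# Absolute advantage of the RT cores at different ray budgets:
gap_at_1_rpp   = trace_cost_ms(1.0, TITAN_V_MS) - trace_cost_ms(1.0, RTX_MS)  # 4.0 ms
gap_at_0_1_rpp = trace_cost_ms(0.1, TITAN_V_MS) - trace_cost_ms(0.1, RTX_MS)  # 0.4 ms
```

    At a BFV-like 0.1 rays per pixel, a 0.4 ms gap is easy to lose behind shading and denoising cost, which is the point of the argument.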
     
    pharma likes this.
  18. Dictator

    Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    84
    Likes Received:
    170
    A nice way to look at it, I think, and a good point. It would also mean that pumping up the resolution increases the ray count and thus changes the importance of the RT core. Maybe there are similarities in performance at lower resolution/lower ray count.
     
  19. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,104
    Likes Received:
    3,409
    https://www.3dcenter.org/news/raytr...uft-mit-guten-frameraten-auch-auf-der-titan-v

    Looks to me like the Titan RTX beats the Titan V quite easily in the reflection-heavy maps (Rotterdam). It's about 40 or 50% faster, though these benchmarks are pretty simple.

    I guess that goes back to the question of what BFV is actually doing: how much of the % difference is from shading performance? It would be nice to see both cards with RTX off to get a baseline performance difference.

    Spec-wise, the Titan RTX does not have any major performance advantages for general gaming except a huge amount of memory, which I'm not sure BFV could even take advantage of. TFLOP/s and bandwidth are roughly equal.

    Edit: In a roundabout way, this may actually explain what BFV is doing. If a map were primarily casting screen-space rays, you would expect performance to be roughly equal. The performance divergence suggests that there is a significant burden of DXR rays on screen in the Rotterdam map, even after the patch.
     
    #679 Scott_Arm, Jan 2, 2019
    Last edited: Jan 2, 2019
    pharma, vipa899, iroboto and 2 others like this.
  20. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    These are still user-generated tests, and they only use High DXR; I believe at Ultra DXR, and at resolutions above 1080p, the difference would be higher.

    Notice the Titan V in question is running overclocked to 2.0 GHz, which gives it 20 TFLOPs; a Titan RTX at 2.0 GHz gets you 18.4 TFLOPs.
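    Those figures follow from the FP32 lane counts (5120 CUDA cores on the Titan V, 4608 on the Titan RTX) at 2 FLOPs per core per clock for a fused multiply-add; a quick sanity check:

```python
def peak_fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak FP32 rate: 2 FLOPs (one fused multiply-add) per core per clock."""
    return cuda_cores * 2 * clock_ghz / 1000.0

titan_v_tflops   = peak_fp32_tflops(5120, 2.0)  # ~20.5 TFLOPs
titan_rtx_tflops = peak_fp32_tflops(4608, 2.0)  # ~18.4 TFLOPs
```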
     
    #680 DavidGraham, Jan 2, 2019
    Last edited: Jan 2, 2019
    pharma and BRiT like this.
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.