Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Discussion in 'Rendering Technology and APIs' started by Scott_Arm, Aug 21, 2018.

  1. Ike Turner

    Veteran Regular

    Joined:
    Jul 30, 2005
    Messages:
    1,884
    Likes Received:
    1,756
    Heinrich4, jlippo, milk and 3 others like this.
  2. Dictator

    Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    124
    Likes Received:
    288
    There it is:
    "It runs in 1080p with 30 fps on a Vega 56"

    So kind of like I heard: "injecting" geometry at those points when needed, at a lower LOD, for the ultra mirror-like reflections, and falling back to voxel cone tracing for less smooth surfaces + diffuse stuff. That explains a heck of a lot of the visual things going on in the NOIR demo! I love that idea though, tiered quality and different techniques used. Though it does mean that real-time dynamic moving objects will not really affect diffuse GI or reflections on less-than-mirror-like surfaces. AFAIK character models and moving objects are not voxelised in CryEngine, just the static geo.
    "However, RTX will allow the effects to run at a higher resolution. At the moment on GTX 1080, we usually compute reflections and refractions at half-screen resolution. RTX will probably allow full-screen 4k resolution. It will also help us to have more dynamic elements in the scene, whereas currently, we have some limitations. Broadly speaking, RTX will not allow new features in CRYENGINE, but it will enable better performance and more details."

    Hell yeah! This sounds awesome. It seems like the Noir demo being devoid of many more dynamic elements is one of the things getting that 30 fps on the Vega56 even then!
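To make the tiered idea concrete, here is a minimal sketch of picking a technique per surface. The roughness cutoff, the enum, and the function names are all made up for illustration and are not CRYENGINE's API; the only thing taken from the posts above is the split between ray-traced low-LOD geometry for mirror-like surfaces and voxel cone tracing for everything else.

```python
from enum import Enum


class ReflectionPath(Enum):
    RAY_TRACED_MESH = "ray-traced low-LOD mesh"
    VOXEL_CONE_TRACE = "voxel cone trace"


# Hypothetical cutoff: surfaces smoother than this are treated as mirror-like.
MIRROR_ROUGHNESS_CUTOFF = 0.1


def pick_reflection_path(roughness, is_diffuse=False):
    """Pick the expensive path only where its extra sharpness is visible."""
    if not 0.0 <= roughness <= 1.0:
        raise ValueError("roughness must be in [0, 1]")
    if not is_diffuse and roughness < MIRROR_ROUGHNESS_CUTOFF:
        return ReflectionPath.RAY_TRACED_MESH
    return ReflectionPath.VOXEL_CONE_TRACE


def sees_dynamic_objects(path):
    """Assumes only static geometry is voxelised, as guessed in the post above."""
    return path is ReflectionPath.RAY_TRACED_MESH


if __name__ == "__main__":
    for roughness in (0.02, 0.3, 0.9):
        path = pick_reflection_path(roughness)
        print(roughness, path.value, sees_dynamic_objects(path))
```

Under that assumption, a moving character would show up in a polished floor but not in a rough wall or in the diffuse GI, which matches the limitation described above.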
     
    Heinrich4, jlippo, milk and 5 others like this.
  3. Ike Turner

    Veteran Regular

    Joined:
    Jul 30, 2005
    Messages:
    1,884
    Likes Received:
    1,756
  4. AlBran

    AlBran Ferro-Fibrous
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,716
    Likes Received:
    5,813
    Location:
    ಠ_ಠ
    *lights the @willardjuice, @iroboto @BRiT signal* What was I saying yesterday? xD

    Phil walks on stage.
    “It’s @Rys Tracer! Ryyyyyyyyyyyys Tracer!”

    /flees :runaway:


    ... 3 hours of sleep.
     
    #1684 AlBran, May 15, 2019
    Last edited: May 15, 2019
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,691
    Likes Received:
    11,138
    Location:
    Under my bridge
    So PVR's solution is significantly more engineered than nVidia's. Is the idea here to license the tech for other GPU vendors such as AMD to incorporate? And presumably the ideas are patent protected where they can be, so acceleration concepts (scene hierarchy generator?) that ImgTec are first on will remain exclusive to them?

    Could this actually explain some of nVidia's choices? Are some ideas locked IP?
     
  6. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,885
    Likes Received:
    6,160
    that is a hard software based implementation by AMD. It's an early graphic, so I'm not sure if that's going to require updating once Navi is released. But we will see.

    oooooof if there is no change
     
    #1686 iroboto, May 15, 2019
    Last edited: May 15, 2019
  7. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    444
    Likes Received:
    519
    Yeah, that's still the question.
    ImgTec can generate the BVH in hardware while running a kind of vertex shader, as I've seen in their earlier docs. NV does this from compute, with the implementation (AFAIK) hidden behind the API.

    But the more interesting claim in the whitepaper is: NV has nothing similar to 'Ray Coherency Engine', which is the most interesting part about RT performance.
    We know NV did most of the research in this field. Just because they do not talk about it does not mean they don't have it? I doubt this claim.

    If it's true however, we can expect a big perf boost with NV next gen.
    Personally I think there are too many options here for patent issues to become a real problem, though NV likely has patents over their RT research. But I'm no lawyer.

    ...oh - i've missed the most important part here actually:
    If ImgTec's BVH generation is so fast that it allows a per-frame rebuild, this would fix the largest DXR/RTX problem: missing LOD support.
    With the option to generate geometry and BVH on the fly any LOD mechanism can be implemented.
    We still want to reuse data over multiple frames ofc, so while moving through the world stuff coming closer becomes more detailed but only parts of geometry change each frame.
    I like this :) With such issues solved and everything just working the mentioned black boxes are no more problem.
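A rough sketch of what that could look like on the application side, assuming a simple distance-based LOD rule where each doubling of distance drops one level. The function names, the base distance, and the scene dictionary are hypothetical; nothing here is ImgTec's or DXR's actual API. The point is only that each frame just the objects whose LOD changed would need new geometry and a BVH rebuild.

```python
import math


def select_lod(distance, base_distance=10.0, max_lod=4):
    """LOD 0 is the most detailed; each doubling of distance drops one level."""
    if distance <= base_distance:
        return 0
    return min(max_lod, int(math.log2(distance / base_distance)) + 1)


def objects_to_rebuild(camera, positions, current_lods):
    """Return {object_id: new_lod} for objects whose LOD changed this frame."""
    changed = {}
    for obj_id, pos in positions.items():
        lod = select_lod(math.dist(camera, pos))
        if current_lods.get(obj_id) != lod:
            changed[obj_id] = lod
    return changed


if __name__ == "__main__":
    positions = {"rock": (0, 0, 5), "tree": (0, 0, 30), "hill": (0, 0, 500)}
    lods = {}
    for camera in [(0, 0, 0), (0, 0, 2), (0, 0, 15)]:
        changed = objects_to_rebuild(camera, positions, lods)
        lods.update(changed)  # everything else reuses last frame's BVH
        print(camera, "rebuild:", changed)
```

In this toy run the first frame builds everything, the second frame rebuilds nothing, and the third only rebuilds the tree, which is the "only parts of geometry change each frame" behaviour described above.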
     
  8. PSman1700

    Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    232
    Likes Received:
    50




    Can't run it yet, but it's awesome how many ray-traced games are already available. For non-RTX GPUs you can get away with something like a Radeon VII.
     
    OCASM, pharma and Heinrich4 like this.
  9. jlippo

    Veteran Regular

    Joined:
    Oct 7, 2004
    Messages:
    1,336
    Likes Received:
    434
    Location:
    Finland
    Would be more interesting, if it wasn't a screenspace tracer.
    The inability to use backup methods for SS misses, or existing GI methods, is quite a big limitation.
     
    Dictator likes this.
  10. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,805
    Likes Received:
    473
    There's only so much reordering rays can do ... you trade incoherence in the intersection tests for incoherence in the memory accesses of the re-ordering itself.

    It's curious with how much NVIDIA has published on ray tracing in general recently, how little it has published on ray re-ordering in recent years. All the work of Timo Aila is relatively far in the past. Maybe it's just nowhere near relevant to real time ray tracing on commodity hardware?
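For anyone who wants to see the trade-off in code, here is a minimal CPU-side sketch of one common way to group rays: sort by direction octant, then by a coarse origin cell. The key layout, the cell size, and the function names are assumptions for illustration and not how any particular GPU or ImgTec's Ray Coherency Engine does it.

```python
def direction_octant(direction):
    """3-bit code built from the signs of the direction components."""
    return (direction[0] < 0) | ((direction[1] < 0) << 1) | ((direction[2] < 0) << 2)


def coherence_key(origin, direction, cell_size=1.0):
    """Rays with the same octant and origin cell tend to touch similar BVH nodes."""
    cell = tuple(int(c // cell_size) for c in origin)
    return (direction_octant(direction), cell)


def reorder_rays(rays, cell_size=1.0):
    """Return ray indices in an order that groups similar rays together."""
    return sorted(range(len(rays)), key=lambda i: coherence_key(*rays[i], cell_size))


if __name__ == "__main__":
    rays = [
        ((0.2, 0.0, 0.0), (1.0, 1.0, 1.0)),
        ((5.0, 5.0, 5.0), (-1.0, 0.0, 0.0)),
        ((0.7, 0.1, 0.0), (1.0, 0.5, 1.0)),
        ((5.5, 5.2, 5.1), (-1.0, 0.2, 0.3)),
    ]
    order = reorder_rays(rays)
    print(order)
    # The gather below is the cost: ray data is now read in scattered order.
    sorted_rays = [rays[i] for i in order]
```

The sort and the gather at the end are exactly the extra, incoherent memory traffic mentioned above, so the grouping only pays off if the traversal savings exceed that cost.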
     
    #1690 MfA, May 17, 2019
    Last edited: May 17, 2019
    pharma, chris1515 and iroboto like this.
  11. PSman1700

    Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    232
    Likes Received:
    50
    Perhaps we will see something like this on next gen consoles too, a Radeon VII can do it and it has no RT cores like RTX Nvidia has.
     
  12. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    444
    Likes Received:
    519
    Surely finding the best trade-off isn't easy, but there must be a sweet spot worth finding. Otherwise ImgTec would not have their Coherency Engine. Although they list this as optional. Because of minimal power / die area, or because it is no big win? They don't give any data, just mention the additional silicon cost. It's also not clear if reordering happens only once at each hit or frequently during traversal.

    I remember Aila's paper about 'treelets', which somebody linked here recently. That's similar to an idea I have called 'caching branches of BVH to LDS', so quite an interesting read. In the paper they mentioned wide trees and stackless traversal as interesting future work. I would consider both to be essential; the lost front-to-back order with stackless traversal could be counteracted by dividing rays into front-to-back segments. I agree the paper appears quite dated, but I doubt they stopped working on it. Maybe they changed from a 'publish and patent' to a 'keep secret' strategy for some reason.
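The segment idea can be sketched as follows, with the caveat that this is only my reading of it: the traversal callback stands in for a stackless BVH walk that reports hits in arbitrary order, and all names are hypothetical. Splitting the ray interval and tracing the nearest segment first restores an early-out at segment granularity.

```python
def split_ray(t_min, t_max, segment_count):
    """Split [t_min, t_max] into equal parametric segments, nearest first."""
    if segment_count < 1 or t_max <= t_min:
        raise ValueError("need at least one segment and t_max > t_min")
    step = (t_max - t_min) / segment_count
    return [(t_min + i * step, t_min + (i + 1) * step) for i in range(segment_count)]


def first_hit(t_min, t_max, segment_count, traverse_unordered):
    """Trace segments front to back and stop at the first one that has a hit.

    traverse_unordered(a, b) is a stand-in for a stackless traversal that
    returns all hit distances in [a, b) in no particular order.
    """
    for seg_min, seg_max in split_ray(t_min, t_max, segment_count):
        hits = traverse_unordered(seg_min, seg_max)
        if hits:
            return min(hits)  # closest hit within this segment
    return None


if __name__ == "__main__":
    scene_hits = [7.5, 2.5, 9.0]  # made-up hit distances along one ray

    def traverse(a, b):
        return [t for t in scene_hits if a <= t < b]

    print(first_hit(0.0, 10.0, 4, traverse))
```

More segments mean less wasted work behind the first hit but more traversal restarts, so the segment count is another knob with a sweet spot.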

    From the DXR API we have the impression tracing rays is an atomic operation and execution waits on results. This would hint there is no batching, reordering or however we call it. But that's no proof. It remains a mystery :)
     
    OCASM likes this.
  13. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    444
    Likes Received:
    519
    Oh no... more screenspace crap :)
    I'd like to see something like Crytek has shown instead. Likely we will, considering RT mentioned for PS5.
     
    OCASM likes this.
  14. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,805
    Likes Received:
    473
    It's possible that the core of the coherence work is hard to patent, so they'd rather just not talk about it.

    I'd guess Kirill Garanzha's work describes quite a lot of the patently obvious mechanisms necessary and he did that before joining NVIDIA.
     
    Ethatron likes this.
  15. keldor

    Newcomer

    Joined:
    Dec 22, 2011
    Messages:
    74
    Likes Received:
    107
    The API is very high level - shaders dispatch some rays, and at some point a hit/miss/whatever kernel is executed with the results. There's a lot of resemblance to Nvidia's dynamic parallelism in CUDA, honestly.

    Rays are already inherently batched to some extent at the wavefront level. They're often even coherent here too!

    Scheduling is complicated to say the least. The last thing you want is SMs sitting idle for thousands of clock cycles waiting for rays! And the latency will be thousands of cycles, since traversing an acceleration structure is pointer chasing with a lot of it uncached.

    So what do you do with the kernel while the ray dispatch is active? Evicting a block from an SM is an expensive operation since it has a huge chunk of registers and possibly shared memory too that needs to be persisted. We're talking kilobytes here. This is of course assuming the shader isn't just terminated at this point - one possible optimization would indeed be to put the dispatch at the end of a kernel, allowing it to terminate without waiting for results, and have the hit/miss kernels responsible for writing results to a UAV or something.

    Asynchronous compute could be very useful at covering stalls from ray dispatch. Though the presence of the stalled kernels limits occupancy.

    A driver/compiler level optimization is to let the kernel continue running past the ray dispatch point as far as possible. This sort of thing is a common optimization used by many compilers - by putting as much code between a high latency operation and where the result is consumed, you can cover part or all of the stall. Static reordering!

    We can see that depending on where the ray results are used, tracing can either be latency sensitive or not. In the first case, large scale batching and reordering actually hurts performance. In the second, they might be a win. There's a lot of room for hardware, driver, and compiler optimization here.
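A toy cycle-count model of the static reordering point, assuming a single kernel in isolation and completely made-up numbers (real hardware also hides latency by switching to other warps, which this ignores). The only thing it shows is that independent work moved between the ray dispatch and the first use of the result eats into the stall.

```python
def stall_cycles(trace_latency, independent_cycles):
    """Cycles spent idle waiting for the ray result after independent work runs out."""
    return max(0, trace_latency - independent_cycles)


def kernel_cycles(total_work, trace_latency, independent_cycles):
    """Own work plus whatever part of the trace latency is left uncovered.

    independent_cycles is the part of total_work the compiler managed to
    schedule between the dispatch and the first use of the ray result.
    """
    if not 0 <= independent_cycles <= total_work:
        raise ValueError("independent work must be part of the total work")
    return total_work + stall_cycles(trace_latency, independent_cycles)


if __name__ == "__main__":
    # Hypothetical numbers: 3000 cycles of shader work, 2000 cycle trace latency.
    for independent in (0, 500, 2500):
        print(independent, kernel_cycles(3000, 2000, independent))
```

With nothing to overlap, the kernel takes 5000 cycles in this model; with 2500 cycles of independent work the trace is fully covered and the latency-sensitive case turns into the latency-tolerant one.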

    Anyway, https://devblogs.nvidia.com/rtx-best-practices/ has a lot of indirect information about how RTX works under the hood.
     
    milk, imerso, MfA and 1 other person like this.
  16. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,691
    Likes Received:
    11,138
    Location:
    Under my bridge
    milk likes this.
  17. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,805
    Likes Received:
    473
    More samples give better estimates whatever the estimator, and these methods all improvise heavily enough to need a long history; the lag comes with the territory.

    As I said before, there's something to be said for faking it ...
     
  18. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,691
    Likes Received:
    11,138
    Location:
    Under my bridge
    If laggy light is part of real-time raytracing, I'm starting to take umbrage with calling it 'real-time'. If the processing power needed to perform adequate sampling is so great that we need a half-second delay before our lighting catches up, resulting in smeary visuals on dynamic lights, that's significantly different from the ideal and the ways RT has been portrayed. "It's a solution to all your lighting problems! (just please ignore the new one it introduces)"
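To put a number on that lag, here is a small sketch assuming the denoiser keeps its history as a plain exponential moving average with blend factor alpha. Real temporal filters are more elaborate (reprojection, history rejection, variable alpha), so treat the blend factor and frame rate as illustrative assumptions only.

```python
import math


def accumulate(history, sample, alpha):
    """One frame of temporal accumulation: blend the new sample into the history."""
    return history + alpha * (sample - history)


def frames_to_converge(alpha, tolerance=0.05):
    """Frames until the history has absorbed (1 - tolerance) of a step change.

    After n frames the remaining error is (1 - alpha) ** n.
    """
    if not 0.0 < alpha < 1.0 or not 0.0 < tolerance < 1.0:
        raise ValueError("alpha and tolerance must be in (0, 1)")
    return math.ceil(math.log(tolerance) / math.log(1.0 - alpha))


if __name__ == "__main__":
    alpha = 0.1  # hypothetical: each frame contributes 10% to the result
    frames = frames_to_converge(alpha)
    print(frames, "frames,", round(frames / 60.0, 2), "s at 60 fps")

    # Simulate a light switching on: history starts dark, samples are all bright.
    history = 0.0
    for _ in range(frames):
        history = accumulate(history, 1.0, alpha)
    print(round(history, 3))
```

With alpha = 0.1 it takes 29 frames to get within 5% of the new lighting, which is roughly the half second mentioned above at 60 fps. A larger alpha shortens the lag but keeps less history, so more noise comes back.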
     
  19. Ike Turner

    Veteran Regular

    Joined:
    Jul 30, 2005
    Messages:
    1,884
    Likes Received:
    1,756
    Well, this has been a thing since the very first demos more than a year ago (Star Wars Reflections & everything after). This is the current price to pay for "noise free" IQ with an extremely low sample count.
     
    #1700 Ike Turner, May 19, 2019
    Last edited: May 19, 2019
    chris1515 likes this.