GPU Ray Tracing Performance Comparisons [2021] *spawn*

Discussion in 'Architecture and Products' started by DavidGraham, Mar 29, 2021.

  1. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,273
    Location:
    Self Imposed Exhile
    Another way to look at it is that the BVH is hw dependent. If the BVH is built in the driver, that's no problem for the IHV: they can add support for new hw in a new driver. If the BVH were implemented in the game engine, each game would have to be patched with a new BVH implementation whenever new hw (a new architecture or chip) becomes available. Each game would also have to be specifically optimized for older hw, like the Turing variants. This creates a ton of work. Given this, and given that these are first-generation RT games, it is very unlikely old games would get patched to function or perform optimally on new hw. In that case it would be tremendously difficult for AMD to bring RT support to games after the fact. AMD would have needed to go back to each developer and ask them to write an AMD-specific BVH implementation, or none of the "DXR" games would run on AMD. Each game would have to be patched and released again, with the onus on the game developer to verify AMD hw + RT, versus the current situation where the onus is on AMD to provide a performant driver.
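    To make the driver-side argument concrete, here is a minimal Python sketch of the split being described. Every name in it (AccelHandle, register_builder, build_acceleration_structure, "arch_a"/"arch_b") is hypothetical and not DXR or Vulkan API: the game only hands over triangles and gets back an opaque handle, while a per-architecture builder picked by the "driver" decides the node layout. The builders are deliberately fake stand-ins that just group triangle indices.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


@dataclass(frozen=True)
class AccelHandle:
    """What the game gets back: an opaque token it never looks inside."""
    arch: str
    blob: object = field(repr=False)


# Hypothetical driver-side registry: one BVH builder per GPU architecture.
_BUILDERS: Dict[str, Callable[[List[Triangle]], object]] = {}


def register_builder(arch: str):
    """Decorator a (hypothetical) driver update uses to add a new architecture."""
    def wrap(fn):
        _BUILDERS[arch] = fn
        return fn
    return wrap


@register_builder("arch_a")
def _build_wide4(tris: List[Triangle]) -> object:
    # Stand-in for a real builder: just groups triangle indices 4 per node.
    idx = list(range(len(tris)))
    return [idx[i:i + 4] for i in range(0, len(idx), 4)]


@register_builder("arch_b")
def _build_wide2(tris: List[Triangle]) -> object:
    # Same geometry, different (equally fake) layout: 2 per node.
    idx = list(range(len(tris)))
    return [idx[i:i + 2] for i in range(0, len(idx), 2)]


def build_acceleration_structure(arch: str, tris: List[Triangle]) -> AccelHandle:
    """The only call the game makes; identical on every GPU."""
    if arch not in _BUILDERS:
        raise RuntimeError(f"driver has no BVH builder for {arch!r}")
    return AccelHandle(arch, _BUILDERS[arch](tris))
```

    Registering a builder for a new architecture later makes the unchanged game call work on it without patching the game, which is the maintenance argument above in miniature.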

    Even simple things like the data format, and possibly the compression, used to describe and store the BVH can be (and are) hw specific: compression, structuring the data to be hw friendly, etc. For now it's much better to let the driver handle this than to force game developers to deal with each chip and architecture separately in their code. It gets more complicated once the developer has to implement hw-friendly grouping of rays etc. to traverse the BVH efficiently. Just naively using the hw will likely be very cache unfriendly and perform poorly. The black box called BVH is not at all trivial to own at scale. It's fine to own the BVH if you're doing a demo for one specific GPU or a console-exclusive game.
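    As an illustration of why the stored format ends up hw specific, here is a minimal sketch of one well-known compression idea: storing a child box as small integers on a grid spanning its parent's box, rounding outward so the decoded box still encloses the original. Whether any particular GPU does exactly this is an assumption; the function names and the 8-bit choice are illustrative only.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def quantize_child(parent: Box, child: Box, bits: int = 8) -> Tuple[List[int], List[int]]:
    """Store a child AABB as small integers on a grid spanning the parent AABB.

    Min corners round down and max corners round up so the decoded box encloses
    the original (up to float rounding exactly on a grid line, which a
    production encoder would also have to guard against).
    """
    levels = (1 << bits) - 1
    pmin, pmax = parent
    qmin: List[int] = []
    qmax: List[int] = []
    for a in range(3):
        extent = pmax[a] - pmin[a]
        scale = levels / extent if extent > 0 else 0.0
        lo = math.floor((child[0][a] - pmin[a]) * scale)
        hi = math.ceil((child[1][a] - pmin[a]) * scale)
        qmin.append(min(max(lo, 0), levels))
        qmax.append(min(max(hi, 0), levels))
    return qmin, qmax


def dequantize_child(parent: Box, qmin: List[int], qmax: List[int], bits: int = 8) -> Box:
    """Decode back to floats; only the parent box needs full precision."""
    levels = (1 << bits) - 1
    pmin, pmax = parent
    step = [(pmax[a] - pmin[a]) / levels for a in range(3)]
    lo = tuple(pmin[a] + qmin[a] * step[a] for a in range(3))
    hi = tuple(pmin[a] + qmax[a] * step[a] for a in range(3))
    return lo, hi
```

    With 8 bits, a child box takes 6 bytes instead of 24 bytes of float32. A different bit width, grid, or node width is an equally valid choice, which is exactly why layouts like this tend to be vendor specific.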
     
  2. JoeJ

    Veteran Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    1,427
    Likes Received:
    1,680
    Agreed, which makes the goal of a general BVH format undesirable.
    What might work is to query the max branching factor from the driver, then build nodes from compute, providing bounding boxes and child indices. The driver can then translate those commands to create nodes in the HW format without needing to buffer them.
    The developer does need to provide shaders supporting multiple branching factors, though, so there is a need to convert from, say, a streamed octree to BVH4, which is not trivial but still linear time and orders of magnitude faster than building from scratch.
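    The octree-to-BVH4 conversion can be sketched as a single pass that splits any node with more children than the reported width into intermediate nodes. In this sketch Node, narrow and _group are hypothetical names, `width` stands in for the driver-queried branching factor, and the grouping is naive (contiguous children in octant order); a real builder would more likely pick groups with a cost metric such as SAH.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3
    hi: Vec3
    children: List["Node"] = field(default_factory=list)
    prim: Optional[int] = None  # leaf payload, e.g. a triangle or patch index


def union(nodes: List[Node]) -> Tuple[Vec3, Vec3]:
    """Bounding box enclosing all given nodes."""
    lo = tuple(min(n.lo[a] for n in nodes) for a in range(3))
    hi = tuple(max(n.hi[a] for n in nodes) for a in range(3))
    return lo, hi


def _group(kids: List[Node], width: int) -> List[Node]:
    """Wrap contiguous runs of kids in new nodes until at most `width` remain."""
    if len(kids) <= width:
        return kids
    size = -(-len(kids) // width)  # ceiling division
    out: List[Node] = []
    for i in range(0, len(kids), size):
        g = kids[i:i + size]
        if len(g) == 1:
            out.append(g[0])
        else:
            out.append(Node(*union(g), children=_group(g, width)))
    return out


def narrow(node: Node, width: int) -> Node:
    """Rewrite a tree so no node has more than `width` children.

    For a bounded fan-in such as an octree's 8, the work per node is constant,
    so the whole pass is linear in the number of nodes.
    """
    kids = [narrow(c, width) for c in node.children]
    return Node(node.lo, node.hi, _group(kids, width), node.prim)
```

    For an 8-child octree node and width 4, this produces four intermediate nodes with two children each; leaf order, and therefore spatial locality, is preserved.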
    Then we also need the ability to modify an existing BVH instead of just building it, meaning adding or removing leaf nodes for LOD support, to avoid a full rebuild just because some patches of the surface change topology.
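    Adding or removing a leaf without a rebuild amounts to a local edit plus a refit: recomputing boxes only along the path from the changed node to the root. The sketch below assumes a pointer-based tree with parent links (all names hypothetical). It is cheap because it touches one root path, but refits do not improve tree quality, so a tree edited this way can gradually degrade.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(eq=False)  # identity comparison, so list.remove() finds the exact node
class Node:
    lo: Vec3
    hi: Vec3
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def refit_upward(node: Optional[Node]) -> None:
    """Recompute bounds from children, walking from `node` up to the root."""
    while node is not None:
        if node.children:
            node.lo = tuple(min(c.lo[a] for c in node.children) for a in range(3))
            node.hi = tuple(max(c.hi[a] for c in node.children) for a in range(3))
        node = node.parent


def add_leaf(parent: Node, leaf: Node) -> None:
    leaf.parent = parent
    parent.children.append(leaf)
    refit_upward(parent)


def remove_leaf(leaf: Node) -> None:
    # Simplification: an interior node left with no children keeps stale
    # bounds here; a fuller version would unlink it as well.
    parent = leaf.parent
    parent.children.remove(leaf)
    leaf.parent = None
    refit_upward(parent)
```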

    It's surely hard to solve (e.g. avoiding dynamic memory allocation), but a solution has to be found.
    For a start, vendor extensions to expose the BVH would help. Even if they aren't popular, that's better than waiting an unknown amount of time until some big guys agree on and specify something which might or might not be fine.
     
  3. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,273
    Location:
    Self Imposed Exhile
    I don't think it's realistically possible to open up BVH traversal yet. To open it, every GPU manufacturer + Microsoft + Khronos would need to agree on the data formats and instructions used. Once those are in the spec, vendors can implement hw that runs code using the standard data formats and instructions in a compatible way. I think this will happen once RT matures and the different parties come to an agreement on what the low-level API, data formats and potential compression should look like.

    The issue is that currently every hw can be (and is) very different. If you opened up BVH traversal, you would need to support and optimize for each chip separately in the game engine: optimize for specific data formats and use architecture-specific instructions to traverse the BVH. And of course also implement grouping of coherent rays in a hw-specific way to optimize throughput. The hardware architectures (Turing, Ampere, RDNA2, future architectures) are likely so different that what works on one doesn't even run on another,...
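    As a rough picture of what "grouping coherent rays" can mean, here is a minimal sketch of one crude heuristic: binning rays by the sign pattern of their direction (their octant), so rays in a batch tend to visit BVH children in a similar order. This is an illustrative assumption, not what any specific GPU or driver is known to do; real schedulers are exactly the architecture-specific part being discussed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]  # (origin, direction)


def direction_octant(d: Vec3) -> int:
    """3-bit key: bit i is set when the i-th direction component is negative."""
    return int(d[0] < 0) | (int(d[1] < 0) << 1) | (int(d[2] < 0) << 2)


def bin_rays(rays: List[Ray]) -> Dict[int, List[int]]:
    """Group ray indices by direction octant so each batch traverses similarly."""
    bins: Dict[int, List[int]] = defaultdict(list)
    for i, (_, d) in enumerate(rays):
        bins[direction_octant(d)].append(i)
    return dict(bins)
```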

    Another thing: if BVH traversal were done using an NV proprietary mechanism in game engines, how would AMD ever support RT in already released games? Reimplement the BVH in all the old game engines and rerelease the games? Game developers are unlikely to want to do this after the fact (or at all, if the IHV is willing to do it in the driver; time is money). Nvidia would have had the same issue with Ampere hw versus Turing,... This is even worse for Intel, who is coming in even later than AMD,... And Intel would add more to the burden of the poor developer trying to do hw-specific implementations.
     
    #83 manux, May 6, 2021
    Last edited: May 6, 2021
    DegustatoR and DavidGraham like this.
  4. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,668
    Likes Received:
    16,876
    Location:
    The North
    Yea. It seems the API is very high level so that, as RT hardware R&D progresses, it doesn't force vendors down a specific path just yet.

    I do agree low-level DXR will eventually arrive, but only after the technology, the research and the industry have had time to converge on a specific best path.
     
    pharma, DegustatoR, Dictator and 3 others like this.
  5. JoeJ

    Veteran Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    1,427
    Likes Received:
    1,680
    That's not what I meant; I was talking about access to the BVH data structure, not its traversal.

    No, it's not that bad. There is no need to agree on a HW data format; we only need basic agreement on using spatial, hierarchical data structures at all. That is the case on any HW, and the commonly used term 'BVH' is enough to ensure it.
    We also need agreement on interface instructions to write and read this data, but IHVs can convert those to produce their native format in place, also handling compression / precision transparently.
    So my request is only for a software interface covering the common ground. We want to prevent future restrictions on both ends of course, but even the minimum common ground I can imagine is enough.

    No. A game engine can use a dynamic BVH (or programmable traversal if practical), but it does not have to.
    And if it decides to use it, the only per-chip difference I see is different branching factors. Everybody uses bounding boxes and child indices/pointers.
    Effort is minimal on all ends, and for the game developer it is even optional. (I don't want to scrap DXR; I want to extend it.)

    If there ever was a game using, say, NV's device-generated command buffers extension, then it surely has a code path that works without that vendor extension, as usual.
    But notice my proposal is to make an abstraction of the BVH, accessible to game devs while still allowing different implementations across chip generations and future changes from IHVs. So this problem would not come up at all.
    I don't want to get some struct in memory and hack all the bits of it; I only want to set / get the common variables expected from any generic implementation of BVH nodes.
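    Read literally, the proposal is an interface like the minimal sketch below: callers only set and get bounds and child indices, and each vendor backend packs that into its own bytes. The class names, byte layouts and widths are all invented for illustration; a backend that also reduced precision would additionally have to round boxes outward so they stay conservative.

```python
import struct
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


class BvhNodeStore(ABC):
    """Hypothetical vendor-neutral node interface: bounds + child indices only."""

    max_children: int  # what a driver query would report

    @abstractmethod
    def set_node(self, index: int, lo: Vec3, hi: Vec3, children: List[int]) -> None: ...

    @abstractmethod
    def get_node(self, index: int) -> Tuple[Vec3, Vec3, List[int]]: ...

    def _check(self, children: List[int]) -> None:
        if len(children) > self.max_children:
            raise ValueError(f"node has {len(children)} children, max is {self.max_children}")


class VendorA(BvhNodeStore):
    """Made-up layout: 4-wide; min xyz, max xyz, then 4 child slots (-1 = empty)."""

    max_children = 4
    _fmt = "<6f4i"

    def __init__(self) -> None:
        self.mem: Dict[int, bytes] = {}

    def set_node(self, index, lo, hi, children):
        self._check(children)
        slots = list(children) + [-1] * (self.max_children - len(children))
        self.mem[index] = struct.pack(self._fmt, *lo, *hi, *slots)

    def get_node(self, index):
        v = struct.unpack(self._fmt, self.mem[index])
        return tuple(v[0:3]), tuple(v[3:6]), [c for c in v[6:] if c >= 0]


class VendorB(BvhNodeStore):
    """Made-up layout: 2-wide; child slots first, then (min, max) per axis."""

    max_children = 2
    _fmt = "<2i6f"

    def __init__(self) -> None:
        self.mem: Dict[int, bytes] = {}

    def set_node(self, index, lo, hi, children):
        self._check(children)
        slots = list(children) + [-1] * (self.max_children - len(children))
        pairs = [x for a in range(3) for x in (lo[a], hi[a])]
        self.mem[index] = struct.pack(self._fmt, *slots, *pairs)

    def get_node(self, index):
        v = struct.unpack(self._fmt, self.mem[index])
        lo, hi = tuple(v[2:8:2]), tuple(v[3:8:2])
        return lo, hi, [c for c in v[:2] if c >= 0]
```

    The caller's view is identical for both backends, while the stored bytes (40 vs 32 per node here) are not.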
     
  6. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,273
    Location:
    Self Imposed Exhile
    You are oversimplifying. Reality is more complex.

    The BVH you want to access is likely packed and compressed in a very hw-dependent way to optimize cache line and memory usage. You would need to be able to decode the BVH on a per-architecture/chip basis. There is no common BVH format you could use. This is the whole point of the black box: it allows each vendor to innovate, as the API doesn't limit the trickery the hw is allowed to do.

    The hw that traverses the BVH differs heavily between vendors and architectures. Even very simple things like how many bounding boxes can be processed in parallel affect what the optimal BVH structure and data format are.
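    For reference, the per-node work in question is a ray/box slab test against each child box. The minimal sketch below loops over the children sequentially; the point is that hardware able to test W boxes per step would naturally favor nodes with W children. The function name is hypothetical, and it assumes no zero direction components (so `inv_dir` is finite), which a real implementation must handle.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]


def intersect_children(origin: Vec3, inv_dir: Vec3, boxes: List[Box],
                       t_max: float) -> List[Tuple[float, int]]:
    """Slab-test one ray against all child boxes of a node.

    Returns (t_enter, child_index) for each hit, nearest first, which is the
    order a traversal would usually visit them in.
    """
    hits = []
    for i, (lo, hi) in enumerate(boxes):
        t0, t1 = 0.0, t_max
        for a in range(3):
            ta = (lo[a] - origin[a]) * inv_dir[a]
            tb = (hi[a] - origin[a]) * inv_dir[a]
            t0 = max(t0, min(ta, tb))  # latest entry across the three slabs
            t1 = min(t1, max(ta, tb))  # earliest exit across the three slabs
        if t0 <= t1:
            hits.append((t0, i))
    return sorted(hits)
```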

    Creating a high-level API that is not dependent on hw was a great first move. Once things mature, I bet the black boxes will be opened. However, this requires AMD, Nvidia, Intel, Microsoft, Khronos etc. to work together and agree.

    You definitely can do LODs. You can create multiple BVH structures: for example, trace primary rays against one BVH and shoot secondary rays against a different BVH containing a lower LOD. This, however, runs into all the usual problems like light bleeding. We kind of learnt this already with Doom 3 and tessellation: it's important that the visible geometry and the geometry used for lighting are the same to avoid artifacts.
     
  7. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,668
    Likes Received:
    16,876
    Location:
    The North
    The IHVs learn what will make RT faster as they develop each generation, and this feedback goes to MS. MS will work with all IHVs to define the behaviours of the next APIs, but all the IHVs have to agree on them. If they don't, you get Nvidia games/RTX works and Radeon fidelity solutions. When they do agree, you get Tier 2 and Tier 3 support. Overall, as much as I want to see higher performance, it makes sense to wait and give the IHVs time to come to their own conclusions on how best to move forward. Even though the API is behind, that should ensure it doesn't present strong performance biases toward one vendor over another.
     
    manux likes this.
  8. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,273
    Location:
    Self Imposed Exhile
    Completely agree. If MS had done the naive thing, we would just have gotten what AMD did: a few more instructions for compute shaders. At the moment, dedicated and (possibly) fixed-function hw seems to be the way to go if Ampere or even Turing is compared against RDNA2. It will be interesting to see how hw and APIs evolve in the years to come. I suspect unified shaders will win again in the end, but meanwhile there is a period where fixed function seems to be king.
     
    pharma, iroboto and DavidGraham like this.
  9. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,668
    Likes Received:
    16,876
    Location:
    The North
    June 18
    let's GOOO
    this is how you do a remaster.


    edit: higher quality video

    just comparing console and PC footage, I think the difference is visible without DF; reflections are really a big lift here.

    4A are way ahead of the curve here in terms of RT fidelity and performance. I didn't expect this to be so far ahead of Resident Evil given the budgets and studio sizes.

    I'll say it for the first time on this forum: these guys are wizards.
     
    #89 iroboto, May 6, 2021
    Last edited: May 6, 2021
  10. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    19,256
    Likes Received:
    22,060
    Good thing I knew the update would be free for Series X, so I picked it up during the last sale. Looking forward to giving it a go.

    Here's the video footnote:

    [Attached image: video footnote]
     
  11. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,668
    Likes Received:
    16,876
    Location:
    The North
    Yea, with the expansion etc. it's nearly a full-price game again.
    But I'm okay with giving the full amount. Gotta support this type of development.
     
    BRiT likes this.
  12. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,659
    Likes Received:
    4,464
    PCGH's comments regarding Resident Evil 8's reflections:

    https://www.pcgameshardware.de/Resi...l-Village-Benchmarks-Resi-8-Review-1371311/2/
     
    pharma, PSman1700 and DegustatoR like this.
  13. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,273
    Location:
    Self Imposed Exhile
    I bought Metro Exodus cheap in a Steam sale after the Enhanced Edition was announced. I'm not really the biggest fan of the genre, so I didn't want to pay full price.
     
  14. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    18,487
    Likes Received:
    2,218
    Location:
    Maastricht, The Netherlands
    Interesting... if it is true that the console versions don't get any RT reflections at all, that's kind of disappointing. On the other hand, the load times, 3D Audio and DualSense support are nice. This will be a very, very interesting comparison feature for Digital Foundry, and I'm personally very curious to see which version I would prefer to play, console or PC. My PC has the bare-minimum GPU spec (RTX 2060) but 32GB RAM, and DLSS 2.1 support and raytraced reflections may still make that version visually preferable. Really curious how that will pan out.
     
    manux likes this.
  15. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,697
    Likes Received:
    3,045
    It's an interesting approach.

    Maybe forgo RT reflections for a fully RT lighting engine and the benefits that affords, including in art, asset and level creation etc.

    The consoles' RTRT may be enough for that going forward.
    I guess a lot of R&D in reflections remains to be done.

    XSS will be interesting, especially in terms of texture quality, as it doesn't sound like it's using XVA.
     
  16. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,658
    Likes Received:
    2,407
    Location:
    msk.ru/spb.ru
    This was announced with the Enhanced Edition itself. Console versions get SSR, PC - hybrid RT+SSR.

    It is interesting, though, that they chose to do GI and not reflections with RT on consoles. You'd think RDNA2 RT would fare better with the latter than with the former. But I guess it all boils down to the need to use RT when you base your lighting on it, with no clear option of falling back to raster here.
     
    iroboto and PSman1700 like this.
  17. JoeJ

    Veteran Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    1,427
    Likes Received:
    1,680
    I'd like to see an example of a BVH data structure which can't be made from bounding boxes and child pointers, or where compression prevents accessing / changing this data. It sounds much more like you're making assumptions than like I'm oversimplifying.

    However, you are likely right and I have to wait for IHVs to be done with their innovating so I can start to innovate too. Parallelism is overrated.
     
  18. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,273
    Location:
    Self Imposed Exhile
    I think one can innovate quite a lot already, as shown by Metro Exodus Enhanced Edition, CP2077, Minecraft,... For that low-level work, maybe the best option is to work on a console exclusive or in R&D inside AMD/Intel/Nvidia/Microsoft/Sony: either be allowed to optimize for one hw or be allowed to work with future hw. When something is out in stores, it's old news for the IHV at that point; new stuff is already well on the way to being designed and implemented. The replacement for RDNA2/Ampere has probably been in the design phase for a long time already, and the architectures after that are probably being worked on too.

    For the BVH you could also consider things like instancing. Is there a standard way to handle instances of the same objects, or is it perhaps something the hw can do tricks with? What about rotating, resizing etc. the instances during BVH creation? Naively, one could create unique geometry out of the instances (a bigger BVH). Perhaps there are better ways to instance, and also tricks in hw to support this efficiently. Another thing is the flexibility of the hw. HW can be so hardcoded today that there is no point opening it up further. AMD is ahead here as they are using regular compute; on the other hand, this approach seems to have a pretty huge performance hit (Blender, CP2077,...).
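    For comparison with the naive "unique geometry per instance" approach: DXR and Vulkan RT both expose a two-level scheme where top-level instances, each carrying a transform, reference shared bottom-level geometry. The minimal sketch below shows the core trick under a simplifying assumption (translation plus uniform scale only, where the real APIs allow a full 3x4 affine matrix). The ray is moved into each instance's object space, so one copy of the mesh, here reduced to its bounding box, serves every instance. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]


@dataclass(frozen=True)
class Instance:
    """One placement of a shared mesh (simplified: translation + uniform scale)."""
    mesh_id: int
    translation: Vec3
    scale: float


def ray_box(o: Vec3, d: Vec3, lo: Vec3, hi: Vec3) -> Optional[float]:
    """Slab test; returns the entry distance t, or None on a miss."""
    t0, t1 = 0.0, float("inf")
    for a in range(3):
        if d[a] == 0.0:
            if not (lo[a] <= o[a] <= hi[a]):
                return None
            continue
        ta, tb = (lo[a] - o[a]) / d[a], (hi[a] - o[a]) / d[a]
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
    return t0 if t0 <= t1 else None


def trace_instances(o: Vec3, d: Vec3, instances: List[Instance],
                    mesh_bounds: List[Box]) -> Optional[Tuple[float, int]]:
    """Return (t, instance_index) of the nearest hit, or None.

    Applying the same inverse transform to origin and direction keeps the ray
    parameter t identical in world and object space, so hits compare directly.
    """
    best = None
    for i, inst in enumerate(instances):
        s, tr = inst.scale, inst.translation
        o_obj = tuple((o[a] - tr[a]) / s for a in range(3))
        d_obj = tuple(d[a] / s for a in range(3))
        lo, hi = mesh_bounds[inst.mesh_id]
        t = ray_box(o_obj, d_obj, lo, hi)
        if t is not None and (best is None or t < best[0]):
            best = (t, i)
    return best
```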

    edit: Your thought about parallelism is naive. Nvidia has separate fixed-function hardware to handle bounding boxes and triangles. You need to keep all units fed or you will go slow. Parallelism is very important, as is sorting for coherent rays etc. Luckily, for now the black box hides a lot of the low-level stuff, and developers can focus on the big things like how to cast the rays that contribute most and how to reuse results between frames.
     
    #98 manux, May 6, 2021
    Last edited: May 6, 2021
    iroboto likes this.
  19. PSman1700

    Legend Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    5,371
    Likes Received:
    2,368
    Rather huge difference, and that's just early games. Anyway, load times will be fast on PC as well, I'm sure. DualSense is supported on W10, so I might use that instead of KB/M.
     
  20. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,668
    Likes Received:
    16,876
    Location:
    The North
    Yea ;) with nothing to fall back on, the choice is basically made for you. Hopefully in time a developer will figure out how to incorporate all of this + reflections on console.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.