GPU Ray Tracing Performance Comparisons [2021] *spawn*

Discussion in 'Architecture and Products' started by DavidGraham, Mar 29, 2021.

  1. trinibwoy

    trinibwoy Meh Legend

    Ok, don't think anyone is arguing with you there.
     
    PSman1700, JoeJ and pjbliverpool like this.
  2. HLJ

    HLJ Regular

It makes sense to go with "fixed function" hardware at first (NVIDIA's RT performance lead over AMD shows that current "compute units" do not have sufficient power in comparison) and then migrate RT over once "compute units" are capable enough.

    RT is a computational beast, and AMD's implementation shows the downside of using "compute units" versus "fixed units".
    This will most likely change in the future, but for now it is not the optimal solution; all reviews show this.

    You have to crawl before you walk...
     
    PSman1700 likes this.
  3. DegustatoR

    DegustatoR Veteran

The question is: what's the cost of this flexibility? Would you be okay with RT performance dropping 2-3x universally just so some (not all) engines could use h/w RT in a "nice" way (i.e. without relying on hacks, as all graphics does)? What would that give us? Native Nanite mesh RT at performance lower than that of compute-based Lumen?

    There's a reason why current APIs are limited. This reason is performance. Full flexibility gives you general compute; good luck using it for per-pixel RT.

    Console APIs are not better either; if they were, we would have already seen the advantages they provide.
     
    PSman1700, pharma and xpea like this.
  4. troyan

    troyan Regular

I don't understand this. You cannot make "general compute units" more capable for ray tracing. Professional ISVs have used GPUs for ray tracing for years; NVIDIA has a pure software solution with OptiX. And yet nearly everyone has adopted hardware-accelerated ray tracing.

    Obviously "new" cores or compute units are necessary for hardware-accelerated ray tracing.
     
    PSman1700 and xpea like this.
  5. neckthrough

    neckthrough Newcomer

    But that's always been the case. Consoles allow you to spend dev cycles programming to the metal because there's only one or a handful of hardware platforms to support. With PCs you need abstraction, which is a boon for many but a bane for ninja devs. Always has been, always will be.

    I think @JoeJ 's complaint is that DXR's abstraction level is even higher than usual. I could see that, it's wrapping a fairly complex set of hardware primitives. The API will evolve with time (as will the hardware), but it'll never reach console-level control.
     
  6. HLJ

    HLJ Regular

Look at how "compute units" and APIs have evolved over time.
    The G80 marked NVIDIA's first transition to a more compute-oriented architecture.
    This is a nice look back:
    https://www.extremetech.com/gaming/...orce-8800-changed-pc-gaming-computing-forever

    Look at what he says about Tesla to Fermi to Pascal...
     
    PSman1700 likes this.
  7. JoeJ

    JoeJ Veteran

No, in the context of open BVH and Nanite that's not the question, because it does not come up:
    Nanite uses regular triangle meshes, RT cores expect exactly that, so their operation is not affected, nor do we need new flexibility here. Traversal and the RT core can and should remain black-boxed as is.
    We only need to modify the BVH data to update partial mesh clusters to switch geometry resolution. The result is again static, regular triangle meshes.

So the question of whether we need or want flexibility within traversal is a very different one, see Intel's stochastic LOD paper.
    Nanite solves LOD at the level of geometry, so both RT and rasterization can 'fake' continuous LOD using discrete changes of the mesh.
    Stochastic LOD, in contrast, solves LOD in image space, requiring discrete detail levels to be switched individually per pixel, and so requiring traversal shaders to work properly.
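The core of the stochastic LOD idea can be sketched in a few lines (a hypothetical illustration, not the Intel paper's actual code): a continuous LOD value is split into its two neighbouring discrete levels, and a per-ray random number picks between them with probability equal to the fractional part, so that averaged over many rays the two discrete levels blend smoothly.

```cpp
#include <cmath>

// Hypothetical sketch of stochastic per-ray LOD selection (names are
// illustrative, not from the paper). A continuous LOD value is split into
// its integer neighbours; a per-ray uniform random number in [0,1) picks
// one of them with probability equal to the fractional part.
int selectStochasticLod(float continuousLod, float randomUniform01) {
    int lower = static_cast<int>(std::floor(continuousLod));
    float frac = continuousLod - static_cast<float>(lower);
    // Choose the higher (coarser) level with probability 'frac'.
    return (randomUniform01 < frac) ? lower + 1 : lower;
}
```

This per-pixel (per-ray) choice is exactly what a traversal shader would need to act on during BVH descent, which is why the technique depends on traversal being programmable.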

Both are interesting, but Nanite is quite convincing by its results and proven to work. It's harder to implement, but there's no need to question actual hardware acceleration.

Actually, the only effect on tracing performance comes from the difference between a BVH generated offline by custom engine code vs. the driver building it in real time. But that's a hypothetical question, because the driver cannot do this in practice.
     
    pjbliverpool and DavidGraham like this.
  8. trinibwoy

    trinibwoy Meh Legend

Yeah, the BVH structure is opaque for a very good reason. Microsoft probably didn't want the responsibility of defining one acceleration structure to rule them all. DXR doesn't even care if under the hood it's a BVH, k-d tree or something else, as long as you can cast a ray into it and hit something.

    The irony is that in order to give developers flexibility you have to limit the flexibility of the hardware implementation by mandating a specific acceleration structure. This would probably also mean mandating how compression of that structure works. This is no different to mandating the structure of a triangle strip or a texture compression format. The only difference is that triangle strips have an obvious “best” representation. This isn’t the case for RT acceleration structures so Microsoft decided to punt. Or the IHVs demanded control.
     
  9. JoeJ

    JoeJ Veteran

Yes. But that's solvable, e.g. using abstracted shader language structures and/or functions to access nodes and set child pointers, and then running some post-process for vendor compression.
    Such a post-process could even handle conversion from BVH4 to BVH8, for example, to make it really easy for the devs.
    Though personally I think this would again end up compromising performance or still having limitations; not sure.
    It seems better to start with vendor extensions, and make a DXR API after it turns out what the differences, practices, problems, etc. are.
    We see on console, where there is only one vendor, that it's quite easy. And treating each vendor specifically seems easier than forcing conventions on them just yet.
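The BVH4-to-BVH8 conversion step could look roughly like this (node layouts and names are invented for illustration, not any vendor's real format): one wide node is formed by collapsing a BVH4 node with its direct children, promoting the grandchildren into the eight slots.

```cpp
#include <cstdint>
#include <vector>

// Illustrative node layouts, not any real vendor format: a BVH4 node
// holds up to four child indices, a BVH8 node up to eight; -1 marks an
// empty slot.
struct Bvh4Node { int32_t child[4]; };
struct Bvh8Node { int32_t child[8]; };

// One step of the post-process described above: collapse a BVH4 node and
// its direct children into a single BVH8 node by promoting grandchildren.
// A full converter would recurse and handle leaves; this shows the core idea.
Bvh8Node collapseToBvh8(const std::vector<Bvh4Node>& nodes, int32_t root) {
    Bvh8Node out;
    for (int i = 0; i < 8; ++i) out.child[i] = -1;
    int slot = 0;
    for (int i = 0; i < 4; ++i) {
        int32_t c = nodes[root].child[i];
        if (c < 0) continue;
        for (int j = 0; j < 4; ++j) {          // promote grandchildren
            int32_t gc = nodes[c].child[j];
            if (gc >= 0 && slot < 8) out.child[slot++] = gc;
        }
    }
    return out;
}
```

The point is that the developer-facing abstraction (nodes with child pointers) survives the conversion; only the packing changes per vendor.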
     
  10. trinibwoy

    trinibwoy Meh Legend

    What data structure would allow you to do this? Unless you treat each cluster as its own BLAS there is no practical way for a developer to reference a specific node or treelet within the BVH. How would they even know where to look for the geometry they want to modify?

    If we treat each cluster as its own BLAS then we can accomplish LOD today with DXR. Just delete/rebuild the BLAS as needed.
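As a sketch of the bookkeeping this scheme would need (names invented for illustration; these are not DXR API calls), each cluster could track the LOD its BLAS was last built for and get flagged for delete/rebuild whenever the desired LOD changes:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical per-cluster state for the "one BLAS per cluster" scheme.
struct Cluster {
    float distance;   // distance from camera
    int   builtLod;   // LOD the current BLAS represents (-1 = none yet)
};

// Simple distance-based LOD: one level per doubling of distance.
int desiredLod(float distance) {
    return static_cast<int>(std::log2(std::max(distance, 1.0f)));
}

// Returns indices of clusters whose BLAS must be deleted and rebuilt
// this frame; the engine would then issue the actual DXR build calls.
std::vector<size_t> clustersToRebuild(std::vector<Cluster>& clusters) {
    std::vector<size_t> dirty;
    for (size_t i = 0; i < clusters.size(); ++i) {
        int want = desiredLod(clusters[i].distance);
        if (want != clusters[i].builtLod) {
            clusters[i].builtLod = want;   // assume the rebuild happens now
            dirty.push_back(i);
        }
    }
    return dirty;
}
```

This is where JoeJ's objection below comes in: with one BLAS per cluster, the TLAS instance count explodes, which is the cost this sketch hides.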
     
    PSman1700 likes this.
  11. trinibwoy

    trinibwoy Meh Legend

    If you do this you still bias the api toward a particular hardware implementation. If an IHV chooses to go their own way they will have to pay the cost of constantly converting back and forth from Microsoft’s data structure. Not worth it.
     
    PSman1700 and DavidGraham like this.
  12. JoeJ

    JoeJ Veteran

Yeah, in theory. But then we have far too many BLASes, and building a huge TLAS every frame takes too long.
    That's bad; we're still talking about static geometry, so we can keep the node count for the TLAS as small as possible.
    Now we could maybe add some more hierarchy levels here, like TL, BL0, BL1, BL2. Maybe this is what Karis has in mind, judging from his Twitter posts. IDK, my own proposals here are just my personal visions and may differ from Epic's ideas.

    The data structure we need is a node with its child pointers / triangle indices. The developer knows which node refers to which patch of triangles by linking this to his own tree, which he uses to select LOD.
    (Notice this means we still have duplicated AS data this way: ours and the RT BVH. A traversal shader in a flexible future could eventually work with just one, totally custom AS. But that's far-fetched, and I'm not sure we'd ever want this.)

    Edit: Of course we also need to set bounding boxes per node, and most difficult: we need to generate / delete nodes, which involves memory management and compaction problems, and eventually memory orderings expected by the HW. Depending on the HW, this can be quite a big problem.
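A minimal sketch of such an editable node, assuming an invented layout (an AABB plus a first-child pointer and child count), together with the refit pass that re-derives a parent's bounding box from its children after a cluster's geometry has been swapped:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Axis-aligned bounding box.
struct Aabb {
    float min[3], max[3];
};

// Union of two boxes.
Aabb merge(const Aabb& a, const Aabb& b) {
    Aabb r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

// Editable BVH node, layout invented for illustration: children are
// stored contiguously, so one index plus a count is enough.
struct Node {
    Aabb    box;
    int32_t firstChild;  // index of first child node (or first triangle if leaf)
    int32_t childCount;  // number of children (0 = leaf)
};

// The "set bounding boxes per node" step: after a cluster swap, refit a
// parent's box from its children. A full update would walk the path to
// the root, refitting each ancestor.
void refit(std::vector<Node>& nodes, int32_t parent) {
    Node& p = nodes[parent];
    Aabb box = nodes[p.firstChild].box;
    for (int32_t i = 1; i < p.childCount; ++i)
        box = merge(box, nodes[p.firstChild + i].box);
    p.box = box;
}
```

The hard parts named in the edit above (node allocation, deletion, compaction, HW-expected memory ordering) are exactly what this sketch leaves out.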
     
    Last edited: Jul 9, 2021
  13. HLJ

    HLJ Regular

In regards to "compute units" and performance:
    https://www.gamersnexus.net/dictionary/2-cuda-cores

    "Architecture changes in a fashion that makes cross-generation comparisons often non-linear, but generally speaking (within a generation), more CUDA cores will equate more raw compute power from the GPU. The Kepler to Maxwell architecture jump saw nearly a 40% efficiency gain in CUDA core processing ability, illustrating the difficulty of linearly drawing comparisons without proper benchmarks."
     
    PSman1700 and pharma like this.
  14. Scott_Arm

    Scott_Arm Legend

Imagining the hypothetical scenario where you could install and run full Windows 10 on a PS5 or Series X, I doubt that the console APIs offer less performance than DXR for exactly the same hardware. Whatever additional flexibility the console APIs have probably doesn't come with any overall performance cost. So I don't really know what people are arguing.

    We have the hardware. I don't know if there are features that are not exposed in DXR, and I'm not exactly sure what extra options you have for manipulating the ray tracing data in the console APIs. I just think it's kind of plainly true that over time the DXR API will expose more access to hardware features and more opportunities to manipulate the data. This seems non-controversial.

    I'm not talking about weird hypothetical scenarios like: what if DXR never existed and we could have a do-over and redesign the ray tracing hardware to work differently, as a generalized compute function.
     
    PSman1700 and HLJ like this.
  15. trinibwoy

    trinibwoy Meh Legend

    Yeah when you start unpacking the details it becomes super obvious why DXR is the way it is. Consoles have the benefit of a fixed platform so there’s no need to be flexible from a hardware perspective.
     
  16. JoeJ

    JoeJ Veteran

Thus I want vendor extensions first, to be sure.
    However, BVH data structures are usually simple. The only potential variations seem to be these:
    Branching factor (BVH2, 4, 8... 64?)
    Pointer per child vs. pointer to first child plus child count. (Same for triangles in leaves.)
    Treelets with box coordinates relative to the treelet root.


    What else?
    An API to expose all this, plus a query for driver info on what's expected, would do, and we would already have a vendor-independent BVH API.
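The treelet-relative variation can be illustrated with a toy quantizer (one axis only, layout invented for illustration): child bounds are stored as 8-bit offsets within the treelet root's box, rounded outward so the decoded box is conservative and never misses geometry.

```cpp
#include <cmath>
#include <cstdint>

// Toy compressed-bounds layout, one axis for brevity; real wide-BVH
// formats quantize all three axes per child.
struct QuantizedBounds {
    uint8_t qmin, qmax;
};

// Quantize a child's interval to 8 bits relative to the treelet root's
// interval. Round min down and max up so the decoded box always
// encloses the true box (conservative for ray tests).
QuantizedBounds quantize(float childMin, float childMax,
                         float rootMin, float rootMax) {
    float scale = 255.0f / (rootMax - rootMin);
    QuantizedBounds q;
    q.qmin = static_cast<uint8_t>(std::floor((childMin - rootMin) * scale));
    q.qmax = static_cast<uint8_t>(std::ceil((childMax - rootMin) * scale));
    return q;
}

float dequantizeMin(QuantizedBounds q, float rootMin, float rootMax) {
    return rootMin + q.qmin * (rootMax - rootMin) / 255.0f;
}
float dequantizeMax(QuantizedBounds q, float rootMin, float rootMax) {
    return rootMin + q.qmax * (rootMax - rootMin) / 255.0f;
}
```

Any vendor-independent BVH API would have to either expose such compression choices or hide them behind exactly the kind of post-process discussed earlier.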
     
    Krteq and pjbliverpool like this.
  17. DavidGraham

    DavidGraham Veteran

You have to understand the politics of the situation: only NVIDIA seems to regard RT highly, and they think their current approach is the optimal one.

    AMD is not promoting RT. Their public speakers regard it as a complementary feature alongside rasterization; two years ago they outright downplayed RT completely. Their RT roadmap wants current RT to be limited to shadows, with the next step happening in the cloud. They actively sponsor games to implement RT shadows alone (Godfall, WoW, Dirt 5, Riftbreaker, Far Cry 6). They didn't even release a public demo to showcase RT to their user base, and even their pre-rendered RT demo is downright unimpressive. Heck, Sony and Microsoft appear more enthusiastic about RT than AMD themselves.

    Then add to that the almost non-existent professional RT acceleration, their low market share, and their uncompetitive hardware implementation, which could well change to a completely better solution in RDNA3. If you think they are going to spend time creating custom PC extensions for RDNA2 after all of that, then I would say you are unrealistically optimistic.
     
    PSman1700 likes this.
  18. trinibwoy

    trinibwoy Meh Legend

    Nvidia isn't really incentivized to create extensions either. Their immediate goal should be to drive adoption in a fair game they're currently winning. Not to split the user base and encourage griping about proprietary features / consumer lock-in etc.
     
  19. PSman1700

    PSman1700 Legend

Exactly, that's how things have been going for ages now.

    That's not what I'm trying to say, at least on my part. What I'm disputing is that 'console RT is better' than PC ray tracing, which is a totally false statement.

    You don't need 'high end' to greatly surpass the consoles in ray tracing capabilities either. I do agree the APIs offer lower-level access on consoles, but that doesn't really help the case right now in the console RT vs PC RT debate.

    No you're not :p

    Agree with your post, nicely written. I wasn't debating software vs hardware, because obviously software is better if possible, but we don't have 200 TF GPUs just yet. It's like PS2 vs GF4 Ti4600...
    Again, what I didn't agree with is that 'console RT is better than PC RT', which is the first time I've heard someone claim this, btw, notwithstanding what the actual results show.

    I'm sure NVIDIA will eventually go the direction of RT on compute as well, just like with any tech before it that went HW-first on NV hardware.

    Been thinking this as well; who's to say DXR/PC APIs won't improve over time. Anyway, thanks for a civil/polite discussion ;)
     
  20. JoeJ

    JoeJ Veteran

I'm far from optimistic.
    Hehe, ok - maybe I'm more optimistic than I admit :)
     