Ray-Tracing, meaningful performance metrics and alternatives? *spawn*

Discussion in 'Rendering Technology and APIs' started by Scott_Arm, Aug 21, 2018.

  1. OCASM

    Regular Newcomer

    Joined:
    Nov 12, 2016
    Messages:
    754
    Likes Received:
    709
    1) Downsides... from your point of view. Maybe not to other developers.

    2) No, I was talking about the Danger Planet tech. The slides I linked to have more pictures. One of them is a Sponza test. The demo at the link you posted looks nice. And talking about Brigade, maybe Otoy will have something new to show us soon now that they're about to support UE4 in addition to Unity.

    3) Compared to before DXR, there was much less interest in RTRT. That's a fact. Now you see plenty of developers on Twitter learning and experimenting with ray tracers.

    4) GDC is around the corner. Maybe we'll see some interesting algorithms/use cases.
     
    DavidGraham and vipa899 like this.
  2. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    260
    Likes Received:
    269
    The luckiest devs are those who approach RT just now, because they will never experience those limitations.
    But remember sebbi's Twitter page I posted, with some of the most recognized and experienced gfx devs making the exact same critique as me.
    Still I need to defend myself constantly, which is exhausting.

    Sorry man, I would not have thought this Sponza scene had existed for so long already.

    Of course there is now more RT development for games. My point is that SSR would have been replaced by RT even without NV / MS.

    Sorry for this as well. Likely you meant I would say RTX and alternatives cannot be combined.
    But no, the context was: Can the BVH be shared? Can the combination be made ideal? And here the answer is no in my case.


    Let's finally close this argument. My points seem much too technical and detailed to discuss here, and my intention was never to stretch minor issues over multiple pages.
    I'll keep my development and experience out of the discussion from now on.
     
  3. vipa899

    Regular Newcomer

    Joined:
    Mar 31, 2017
    Messages:
    922
    Likes Received:
    348
    Location:
    Sweden
    I don't think you can go too technical here. I do think there's something on both sides: RTX RT isn't crap, and neither is RT via compute; both have their advantages and disadvantages. I'm sure that when used well, in a game optimized for it, Nvidia's current RT solution can show some interesting results that aren't obtainable with compute alone, especially on anything below a Titan V. We're also quite quick to judge how good the tech is, or how bad for that matter.
    Atomic Heart, Metro Exodus, and if the list is anything to go by we will see more examples. I also hope the community comes up with things like the Quake 2 'mod'.

    Hardware functions like bump mapping, T&L and other effects from the early 2000s were a thing. The PS2 could do those too in a slower but more flexible way; the platform was pushed harder than any other, but I don't think there was ever bump mapping on it, perhaps a title or two over its 13-year lifespan. This kind of fixed-function situation can exist. Turing and AMD's next probably don't lack in compute either, so nothing to worry about.
     
    OCASM likes this.
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,004
    Likes Received:
    2,509
    Location:
    Well within 3d
    From Nvidia's description of RTX, the driver handles building and refitting of the BVH, and the RT cores autonomously handle traversal. The BVH implementation used seems to be pretty black-box.
    RTX has some of Nvidia's particular spin on the concept, but the overarching idea behind the API for ray tracing is that the low-level acceleration structures are encapsulated so that other implementors can have different structures while allowing them to plug into the API.
    What specific elements of the BVH are developers exposed to?

    Tensor cores accelerate ML in the form of weights and connections worked through dense matrix and vector multiplication with digital ALUs and crossbars that either map very well to existing hierarchies or extend them in a reasonable way.
    By the logic applied to RTX and its BVH, they discriminate against various neuromorphic and analog methods, and steer devs away from optical and quantum methods as well.

    RT cores do accelerate BVH traversal, and also the intersection tests (although the latter can be replaced). The BVH is the immediate implementation's solution for a more general problem.
    Without an acceleration structure, no alternative method has been able to get close to practicality. BVH is the choice Nvidia went for in terms of what it thought it could map to the existing architecture. It's not the only one, but it's the one Nvidia seems to have been able to best map to the existing SIMD hardware for construction.
    Traversal of the acceleration structure is a challenge for a lot of alternative methods, however. I thought cone tracing still had need of an acceleration structure, and its intersection evaluation would be more complex than for a ray. The latter point would seem to favor a different sort of hardware optimization, since Nvidia offers to accelerate intersection testing for a simpler case.

    It wasn't always the case that these were taken for granted. AMD got nailed for lying by omission about R600's lack of a UVD block, so today's settled question had a period of pathfinding work and initial efforts.
    It wasn't settled whether there would be T&L hardware on the graphics chip, texture compression, AA, or AF until someone put in the hardware to do it--and there were any number of now-gone implementations before a rough consensus was reached.
    If Nvidia's specific version of RT hardware doesn't catch on, it's no different from other items that didn't pan out in the long run (TruForm, TrueAudio, quadric surfaces, etc.), or from many of the features we have now where someone had to commit hardware before there would be adoption.


    I think there's a desire to have the same games or similar games with similar features on mobile platforms as there are in the PC and console. Among other things, it helps mobile devices steal more time from the other platforms, and can help the same product expand to multiple markets more readily.

    Depending on the level of change in a scene, a refit can take a significant fraction of the cost of a rebuild. There's no theoretical ceiling to this, and if the cost of a rebuild is no longer the dominant one, scene complexity can rise until the refit becomes a similar limit.
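
    As an illustration of why refit cost matters, engines typically track accumulated deformation and fall back to a full rebuild when a refitted BVH degrades too far; a minimal sketch of such a policy (the thresholds are invented):

        #include <cstdint>

        // Refit is cheap but degrades BVH quality as geometry deforms, so a
        // common policy is to track accumulated change and force a rebuild
        // past some threshold. The thresholds here are purely illustrative.
        enum class BvhUpdate { Refit, Rebuild };

        BvhUpdate chooseUpdate(float accumulatedDeformation,
                               std::uint32_t framesSinceRebuild) {
            const float maxDeformation = 0.25f;       // fraction of object bounds
            const std::uint32_t maxRefitFrames = 60;  // force a periodic rebuild
            if (accumulatedDeformation > maxDeformation ||
                framesSinceRebuild > maxRefitFrames)
                return BvhUpdate::Rebuild;
            return BvhUpdate::Refit;
        }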

    Traversal of some structure to find arbitrarily related geometry in an unknown place in DRAM or cache is a fundamental challenge. "Don't do that" is both true and unhelpful.

    Mobile hardware often favors fixed-function more because it is cost and power constrained to a degree PCs and consoles are not. General compute resources have a higher baseline of area and power consumption, and the mm² and milliwatts they take are more costly.

    This tends to be true in a wide range of cases, and if IMG really wanted to poke that hornet's nest it would likely have the resources to figure out if Nvidia was using patented tech.
    Cross-licensing is common, and silently developing in parallel with disregard for whether a competitor's technique is being reinvented is constant. There's a good chance that if there wasn't pre-existing licensing, there's a case of mutually assured destruction where IMG could be infringing somewhere else that Nvidia hasn't yet taken them to task over.
    Apple and Intel for example did get caught out for infringing on memory disambiguation hardware whose patents were enforced by an organization related to the University of Wisconsin, who couldn't be sued back like another IHV could.

    One other way non-disclosure can help is that if Nvidia finds a better method, it can change things while minimizing how much bleeds out from under the abstraction.
     
    dobwal, JoeJ, Malo and 5 others like this.
  5. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    260
    Likes Received:
    269
    None. The only way to access the BVH nodes is to trace rays against them. There is no box query, for example, so you could not implement a physics broad-phase collision detection.
    Though a box query should be easy to add and generalize in the API even if other vendors choose different BVH data structures, so I expect this to come in the future. (Pretty sure everyone agrees on BVH over octree or kd-tree etc., and AABB over OBB.)
    Disclaimer: I could have missed a box query option while reading the API.
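
    For illustration, a minimal CPU-side sketch of the kind of box query I mean (the BVH layout and all names are invented; DXR exposes nothing like this):

        #include <vector>

        struct AABB { float min[3], max[3]; };

        static bool overlaps(const AABB& a, const AABB& b) {
            for (int i = 0; i < 3; ++i)
                if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
            return true;
        }

        // Hypothetical BVH node layout, just for the sketch.
        struct BvhNode {
            AABB bounds;
            int  left, right;   // child indices, -1 if none (leaf)
            int  object;        // object index if leaf
        };

        // Broad-phase box query: collect all leaf objects whose bounds
        // overlap the query box. DXR only offers ray traversal, not this.
        void boxQuery(const std::vector<BvhNode>& nodes, int root,
                      const AABB& box, std::vector<int>& hits) {
            if (root < 0 || !overlaps(nodes[root].bounds, box)) return;
            const BvhNode& n = nodes[root];
            if (n.left < 0 && n.right < 0) { hits.push_back(n.object); return; }
            boxQuery(nodes, n.left, box, hits);
            boxQuery(nodes, n.right, box, hits);
        }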

    Cone tracing is an open problem and would be the holy grail in the field of ray tracing. Whenever you think about it, you come up with two options: 1. trace many rays instead; 2. use a parallel-friendly approximation with techniques from signal processing.
    Neither is good, but option one is preferable because it does not leak. To optimize this, it would make sense to use parallel algorithms for ray bundles, which the API does not allow. (Rays are single threaded, so isolated from each other, similar to pixel shaders.)
    This restriction, however, makes sense to keep options for changing HW design open, and makes it easier to put a common high-level API over multiple vendors.

    The central idea of practicable realtime RT as shown so far, however, is: avoid cone tracing and replace it with temporal accumulation of single rays (denoising).

    Sure, but there is also the major performance difference. On mobile it makes more sense to stream data that is too expensive to calculate on chip, and FF is also more justified because the alternative is much more likely to be impossible.
    However, I'm not sure ImgTec ever made a mobile RT chip. I think I got all this wrong and they only made a co-processor for desktops, likely targeting AR and content creation there?

    True. One option is to use a sphere tree. Then for a skinned character all that is necessary is to transform the node centers, so there is no dependency between tree levels and no need to update bounds at all. (The same can work for AABBs: just extend the bounds large enough that they cover any animation at any orientation.)
    Because DXR is unaware of such things, optimizations like this are ruled out. Though they also slightly reduce tracing efficiency.
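
    A minimal sketch of that refit, assuming a simple bone-driven sphere tree (all names are invented; none of this maps to DXR):

        #include <cstddef>
        #include <vector>

        struct Vec3 { float x, y, z; };
        struct Mat3x4 { float m[3][4]; };  // rigid bone transform

        // Sphere node: the radius is conservative over all poses, so only
        // the center moves -- no parent/child dependency during the refit.
        struct SphereNode {
            Vec3  center;  // bind-pose center
            float radius;  // fixed, sized to cover any animation
            int   bone;    // bone that drives this node
        };

        static Vec3 transformPoint(const Mat3x4& t, const Vec3& p) {
            return { t.m[0][0]*p.x + t.m[0][1]*p.y + t.m[0][2]*p.z + t.m[0][3],
                     t.m[1][0]*p.x + t.m[1][1]*p.y + t.m[1][2]*p.z + t.m[1][3],
                     t.m[2][0]*p.x + t.m[2][1]*p.y + t.m[2][2]*p.z + t.m[2][3] };
        }

        // Refit = transform every center independently: embarrassingly
        // parallel, no bounds recomputation, no level-by-level pass.
        void refitSphereTree(const std::vector<SphereNode>& bindPose,
                             const std::vector<Mat3x4>& bones,
                             std::vector<Vec3>& outCenters) {
            outCenters.resize(bindPose.size());
            for (std::size_t i = 0; i < bindPose.size(); ++i)
                outCenters[i] = transformPoint(bones[bindPose[i].bone],
                                               bindPose[i].center);
        }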

    Interesting: ImgTec seems to rebuild the tree per frame from a vertex shader, per object (per frame only for dynamic objects). But DXR allows refitting for the lower-level trees and rebuilds only to connect them, IIRC.

    Good catch. It depends on granularity and on which algorithm suits the current task. Inefficient outliers exist in both approaches.


    Likely I sound again like I'm criticizing, but those are all minor things which will be outweighed by the FF speedup - I could agree to that eventually.
    The main failure of DXR / RTX is the inability to allow any useful LOD mechanism. LOD is very hard, and there is plenty of research going on.
    It goes from 'how can I approximate the branches of a tree with a 3D volume texture?', down to 'how can I grow procedural grass or other fine-grained stuff?', or 'how can I merge complex materials from all this - how can a single pixel still represent the reflection properties of a whole tree, wood or mountain?'.
    Thinking LOD is just about reducing poly count is looking at only the tip of the iceberg. And DXR does not even allow reducing poly count.
    This is the major restriction I see that slows down progress.
    Of course this is hard to agree upon if you take the current state of games as reference, because they do not yet handle proper LOD at all. This is a result of GPU raster power - they did not need to care. But we must address this if we want realistic images. It's not only about lighting.
    This is why I see the limitation to static triangle raytracing as a short-sighted decision, not in the name of progress. Dynamic geometry and BVH building must become programmable from day one.
     
    Shifty Geezer and chris1515 like this.
  6. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    1,406
    Likes Received:
    265
    Location:
    Earth
    Shouldn't LOD be implementable by building multiple BVH structures? Then collect rays and trace them against the BVH containing the desired LOD level.
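
    Roughly like this, where each LOD level keeps its own prebuilt BVH and you pick one per object per frame (a sketch; BvhHandle and the distance policy are invented for illustration, not DXR types):

        #include <cmath>
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Hypothetical handle to a prebuilt bottom-level BVH (one per LOD).
        using BvhHandle = std::uint64_t;

        struct LoddedObject {
            float position[3];
            std::vector<BvhHandle> lodBvhs;  // index 0 = finest, must be non-empty
            float lodStepDistance;           // distance per LOD step
        };

        // Pick which prebuilt BVH to link into this frame's top-level
        // structure, based on distance to the camera.
        BvhHandle selectLodBvh(const LoddedObject& obj, const float cam[3]) {
            float dx = obj.position[0] - cam[0];
            float dy = obj.position[1] - cam[1];
            float dz = obj.position[2] - cam[2];
            float dist = std::sqrt(dx*dx + dy*dy + dz*dz);
            std::size_t lod = static_cast<std::size_t>(dist / obj.lodStepDistance);
            if (lod >= obj.lodBvhs.size()) lod = obj.lodBvhs.size() - 1;
            return obj.lodBvhs[lod];
        }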

    LOD might be painful, as cracks could happen, leading to artifacts, or shadow edges might not match silhouettes... There is a tradeoff to be made here between correct & consistent versus performance.

    My view is that ray tracing against a BVH looks a lot like a texture operation... Fairly hardcoded, but using the result of the hardwired operation is flexible.
     
    #1046 manux, Feb 12, 2019
    Last edited: Feb 12, 2019
    OCASM likes this.
  7. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    260
    Likes Received:
    269
    Yes, that's what you'll do if you have fixed LOD meshes. With RT the resulting popping becomes more visible because it also appears in reflections and GI, not only on the object itself. (Denoising could help a bit if its lag is large.)
    But a continuous LOD mechanism can only be handled by a complete BVH rebuild each frame. Continuous LOD is what we want, but RTX now makes it harder to get there.
    Continuous LOD is much more attractive than you might think, because it not only solves the popping issues. If done right it also opens the door to combining / blending traditional triangle rendering with volumetric shells, voxels, SDFs, point clouds etc. to achieve a true LOD solution.

    Of course a true LOD solution has no cracks. Seamless global parametrization resolves this issue and enables displacement mapping on any surface, not just terrain. (Replace the term 'displacement mapping' with whatever 'unlimited detail' tech you prefer here.)
    To be clear, seamless global parametrization makes these promises:
    Megatextures as seen in Rage
    Object space lighting and irradiance caching
    Displacement mapping everywhere
    Volumetric shells for diffuse stuff impossible to handle by triangles (e.g. detailed foliage)
    Continuous LOD
    ... all of this adaptable to a given performance budget. So it's not just about better performance by decreasing detail, it's also about increasing detail.
    It's very hard and will take many years to finally make all of this work, but keep in mind you cannot achieve 'consistency' with popping LOD levels.

    This seems to be the philosophy of the DXR API as well: treat a traversal as an atomic operation.
    But a traversal is O(N log N), so seeing it this way is like ignoring this cost. A texture lookup is O(1), so you cannot compare the two.
    Your view would be fine if you worked on an offline renderer, maybe, where you care about accuracy, features and code maintenance more than about performance.
    But for games this seems very wrong to me, and it does not fit into a low-level API.

    However, we cannot really argue about this, because we don't know how the hardware works. If there is no batching under the hood, the operation is atomic and you are right.
     
  8. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    1,406
    Likes Received:
    265
    Location:
    Earth
    One interesting thing to do would be to write a benchmark and see how triangle counts and various organisations of triangles affect performance. In essence, figure out how much perf is left on the table if aggressive LODs cannot be used.

    I wonder if the Quake 2 geometry could be tessellated and used to measure the effect geometry complexity has on Turing ray tracing performance.
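
    The skeleton of such a benchmark could look like this; traceFrame() is a placeholder for whatever submits one frame of ray tracing work and waits for it, not a real API call:

        #include <chrono>
        #include <cstdio>
        #include <functional>

        // Time a fixed ray workload against the same scene at increasing
        // tessellation levels (each step multiplying the triangle count).
        void benchmarkTessellation(const std::function<void(int)>& traceFrame) {
            for (int subdiv = 0; subdiv <= 4; ++subdiv) {
                const int frames = 100;
                auto t0 = std::chrono::steady_clock::now();
                for (int f = 0; f < frames; ++f)
                    traceFrame(subdiv);  // trace one frame at this level
                auto t1 = std::chrono::steady_clock::now();
                double ms =
                    std::chrono::duration<double, std::milli>(t1 - t0).count() / frames;
                std::printf("subdiv %d: %.2f ms/frame\n", subdiv, ms);
            }
        }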
     
    OCASM likes this.
  9. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,096
    Likes Received:
    3,385
    pharma and OCASM like this.
  10. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    6,936
    Likes Received:
    5,219
    vipa899 likes this.
  11. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    329
    Likes Received:
    286
    O(n) for the worst case of a degenerate BVH. Just O(log n) on average for a non-degenerate one.

    Which isn't as bad as it sounds either, mostly because it is never a full rebuild:
    Keep the bottom levels for each LOD level, and only link them into the top level as required. The top level may need to be rebuilt, but if you are only replacing a single bottom-level instance with an identical scene-space bounding box, it may as well just recycle the instance slot, making a LOD swap potentially as cheap as finding the correct slot in the top-level structure.
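
    A minimal sketch of that swap, using an invented stand-in for the instance table (the real DXR equivalent would be the instance descs of the top-level structure):

        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Simplified stand-in for a top-level instance entry; not the real
        // DXR instance desc, just enough to show the idea.
        struct InstanceSlot {
            float         transform[3][4];  // object-to-world
            std::uint64_t blasAddress;      // GPU address of bottom-level BVH
        };

        // LOD swap: same transform, same world-space bounds, just point the
        // slot at another prebuilt bottom level. The top level then only
        // needs a refit / re-upload, not a full rebuild.
        void swapLod(std::vector<InstanceSlot>& topLevel, std::size_t slot,
                     std::uint64_t newLodBlas) {
            topLevel[slot].blasAddress = newLodBlas;
        }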

    The still-nasty part is if you have to rebuild the bottom-level parts, because that more or less involves rasterizing into a 3D texture and then packing the sparse texture into a tree (or is the implementation updating the tree straight from the triangle list? Who knows...). Not so problematic either, though: if you go two-step you only have the one-time memory overhead for the 3D texture, (almost) fixed-function rasterization, and if the 3D texture was suitably swizzled, the transformation into a packed tree is just a matter of compressing the bitstream representation of the 3D texture.

    However, all of this is still insufficient for LOD, because LOD when ray tracing requires you to choose the LOD based on effective cone width. (And keep in mind you need to carry that along anyway, or texture LOD doesn't work either, as seen with the reflections in the Quake 3 demo.) You may actually require the highest LOD for a primary hit or a clear reflection, and simultaneously the lowest LOD for a diffuse reflection, for the same object right in front of the camera.
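
    A sketch of what cone-width-driven selection could look like (the spread model and the LOD mapping are illustrative assumptions, not from any API):

        #include <algorithm>
        #include <cmath>

        // Ray-cone footprint at the hit point: the cone starts with some
        // width (e.g. the pixel footprint) and spreads with distance; rough
        // reflections widen the spread angle along the path.
        float coneWidthAtHit(float startWidth, float spreadAngle, float hitDistance) {
            return startWidth + 2.0f * std::tan(spreadAngle * 0.5f) * hitDistance;
        }

        // Map the footprint to a discrete LOD, assuming each level roughly
        // doubles the edge length of the previous one.
        int lodFromConeWidth(float coneWidth, float lod0EdgeLength, int lodCount) {
            float level = std::log2(std::max(coneWidth / lod0EdgeLength, 1.0f));
            return std::min(static_cast<int>(level), lodCount - 1);
        }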

    So the current layout of the top-level acceleration structure is insufficient, as it does not support any LOD for the links to the bottom levels yet. A stupid slip in the API design.
     
  12. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    260
    Likes Received:
    269
    My bad, my initial N was a mistake. (I confused it with the cost of N rays times log n.)

    I would not go so far with my requests, and would set the LOD simply relative to the camera, so just once per frame and not per ray. Distant objects need only a little detail for reflections as well.

    That's quite an interesting catch.
    But a hard switch on cone width would also restrict you to either unique traversal per ray, or too little available work when iterating over rays per batch of geometry. Likely this can only be done reasonably fast with alternative geometry?
    Also, LOD is not enough to support cone tracing. It can just help to approximate it better; you still have sharp edges, and results at such 'depth discontinuities' would not be smooth. So you still need to jitter and denoise - no big win for the huge effort.
    So no, it only makes sense if you had total control over the BVH. Vendors would need to agree on a BVH format and expose it all. RT cores could become incompatible... will not happen anytime soon.

    More likely they will allow for dynamic LOD from the camera, in the form of support for geometry / tessellation shaders, at the cost of a complete rebuild.
    ImgTec has FF hardware for tree building, so that's one option to maybe make it faster, but your conservative raster idea is hard to beat with more FF, I guess.

    Thinking about the big advantage of having geometry, LOD and material in one data structure (e.g. a point hierarchy, or impractical voxels), it would be interesting to bring this to triangles.
    This could work to some degree: if you have a seamless parametrization, you can also make a quadrangulation. And those quads form the same nice hierarchy as texels and mip maps.
    But it would be no solution for everything, and it would break above a certain level. Above this level connectivity breaks: no more continuous LOD.
    Above this one could fall back to progressive meshes and geomorphing, so at least the geometry stays continuous, but not the textures.
    Promising, but too complicated for FF. Sadly this is true for almost anything nowadays.

    ... I need to add: although RTX adds quite a restriction when thinking about LOD, this does not mean LOD is impossible.
    The solution is to change the LOD only on a subset of the geometry per frame and rebuild the BVH just for that.
    You most likely want to do this in any case, so the restriction is not that bad. (The problem is large continuous geometry like terrain, and how to avoid visible discontinuities when splitting it into pieces.)
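
    A sketch of spreading that cost, assuming a fixed per-frame budget (rebuildBottomLevel() is a placeholder for the real, expensive build):

        #include <cstddef>

        // Spread LOD changes over frames: each frame, re-evaluate and
        // rebuild only a fixed-size window of objects, so the per-frame
        // BVH build cost stays bounded.
        struct LodScheduler {
            std::size_t cursor = 0;

            template <typename RebuildFn>
            void update(std::size_t objectCount, std::size_t perFrameBudget,
                        RebuildFn rebuildBottomLevel) {
                for (std::size_t i = 0; i < perFrameBudget && objectCount > 0; ++i) {
                    rebuildBottomLevel(cursor);  // expensive build for one object
                    cursor = (cursor + 1) % objectCount;
                }
            }
        };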
     
    #1052 JoeJ, Feb 13, 2019
    Last edited by a moderator: Feb 13, 2019
  13. jlippo

    Veteran Regular

    Joined:
    Oct 7, 2004
    Messages:
    1,236
    Likes Received:
    289
    Location:
    Finland
    Unreal Engine 4.22 preview released.
    https://forums.unrealengine.com/unr...d-releases/1583659-unreal-engine-4-22-preview

    Disappointed to see that there are no refractions yet.
     
    OCASM likes this.
  14. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,581
    Likes Received:
    9,605
    Location:
    Under my bridge
    Aren't those included under reflections? It's the same thing, only sending the ray through the other side of the surface. If not, it shouldn't be hard at all to rework the reflection shader into a refraction shader.
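
    For a single surface that's just Snell's law; a small sketch (plain C++ rather than shader code):

        #include <cmath>

        struct Vec3 { float x, y, z; };

        static Vec3  scale(const Vec3& v, float s) { return {v.x*s, v.y*s, v.z*s}; }
        static Vec3  add(const Vec3& a, const Vec3& b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }
        static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

        // Refract unit direction d through unit normal n, with
        // eta = ior_from / ior_to. Returns false on total internal
        // reflection, in which case a reflection ray should be spawned.
        bool refractRay(const Vec3& d, const Vec3& n, float eta, Vec3& out) {
            float cosI  = -dot(d, n);
            float sinT2 = eta * eta * (1.0f - cosI * cosI);
            if (sinT2 > 1.0f) return false;  // total internal reflection
            float cosT = std::sqrt(1.0f - sinT2);
            out = add(scale(d, eta), scale(n, eta * cosI - cosT));
            return true;
        }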
     
  15. milk

    Veteran Regular

    Joined:
    Jun 6, 2012
    Messages:
    2,605
    Likes Received:
    2,051
    Technically, though, this kind of parametric continuous LOD JoeJ describes (and which I used to think was the unavoidable direction we'd eventually go, but am still waiting) is only actually proper for primary visibility. For other ray tracing effects, rays often diverge in ways that require more or less detail in different areas than what can be seen from the camera perspective.
    Imagine for example a curved glass or mirror that forms a kind of magnifying-glass effect: ideally you'd want rays hitting that lens to reach a higher-detail LOD, because the objects reflected/refracted appear bigger than they are. Of course, this is a hypothetical we are far from considering a real problem anytime soon, given the much cruder problems we face with modern rendering.
    A more realistic example may be the long stretching shadow of a distant object that falls on a surface close to the camera. That might give away your LOD system and look distracting.
     
    #1055 milk, Feb 13, 2019
    Last edited: Feb 13, 2019
  16. jlippo

    Veteran Regular

    Joined:
    Oct 7, 2004
    Messages:
    1,236
    Likes Received:
    289
    Location:
    Finland
    Yes, for simple surfaces like a plane of glass it should be possible to use the reflection path to get it done. (Basically do the internal trace in the shader and spawn a ray outward according to the edge you hit.)

    For more complex surfaces there is additional complexity and behaviour due to the index of refraction, and possibly a few additional traces within the object before the ray gets out. (And possibly reflection rays spawned within.)
     
  17. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,581
    Likes Received:
    9,605
    Location:
    Under my bridge
    We're a long way from modelling that in games! Internal reflection is affected by the volume of the material and the IOR, and can't be solved by a simple ray direction. You could fake it by using a texture map of refraction directions, which is probably the only sane way, and that should map nicely onto the existing raytrace shader code, just using a normal map.
     
  18. vipa899

    Regular Newcomer

    Joined:
    Mar 31, 2017
    Messages:
    922
    Likes Received:
    348
    Location:
    Sweden
    All those last postings here make me want this RT tech in the PS5 even more.
     
    #1058 vipa899, Feb 13, 2019
    Last edited: Feb 13, 2019
    OCASM likes this.
  19. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    260
    Likes Received:
    269
    That's an argument, especially the shadow example you gave. However, the shadow would at least change continuously, and it would not look more low-poly than the rocks in actual open-world games do. Even real-life sun shadows are not so hard that they would expose the tricks in a distracting manner, I hope.
    The main problem is the effort, which goes far beyond what current engines can do. The parametric stuff I propose can only handle geometry of limited complexity. It can make the surface very detailed, but it cannot help with something like a tree and its branches at multiple scales.
    For this I see only two options to improve over billboards: point splatting or volume rendering. Only the latter is compatible with RT, but the former would be much faster and easier. As long as we are in the hybrid era it's very interesting.
    Another question: is it acceptable to exclude diffuse stuff from RT completely? I'm fine with BFV removing some foliage.

    But complexity is insane in any case. Think of an example of fine-grained vegetation climbing along a tree. Close up: all triangles (+ displaced detail?). Medium distance: put the vegetation into the texture of the tree trunk. Far: trunk still triangles, but branches as points. Very far: points only.
    Notice how the vines switch back and forth between point and triangle representations. The preprocessing tools alone make my head hurt. It's difficult to think of a practical compromise, and hard to make the transitions smooth.
     
    milk likes this.
  20. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    260
    Likes Received:
    269
    Metro test with comparison screenshots and benchmark:
    http://www.pcgameshardware.de/Metro...hnik-Test-Raytracing-DLSS-DirectX-12-1275286/

    The text is quite positive. Interesting:

    "Raytracing in Metro Exodus affects the sun and sky. Local light sources are not ray traced due to the stealth gameplay mechanics. This allows the player to have a better idea of when and where they can or cannot be spotted. Toggling ray tracing does not change stealth gameplay."

    Further, they say some objects like foliage are excluded from RT, but there is no explanation of the difference between settings.
    DLSS does no wonders for IQ but helps with perf.
     
    pharma, vipa899 and OCASM like this.