AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    27
    Likes Received:
    46
    4*52 CUs*1.825 GHz gives me 379.6 billion intersections/second. I suspect it's a theoretical figure assuming full occupancy, much like TFLOPS. Nvidia's "10 gigarays" was actual throughput against some actual model -- ultimately the numbers aren't at all comparable without knowing both the actual achieved occupancy and the number of intersections needed to compute a ray.
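    For what it's worth, a quick back-of-the-envelope check of that figure, assuming one intersection engine per TMU and 4 TMUs per CU (the multiplier discussed later in the thread):

    ```cpp
    // Sketch of the theoretical peak quoted above: 4 intersection engines per CU
    // (an assumption: one per TMU), 52 CUs, 1.825 GHz, one box test per engine per cycle.
    #include <cstdio>

    int main() {
        const double engines_per_cu = 4.0;
        const double cus            = 52.0;
        const double clock_ghz      = 1.825;
        std::printf("peak: %.1f billion intersections/s\n",
                    engines_per_cu * cus * clock_ghz);  // ~379.6
    }
    ```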
     
  2. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    350
    Likes Received:
    391
    With consoles, you can create a custom BVH to feed to the GPU. It depends on how 'deep' the BVH tree is: it could easily be 380 billion rays/second in the case of a full-screen quad contained in a single BVH node, or 38 billion rays/second with a 10-node-deep BVH.

    As I explained previously, a BVH's layout structure is a 'tree', so some 'branches' may require up to 10 node traversals to reach the leaf node, while others may need as few as 5 traversals in total. How 'deep' a BVH can be is variable in practice.

    On consoles it's possible to design your own specific BVH optimized for your content. You could have a 1-node-deep BVH, but you will not get very many rays when the geometry being tested against represents a very small area of the BVH. You could have a 20-node-deep BVH, which gives a very 'tight' BVH bound with a high chance of a hit, but it ends up needing many traversals, so you still won't get many rays this way either. Developers will have to find the ideal balance for their content, trading off between how 'deep' the structure is (to ensure a high enough hit rate) and reducing the 'tightness' (doing more intersection tests to achieve a higher ray count). Every TMU comes with its own intersection engine, so it's not a surprise to see the number 4 as a multiplier, since every CU comes with exactly 4 TMUs.
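    A purely illustrative sketch of the kind of stack-based traversal being described (the node layout and hit tests are placeholders, not AMD's hardware format); the point is that the cost per ray scales with how many nodes the ray visits, which is exactly what the depth/tightness trade-off controls:

    ```cpp
    // Purely illustrative stack-based BVH traversal (node layout and hit tests
    // are placeholders, not AMD's hardware format). The per-ray cost scales with
    // the number of nodes visited, which is what tree depth/tightness controls.
    #include <cstdint>
    #include <vector>

    struct AABB { float min[3], max[3]; };

    struct Node {
        AABB    bounds;
        int32_t firstChild;  // first child node index, or first triangle index if leaf
        int32_t count;       // number of children, or number of triangles if leaf
        bool    isLeaf;
    };

    // Returns how many nodes one ray visited (a proxy for its traversal cost).
    int traverse(const std::vector<Node>& nodes,
                 bool (*rayHitsBox)(const AABB&),
                 void (*rayHitsTriangle)(int32_t))
    {
        int visited = 0;
        std::vector<int32_t> stack{0};            // start at the root node
        while (!stack.empty()) {
            const Node& n = nodes[stack.back()];
            stack.pop_back();
            ++visited;
            if (!rayHitsBox(n.bounds)) continue;  // miss: prune this whole subtree
            if (n.isLeaf) {
                for (int32_t i = 0; i < n.count; ++i)
                    rayHitsTriangle(n.firstChild + i);   // leaf: test triangles
            } else {
                for (int32_t i = 0; i < n.count; ++i)
                    stack.push_back(n.firstChild + i);   // inner node: descend
            }
        }
        return visited;
    }
    ```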

    With other vendors giving a 'hard' ray-count figure for their hardware, it could imply that they have a fixed BVH structure.
     
  3. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,584
    Likes Received:
    4,310
    What are the advantages and disadvantages of a fixed BVH structure?
     
  4. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,405
    Likes Received:
    1,941
    Location:
    msk.ru/spb.ru
    You can (and actually have to) design your own BVH for your content everywhere, not just "on consoles".
     
  5. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    350
    Likes Received:
    391
    By fixing the BVH depth in hardware, you can guarantee more predictable performance by paying a constant traversal cost per ray, but this means the BVH layout can't be customized for the developer's content, so they have to rely on the driver to generate the BVH for them.

    By offering a customizable BVH to developers, performance becomes highly variable, depending on which parts of the sub-tree are traversed and how often. How 'shallow' or 'deep' a sub-tree runs lets the hardware scale from higher to lower performance.
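    A toy cost model of that trade-off (every number below is invented purely for illustration): a fixed-depth tree pays the same traversal cost for every ray, while a custom tree pays a scene-dependent average:

    ```cpp
    // Toy cost model for the trade-off above; every number here is invented
    // purely for illustration, not taken from any hardware.
    #include <cstdio>

    int main() {
        const double fixedDepth = 10.0;  // fixed tree: every ray visits 10 nodes
        // Hypothetical custom tree: 70% of rays hit shallow geometry (5 levels),
        // 30% land in a deep, detailed region (16 levels).
        const double customAvg = 0.7 * 5.0 + 0.3 * 16.0;  // 8.3 nodes/ray on average
        std::printf("fixed: %.1f nodes/ray, custom: %.1f nodes/ray (but variable)\n",
                    fixedDepth, customAvg);
    }
    ```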

    Actually, that's incorrect according to a joint presentation by Microsoft and Nvidia on page 8 of the slides. The geometry format of the acceleration structure is described as 'opaque' with the "layout determined by the driver and hardware".

    On console APIs the programmer can explicitly define their BVH layout for the GPU to use.
     
  6. Dictator

    Regular Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    462
    Likes Received:
    2,706
    One thing is that the cost of ray-triangle intersection in HW is not going to be the same as the traversal cost. It will be slower, so the ray count there is not just a matter of counting traversals/intersections and levels and doing the multiplication. I asked MS; they told me the 380 billion number is AABB bounding traversal tests and that triangle intersections are more expensive (they did not say by how much).
     
    Newguy, tinokun, w0lfram and 4 others like this.
  7. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    350
    Likes Received:
    391
    Now that you mention it, I remember another detail in the patent:

    This potentially means that testing ray-triangle intersections is 4x more expensive compared to the ray-box tests.
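    If that reading is right (one triangle test versus four box tests per engine per clock is an assumption, not a confirmed figure), the same arithmetic as before gives a much lower triangle-test peak:

    ```cpp
    // Same arithmetic as before, with the (unconfirmed) assumption that each
    // engine does 4 box tests or 1 triangle test per clock.
    #include <cstdio>

    int main() {
        const double peak_box_gs = 4.0 * 52.0 * 1.825;   // ~379.6 G box tests/s
        const double peak_tri_gs = peak_box_gs / 4.0;    // ~94.9 G triangle tests/s
        std::printf("%.1f G box tests/s vs %.1f G triangle tests/s\n",
                    peak_box_gs, peak_tri_gs);
    }
    ```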
     
    Lightman, tinokun, chris1515 and 3 others like this.
  8. eloyc

    Veteran Regular

    Joined:
    Jan 23, 2009
    Messages:
    2,474
    Likes Received:
    1,632
    Stupid question, maybe, but when are the new GPUs supposed to be ready? What hardware is running the ray tracing tech demo? I'm a bit worried about RTRT in comparison to Nvidia's solution, because it all seems a bit obscure. Why aren't they showing demos with current games which feature RTRT? It seems as if they're not confident enough in their own solution.
     
  9. Bondrewd

    Veteran Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    1,231
    Likes Received:
    575
    ~Q3 or so.
    Depends on how it goes now lmao, with both customers and supply chains being lowkey on fire.
    Either console silicon or N21.
    Pretty sure AMD DXR drivers are WIP still.
     
    Entropy and eloyc like this.
  10. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,309
    Likes Received:
    1,944
    Location:
    New York
    In either case, any quoted rays-per-second metric is pretty meaningless, as there is no standard measure for comparing theoretical performance across IHVs. We need a well-defined RT equivalent of a texture or pixel throughput test.

    I do like AMD’s intersections per second more than Nvidia’s gigarays in that it provides some insight into max hardware capabilities.

    Maybe 3DMark will do us a favor and whip up a few theoretical tests. Deep bounding box traversal with a few triangles in each leaf node. And triangle intersection with some trivial number of bounding boxes.
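    A sketch of the first of those cases, using the same illustrative node layout as earlier: a degenerate chain of nested AABBs N levels deep with triangles only in the deepest node, so a measurement would be dominated by box traversal rather than triangle tests:

    ```cpp
    // Sketch of a traversal-bound synthetic case: a chain of nested AABBs
    // `depth` levels deep, with triangles only in the deepest node, so the
    // measurement is dominated by box traversal. Node layout is illustrative.
    #include <vector>

    struct AABB { float min[3], max[3]; };
    struct Node { AABB bounds; int firstChild; int count; bool isLeaf; };

    std::vector<Node> makeTraversalStressBVH(int depth, int leafTriangles) {
        std::vector<Node> nodes(depth);
        for (int i = 0; i < depth; ++i) {
            const float r = float(depth - i);              // boxes shrink toward the leaf
            nodes[i].bounds = {{-r, -r, -r}, {r, r, r}};
            if (i + 1 < depth) {
                nodes[i].firstChild = i + 1;               // single child: pure traversal
                nodes[i].count      = 1;
                nodes[i].isLeaf     = false;
            } else {
                nodes[i].firstChild = 0;                   // triangles live only here
                nodes[i].count      = leafTriangles;
                nodes[i].isLeaf     = true;
            }
        }
        return nodes;
    }
    ```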
     
    no-X, pharma, chris1515 and 1 other person like this.
  11. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    617
    Likes Received:
    1,076
    What do you mean by "custom"?

    Pretty sure the bounding-box data format and precision can't be changed without losing HW acceleration capability, since the HW works with fixed formats (not arbitrary data).
    You can't change the bounding volume shapes either; this would break HW compatibility too.
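    To make that point concrete, a purely hypothetical example of what a "fixed format" node might look like; this is not AMD's actual node encoding, it just illustrates why the byte layout has to be owned by the driver/HW once fixed-function units parse it:

    ```cpp
    // Hypothetical "fixed format" node, purely to illustrate the point: a 4-wide
    // box node with fp32 bounds at fixed offsets, padded to two 64-byte cache
    // lines. This is NOT AMD's actual node encoding.
    #include <cstdint>

    struct HwBoxNode4 {
        float    minX[4], minY[4], minZ[4];  // bounds of up to 4 children (SoA)
        float    maxX[4], maxY[4], maxZ[4];
        uint32_t child[4];                   // packed child indices/flags
        uint32_t pad[4];                     // pad to a cache-line multiple
    };
    static_assert(sizeof(HwBoxNode4) == 128, "fixed node size the HW would expect");
    ```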

    What makes you think there are no empty-space optimizations in the driver's BVH builder?
    In reality it will take many more tests per ray, because there will be more than 2 triangles in the leaf node.

    You are describing offline BVH creation here; what about scenes with many dynamic objects that can be moved or destroyed, like in BFV?

    This needn't imply anything; that's just a real number for the simplest case: primary rays against a high-poly model.
     
    #2071 OlegSH, Mar 23, 2020
    Last edited: Mar 23, 2020
    pharma likes this.
  12. w0lfram

    Regular Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    254
    Likes Received:
    48

    BFV doesn't have any ray tracing on things that move... it's static ray tracing.
     
  13. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,584
    Likes Received:
    4,310
    My god, man! Please help yourself to some information before posting! Just think for a moment: what would be the difference between static ray tracing and simple cube map reflections?

    Reflective surfaces in Battlefield V reflect anything that moves, down to the tiny fire flares of rocket tails.
     
    #2073 DavidGraham, Mar 23, 2020
    Last edited: Mar 23, 2020
    Picao84, pharma, Rootax and 1 other person like this.
  14. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    350
    Likes Received:
    391
    You can change the tree structure itself, and it can be programmer-defined too, because the hardware's shader unit tracks the BVH traversal state itself.

    I never implied that the driver doesn't do empty space optimization.

    By being able to customize the tree, the developers can appropriately optimize how deep parts of the BVH will be for their scene representation.

    Sure, or developers can create a tighter BVH with the AABBs if they need to save on ray-triangle tests.

    I have yet to factor in the cost of rebuilding the BVH, regardless. There's also 'refitting' a BVH, which 'degrades' the quality of the acceleration structure by reducing the hit rate but is cheaper than rebuilding the whole BVH.
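    A minimal sketch of what such a refit pass looks like, assuming a flat node array stored parent-before-children (types and layout are illustrative, not any particular implementation): keep the topology, recompute only the bounds bottom-up.

    ```cpp
    // Minimal refit sketch: keep the tree topology, recompute only the bounds
    // bottom-up after geometry moves. Cheaper than a rebuild, but the boxes get
    // looser over time -- the hit-rate 'degradation' mentioned above.
    // Assumes nodes are stored parent-before-children so a reverse sweep works.
    #include <algorithm>
    #include <vector>

    struct AABB {
        float min[3], max[3];
        void grow(const AABB& o) {
            for (int a = 0; a < 3; ++a) {
                min[a] = std::min(min[a], o.min[a]);
                max[a] = std::max(max[a], o.max[a]);
            }
        }
    };

    struct Node {
        AABB bounds;
        int  firstChild;   // child node index, or first primitive index if leaf
        int  count;        // child count, or primitive count if leaf
        bool isLeaf;
    };

    void refit(std::vector<Node>& nodes, const std::vector<AABB>& primBounds) {
        const AABB empty = {{ 1e30f,  1e30f,  1e30f}, {-1e30f, -1e30f, -1e30f}};
        for (int i = int(nodes.size()) - 1; i >= 0; --i) {  // children before parents
            Node& n = nodes[i];
            n.bounds = empty;
            for (int c = 0; c < n.count; ++c)
                n.bounds.grow(n.isLeaf ? primBounds[n.firstChild + c]
                                       : nodes[n.firstChild + c].bounds);
        }
    }
    ```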

    Of course, I only raised this as a possibility, since not a lot of low-level detail has been revealed about their number.

    The number could also assume a perfect 100% hit-rate since it fits the area of the scene representation.
     
    #2074 Lurkmass, Mar 23, 2020
    Last edited by a moderator: Mar 23, 2020
  15. JoeJ

    Veteran Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    1,302
    Likes Received:
    1,564
    Do you know what the branching factor for AMD is? The patent mentioned 4. If this were flexible (which I doubt), we could trade lower tree depth against a higher branching factor.

    You mention the number of triangles in leaf nodes has no maximum. But what about the bounding box shape?
    Personally I would be interested in dividing geometry into small patches (call them meshlets if you want).
    For LOD I'd like to geomorph a number of such small patches to fit a lower-res parent patch, and finally drop the detailed patches and replace them with the parent. Kind of a hierarchical progressive mesh for the whole static scene.
    (In contrast to the proposed stochastic solution for LOD, this would prevent the need to teleport rays to a lower-detailed version of the scene, which causes divergent memory access.)
    To make this compatible with RT, it would be necessary to enlarge bounding volumes so they bound the whole morphing transition of mesh patches (sketched below). Technically that's surely possible on any HW.
    And secondly it would be necessary to pick a BVH node, declare it a leaf, assign triangles to it and zero the child node pointer(s). If this works it would be compatible with the BLAS and TLAS approach and cause no other extra work.
    So, do you think consoles could eventually allow this?
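    The "enlarged bounding volume" part is easy to sketch: bound each geomorphing patch by the union of its bounds at the two ends of the transition (assuming vertices interpolate linearly between those states), so the box stays valid for any intermediate morph state without a per-frame refit:

    ```cpp
    // Conservative bounds for a geomorphing patch: the union of its AABB at full
    // detail and at the parent LOD, valid for any intermediate morph state as
    // long as vertices interpolate linearly between the two.
    #include <algorithm>

    struct AABB { float min[3], max[3]; };

    AABB morphBounds(const AABB& atFullDetail, const AABB& atParentLOD) {
        AABB out;
        for (int a = 0; a < 3; ++a) {
            out.min[a] = std::min(atFullDetail.min[a], atParentLOD.min[a]);
            out.max[a] = std::max(atFullDetail.max[a], atParentLOD.max[a]);
        }
        return out;
    }
    ```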

    Finally, MS mentioned RT works with mesh shaders in their DX12 Complete presentation.
    But I have no idea how this could work.
    How would you know the bounds in advance?
    And does each ray (or small group of rays) end up processing a meshlet hundreds of times?
    That sounds like a terrible idea. More likely there is a way to store mesh shader results in memory for RT reuse, but then I still don't get how a full BVH rebuild could be avoided.
     
    pharma likes this.
  16. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    350
    Likes Received:
    391
    There don't appear to be any limits to how the tree can be structured! It's implied that both the depth and the number of child nodes are up to the programmer to decide. On consoles you can customize the tree structure of the BVH in nearly any way.
     
    JoeJ likes this.
  17. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,370
    Likes Received:
    317
    Location:
    San Francisco
    That makes no sense.

    Fixing the acceleration structure depth either implies there is no upper bound on the number of children an internal node can have, making any efficient software or hardware implementation a lost battle, or that you always have to traverse the maximally deep tree in its entirety, which would also kill performance and defeat the point of having an acceleration structure in the first place.

    I believe you might be confusing the acceleration structure branching factor with its depth.
    If you are implying a software implementation could change the branching factor for different parts of the tree... that's definitely a possibility, but not a particularly useful one. There are lots of reasons for this, ranging from making traversal unnecessarily complex (a variable-branching-factor traversal stack?!) with little to show for it, to quickly coming to terms with the fact that the only branching factors you want to use are the ones that fit nicely in one or more cache lines, if you don't want to frequently fetch data from memory that you'll never use.
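    The cache-line point in rough numbers, assuming fp32 AABBs plus a 4-byte child index per child (an illustrative per-child layout, not any specific hardware format):

    ```cpp
    // The cache-line argument in rough numbers, assuming fp32 AABBs plus a 4-byte
    // child index per child (illustrative layout, not a real HW format).
    #include <cstdio>
    #include <initializer_list>

    int main() {
        const int bytesPerChild = 6 * 4 + 4;  // min/max xyz as fp32 + child index
        const int cacheLine     = 64;
        for (int branching : {2, 4, 8, 16}) {
            const int nodeBytes = branching * bytesPerChild;
            const int lines     = (nodeBytes + cacheLine - 1) / cacheLine;
            std::printf("branching %2d -> %3d bytes -> %d cache line(s)\n",
                        branching, nodeBytes, lines);
        }
    }
    ```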

    Also, the idea that game developers are going to spend a considerable amount of time developing their own acceleration structure is for the birds. There is a vast literature on the subject and all the low-hanging fruit was picked a long time ago. There is no magical BVH (or any other acceleration structure) out there that will suddenly give you much better performance, unless one starts from zero and ignores the last two decades of publicly available research on the subject.
     
    Dale Cooper, Alexko, tinokun and 11 others like this.
  18. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,798
    Likes Received:
    6,984
    Wow, a well-respected poster returns.
     
  19. JoeJ

    Veteran Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    1,302
    Likes Received:
    1,564
    An open-ended branching factor does indeed seem impractical (as said above), but choosing 4 or 8 could make sense depending on the HW implementation.

    Sounds pretty good. :)
     
  20. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    350
    Likes Received:
    391
    TBH, just having those few branching modes would be enough in most practical cases. I can't really see anything above 8 being all that useful, since memory-fetch overhead comes into play with larger nodes.
     