AMD: Navi Speculation, Rumours and Discussion [2019]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

  1. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    217
    Likes Received:
    38

    Right, because Nvidia chose a PUDDLE for that effect. Yet globals and other effects are missing. Turing can only do so much, while RDNA2 can do much more. The demo shows how robust RDNA2 is, compared with the narrowly focused ray tracing that Nvidia likes to showcase for players in puddles.
     
  2. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,712
    Likes Received:
    131
    Why is this person allowed to exist here and continue to post in such a manner unchecked? B3D had pretty high standards once. Sure, they have been relaxed a great deal over time, but have they really fallen to 0?
     
    Picao84, Samwell, pharma and 6 others like this.
  3. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,471
    Likes Received:
    13,968
    Location:
    Cleveland
    PC-Land reached crazy territory years ago, not even console-land is this silly.
     
  4. Naed

    Joined:
    Sep 8, 2016
    Messages:
    3
    Likes Received:
    6
    Just wait for the next generation of console posters to arrive after the launch of the ps5 etc :)
     
    Kej, Picao84, pharma and 2 others like this.
  5. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,471
    Likes Received:
    13,968
    Location:
    Cleveland
    We've already had the influx for Silly Season and it's well handled.
     
  6. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,614
    Likes Received:
    3,677
    Location:
    Pennsylvania
    I'm not all over the console forum but keeping up with the various major threads over the past year or so, I think the PC areas have been worse. Mainly since the release of RTX and the ensuing marketing spam and red vs green wars.
     
  7. bbot

    Regular

    Joined:
    Apr 20, 2002
    Messages:
    750
    Likes Received:
    13
    I think by now it's clear that RDNA2 uses the ray tracing method described in that AMD patent, where the intersection engine is in the TMU. If you do the calculation for the XSX, 208 TMUs × 1.825 GHz = 379.6G intersections per second, which matches the figure given by Microsoft in the Eurogamer article. Yet according to Mark Cerny, in the PS5 the intersection engine is in the CU. The PS5 is supposed to be using RDNA2.

    Could the "intersection engine" be a shader program running on the shaders, for the case of the PS5?
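
    The arithmetic above can be checked with a short sketch. It assumes each TMU completes exactly one intersection test per clock, which is an inference from the numbers and not something AMD or Microsoft have confirmed; the function name is made up for illustration.

```python
# Back-of-the-envelope check of the XSX figure quoted in the post.
TMU_COUNT = 208               # XSX TMU count, from the post
CLOCK_GHZ = 1.825             # XSX GPU clock, from the post
TESTS_PER_TMU_PER_CLOCK = 1   # assumption, not confirmed by AMD or Microsoft


def peak_intersections_g_per_s(tmus, clock_ghz, tests_per_clock=1):
    """Return the peak intersection test rate in billions per second."""
    return tmus * clock_ghz * tests_per_clock


rate = peak_intersections_g_per_s(TMU_COUNT, CLOCK_GHZ, TESTS_PER_TMU_PER_CLOCK)
print(f"{rate:.1f} G intersections/s")
```

    The product lands on 379.6G, so the per-TMU reading is at least consistent with the published number, though it does not prove where the unit physically sits.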
     
  8. szatkus

    Newcomer

    Joined:
    Mar 17, 2020
    Messages:
    30
    Likes Received:
    23
    Well, TMU is in the CU as well.
     
    Lightman, chris1515, fellix and 2 others like this.
  9. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,941
    Location:
    Finland
    As @szatkus pointed out, TMUs are in the CUs too.
    For future reference, this is what the RDNA1 Dual Compute Unit looks like:
    upload_2020-3-30_18-54-33.png

    Intersection Engines will be added in the yellow part on the right, the "TMU", which currently includes the filtering and mapping units. Whether they are separate new blocks or added functionality in the existing ones, I'm not sure.
     
    VitaminB6, Lightman and BRiT like this.
  10. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,505
    Likes Received:
    424
    Location:
    Varna, Bulgaria
    So, can Nvidia just add another TMU quad per SM in Ampere to increase the intersection test rate, or will it be limited by cache/memory data bandwidth?
     
  11. Mat3

    Newcomer

    Joined:
    Nov 15, 2005
    Messages:
    165
    Likes Received:
    8
    The ray tracing fixed function hardware is for a BVH, I assume, which at a high level is triangles within boxes within more boxes. Voxels are boxes too, so if a game had part of its geometry made up of voxels, could the box testers be used for that too? Or would it have to be a specific BVH format? Also, could these box and triangle testers be used for physics collisions, like testing whether a box hit another box? The PowerVR marketing material always lists physics as a possible use for their ray tracing.
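
    For reference, a minimal software sketch of the two box tests being discussed. The ray-versus-box function uses the standard slab method; whether the hardware testers use this method, or expose a box-versus-box test at all, is not stated anywhere in the thread, and all names here are illustrative.

```python
import math


def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: True if the ray origin + t*direction (t >= 0) hits the box."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: miss if outside the slab.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def boxes_overlap(a_min, a_max, b_min, b_max):
    """Box-versus-box overlap, the kind of test a physics broad phase uses."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for a_lo, a_hi, b_lo, b_hi in zip(a_min, a_max, b_min, b_max))


# A unit voxel at grid cell (2, 0, 0) is just an axis-aligned box.
voxel_min, voxel_max = (2.0, 0.0, 0.0), (3.0, 1.0, 1.0)
print(ray_hits_box((0.0, 0.5, 0.5), (1.0, 0.0, 0.0), voxel_min, voxel_max))
print(ray_hits_box((0.0, 2.0, 0.5), (1.0, 0.0, 0.0), voxel_min, voxel_max))
print(boxes_overlap(voxel_min, voxel_max, (2.5, 0.5, 0.5), (4.0, 2.0, 2.0)))
```

    Note that a ray-versus-box test and a box-versus-box test are different operations, so reusing a ray tester for collisions would mean phrasing the query as rays (for example, casting along an object's motion).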
     
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,941
    Location:
    Finland
    No, NVIDIA uses a different approach with a separate fixed function(? not 110% sure on this) "RT core" within each SM, which handles everything. They can probably build it beefier though, and of course there are more of them the more SMs there are.
     
    BRiT likes this.
  13. szatkus

    Newcomer

    Joined:
    Mar 17, 2020
    Messages:
    30
    Likes Received:
    23
    As far as I understand, the RT core is a kind of specialized unit, like an FPU for example. It's used as part of ray tracing shaders, coupled with plain old computing resources*.

    Of course they can just add more RT cores. We have already seen this in the past: at the beginning, ROPs and "compute pipelines" (was that the term?) were in 1:1 proportions. Now AMD usually has 64 pipelines for every ROP.

    * That's why moving RT to a separate chip is impossible. Sorry, but whenever I see news about anything related to ray tracing, there's always at least one comment like "AMD should add a chip just for ray tracing".
     
  14. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,941
    Location:
    Finland
    NVIDIA's RT core does all the ray traversal & intersection work on its own with no CUDA core involvement (after they send the probe to the RT core); only afterwards is the actual shading done on the CUDA cores. On AMD, the stream processors will be involved in the traversal portion (I didn't double check the patent and my memory is short but bad, so that might need checking).
    Here's NVIDIA's whitepaper on theirs anyway: https://www.nvidia.com/content/dam/...ure/NVIDIA-Turing-Architecture-Whitepaper.pdf
     
    pharma likes this.
  15. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,489
    Likes Received:
    233
    Location:
    msk.ru/spb.ru
    It doesn't handle "everything" but it does evaluate hits and choose the next level of BLAS to trace against - something which will presumably have to be handled by CU SIMDs in RDNA2.
    There are no reasons to assume that NV's approach is any more "fixed function" than AMD's right now. The traversal of BVH is handled by dedicated specialized h/w in both cases. In case of NV this h/w has an additional function of evaluating the results of traversal without SIMDs involvement.
    I think it's also a stretch to just assume that these BVH units in RDNA2 are tied to TMUs and especially their numbers. The fact that the CU accesses them through TMU data paths means little more than the usage of said data paths and associated caches IMO. The wiring of actual units can be absolutely arbitrary for all we know - 1 BVH unit per TMU, 1 such unit per TMU quad, 1 such unit per CU or maybe even 1 such unit per WGP?
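
    To put rough numbers on those wiring options, here is a sketch using the XSX figures quoted earlier in the thread. The 4 TMUs per CU and 2 CUs per WGP ratios are taken from the RDNA1 layout and assumed to carry over; one test per unit per clock is likewise an assumption.

```python
CLOCK_GHZ = 1.825      # XSX clock quoted earlier in the thread
TMUS = 208             # XSX TMU count quoted earlier in the thread
TMUS_PER_CU = 4        # assumption, based on the RDNA1 CU layout
CUS_PER_WGP = 2        # assumption, an RDNA1 WGP is a dual compute unit

cus = TMUS // TMUS_PER_CU
wgps = cus // CUS_PER_WGP

# Hypothetical wirings: number of BVH units for each option listed in the post.
unit_counts = {
    "1 per TMU": TMUS,
    "1 per TMU quad": TMUS // 4,
    "1 per CU": cus,
    "1 per WGP": wgps,
}

# Assumes one intersection test per unit per clock (not confirmed).
rates = {name: count * CLOCK_GHZ for name, count in unit_counts.items()}
for name, rate in rates.items():
    print(f"{name:>15}: {unit_counts[name]:3d} units, {rate:6.1f} G tests/s")
```

    Only the per-TMU wiring reproduces 379.6G at one test per clock, but a single per-CU unit doing four tests per clock gives the same total, so the published figure alone does not settle how the units are wired.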

    Why exactly can't NV add more RT cores into an SM if this will actually improve performance?
    Again, if we assume that Turing RT cores aren't coupled to / accessed through TMUs (as presumably in RDNA2), then this actually seems like an advantage, as you can probably add (or remove) them as you see fit, irrespective of your TMU count per SM.
     
    iamw and pharma like this.
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,580
    Likes Received:
    622
    Location:
    New York
    I'll be very impressed if AMD takes this approach and delivers usable performance. Passing every node hit or miss back to the shader core will incur crazy scheduling overhead and compete with other texturing & shading work.
     
    #2096 trinibwoy, Apr 1, 2020
    Last edited: Apr 1, 2020
  17. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    278
    Likes Received:
    176
    I don’t see how it is more “problematic” in terms of scheduling and pipelining than, say, an atomic RMW operation (unless there are strict ordering constraints). From the CU’s perspective, both conceptually send some data to an external blackbox and get back some results.
     
    #2097 pTmdfx, Apr 1, 2020
    Last edited: Apr 1, 2020
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,580
    Likes Received:
    622
    Location:
    New York
    The number of calls to the external blackbox will be significantly higher in AMD's case. There will be a cost to be paid in compute capacity, cache efficiency and on-chip traffic. The AMD patent discusses returning intermediate BVH traversal results. Nvidia just returns the final triangle hit or miss.

    It's more complicated with transparent geometry that invokes any-hit shaders during traversal but the distinction still applies.
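
    A toy model of that difference in call counts, using nested 1D intervals as a stand-in for a BVH. This is not either vendor's actual design: it assumes one request per node tested in the shader-driven case and one request per ray in the autonomous case, and every name in it is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: float   # 1D bounds stand in for a 3D box, for brevity
    hi: float
    children: list = field(default_factory=list)  # empty list means leaf


def build(lo, hi, depth):
    """Build a complete binary tree of nested intervals (a toy BVH)."""
    node = Node(lo, hi)
    if depth > 0:
        mid = (lo + hi) / 2
        node.children = [build(lo, mid, depth - 1), build(mid, hi, depth - 1)]
    return node


def node_test(node, x):
    """Stand-in for the fixed-function box/triangle tester."""
    return node.lo <= x < node.hi


def trace_shader_driven(root, x):
    """Per-node scheme: the shader issues one request per node tested."""
    requests, stack, hit = 0, [root], None
    while stack:
        node = stack.pop()
        requests += 1                        # one round trip to the tester
        if node_test(node, x):
            if node.children:
                stack.extend(node.children)  # shader picks the next nodes
            else:
                hit = node
    return hit, requests


def trace_autonomous(root, x):
    """Whole-traversal scheme: one request, the unit walks the tree itself."""
    hit, _ = trace_shader_driven(root, x)    # same walk, hidden inside the unit
    return hit, 1


root = build(0.0, 16.0, depth=4)             # 16 leaves
hit_a, req_a = trace_shader_driven(root, 5.3)
hit_b, req_b = trace_autonomous(root, 5.3)
print(req_a, req_b)
```

    For this four-level tree the per-node scheme issues 9 requests against 1, and the gap grows with tree depth. Real hardware may test several boxes per request, which would shrink the ratio.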
     
  19. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    278
    Likes Received:
    176
    That's fair, and it would indeed be competing for resources in the CU memory/texturing pipeline. But there are still counterpoints that can be made:

    1. On-chip traffic: If the traversal process can saturate the memory hierarchy, the added CU internal round-trip time could be less relevant in the end. It would burn more joules in absolute terms, but can be a drop in the sea in relative terms.

    2. Compute capacity: Compute is cheaper, relative to the lesser growth in memory bandwidth. One may then argue that the intersection engine would have already subsumed the most costly and divergent routines, relative to a complete in-shader implementation. So if what's left is cheap enough, relevancy of "having freed up more compute capacity" in practice can be disputed.

    3. Cache efficiency: That rests on the assumption that loads generated by the intersection engine would thrash the cache hierarchy. But alternative caching policies (skip L0/L1/L2) could be used for this traffic. The same issue applies equally to a dedicated RT core sharing the cache hierarchy.
     
  20. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,580
    Likes Received:
    622
    Location:
    New York
    Sure, we won’t know exactly how things balance out in the end and there are many factors at play.

    In Nvidia’s patent the RT core (or tree traversal unit) has its own local memory and L0 cache. Clearly much more expensive than AMD’s approach but likely faster too.
     