AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. Kaotik

    Yes, they could have, but that doesn't mean they did.
    PS4 Pro, which Sony has confirmed to have features from Polaris and beyond, was released within six months of the Polaris GPUs.
    PS4 and XB1 were both GCN1.1 and released within a couple of months of Bonaire.
    So using RDNA2 should be well within reason for the known timeframes.
     
  2. Michellstar

    Almost all evidence? Like VRS, which is in RDNA1, right?

    We take for granted that RDNA2 will bring VRS and RT, and that it'll be aimed first at big GPUs, just like last time with Vega and Polaris.

    I see a parallel with the base consoles, which were based on the Bonaire core (GCN2, Sea Islands?) that came to market before Durango and Orbis, but they had already been revealed by then.

    Oops, beaten to it.
     
  3. DegustatoR

    Quite the contrary: almost everything we know now about the next-gen console APUs and RDNA2 points to the former being based on the latter, not on RDNA1. But it is of course possible that the consoles will use some mix of features from RDNA1 and RDNA2. There's no way of knowing for sure for now unless you're under NDA.
     
  4. fehu

    For refresh speculation, keep in mind the N7P node, which in theory is an easier port than N7+ is, and offers 7% more performance at the same power.
     
  5. Kaotik

    AFAIK N7P is a drop-in replacement for N7, and everything on N7 will migrate to N7P as soon as TSMC gets its lines updated. It offers 7% more performance at the same power compared to N7, not N7+. N7+ supposedly offers a slightly higher 10% performance uplift at the same power compared to N7, as well as area benefits from 20% higher density. (Also, AMD has said "7nm+", which is generally understood to mean N7+.)
     
  6. sniffy

    Just to clarify: obviously the topology has changed (the reorganisation into WGPs and resource allocation), but the individual units (all 2,560 of them in Navi 10) are unchanged from GCN5/Vega?

    Will we see an AT deep dive come the release of RDNA2?
     
  7. Ryan Smith

    It's probably better to say that the throughput and features of the various graphics-related units were not changed. Graphics these days sits about half a layer above compute, so when you change the underlying architecture, the stuff above it is no longer quite identical.
     
  8. Nebuchadnezzar

    A GPU in the end is nothing more than its own SoC within an SoC; there are a dozen functional blocks that interact with each other. RDNA1 completely revamped the compute blocks, as well as some of the memory subsystem. The understanding is that RDNA2 will revamp the rasterisation / classical fixed-function blocks, which saw very few upgrades or changes in Navi over Vega (which is why AMD essentially didn't talk about them at all this past generation).
     
  9. Kaotik

    Don't forget the new TMUs, responsible for part of the RT acceleration; the compute block probably needs other changes for it too.
     
  10. DegustatoR

    The fact that they can use texturing datapaths for RT h/w access doesn't mean that TMUs themselves will be used for RT in any capacity.
     
  11. Kaotik

    It's been a while since I last glanced at the patent, but didn't it imply the hardware would be integrated as part of the TMU, not just use its datapaths?
     
  12. DegustatoR

    Depends on what you mean by "integrated" and "TMU".

    [patent figure: block diagram of the texture processor, showing the texture address unit, texture cache processor and ray intersection engine]

    The texture address unit is used for data fetching, which can be considered "using the same datapaths"; the rest is independent of the texturing h/w.
     
  13. Kaotik

    That pretty clearly states that AMD considers the ray intersection engine to be part of the texture processor. Based on what else is listed for it, the patent's "texture processor" seems to be what I've understood "TMU" to mean.

    edit: the yellow part here, even if AMD separates them into texture filtering units and texture mapping units.
    [attached image: RDNA block diagram with the texture filtering and texture mapping units highlighted in yellow]
     
  14. DegustatoR

    Which does no apparent texture work, so why exactly is it part of a texture processor? Patents can be weird when they're trying to get around things previously patented by some other party. Again, it all depends on what you consider a "TMU": if a "TMU" has a shading unit inside it, for example, does that make it something more than a TMU or not? Does a chip with such "TMUs" and no other shading capability have no "shader cores", since those are now part of the "TMUs"? This can go far.
     
  15. CarstenS

    Since "shader" is mentioned separately in the patent excerpt you (DegustatoR) posted, it seems pretty clear in this case that the patent is basically adhering to AMD's usual nomenclature. IMHO, as a non-native speaker.

    edit: Which would make the two approaches interesting. Nvidia's can seemingly work concurrently with the shader cores and texture units, possibly blocking the L1 cache and/or shared memory to some extent. AMD's implementation seems to use some of the TMU's resources, and would thus prefer to run concurrently with more compute-heavy workloads.
     
  16. iroboto

    I'm going to give answering this a shot in the dark; feel free to break me down. First I'll lay out how the patent works, as listed earlier, and then follow up with a part that was missed in the earlier snippet but is critical:

    (1) A shader sends a texture instruction containing ray data and a pointer to the BVH volume node to the texture address unit.
    (2) The texture cache processor uses the address provided in (1) to fetch the BVH node data from the cache.
    (3) The ray intersection engine performs ray-BVH node intersection testing using the ray data [from (1)] and the BVH node data [from (2)].
    (4) The intersection results and indications for BVH traversal are returned to the shader [the original caller from (1)] via the texture data return path.
    (5) The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node (see the sketch after the figure below).

    [patent figure: the shader, texture address unit, texture cache processor and ray intersection engine, with the texture data return path closing the loop]
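
    To make the division of labour concrete, here's a minimal CPU-side sketch of that loop. Everything in it is hypothetical naming of mine, not AMD's: the fixed-function work from steps (2) and (3) is modelled as an ordinary function the "shader" loop calls, and the BVH is just an array of AABB nodes.

    Code:
    #include <cstdint>
    #include <utility>
    #include <vector>

    struct Ray  { float o[3], d[3], t_max; };
    struct Node {                       // one BVH node, as fetched from cache
        float lo[3], hi[3];             // AABB bounds
        std::int32_t child[2];          // child indices, or -1 for a leaf
        float leaf_t;                   // toy stand-in for the leaf geometry hit
    };

    // Steps (2)+(3): the "texture cache processor" fetch plus the "ray
    // intersection engine" slab test, rolled into one call.
    static bool intersect(const Ray& r, const Node& n, float& t) {
        float t0 = 0.0f, t1 = r.t_max;
        for (int a = 0; a < 3; ++a) {
            float inv = 1.0f / r.d[a];
            float ta = (n.lo[a] - r.o[a]) * inv;
            float tb = (n.hi[a] - r.o[a]) * inv;
            if (ta > tb) std::swap(ta, tb);
            if (ta > t0) t0 = ta;
            if (tb < t1) t1 = tb;
            if (t0 > t1) return false;  // slabs don't overlap: miss
        }
        t = t0;
        return true;
    }

    // Steps (1), (4), (5): the shader owns the traversal. It submits one node
    // at a time, gets the result back, and decides what to visit next - which
    // is exactly where the patent's flexibility argument lives.
    float trace(const Ray& r, const std::vector<Node>& bvh, int root) {
        std::vector<int> stack{root};   // kept in VGPRs on the real hardware
        float closest = r.t_max;
        while (!stack.empty()) {
            int i = stack.back(); stack.pop_back();
            float t;
            if (!intersect(r, bvh[i], t) || t > closest) continue;  // prune
            if (bvh[i].child[0] < 0) {                              // leaf
                if (bvh[i].leaf_t < closest) closest = bvh[i].leaf_t;
            } else {                                                // interior
                stack.push_back(bvh[i].child[0]);
                stack.push_back(bvh[i].child[1]);
            }
        }
        return closest;                 // distance of the closest hit, if any
    }
    On the real hardware, the intersect() call would be a single texture-style instruction and the result would come back over the texture data return path; the point is only that the loop itself stays in the shader.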

    Breakdown
    (1) The hybrid approach, using a shader unit to schedule the processing, addresses the issues with solely hardware-based and/or solely software-based solutions.
    - So what are the known problems of purely hardware-based solutions? We know the issue with purely software-based solutions (too slow).
    - IIRC, as noted by @JoeJ and other developers like @sebbbi, their biggest issue is control over casting the rays, for performance reasons.
    - With nvidia's current solution the rays are cast for them; it is an entire black box.
    - We have an entire thread about this issue here.
    (2) Flexibility is preserved, since the shader unit can still control the overall calculation and can bypass the fixed-function hardware where needed, while still getting the performance advantage of fixed-function hardware.
    - Engines and renderers today have been using ray casting for some time; the ability to have a custom intersection shader would allow them to port directly without needing to rework things excessively. https://forum.beyond3d.com/posts/2042744/
    - @JoeJ makes a heavy case for the need to remove restrictions around triangle intersection, and many of our senior members and mods (@Shifty Geezer) have been debating in favour of a flexible ray tracing solution. https://forum.beyond3d.com/posts/2088217/
    (3) In addition, by utilizing the texture processor infrastructure, the large buffers for ray storage and BVH caching that are typically required in a hardware ray tracing solution are eliminated, as the existing VGPRs and texture cache can be used in their place, which substantially reduces the area and complexity of the hardware solution.
    - And if I understood correctly, we may see some silicon savings as a result of going with this method.

    Let's take these concepts and look at two important aspects [f____ck me, I need to do actual work instead of this]:
    a) VRS Tier 2 (or MS's version of it)
    b) DXR Tier 1.1

    VRS Tier 2 works well with having better control over ray casting (https://forum.beyond3d.com/posts/2093848/).
    This is important because developers may be very specifically fine-tuning their ray casts for a variety of things to maximize performance according to their use of VRS; i.e., it makes it a hell of a lot easier to control performance, as per the above. They can fine-tune where on the image they want more rays, fewer rays, or no rays at all. For example: why bother with ray casting/rendering where the UI is going to block it? When you consider VRS, look at (1): the shader is the one holding the ray data you want to submit for intersection testing. (A toy sketch of this idea follows below.)
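
    As a toy illustration of that idea (entirely hypothetical - neither the enum nor the function is a real API), a per-pixel ray budget could be derived from the VRS shading rate plus a UI mask:

    Code:
    #include <cstdint>

    // Coarseness of shading for a tile/pixel, in the spirit of VRS Tier 2.
    enum class ShadingRate : std::uint8_t { Rate1x1, Rate2x2, Rate4x4 };

    // Hypothetical policy: no rays under the UI, fewer rays where shading
    // is already coarse, more where the image is rendered at full detail.
    int rays_per_pixel(ShadingRate rate, bool under_ui) {
        if (under_ui) return 0;                   // the UI will cover it anyway
        switch (rate) {
            case ShadingRate::Rate1x1: return 2;  // full-rate region
            case ShadingRate::Rate2x2: return 1;  // coarser region
            case ShadingRate::Rate4x4: return 0;  // coarsest region: skip RT
        }
        return 1;                                 // defensive default
    }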

    This, combined with the new flexibility found in DXR 1.1, gives us a scenario where, in your standard rendering pathways for a compute shader, you can freely inline ray tracing or call it through ExecuteIndirect, get the results you need, and continue forward without having to go back to the CPU.

    A bit of an overview of the ray tracing algorithm in question, for those who need a refresher.
    I asterisked these points because, when you consider (5) in the patent, it's up to the shader after each cast to determine how to traverse to the next node. That should mean developers can control when they want to stop bouncing, how many rays they want bounced, and possibly what distance a ray should travel before stopping. It appears to me that this may make it easier to handle your zones as determined by VRS (see the bounce-policy sketch below).
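
    Continuing the earlier sketch (this reuses the Ray, Node and trace() definitions from the code above, and the policy numbers are made up), step (5) at the bounce level might look like this - after every cast, the shader decides whether another bounce is worth it:

    Code:
    struct Policy { int max_bounces; float max_dist; float min_weight; };

    // Shader-controlled termination: bounce count, ray length and a minimum
    // contribution are all ordinary values the developer can pick per zone.
    float shade(Ray r, const std::vector<Node>& bvh, int root, const Policy& p) {
        float radiance = 0.0f, weight = 1.0f;
        for (int bounce = 0; bounce < p.max_bounces; ++bounce) {
            r.t_max = p.max_dist;              // cap the ray length per policy
            float t = trace(r, bvh, root);     // the round trip sketched above
            if (t >= p.max_dist) break;        // missed everything: stop
            radiance += weight * 0.5f;         // toy surface contribution
            weight   *= 0.5f;                  // toy attenuation per bounce
            if (weight < p.min_weight) break;  // not worth another bounce
            for (int a = 0; a < 3; ++a)        // move the origin to the hit
                r.o[a] += r.d[a] * t;          // (a real shader would also pick
        }                                      //  a new bounce direction here)
        return radiance;
    }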
     
  17. iroboto

    The TL;DR of the above: developers are looking for the flexibility to do what they need, as per @sebbbi's commentary:



    He can use his current setup and do his primary rays using this cone-based method, then inline / ExecuteIndirect dispatch rays directly to access the results from the BVH structure, and then continue on from there with other features.
    He is no longer bound to doing one thing or the other; developers will be able to freely mix and match as they like. There was some discussion about DXR needing the ability to directly access the BVH structure, but perhaps that's no longer needed. Let developers do what they want, how they want, and if they want results from the BVH tree, they can inline/ExecuteIndirect a call from the results (of their primary rays/cones) and retrieve the new intersection values from the BVH. (A hypothetical sketch of this mix-and-match flow follows.)
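
    A hypothetical sketch of that mix-and-match flow, in the same toy C++ as before - cone_trace_primary() and the Hit struct are stand-ins for the engine's own code, not any real API, and trace() is the traversal loop sketched a few posts up:

    Code:
    #include <vector>

    struct Ray  { float o[3], d[3], t_max; };
    struct Node { float lo[3], hi[3]; int child[2]; float leaf_t; };
    struct Hit  { bool wants_secondary; Ray secondary; };

    // Declarations only - stand-ins for the engine's existing primary pass
    // and for the BVH traversal from the earlier sketch.
    Hit cone_trace_primary(int x, int y);
    float trace(const Ray& r, const std::vector<Node>& bvh, int root);

    // Per pixel: keep the existing cone-traced primary visibility, and only
    // where it asks for one, do an inline BVH query for the secondary ray -
    // with no round trip to the CPU in between.
    float shade_pixel(int x, int y, const std::vector<Node>& bvh, int root) {
        Hit h = cone_trace_primary(x, y);      // engine's current technique
        if (!h.wants_secondary) return 0.0f;   // nothing to refine here
        return trace(h.secondary, bvh, root);  // inline query against the BVH
    }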
     
  18. upnorthsox

    That's not what he's saying. What he's saying is that he, like everyone else, really needs a HW solution for BVH, but absent that, don't put partial solutions in the way that block him from implementing his own workaround.
     
  19. TheAlSpark

    Hmm... something something Compute Tunneling (to help mitigate such penalties)?

    Whitepaper
     
  20. DegustatoR

    I'm not entirely sure what you're trying to say as a reply to what I posted.

    The fact that the BVH traversal acceleration unit may be located inside a "texture processor" in RDNA2 doesn't make it any more or less flexible than what Turing has - and please note that we know nothing about the placement of the RT h/w in Turing, as NV hasn't actually disclosed this information.

    The fact that hit evaluations seem to happen, and data paths seem to be used, differently than in Turing doesn't mean that Turing can't do this in a similar fashion - in fact, NV has already officially confirmed that Turing h/w will support DXR 1.1.

    Thus far, from what can be discerned from the patent (and we don't really know how this will actually translate into h/w; AMD files a lot of GPU patents, and not all of them get an obvious physical implementation), AMD's approach trades general RT performance for die area savings.

    I also don't see what VRS has to do with this, as VRS is a modification of the MSAA/depth-testing h/w, which typically resides in the ROPs/RBEs (and thus is also not part of the texturing h/w). The question of what currently happens on Turing when you trace a ray for a "VRSed" pixel is an interesting one, and I hope someone more knowledgeable can answer it here - however, Turing's h/w approach doesn't limit ray casting to full screen or to any specific number of rays per screen pixel, which was already shown in BFV's RT optimizations, which vary the RPP count depending on the content of a scene.

    AFAIK this is inaccurate. The only part which is a "black box" is the traversal of the BVH structure by a ray, which is cast by the application in any way it sees fit. This will most likely be similar in AMD's approach from the patent.
     