AMD: Navi Speculation, Rumours and Discussion [2019]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    An RDNA SIMD32 register file has twice the register file capacity (in terms of KB) of a GCN SIMD16 register file.
    Since RDNA's register IDs correspond to 32 work-items, a register is individually half the length of the 64-wide register of GCN.

    The register file has 4x as many individually addressable registers, although they are half-size. If a single CU in RDNA were asked to support the same number of work-items as GCN (2x Wave32 wavefronts or 1 Wave64 per GCN wavefront without certain optimizations), it would have the same register capacity per work-item.
    I'm trying to find the slide or reference that characterized RDNA as slightly improving register pressure, since I didn't see it being considered "gone".
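    The capacity arithmetic above can be checked in a few lines. This is a minimal sketch that assumes 64 KB of VGPRs per GCN SIMD16, 128 KB per RDNA SIMD32, four SIMDs per GCN CU, two SIMD32s per RDNA CU, and 32-bit lanes. `regfile_stats` is a hypothetical helper written for illustration.

```python
# Register-file arithmetic for the comparison above.
# Assumptions: 64 KB of VGPRs per GCN SIMD16, 128 KB per RDNA SIMD32,
# four SIMDs per GCN CU, two SIMD32s per RDNA CU, 32-bit lanes.
BYTES_PER_LANE = 4


def regfile_stats(kib_per_simd, simds_per_cu, wave_width):
    """Return (registers per SIMD, bytes per register, register bytes per CU).

    A register ID covers one full wavefront, so its size is wave_width lanes.
    """
    bytes_per_simd = kib_per_simd * 1024
    reg_bytes = wave_width * BYTES_PER_LANE
    regs_per_simd = bytes_per_simd // reg_bytes
    return regs_per_simd, reg_bytes, bytes_per_simd * simds_per_cu


gcn = regfile_stats(kib_per_simd=64, simds_per_cu=4, wave_width=64)
rdna = regfile_stats(kib_per_simd=128, simds_per_cu=2, wave_width=32)

print("GCN  SIMD16:", gcn)   # (256, 256, 262144)
print("RDNA SIMD32:", rdna)  # (1024, 128, 262144)

# Same bytes per CU, so holding the same number of work-items per CU
# leaves the same register capacity per work-item.
print("per-CU capacity equal:", gcn[2] == rdna[2])
```

    Under these assumptions RDNA has 4x as many register IDs per SIMD, each half the size, while per-CU capacity is unchanged.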

    The finer granularity of Wave32 might spare shaders that are particularly poor at utilizing 64 threads from having to allocate a full 64-wide wavefront context.
    Wave64 has a sub-vector execution loop that takes advantage of how Wave64 executes as 2x Wave32 instructions: it splits execution into two Wave32 halves and treats each half as a single iteration of an internal loop.
    Registers used for results internal to that loop can be assigned to the same Wave32 register ID, since the software knows the intermediate results of the two halves are separated in time, saving some capacity when that mode is used.
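    A toy accounting of the sub-vector loop savings described above. This is a simplified model of the post's description, not the hardware's documented allocation rules: it assumes every Wave64 register spans two Wave32-sized slots, while temporaries confined to the loop need only one. The helper name and the register counts are hypothetical.

```python
def wave64_slots(persistent, loop_temps, subvector_loop):
    """Count Wave32-sized register slots one Wave64 wavefront needs.

    Simplified model (assumption, not the hardware's documented allocation
    rules): every Wave64 register spans two Wave32-sized slots, except that
    temporaries confined to a sub-vector loop can share a single slot,
    because the low and high halves run the loop body one after the other.
    """
    temp_slots = loop_temps if subvector_loop else 2 * loop_temps
    return 2 * persistent + temp_slots


# Hypothetical shader: 24 registers live across the loop, 16 used only inside it.
normal = wave64_slots(persistent=24, loop_temps=16, subvector_loop=False)
split = wave64_slots(persistent=24, loop_temps=16, subvector_loop=True)
print(normal, split)  # 80 64
```

    The saving scales only with the loop-internal temporaries, which matches the point that it applies "if that mode is used".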

    Perhaps someone has parsed the ISA doc better than I have, but I didn't see any indication that the LDS allocation values in the wavefront context were extended to give wavefronts the ability to allocate more LDS.

    The issue latency is 1/4 of GCN's. For scalar forwarding, there appears to be a 2-cycle latency, which is better, though not relevant as far as scalar register file footprint goes. The registers are scalar, so there are no savings in register width. RDNA has also shifted to a static 128 scalar registers per wavefront, which in capacity terms is worse than GCN, though this is rendered moot by the architecture having enough register file space to hard-wire the allocation.

    The vector result forwarding latency is worse than GCN, with RDNA needing 5 clock cycles before a dependent instruction can issue versus GCN's 4.
    Depending on what the limiting factor is for getting from the point of a temp register being written and its being consumed, there would be cases where a temp generated ahead of a serial chain would live longer with RDNA than GCN.
    Wave64 can provide register savings, within the limits of sub-vector mode. If running in Wave32 on a workload that can readily use up 64 work-items, needing 2x the wavefronts to get the same number of work-items leaves overall occupancy similar.
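    Both points above reduce to simple arithmetic. This is a minimal sketch using the 4- and 5-cycle dependency latencies quoted in the post, a 128 KB SIMD32 register file, and a hypothetical shader using 64 VGPRs per work-item. It ignores wave-slot limits, allocation granularity, and any stall other than the dependency latency.

```python
# Figures quoted in the post: cycles before a dependent vector instruction can issue.
GCN_DEP_LATENCY = 4
RDNA_DEP_LATENCY = 5


def temp_lifetime_cycles(chain_length, dep_latency):
    """Cycles a temp produced just before a serial dependency chain stays live
    until it is consumed after the chain, assuming no other stalls."""
    return chain_length * dep_latency


def resident_work_items(regfile_bytes, vgprs_per_item, wave_width):
    """(waves, work-items) that fit by register capacity alone.

    Ignores per-SIMD wave-slot limits and allocation granularity.
    """
    bytes_per_wave = vgprs_per_item * 4 * wave_width  # 32-bit registers
    waves = regfile_bytes // bytes_per_wave
    return waves, waves * wave_width


print(temp_lifetime_cycles(10, GCN_DEP_LATENCY), temp_lifetime_cycles(10, RDNA_DEP_LATENCY))  # 40 50
# 128 KB SIMD32 register file, hypothetical shader using 64 VGPRs per work-item:
print(resident_work_items(128 * 1024, 64, 32))  # (16, 512)
print(resident_work_items(128 * 1024, 64, 64))  # (8, 512)
```

    Wave32 needs twice the wavefronts, but the number of resident work-items comes out the same.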
     
    w0lfram, Silent_Buddha, Gubbi and 2 others like this.
  2. bridgman

    Newcomer Subscriber

    Joined:
    Dec 1, 2007
    Messages:
    58
    Likes Received:
    102
    Location:
    Toronto-ish
    Section 10.3 is the closest I remember seeing:

    In WGP mode, the waves are distributed over all 4 SIMD32’s and LDS space may be allocated
    anywhere within the LDS memory. Waves may access data on the "near" or "far" side of LDS
    equally, but performance may be lower in some cases.
     
    Lightman and JoeJ like this.
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    I saw where it references the ability for wavefronts on either side of the dual-CU to access data in the other half--subject to unspecified performance limits.
    I interpreted the statement about allowing the option to double the accessible LDS for certain workgroups to mean allowing an allocation twice the size of what a GCN wavefront could allocate. However, the sections I saw that referenced allocation like M0 or the LDS_ALLOC didn't appear to be different. Some of the other items like offset values for addressing also didn't change in stride or length, but I may have missed some change that would allow a larger allocation or allow a wavefront to access the portions of a larger allocation.
     
  4. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    558
    Likes Received:
    644
    Yeah, that's what I thought, but it's hard to be sure. Being no hardware guy, I tend to ignore details that won't affect programming. For example, I never cared that GCN has 16-wide SIMDs and just assume 64 threads in lockstep. Also, the scalar/vector instruction cycle ratio is not important to know because it never affects any decisions; there are just no options.
    I remember a lengthy discussion with another dev who thought RDNA would be twice as fast in general because of 32-wide vs 16-wide SIMDs. I ended up saying that the 32-wide SIMD takes twice the time, and that we would see twice the TF numbers if this were true.
    Trying to understand hardware can be quite difficult nowadays :)
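    The TF argument above can be checked with a few lines. This is a minimal sketch assuming four SIMD16s per GCN CU, two SIMD32s per RDNA CU, and one FMA (2 FLOPs) per lane per clock. The 40-CU, 1.9 GHz part is hypothetical. Lanes per CU come out equal, so peak TFLOPS match at the same CU count and clock; a general 2x speedup would have shown up in the TF numbers.

```python
def lanes_per_cu(simds_per_cu, simd_width):
    return simds_per_cu * simd_width


def peak_fp32_tflops(cus, simds_per_cu, simd_width, clock_ghz):
    """Peak FP32 TFLOPS, counting one FMA (2 FLOPs) per lane per clock."""
    return cus * lanes_per_cu(simds_per_cu, simd_width) * 2 * clock_ghz / 1000


gcn_lanes = lanes_per_cu(4, 16)   # four SIMD16s per GCN CU
rdna_lanes = lanes_per_cu(2, 32)  # two SIMD32s per RDNA CU
print(gcn_lanes, rdna_lanes)  # 64 64

# Hypothetical 40-CU part at 1.9 GHz: identical peak either way.
print(peak_fp32_tflops(40, 4, 16, 1.9) == peak_fp32_tflops(40, 2, 32, 1.9))  # True
```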

    I also don't think it's possible yet, but it might become an option requiring few changes. (Being just a matter of driver work sounds too good to be true, I guess.)
    The overall amount of LDS and registers seems just right most of the time, but many shaders still do not use LDS at all while others would benefit a lot from having more, and pairing CUs seems like an opportunity.

    I think it would make sense to be able to choose WGP mode manually (also on desktop via extensions). The driver likely can't be right all the time, and there might be some wins for free.
     
  5. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    227
    Likes Received:
    99
    @CarstenS Did you write down the Beyond3D suite data somewhere?
     
  6. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,073
    Likes Received:
    4,650
    Navi 12 and Navi 14 could be released very shortly:

    https://www.pcgamesn.com/amd/navi-12-navi-14-rx-5600-rx-5500-october-15-launch

    It seems AMD made an "urgent" commit to the Mesa 3D library for Navi 12's PCI ID. The next release is October 15 so there's speculation saying if they didn't want to wait until the next Mesa driver release it's because the cards are coming out before that.

    I'm not familiar with Linux driver stuff, and I also don't know of any event between now and October 15 where AMD would announce the new graphics cards.
    OTOH, mid/low-range cards don't always get released with great formal announcements.
     
    Per Lindstrom likes this.
  7. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,814
    Likes Received:
    5,915
    Location:
    ಠ_ಠ
    I don't suppose they'd also be mobile chips i.e. Surface Event on Oct 2nd?
     
    #1407 TheAlSpark, Sep 20, 2019
    Last edited: Sep 20, 2019
  8. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,342
    Likes Received:
    5,313
    Hmmm, if a Navi card releases in the 100-150 USD range, I'd be interested. It's about time I finally replaced and retired my REALLY old Radeon 5450. :p That said, I'm not sure any of these upcoming cards are planned to go that low at launch.

    Regards,
    SB
     
  9. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    774
    Likes Received:
    202
    On a related note, the rumored 16"–16.5" MacBook Pro is supposedly going to be launched later this year. Could one of the two upcoming Navi chips also be used for this product?

    The 15" MacBook Pro already has Vega Mobile as an option, so that would be a lower bound for the performance of the GPU in the 16"–16.5" MBP.
     
  10. Leovinus

    Newcomer

    Joined:
    May 31, 2019
    Messages:
    29
    Likes Received:
    8
    Location:
    Sweden
    Certainly would be nice if Apple were to switch out Polaris for Navi. Vega Mobile is a "high-end" upgrade option that doesn't quite feel like it's worth it unless you require the compute improvements. Not to mention that I feel Apple should have just replaced the Polaris chips with Vega wholesale instead of offering this upgrade to begin with. But I assume HBM costs and profit margins had something to do with that...
     
  11. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,073
    Likes Received:
    4,650
    Lightman likes this.
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,254
    Likes Received:
    1,937
    Location:
    Finland
    Considering they already launched 600-series for OEMs with Polaris this year I doubt they'd do that. Also RX 5500 https://gfxbench.com/device.jsp?ben...ws&api=gl&D=AMD+Radeon+RX+5500&testgroup=info
     
  13. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,073
    Likes Received:
    4,650
    Lightman likes this.
  14. PSman1700

    Regular Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    607
    Likes Received:
    154
  15. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,073
    Likes Received:
    4,650
  16. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    546
    Likes Received:
    182
    There is no connection whatsoever with the die size choices of the Navi family chips and the nextgen consoles. What do you mean?
     
    Kaotik likes this.
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,254
    Likes Received:
    1,937
    Location:
    Finland
    If we assume Navi 14 is a midranger and Navi 12 is bigger than Navi 10, what's the 5300 made of? I know some have suggested Polaris, but I doubt they'd do that after already releasing low-end Polaris cards as the 600-series this year.
     
  18. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,814
    Likes Received:
    5,915
    Location:
    ಠ_ಠ
    1 Extreme dual-compute unit ?

    :p

    -----
    Maybe just cut the Navi 14 in half and call it a day? Assuming there are 2 NSEs & 4x3 DCUs on there, i.e. 5300 = 1 NSE & 2x3 DCUs

    ^if 5700 is defined as 2NSEs & 4x5 DCUs (40CUs)

    a miserable little pile of acronyms
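    The CU counts behind those guesses are plain multiplication. A minimal sketch, reading "4x5" as four groups of five dual-compute units with two CUs per DCU, which reproduces the 40 CUs quoted for the 5700; the Navi 14 and 5300 layouts are the post's speculation, and `cu_count` is a hypothetical helper.

```python
def cu_count(shader_arrays, dcus_per_array):
    """CUs implied by a layout of shader arrays holding dual-compute units (2 CUs each)."""
    return shader_arrays * dcus_per_array * 2


print(cu_count(4, 5))  # 40: "2NSEs & 4x5 DCUs", the 5700 reading
print(cu_count(4, 3))  # 24: guessed Navi 14, "2 NSEs & 4x3 DCUs"
print(cu_count(2, 3))  # 12: half of that, the guessed 5300
```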
     
    #1418 TheAlSpark, Sep 28, 2019
    Last edited: Sep 28, 2019
  19. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    346
    Likes Received:
    89
    A higher-end chip already, without nearly as much fanfare as they put in for the midrange one? I can see a half Navi 10 getting released, an RX 580 equivalent for cheap before Christmas, and it wouldn't need a "big press buildup!" But a higher-end one right now seems odd.

    As for the low end, isn't that the leaked 12/24 CU chip? Assuming the arch isn't cache/bandwidth bound relative to its CU count, that could be good value. Even if it is bandwidth bound, there's already higher-clocked GDDR6 to buy.
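    "Bandwidth bound compared to CUs" can be put in numbers as bandwidth per CU. A minimal sketch: the Navi 10 figures are the RX 5700 XT's 256-bit bus at 14 Gbps (448 GB/s) over 40 CUs, while the 24-CU chip on a 128-bit bus is purely hypothetical, as are both helper names.

```python
def dram_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak DRAM bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


def bandwidth_per_cu(bus_width_bits, data_rate_gbps, cus):
    return dram_bandwidth_gbs(bus_width_bits, data_rate_gbps) / cus


print(bandwidth_per_cu(256, 14, 40))  # Navi 10 as in the RX 5700 XT: 448 GB/s over 40 CUs
print(bandwidth_per_cu(128, 14, 24))  # hypothetical 24-CU chip on a 128-bit bus
print(bandwidth_per_cu(128, 16, 24))  # same hypothetical chip with faster GDDR6
```

    Faster GDDR6 raises bandwidth per CU linearly with the data rate, which is the lever the post points to.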
     
  20. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    523
    Likes Received:
    240
    But there was no fanfare for midrange one either.

    They just showed all the relevant info at E3 and that's about it.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.