AMD: Navi Speculation, Rumours and Discussion [2019]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

  1. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    910
    Interestingly, Lisa Su said at CES that heterogeneous computing was the answer to the slowing down of Moore's Law. I wonder whether that vision goes down to the sub-CU level.
     
  2. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,530
    Likes Received:
    875
    I agree. Evaluating the pros and cons is not a simple problem: you might save power by evaluating transcendentals in special-purpose units, but FMA workloads at iso-performance would consume less power (power ≃ clock³).
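
    A rough sketch of the cube-law argument, under assumptions that are not stated in the post: dynamic power per lane is taken to scale as clock³ (voltage rising roughly linearly with frequency), throughput is taken as lanes × clock, and the function names are purely illustrative. It compares a baseline against a design with twice the FMA lanes running at half the clock for the same throughput.

```python
def clock_for_throughput(throughput: float, lanes: float) -> float:
    """Clock needed to hit a target throughput, assuming throughput = lanes * clock."""
    return throughput / lanes


def relative_power(lanes: float, clock: float) -> float:
    """Relative dynamic power, assuming each lane costs clock**3 (hypothetical model)."""
    return lanes * clock ** 3


# Same normalized FMA throughput (1.0) delivered two ways.
baseline = relative_power(lanes=1.0, clock=clock_for_throughput(1.0, lanes=1.0))
wider = relative_power(lanes=2.0, clock=clock_for_throughput(1.0, lanes=2.0))

print(baseline)  # 1.0
print(wider)     # 0.25
```

    Under this model, area spent on extra FMA lanes lets FMA-heavy work run at a lower clock for the same performance, which is the trade-off against spending that area on special-purpose units. Leakage and fixed voltage floors are ignored.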

    Cheers
     
  3. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    594
    Likes Received:
    298
    Is HSA still alive? I haven't actually heard them talk much about it for a long time.
     
  4. BoMbY

    Newcomer

    Joined:
    Aug 31, 2017
    Messages:
    68
    Likes Received:
    31
    ROCm is a pretty alive and well "HSA Compliant Runtime and Driver for AMD RADEON GPU’s"
     
  5. yuri

    Newcomer

    Joined:
    Jun 2, 2010
    Messages:
    179
    Likes Received:
    147
    HSA evolved from "enabling GPGPU on your APU for your Excel sheet and Facebook Video chat" to old fashioned AI/HPC stuff.
     
    Ike Turner likes this.
  6. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,037
    Likes Received:
    4,615
    Perhaps because apps that benefit from GPU compute already implemented OpenCL and CUDA half a decade ago, and nowadays, with 4+ cores being the norm, the biggest bottleneck in 99% of daily use is that pesky JavaScript code in the browser or other single-threaded stuff.

    Facebook video chat I'd guess uses the fixed function video codec units, so I don't know where that came from.
     
  7. Ferman

    Joined:
    Sep 30, 2018
    Messages:
    1
    Likes Received:
    0
    Rumours are Navi needs a re-spin. Maybe it had some aggressive changes to improve clock scaling.
     
  8. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,037
    Likes Received:
    4,615
    Redgamingtech, the website that leaked Radeon VII two weeks prior to launch, has a new rumor/report on Navi.
    It doesn't say if the source is the same as the one from December, though it does say it's a source that was proven true in the past.

    http://www.redgamingtech.com/navi-a...-july-more-powerful-navi-launching-next-year/


    So to summarize:

    - Low to Midrange Navi chips replace Polaris 11 up to Vega 10, announced in June to launch in July
    - High-end Navi that replaces (and hopefully improves upon) Radeon VII to launch in 2020


    Then we have Arcturus also announced for 2020, though that could have slipped by now.


    It could be that AMD is targeting a "tick-tock" of sorts, with GPU architectures at 6-month intervals:
    Q1'19: 7nm high-end Vega (tock)
    Q3'19: 7nm low/mid-end Navi (tick)
    Q1'20: 7nm+ high-end Navi (tock)
    Q3'20: 7nm+ low/mid-end Arcturus (tick) - pure speculation
    Q1'21: 5nm high-end Arcturus (tock) - pure speculation


    That would require perfect execution, obviously, which we shouldn't expect from AMD.
    7nm+ is TSMC's 7nm EUV, where yields should be significantly better, so chips wouldn't necessarily clock higher or be smaller but could be significantly larger.
     
  9. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    169
    Likes Received:
    90
    I don't think AMD will release such a chip, especially when they announce a next-gen GPU in 2020. But that doesn't prevent big Navi rumours from re-emerging over and over again...
     
  10. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,037
    Likes Received:
    4,615
    Large Navi would be the old architecture acting as a pipe cleaner for 7nm EUV, like Vega 20 is for 7nm DUV.
    Arcturus would be the new architecture on the by-then-known 7nm EUV.

    As I suggested, a tick or tock every 6 months would be extremely optimistic for AMD, let alone RTG.
     
  11. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    169
    Likes Received:
    90
    Yes, the same people called Polaris a "pipe cleaner" for Vega 10, although this tick-tock never materialized. Then the same was said about Vega 20 being a "pipe cleaner" for Navi, and Navi for next gen, etc. The GCN architecture is almost 9 years old, so why should AMD bet its future on it? AMD desperately needs something fresh, and old GCN doesn't seem like the right way to achieve it. I hope that small Navi is the last GCN GPU from AMD...
     
    #51 del42sa, Jan 17, 2019
    Last edited: Jan 17, 2019
  12. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    12,651
    Likes Received:
    8,958
    Location:
    Cleveland
    Why would it need a respin when it's being designed by Sony? Is Sony not as great as some claim?


    :runaway:


    Sorry, I couldn't resist. Feel free to delete the post after the lulz.
     
    entity279, AlphaWolf and Malo like this.
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,135
    Likes Received:
    2,935
    Location:
    Well within 3d
    It would take an encoding change for the architecturally single-issue ISA, since the compiler needs to determine whether instructions can go to the primary or core slots instead of the hardware checking for dependences.
    Perhaps that's a matter of yet another instruction format, which has precedent for GCN. Whether that can fit in the current instruction lengths, or whether it threatens to require a new length, is unclear. (It would have company in Volta, which has gone to 128-bit instructions per the "Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking" paper. https://arxiv.org/pdf/1804.06826.pdf)

    After that, it would seem as if the CU would treat the two lanes as separate instruction queues that are drained in the same fashion as before.

    The data output cache on the other side might be a somewhat larger departure. It seems like the hardware does a bit more checking in order to make a hit in the cache, or perhaps the decode and queuing process in the front end has a small table of cache slots and last-used architectural registers to override the source operands of subsequent instructions.

    From the ISA docs, I do not see any reference to wait states for transcendental operations. If such an instruction actually did require 4 vector cycles to fully output the results for all waves, presumably co-issue would inject the risk of a subsequent fast instruction being able to source from the slow instruction's output register several cycles ahead of the writeback.
    The architecture has various other places where it does not interlock, and the wait counts do not control for within-VALU dependences. Rather, the vestigial references to a VALUCNT in old docs may point to a time where the possibility was brought up but discarded. The more straightforward method that seems consistent with the ISA is that the architecture won't issue until the prior instruction has completed for these longer-duration instructions.
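
    A minimal sketch of the "won't issue until the prior instruction has completed" reading, assuming an ordinary vector op holds the issue slot for one vector cycle and a transcendental holds it for four (the hypothetical figure from the paragraph above, not a documented number). The names and the instruction mix are illustrative only.

```python
FAST = 1  # assumed: vector cycles an ordinary VALU op occupies
SLOW = 4  # assumed: vector cycles a transcendental occupies (hypothetical)


def blocking_issue_cycles(durations):
    """Total vector cycles when each instruction must complete before the next issues."""
    return sum(durations)


# Illustrative stream: three ordinary ops and one transcendental.
stream = [FAST, FAST, SLOW, FAST]
total = blocking_issue_cycles(stream)

print(total)                # 7
print(len(stream) / total)  # instructions per vector cycle for this mix
```

    In this model the slow instruction costs throughput rather than correctness: no later instruction can read its destination register early, so no wait-state encoding is needed.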

    It's been some time since Kepler, but my recollection is that the impression of AMD's architectures consuming more general purpose FLOPs in mixed-use scenarios goes at least as far back as Tahiti, and possibly Cayman. (edit: VLIW5 had an AMD FLOP vs Nvidia FLOP debate as well.) The question was whether AMD's chip would have enough spare FLOPs to overcome the impact of the higher-cost special function instructions.
    Other than perhaps Fermi's hobbled start, the impression with the VLIW GPUs was that AMD FLOPs weren't as meaningful for graphics as Nvidia FLOPs, and that's mostly held true for GCN.

    The exemplar image in the patent at least doesn't draw sufficient paths in the operand delivery from the register file, with just 4 reads overall for the ALUs and vector IO. If this is combined with the bandwidth from the destination operand cache, the ALU section sees a possible peak of 6 operands sufficient for 2 FMAs. It doesn't seem unreasonable to consider this close enough to 2x peak, given that many CPUs have needed the bypass network to compensate for a register file with too few ports for all the ALUs, and Nvidia's operand reuse cache does compensate for cases where its vector register bandwidth cannot be fully used.
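
    The operand arithmetic above can be sketched as follows, assuming all four register-file reads go to the ALUs (the post notes they are shared with vector IO) and that the destination operand cache supplies the remaining two operands, which is inferred from the 6-operand peak rather than stated. The function and constant names are made up for illustration.

```python
FMA_SOURCE_OPERANDS = 3  # a * b + c reads three source operands


def fmas_fed_per_cycle(rf_reads: int, cache_reads: int, fma_alus: int = 2) -> int:
    """How many FMA-capable ALUs can be fully supplied with operands in one cycle."""
    operands = rf_reads + cache_reads
    return min(fma_alus, operands // FMA_SOURCE_OPERANDS)


rf_only = fmas_fed_per_cycle(rf_reads=4, cache_reads=0)
with_cache = fmas_fed_per_cycle(rf_reads=4, cache_reads=2)

print(rf_only)     # 1
print(with_cache)  # 2
```

    With the register file alone only one FMA can be fed, so the 2x peak depends on the cache hitting, much like a CPU bypass network covering for too few register ports.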

    One wrinkle to this going from the patent is that the operand cache interjects itself in the way of the forwarding network to the vector IO bus, potentially requiring some extra tracking or wait states since the cache does not feed into the bus used by the ALU and IO sections. It's a local ALU bus or write to the register file, so an export or memory read dependent on an operand in the cache may force an immediate writeback or require some additional checks of the mapped register list. There are some existing short wait states for some register hazards like this already, though this cache may make for longer explicit delays without pipeline interlocking.

    If I recall correctly, VLIW4 is where the T-unit was broken up and the special-function elements distributed among the remaining four ALUs. An operation would cascade from one lane to the next over four cycles, with successive approximations or lookups occurring each time. GCN's lane orientation flipped things by 90 degrees, but it's possible that what it's doing for special instructions is from that lineage. The quad-based arrangement and 4-way crossbar available between the 4 ALUs in a quad for some instructions may fit with GCN acting like VLIW4. Otherwise, every lane would need the full complement of lookup tables and miscellaneous hardware, incurring an area cost while not realizing potentially significantly higher throughput if a full transcendental unit were in every lane.
    Like the patent's side ALUs, the VLIW5 T-unit didn't have a corresponding set of hookups into the operand network, requiring unused operand cycles or shared operands with neighboring ALU slots. Unlike the T-unit, the side ALUs lack a multiplier and so cannot on their own perform complex operations. They're less generalist than the units that preceded them. Instead, it seems like the patent has two core ALUs with FMA capability, and then in some more complex scenario the side ALU can pair with one of them to perform instructions that require a full ALU.
     
    #53 3dilettante, Jan 17, 2019
    Last edited: Jan 17, 2019
    w0lfram and vipa899 like this.
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,135
    Likes Received:
    2,935
    Location:
    Well within 3d
    If there's a citation for the rumors of a Navi respin, perhaps the Navi thread could use it. However, aggressive changes to improve clock scaling are not what respins are for. That would be a re-implementation or core revision.
     
    iMacmatician, ToTTenTranz and BRiT like this.
  15. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    702
    Likes Received:
    588
    Location:
    55°38′33″ N, 37°28′37″ E
    I would expect Navi to replace Polaris, but Radeon VII as it stands is a higher Enthusiast-level - so if high-end Navi happens, it would rather stand at 'Vega 40' (which never materialized) or would replace Vega 56.

    'Replace' meaning 'to offer a similar or higher performance at the same or lower price'.

    Six-month interval would be too early to introduce a completely new graphics architecture. The cycle of updates has been on a much slower pace in the last 3 years, not just for AMD but also for Nvidia.

    I'd think Navi would be both a new/updated architecture and a discrete mid-range 7 nm chip implementing this architecture.
    Thus Arcturus would be an implementation of Navi (or post-Navi) architecture for the high-end and enthusiast levels, using big caches and HBM3 memory, and won't be available in low or mid-end parts.

    Polaris => Navi
    Vega => Arcturus

    Arcturus coming 12 months after Navi? On a 5 nm EUV node? Nah, you've gotta be kidding...:cool2:
     
    #55 DmitryKo, Jan 17, 2019
    Last edited: Jan 17, 2019
  16. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,037
    Likes Received:
    4,615
    Arcturus actually was planned to release in late 2019 back in roadmap slides from mid-2017 (back then called "Next Gen"). It's Navi that's awfully late.
    If Navi and Arcturus are being developed by 2 distinct teams in parallel, it's not impossible that the second isn't as late as the first. Besides, Arcturus in mid-2020 is already a 6-month delay from what was initially planned.

    As for 5nm, both Samsung and TSMC are planning risk production in mid 2019 with high volume production early 2020.
    It all depends on how much volume the smartphone companies will require, considering the market is slowing down significantly.

    Regardless, there's no indication of Arcturus being 5nm. That was pure and super optimistic speculation on my part, as I mentioned.
     
  17. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    702
    Likes Received:
    588
    Location:
    55°38′33″ N, 37°28′37″ E
    I would love to see AMD get back on track with predictable timeframes, so I could finally unsubscribe from all these 'AMD Execution [201x]' threads. Right now we have yet another 9-month delay for Navi though. So I will believe it when I see it.
     
  18. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,037
    Likes Received:
    4,615
    9 months from when?
    Navi was supposed to have an early 2018 release.
    I.e., it was initially planned as a 10nm chip at best, 14nm at worst.
     
  19. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    702
    Likes Received:
    588
    Location:
    55°38′33″ N, 37°28′37″ E
    From now, if E3 announcement rumor is true - I'd expect them to announce September or October availability.
     
    McHuj likes this.
  20. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    339
    Likes Received:
    89
    Sounds legit at least. TSMC's 7nm+ is already in production; I think AMD are waiting for the generation of 7nm after that (7nm C, I think it's called?), which has even more EUV layers (almost as many as 5nm), allowing faster and cheaper production and cheaper tapeout without having to learn a new process. I.e., it is better for large chips.

    And of course Computex reveals are nothing new; that seems far more likely than E3. Are we getting 2 GPUs or 1? Scaling 1 GPU from a 15-watt laptop part up to a 2070 competitor seems suspect. Shouldn't it be 2 GPUs, one at 20 CUs and the other at 40, or something like that?

    As for Arcturus, I'd not expect that till AMD said it, 2021. Next year will, maybe, see 5nm for smartphone SOCs and such, and perhaps a super expensive corporate card like the Vega ML cards. More relevantly for Arcturus I'd expect whatever the PS5 chip is to be some sort of early release half Navi half Arcturus like architecture, just like the PS4 Pro had FP16 support before Vega came out.

    I.e., whatever Arcturus is, we'll probably get a preview at next year's E3 or whenever the PS5 preview will be.
     
    #60 Frenetic Pony, Jan 17, 2019
    Last edited: Jan 18, 2019
