AMD: Navi Speculation, Rumours and Discussion [2017-2018]

Discussion in 'Architecture and Products' started by Jawed, Mar 23, 2016.

Thread Status:
Not open for further replies.
  1. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    520
    Likes Received:
    239
    More R&D money allows launching more dies at once, or optimizing designs better.
    Money can't save dead architectures; Otellini's Intel had all the money in the world, yet it didn't save Larrabee. At all.
     
  2. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,996
    Likes Received:
    4,570
    Yays:

    1 - The only relevant info we have about Navi is "scalability" and "next-gen memory". Next-gen memory can only be HBM3, HBM Low-Cost or GDDR6.
    2 - AMD has been making a fuss about Infinity Fabric and its scalability.
    3 - AMD has been making a fuss about ending the pursuit of monolithic chips in favor of several smaller chips on a single substrate, because it's a whole lot cheaper to make overall.
    4 - Vega already has Infinity Fabric in it with no clearly stated reason, so they could be testing the waters for implementing a high-speed inter-GPU bus.
    5 - AMD doesn't have the R&D manpower and execution capability to release 4 distinct graphics chips every 18 months, so this could be their only chance at competing with nvidia on several fronts.



    Nays:

    1 - Infinity Fabric in Threadripper's/EPYC's current form doesn't provide enough bandwidth for a multi-chip GPU.
    2 - No official news or leaks about Navi have ever appeared that suggest it's a multi-chip solution.
    3 - Multi-chip GPU is probably really hard to make, and some like to think AMD doesn't do hard things. Ever.
    4 - nvidia released a paper describing a multi-GPU interconnect that would be faster and consume less power-per-transferred-bit than Infinity Fabric, and some people think this is grounds for nvidia being the first in the market with a multi-chip GPU. Meaning erm.. Navi can't be first.





    Main difference being that Intel can afford to take huge risks that turn into failures, like Larrabee and other crazy projects such as SoFIA's Atom SoCs with (old, low-end) ARM GPUs, and still make tens of billions every year, surpassing their own records YoY.

    But it's not like all is lost for Knights Landing. There's an entry in the Monero benchmark database claiming the Xeon Phi 7210 does 2770 H/s at 215W. And its price is awfully close to a Vega 56's nowadays
    ;)

    We're talking >10KH/s on a 700W rig.
     
    #342 ToTTenTranz, Dec 30, 2017
    Last edited: Dec 31, 2017
    T1beriu, Jawed and Grall like this.
  3. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    I sort of remember this, where is this from exactly?

    I think if AMD goes multi-chiplet with Navi it will only be for a Titan/x80ti competitor. Hopefully everything below will be single chip.
     
  4. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,996
    Likes Received:
    4,570
    Hotchips 2017:

    [image]

    I'd say 200-250mm^2, which is the range of Ryzen's Summit Ridge (213mm^2) and Polaris 10 (233mm^2).

    I think GPUs smaller than 200mm^2 are bound to go inside APUs eventually, at least on AMD's side.
     
  5. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Did anyone think about Navi being a family of chips, just scaling from low end to high end? In contrast to Vega and Fiji, which did not scale down.
     
  6. CaptainGinger

    Newcomer

    Joined:
    Feb 28, 2004
    Messages:
    92
    Likes Received:
    47
    If Vega isn't scalable downwards, what is going into Ryzen-based APUs?
     
    DrYesterday and ToTTenTranz like this.
  7. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,178
    Likes Received:
    581
    Location:
    France
    Yes, but scaling by multiplying dies/cores...? Seems cheaper that way, if they can make it work?
     
    ToTTenTranz likes this.
  8. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,555
    Likes Received:
    699
    Are we seeing history repeat itself? First HBCC as a distant cousin of TurboCache / HyperMemory.

    Now maybe a cousin of the Voodoo days of multiplying dies, this time on a substrate instead of on the card itself.

    Eerie. Even more so for AMD, since both techs died an ugly death.
     
    DavidGraham and Grall like this.
  9. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Something without HBM2?
     
  10. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    520
    Likes Received:
    239
    Ehm, isn't Fiji a double Tonga?
    And Vega is going down, it's already inside APUs and smaller dies are incoming.
     
  11. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Tonga had a different MC and a different VP. Not a scale-down of Fiji (or the other way around). What's inside the APUs does not (right now) use HBM2.
     
  12. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Wasn't there something more specific to graphics? Maybe a quote from someone or the other?
     
  13. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    520
    Likes Received:
    239
    Different IMC does not drastically alter a GPU uArch.
     
    _cat and CaptainGinger like this.
  14. BoMbY

    Newcomer

    Joined:
    Aug 31, 2017
    Messages:
    68
    Likes Received:
    31
    That's an interesting patent, I wonder if that's for Navi:

    System and method for using virtual vector register files:

    [image]
     
    Nemo likes this.
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    While I agree that binned rasterisation is a task that would be perfect for the base die of a PIM module ...

    ... vertex data is spread across all memory channels. There's no way to avoid having communication amongst PIMs in this case. And, to be frank, vertex data (pre-tessellation) is not a huge bandwidth monster.
     
  16. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,184
    Likes Received:
    1,841
    Location:
    Finland
    You need to decide whether you're talking about just the gfx portion or the whole chip (which leads to tons of different generations spawning from nowhere due to minuscule differences).
    If we go by "GCN generations", which limits itself to the actual GPU part instead of the whole chip, Tonga and Fiji are the same, all Vegas are the same, etc. AMD has the capability to use whichever memory controller they see best fit for each chip*; they could do Polaris+HBM and it would still be Polaris, they could do Vega+GDDR and it would still be Vega, just like Vega+shared controller on the APU is still Vega.

    *and apparently Infinity Fabric made this a lot easier, even though they had the capability before too
     
    _cat and CaptainGinger like this.
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Huh? That's exactly what monolithic GPUs are, right now. Each stage of the graphics pipeline consists of compute followed by some kind of sorting/filtering for the next stage to work on. So, in simple terms: vertices are shaded; they're assembled (sorted) into triangles and some are culled; they're rasterised, filtered for visibility and sorted into quads of fragments; the fragments are shaded; and then they're sorted to preserve triangle ordering (if required) and filtered (for visibility) for blending with the render target.

    The extra latency amongst chiplets would require that between-stage buffers (queues) are larger so that they can handle variations in throughput.

    So if pipeline stage B can consume 0.25 units of work from stage A per clock, and stage A does 0.5, 0.25 or 0.125 units of work per clock, you might give stage A a 2-unit buffer. This buffer would then handle the situation where A spends 8 clocks producing one unit of work, provided that there were 2 units of work already in the buffer. Obviously, if A takes longer than 8 clocks, then B will end up idling. The design will be balanced for "typical" usage, not the extremes. Sometimes A will be forced to pause because B says "I can't take it, you're going too fast for me, my buffer's full!"

    So, now put A and B into two separate chiplets with a 2 clock delay for work from A to B. So that's the equivalent of A producing one unit of work when it's fastest. So you could add one unit of work to B's buffer: it now buffers 3 units of work from A. Alternatively you might say the average throughput of A is 0.25, so B needs to buffer 2 more units of work. So as long as there is variation in throughput, sometimes faster than average and sometimes slower, then A will on average have the same throughput in the MCM or monolithic designs.

    You would simulate the variations in throughput of A to determine the size of the buffer in the monolithic chip. And for an MCM you would add latency into the simulation. To complicate the simulation there might be variable latency amongst the chiplets (one hop or two?) and it's likely that B isn't constant throughput. But the latter affects the monolithic design too.

    The end result is that increased buffering is one of the costs of MCM versus monolithic.
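    The buffer-sizing argument above can be sketched as a toy simulation (a made-up model for illustration only, not anything AMD has described; the `simulate` function, its rates and its weights are all assumptions). Stage A produces a variable amount of work per clock averaging 0.25 units, stage B consumes 0.25 units per clock, and work crosses a bounded buffer with an optional chiplet-hop latency:

```python
import random

def simulate(buffer_cap, latency, clocks=10000, seed=0):
    """Toy producer/consumer model of pipeline stages A -> B.

    A produces 0.5, 0.25 or 0.125 units of work per clock (weighted so
    the long-run average is 0.25); B consumes 0.25 units per clock.
    Work travels from A to B through a bounded buffer, with `latency`
    extra clocks modelling a chiplet-to-chiplet hop.  Returns the
    fraction of clocks B spent starved and A spent stalled.
    """
    rng = random.Random(seed)
    in_flight = [0.0] * (latency + 1)   # work in transit, one slot per clock
    buffered = 0.0                      # work sitting in B's input buffer
    b_idle = a_stall = 0
    for _ in range(clocks):
        buffered += in_flight.pop(0)    # work finishing its hop lands at B
        if buffered >= 0.25:            # B consumes 0.25 units if it can...
            buffered -= 0.25
        else:                           # ...otherwise it starves this clock
            b_idle += 1
        # A produces a variable amount; the weights give a 0.25 average.
        produced = rng.choices([0.5, 0.25, 0.125], weights=[1, 1, 2])[0]
        if buffered + sum(in_flight) + produced <= buffer_cap:
            in_flight.append(produced)  # room in the buffer: send it
        else:
            a_stall += 1                # buffer full: A pauses
            in_flight.append(0.0)
    return b_idle / clocks, a_stall / clocks
```

    Playing with the parameters shows the trade-off described above: for a fixed buffer, adding hop latency increases the fraction of clocks B starves, and enlarging the buffer claws that back, which is exactly the extra buffering cost of MCM versus monolithic.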
     
    DavidGraham likes this.
  18. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,996
    Likes Received:
    4,570
    Raven Ridge's embedded Vega 11 is a scaled down Vega 10 and doesn't use HBM2.
    I honestly don't get your question.
     
    _cat likes this.
  19. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,297
    Likes Received:
    464
    Yes, AMD has been running two diverging families filling out the product stack for several years now, primarily as a function of their HBM commitment:
    Hawaii (and family) + Fiji
    Polaris + Vega

    Of course, the rest of the Vega stack could end up being a true top-to-bottom family.

    As for Navi... The “scalability” (provided that is still the key design imperative; things have had ample opportunity to change) could really mean anything. It was first discussed in the context of multiple chips and DX12 explicit multi-adapter; it was not until the EPYC unveiling that the thinking shifted to an MCM design. In reality it could be referring to highly modular, diverse IP libraries allowing for easy composition of designs to address various markets, with CUs, special-purpose units like tensor units, etc. being almost plug-and-play.
     
  20. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,297
    Likes Received:
    464
    I feel like HBM/interposer use is much more of an inflection point, and “choosing whichever controller they see fit” really undersells the amount of effort required for current generations of AMD chips to use one memory type over the other. I guess we will see how the Vega family plays out.
     