AMD: Navi Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by Jawed, Mar 23, 2016.

  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    7,787
    Likes Received:
    1,510
    Location:
    Finland
    I'd rather skip comparisons meant to judge how good or bad a company is at designing 7nm chips while there isn't a proper comparison point.
    Since no one will likely do direct shrinks or the like, one should just wait until there are other 7nm GPUs and chips and see how their transistor densities and performance turn out compared to their older chips.
     
  2. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    27
    Likes Received:
    15
    It's kind of obvious that Navi will be smaller than Vega 20. The question is: how much? The main disadvantage persists if it's still GCN based...
     
  3. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    15,375
    Likes Received:
    4,283
    That assumes that they only added things to the chip with new transistors and didn't change/modify any existing stuff. For the FP64 support they likely had to modify the existing design beyond just adding more transistors.

    Regards,
    SB
     
  4. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    7,787
    Likes Received:
    1,510
    Location:
    Finland
    I'm not talking about die size; I said densities and performance.
    Also, considering that AMD said TSMC 7nm would offer them around 25% (and up) better performance at the same power, and that Vega 20 has two extra memory controllers and twice the memory pool, which adds its own power consumption, that 20% seems pretty spot-on for what the process shrink should give it over Vega 10 in theoretical FLOPS, since the base architecture is the same with some extras added for HPC.
     
    w0lfram and no-X like this.
  5. DeeJayBump

    Joined:
    Aug 14, 2017
    Messages:
    2
    Likes Received:
    1
    Minor clarification:

    Jim/AdoredTV said that these Navi GPUs, as well as the Ryzen 3000 CPUs discussed, would be announced at CES rather than Computex.
     
    ToTTenTranz likes this.
  6. Nemo

    Newcomer

    Joined:
    Sep 15, 2012
    Messages:
    123
    Likes Received:
    23
    NAVI10LITE : GFX1000 (aka Navi 12?)
    NAVI10 : GFX1010

     
    iMacmatician, Heinrich4 and Lightman like this.
  7. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    27
    Likes Received:
    15
    By performance, you mean clock? Because I don't really think AMD will go past 4096 SPs, and I don't think they will use more than 4 shader engines or more than 64 ROPs, so the only way they can squeeze out more performance is with a higher-clocked GPU, unless unicorn drivers with a primitive shader path become a real deal... GCN is awful in regard to power consumption versus clock scaling; Vega 20 with a 300W TDP only proves this. Frankly I don't see much headroom for clocking above Vega 20, but hey, we can always dream, can't we?
     
  8. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    7,787
    Likes Received:
    1,510
    Location:
    Finland
    Yes, obviously. And they didn't talk about Vega specifically, but just about what the TSMC process can offer over their current process: around 25% (and up) more performance (clocks) at the same power on the same chip. Vega 20 isn't the same chip; it has additional stuff on the memory side and changes inside the architecture which could increase consumption, and it still manages to offer a bit over 20% more performance (clocks) at the same power.
     
  9. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,876
    Likes Received:
    1,430
    2070 performance at $250 would make me jump for an upgrade over my Vega 56, imo. I doubt those rumors are true, however.
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,946
    Likes Received:
    2,370
    Location:
    Well within 3d
    One slide (second picture in the first set of slides in https://www.computerbase.de/2018-11/amd-radeon-instinct-mi60/) had an artist's rendition of MI25 and MI60.
    Pixel counting is rough going by a picture of a projected presentation, but my google-fu is a bit weak on finding a direct reference. However, there's been a rough correlation in area in the pictographs versus actual die shots historically.
    The core GPU area (CUs, L2, front ends, ROPs) for MI25 is about 75% of the area, whereas MI60's core GPU area appears to be a little over half of its die.
    The additional IO and supporting fabric/controllers appear to have a much higher proportion of the die at 7nm.

    The MI60's core GPU area appears to be about half that of the MI25, and the ratio of areas of the representations seem to be proportional to their announced die size differences (with healthy error bars).
    If we assume that the core GPU area for both is the dominant contributor to their transistor counts, it seems like that part of the GPU scaled more in line with AMD's current density scaling claims.
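    The ratio check above can be sketched with numbers (a sketch only: the die sizes are the commonly cited figures for Vega 10 and Vega 20, and the core-area fractions are the rough pixel-counting estimates from this post, not measurements):

```python
# Rough check that the "core GPU" region scaled close to AMD's density claims.
# Die sizes and core-area fractions are estimates, not official figures.

mi25_die_mm2 = 486          # Vega 10 (MI25), 14nm, commonly cited size
mi60_die_mm2 = 331          # Vega 20 (MI60), 7nm

mi25_core_frac = 0.75       # core GPU (CUs, L2, front ends, ROPs) share of die
mi60_core_frac = 0.55       # "a little over half" for MI60

mi25_core = mi25_die_mm2 * mi25_core_frac   # roughly 360+ mm^2
mi60_core = mi60_die_mm2 * mi60_core_frac   # roughly 180 mm^2

# The core region roughly halves, consistent with a ~2x density claim for the
# logic-dominated part of the chip, while the IO/fabric ring scales far less.
ratio = mi60_core / mi25_core
print(round(mi25_core), round(mi60_core), round(ratio, 2))
```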

    As for what goes in that wide ring of non-GPU in MI60, there are things like the various controllers, the HBCC, and the Infinity Fabric mesh. The HBCC and memory section is hefty, and it may be a major contributor to the big swaths of "nothing" in the MI60 drawing on the left and right between the HBM PHYs. In Vega 10, the region AMD indicated was the fabric was a minor but visible strip of silicon along the bottom of the GPU section, below the ROPs. Area-wise, that strip was maybe roughly half the area of the RBEs, with some unknown number of blocks on the side with the PCIe and other interfaces potentially part of it. MI60 has "dark" strips going all around it now, given that there is twice the memory bandwidth transported overall, and on both sides of the GPU. Then there's xGMI on one side, which would have its own stops on the die at significant fractions of the GPU's bandwidth.
    That's a mesh scaled to 1 TB/s of bandwidth, and it's composed of wires and buffers in a region that may not have scaled that well. The likely IF blocks in Zen are a non-trivial contributor, if the block of rectangles in the center of that die corresponds to the crossbar setup supporting DDR bandwidths 1-2 orders of magnitude lower than MI60's.

    Looking at Fiji, it doesn't have such an obvious section of the die devoted to its interconnect, so while the fabric can give many benefits, I think area isn't one of them.

    edit: From 7:00 onwards in the following presentation, there's an exchange covering the less-than-2x area scaling, where the statement was that not all areas of the chip had the higher density. This seems to be the case for the silicon all around the GPU core.
     
  11. Tkumpathenurpahl

    Regular Newcomer

    Joined:
    Apr 3, 2016
    Messages:
    803
    Likes Received:
    545
    Could that be reason enough to decouple the "core GPU" from the I/O?

    Since the I/O doesn't scale well, keep sourcing it from GF on 14nm, cheaply, and couple it with core GPUs from expensive 7nm wafers. In principle, that seems like pretty good utilisation of established supply lines, especially when legally bound to GF purchases.

    Everything I've read about chiplets has revolved entirely around multi-chiplet setups, but that's not necessarily what we'll get right off the bat. That being said, would one chiplet per GPU be beneficial in terms of manufacturing costs?
     
  12. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    103
    Likes Received:
    110
    I don't think a single die with an IO die makes sense. With 8 CPU chiplets you have a single Infinity Fabric link with 100 GB/s per chiplet. That's not too much and not too hard to make. The IO die has 8 Infinity Fabric links (800 GB/s) plus the RAM interface (should be around 200 GB/s), so 1 TB/s in total. But think about a Vega-class GPU: 256-bit, 14-16 Gbps, 448-512 GB/s. So the IO die again needs 1 TB/s of bandwidth: ~500 GB/s of Infinity Fabric and ~500 GB/s of RAM interface. Additionally, the GPU also needs that 500 GB/s Infinity Fabric link to communicate with the IO die and get the full bandwidth. The Infinity Fabric for MCM communication is much smaller than a RAM interface, but I still don't think you'd gain much.
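    The bandwidth bookkeeping above can be laid out explicitly (a sketch; the per-link and DRAM figures are the rough numbers from this post, not official specs):

```python
# Rough bandwidth bookkeeping for the CPU-with-IO-die vs. GPU-with-IO-die cases.

def memory_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Memory bandwidth in GB/s for a GDDR-style interface."""
    return bus_width_bits * gbps_per_pin / 8

# Rome-style CPU: 8 chiplets at ~100 GB/s IF each, plus ~200 GB/s of DRAM.
cpu_io_die_total = 8 * 100 + 200                 # ~1000 GB/s through the IO die

# Vega-class GPU: 256-bit bus at 14-16 Gbps per pin.
gpu_mem_low = memory_bandwidth_gbs(256, 14)      # 448 GB/s
gpu_mem_high = memory_bandwidth_gbs(256, 16)     # 512 GB/s

# A single GPU chiplet must carry the whole memory bandwidth over one IF link,
# so the IO die again moves ~1 TB/s (IF side plus DRAM side).
gpu_io_die_total = gpu_mem_high * 2              # ~1024 GB/s

print(cpu_io_die_total, gpu_mem_low, gpu_mem_high, gpu_io_die_total)
```

    The point of the comparison: the GPU case needs the same total IO-die bandwidth as the whole 8-chiplet CPU, but concentrated on a single chiplet link.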
     
    Tkumpathenurpahl likes this.
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,946
    Likes Received:
    2,370
    Location:
    Well within 3d
    The non-GPU area in this case includes the controllers and data fabric for DRAM channels with at least 1 TB/s bandwidth. Wherever the dividing line is between the chiplet and its support silicon, there's going to be a set of link controllers at the near and far end, and pad area on both sides. The area taken up by the PHY and controllers is part of what doesn't scale, and every extra off-die connection puts that area on the GPU die and IO die.
    I don't have a good handle on how much of the Zen die is the inter-die link and controller, but I tried pixel counting a die shot and got something like ~0.008 of the die for one link, which may be 1-2 mm² each.
    Bandwidth-wise, that link can support ~21-25 GB/s worth of memory bandwidth.
    25 GB/s per link would need to scale to >1000 GB/s of GPU bandwidth, since it is 40 or so times too small.
    Power-wise, the current link tech needs to improve significantly. IF is 2 pJ/bit × 8 bits/byte × 2^40 bytes/s, so ~18 W in one direction. The internal paths usually have the same bandwidth in both directions, so 36 W are lost just moving data between dies.
    Then, there still needs to be a fabric of some kind on the GPU die to route the data, and that's also part of the area that didn't scale.
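    The link-power estimate above checks out numerically (a sketch; the 2 pJ/bit figure is the one quoted in this post for current Infinity Fabric):

```python
# Sanity check: energy per bit times bit rate gives link power.
# ~2 pJ/bit Infinity Fabric moving ~1 TB/s (2^40 bytes/s) in each direction.

PJ = 1e-12  # picojoule in joules

def link_power_watts(pj_per_bit, bytes_per_sec):
    """Power in watts for a link at the given energy/bit and byte rate."""
    return pj_per_bit * PJ * 8 * bytes_per_sec

one_way = link_power_watts(2, 2**40)   # ~18 W for 1 TB/s in one direction
both_ways = 2 * one_way                # ~36 W with equal bandwidth each way

print(round(one_way, 1), round(both_ways, 1))
```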

    This is likely one reason why AMD's future plans usually have GPU chiplets as part of a stack. The HBM is above them and they sit on top of an active interposer.
    For one thing, die to die stacking allows much tighter pad pitch, and much lower power per connection. Also, stacking allows the active interposer to host a good part of that on-die fabric that has to remain on the GPU in the proposed 2-die solution.
    For designs that require more than one chiplet, the interposer also reduces the burden on the GPU, because each new die in the memory network multiplies the number of links and their area/power cost.

    In effect, the IO die in Rome is something of an intermediate step to being an active interposer. It's hosting the memory and IO, and likely a decent mesh or some other network to link all the clients. There's a 1:1 link to each chiplet, which at least in terms of system topology is similar to what they'd look like if the chiplet were mounted on top of the IO die.

    I feel that current and near future link technologies are not yet at the scale needed to satisfy the higher demands of a GPU, and there are greater challenges and questions regarding stacking and active interposers.
     
  14. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    261
    Likes Received:
    60
    And there's the "much cheaper than Nvidia" that AMD needed to do, at least as a rumor.

    If they're using 20 CUs per shader engine, and that Navi 10 die is two of these, that would put the equivalent selling point of a Vega 40 at around $325 (compared to Vega 64). Since shrinking doesn't do much for cost these days, we can discount the change in manufacturing node, meaning they must have cut 20%+ of the cost from Vega to Navi to sell Navi 10 at a profit.

    We can assume a not-insignificant part of that is ditching HBM: no interposer cost, no waiting on memory manufacturers gladly charging through the nose. But even then, it still means AMD has managed to cut out a portion of silicon per CU.

    Estimating from the performance front, at Vega 64 +15% that's an astonishing 84% increase in performance per CU being mooted. Even taking full advantage of 7nm's projected 25% increase in clockspeed, you still get a very surprising ~47% increase in efficiency. And that's just the increase in efficiency per CU, not counting the supposed smaller silicon footprint of each one.
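    The per-CU arithmetic above can be made explicit (a sketch; the CU counts, the +15% figure, and the 25% clock uplift are all rumored or projected numbers from the posts above, not confirmed specs):

```python
# Per-CU efficiency implied by the rumor: a 40-CU part matching Vega 64 +15%.

vega64_cus = 64
navi10_cus = 40            # rumored "Vega 40"-class configuration
perf_vs_vega64 = 1.15      # rumored: Vega 64 +15% performance
clock_gain_7nm = 1.25      # AMD's projected 7nm clock uplift at same power

# Throughput per CU must rise by the performance gain times the CU ratio.
per_cu_gain = perf_vs_vega64 * vega64_cus / navi10_cus   # ~1.84, i.e. +84%

# Crediting the full projected clock uplift still leaves a large residual
# per-clock efficiency gain to explain.
per_clock_gain = per_cu_gain / clock_gain_7nm            # ~1.47, i.e. +47%

print(round(per_cu_gain, 2), round(per_clock_gain, 3))
```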

    So the cost reduction seems reasonable if AMD hasn't added more silicon for features at all. Or AMD has made one of the most astonishing generational advancements in GPU efficiency in ages. Or the rumor is false. I'm leaning towards the last one, at least in part. The mooted combination of nicely lower cost and vastly higher efficiency at the same time seems too good; I wouldn't be surprised if this 3080 or whatever ends up costing $300 instead. But hell, maybe I'll be surprised around CES, which is all of a month away.

    Edited stuff due to bad first paragraph reference, Vega 64 is $500, not $600, dummy me. Caffeine is not a substitute for sleep kids!
     
    #874 Frenetic Pony, Dec 6, 2018
    Last edited: Dec 7, 2018
  15. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    63
    Likes Received:
    13
    Navi is a "nextgen" uArch, not GCN.

    I believe that this "Radeon RX 3000 series" is not Vega 20 (i.e. Vega 2.0), but Vega 12 (i.e. Vega 1.2).

    Therefore, if true, the RX 3k series is a reworked Vega 64-style uarch shrunk down to 7nm, with end-users receiving all the cost/performance/heat advantages of the new node. Vega 64 +15% @ 150W is exactly the kind of metric AMD showed in their 7nm Vega 20 slides. I find this move by AMD entirely plausible and almost exactly what Dr. Su has been telling us all along.

    A further thought on this rumor: if the Vega 1.2 version has reworked Infinity Fabric, multiple GPUs might be a thing again for gamers and streamers. Either way, hitting the sub-4K market with a sub-$300 powerhouse grabs maximum mindshare... for when Navi hits and gamers want more and move to 4K.


    I believe Navi will be the high-end gaming GPU, coming in about nine months' time. (It will double the 3k series' gaming performance and then some.)
     
  16. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    7,787
    Likes Received:
    1,510
    Location:
    Finland
    Absolutely nothing AMD has communicated indicates Navi would be something else than GCN.
     
    no-X and Lightman like this.
  17. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    63
    Likes Received:
    13
    AMD suggested that their higher-end GPUs in late 2019 might include a "nextgen" release (Navi).
     
  18. Azhrei

    Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    11
    Likes Received:
    6
    This doesn't line up with their slides, though they did mention "Nexgen Memory" with Navi in another one:

    [AMD roadmap slides]
     
  19. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,427
    Likes Received:
    257
    I don't know where this Shanghai vs. Markham thing comes from. There's no such competition.
     
    Lightman likes this.
  20. yuri

    Newcomer

    Joined:
    Jun 2, 2010
    Messages:
    126
    Likes Received:
    89
    AMD has shown that Vega 20 at ~1.2 GHz would pull 50% of the power of Vega 10 at ~1.2 GHz.

    Vega 64 +15% implies roughly a 1.8 GHz clock, therefore not 150W.
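    The implied clock works out as follows (a sketch: it assumes performance scales linearly with clock with no per-clock gains, and the ~1.55 GHz Vega 64 boost clock is an approximation, not an exact figure):

```python
# Clock implied by "Vega 64 +15%" on an unchanged 64-CU configuration.

vega64_boost_ghz = 1.55                  # approximate Vega 64 boost clock
perf_target = 1.15                       # rumored +15% over Vega 64

# Linear clock scaling assumed: +15% performance needs +15% clock.
required_ghz = vega64_boost_ghz * perf_target   # ~1.78 GHz, i.e. ~1.8 GHz

print(round(required_ghz, 2))
```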
     


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.