AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,307
    Likes Received:
    3,963
    According to videocardz and fudzilla, Vega 10 will do 12 TFLOPs FP32 and 24 TFLOPs FP16, with a 512GB/s memory bandwidth. This means:

    - 64 CUs at 1.5 GHz
    - 2x FP16 rate per ALU
    - 2 stacks of 2GT/s HBM2
    - 225W TDP
    - H1 2017 (probably Q2..)
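
    The arithmetic behind that list can be checked with a quick sketch. It assumes the usual GCN layout of 64 ALUs per CU and counts one FMA as 2 FLOPs per ALU per clock; the function name `peak_tflops` is only illustrative.

```python
def peak_tflops(cus, alus_per_cu=64, flops_per_alu_per_clock=2, clock_ghz=1.0):
    """Peak throughput in TFLOPs for a given CU count and clock.

    Assumes 64 ALUs per CU and 2 FLOPs (one FMA) per ALU per clock,
    which is how current GCN parts are usually counted.
    """
    return cus * alus_per_cu * flops_per_alu_per_clock * clock_ghz / 1000.0


# Rumored Vega 10 configuration: 64 CUs at 1.5 GHz.
fp32 = peak_tflops(cus=64, clock_ghz=1.5)
# The rumor puts FP16 at twice the FP32 rate per ALU.
fp16 = 2 * fp32

print(f"FP32: {fp32:.3f} TFLOPs, FP16: {fp16:.3f} TFLOPs")
```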

    They also claim it comes with 16GB of HBM2. This wouldn't be possible with Hynix's current portfolio because they only have 4-Hi stacks with 4GB.
    Videocardz claims these specs are coming from server leaks, so maybe the consumer version will use Hynix's current products for 8GB cards and the server versions will use yet-to-enter-production 8-Hi stacks. Though since that would require a physically higher stack they would need to at least change the heatsink's surface, I guess..?

    These clocks seem very conservative. Again, these are server chips but if the consumer cards are clocked up to 1.2GHz then I would expect a 250W TDP or more. Ignore this. I miscalculated.

    They also mention a Radeon Pro Duo replacement with 2* Vega 10 at 300W TDP coming up H2 2017.


    At the same time, they claim Vega 11 will replace Polaris 10. AMD has suggested there will be a RX 48x so maybe we're looking at that GPU. Maybe the same 36 CUs with updated GFX9 architecture and HBM memory? Maybe 1.5GHz too like the bigger brother. Or more CUs with lower clocks to lower power consumption even more?



    Lastly, there is also a Vega 20 that we haven't heard of, which is probably coming in 2018 or later, because it's using GF's 7nm. It comes with the same number of CUs as Vega 10, same "GFX 9" ISA, but now with 1TB/s bandwidth, so 4 HBM2 stacks.
    I take it that with the number of CUs and graphics architecture being equal, the clocks should be considerably higher in 7nm. Perhaps GF's 7nm will bring the higher clocks that 14FF couldn't reach. The TBP for Vega 20 is 150W and it'll bring PCI Express 4.0 support, meaning the card could work without any PCIe power cables.
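
    The stack counts follow from HBM2's per-stack bandwidth. A rough sketch, assuming a 1024-bit interface per stack at the 2 GT/s quoted above, and treating "1TB/s" as 1024 GB/s (both helper names are made up for illustration):

```python
import math


def hbm2_stack_bandwidth_gbs(bus_width_bits=1024, rate_gtps=2.0):
    """Bandwidth of one HBM2 stack in GB/s (bits per transfer / 8 * GT/s)."""
    return bus_width_bits * rate_gtps / 8.0


def stacks_needed(target_gbs, per_stack_gbs):
    """Smallest whole number of stacks that reaches the target bandwidth."""
    return math.ceil(target_gbs / per_stack_gbs)


per_stack = hbm2_stack_bandwidth_gbs()          # one stack at 2 GT/s
vega10_stacks = stacks_needed(512, per_stack)   # rumored Vega 10 bandwidth
vega20_stacks = stacks_needed(1024, per_stack)  # assumption: 1 TB/s = 1024 GB/s

print(per_stack, vega10_stacks, vega20_stacks)
```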


    EDIT: Derp brainfart. 12TFLOPs FP32 with 64CUs would need 1.5GHz, unless GCN goes through major changes, like 96 ALUs for each CU.
     
    #1 ToTTenTranz, Sep 20, 2016
    Last edited: Sep 20, 2016
  2. Jubei

    Regular

    Joined:
    Dec 10, 2011
    Messages:
    484
    Likes Received:
    84
    I don't see a significant perf/watt improvement for Vega: the RX 480 is 5.5 TF, 150 watts and 32 CUs, while Vega 10 is 64 CUs, 12 TF and 225 watts.

    Double the CUs and you have 11 TF, increase the clock speed and you have 12 TF, add in HBM2 and you get less power draw and much more bandwidth.
     
  3. ToTTenTranz

    Close to what? They didn't mention any dates for Vega 20. It could be late 2018 for all we know.
     
  4. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,032
    Likes Received:
    2,069
    Location:
    Wrong thread
    Close to yesterday's announcement that 7 nm was next for GF.

    I hope that AMD have newer graphics architectures than Vega before late 2018. Going into 2019 competing against nvidia with something from early 2017 doesn't seem like it would work out well.

    And also ... is anyone confident that GF will be rocking 7 nm in 2018? They say risk production early 2018. When their 14 nm is up to speed, maybe I'll feel more confident.
     
  5. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    64 CUs * 64 SPs * 2 FLOPs FP32 * 1.0 GHz = 8.192 TFLOPs FP32 * 2 = 16.384 TFLOPs FP16....what am I missing? If the supplied data is true then I can only imagine a 1.5GHz frequency to get 24 TFLOPs FP16.
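
    Turning that around, the clock needed for a given target can be solved for directly. A small sketch under the same assumptions as the calculation above (64 CUs, 64 SPs per CU, 2 FLOPs per SP per clock); `required_clock_ghz` is a hypothetical helper:

```python
def required_clock_ghz(target_tflops, cus=64, sps_per_cu=64, flops_per_sp_per_clock=2):
    """Clock in GHz needed to hit a peak TFLOPs target."""
    flops_per_clock = cus * sps_per_cu * flops_per_sp_per_clock
    return target_tflops * 1000.0 / flops_per_clock


# 12 TFLOPs FP32 on a 64 CU part.
fp32_clock = required_clock_ghz(12.0)
# 24 TFLOPs FP16 with a 2x FP16 rate (4 FLOPs per SP per clock).
fp16_clock = required_clock_ghz(24.0, flops_per_sp_per_clock=4)

print(f"{fp32_clock:.3f} GHz for FP32, {fp16_clock:.3f} GHz for FP16")
```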
     
  6. Nemo

    Newcomer

    Joined:
    Sep 15, 2012
    Messages:
    123
    Likes Received:
    23
    Maybe Vega 10 has 64 CUs with 96 SPs per CU: 6144 SPs * 2 = 12,288 FLOPs per clock, and 12,288 * 1000 MHz = 12,288,000 MFLOPs, so ~12.3 TFLOPs. No?
     
  7. Ailuros

    6* SIMD16 / CU isn't impossible theoretically, but despite being a layman I could imagine that staying at 4, as it has been up to now, has more advantages overall. Besides, all the Vega/Greenland related rumors and hints I've seen so far speak of either 64 CUs or 4096 SPs, which of course doesn't have to mean anything, but it doesn't mean that both numbers are actually correct either. Theoretically nothing speaks against 96CUs@1GHz or 64CUs@1.5GHz.
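
    To make the point concrete, the candidate configurations floated in this thread all land on the same peak figure. A quick sketch, assuming 2 FLOPs per ALU per clock throughout (the labels and the dictionary are only for illustration):

```python
# (CUs, ALUs per CU, clock in GHz) for each configuration discussed so far.
configs = {
    "64 CUs x 64 ALUs @ 1.5 GHz": (64, 64, 1.5),
    "96 CUs x 64 ALUs @ 1.0 GHz": (96, 64, 1.0),
    "64 CUs x 96 ALUs @ 1.0 GHz": (64, 96, 1.0),  # 6x SIMD16 per CU
}

# Peak FP32 TFLOPs, counting one FMA as 2 FLOPs per ALU per clock.
results = {
    name: cus * alus * 2 * ghz / 1000.0
    for name, (cus, alus, ghz) in configs.items()
}

for name, tflops in results.items():
    print(f"{name}: {tflops:.3f} TFLOPs")
```

    Peak throughput alone can't tell these apart, so the leaked numbers don't settle which one is real.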
     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,853
    Likes Received:
    722
    Location:
    London
    The CUs would have to be a radically new architecture (more ALUs per CU) to make this feasible. Patents referenced in the old Vega thread implied that CU architecture is changing.

    Some aspect of current GCN is keeping clocks "low" (we saw this at 28nm, too). I doubt it's the process. Although Global Foundries is generally useless, apparently, so we can't eliminate that as reason for poor clocks/power in Polaris.
     
  9. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    15,409
    Likes Received:
    4,324
    Tom's Hardware shows an average power consumption of 164 watts when gaming (Metro: Last Light). It's one of the main reasons I didn't get one despite the attractive price point. Overclocking such that it reaches 6 TF also ups power consumption to >200 watts in most cases.

    Speculation in the Vega thread also implies that things may have radically changed between Polaris and Vega.

    Assuming the rumors are true, to reach 12 TFLOPs with 64 CUs, you either need to clock at 1.5 GHz or you need significantly more ALUs per CU. Both options would require significant changes in the architecture. Polaris currently struggles to reach 1.5 GHz and power consumption spikes up drastically to do so. The overclock result that had >200 watts above was with a 1.32 GHz clock.

    There is no reason that it would be impossible for AMD to re-architect GCN (assuming it's still GCN) for greater perf/w similar to what Nvidia has done between generations.

    Or not much has changed and we'll see Project Scorpio hit 200-250 watts or more.

    Regards,
    SB
     
  10. ToTTenTranz

    AMD is definitely aware of GF's longer-term roadmaps and has probably known about the 7nm plans for quite a while, so yesterday's announcement to the general public doesn't mean much for AMD's development.

    Regarding AMD's output of different chips throughout the next couple of years, I think I remember seeing a post from @Dave Baumann stating that releasing two new chips per year is sufficient. There's Polaris 10 and Polaris 11 in 2016, Vega 10 and Vega 11 in 2017, Vega 20 and Vega 2x in 2018.
    AMD will also be releasing Zen APUs starting H2 2017. These APUs may start to assimilate the discrete GPUs' lower tiers like Polaris 11, and somehow compensate for the fewer GPU releases.
     
  11. AlBran

    AlBran Just Monika
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    19,792
    Likes Received:
    4,727
    Location:
    ಠ_ಠ
    Dat Fast14
     
  12. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    554
    Likes Received:
    93
    It will be unusual for AMD to directly replace Polaris 10 with Vega 11 within 12 months of the Polaris release. Since GCN's launch they've favoured extended GPU lifespans and filling in the lineup gaps in alternating years.
    Perhaps it is a replacement in terms of positioning rather than absolute performance, e.g. Vega 10 becomes RX 590 / Fury RX, Vega 11 RX 580, Polaris 10 RX 570/560
     
  13. ImSpartacus

    Regular Newcomer

    Joined:
    Jun 30, 2015
    Messages:
    251
    Likes Received:
    199
    I think you're referencing the recent news that PCIe 4.0 would support several hundred watts through the slot (i.e. no external connectors).

    That was ultimately retracted and it looks like it'll stay at 75W.

    http://www.tomshardware.com/news/pcie-4.0-power-speed-express,32525.html

    I think that's reasonable. That's what I expected: a transitional 500 series with partial rebrands and repositions to make room for Vega.
     
  14. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    585
    Likes Received:
    289
    Could Vega 20 be a server-only chip, since Navi should be on 7nm too? Maybe Navi will be stripped of server features.
     
  15. ToTTenTranz

    Hum.. they say it's TBD so it probably won't stay at 75W but probably not reaching 300W either as otherwise there would be no need for clarification.

    Thanks for the update!


    Agreed. With 2 chips per year it doesn't look like AMD would have the luxury to completely phase out a 1 year-old chip.
    So with that option we'd have Polaris 10 with 36 CUs, Vega 10 at 64 CUs and Vega 11 at... perhaps 44 CUs like Hawaii?
     
  16. function

    Oh I don't doubt that AMD are well aware of GF's long term plans, but someone starting a rumour or speculating might not be. If this rumour had come out a couple of days before GF's node announcement I think that may have lent it credence. Coming a day after just gets my suspicions raised, that's all.

    I suppose that could be how things are going to be. And Zen APU could be really close to the 460 if they can get clocks up and address the BW issue (I wish DDR4 was clocking at 4266 like LPDDR4 is).

    In fact ... I wonder if you could put a salvage APU on a board and sell it as an entry level GPU? CPUs with deactivated GPUs are already a really common thing, especially from behemoth Intel. I wonder if it could ever be true in reverse for some level of AMD products?
     
    #16 function, Sep 20, 2016
    Last edited: Sep 20, 2016
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,982
    Likes Received:
    2,428
    Location:
    Well within 3d
    In terms of purely TDP versus peak arithmetic throughput, 12 TF at 225W relative to the RX 480's 5.8 TF at 150W would put Vega's perf/W at roughly 1.38x that of Polaris.
    It's a bit short of AMD's older perf/W roadmap slide that gave Vega 1.5x the efficiency of Polaris, although TF doesn't equal performance and the RX 480 is not as optimistic a starting point as what AMD's marketing used in its slide.
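
    For anyone who wants to rerun the comparison, here is a minimal sketch. It uses TDP and peak TFLOPs only, which is a crude proxy for real perf/W, and the 1.5x target is the roadmap figure mentioned above; the function name is illustrative.

```python
def tflops_per_watt(tflops, tdp_watts):
    """Peak TFLOPs per watt of TDP (a paper figure, not measured efficiency)."""
    return tflops / tdp_watts


vega10 = tflops_per_watt(12.0, 225.0)  # rumored Vega 10
rx480 = tflops_per_watt(5.8, 150.0)    # RX 480 reference numbers

ratio = vega10 / rx480

# Peak TFLOPs Vega 10 would need at 225 W to reach a 1.5x perf/W gain.
tflops_for_roadmap = 1.5 * rx480 * 225.0

print(f"perf/W ratio: {ratio:.2f}x, 1.5x target needs {tflops_for_roadmap:.2f} TF")
```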

    Vega would need to overachieve a bit, given that Polaris didn't really match up with that roadmap and that optimistic projection wouldn't have erased the competitive deficit even if it were hit.

    For reference, one of the aforementioned patents from the big AMD thread is:
    http://www.freepatentsonline.com/20160085551.pdf
    This covers a variable SIMD-width CU, with an 8, 4, and 2-wide SIMD trio in place of the customary 16-wide. If that triad is actually put in place of one SIMD, it at least would not regress from having 4 SIMDs in the CU.
    One of the claims in the patent had the possibility that each of the smaller SIMDs could actually be 8-wide, just with selective gating.
    That chain of assumptions could give 3 x 8 x 4 = 96 ALUs per CU, and 96 ALUs x 64 CUs x 2 FLOPs x 1 GHz = 12 TF.
    (edit: Missed the OP update, I'll leave the math out here.)

    Perhaps that and HBM could make up some of the efficiency gap.

    There was a patent mentioned before about creating a tiled and binning front end with hidden surface removal built in, which might generate irregularly sized wavefronts that this ALU arrangement would cater to.
     
    #17 3dilettante, Sep 20, 2016
    Last edited: Sep 20, 2016
  18. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,283
    Location:
    Helsinki, Finland
    More ALUs per CU would be a stupid idea. AMD's CUs are already occupancy limited by register count. Nvidia halved the ALU count per SM in Pascal P100. This gives them more register space per thread and allows P100 to run complex shaders faster. AMD is already register bottlenecked in complex shaders. I would rather see AMD following Nvidia's lead than going in the opposite direction, especially as register pressure seems to be a bigger problem for AMD.
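
    A rough sketch of why register count limits occupancy. The figures are assumptions taken from publicly documented GCN characteristics rather than from this thread: 256 vector registers available per lane on each SIMD and a cap of 10 wavefronts per SIMD. Allocation granularity is ignored to keep it simple.

```python
def waves_per_simd(vgprs_per_thread, vgpr_budget=256, max_waves=10):
    """Wavefronts that fit on one SIMD for a shader using the given VGPR count.

    Assumed GCN-style limits: 256 VGPRs per lane per SIMD, at most 10 waves.
    """
    if vgprs_per_thread <= 0:
        raise ValueError("a shader uses at least one VGPR")
    return min(max_waves, vgpr_budget // vgprs_per_thread)


# Occupancy falls quickly as shaders get more complex.
occupancy = {vgprs: waves_per_simd(vgprs) for vgprs in (24, 32, 64, 128)}

for vgprs, waves in occupancy.items():
    print(f"{vgprs:3d} VGPRs per thread -> {waves} waves per SIMD")
```

    Fewer resident waves means less latency hiding, so adding ALUs without adding register space per thread would make the existing bottleneck worse.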

    1.5 GHz isn't impossible for Vega. There are custom GTX 1080 models with 1.75 GHz base clock and 1.9 GHz boost clock. Maxwell (980 Ti) was only running at 1 GHz (1075 MHz boost). Nvidia achieved 75% clock improvement by the shrink in a single generation. Why couldn't AMD achieve 50% clock improvement in two generations?
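
    The percentages work out as follows. The Nvidia clocks are the ones quoted above; the 1.0 GHz starting point for AMD is an assumption implied by the "50% in two generations" remark, not a quoted spec.

```python
def clock_gain_pct(old_mhz, new_mhz):
    """Relative clock increase in percent."""
    return (new_mhz / old_mhz - 1.0) * 100.0


# 980 Ti base clock vs. custom GTX 1080 base clock, as quoted above.
nvidia_gain = clock_gain_pct(1000, 1750)
# Assumed ~1.0 GHz 28nm GCN clock vs. the rumored 1.5 GHz for Vega 10.
amd_gain = clock_gain_pct(1000, 1500)

print(f"Nvidia: +{nvidia_gain:.0f}%, AMD (hypothetical): +{amd_gain:.0f}%")
```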
     
  19. 3dilettante

    For AMD, the vector register file capacity would scale with SIMD count and/or SIMD width, since that is how register storage is distributed in a CU. I'm not sure why it would get worse unless that relationship were changed. The number of physical entries could be scaled in the vector and scalar portions as well.

    AMD's 28nm base clock is unclear. At least initially Hawaii had cases where it showed dips down to 800-850. The 28nm consoles have a conservative clock in that range as well.
    Polaris 10's 14nm base/boost clocks do give that range of improvement. The best clock/voltage points for power/clock are measurably lower, and the boost clock or higher hits a voltage and power wall very quickly.
    Some further architectural optimization or a fix of a process problem would be needed.
     
  20. sebbbi

    I was talking about an architectural change (similar to Pascal P100, but in the reverse direction). In the current GCN architecture, SIMD count and register file capacity are obviously tied.

    If you added 50% extra SIMDs and registers into a single CU, then there would be 50% more clients for the CU shared resources: 4 texture samplers, 16 KB of L1 cache and 64 KB of LDS. There would be lots of L1 thrashing, occupancy would be horrible in shaders that use lots of LDS, and more shaders would be sampler (filtering) bound. You could counteract these issues by having 6 texture samplers, 24 KB of L1 cache and 96 KB of LDS in each CU. However, a 50% fatter CU like this would be less energy efficient than the smaller one, since the shared resources are shared with more clients. There would be more synchronization/communication overhead and a longer distance to move the data. I am not convinced this is the right way to go.
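
    The per-SIMD share of those resources can be put into numbers. A small sketch using the figures from this post; the `ComputeUnit` class is purely illustrative and says nothing about the energy or wiring cost argued above.

```python
from dataclasses import dataclass


@dataclass
class ComputeUnit:
    simds: int      # number of SIMDs sharing the CU resources
    samplers: int   # texture samplers
    l1_kb: int      # L1 cache in KB
    lds_kb: int     # local data share in KB

    def per_simd(self):
        """Share of each CU resource available to a single SIMD."""
        return {
            "samplers": self.samplers / self.simds,
            "l1_kb": self.l1_kb / self.simds,
            "lds_kb": self.lds_kb / self.simds,
        }


baseline = ComputeUnit(simds=4, samplers=4, l1_kb=16, lds_kb=64)  # current GCN CU
wide = ComputeUnit(simds=6, samplers=4, l1_kb=16, lds_kb=64)      # +50% SIMDs only
scaled = ComputeUnit(simds=6, samplers=6, l1_kb=24, lds_kb=96)    # everything +50%

for name, cu in (("baseline", baseline), ("wide", wide), ("scaled", scaled)):
    print(name, cu.per_simd())
```

    Scaling everything by 50% restores the per-SIMD share, but each shared block then serves six clients instead of four.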
     