AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    Remember, that's Samsung's implementation, not GlobalFoundries' broken one. I wouldn't be at all surprised if Samsung's own, more mature LPP process is a good 10% (or more) better than GloFo's LPP across a variety of metrics, but we just don't know yet.
     
  2. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yeah, nice catch with the 1475MHz; that is, I think, the highest I have seen on air, fluctuating between 1.17V and 1.18V.
    Will be interesting to see how often this occurs with other cards.
    Cheers
     
    french toast likes this.
  3. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,016
    Likes Received:
    1,694
  4. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Better to think of Vulkan/etc. as influencing performance/watt (ideally you want to use something that is comparable on both AMD and Nvidia, as there are games that skew it either way) rather than efficiency; voltage-power demand and frequency can be part of that. That is why, as you probably noticed, when I talk about AMD and Nvidia my context is that they generally trade blows when averaging a diverse range of games.
    It also comes down not just to the individual game but also to resolution (which I have avoided mentioning); Tom's Hardware uses Metro: Last Light at 4K because, in their experience, it is one of the more demanding titles in terms of power for both AMD and Nvidia.
    Cheers
     
    french toast likes this.
  5. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    I don't believe Pascal is 2x more efficient than Polaris even in worst-case scenarios; in modern games with new drivers it's a lot closer than people think. Don't get me wrong, Pascal is more efficient in any situation. Compare 2+ year old DX11 games or GameWorks titles and, yes, I'll accept Polaris can be made to look like a well-designed storage heater, but is that realistic to draw conclusions from going forward?

    Saying that, if I was building a small media-centre type PC to occasionally game on, I sure as hell would pick Pascal any day of the week. Nvidia's work with Tegra has enabled them to get media playback consumption down very low, much lower than AMD, let alone idle consumption.
     
  6. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    I would seriously stick with Tom's Hardware and PC Perspective; one has the seriously expensive equipment and technical assistance from a well-respected laboratory measurement manufacturer, while the other draws on the extensive experience of a staff member who is an electrical engineer with a background as a navy nuclear technician and in network defense.

    Cheers
     
    pharma, Razor1 and upnorthsox like this.
  7. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    I agree, for AMD there is only one way for their perf/watt to go and that is up. Possibly in newer games (DX12, Vulkan) that could change, but nV's driver development could change that too; the question is to what degree. Currently in LLAPI applications, nV hardware tends to eat up more power, which is kind of unexpected. It's hard to figure out why it's happening, because it's not like they are using more or less of the chip just because of the API; the API routines shouldn't affect power usage by close to 20% in some instances. So what I'm thinking is that the way the power and voltage are set up in the BIOS, and controlled by the drivers when doing dynamic clocking, is causing some issues with the overall power consumption.

    Tegra's base architecture outside of its graphics is probably what gets its target power consumption to such a low point. ARM-based CPUs have always been good at conserving power, so it's hard to compare across CPUs with different architectures in this regard.
     
    pharma and french toast like this.
  8. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    I was under the impression both GF and Samsung are sharing their experiences with the 14nm processes.....so they should be similar in maturity.
     
  9. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    It's the same underlying technology (Samsung's), but this is GloFo we are talking about, and this is their first production run of 14nm FinFET. Samsung designed the process (stole it from TSMC?) and produced a year's worth of LPE before leading with LPP, so I'm sure they would have some kind of short-term advantage, although I'm just guessing here; we have no data yet.
    GlobalFoundries don't exactly fill me with confidence going by their track record, but I might be pleasantly surprised (one day :) ).
     
  10. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,075
    Likes Received:
    1,039
    Well the RX480 obviously has higher power draw than the GTX1060, to the tune of roughly 30-35% over. (You could turn the comparison around and suddenly the GTX1060 is only 20-25% lower in power draw. Nicer figures for AMD if you do the percentages in that direction. :wink:)
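    A quick sanity check of those two framings, as a minimal sketch with assumed round-number board powers (illustrative figures only, not measurements):

    Code:
    # Illustrative only: assumed average gaming board powers, not measured values.
    rx480_w = 210.0    # assumed RX 480 power draw (W)
    gtx1060_w = 160.0  # assumed GTX 1060 power draw (W)

    # Same data, two framings of the gap:
    over = (rx480_w / gtx1060_w - 1.0) * 100   # RX 480 relative to GTX 1060
    under = (1.0 - gtx1060_w / rx480_w) * 100  # GTX 1060 relative to RX 480

    print(f"RX 480 draws {over:.0f}% more than the GTX 1060")   # ~31% more
    print(f"GTX 1060 draws {under:.0f}% less than the RX 480")  # ~24% less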
    PCPer has a good measuring scheme going, and this graph largely corroborates Razor1's ballpark figure. However, as I pointed out, the RX480 is a product, based on Polaris 10, the chip. A product where AMD elected to go quite far up the voltage/frequency curve. I happen to own one, and typically run it at 10% lower voltage than stock, resulting in power draw quite close to the GTX1060 at actually improved performance. (Of course you can do the same exercise, with similar but less pronounced results, with the GTX1060.) The thing is that AMD could have dropped the voltage 0.1V and let the frequency slide down a bit, and their performance/watt would have looked a lot better, but their performance/$ and the perception of the RX480 being of roughly equivalent performance to the GTX1060 would have suffered.
    Pretty much as with the Nano vs. the Fury X: the Nano, using exactly the same silicon, demonstrates a difference in performance/W very similar to the GTX1060 vs. the RX480.
    (And note, that is the sum total of difference between the products. 0.1-0.15V. Price, performance, die size, and so on are all as close to identical as two different products could reasonably be. Does that constitute "a generation behind"? Seriously?)

    This makes drawing conclusions about Vega products by arguing straight scaling from the RX480 doubly dubious. The three factors that I lined up, which we know are applicable, will all help efficiency. But how large will the aggregate effect be? And how will AMD use it in the actual products? Will they push for a particular competitive performance tier, prioritising performance/$, and positioning a hair over the competitor's product in the same tier in the benchmark charts? Or will they modify those priorities in the direction of the Nano?
    At this point, we can only guess. But we do know that Vega will show efficiency gains over Polaris. Straight scaling from RX480 just isn't valid, and I think we are all knowledgeable enough to realise that.
     
  11. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yes, let's make the figure look better by reversing the comparison to lower the percentage; I guess everyone should do that when their favourite manufacturer (whether AMD or Nvidia) is slower in a game :)
    Regarding the voltage/frequency curve, it depends upon the IHV OC settings, including the power setting they used, and the same can be said about custom AIB Nvidia cards; it's just that with Nvidia you have to jump through more hoops to overclock, or own one of the very few card models that have a bespoke AIB BIOS (which unfortunately is not usable on anything else).
    The latest MSI Afterburner makes it even easier to do on AMD, but this is still OC beyond what the card was warrantied for.

    In terms of voltage, it needs to be appreciated that there is an optimal performance envelope for the silicon/node that constrains both manufacturers to some extent; if you keep the 480 within the AMD boost spec of 1266MHz it does not use much more voltage than the 1060 at 2050MHz, which is around 1.1V.
    This goes out the window with AIB partner cards or when personally OCing, because the required voltage and the power drawn/thermals ramp up pretty quickly, exacerbating leakage and wasted energy; this is further compounded by AMD's dynamic power management still not being as complete as what Nvidia implemented with Pascal (which is further evolved from Maxwell).
    To get the power draw of the 480 comparable to the 1060 (at 1.1V) you would need to run the 480 at around 0.8V, and that is some serious downvolting in terms of the silicon/node's performance window.
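    As a rough illustration of why a voltage drop of that size matters, here is a minimal sketch using the usual dynamic-power approximation (power scaling with f * V^2); the constant is arbitrary and leakage/static power is ignored, so only the ratio is meaningful:

    Code:
    # Minimal sketch: dynamic switching power scales roughly with f * V^2.
    # The constant k is arbitrary; only the ratios matter here.
    def dynamic_power(freq_mhz, volts, k=1.0):
        """Relative dynamic power, P ~ k * f * V^2 (leakage ignored)."""
        return k * freq_mhz * volts ** 2

    stock = dynamic_power(1266, 1.10)        # RX 480 at AMD's boost spec, ~1.1 V
    undervolted = dynamic_power(1266, 0.80)  # same clock at the 0.8 V mentioned above

    print(f"relative dynamic power at 1.10 V: {stock:.0f}")
    print(f"relative dynamic power at 0.80 V: {undervolted:.0f}")
    print(f"reduction from the voltage drop alone: {(1 - undervolted / stock) * 100:.0f}%")
    # (0.80 / 1.10)^2 ~= 0.53, so the voltage drop alone cuts dynamic power by roughly 47%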

    Which is why I can see that, if AMD makes good improvements with Vega as the TBP suggests, they may end up matching Pascal; it all comes down to when Volta is also released, as I feel that will again leapfrog in terms of silicon/node power efficiency.
    Cheers
     
  12. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,112
    Likes Received:
    4,695
    As a starting point, the Fury X got a ~40% bump in performance-per-watt compared to the R9 290X due to HBM1, GCN 2 -> GCN 3 transition and being a larger chip on the same node.
    Polaris 10 -> Vega 10 seems to be a rather similar transition to Hawaii -> Fiji:
    - Trading GDDR5 for HBM2
    - Larger chip
    - GCN 4 -> GCN 5

    Maybe the efficiency gains from using HBM won't be as big, because Polaris 10 is only using half the VRAM chips on a 256-bit bus. And although the RX480 is using GDDR5's fastest memory modules and Vega 10 will apparently only use 2 stacks of HBM2, I think the difference here will probably be smaller.
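    For a ballpark of the memory side of that comparison, here is a back-of-the-envelope sketch; the RX480 figures are its published 8Gbps GDDR5 on a 256-bit bus, while the HBM2 per-pin speed for Vega is an assumption, since it is not confirmed:

    Code:
    # Back-of-the-envelope peak memory bandwidth comparison.
    def bandwidth_gbs(bus_width_bits, gbps_per_pin):
        """Peak bandwidth in GB/s: pins * Gbps-per-pin / 8 bits-per-byte."""
        return bus_width_bits * gbps_per_pin / 8

    rx480 = bandwidth_gbs(256, 8.0)               # 8 Gbps GDDR5 on a 256-bit bus -> 256 GB/s
    vega_2_stacks = bandwidth_gbs(2 * 1024, 2.0)  # assumed 2 Gbps/pin HBM2, two 1024-bit stacks -> 512 GB/s

    print(f"RX 480:             {rx480:.0f} GB/s")
    print(f"Vega 10 (2 stacks): {vega_2_stacks:.0f} GB/s (assuming 2 Gbps per pin)")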
     
  13. RedVi

    Regular

    Joined:
    Sep 12, 2010
    Messages:
    387
    Likes Received:
    39
    Location:
    Australia
    If Vega x can match/beat Nvidia's competing product without being factory overclocked, AMD might finally have good perf/watt. If it's 5-10% below by their estimations, they will probably clock it too high once again.

    Just look at the Fury Nano: their chips have decent perf/watt already, it's just that they lag slightly in overall performance.

    To compound their image problem, AIB-overclocked and especially home-overclocked cards are very rarely critiqued on perf/watt, so overclockability is always seen as a bonus without carrying a bad image with it. Clocking a factory device higher than ideal will get you the performance headlines you want, but also the 'high power', 'bad perf/watt' and 'bad overclocking' headlines along with it.
     
    Lightman likes this.
  14. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,398
    Likes Received:
    5,385
    That wasn't his intent as far as I could tell. What he is pointing out is that AMD, with Polaris, has pushed the 480 product significantly beyond the knee of the power curve (like the Fury X) with the base voltage and frequency used for those cards. Hence the perf/watt is much worse than the chip is capable of. But it was something they chose to do to attain X level of performance, which could not be achieved while staying at or below the knee of the power curve.

    On the flip side Nvidia hasn't had to do that to achieve X level of performance and is able to keep the 1060 at or below the knee of the power curve.

    In other words, the 1060 product is operating at closer to the optimum perf/watt point for the chip that is being used. Meanwhile the 480 product is not even close to operating at the optimum perf/watt for the chip being used.
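    A toy model of the knee idea, with entirely made-up voltage/frequency points, just to show how perf/watt falls off once clocks are pushed past the efficient part of the curve:

    Code:
    # Toy model: sustaining higher clocks needs higher voltage, and dynamic power
    # scales roughly with f * V^2, so perf/watt peaks near the knee and drops beyond it.
    points = [
        # (clock in MHz, required voltage) -- illustrative values only
        (1000, 0.85),
        (1100, 0.90),
        (1200, 1.00),
        (1266, 1.10),  # near the RX 480's stock boost clock
        (1350, 1.20),
    ]

    for mhz, volts in points:
        power = mhz * volts ** 2      # relative dynamic power
        perf_per_watt = mhz / power   # performance assumed proportional to clock
        print(f"{mhz} MHz @ {volts:.2f} V -> relative perf/W: {perf_per_watt:.3f}")
    # With these numbers perf/W is simply 1/V^2, so it drops roughly 40%
    # going from the 0.85 V point to the 1.10 V point.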

    None of that is saying that Polaris 10 is better or more power efficient than the 1060. Only that the implementation of the respective cards is different: one IHV didn't need to push its chip beyond its optimal operating range (with regards to power) while the other did.

    Another way to think of it is that Polaris 10 is being used in a product tier that is not the one best suited to the chip (at least for the majority of chips, if we assume the high voltage set is meant to salvage as many dies as possible for the 480). However, AMD didn't have much of a choice, as Polaris 10 and 11 were the only chips they were introducing into the market this year. Complicating that was their marketing promising, in the run-up to launch, that Polaris 10 would bring VR capability to the masses, which set the performance target it would have to reach regardless of whether that target ended up being optimal for Polaris 10.

    Regards,
    SB
     
  15. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    But in the end it will not make much difference, because what you gain by reducing power draw is to some extent offset by the reduced performance. For a sensible comparison you would either need to equalize the performance and measure the power draw, or equalize the power draw and measure the performance. It is quite pointless to say that a lower-performing chip which is undervolted and underclocked is equal in performance per watt to a chip performing 30% better without any power-saving optimisations.
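    A small illustration of that point with hypothetical numbers; raw perf/watt alone can make an undervolted, slower card look equal to a faster stock card, which is exactly why one axis should be held constant:

    Code:
    # Hypothetical numbers only: raw perf/W can look "equal" even when one card
    # is much slower, which is why comparisons should equalize performance or power.
    cards = {
        "faster card, stock":       {"perf": 100.0, "watts": 180.0},
        "slower card, undervolted": {"perf":  77.0, "watts": 140.0},
    }

    for name, c in cards.items():
        print(f"{name}: {c['perf'] / c['watts']:.3f} perf/W")
    # Both land at roughly 0.55 perf/W, yet one is ~23% slower; the "equal efficiency"
    # claim ignores that the faster card could also be undervolted and underclocked.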
     
  16. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Well, that all comes from the design of the chip, starting with the transistor layouts; if they haven't been able to do it for the past 3 generations, how can they possibly just do it now? This problem is not something new, or something caused by unforeseen issues with the new node, or even a problem with a new architecture (these have been modified architectures since the 7xxx series). AMD has had ample time and resources after seeing the 750 (Maxwell 1) to remedy this. With Polaris, which I think many expected to show better perf/watt than Maxwell 2, I even stated I believed it would beat Maxwell 2's perf/watt handily, based on the information AMD gave at the first showing of Polaris.

    The changes to Pascal's uarch gave it the extra clocks and changed its sweet spots for perf/watt, and those were low-level changes, something that took quite a bit of time (2+ years) to implement for something nV already had quite a bit of experience and success with.

    Also, if we look at AMD vs nV chips (without HBM involved), the perf/watt sweet spot for nV is their performance chips, unlike AMD, where it is more traditionally their mid-range chips. nV changed the name of the game, and AMD is playing catch-up.

    When you start seeing things like this over and over again, you've got to start wondering: is it a problem with the architecture, or is AMD missing something critical to make those changes?
     
    #236 Razor1, Oct 11, 2016
    Last edited: Oct 11, 2016
  17. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,995
    Likes Received:
    1,503
    We may just not have seen the changes AMD has implemented yet. GCN is an evolution, not a revolution. Maybe Vega will be a bigger change, one brought about by Maxwell. I am sure AMD has been planning these chips for a few years now; it may take 3-4 years for a design to go from the drawing board to production. Or, for all we know, Vega is the end of GCN and Navi is the major change. It's hard to really tell.
     
  18. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,075
    Likes Received:
    1,039
    I'm actually interested in the topic of the thread.
    Could someone please talk about Vega?
     
  19. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,398
    Likes Received:
    5,385
    Unfortunately not much to talk about with the limited rumors at hand. Hence the wild speculation by some users that it's going to be horrible, and speculation by another set of users that it's going to be fantastic. :D And no one really knows what they are talking about in relation to Vega, since so little is known other than that it's potentially a more radical change for AMD than anything they've released since the introduction of GCN (generation to generation, not start-of-GCN compared to Polaris).

    Regards,
    SB
     
    kalelovil likes this.
  20. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    If you extrapolate from the introduction of R600 in 2006 to GCN in 2011, Vega would seem to be roughly where AMD is "due" for a change.

    Whether the frequently-cited patent will debut with Vega is unclear. It's too broad to be clear how it necessarily fits with prior generations of GCN, while also being scant on details such as how many resources a CU as a whole gets. If it were purely based on that diagram, the ALU complement of a CU is 1/4 of what GCN currently supports without worrying about how 2 of those ALUs can get operands or register storage. There are other claims that might allow for a different mix and more units per SIMD than the drawing shows, however.

    If the 4096 stream processor count is valid, it may not fit the diagram well without further architectural changes. It seems like it would be misleading to count the scalar ALUs unless they could feasibly work in concert with the SIMD units without weirdness related to not having any storage or operand paths that come with associated register files. It's not impossible, if they can somehow have storage allocated in another pool or in the other register files, but that would point to some change in banking or operand routing to do this.
    One possibility is that the SIMD register files are not all of the vector register storage in the CU, and the scalar units can hit another pool. Another would be that they can poach storage and values from the vector files, with some creative banking and allocation to avoid stepping on the other files--although that might need some other work to fit into the SIMD unit's patterns.
    If they cannot, then I'm not sure how the remaining SIMD resources can keep GCN's multiple of 4 cadence and batch size of 64, which the rest of the claims do not entirely dispense with.
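    For reference on the cadence and batch size being discussed, here is a minimal sketch of how current GCN maps a 64-wide wavefront onto a 16-lane SIMD over a multiple-of-4 cadence (this describes the existing behavior, not the patent's proposal):

    Code:
    # Sketch of current GCN vector issue: a CU has 4 SIMD units, each 16 lanes wide,
    # and one 64-work-item wavefront executes a vector instruction on a single SIMD
    # over 4 consecutive cycles (the multiple-of-4 cadence).
    WAVEFRONT_SIZE = 64
    SIMD_WIDTH = 16
    SIMDS_PER_CU = 4  # with 4 SIMDs on a 4-cycle cadence, the CU can start one vector op per cycle overall

    def issue_vector_instruction(simd_id):
        """Yield (cycle, simd, work-item range) for one vector instruction of a wavefront."""
        cycles = WAVEFRONT_SIZE // SIMD_WIDTH  # = 4
        for c in range(cycles):
            first = c * SIMD_WIDTH
            yield (c, simd_id, (first, first + SIMD_WIDTH - 1))

    for cycle, simd, (lo, hi) in issue_vector_instruction(simd_id=0):
        print(f"cycle {cycle}: SIMD {simd} executes work-items {lo}..{hi}")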

    One other notable difference is that the claim of stretching a wavefront across multiple units, and the existence of two scalar units, may not fit with the single-issue-per-wavefront behavior of GCN--or potentially with the instructions being issued keeping a pure 1:1 relationship between operation and instruction word, although this requires a rather specific combination of parameters to fall out. Some functions would be helped by advance information in the encoding--such as arbitrating for SIMD and scalar units between instruction issue units, gating off lanes, figuring out register access patterns, determining when/whether to gate off lanes in a SIMD versus migrating, or whether to engage different cadences.
    This actually gets at something I'm curious about: how GCN is currently implemented between its instruction cache/fetch unit and the instruction buffers, such as whether there is some predecode in that process, so that what is in the ISA document is not necessarily what the instruction buffer and issue stage see.

    Some slight tangents below:

    On the brief blurb about Magnum, my initial reaction to the idea that AMD is making its own programmable logic device is to question why it would be compelling, or whether it is just a repackaging of another FPGA onto a board, in a manner akin to AMD's SSG placing a separate SSD on a board--with possible further integration someday.
    I'm not sure what AMD could offer on its own where the speculation around Magnum makes sense versus established vendors.
    Moving a little further afield, however, is if the "programmable blocks" are blocks like variable-length SIMDs, scalar units, fetch blocks, configurable forwarding networks, and instruction control blocks. At least then, AMD might have a use for hardware they can tweak more readily for their custom work in the absence of a clear way for outside parties to benefit.

    On the topic of the earlier interposer/MCM speculation:
    GCN already tiles things to an extent. There's already a relaxed ordering mode for rasterization, and directed tests post-VLIW4 show more variable behavior in tile output.
    There are still architectural elements that might need to change to make it work, even with an interposer-level interconnect. Some items, like the compression pipeline (which can be considered a cache path that can be thrashed) and CUs being able to read based on it, may not be set up to be consistent across multiple chips. There is already some requirement for coarse barriers for intra-frame modifications to delta-compressed data, so possibly that serves as an escape.
    AMD's other statements on scalability might be relying on explicit multi-adapter to handle possible interactions between the chips, rather than having the chips try to manage it. Items like the GDS and messaging might not scale unless Vega takes those in a different direction as well.
     
    Anarchist4000, Razor1 and DavidGraham like this.