AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    Remember, that's Samsung's implementation, not GlobalFoundries' broken one. I wouldn't be at all surprised if Samsung's own, more mature LPP process is a good 10% (or more) better than GloFo's LPP across a variety of metrics, but we just don't know yet.
     
  2. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yeah, nice catch with the 1475MHz; that is, I think, the highest I have seen on air, fluctuating between 1.17V and 1.18V.
    Will be interesting to see how often this occurs with other cards.
    Cheers
     
    french toast likes this.
  3. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,016
    Likes Received:
    1,694
  4. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Better to think of Vulkan/etc. as influencing performance/watt (ideally you want to use something that is comparable on both AMD and Nvidia, as there are games that skew it either way) rather than efficiency; voltage-power demand and frequency can be part of that. That is why, as you probably noticed, when I talk about AMD and Nvidia my context is that they generally trade blows when averaging a diverse range of games.
    It also comes down not just to the individual game but also to resolution (which I have avoided mentioning); Tom's Hardware uses Metro: Last Light at 4K because, in their experience, it is one of the more demanding titles in terms of power for both AMD and Nvidia.
    Cheers
     
    french toast likes this.
  5. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    I don't believe Pascal is 2x more efficient than Polaris even in worst-case scenarios; in modern games with new drivers it's a lot closer than people think. Don't get me wrong, Pascal is more efficient in any situation. Compare 2+ year old DX11 games or GameWorks titles and, yes, I'll accept Polaris can be made to look like a well-designed storage heater, but is that realistic to draw conclusions from going forward?

    Saying that, if I was building a small media-centre type PC to occasionally game on, I sure as hell would pick Pascal any day of the week. Nvidia's work with Tegra has enabled them to get media playback consumption down very low, much lower than AMD, let alone idle consumption.
     
  6. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    I would seriously stick with Tom's Hardware and PC Perspective; one has the seriously expensive equipment and technical assistance from a well-respected laboratory measurement manufacturer, while the other draws on the extensive experience of a staff member who is an electrical engineer with a background as a navy nuclear technician and in network defense.

    Cheers
     
    pharma, Razor1 and upnorthsox like this.
  7. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    I agree, for AMD there is only one way for their perf/watt to go and that is up. Possibly in newer games (DX12, Vulkan) that could change, but nV's driver development could change that too; the question is to what degree. Currently in LLAPI applications, nV hardware tends to eat up more power, which is kind of unexpected. It's hard to figure out why it's happening, because it's not like they are using more or less of the chip just because of the API; the API routines shouldn't affect power usage by close to 20% in some instances. So what I'm thinking is that the way the power and voltage are set up in the BIOS, and controlled by the drivers when doing dynamic clocking, is causing some issues with the overall power consumption.

    Tegra's base architecture outside of its graphics is probably what gets its target power consumption to such a low point. ARM-based CPUs have always been good at conserving power, so it's hard to compare across CPUs with different architectures in this regard.
     
    pharma and french toast like this.
  8. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    I was under the impression both GF and Samsung are sharing their experiences with the 14nm processes.....so they should be similar in maturity.
     
  9. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    It's the same underlying technology (Samsung's), but this is GloFo we are talking about, and this is their first production run of 14nm FinFET. Samsung designed the process (stole it from TSMC?) and produced a year's worth of LPE before leading with LPP, so I'm sure they would have some kind of short-term advantage, although I'm just guessing here; we have no data yet.
    GlobalFoundries don't exactly fill me with confidence going by their track record, but I might be pleasantly surprised (one day :) ).
     
  10. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,075
    Likes Received:
    1,039
    Well the RX480 obviously has higher power draw than the GTX1060, to the tune of roughly 30-35% over. (You could turn the comparison around and suddenly the GTX1060 is only 20-25% lower in power draw. Nicer figures for AMD if you do the percentages in that direction. :wink:)
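    A quick sanity check of those two framings, as a minimal sketch with assumed round-number board powers (illustrative figures only, not measurements):

    Code:
    # Illustrative only: assumed average gaming board powers, not measured values.
    rx480_w = 210.0    # assumed RX 480 power draw (W)
    gtx1060_w = 160.0  # assumed GTX 1060 power draw (W)

    # Same data, two framings of the gap:
    over = (rx480_w / gtx1060_w - 1.0) * 100   # RX 480 relative to GTX 1060
    under = (1.0 - gtx1060_w / rx480_w) * 100  # GTX 1060 relative to RX 480

    print(f"RX 480 draws {over:.0f}% more than the GTX 1060")   # ~31% more
    print(f"GTX 1060 draws {under:.0f}% less than the RX 480")  # ~24% less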
    PCPer has a good measuring scheme going, and this graph largely corroborates Razor1's ballpark figure. However, as I pointed out, the RX480 is a product, based on Polaris 10, the chip. A product where AMD elected to go quite far up the voltage/frequency curve. I happen to own one, and typically run it at 10% lower voltage than stock, resulting in power draw quite close to the GTX1060 at actually improved performance. (Of course you can do the same exercise, with similar but less pronounced results, with the GTX1060.) The thing is that AMD could have dropped the voltage 0.1V and let the frequency slide down a bit, and their performance/watt would have looked a lot better, but their performance/$ and the perception of the RX480 being of roughly equivalent performance to the GTX1060 would have suffered.
    Pretty much as with the Nano vs. the Fury X: the Nano, using exactly the same silicon, demonstrates a difference in performance/W very similar to the GTX1060 vs. the RX480.
    (And note, that is the sum total of difference between the products. 0.1-0.15V. Price, performance, die size, and so on are all as close to identical as two different products could reasonably be. Does that constitute "a generation behind"? Seriously?)

    This makes drawing conclusions about Vega products by arguing straight scaling from the RX480 doubly dubious. The three factors that I lined up, which we know are applicable, will all help efficiency. But how large will the aggregate effect be? And how will AMD use it in the actual products? Will they push for a particular competitive performance tier, prioritising performance/$, and positioning a hair over the competitor's product in the same tier in the benchmark charts? Or will they modify those priorities in the direction of the Nano?
    At this point, we can only guess. But we do know that Vega will show efficiency gains over Polaris. Straight scaling from RX480 just isn't valid, and I think we are all knowledgeable enough to realise that.
     
  11. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yes, let's make the figure look better by reversing the comparison to lower the percentage; I guess everyone should do that when their favourite manufacturer (whether AMD or Nvidia) is slower in a game :)
    Regarding the voltage/frequency curve, it depends upon the IHV OC settings, including the power setting they used, and the same can be said about custom AIB Nvidia cards; it's just that with Nvidia you have to jump through more hoops to overclock, or own one of the very few card models that have a bespoke AIB BIOS (which unfortunately is not usable on anything else).
    The latest MSI Afterburner makes it even easier to do on AMD, but this is still OC beyond what the card was warrantied for.

    In terms of voltage, it needs to be appreciated that there is an optimal performance envelope for the silicon/node that constrains both manufacturers to some extent; if you keep the 480 within the AMD boost spec of 1266MHz it does not use much more voltage than the 1060 at 2050MHz, which is around 1.1V.
    This goes out the window with AIB partner cards or when personally OCing, because the required voltage and the power drawn/thermals ramp up pretty quickly, exacerbating leakage and wasted energy; this is further compounded by AMD's dynamic power management still not being as complete as what Nvidia implemented with Pascal (which is further evolved from Maxwell).
    To get the power draw of the 480 comparable to the 1060 (at 1.1V) you would need to run the 480 at around 0.8V, and that is some serious downvolting in terms of the silicon/node's performance window.
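    As a rough illustration of why a voltage drop of that size matters, here is a minimal sketch using the usual dynamic-power approximation (power scaling with f * V^2); the constant is arbitrary and leakage/static power is ignored, so only the ratio is meaningful:

    Code:
    # Minimal sketch: dynamic switching power scales roughly with f * V^2.
    # The constant k is arbitrary; only the ratios matter here.
    def dynamic_power(freq_mhz, volts, k=1.0):
        """Relative dynamic power, P ~ k * f * V^2 (leakage ignored)."""
        return k * freq_mhz * volts ** 2

    stock = dynamic_power(1266, 1.10)        # RX 480 at AMD's boost spec, ~1.1 V
    undervolted = dynamic_power(1266, 0.80)  # same clock at the 0.8 V mentioned above

    print(f"relative dynamic power at 1.10 V: {stock:.0f}")
    print(f"relative dynamic power at 0.80 V: {undervolted:.0f}")
    print(f"reduction from the voltage drop alone: {(1 - undervolted / stock) * 100:.0f}%")
    # (0.80 / 1.10)^2 ~= 0.53, so the voltage drop alone cuts dynamic power by roughly 47%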

    Which is why I can see that, if AMD makes good improvements with Vega as the TBP suggests, they may end up matching Pascal; it all comes down to when Volta is also released, as I feel that will again leapfrog in terms of silicon/node power efficiency.
    Cheers
     
  12. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,112
    Likes Received:
    4,695
    As a starting point, the Fury X got a ~40% bump in performance-per-watt compared to the R9 290X due to HBM1, GCN 2 -> GCN 3 transition and being a larger chip on the same node.
    Polaris 10 -> Vega 10 seems to be a rather similar transition to Hawaii -> Fiji:
    - Trading GDDR5 for HBM2
    - Larger chip
    - GCN 4 -> GCN 5

    Maybe the efficiency gains from using HBM won't be as big, because Polaris 10 is only using half the VRAM chips on a 256-bit bus. And although the RX480 is using GDDR5's fastest memory modules and Vega 10 will apparently only use 2 stacks of HBM2, I think the difference here will probably be smaller.
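    For a ballpark of the memory side of that comparison, here is a back-of-the-envelope sketch; the RX480 figures are its published 8Gbps GDDR5 on a 256-bit bus, while the HBM2 per-pin speed for Vega is an assumption, since it is not confirmed:

    Code:
    # Back-of-the-envelope peak memory bandwidth comparison.
    def bandwidth_gbs(bus_width_bits, gbps_per_pin):
        """Peak bandwidth in GB/s: pins * Gbps-per-pin / 8 bits-per-byte."""
        return bus_width_bits * gbps_per_pin / 8

    rx480 = bandwidth_gbs(256, 8.0)               # 8 Gbps GDDR5 on a 256-bit bus -> 256 GB/s
    vega_2_stacks = bandwidth_gbs(2 * 1024, 2.0)  # assumed 2 Gbps/pin HBM2, two 1024-bit stacks -> 512 GB/s

    print(f"RX 480:             {rx480:.0f} GB/s")
    print(f"Vega 10 (2 stacks): {vega_2_stacks:.0f} GB/s (assuming 2 Gbps per pin)")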
     
  13. RedVi

    Regular

    Joined:
    Sep 12, 2010
    Messages:
    387
    Likes Received:
    39
    Location:
    Australia
    If Vega x can match/beat Nvidia's competing product without being factory overclocked, AMD might finally have good perf/watt. If it's 5-10% below by their estimations, they will probably clock it too high once again.

    Just look at the Fury Nano: their chips have decent perf/watt already, it's just that they lag slightly in overall performance.

    To compound their image problem, AIB-overclocked and especially home-overclocked cards are very rarely critiqued on perf/watt, so overclockability is always seen as a bonus without carrying a bad image with it. Clocking a factory device higher than ideal will get you the performance headlines you want, but also the 'high power', 'bad perf/watt' and 'bad overclocking' headlines along with it.
     
    Lightman likes this.
  14. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,398
    Likes Received:
    5,385
    That wasn't his intent as far as I could tell. What he is pointing out is that AMD, with Polaris, has pushed the 480 product significantly beyond the knee of the power curve (like the Fury X) with the base voltage and frequency used for those cards. Hence the perf/watt is much worse than the chip is capable of. But it was something they chose to do to attain X level of performance, which could not be achieved while staying at or below the knee of the power curve.

    On the flip side Nvidia hasn't had to do that to achieve X level of performance and is able to keep the 1060 at or below the knee of the power curve.

    In other words, the 1060 product is operating at closer to the optimum perf/watt point for the chip that is being used. Meanwhile the 480 product is not even close to operating at the optimum perf/watt for the chip being used.
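    A toy model of the knee idea, with entirely made-up voltage/frequency points, just to show how perf/watt falls off once clocks are pushed past the efficient part of the curve:

    Code:
    # Toy model: sustaining higher clocks needs higher voltage, and dynamic power
    # scales roughly with f * V^2, so perf/watt peaks near the knee and drops beyond it.
    points = [
        # (clock in MHz, required voltage) -- illustrative values only
        (1000, 0.85),
        (1100, 0.90),
        (1200, 1.00),
        (1266, 1.10),  # near the RX 480's stock boost clock
        (1350, 1.20),
    ]

    for mhz, volts in points:
        power = mhz * volts ** 2      # relative dynamic power
        perf_per_watt = mhz / power   # performance assumed proportional to clock
        print(f"{mhz} MHz @ {volts:.2f} V -> relative perf/W: {perf_per_watt:.3f}")
    # With these numbers perf/W is simply 1/V^2, so it drops roughly 40%
    # going from the 0.85 V point to the 1.10 V point.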

    None of that is saying that Polaris 10 is better or more power efficient than the 1060. Only that the implementation of the respective cards is different: one IHV didn't need to push its chip beyond its optimal operating range (with regards to power) while the other did.

    Another way to think of it is that Polaris 10 is being used in a product tier that is not the one best suited to the chip (at least for the majority of chips, if we assume the high voltage set is meant to salvage as many dies as possible for the 480). However, AMD didn't have much of a choice, as Polaris 10 and 11 were the only chips they were introducing into the market this year. Complicating that was their marketing promising, in the run-up to launch, that Polaris 10 would bring VR capability to the masses, which set the performance target it would have to reach regardless of whether that target ended up being optimal for Polaris 10.

    Regards,
    SB
     
  15. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    But in the end it will not make much difference, because what you gain by reducing power draw is to some extent offset by the reduced performance. For a sensible comparison you would either need to equalize the performance and measure the power draw, or equalize the power draw and measure the performance. It is quite pointless to say that a lower-performing chip which is undervolted and underclocked is equal in performance per watt to a chip performing 30% better without any power-saving optimisations.
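    A small illustration of that point with hypothetical numbers; raw perf/watt alone can make an undervolted, slower card look equal to a faster stock card, which is exactly why one axis should be held constant:

    Code:
    # Hypothetical numbers only: raw perf/W can look "equal" even when one card
    # is much slower, which is why comparisons should equalize performance or power.
    cards = {
        "faster card, stock":       {"perf": 100.0, "watts": 180.0},
        "slower card, undervolted": {"perf":  77.0, "watts": 140.0},
    }

    for name, c in cards.items():
        print(f"{name}: {c['perf'] / c['watts']:.3f} perf/W")
    # Both land at roughly 0.55 perf/W, yet one is ~23% slower; the "equal efficiency"
    # claim ignores that the faster card could also be undervolted and underclocked.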
     
  16. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Well, that all comes from the design of the chip, starting with the transistor layouts; if they haven't been able to do it for the past 3 generations, how can they possibly just do it now? This problem is not something new, or something caused by unforeseen issues with the new node, or even a problem with a new architecture (these have been modified architectures since the 7xxx series). AMD has had ample time and resources after seeing the 750 (Maxwell 1) to remedy this. With Polaris, which I think many expected to show better perf/watt than Maxwell 2, I even stated I believed it would beat Maxwell 2's perf/watt handily, based on the information AMD gave at the first showing of Polaris.

    The changes to Pascal's uarch gave it the extra clocks and changed its sweet spots for perf/watt, and those were low-level changes, something that took quite a bit of time (2+ years) to implement for something nV already had quite a bit of experience and success with.

    Also, if we look at AMD vs nV chips (without HBM involved), the perf/watt sweet spot for nV is their performance chips, unlike AMD, where it is more traditionally their mid-range chips. nV changed the name of the game, and AMD is playing catch-up.

    When you start seeing things like this over and over again, you've got to start wondering: is it a problem with the architecture, or is AMD missing something critical to make those changes?
     
    #236 Razor1, Oct 11, 2016
    Last edited: Oct 11, 2016
  17. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,995
    Likes Received:
    1,503
    We may just not have seen the changes AMD has implemented yet. GCN is an evolution, not a revolution. Maybe Vega will be a bigger change, one brought about by Maxwell. I am sure AMD has been planning these chips for a few years now; it may take 3-4 years for a design to go from the drawing board to production. Or, for all we know, Vega is the end of GCN and Navi is the major change. It's hard to really tell.
     
  18. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,075
    Likes Received:
    1,039
    I'm actually interested in the topic of the thread.
    Could someone please talk about Vega?
     
  19. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,398
    Likes Received:
    5,385
    Unfortunately not much to talk about with the limited rumors at hand. Hence the wild speculation by some users that it's going to be horrible, and speculation by another set of users that it's going to be fantastic. :D And no one really knows what they are talking about in relation to Vega, since so little is known other than that it's potentially a more radical change for AMD than anything they've released since the introduction of GCN (generation to generation, not start-of-GCN compared to Polaris).

    Regards,
    SB
     
    kalelovil likes this.
  20. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    If you extrapolate from the introduction of R600 in 2006 to GCN in 2011, Vega would seem to be roughly where AMD is "due" for a change.

    Whether the frequently-cited patent will debut with Vega is unclear. It's too broad to be clear how it necessarily fits with prior generations of GCN, while also being scant on details such as how many resources a CU as a whole gets. If it were purely based on that diagram, the ALU complement of a CU is 1/4 of what GCN currently supports without worrying about how 2 of those ALUs can get operands or register storage. There are other claims that might allow for a different mix and more units per SIMD than the drawing shows, however.

    If the 4096 stream processor count is valid, it may not fit the diagram well without further architectural changes. It seems like it would be misleading to count the scalar ALUs unless they could feasibly work in concert with the SIMD units without weirdness related to not having any storage or operand paths that come with associated register files. It's not impossible, if they can somehow have storage allocated in another pool or in the other register files, but that would point to some change in banking or operand routing to do this.
    One possibility is that the SIMD register files are not all of the vector register storage in the CU, and the scalar units can hit another pool. Another would be that they can poach storage and values from the vector files, with some creative banking and allocation to avoid stepping on the other files--although that might need some other work to fit into the SIMD unit's patterns.
    If they cannot, then I'm not sure how the remaining SIMD resources can keep GCN's multiple of 4 cadence and batch size of 64, which the rest of the claims do not entirely dispense with.
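    For reference on the cadence and batch size being discussed, here is a minimal sketch of how current GCN maps a 64-wide wavefront onto a 16-lane SIMD over a multiple-of-4 cadence (this describes the existing behavior, not the patent's proposal):

    Code:
    # Sketch of current GCN vector issue: a CU has 4 SIMD units, each 16 lanes wide,
    # and one 64-work-item wavefront executes a vector instruction on a single SIMD
    # over 4 consecutive cycles (the multiple-of-4 cadence).
    WAVEFRONT_SIZE = 64
    SIMD_WIDTH = 16
    SIMDS_PER_CU = 4  # with 4 SIMDs on a 4-cycle cadence, the CU can start one vector op per cycle overall

    def issue_vector_instruction(simd_id):
        """Yield (cycle, simd, work-item range) for one vector instruction of a wavefront."""
        cycles = WAVEFRONT_SIZE // SIMD_WIDTH  # = 4
        for c in range(cycles):
            first = c * SIMD_WIDTH
            yield (c, simd_id, (first, first + SIMD_WIDTH - 1))

    for cycle, simd, (lo, hi) in issue_vector_instruction(simd_id=0):
        print(f"cycle {cycle}: SIMD {simd} executes work-items {lo}..{hi}")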

    One other notable difference is that the claim of stretching a wavefront across multiple units, and the existence of two scalar units, may not fit with the single-issue-per-wavefront behavior of GCN--or potentially with the instructions being issued keeping a pure 1:1 relationship between operation and instruction word, although this requires a rather specific combination of parameters to fall out. Some functions would be helped by advance information in the encoding--such as arbitrating for SIMD and scalar units between instruction issue units, gating off lanes, figuring out register access patterns, determining when/whether to gate off lanes in a SIMD versus migrating, or whether to engage different cadences.
    This actually gets at something I'm curious about: how GCN is currently implemented between its instruction cache/fetch unit and the instruction buffers, such as whether there is some predecode in that process, so that what is in the ISA document is not necessarily what the instruction buffer and issue stage see.

    Some slight tangents below:

    On the brief blurb about Magnum, my initial reaction to the idea that AMD is making its own programmable logic device is to question why it would be compelling, or whether it is just a repackaging of another FPGA onto a board, in a manner akin to AMD's SSG placing a separate SSD on a board--with possible further integration someday.
    I'm not sure what AMD could offer on its own where the speculation around Magnum makes sense versus established vendors.
    Moving a little further afield, however, is if the "programmable blocks" are blocks like variable-length SIMDs, scalar units, fetch blocks, configurable forwarding networks, and instruction control blocks. At least then, AMD might have a use for hardware they can tweak more readily for their custom work in the absence of a clear way for outside parties to benefit.

    On the topic of the earlier interposer/MCM speculation:
    GCN already tiles things to an extent. There's already a relaxed ordering mode for rasterization, and directed tests post-VLIW4 show more variable behavior in tile output.
    There are still architectural elements that might need to change to make it work, even with an interposer-level interconnect. Some items, like the compression pipeline (which can be considered a cache path that can be thrashed) and CUs being able to read based on it, may not be set up to be consistent across multiple chips. There is already some requirement for coarse barriers for intra-frame modifications to delta-compressed data, so possibly that serves as an escape.
    AMD's other statements on scalability might be relying on explicit multi-adapter to handle possible interactions between the chips, rather than having the chips try to manage it. Items like the GDS and messaging might not scale unless Vega takes those in a different direction as well.
     
    Anarchist4000, Razor1 and DavidGraham like this.