AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Take a look at this link pulled from the SemiAccurate forum of an XFX RX 480 GTR; the power characteristics are a major improvement over the early RX 480s, about 20W lower at the ASIC (90W) whilst running at a very stable 1288MHz boost. Interestingly, it overclocked to 1475MHz whilst only pushing 149W at the core.

Don't know whether this is a rev B as some have speculated or just some XFX implementation detail, but that is very impressive if you ask me, even more so if it isn't even a new rev.

If some site could compare the performance per watt of this Polaris card to the 1060 under current APIs such as DX12 and Vulkan (which arguably GCN was designed for, and which are more relevant going forward), the perf per watt is probably neck and neck.

Back on topic: if Vega introduces a majorly revamped IP9 core, reworks the rasterizer, and the GloFo 14nm LPP process matures, I can see Nvidia and AMD perf per watt being within hair-splitting distance under modern APIs.

You need to appreciate that this watt figure is calculated from the vcore alone (current x voltage) and cannot take into account leakage/wastage (the VRM power stage is also involved, though one could argue AMD designs theirs with better components). That is an area where AMD is losing heavily to Nvidia, along with how well they use the silicon/node performance envelope.
This is why AMD was very careful about how they presented the power of Polaris. Interestingly, with Vega the talk has now moved to total board power, meaning factors such as leakage and wastage are now also part of their efficiency discussion.
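As a rough back-of-the-envelope sketch of that gap (all figures below are illustrative assumptions, not measurements): the software-reported ASIC number misses VRM conversion losses and the other board consumers entirely.

```python
# Back-of-the-envelope sketch (illustrative numbers only): software tools
# report ASIC power as vcore current x voltage, so VRM conversion losses and
# other board consumers (memory, fan) never show up in that figure.

def board_power_estimate(vcore_current_a, vcore_volt, vrm_efficiency=0.85,
                         memory_w=30.0, fan_misc_w=5.0):
    """Estimate total board power from the ASIC-level reading."""
    asic_w = vcore_current_a * vcore_volt               # the GPU-Z-style figure
    vrm_loss_w = asic_w * (1.0 / vrm_efficiency - 1.0)  # heat in the power stage
    return asic_w + vrm_loss_w + memory_w + fan_misc_w

# A reported ~90W ASIC figure can plausibly be ~140W at the board:
print(board_power_estimate(vcore_current_a=90.0, vcore_volt=1.0))  # ~140.9
```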

There are only a few tech sites that can measure GPU power demand/draw correctly and accurately.
Anyway, IF Vega does live up to its promise with regard to TBP, that would suggest some notable improvements to the design and would put it on a near level playing field with Pascal (though Volta at 16nm is suggested to be shockingly good - early days yet, though).
So I do think it is feasible for AMD to do this with Vega, as AMD and Nvidia are not fully in sync with each other's product development timeframes; the challenge will be how close they can get to Volta in terms of performance efficiency, especially when both are on the next node after 14/16nm.

Cheers
 
Take a look at this link pulled from the SemiAccurate forum of an XFX RX 480 GTR; the power characteristics are a major improvement over the early RX 480s, about 20W lower at the ASIC (90W) whilst running at a very stable 1288MHz boost. Interestingly, it overclocked to 1475MHz whilst only pushing 149W at the core.

Don't know whether this is a rev B as some have speculated or just some XFX implementation detail, but that is very impressive if you ask me, even more so if it isn't even a new rev.

Found a good example for the 470 showing the disparity between measuring the vcore (as GPU-Z etc. do) and actually measuring the card in a way that takes into account the aspects I raised.
The two primary colours to compare are the grey and green (green does also include the fan, though this would be running at optimal speeds - still need to allow for it, but it would be at most 3W). The two do not fully line up due to sampling intervals and a low-pass filter, but look at the average line: GPU-Z is around 95W while the real measurement is around 145W. Bear in mind that while the grey has larger swings than green (better interval accuracy), it is still well below the watts used by green.
Also notice the peak for GPU-Z is around 122W while the actual measurement is around 195W (with a smaller monitoring interval this would be even higher). These are short bursts and not anything to be overly concerned about, but they highlight the measurement disparity, which JayzTwoCents would not have known about using system-level monitoring when he was looking at the XFX 480.
Overclocking further exacerbates the difference in actual peak/average demand.
Just to reiterate, this is showing a custom AIB 470 rather than a 480.
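To illustrate the interval point (a synthetic trace with assumed numbers, not the actual card data): a monitor that averages over long windows preserves the mean but smears away exactly the short bursts the proper measurement catches.

```python
# Minimal sketch of why a slow sampling interval hides short power spikes:
# averaging long windows keeps the ~145W mean but clips the brief ~195W burst.
import random

random.seed(0)
fast_trace = [145 + random.gauss(0, 5) for _ in range(10_000)]  # scope-style samples
fast_trace[5_000:5_010] = [195.0] * 10                          # a short burst

def downsample(trace, window):
    """Average consecutive windows, like a slow polling monitor would."""
    return [sum(trace[i:i + window]) / window
            for i in range(0, len(trace), window)]

slow_trace = downsample(fast_trace, 100)
print(max(fast_trace))  # catches the ~195W burst
print(max(slow_trace))  # peak smeared toward the ~145W average
```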

[Image: 01-Wattage-Gaming-Curves.png]
 
Certainly Polaris was a rocky launch; the 480 should have launched with an 8-pin power connector and a better cooler, which would have improved the general tech press view of it in comparison with the 1060.
Performance-wise you could say they are pretty comparable: under DX11 the 1060 wins by between 5-15% (saw somewhere 6-7% average across a broad selection of games minus Project CARS), while the RX 480 is maybe even further ahead under the next/current? gen DX12 and Vulkan APIs.

Efficiency, however, is where Polaris was clearly well behind, not even close, as highlighted under last-generation APIs like DX11. But we know GlobalFoundries' 14LPP process is a first attempt and Polaris was likely a pipe cleaner; ignoring the rumours of Sony setting the specifications of the chip for the PS4 Pro, it is likely that the Polaris architecture is significantly better than it ended up looking on the first run at GloFo.

Take a look at this link pulled from the SemiAccurate forum of an XFX RX 480 GTR; the power characteristics are a major improvement over the early RX 480s, about 20W lower at the ASIC (90W) whilst running at a very stable 1288MHz boost. Interestingly, it overclocked to 1475MHz whilst only pushing 149W at the core.

Don't know whether this is a rev B as some have speculated or just some XFX implementation detail, but that is very impressive if you ask me, even more so if it isn't even a new rev.

If some site could compare the performance per watt of this Polaris card to the 1060 under current APIs such as DX12 and Vulkan (which arguably GCN was designed for, and which are more relevant going forward), the perf per watt is probably neck and neck.

Back on topic: if Vega introduces a majorly revamped IP9 core, reworks the rasterizer, and the GloFo 14nm LPP process matures, I can see Nvidia and AMD perf per watt being within hair-splitting distance under modern APIs.


I have to say that is pretty impressive for an RX 480 revision in such a short time, but I will need to see more reviews/tests to believe it applies to all of their RX 480s.

Vega, I can see some improvements to its perf/watt; I just can't see greater than 2x, and that is what they will need to hit from Polaris to get to Pascal.
 
My comments on the multi-GPU on interposer:

- Tiled rasterizers (Maxwell/Pascal) provide easy independent chunks of work at relatively coarse granularity. A tile has tens of thousands of pixels (+ potential overdraw), so it takes considerable time to process.
- All primitives rendered to the same region are grouped into the same tile. There's no need for fine-grained synchronization.
- Tiles could be output to a shared cache (like Intel's eDRAM, on the same interposer) and combined there. As long as the coarse tile order is obeyed, the result is correct.
- As long as the buffers for tiled geometry are long enough, tile rendering & combining latency is not going to be a problem. Each batch has N completely separate tiles and doesn't need internal synchronization.
- Let's assume (worst case) that the tile combining takes as much time as rendering the tile. As long as you have enough tiles in each batch to fill all GPUs twice, there will never be any stalls. Tiles can always be ordered in a way that the previous tile result (same location) is finished and combined to memory (z-buffering, blending, etc. work properly). See the toy sketch after this list.
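As a toy illustration of that last rule (a minimal sketch; the batch size and GPU count are assumptions, not anything from an actual driver): split the binned tiles into batches of at least 2x the GPU count, so each GPU can be rendering one tile while its previous tile is being combined.

```python
# Toy model of the batching rule above (illustrative only): a batch is safe to
# dispatch without stalls if it holds enough independent tiles to keep every
# GPU busy twice over - one tile rendering while the previous one is combined.
from collections import deque

def schedule_tiles(tile_ids, num_gpus):
    """Split binned tiles into batches of at least 2 * num_gpus tiles."""
    min_batch = 2 * num_gpus
    queue, batches = deque(tile_ids), []
    while queue:
        batch = [queue.popleft() for _ in range(min(min_batch, len(queue)))]
        batches.append(batch)
    return batches

# 4 GPUs on the interposer, 24 binned screen tiles -> 3 batches of 8 tiles,
# so render and combine phases can overlap without synchronization stalls.
for batch in schedule_tiles(range(24), num_gpus=4):
    print(batch)
```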

Multi-GPU on an interposer could work (for rasterizing). A big shared cache on the interposer (eDRAM) would be a great place to store the binned (tile) triangles and the tile pixels. But this wouldn't be as energy efficient as having one bigger die; off-chip data transfers are significantly more expensive (even over an interposer). It would still beat the current mGPU implementations by a mile :)


That is a possibility, but it won't get rid of mGPU for rendering (SFR/AFR methods), and something like eDRAM is an expensive proposition even for performance-segment cards; the only place it would be viable is at the enthusiast level, in top-end $1k+ cards, and if mGPU is still there, that just doesn't do anything for its sales ;)

The only way I can think of to get rid of mGPU entirely is using eDRAM (or some type of intermediate caching memory), but also dissociating the control silicon from the GPU into its own chip, and having multiple smaller chips for the rest of the GPU's components, all on the same interposer.

This would actually keep the graphics pipeline relatively the same as it is now. But the cost, again, is going to be very high!
 
Multi-GPU on an interposer could work (for rasterizing). A big shared cache on the interposer (eDRAM) would be a great place to store the binned (tile) triangles and the tile pixels. But this wouldn't be as energy efficient as having one bigger die; off-chip data transfers are significantly more expensive (even over an interposer). It would still beat the current mGPU implementations by a mile
This may still be more complicated than it needs to be. Simply use one rasterizer, or asynchronous graphics. The XB1 already had two graphics command processors as I recall. If half the workload were compute, that could keep the 2nd GPU busy and avoid cache issues as well. Linking the scheduling units to track synchronization shouldn't be that difficult. Data transfers across an interposer, as opposed to the PCIe bus, would be far faster and more efficient than any other option.
 
Vega, I can see some improvements to its perf/watt; I just can't see greater than 2x, and that is what they will need to hit from Polaris to get to Pascal.
RX 480 was clocked a bit too high. They had to increase the voltage above the sweet spot too. Some reviewers undervolted their RX 480 and it both increased performance and reduced power load quite significantly. It seems that AMD made a last-minute change to RX 480 voltage and clocks. This slightly increased performance, but reduced perf per watt and pushed them over the PCI-E spec. Or maybe they had yield problems with the new process and increased the voltage to be sure of hitting their targets.

The new embedded Polaris-based Radeons have tiny reductions in clock rate, but HUGE reductions in TDP. If this is the real perf per watt of Polaris, the situation is much better for AMD.
 
RX 480 was clocked a bit too high. They had to increase the voltage above the sweet spot too. Some reviewers undervolted their RX 480 and it both increased performance and reduced power load quite significantly. It seems that AMD made a last-minute change to RX 480 voltage and clocks. This slightly increased performance, but reduced perf per watt and pushed them over the PCI-E spec. Or maybe they had yield problems with the new process and increased the voltage to be sure of hitting their targets.

The new embedded Polaris-based Radeons have tiny reductions in clock rate, but HUGE reductions in TDP. If this is the real perf per watt of Polaris, the situation is much better for AMD.
It is all relative, and unfortunately when it is compared to Pascal the situation is much different.
Reducing the clock and voltage of a 480 (think that is around 0.8V) to just under 900MHz uses about 3W less than a 1060 at 2050MHz full boost (about 1.09V) when gaming.
The point being, AMD is still not getting the best out of the silicon/node from a performance-window perspective (voltage-frequency-performance) while also still having challenges around leakage/waste energy/power density.
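As a rough illustration of why the voltage-frequency window matters so much (first-order CMOS dynamic-power model; the stock operating point below is an assumption for illustration): dynamic power scales roughly with frequency times voltage squared.

```python
# First-order CMOS dynamic-power model, P_dyn ~ C * f * V^2; the switched
# capacitance C cancels when comparing two operating points of the same chip.
def relative_dynamic_power(f_mhz, volt, f_ref_mhz, v_ref):
    return (f_mhz / f_ref_mhz) * (volt / v_ref) ** 2

# Undervolted 480 point from the post (~900MHz at ~0.8V) versus an assumed
# ~1266MHz at ~1.15V stock point:
ratio = relative_dynamic_power(900, 0.80, 1266, 1.15)
print(f"{ratio:.2f}x of stock dynamic power")  # ~0.34x, before leakage savings
```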

I am positive about Vega if some of the rumours stack up, and it was specifically AMD that initiated the narrative around Typical Board Power for Vega being 225W, something they deliberately fudged with Polaris IMO with regard to actual performance efficiency.

Cheers
 
It is all relative, and unfortunately when it is compared to Pascal the situation is much different.
Reducing the clock and voltage of a 480 (think that is around 0.8V) to just under 900MHz uses about 3W less than a 1060 at 2050MHz full boost (about 1.09V) when gaming.
The point being, AMD is still not getting the best out of the silicon/node from a performance-window perspective (voltage-frequency-performance) while also still having challenges around leakage/waste energy/power density.

I am positive about Vega if some of the rumours stack up, and it was specifically AMD that initiated the narrative around Typical Board Power for Vega being 225W, something they deliberately fudged with Polaris IMO with regard to actual performance efficiency.
Polaris had a typical board power of 150W. The new embedded E9550 (MXM) has practically identical specs to the RX 480, but the TDP is down to 95W. I wonder whether it is a new improved stepping, and/or uses cherry-picked dies, and/or throttles more to meet the significantly lower TDP target.

Also it's worth noting that AMD's geometry improvements in Polaris will help bigger GPUs significantly more. Fury X was heavily bound by its geometry pipeline. The extra cost will certainly bring more perf to bigger GPUs (a bigger perf per watt gain).
 
Polaris had a typical board power of 150W. The new embedded E9550 (MXM) has practically identical specs to the RX 480, but the TDP is down to 95W. I wonder whether it is a new improved stepping, and/or uses cherry-picked dies, and/or throttles more to meet the significantly lower TDP target.

Also it's worth noting that AMD's geometry improvements in Polaris will help bigger GPUs significantly more. Fury X was heavily bound by its geometry pipeline. The extra cost will certainly bring more perf to bigger GPUs (a bigger perf per watt gain).
The TBP of the 480 ended up being 165W; it was initially reported with a TDP of 150W.
I would not necessarily compare the embedded parts to the discrete products, as aren't these historically based upon the mobile parts, with greater restrictions?
Looking at previous embedded MXM models, they all have a lower TDP than the mobile part. But yeah, it is interesting that so far we have no laptop Polaris model that correlates to the E9550 just yet - before posting I was trying to see if any info was around.

If it manages to sustain 5.5TFLOPs while maintaining discrete-GPU-level clocks with a TDP of 95W and the same functionality, I would be impressed and surprised.
Cheers
 
The point being, AMD is still not getting the best out of the silicon/node from a performance-window perspective (voltage-frequency-performance) while also still having challenges around leakage/waste energy/power density.
Difficult to know if that's an architecture or a process issue though. The fact that we've seen cards with a significant overclock would suggest process. Unless they fab some elsewhere, it will be difficult to know for sure. The embedded and mobile variants would be interesting to study to see just how they lowered the consumption, assuming it's not just a lower thermal limit.

I wonder whether it is a new improved stepping, and/or uses cherry-picked dies, and/or throttles more to meet the significantly lower TDP target.
I wondered previously about memory voltages being a concern. It could be a signaling issue where higher VRAM voltages keep the core voltage artificially high; no luck on specifics with some googling for specs. They also had that power circuitry that was initially disabled; it's possible that got fixed recently, although we should see those changes hitting the 480s unless there was a new stepping.
 
I would not necessarily compare that to the discrete products, as aren't these historically based upon the mobile parts, with greater restrictions?
Looking at previous embedded MXM models, they all have a lower TDP than the mobile part. But yeah, it is interesting that so far we have no laptop Polaris model that correlates to the E9550 just yet - before posting I was trying to see if any info was around.

If it manages to sustain 5.5TFLOPs while maintaining discrete-GPU-level clocks with a TDP of 95W and the same functionality, I would be impressed and surprised.
Cheers

So far the only laptop I have heard of with a Polaris GPU is a Dell Alienware, and that has an RX 470.
 
Difficult to know if that's an architecture or a process issue though. The fact that we've seen cards with a significant overclock would suggest process. Unless they fab some elsewhere, it will be difficult to know for sure. The embedded and mobile variants would be interesting to study to see just how they lowered the consumption, assuming it's not just a lower thermal limit.

We will find out soon enough with the 1050 coming out of Samsung. The overclocks on the 1050 seem pretty damn good; from leaks coming from people who have the card, overclocking them to 1900MHz seems easy. I'm thinking it's not a node thing based on that. I just want to see where it maxes out and at what voltage, and then compare that to what TSMC's 16nm Pascal cards can do at max, to get a clear picture of what each process is capable of.

The interesting point being, currently Polaris and Pascal, even on different fabs/processes, both max out around the same voltage...
 
Difficult to know if that's an architecture or a process issue though. The fact that we've seen cards with a significant overclock would suggest process...
Can you link one of those significant overclocks that is within 1.1V and also under 1.25V?
I guess it can come down to the interpretation of a significant overclock relative to its voltage and power demand.
Remember the context was performance efficiency, which is still playing catch-up to Pascal.
I am in the camp that Vega could be a big step in the right direction.
Cheers
 
Those are indeed some very impressive results. The card isn't using the reference PCB either, BTW. Looks like the RX 480 to get, perhaps even better than the Sapphire Nitro+.

Though one has to wonder if the lower power consumption isn't at least partially due to the lower temperatures that come from the card being used in an open case. At least in that video review.
Yea, it does look positive. I'm not a fan of open-bench testing, but I accept that reviewers have limited time; I prefer Anandtech, which benches from inside a case as I recall. Nevertheless, I don't think it would skew the results that much, especially if card comparisons are from the same setup. Speaking of which, when I have time I will try to look through his other 480 reviews to see if he has performed the same test.
 
Yea, it does look positive. I'm not a fan of open-bench testing, but I accept that reviewers have limited time; I prefer Anandtech, which benches from inside a case as I recall. Nevertheless, I don't think it would skew the results that much, especially if card comparisons are from the same setup. Speaking of which, when I have time I will try to look through his other 480 reviews to see if he has performed the same test.
If you are interested in seeing the best and most accurate power measurements for GPUs, I would look at Tom's Hardware (.com and .de), PC Perspective, TechPowerUp, and Hardware.fr.
Tom's and PC Perspective have the best measurement-capture approach IMO, but the others are good as well.
Cheers
 
You need to appreciate that this watt figure is calculated from the vcore alone (current x voltage) and cannot take into account leakage/wastage (the VRM power stage is also involved, though one could argue AMD designs theirs with better components). That is an area where AMD is losing heavily to Nvidia, along with how well they use the silicon/node performance envelope.
This is why AMD was very careful about how they presented the power of Polaris. Interestingly, with Vega the talk has now moved to total board power, meaning factors such as leakage and wastage are now also part of their efficiency discussion.

There are only a few tech sites that can measure GPU power demand/draw correctly and accurately.
Anyway, IF Vega does live up to its promise with regard to TBP, that would suggest some notable improvements to the design and would put it on a near level playing field with Pascal (though Volta at 16nm is suggested to be shockingly good - early days yet, though).
So I do think it is feasible for AMD to do this with Vega, as AMD and Nvidia are not fully in sync with each other's product development timeframes; the challenge will be how close they can get to Volta in terms of performance efficiency, especially when both are on the next node after 14/16nm.

Cheers
You make a valid point: using a calculation to grab this data is not 100% accurate. Jay briefly mentioned as much, without the reasoning you explained. However, we can use the data when comparing the other 480s in that test on his open setup. I have only had time to look at one of his videos, which didn't seem to have the same test; in any case we wouldn't be sure of the same settings unless he specifically mentioned them. His videos are very brief, so we only have his word that the same test draws significantly more power on the other RX 480s he tested.

Regardless, there are two interesting things that can be taken from this. One is the overclocks: to my knowledge, 1475MHz is the highest overclock I have seen on air with no modifications to the VRMs. The other is the temps, which look much better than on other overclocked AIB cards; some of that may be due to the open nature of the bench, as tranz pointed out, but it can't all be down to that.

I have to say that is pretty impressive for an RX 480 revision in such a short time, but I will need to see more reviews/tests to believe it applies to all of their RX 480s.

Vega, I can see some improvements to its perf/watt; I just can't see greater than 2x, and that is what they will need to hit from Polaris to get to Pascal.
Yea, for all we know XFX could have sent him a golden sample to generate some hype; we need to see a wider review sample.
 
Can you link one of those significant overclocks that is within 1.1V and also under 1.25V?

Wouldn't change the fact that most of the earlier (reviewer) cards would hardly go above 1.3GHz if at all, with or without voltage adjustments, whereas now we're seeing cards going above 1.45GHz.
Non-reference PCB could also be a factor, but wasn't the reference PCB filled with very expensive and high-quality components?

Maybe with the launch of Vega 10 we'll also see new Polaris cards with new chip revisions going for 1.45GHz, the RX 4x5.
 
We will find out soon enough with the 1050 coming out of Samsung. The overclocks on the 1050 seem pretty damn good; from leaks coming from people who have the card, overclocking them to 1900MHz seems easy. I'm thinking it's not a node thing based on that. I just want to see where it maxes out and at what voltage, and then compare that to what TSMC's 16nm Pascal cards can do at max, to get a clear picture of what each process is capable of.

The interesting point being, currently Polaris and Pascal, even on different fabs/processes, both max out around the same voltage...
As the node shrinks you end up with power density, reliability/resilience, etc. issues.
Case in point: Pascal cannot take as much punishment as Maxwell in terms of extreme overclocking and setting voltages well outside the spec of the node. Previously you could go as far as 1.5V, but now it is risky even at 1.3V - and risk here is in the context of what extreme OCers talk about, not normal enthusiasts :)

Cheers
 
Wouldn't change the fact that most of the earlier (reviewer) cards would hardly go above 1.3GHz if at all, with or without voltage adjustments, whereas now we're seeing cards going above 1.45GHz.
Non-reference PCB could also be a factor, but wasn't the reference PCB filled with very expensive and high-quality components?

Maybe with the launch of Vega 10 we'll also see new Polaris cards with new chip revisions going for 1.45GHz, the RX 4x5.
The issue is more complex though.
You could say the same about early models of Pascal: both were using blowers and restricted power delivery, and it takes time to work out what one can do, especially with the tools, and with both using a newer boost/power-management system for controlling voltage-frequency-temperature.
So after some time both can now be pushed beyond their overclocking spec (AMD rates theirs to 1.2V, while Nvidia has said nothing and leaves it to the AIBs to decide, but it is probably 1.15V to 1.2V), thanks to accumulated knowledge (such as how extreme OCers bypass the limitations of the Pascal design) and what is now available (a couple of BIOSes for certain models from both manufacturers, along with the software tools, albeit only working for AMD cards).
Realistically though, you want to look at the real-world overclock that is possible at 1.1V and at 1.2V-1.25V.
To date I still have not seen anything that makes me think either Polaris or Pascal really stands out when operating beyond its optimum performance window, with context including voltage and power demand.

Cheers
 
If you are interested in seeing the best and most accurate power measurements for GPUs, I would look at Tom's Hardware (.com and .de), PC Perspective, TechPowerUp, and Hardware.fr.
Tom's and PC Perspective have the best measurement-capture approach IMO, but the others are good as well.
Cheers
Thanks. I have seen some of TechPowerUp's reviews but was not impressed with the game choices, and especially the versions of the APIs being used to come to conclusions. As I'm sure you're well aware, the game choice (old/modern/DX11/DX12/Vulkan/GameWorks/AMD equivalent) can affect the conclusions of reviews in a dramatic way, especially power-efficiency conclusions, which with Polaris can make all the difference: under 2+ year old DX11 games Polaris looks absolutely horrid compared to the GTX 1060, but compare modern games and next (current?) gen APIs like DX12 and Vulkan with the latest drivers and it doesn't look so bad.

What I would like to see is reviewers take a sample of, say, 10-15 games of mixed APIs, no more than 18 months old (preferably only 12 months), excluding any outliers for either company that skew the results disproportionately, such as Project CARS for Nvidia or Hitman for AMD.
Then they could have a 'legacy' game section for older titles that wouldn't influence the conclusions.
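A toy version of that aggregation (all numbers hypothetical; the outlier trim is just one way to do it) could look like this: score each game as a perf-per-watt ratio between the two cards, drop the extremes on both ends, and average the rest.

```python
# Toy sketch of the suggested methodology (all numbers hypothetical): compute
# a relative score per game, then drop the most extreme result at each end so
# a single sponsored/outlier title cannot skew the overall conclusion.
def trimmed_mean(per_game_ratios, trim=1):
    """Mean of card A vs card B ratios with `trim` outliers cut from each end."""
    ordered = sorted(per_game_ratios)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)

# e.g. hypothetical RX 480 vs GTX 1060 perf-per-watt ratios across a mixed-API
# sample, where 1.30 and 0.70 stand in for a Hitman/Project CARS style pair:
ratios = [0.95, 1.02, 0.98, 1.30, 0.70, 1.05, 0.99, 1.01]
print(f"trimmed average: {trimmed_mean(ratios):.2f}")  # -> 1.00
```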
 