AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Vega users running regular radeon gaming driver:

Do you have an HBCC slider in your Radeon control panel --> global settings window? What about a compute optimization setting?

I have neither, and have never had either, not with any driver version, either signed WHQL or unsigned.
 
Vega users running regular radeon gaming driver:

Do you have an HBCC slider in your Radeon control panel --> global settings window? What about a compute optimization setting?

I have neither, and have never had either, not with any driver version, either signed WHQL or unsigned.
Yes, I have the HBCC slider once I enabled HBCC. Compute optimization is "automatic" on Vega; there isn't an option for it. Only on earlier GPUs do you get that.
 
So where did you enable HBCC then...? I'm not seeing the option anywhere.


[Image: AMD-Radeon-Settings-HBCC-Toggle.png]
 
Why would anyone today make a high-end professional-only GPU with only 64 CUs? That makes no sense at all IMO.
Maybe they've noticed that even in the professional field their bottlenecks are usually elsewhere than raw number crunching?
It's also the first 7nm chip they're making, so it's likely you don't want to push straight to the process limits and instead want to keep the chip as small as possible.
 
I think it's quite simple. The main problem of Vega is power consumption. The simplest and fastest solution is a shift to a more advanced manufacturing process, just like R600 to RV670. With the wide memory bus it could be quite an interesting cryptomining solution.
 
I think it's quite simple. The main problem of Vega is power consumption. The simplest and fastest solution is a shift to a more advanced manufacturing process, just like R600 to RV670. With the wide memory bus it could be quite an interesting cryptomining solution.

Power consumption isn't Vega's problem; it being a middle-of-the-road nothing card is. The card sips power at or below 1500 MHz; going above that, power gets crazy high real quick. That seems to be a Samsung process trait (see A9 on LPE, Zen on LPP, M3 on 10LPP). The next problem is that it seems to be bottlenecked by everything but ALU: it needs more ROPs and more front end. I'm sure the whole new geometry pipeline was supposed to address that, but now they are where they are.

Right now I am playing FFXV @ 3840x1024 with almost all settings on highest + Turf Effects, with minimums around 40 fps and an average around 50 fps, while consuming around 170 W, with clocks in the 1450 to 1550 MHz range.

One of the interesting things I have noticed with Vega is that the games it runs worst often result in the highest clocks, which says major bottleneck to me...


Edit: That said, 7nm might give it some extra legs, but given the general expectation of GF 7nm being 6 months behind TSMC, NV might already have 7nm cards out before them.
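
The blow-up past ~1.5 GHz is consistent with the usual dynamic-power relation (P roughly proportional to C·V²·f): the last few hundred MHz need a disproportionate voltage bump, so power rises much faster than clock. A toy sketch of that scaling, with made-up voltage/frequency points rather than measured Vega values:

```python
# Toy dynamic-power scaling: P ~ C * V^2 * f. The voltage needed at each
# frequency here is illustrative only; the point is that the last few
# hundred MHz need a voltage bump, so power rises much faster than clock.

operating_points = [   # (clock in MHz, illustrative voltage in V)
    (1200, 0.90),
    (1500, 1.00),
    (1700, 1.15),
]

base_clock, base_volt = operating_points[0]
for clock, volt in operating_points:
    rel_power = (volt / base_volt) ** 2 * (clock / base_clock)
    print(f"{clock} MHz @ {volt:.2f} V -> {rel_power:.2f}x power vs {base_clock} MHz")

# Roughly: 1500 MHz -> ~1.54x, 1700 MHz -> ~2.31x, i.e. a ~13% clock bump
# past 1.5 GHz costs about 50% more power, matching the "crazy high" behaviour.
```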
 
I think it's quite simple. The main problem of Vega is power consumption. The simplest and fastest solution is a shift to a more advanced manufacturing process, just like R600 to RV670. With the wide memory bus it could be quite an interesting cryptomining solution.
No, the main problem is that the features which should make the card more efficient are not working as intended. The whole front end is broken. It's much worse than Polaris.

Does that mean primitive shaders have dedicated hardware on the chip?
I hope I get an answer about this. I wish AMD would speak a little bit more about the hidden features. More transparency at this point would be nice. Maybe they should hand it over to the open source community.
 
Power consumption isn't Vega's problem; it being a middle-of-the-road nothing card is. The card sips power at or below 1500 MHz; going above that, power gets crazy high real quick. That seems to be a Samsung process trait (see A9 on LPE, Zen on LPP, M3 on 10LPP). The next problem is that it seems to be bottlenecked by everything but ALU: it needs more ROPs and more front end. I'm sure the whole new geometry pipeline was supposed to address that, but now they are where they are.

Right now I am playing FFXV @ 3840x1024 with almost all settings on highest + Turf Effects, with minimums around 40 fps and an average around 50 fps, while consuming around 170 W, with clocks in the 1450 to 1550 MHz range.

One of the interesting things I have noticed with Vega is that the games it runs worst often result in the highest clocks, which says major bottleneck to me...


Edit: That said, 7nm might give it some extra legs, but given the general expectation of GF 7nm being 6 months behind TSMC, NV might already have 7nm cards out before them.
Unfortunately the only way to measure the GPU accurately is via a scope or similar setup with the GPU isolated.
Just to expand:
The desktop software tools do not necessarily measure beyond the GPU core, which importantly would miss power leakage/loss, nor do they necessarily measure all aspects of the VRM.
Measuring from the outlet with cheaper, more generic watt meters is also not really ideal if you are after an accurate reading of a component (though it is handy for a ball-park figure), as one has to contend with modern switched-mode PSUs and high crest factor/power factor correction, and one needs to measure true RMS for current/voltage and true power with regard to power factor; none of the general devices go into the level of detail required in the spec sheet.
Further compounding this is the very fast and dynamic nature of power management in modern GPUs (down to the ms range and lower) and how the measurement resolution or sample window is used, i.e. its granularity.
Not even all the lower-end Fluke clamps have the required level of accuracy and functionality when analysing between the PSU and the mains; it is definitely preferable to measure between the GPU and the PSU.
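
To illustrate the true-power vs. apparent-power point: a basic meter that just multiplies Vrms by Irms reports apparent power, while true power is the average of the instantaneous v(t)·i(t) product, and the two diverge on the distorted, high-crest-factor current a switched-mode PSU draws. A toy sketch with a synthetic waveform, not tied to any real meter or PSU:

```python
# Toy illustration: apparent power (Vrms * Irms) versus true power (mean of
# the instantaneous v * i product) for a pulsed, high-crest-factor current
# waveform, roughly what a switched-mode PSU presents to a cheap watt meter.
import numpy as np

t = np.linspace(0, 0.02, 10_000, endpoint=False)   # one 50 Hz mains cycle
v = 230 * np.sqrt(2) * np.sin(2 * np.pi * 50 * t)  # mains voltage waveform

# Crude non-PFC load: narrow current pulses drawn only near the voltage peaks.
i = np.where(np.abs(np.sin(2 * np.pi * 50 * t)) > 0.9, 4.0 * np.sign(v), 0.0)

true_power = np.mean(v * i)                         # what an isolated scope setup gives
apparent_power = np.sqrt(np.mean(v**2)) * np.sqrt(np.mean(i**2))
power_factor = true_power / apparent_power

print(f"True power:     {true_power:6.1f} W")
print(f"Apparent power: {apparent_power:6.1f} VA")
print(f"Power factor:   {power_factor:.2f}")

# A meter that only multiplies Vrms by Irms overstates the real draw unless
# it also measures true power / power factor.
```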

It took a while, but I found an application note that summarises it quite nicely (without going deep into EE): http://www.programmablepower.com/Ap...nderstanding_AC_Power_Source_Measurements.pdf

That said, Vega is not bad but more OK in terms of power demand; it comes back to whether it is a Vega 56 or a Vega 64 and the voltage-clock-power settings used, and it would have been good for AMD if it had come to market earlier. Fiji was better with regard to its power envelope and was very close to and competitive with Maxwell generally.
 
Why would anyone today make a high-end professional-only GPU with only 64 CUs? That makes no sense at all IMO.

Was the amount of CUs ever a definitive metric for performance?
Does the RX580 with 36 CUs have worse performance than the R9 290 with 40 CUs?


AMD's own criticism of their architecture ever since GCN was introduced has been about finding ways to keep the CUs occupied or to increase their frequency, not about increasing their number.
A Vega 56 has pretty much 100% of the performance of a Vega 64 at iso core and memory clocks. Save for some very specific workloads, the Vega 64 has 8 (or more) CUs that are pretty much wasting die area. I guess the primitive shaders, TrueAudio Next and other implementations were attempts at making use of that extra compute power without creating a large impact on VRAM bandwidth, but that's another story.

A Vega 20 with 64 CUs at sustained ~1.85GHz will have higher FP32/64 throughput than the Mezzanine V100.
Sure, Vega 10's efficiency is terrible at anything above ~1.4GHz, but Vega 20 is 7nm and that could (should!) make a substantial difference in how high they can clock the core at reasonable power levels, and the quad-HBM2 will put its bandwidth on par with V100.
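
For context, that throughput claim is just peak-rate arithmetic: shaders × 2 FLOPs per clock (FMA) × clock. A minimal sketch, assuming 64 shaders per GCN CU, a hypothetical 1:2 FP64 rate for Vega 20, and the commonly quoted V100 SXM2 boost figures (all assumptions here, not confirmed Vega 20 specs):

```python
# Back-of-the-envelope peak throughput: shaders x 2 FLOPs per clock (FMA) x clock.
# Assumes 64 shaders per GCN CU and a hypothetical 1:2 FP64 rate for Vega 20;
# V100 numbers are the commonly quoted SXM2 boost figures. None of this is
# confirmed Vega 20 spec.

def peak_tflops(shaders, clock_ghz, flops_per_clock=2):
    """Theoretical peak throughput in TFLOPs."""
    return shaders * flops_per_clock * clock_ghz / 1000.0

vega20_fp32 = peak_tflops(64 * 64, 1.85)   # ~15.2 TFLOPs FP32 at a sustained 1.85 GHz
vega20_fp64 = vega20_fp32 / 2              # ~7.6 TFLOPs, if the FP64 rate really is 1:2
v100_fp32   = peak_tflops(5120, 1.53)      # ~15.7 TFLOPs at the V100 SXM2 boost clock

print(f"Vega 20 (hypothetical): {vega20_fp32:.1f} FP32 / {vega20_fp64:.1f} FP64 TFLOPs")
print(f"V100 SXM2 (boost):      {v100_fp32:.1f} FP32 TFLOPs")

# Whether Vega 20 comes out ahead hinges on the clock it actually sustains
# versus what V100 sustains under load, not the paper boost figures.
```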
 
Was the amount of CUs ever a definitive metric for performance?
Does the RX580 with 36 CUs have worse performance than the R9 290 with 40 CUs?


AMD's own criticism of their architecture ever since GCN was introduced has been about finding ways to keep the CUs occupied or to increase their frequency, not about increasing their number.
A Vega 56 has pretty much 100% of the performance of a Vega 64 at iso core and memory clocks. Save for some very specific workloads, the Vega 64 has 8 (or more) CUs that are pretty much wasting die area. I guess the primitive shaders, TrueAudio Next and other implementations were attempts at making use of that extra compute power without creating a large impact on VRAM bandwidth, but that's another story.

A Vega 20 with 64 CUs at sustained ~1.85GHz will have higher FP32/64 throughput than the Mezzanine V100.
Sure, Vega 10's efficiency is terrible at anything above ~1.4GHz, but Vega 20 is 7nm and that could (should!) make a substantial difference in how high they can clock the core at reasonable power levels, and the quad-HBM2 will put its bandwidth on par with V100.
I guess it depends on whether we are talking about gaming or compute.
It is if one reports theoretical peak FP32 TFLOPs the way AMD and Nvidia do (and then one also has to be able to sustain the spec boost clocks); take a look at the Vega FE/Pro WX9100 figures. The number of cores correlates with CUs for AMD, while for Nvidia it is SMs, and cores are what make up your compute performance.

With Vega 20 being a compute rather than a gaming GPU, it probably needs to be looked at traditionally; yes, prosumer blurs this boundary as we see with the Titan V, but that is an inefficient GPU for gaming due to so many wasted functions and cores.
Unfortunately it is a bit early to know just how much AMD can find in sustainable core clock gains going to 7nm early for an HPC GPU (it will not have the same leeway as a gaming GPU, as seen with the Vega WX9100); it could be 10% or 25% on top of that.
What can also be important is how much they can improve their process at a silicon-fab level.

As a reference, the Vega Pro WX9100 with 64 CUs is reported as 12.3 TFLOPs FP32, suggesting a peak core clock of around 1.5 GHz, but it also has an official guaranteed clock down to 1.2 GHz; the real sustained rate and theoretical TFLOPs will be somewhere in between.
And yes one also needs to look at this for Nvidia as well.
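
As a sanity check of the 12.3 TFLOPs figure, the implied peak clock can be recovered from the usual GCN peak-rate formula (shaders × 2 FLOPs per clock × clock); a quick sketch, assuming 64 shaders per CU:

```python
# Recover the implied peak clock from a quoted FP32 TFLOPs figure, assuming
# the usual GCN peak-rate formula: shaders x 2 FLOPs per clock x clock.

def implied_clock_ghz(tflops, shaders, flops_per_clock=2):
    return tflops * 1000.0 / (shaders * flops_per_clock)

wx9100_shaders = 64 * 64                               # 64 CUs x 64 shaders
peak_clock = implied_clock_ghz(12.3, wx9100_shaders)   # ~1.50 GHz from 12.3 TFLOPs
floor_tflops = wx9100_shaders * 2 * 1.2 / 1000.0       # ~9.8 TFLOPs at the 1.2 GHz floor

print(f"Implied peak clock: {peak_clock:.2f} GHz")
print(f"TFLOPs at the guaranteed 1.2 GHz clock: {floor_tflops:.1f}")

# The sustained rate, and hence the real-world TFLOPs, sits somewhere
# between these two points, as noted above.
```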

To me, Vega 20 is designed primarily to compete at FP64 against P100/V100; FP32 is also part of that of course, but AMD has been out of the DP HPC segment for a while, where they used to be very strong.
 
Was the amount of CUs ever a definitive metric for performance?
For a compute part? Absolutely, it was, and is.

Does the RX580 with 36 CUs have worse performance than the R9 290 with 40 CUs?
No, the OpenCL Geekbench score is slightly higher for the RX 580 (127k vs 123k). Factor in clock speeds and the number of CUs becomes a pretty good proxy.

AMD's own criticism of their architecture ever since GCN was introduced has been about finding ways to keep the CUs occupied or to increase their frequency, not about increasing their number.
Yes, they are terrible at keeping the CUs fed with graphics workloads.

But that’s irrelevant for something like Vega 20, which is shooting for machine learning and FP64 workloads.

I just don't see how it helps AMD to make a me-too product that will basically match what's already out there today.
 
Edit: That said, 7nm might give it some extra legs, but given the general expectation of GF 7nm being 6 months behind TSMC, NV might already have 7nm cards out before them.
Except that AMD will use TSMC 7nm as well; the question is just whether Vega 20 is going TSMC or GloFo.
https://www.anandtech.com/show/1231...n-exclusive-interview-with-dr-lisa-su-amd-ceo
Q18: With GlobalFoundries 14nm, it was a licensed Samsung process, and 12nm is an advancement of that. 7nm is more of a pure GF design. Is there any change in the relationship as a result?

LS: So in 7nm, we will use both TSMC and GlobalFoundries. We are working closely with both foundry partners, and will have different product lines for each. I am very confident that the process technology will be stable and capable for what we’re trying to do.
 
Maybe they've noticed that even in the professional field their bottlenecks are usually elsewhere than raw number crunching?
In that case, Vega 20 seems to be the perfect candidate to finally fix those bottlenecks.

It's also the first 7nm chip they're making, so it's likely you don't want to push straight to the process limits and instead want to keep the chip as small as possible.
I agree with that for a gaming GPU, but not for something that should compete in the $10k market. See also 16nm and P100.

In the best possible case, a 64 CU 7nm chip will have a short-lived competitive lead against V100: if AMD releases Vega 20 earlier than Nvidia's first 7nm chip, if it has something like tensor cores, and if it has competitive software, none of which are a given.

Worst case, it’s behind and dead in the water the moment it’s released, just like Vega 10 was for machine learning.
 
In that case, Vega 20 seems to be the perfect candidate to finally fix those bottlenecks.
What bottlenecks does it address, though? Memory bandwidth, memory capacity, and FP64 throughput seem to be the only changes relative to Vega 64. Some routing changes to critical paths would be the biggest potential gain relative to previous Vegas: voltage could be artificially high to overcome a single bad path, increasing power consumption in areas that don't really need it. The CUs, for example, probably don't need high clockspeeds to maintain throughput, but get pulled higher to keep geometry throughput high.
 