AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion


Deleted member 13524

Guest
According to videocardz and fudzilla, Vega 10 will do 12 TFLOPs FP32 and 24 TFLOPs FP16, with a 512GB/s memory bandwidth. This means:

- 64 CUs at 1.5 GHz
- 2x FP16 rate per ALU
- 2 stacks of 2GT/s HBM2
- 225W TDP
- H1 2017 (probably Q2..)

They also claim it comes with 16GB of HBM2. This wouldn't be possible with Hynix's current portfolio because they only have 4-Hi stacks with 4GB.
Videocardz claims these specs are coming from server leaks, so maybe the consumer version will use Hynix's current products for 8GB cards and the server versions will use yet-to-enter-production 8-Hi stacks. Though since that would require a physically taller stack, they would need to at least change the heatsink's surface, I guess..?
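
For anyone who wants to check the memory numbers, here is a rough sketch in Python. It assumes the standard 1024-bit interface per HBM2 stack, which is not stated in the leak itself; the function names are just made up for illustration.

```python
# Back-of-the-envelope HBM2 numbers.
# Assumption: each HBM2 stack has a 1024-bit interface (not from the leak).
HBM2_BUS_WIDTH_BITS = 1024

def hbm2_bandwidth_gbs(stacks, gigatransfers_per_s):
    """Peak bandwidth in GB/s for a given number of HBM2 stacks."""
    return stacks * HBM2_BUS_WIDTH_BITS * gigatransfers_per_s / 8

def hbm2_capacity_gb(stacks, gb_per_stack):
    """Total capacity in GB for a given number of stacks."""
    return stacks * gb_per_stack

print(hbm2_bandwidth_gbs(2, 2.0))  # 512.0 -> the rumored 512GB/s
print(hbm2_capacity_gb(2, 4))      # 8  -> two 4-Hi 4GB stacks
print(hbm2_capacity_gb(2, 8))      # 16 -> two 8-Hi 8GB stacks
```

So two stacks at 2GT/s line up with the 512GB/s figure, and 16GB on two stacks only works with 8GB per stack.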

These clocks seem very conservative. Again, these are server chips but if the consumer cards are clocked up to 1.2GHz then I would expect a 250W TDP or more. Ignore this. I miscalculated.

They also mention a Radeon Pro Duo replacement with 2* Vega 10 at 300W TDP coming up H2 2017.


At the same time, they claim Vega 11 will replace Polaris 10. AMD has suggested there will be a RX 48x so maybe we're looking at that GPU. Maybe the same 36 CUs with updated GFX9 architecture and HBM memory? Maybe 1.5GHz too like the bigger brother. Or more CUs with lower clocks to lower power consumption even more?



Lastly, there is also a Vega 20 that we haven't heard of, which is probably coming in 2018 or later, because it's using GF's 7nm. It comes with the same number of CUs as Vega 10, same "GFX 9" ISA, but now with 1TB/s bandwidth, so 4 HBM2 stacks.
I take it that with the number of CUs and graphics architecture being equal, the clocks should be considerably higher in 7nm. Perhaps GF's 7nm will bring the clocks that 14FF couldn't reach. The TBP for Vega 20 is 150W and it'll bring PCI Express 4.0 support, meaning the card could work without any PCIe power cables.


EDIT: Derp brainfart. 12TFLOPs FP32 with 64CUs would need 1.5GHz, unless GCN goes through major changes, like 96 ALUs for each CU.
 
That's not the interesting thing from that article.

The piece that's perhaps most relevant to Project Scorpio.



That should indicate at least some significant change between Vega and Polaris. Otherwise AMD would just continue to use Polaris 10 in that market segment as they have done for the past few generations. For example, R9 290 and R9 390 use the same GPU.

Also, considering the price bracket that Polaris targets, it's highly unlikely that HBM2 will be used for Vega 11 (replacing Polaris 10), unlike Vega 10. That would indicate that Vega has no problem using some form of GDDR, which will be important for Project Scorpio.

Wild speculation here. But it's possible that Vega 11 will have a significant perf/watt advantage over Polaris 10 making Polaris 10 redundant. That would also make it far more feasible for Microsoft to have a 6 TFLOP GPU in Project Scorpio without going over 200 watts. Depending on how much of an improvement there is, it may be possible to get near the power envelope of the PS4-P.

Regards,
SB

I don't see a significant perf/watt improvement for Vega. RX 480 is 5.5 TF, 150 watts and 32 CUs; Vega 10 is 64 CUs, 12 TF and 225 watts.

Double the CUs and you have 11 TF, increase the clock speed and you have 12 TF, add in HBM2 and you get less power draw and much more bandwidth.
 
That "Vega 20" info comes just a little too close to Global Foundries announcing their plans for 7 nm for me to be comfortable.

Close to what? They didn't mention any dates for Vega 20. It could be late 2018 for all we know.
 
Close to what? They didn't mention any dates for Vega 20. It could be late 2018 for all we know.

Close to yesterday's announcement that 7 nm was next for GF.

I hope that AMD have newer graphics architectures than Vega before late 2018. Going into 2019 competing against nvidia with something from early 2017 doesn't seem like it would work out well.

And also ... is anyone confident that GF will be rocking 7 nm in 2018? They say risk production early 2018. When their 14 nm is up to speed, maybe I'll feel more confident.
 
According to videocardz and fudzilla, Vega 10 will do 12 TFLOPs FP32 and 24 TFLOPs FP16, with a 512GB/s memory bandwidth. This means:

- 64 CUs at 1 GHz
- 2x FP16 rate per ALU
- 2 stacks of 2GT/s HBM2
- 225W TDP
- H1 2017 (probably Q2..)

64 CUs * 64 SPs * 2 FLOPs FP32 * 1.0 GHz = 8.192 TFLOPs FP32 * 2 = 16.384 TFLOPs FP16....what am I missing? If the supplied data is true then I can only imagine a 1.5GHz frequency to get 24 TFLOPs FP16.
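
The same arithmetic as a small Python sketch, in case anyone wants to plug in other numbers. The function name and parameters are only illustrative; the 64 SPs per CU and 2 FLOPs per SP per clock (one FMA) are the usual GCN assumptions, not something the leak confirms for Vega.

```python
def peak_tflops(cus, alus_per_cu=64, clock_ghz=1.0, flops_per_alu=2, fp16_rate=1):
    """Peak throughput in TFLOPs: CUs * ALUs per CU * FLOPs per ALU per clock * GHz."""
    return cus * alus_per_cu * flops_per_alu * clock_ghz * fp16_rate / 1000.0

print(peak_tflops(64))                              # 8.192  (FP32 at 1.0 GHz)
print(peak_tflops(64, fp16_rate=2))                 # 16.384 (FP16 at 1.0 GHz)
print(peak_tflops(64, clock_ghz=1.5))               # 12.288 (FP32 at 1.5 GHz)
print(peak_tflops(64, clock_ghz=1.5, fp16_rate=2))  # 24.576 (FP16 at 1.5 GHz)
```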
 
64 CUs * 64 SPs * 2 FLOPs FP32 * 1.0 GHz = 8.192 TFLOPs FP32 * 2 = 16.384 TFLOPs FP16....what am I missing? If the supplied data is true then I can only imagine a 1.5GHz frequency to get 24 TFLOPs FP16.

Maybe Vega 10 has 64 CUs with 96 SPs per CU: 64 * 96 = 6144 SPs, and 6144 SPs * 2 FLOPs * 1000 MHz = 12,288,000 MFLOPs, so ~12.3 TFLOPs. No?
 
6 SIMD16 units per CU isn't theoretically impossible, but even as a layman I could imagine that staying at 4, as it has been up to now, has more advantages overall. Besides, all the Vega/Greenland rumors and hints I've seen so far speak of either 64 CUs or 4096 SPs, which of course doesn't have to mean anything, but it also doesn't have to mean that both numbers are actually correct. Theoretically nothing speaks against 96 CUs at 1 GHz or 64 CUs at 1.5 GHz.
 
According to videocardz and fudzilla, Vega 10 will do 12 TFLOPs FP32 and 24 TFLOPs FP16, with a 512GB/s memory bandwidth. This means:

- 64 CUs at 1 GHz
- 2x FP16 rate per ALU
- 2 stacks of 2GT/s HBM2
- 225W TDP
- H1 2017 (probably Q2..)
The CUs would have to be a radically new architecture (more ALUs per CU) to make this feasible. Patents referenced in the old Vega thread implied that CU architecture is changing.

I take it that with the number of CUs and graphics architecture being equal, the clocks should be considerably higher in 7nm. Perhaps GF's 7nm will bring the clocks that 14FF couldn't reach.
Some aspect of current GCN is keeping clocks "low" (we saw this at 28nm, too). I doubt it's the process. Although Global Foundries is generally useless, apparently, so we can't eliminate that as a reason for poor clocks/power in Polaris.
 
I don't see a significant perf/watt improvement for Vega. RX 480 is 5.5 TF, 150 watts and 32 CUs; Vega 10 is 64 CUs, 12 TF and 225 watts.

Double the CUs and you have 11 TF, increase the clock speed and you have 12 TF, add in HBM2 and you get less power draw and much more bandwidth.

Tom's Hardware shows an average power consumption of 164 watts when gaming (Metro: Last Light). It's one of the main reasons I didn't get one despite the attractive price point. Overclocking such that it reaches 6 TF also ups power consumption to >200 watts in most cases.

Speculation in the Vega thread also implies that things may have radically changed between Polaris and Vega.

Assuming the rumors are true, to reach 12 TFLOPs with 64 CUs, you either need to clock at 1.5 GHz or you need significantly more ALUs per CU. Both options would require significant changes in the architecture. Polaris currently struggles to reach 1.5 GHz and power consumption spikes up drastically to do so. The overclock result that had >200 watts above was with a 1.32 GHz clock.
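
To put numbers on those two options, here is a quick Python sketch that solves the peak-throughput formula for either the clock or the ALU count. The helper names are hypothetical, and it assumes 2 FLOPs per ALU per clock as on current GCN.

```python
def required_clock_ghz(target_tflops, cus, alus_per_cu=64, flops_per_alu=2):
    """Clock in GHz needed to hit a peak TFLOPs target with a fixed CU layout."""
    return target_tflops * 1000.0 / (cus * alus_per_cu * flops_per_alu)

def required_alus_per_cu(target_tflops, cus, clock_ghz, flops_per_alu=2):
    """ALUs per CU needed to hit a peak TFLOPs target at a fixed clock."""
    return target_tflops * 1000.0 / (cus * clock_ghz * flops_per_alu)

print(required_clock_ghz(12, 64))         # 1.46484375 -> roughly 1.5 GHz
print(required_alus_per_cu(12, 64, 1.0))  # 93.75 -> 96 if ALUs come in groups of 16
```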

There is no reason that it would be impossible for AMD to re-architect GCN (assuming it's still GCN) for greater perf/w similar to what Nvidia has done between generations.

Or not much has changed and we'll see Project Scorpio hit 200-250 watts or more.

Regards,
SB
 
Close to yesterday's announcement that 7 nm was next for GF.

I hope that AMD have newer graphics architectures than Vega before late 2018. Going into 2019 competing against nvidia with something from early 2017 doesn't seem like it would work out well.

And also ... is anyone confident that GF will be rocking 7 nm in 2018? They say risk production early 2018. When their 14 nm is up to speed, maybe I'll feel more confident.


AMD is definitely aware of GF's longer-term roadmaps and has known about the plans for 7nm for quite a while. Yesterday's announcement to the general public doesn't mean much for AMD's development.

Regarding AMD's output of different chips throughout the next couple of years, I think I remember seeing a post from @Dave Baumann stating that releasing two new chips per year is sufficient. There's Polaris 10 and Polaris 11 in 2016, Vega 10 and Vega 11 in 2017, Vega 20 and Vega 2x in 2018.
AMD will also be releasing Zen APUs starting H2 2017. These APUs may start to assimilate the discrete GPUs' lower tiers like Polaris 11, and somehow compensate for the fewer GPU releases.
 
Some aspect of current GCN is keeping clocks "low" (we saw this at 28nm, too). I doubt it's the process. Although Global Foundries is generally useless, apparently, so we can't eliminate that as a reason for poor clocks/power in Polaris.
Dat Fast14
 
It will be unusual for AMD to directly replace Polaris 10 with Vega 11 within 12 months of the Polaris release. Since GCN's launch they've favoured extended GPU lifespans and filling in the lineup gaps in alternating years.
Perhaps it is a replacement in terms of positioning rather than absolute performance, e.g. Vega 10 becomes RX 590 / Fury RX, Vega 11 RX 580, Polaris 10 RX 570/560
 
The TBP for Vega 20 is 150W and it'll bring PCI Express 4.0 support, meaning the card could work without any PCIe power cables.

I think you're referencing recent news that PCIe 4.0 would support several hundred watts through the slot (i.e. no external connectors).

That was ultimately retracted and it looks like it'll stay at 75W.

http://www.tomshardware.com/news/pcie-4.0-power-speed-express,32525.html

It will be unusual for AMD to directly replace Polaris 10 with Vega 11 within 12 months of the Polaris release. Since GCN's launch they've favoured extended GPU lifespans and filling in the lineup gaps in alternating years.
Perhaps it is a replacement in terms of positioning rather than absolute performance, e.g. Vega 10 becomes RX 590 / Fury RX, Vega 11 RX 580, Polaris 10 RX 570/560

I think that's reasonable. That's what I expected, a transition 500 with partial rebrands & repositions to make room for Vega.
 
Could Vega 20 be a server-only chip, since Navi should be on 7nm too? Maybe Navi will be stripped of server features.
 
That was ultimately retracted and it looks like it'll stay at 75W.
Hum.. they say it's TBD so it probably won't stay at 75W but probably not reaching 300W either as otherwise there would be no need for clarification.

Thanks for the update!


It will be unusual for AMD to directly replace Polaris 10 with Vega 11 within 12 months of the Polaris release. Since GCN's launch they've favoured extended GPU lifespans and filling in the lineup gaps in alternating years.
Perhaps it is a replacement in terms of positioning rather than absolute performance, e.g. Vega 10 becomes RX 590 / Fury RX, Vega 11 RX 580, Polaris 10 RX 570/560

Agreed. With 2 chips per year it doesn't look like AMD would have the luxury to completely phase out a 1 year-old chip.
So with that option we'd have Polaris 10 with 36 CUs, Vega 10 at 64 CUs and Vega 11 at... perhaps 44 CUs like Hawaii?
 
AMD is definitely aware of GF's longer-term roadmaps and has known about the plans for 7nm for quite a while. Yesterday's announcement to the general public doesn't mean much for AMD's development.

Oh I don't doubt that AMD are well aware of GF's long term plans, but someone starting a rumour or speculating might not be. If this rumour had come out a couple of days before Global Foundries' node announcement I think that may have lent it credence. Coming a day after just gets my suspicions raised, that's all.

Regarding AMD's output of different chips throughout the next couple of years, I think I remember seeing a post from @Dave Baumann stating that releasing two new chips per year is sufficient. There's Polaris 10 and Polaris 11 in 2016, Vega 10 and Vega 11 in 2017, Vega 20 and Vega 2x in 2018.
AMD will also be releasing Zen APUs starting H2 2017. These APUs may start to assimilate the discrete GPUs' lower tiers like Polaris 11, and somehow compensate for the fewer GPU releases.

I suppose that could be how things are going to be. And Zen APU could be really close to the 460 if they can get clocks up and address the BW issue (I wish DDR4 was clocking at 4266 like LPDDR4 is).

In fact ... I wonder if you could put a salvage APU on a board and sell it as an entry level GPU? CPUs with deactivated GPUs are already a really common thing, especially from behemoth Intel. I wonder if it could ever be true in reverse for some level of AMD products?
 
In terms of purely TDP versus peak arithmetic throughput, 12 TF at 225W relative to the RX 480's 5.8 TF at 150W would put Vega's perf/W at about 1.38x that of Polaris.
It's a bit short of AMD's older perf/W roadmap slide that gave Vega 1.5x the efficiency of Polaris, although TF doesn't equal performance and the RX 480 is not as optimistic a starting point as what AMD's marketing used in its slide.

Vega would need to overachieve a bit, given that Polaris didn't really match up with that roadmap and that optimistic projection wouldn't have erased the competitive deficit even if it were hit.
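
The ratio is easy to recompute with other starting points. A minimal Python sketch, using only the TF and TDP figures quoted in this thread (peak TFLOPs per watt of TDP, which is not the same thing as measured performance per watt):

```python
def tflops_per_watt(tflops, watts):
    """Peak TFLOPs per watt of rated board power."""
    return tflops / watts

vega10 = tflops_per_watt(12.0, 225.0)

# Two RX 480 starting points quoted in the thread: 5.8 TF and 5.5 TF at 150W.
print(round(vega10 / tflops_per_watt(5.8, 150.0), 2))  # 1.38
print(round(vega10 / tflops_per_watt(5.5, 150.0), 2))  # 1.45
```

Swapping in a measured gaming power draw instead of the 150W rating would push the ratio up further.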

For reference, one of the aforementioned patents from the big AMD thread is:
http://www.freepatentsonline.com/20160085551.pdf
This covers a variable SIMD-width CU, with an 8, 4, and 2-wide SIMD trio in place of the customary 16-wide. If that triad is actually put in place of one SIMD, it at least would not regress from having 4 SIMDs in the CU.
One of the claims in the patent had the possibility that each of the smaller SIMDs could actually be 8-wide, just with selective gating.
The chain of assumptions could give 3 x 8 x 4 = 96 ALUs per CU, and 96 ALUs x 64 CUs x 2 FLOPs x 1 GHz = ~12.3 TF.
(edit: Missed the OP update, I'll leave the math out here.)

Perhaps that and HBM could make up some of the efficiency gap.

There was a patent mentioned before about creating a tiled and binning front end with hidden surface removal built in, which might generate irregularly sized wavefronts that this ALU arrangement would cater to.
 
you either need to clock at 1.5 GHz or you need significantly more ALUs per CU
More ALUs per CU would be a stupid idea. AMD's CUs are already occupancy limited by register count. Nvidia halved the ALU count per SM in Pascal P100. This gives them more register space per thread and allows P100 to run complex shaders faster. AMD is already register bottlenecked in complex shaders. I would rather see AMD following Nvidia's lead than going in the opposite direction, especially as register pressure seems to be a bigger problem for AMD.
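
A rough illustration of the occupancy point, as a Python sketch. It assumes the commonly documented figures for current GCN, 256 VGPRs per lane in each SIMD and at most 10 wavefronts per SIMD, and it ignores allocation granularity as well as SGPR and LDS limits, so treat the results as an upper bound rather than exact hardware behaviour.

```python
# Assumed GCN limits (not taken from this thread):
VGPRS_PER_SIMD_LANE = 256   # vector registers available per lane in one SIMD
MAX_WAVES_PER_SIMD = 10     # hardware cap on resident wavefronts per SIMD

def waves_per_simd(vgprs_per_thread):
    """Upper bound on resident wavefronts per SIMD for a shader's VGPR usage."""
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // vgprs_per_thread)

for vgprs in (24, 32, 64, 128):
    print(vgprs, waves_per_simd(vgprs))
# 24 10
# 32 8
# 64 4
# 128 2
```

A complex shader using 128 VGPRs leaves only 2 wavefronts per SIMD to hide latency, which is the kind of register bottleneck described above.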

1.5 GHz isn't impossible for Vega. There are custom GTX 1080 models with 1.75 GHz base clock and 1.9 GHz boost clock. Maxwell (980 Ti) was only running at 1 GHz (1075 MHz boost). Nvidia achieved 75% clock improvement by the shrink in a single generation. Why couldn't AMD achieve 50% clock improvement in two generations?
 
More ALUs per CU would be a stupid idea. AMD's CUs are already occupancy limited by register count. Nvidia halved the ALU count per SM in Pascal P100.
For AMD, the vector register file capacity would scale with SIMD count and/or SIMD width, since that is how register storage is distributed in a CU. I'm not sure why it would get worse unless that relationship were changed. The number of physical entries could be scaled in the vector and scalar portions as well.

1.5 GHz isn't impossible for Vega. There are custom GTX 1080 models with 1.75 GHz base clock and 1.9 GHz boost clock. Maxwell (980 Ti) was only running at 1 GHz (1075 MHz boost). Nvidia achieved 75% clock improvement by the shrink in a single generation. Why couldn't AMD achieve 50% clock improvement in two generations?
AMD's 28nm base clock is unclear. At least initially Hawaii had cases where it showed dips down to 800-850. The 28nm consoles have a conservative clock in that range as well.
Polaris 10's 14nm base/boost clocks do give that range of improvement. The best clock/voltage points for power/clock are measurably lower, and the boost clock or higher hits a voltage and power wall very quickly.
Some further architectural optimization or a fix of a process problem would be needed.
 
For AMD, the vector register file capacity would scale with SIMD count and/or SIMD width, since that is how register storage is distributed in a CU. I'm not sure why it would get worse unless that relationship were changed. The number of physical entries could be scaled in the vector and scalar portions as well.
I was talking about an architectural change (similar to Pascal P100, but in reverse direction). In current GCN architecture SIMD count and register file capacity are obviously tied.

If you added 50% extra SIMDs and registers into a single CU, then there would be 50% more clients for the CU's shared resources: 4 texture samplers, 16 KB of L1 cache and 64 KB of LDS. There would be lots of L1 thrashing, occupancy would be horrible in shaders that use lots of LDS, and more shaders would be sampler (filtering) bound. You could counteract these issues by having 6 texture samplers, 24 KB of L1 cache and 96 KB of LDS in each CU. However, a 50% fatter CU like this would be less energy efficient than the smaller one, since the shared resources are shared between more clients. There would be more synchronization/communication overhead and longer distances to move the data. I am not convinced this is the right way to go.
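
The per-SIMD share of those shared resources can be sketched in a few lines of Python. The function name is hypothetical and the figures are the ones given above; it only divides resources evenly and says nothing about the synchronization or data-movement cost.

```python
def per_simd_share(simds, samplers, l1_kb, lds_kb):
    """Evenly divided share of a CU's shared resources per SIMD."""
    return {
        "samplers": samplers / simds,
        "l1_kb": l1_kb / simds,
        "lds_kb": lds_kb / simds,
    }

current = per_simd_share(4, 4, 16, 64)  # today's CU: 4 SIMDs
wider = per_simd_share(6, 4, 16, 64)    # 6 SIMDs, shared resources unchanged
scaled = per_simd_share(6, 6, 24, 96)   # 6 SIMDs, shared resources scaled by 50%

print(current)  # {'samplers': 1.0, 'l1_kb': 4.0, 'lds_kb': 16.0}
print(wider)    # each SIMD gets only two thirds of its current share
print(scaled)   # same per-SIMD share as today, but in a 50% fatter CU
```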
 