AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Deleted member 13524 · Nov 22, 2018

mczak said:
There weren't any power consumption numbers, right? If the cooling solution is indeed better, it could easily draw a bit more power and still be quieter (although I don't doubt efficiency increased substantially).

Even if the power consumption is higher, the chassis has the same size, the system is less noisy and the CPU is clocking higher therefore pulling more power, too.
So I really doubt a significant part of that 82% performance boost in a GPU-centric benchmark is coming from a higher TDP.

Besides, that efficiency is similar to what we'd find on a scaled up Vega 8/11 that we see in Raven Ridge.

entity279 · Nov 22, 2018

Maybe the mobile Vega is also configured differently compared to the mobile Polaris? (e.g. it may have a beefier front end).

DavidGraham · Nov 28, 2018

AMD's comments on 7nm and 7nm+:

TSMC said in March 2017 that its process would offer up to 35% speed gains or 60% lower power compared to its 16FF+ node. However, AMD is only claiming that its chips will sport 25% speed gains or 50% less power compared to its 14-nm products.

“TSMC may have been measuring a basic device like a ring oscillator — our claims are for a real product,” said Mark Papermaster in an interview the day that the 7-nm chips were revealed.

“Moore’s Law is slowing down, semiconductor nodes are more expensive, and we’re not getting the frequency lift we used to get,” he said in a talk during the launch, calling the 7-nm migration “a rough lift that added masks, more resistance, and parasitics.”

Looking ahead, a 7-nm-plus node using extreme ultraviolet lithography (EUV) will “primarily leverage efficiency with some modest device performance opportunities,” he said in the interview.

https://www.eetimes.com/document.asp?doc_id=1333996&page_number=3

3dilettante · Nov 28, 2018

I do not have a way to verify some of the discussion I've seen elsewhere, but the general point was that Papermaster may have been giving TSMC too little credit on how it characterized its process scaling--even assuming there's marketing inflation. The idea was that TSMC was basing its evaluation on collections of physical IP and elements like the hard libraries it offers as options for adding process-optimized cores like ARM in test devices. While not a GPU or server SOC, it's not a simple oscillator.

This seems plausible to me. TSMC's nodes have nominal voltages and a series of rules and IP offerings that fit within what it considers its optimal zone. That comfort zone could be exceeded by a design that's pushing for aggressive circuit implementation and clock/voltage ranges beyond what more pedestrian designs would be using. AMD's value-add is its expertise in complex design integration and physical implementation, and it seems consistent that pushing the envelope means experiencing diminishing returns more acutely.

There were similar trends when it came to other product transitions like FinFETs on Intel's nodes, where there was a promise of massive scaling that could be realized so long as the cores didn't ramp into the upper voltage and clock ranges where fundamental limits and sub-linear improvement had the chance to make their continued presence known.

Frenetic Pony · Nov 28, 2018

DavidGraham said:
AMD's comments on 7nm and 7nm+:

TSMC said in March 2017 that its process would offer up to 35% speed gains or 60% lower power compared to its 16FF+ node. However, AMD is only claiming that its chips will sport 25% speed gains or 50% less power compared to its 14-nm products.

“TSMC may have been measuring a basic device like a ring oscillator — our claims are for a real product,” said Mark Papermaster in an interview the day that the 7-nm chips were revealed.

“Moore’s Law is slowing down, semiconductor nodes are more expensive, and we’re not getting the frequency lift we used to get,” he said in a talk during the launch, calling the 7-nm migration “a rough lift that added masks, more resistance, and parasitics.”

Looking ahead, a 7-nm-plus node using extreme ultraviolet lithography (EUV) will “primarily leverage efficiency with some modest device performance opportunities,” he said in the interview.

https://www.eetimes.com/document.asp?doc_id=1333996&page_number=3

A: TSMC's comparison is between their 16nm node and their 7nm node, AMD's is a comparison between GF's 14nm node and TSMC's 7nm node.

B: Foundries often state the most optimistic possible outcomes for numbers like this as an advertising scheme. They say "up to" so their shareholders can't sue for incorrect guidance. Their customers, on the other hand, need to produce correct guidance for actual products, so they have to report the typical numbers they're getting instead of the "up to" (i.e. "absolute best case") numbers the foundry itself reports.

Alexko · Nov 28, 2018

Frenetic Pony said:
A: TSMC's comparison is between their 16nm node and their 7nm node, AMD's is a comparison between GF's 14nm node and TSMC's 7nm node.

B: Foundries often state the most optimistic possible outcomes for numbers like this as an advertising scheme. They say "up to" so their shareholders can't sue for incorrect guidance. Their customers, on the other hand, need to produce correct guidance for actual products, so they have to report the typical numbers they're getting instead of the "up to" (i.e. "absolute best case") numbers the foundry itself reports.

Probably, but since TSMC's 16nm is apparently better than GF's 14nm…

del42sa · Nov 29, 2018

Alexko said:
Probably, but since TSMC's 16nm is apparently better than GF's 14nm…

or AMD is just being wise and using somewhat conservative numbers ...

w0lfram · Dec 1, 2018

Frenetic Pony said:
A: TSMC's comparison is between their 16nm node and their 7nm node, AMD's is a comparison between GF's 14nm node and TSMC's 7nm node.

B: Foundries often state the most optimistic possible outcomes for numbers like this as an advertising scheme. They say "up to" so their shareholders can't sue for incorrect guidance. Their customers, on the other hand, need to produce correct guidance for actual products, so they have to report the typical numbers they're getting instead of the "up to" (i.e. "absolute best case") numbers the foundry itself reports.

He was already made aware of that, but David keeps denying what TSMC has said and what Dr. Su has said... What I don't understand is why he doesn't believe AMD's uplift from going from GF's 14nm to TSMC's 7nm?

Deleted member 13524 · Dec 3, 2018

For all intents and purposes, Vega 20 is a first-gen 7nm implementation and at the moment the very first "large" 7nm chip. AFAIK next to Vega 20 there's only A12X, which is a little over a third of its size (122mm^2 vs. 330mm^2).
Sure, nvidia hit the nail on the head with their very first 16FF+ Pascal chips in terms of efficiency and clocks, but AFAIR that was a rather unprecedented occurrence. Plus, there had been products with 16FF+ chips on the shelves for over a year when Pascal came to the market.

Regardless, the MI50/60 cards have conservative clocks as all cards do in their domain because they're designed to work at 100% capacity 24/7.
I still do think a consumer Vega 20 would go over 2GHz. Whether we'll actually see that or not is a different subject.

yuri · Dec 3, 2018

ToTTenTranz said:
Regardless, the MI50/60 cards have conservative clocks as all cards do in their domain because they're designed to work at 100% capacity 24/7.
I still do think a consumer Vega 20 would go over 2GHz. Whether we'll actually see that or not is a different subject.

Well, even at those conservative clocks the card pulls 300W equipped with a server-grade cooler. Pushing the card 15-20% further would likely result in a ~350W TDP.

Having a 7nm VEGA 10/20 hybrid with stripped pro features (half the amount of VRAM, no FP64, no XGMI, etc.) seems to be a bit smoother.
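As a rough sanity check on the ~350W estimate, here is a minimal sketch that assumes board power is dominated by dynamic power, which scales roughly with frequency times voltage squared. The function name `scaled_power` and the 5% voltage bump are illustrative assumptions, not measured Vega 20 figures, and static (leakage) power is ignored.

```python
def scaled_power(base_w, clock_ratio, voltage_ratio=1.0):
    """Estimate power after a clock bump, assuming P ~ f * V^2.

    base_w        -- starting board power in watts (300W is the figure quoted above)
    clock_ratio   -- new clock / old clock (1.15 means +15%)
    voltage_ratio -- new voltage / old voltage (hypothetical; 1.0 means unchanged)
    """
    return base_w * clock_ratio * voltage_ratio ** 2


# Clock increase alone, voltage held constant.
print(round(scaled_power(300, 1.15)))  # 345
print(round(scaled_power(300, 1.20)))  # 360

# Same +15% clock, but with an assumed 5% voltage increase to reach it.
print(round(scaled_power(300, 1.15, voltage_ratio=1.05)))  # 380
```

With voltage held constant the result lands around the ~350W mentioned above; any voltage increase needed to reach the higher clock pushes it further.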

Despoiler · Dec 3, 2018

yuri said:
Well, even at those conservative clocks the card pulls 300W equipped with a server-grade cooler. Pushing the card 15-20% further would likely result in a ~350W TDP.

Having a 7nm VEGA 10/20 hybrid with stripped pro features (half the amount of VRAM, no FP64, no XGMI, etc.) seems to be a bit smoother.

They don't have coolers though. They are passively cooled. It is not as efficient as having a dedicated cooler and fan(s) in a single GPU system. Look at MI25 vs Vega 64. MI25 has a 300w TDP and Vega64 is 295. Keep in mind these are quoted as board power. In other words 2x 8pin = 300w. Simple watts in = watts out TDP. You aren't going to magically require more TDP when you aren't putting in any more power.

Rootax · Dec 3, 2018

But you can draw more than 300W from 2x 8-pin, no? (And you have the PCIe slot too.)

CarstenS · Dec 3, 2018

The Radeon Instinct cards AMD had on display at the Next Horizon Event were 8+6-Pin, not 2× 8-pin.

3dilettante · Dec 3, 2018

Despoiler said:
They don't have coolers though. They are passively cooled. It is not as efficient as having a dedicated cooler and fan(s) in a single GPU system. Look at MI25 vs Vega 64. MI25 has a 300w TDP and Vega64 is 295. Keep in mind these are quoted as board power. In other words 2x 8pin = 300w. Simple watts in = watts out TDP. You aren't going to magically require more TDP when you aren't putting in any more power.

The server boards do have a bank of high flow rate fans to blow air past them, which at least takes their power consumption out of the board's budget. Vega 64 gives 295W typical board power versus MI25's 300W thermal design power. TBP versus TDP is not well-defined, but the latter is likely more stringently policed by server architects. A gaming card with 2x8 pin leaves margin for up to 375W including the PCIE slot's 75W before taking the connectors beyond their individual specifications.
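The connector arithmetic can be written out as a small sketch. It uses the nominal per-source limits (75W for the slot, 75W for a 6-pin, 150W for an 8-pin); the names `CONNECTOR_LIMIT_W` and `board_power_budget` are made up for illustration.

```python
# Nominal power limits in watts per source.
CONNECTOR_LIMIT_W = {"slot": 75, "6pin": 75, "8pin": 150}


def board_power_budget(aux_connectors, use_slot=True):
    """Sum the nominal limits of the auxiliary connectors, plus the slot if used."""
    total = sum(CONNECTOR_LIMIT_W[c] for c in aux_connectors)
    if use_slot:
        total += CONNECTOR_LIMIT_W["slot"]
    return total


print(board_power_budget(["8pin", "8pin"]))                  # 375
print(board_power_budget(["8pin", "8pin"], use_slot=False))  # 300
print(board_power_budget(["8pin", "6pin"]))                  # 300
```

The second line is the "2x 8pin = 300w" case with the slot left out; the third is the 8+6-pin layout reported for the Radeon Instinct cards, which only reaches 300W if the slot is counted.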

DavidGraham · Dec 3, 2018

I remember reading somewhere that server grade GPUs have lower clocks and higher TDP because the types of workloads they run usually stress out the GPU more, resulting in consistently higher power consumption and possibly temps than gaming GPUs.

Despoiler · Dec 4, 2018

3dilettante said:
The server boards do have a bank of high flow rate fans to blow air past them, which at least takes their power consumption out of the board's budget. Vega 64 gives 295W typical board power versus MI25's 300W thermal design power. TBP versus TDP is not well-defined, but the latter is likely more stringently policed by server architects. A gaming card with 2x8 pin leaves margin for up to 375W including the PCIE slot's 75W before taking the connectors beyond their individual specifications.

Passive in this case just means the card itself has no fan. The cooling is done by other methods that AMD does not provide. The cooling depends on the rack and the building HVAC setup. There are a couple of ways of doing it that I typically see. You can have AC per rack, where the full rack is a sealed system with a dedicated cooling inlet and exhaust, or you can have fans as part of the partial rack enclosure with building AC cooling the entire room. Pretty much any combination you want, though. As far as the PCIE + slot power, I realize you can pull from the slot, but AMD has been pretty restrictive on using the slot ever since the Polaris fiasco. I could be misremembering, but I thought they power memory from the slot and GPU from the cables to keep things in line.

DavidGraham said:
I remember reading somewhere that server grade GPUs have lower clocks and higher TDP because the types of workloads they run usually stress out the GPU more, resulting in consistently higher power consumption

True. TDP and TBP often have a component of use case for the product.

3dilettante · Dec 5, 2018

Despoiler said:
Passive in this case just means the card itself has no fan. The cooling is done by other methods that AMD does not provide.

My understanding is that there is a presumed minimum air flow rate the server will provide over the copper fins. Firepro passive cards in the past have specified CFM rates ranging from 10 to 30, where the top range is around what a blower cooler would provide.

As far as the PCIE + slot power, I realize you can pull from the slot, but AMD has been pretty restrictive on using the slot ever since the Polaris fiasco. I could be misremembering, but I thought they power memory from the slot and GPU from the cables to keep things in line.

Polaris had its supply ratio shifted, although it didn't stop using the slot. Certain domains of the GPU are powered by the PCIe supply, or some like memory were substantially supplied by it.
For Vega, I'm not sure of the split. At least some reviews that checked showed the slot's electrical load being around half its peak.
https://www.tomshardware.com/reviews/amd-radeon-vega-frontier-edition-16gb,5128-10.html
It's safely below max, but sufficiently non-zero. The card did have transients noticeably above 300W, which are not TDP-breaking on their own. The specifics of their duration and average time above the limit are more closely monitored for server products, as opposed to the enthusiast-targeted "typical" handwaving standard.

Deleted member 13524 · Dec 10, 2018

New Vega logo registered by AMD:

https://trademarks.justia.com/882/10/n-88210086.html

Reads like "Vega 2".

Either "Vega 2" is the consumer name for Navi or Vega 20 is coming to a consumer product soon.

My vote is on Vega 20 chip, 3 stacks of HBM2 for 12GB at 768GB/s, reduced FP64 throughput (1:8 or 1:16) and clocks up to 2.1GHz boost.
Price and performance between a 2080 and 2080 Ti.
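The 768GB/s figure follows from the stack count if each HBM2 stack has a 1024-bit interface and runs at 2.0 Gbps per pin. The pin rate is an assumption implied by the quoted number, not a confirmed spec, and `hbm2_bandwidth_gbs` is just an illustrative helper.

```python
def hbm2_bandwidth_gbs(stacks, pin_rate_gbps, bus_bits_per_stack=1024):
    """Peak bandwidth in GB/s: stacks * bus width (bits) * per-pin rate (Gbps) / 8."""
    return stacks * bus_bits_per_stack * pin_rate_gbps / 8


print(hbm2_bandwidth_gbs(3, 2.0))  # 768.0
print(hbm2_bandwidth_gbs(4, 2.0))  # 1024.0, the full four-stack configuration

# 12GB over 3 stacks means 4GB per stack.
print(12 / 3)  # 4.0
```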

Bondrewd · Dec 10, 2018

ToTTenTranz said:
My vote is on Vega 20 chip, 3 stacks of HBM2 for 12GB at 768GB/s, reduced FP64 throughput (1:8 or 1:16) and clocks up to 2.1GHz boost.

~not gonna happen~
At best it's gonna be an FE-like SKU for those who want it.

Kaotik · Dec 10, 2018

ToTTenTranz said:
New Vega logo registered by AMD:

https://trademarks.justia.com/882/10/n-88210086.html

Reads like "Vega 2".

Either "Vega 2" is the consumer name for Navi or Vega 20 is coming to a consumer product soon.

My vote is on Vega 20 chip, 3 stacks of HBM2 for 12GB at 768GB/s, reduced FP64 throughput (1:8 or 1:16) and clocks up to 2.1GHz boost.
Price and performance between a 2080 and 2080 Ti.

The "Vega V" logo has been used in Radeon Instinct technical marketing at least, so a new logo doesn't necessarily mean anything regarding any consumer products.
