AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

The MI25 is up against two Tesla models in slightly different roles: the P100 (FP64/FP32/2xFP16, but no accelerated Int8 functions) and the P40 (the highest-FP32 GPU Nvidia offers, but with accelerated Int8 functions for inferencing instead of 2xFP16).
You can see the different strategies at work; Nvidia sees the market moving toward more dedicated nodes and is splitting GPU requirements between training (P100) and inferencing (P40). IMO not all HPC research sites will be doing dedicated inferencing, though, and AMD's solution fits them better from a hardware perspective (putting aside the software-platform considerations).
It is probably a balance of pros and cons; the trend toward dedicated nodes is real, but it probably won't work for everyone.
Cheers
Note though, that AMD also labels the lowest-performing card, the MI6, as an inference accelerator and the most expensive one, the MI25, as a training accelerator. They are well aware of the different target markets.

That being said, I could imagine that commercial applications of AI would rather have dedicated installations for training (longer run times, not updated every hour to the live environment) and inference (high performance live environment for user experience).
 
those 4 RU servers are monsters...

NVMe over fabric... ROFL, I guess you would need a 288-lane PCIe switch for that...
If the GPUs can talk to NVMe without CPU assistance, then that's going to be interesting.
 
Most definitely not a dual-chip solution. This is a 300W passively cooled card so maybe it needs the volume for a large heatsink.
Maybe, but more dice at lower clocks should be more power efficient. Leaning more towards an SSG design.

NVMe over fabric... ROFL, I guess you would need a 288-lane PCIe switch for that...
If the GPUs can talk to NVMe without CPU assistance, then that's going to be interesting.
SSG does that with a Fiji; no reason Vega couldn't. I'd say it's rather likely. Even the K888 with 4x MI25 lists "Optional 8x NVMe support". The SSGs did use two drives in RAID0.
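Rough napkin math on the lane budget for that kind of box (the link widths are my assumptions, not from any spec sheet):

```python
# Hypothetical lane budget for a 4-GPU box with 8 NVMe drives,
# assuming the usual x16 per GPU and x4 per NVMe drive.
gpus, lanes_per_gpu = 4, 16
drives, lanes_per_drive = 8, 4
total_downstream = gpus * lanes_per_gpu + drives * lanes_per_drive
print(total_downstream)  # 96 lanes below the switch fabric, more than any single CPU exposes
```

So switches are a given either way; the interesting question is only whether the GPUs can master the NVMe transactions themselves.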
 
Note though, that AMD also labels the lowest-performing card, the MI6, as an inference accelerator and the most expensive one, the MI25, as a training accelerator. They are well aware of the different target markets.

That being said, I could imagine that commercial applications of AI would rather have dedicated installations for training (longer run times, not updated every hour to the live environment) and inference (high performance live environment for user experience).
Well, you market what hardware you can :)
It is amusing because they advertise the MI25's accelerated packed FP16 math, which will be for inferencing rather than training, yet this card is also for training and has good FP32.
And I really should have said the P40 is an either/or case rather than just inferencing, though it could be argued that training nodes may also need other tasks relating to FP64 modelling, for greater flexibility.
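For anyone wondering what "packed" means here: two FP16 values share the 32-bit lane that one FP32 value would occupy, which is where the doubled rate comes from. A toy illustration of just the packing (numpy, not the actual GPU instruction):

```python
import numpy as np

# Two FP16 values fit in the same 32 bits as one FP32 value;
# a packed-math ALU operates on both halves with a single instruction.
halves = np.array([1.5, -2.25], dtype=np.float16)
packed = halves.view(np.uint32)[0]
print(hex(packed))  # one 32-bit word carrying both FP16 lanes
```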

Yeah, it can be argued what will become the standard with regards to inferencing (Int8 vs FP16) and training (how often FP16 can be used over FP32).
Nvidia has sort of fudged the model with the way they split this in terms of flexibility (gained some / lost some depending upon the card), but there is more talk generally about dedicated nodes for training and inferencing.
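To make the Int8 side concrete, the usual trick is symmetric linear quantization: weights are scaled into the [-127, 127] range and the multiply-accumulates run on 8-bit integers. A minimal sketch (my own illustration, not how either vendor's stack necessarily implements it):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization: map [-max|w|, +max|w|] onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
restored = q.astype(np.float32) * scale   # dequantize to check the error
print(np.abs(w - restored).max())         # small loss for 4x less storage than FP32
```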

The lower-level cards are going up against the P4, which has around 5.6 FP32 TFLOPS (and accelerated Int8 functions) at 50W, again marketed for inferencing by Nvidia.
Context: HPC.
Cheers
 
Are we still operating under the assumption it is 4096 ALUs? 1500 MHz still seems excessive to me; considering this is an HPC part as well, that would strongly imply any consumer version clocking at least 1600 MHz.
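For reference, the napkin math behind that pairing (the ALU count and clock are the rumored figures, not confirmed):

```python
# Peak rate = ALUs x 2 ops (FMA) x clock, doubled again for packed FP16.
alus, clock_ghz = 4096, 1.5
fp32_tflops = alus * 2 * clock_ghz / 1000   # ~12.3 TFLOPS FP32
fp16_tflops = fp32_tflops * 2               # ~24.6 TFLOPS FP16
print(fp32_tflops, fp16_tflops)
```

That FP16 figure lines up with the ~25 TFLOPS AMD quotes for the MI25, which is presumably where the 1500 MHz assumption comes from.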
 
When trying to make more educated guesses about Vega and P100, I found this on the SK Hynix website
[Image: SK Hynix HBM2 product listing]

From Samsung, I haven't seen anything regarding HBM2 since their initial announcement in January 2016 that they were starting production.
 
It seems AMD didn't disclose to the tech press what the NCU is or what they mean by "High Bandwidth Cache and Controller".

The "high bandwidth cache and controller" is probably hinting it's a SSG-model
Does that make sense for a deep learning / inferencing card, though? I get that a dedicated SSD may be great for graphics and video workstations, but is it good for neural networks to have lots of data spread throughout "local storages" in each GPU?
 
It seems AMD didn't disclose to the tech press what the NCU is or what they mean by "High Bandwidth Cache and Controller".


Does that make sense for a deep learning / inferencing card, though? I get that a dedicated SSD may be great for graphics and video workstations, but is it good for neural networks to have lots of data spread throughout "local storages" in each GPU?

Yes, it is good: neural networks need huge amounts of bandwidth, and the SSD is being used like a caching system, so it could help greatly.
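As a sketch of the caching idea (the path and shapes are hypothetical; the point is that batches fault in from a card-local drive instead of being shipped over PCIe from the host):

```python
import numpy as np

# Memory-map a large training set sitting on the card-local NVMe drive
# (hypothetical path); slices are paged in from the SSD on demand.
data = np.memmap("/mnt/ssg/dataset.f32", dtype=np.float32,
                 mode="r", shape=(1_000_000, 4096))

def batches(arr, batch_size=256):
    for i in range(0, len(arr), batch_size):
        yield arr[i:i + batch_size]  # each slice is read from the drive as touched
```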

All this aside though, AMD needs their software/packages up and ready to go first.
 
Does that make sense for a deep learning / inferencing card, though? I get that a dedicated SSD may be great for graphics and video workstations, but is it good for neural networks to have lots of data spread throughout "local storages" in each GPU?
They might not be local. Using a high-speed fabric in place of a local drive could work for connecting a large volume of cards sharing a static data set. Even without the fabric, each card in theory has more than enough space to hold most data sets without having to reach out to the CPU. Keep the GPUs largely self-contained while they do their thing. You wouldn't necessarily need x16 PCIe lanes to the CPU as the cards wouldn't be going there often. No reason a single CPU with a bunch of switches couldn't handle 16 cards effectively for most workloads.
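Rough lane arithmetic for that 16-card scenario (the uplink width and switch port counts are my assumptions):

```python
# If traffic to the host is rare, each switch only needs a narrow uplink.
cpu_lanes = 64                 # assumed CPU root complex
uplink_per_switch = 8          # assumed x8 uplink per switch
gpu_ports_per_switch = 4       # assumed 4 x16 downstream ports per switch
switches = cpu_lanes // uplink_per_switch
print(switches * gpu_ports_per_switch)  # 32 GPU ports off one CPU, so 16 cards is comfortable
```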

When trying to make more educated guesses about Vega and P100, I found this on the SK Hynix website
What if we're looking at a dual Vega? 8192 cores at 750 MHz? If that's the top-of-the-line card, in theory they maxed out the die space, which would put a single Vega 10 around the 600mm2 mark, on par with Fiji. A pair of ~300mm2 dies wired together makes more sense, each with 8GB HBM2 for the consumer Vega. Those would be fairly conservative clocks, but about as power efficient as you could get.
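The peak-throughput math works out identically either way (both configurations are hypotheticals from this thread):

```python
def tflops(alus, clock_ghz, packed_fp16=False):
    # FMA counts as 2 ops per ALU per clock; packed FP16 doubles the rate.
    return alus * 2 * (2 if packed_fp16 else 1) * clock_ghz / 1000

print(tflops(4096, 1.50, packed_fp16=True))  # single chip at 1500 MHz: ~24.6 TFLOPS FP16
print(tflops(8192, 0.75, packed_fp16=True))  # dual chip at 750 MHz:   ~24.6 TFLOPS FP16
```

Same headline number, so the advertised figure alone can't distinguish the two layouts.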
 
The German news websites are mentioning that AMD has shown them two Vega-powered systems running Doom:

[Images: two Vega-powered systems running Doom]



If anyone still remembers, 687F:C1 is the name of the unknown GPU that appeared in the Ashes of the Singularity results database.
I guess this means those results are legit, and they do belong to a new Vega GPU.

The news websites are also claiming that the new Vega card has a 512GB/s bandwidth. With a total of 8GB detected in the Doom demo, this means the Vega that was shown to the press probably has 2 stacks of 4-Hi HBM2 running at 2Gbps.
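That bandwidth figure checks out, for what it's worth (the 1024-bit stack interface is standard HBM2; the 2Gbps pin speed is the inferred part):

```python
# Bandwidth = stacks x bus width per stack x pin rate, divided by 8 bits/byte.
stacks, bus_bits, gbps_per_pin = 2, 1024, 2.0
print(stacks * bus_bits * gbps_per_pin / 8)  # 512.0 GB/s
```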
 
Do we already know if Vega with HBM2 will be the same size as the Fury X/Nano card? (I mean the PCB, not the chip itself.) Short cards are cool...
 
What if we're looking at a dual Vega? 8192 cores at 750 MHz? If that's the top-of-the-line card, in theory they maxed out the die space, which would put a single Vega 10 around the 600mm2 mark, on par with Fiji. A pair of ~300mm2 dies wired together makes more sense, each with 8GB HBM2 for the consumer Vega. Those would be fairly conservative clocks, but about as power efficient as you could get.
Given the size of the card shown, as well as AMD's sweet spot of ~850-900 MHz for perf/watt on Fiji, I could follow your line of reasoning. However, with the move to 14nm the sweet spot should have moved up significantly; and if 300 Watts, with the more energy-efficient HBM2 (compared to GDDR5X) already included, only gets them slightly ahead of Titan X, I would think that is not a good sign for AMD. So I prefer to believe that this is a single-GPU card with either some very complex power circuitry or some other nice surprises (SSD-attached, NCU "Network").
 
http://radeon.com/en-us/instinct/

Given the size of the card shown, as well as AMD's sweet spot of ~850-900 MHz for perf/watt on Fiji, I could follow your line of reasoning. However, with the move to 14nm the sweet spot should have moved up significantly; and if 300 Watts, with the more energy-efficient HBM2 (compared to GDDR5X) already included, only gets them slightly ahead of Titan X, I would think that is not a good sign for AMD. So I prefer to believe that this is a single-GPU card with either some very complex power circuitry or some other nice surprises (SSD-attached, NCU "Network").
I'm saying two Vega 10s on the same interposer, providing a package roughly the same size as Fiji. So it makes sense that if AMD has 25% higher FP16 performance than P100 at ~600mm2, they would at least have similar areas. Vega 11 was in theory the "big" one. While possible, I doubt they made a chip significantly larger than 600mm2. Vega 10 at ~300mm2 benchmarking a bit faster than an overclocked 1080 (314mm2) makes sense. Then a pair of them at lower clocks for the big card with double the RAM. Otherwise they're horribly inefficient if 50% more bandwidth and double the die size is only marginally faster than GP104 in games, yet outpaces P100 (which already has 2xFP16) in deep learning with 30% less bandwidth.
 
Do we already know if Vega with HBM2 will be the same size as the Fury X/Nano card? (I mean the PCB, not the chip itself.) Short cards are cool...
I see no reason why it wouldn't. Keep in mind that Vega 10 will probably be in >250W cards (save for an eventual Nano 2 solution), so unless it keeps the Fury X's AiO cooling solution then the cards themselves should be quite big.


Given the size of the card shown, as well as AMD's sweet spot of ~850-900 MHz for perf/watt on Fiji, I could follow your line of reasoning. However, with the move to 14nm the sweet spot should have moved up significantly; and if 300 Watts, with the more energy-efficient HBM2 (compared to GDDR5X) already included, only gets them slightly ahead of Titan X, I would think that is not a good sign for AMD.

One thing to keep in mind is that AMD already set aside $340 million in their 2016 Q3 results to pay GlobalFoundries for producing chips at TSMC.
Given the timing of Vega's production schedule (news websites are claiming the Vega chips being presented only arrived a few weeks ago), there's a good possibility that Vega is being produced at TSMC, whose 16FF+ seems to be achieving substantially higher clocks than Samsung's/GF's 14nm FinFET.
So Vega could be very different from Polaris in terms of achievable clocks (it had better be, otherwise AMD spent >$340M for nothing?).
 