AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Really? So now Vega will somehow leapfrog Pascal and achieve 100% performance over Fiji with what? The same ALU count and a frequency boost?
25 vs 12 TFLOPs against Titan Xp is a bit more than 100%, so it seems that way looking at compute, but TFLOPs aren't everything. Just need to see what other changes they made to boost performance. That 12/25 TFLOPs may not be counting everything if they added flexible scalars or changed the standard FMAs. Then consider the possibility Vega 10 could be the mid-range product, with a dual-die Infinity Fabric design at the high end like with Ryzen. Raja said it was "possible", but they haven't mentioned anything.
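For what it's worth, both numbers fall straight out of the usual peak-throughput formula. A minimal sketch, assuming the rumored 4096 ALUs (same count as Fiji) and a ~1.53 GHz boost clock; none of these figures are confirmed:

```python
# Peak throughput = ALUs * clock * 2 (an FMA counts as 2 FLOPs) * packing factor.
# Rumored/assumed figures: 4096 ALUs (same as Fiji), ~1.53 GHz boost clock.
alus = 4096
clock_ghz = 1.53

fp32_tflops = alus * clock_ghz * 2 / 1000          # one FMA = 2 FLOPs
fp16_tflops = fp32_tflops * 2                      # packed math: 2x FP16 per lane

print(f"FP32: {fp32_tflops:.1f} TFLOPs")           # ~12.5
print(f"FP16: {fp16_tflops:.1f} TFLOPs (packed)")  # ~25.1
```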

The Polaris vs Pascal and Vega vs Volta pairings may just be a coincidence: "Poor Volta", releases within a matter of months of each other, potentially similar deep learning capabilities, page migration, die size, etc.

IOW: improved virtual memory access, but not the first to have HW support for virtual memory.
Can't say I've seen that feature showing up in very many games. Just because something "can" be done doesn't mean it should be. It's the more reasonable page size that is significant. Nvidia mentioned access counters for a reason.

I don't understand the urge to pit Vega against this architecture but not the other. At the end of the day, what matters is that AMD took a break from competing at the high end for more than a year.
It's more an argument of AMD allegedly pitting two architectures (Polaris and Vega) against Pascal. It seems a bit odd to be constrained on R&D and still release two different architectures with different IP together. It seems obvious they skipped a Polaris high end to focus on a next-generation architecture covering enthusiast to low end and APUs.
 
25 vs 12 TFLOPs against Titan Xp
What? I can't really tell if you are being serious or not. But you do realize that 25 number is FP16; FP32 is just 12.5, which barely scrapes past the Titan Xp. And even then, AMD has always had an advantage in pure TFLOPs that never translates into a gaming advantage.
Polaris vs Pascal and Vega vs Volta
You do realize that comparison doesn't work in your favor, right? Polaris was never a match for Pascal, so how can you state Vega is a match for Volta?! Are we allowing the discussion to mutate into astrology and oracle-like prophecies now?
 

Plus, since when did we start estimating the performance of products based on alliteration? lol, this is ridiculous.

Now we're looking at packed FP16 throughput, comparing it to FP32 throughput on GP102, and claiming it's a wash... Vega and Volta both begin with the same letter, so they must be a match.
 
One plausible explanation would be drivers; maybe they are not up to par yet. The consumer cards are 2 months away from release and probably 3 months away from stock being healthy.
Although they did happily show 4K and FPS with Sniper Elite 4 at the Financial Analyst Day; was it a single GPU? It looked to be running some kind of higher settings judging by the shadows, but as someone mentioned, there was no way to tell specifically what those settings were.
That demo ran at or above 60 fps during the 20-second segment they showed (running along the path, climbing, stabbing an enemy), and at a much earlier event they demoed Star Wars Battlefront 2 at 4K and 60 fps (albeit in a simpler map area).
Not sure if I like the look of Prey or not, tbh; at times it feels like it does not present the greatest visual fidelity compared to some other games.
Cheers
 
I really wouldn't say Vega is a match for Volta (even if in some specific areas it could give it a run for its money, though not in gaming); Navi and Vega 20 are going to be the ones up against Volta.

The thing is, Vega has seen some delay (due to HBM2, maybe, I don't know), and after Polaris, which (in the version released) was only intended to cover low-end to "mid"-end gear, this has left AMD in a desert at the high end. Maybe there was a big version of Polaris, maybe not; maybe Vega was initially meant as the high end above Polaris, maybe not...

We don't know whether there was a big Polaris or not... we just see the resulting products, not the initial plan.
 
Well, the feature set is more in line with Volta than Pascal: unified memory, theoretical TFLOPs, FP16 (possibly). So why not?
Sorry, people keep repeating that, but I don't see it that way. We can't speak yet for the consumer space, but in HPC/hyperscale, Vega is annihilated by Tensor cores for the most important and hyped workload of the decade. Vega lacks any communication interface between chips (aka NVLink) for better scaling. Vega lacks industry support like the HGX format promoted by the biggest OEMs and Microsoft. Of course, because it uses NVLink and the SXM2 connector, Vega will never be compatible with it, and I won't even talk about rack computing density, where Vega is miles behind. Finally, the CUDA ecosystem still has no serious answer from AMD...
Don't get me wrong, Vega will attract some customers too, especially small ones looking for better value for money (does AMD have a choice?), but it will be nowhere close to the success and, most importantly, nowhere close to the revenue generated by Volta.
 
Can't say I've seen that feature showing up in very many games. Just because something "can" be done doesn't mean it should be.
AFAIK, this feature is only exposed in CUDA. I assume this is not something you can merge into the existing DirectX driver (and that Vega wouldn't be able to make use of a similar feature without a game making explicit use of it).

It's the more reasonable page size that is significant. Nvidia mentioned access counters for a reason.
The counters are mentioned because they help make a good decision about which page to swap out to main DRAM.
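A toy sketch of how counters could drive that decision; the eviction policy here is my assumption, not Nvidia's disclosed implementation:

```python
# Toy model: evict the coldest page (lowest access counter) from device
# memory back to main DRAM. Purely illustrative; the real policy isn't public.
def pick_page_to_evict(pages):
    """pages: dict mapping page address -> access counter."""
    return min(pages, key=pages.get)

resident = {0x1000: 57, 0x2000: 3, 0x3000: 812}
print(hex(pick_page_to_evict(resident)))  # 0x2000 -- least-touched page goes first
```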
 
No they didn't...
Yes and no. With the Doom demo at the 2016 RTG Summit you could open the settings dialogue and check for yourself. So, while technically maybe they did not tell "the press" (I cannot confirm to whom each and every AMD employee talked), it qualifies very much in my book as "settings and fps".
 
Sorry, people keep repeating that, but I don't see it that way. We can't speak yet for the consumer space, but in HPC/hyperscale, Vega is annihilated by Tensor cores for the most important and hyped workload of the decade. Vega lacks any communication interface between chips (aka NVLink) for better scaling. Vega lacks industry support like the HGX format promoted by the biggest OEMs and Microsoft. Of course, because it uses NVLink and the SXM2 connector, Vega will never be compatible with it, and I won't even talk about rack computing density, where Vega is miles behind. Finally, the CUDA ecosystem still has no serious answer from AMD...
Don't get me wrong, Vega will attract some customers too, especially small ones looking for better value for money (does AMD have a choice?), but it will be nowhere close to the success and, most importantly, nowhere close to the revenue generated by Volta.
You make a helluva lot of assumptions there, and I'm pretty sure at least some of them are false. The only things true for certain are the lack of NVLink, which is a proprietary NVIDIA bus, and SXM 2.0, which is AFAIK just as exclusive a format from NVIDIA, not an industry standard.
 
You make a helluva lot of assumptions there, and I'm pretty sure at least some of them are false. The only things true for certain are the lack of NVLink, which is a proprietary NVIDIA bus, and SXM 2.0, which is AFAIK just as exclusive a format from NVIDIA, not an industry standard.
No assumptions, just facts that you can easily check (Google is your friend).
Regarding HGX becoming a standard among the hyperscalers:
https://www.forbes.com/sites/patric...andard-for-aiml-cloud-computing/#7ad5c9fb4d59
https://www.hpcwire.com/off-the-wire/nvidia-partners-manufacturers-advance-ai-cloud-computing/
http://www.eetimes.com/document.asp?doc_id=1331798&page_number=1
 
Sorry, people keep repeating that, but I don't see it that way. We can't speak yet for the consumer space, but in HPC/hyperscale, Vega is annihilated by Tensor cores for the most important and hyped workload of the decade.
How so? The only difference seems to be that Tensor throws a lot more silicon and ALUs at the problem. Strip all instructions not related to FMA from a SIMD and you have Tensor. That's a distinct possibility with a flexible scalar, if AMD went that route. Put a pair of 32-bit FMA units capable of packed math in each SIMD lane along with an L0 cache, and suddenly AMD has 4 Tensor-ish cores per CU, with the ability to bond dice with Infinity Fabric. So >1000 mm² of silicon per GPU before considering a traditional "dual" part. All in a standard consumer part that can work for graphics, and it works with the quoted ops/clock AMD has listed. Wouldn't be too different from Zen's FPU scaled out to a SIMD with SMT, if they went that route.
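For reference, the primitive a Tensor core executes is a small matrix FMA, D = A×B + C on 4×4 tiles. A plain scalar sketch of that math (no claim about how AMD would actually map it onto SIMDs):

```python
# D = A @ B + C on 4x4 tiles -- the Tensor-core primitive, written out
# as the 64 scalar FMAs it decomposes into.
def tile_fma(A, B, C, n=4):
    D = [row[:] for row in C]                 # accumulator starts at C
    for i in range(n):
        for j in range(n):
            for k in range(n):
                D[i][j] += A[i][k] * B[k][j]  # one FMA per (i, j, k)
    return D

I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
ones = [[1] * 4 for _ in range(4)]
print(tile_fma(I4, I4, ones))  # 2 on the diagonal, 1 elsewhere
```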

Besides, the big companies seriously into deep learning have all made their own custom hardware.

Vega lacks any communication interface between chips (aka NVLink) for better scaling.
Infinity Fabric and MCM Threadripper/Naples as a backbone don't exist? Even better, it doesn't require IBM's POWER line of CPUs, so x86 works. That's 8 GPUs per server with direct access to 8 memory channels and potentially better density and perf/watt.
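Back-of-the-envelope on the "8 GPUs per server" figure, assuming a full x16 link per GPU (my assumption, not an AMD spec):

```python
# Naples exposes 128 PCIe lanes; in a 2P config, 64 lanes per socket go
# to the inter-socket fabric, still leaving 128 lanes for I/O.
lanes_available = 128
lanes_per_gpu = 16                       # full-bandwidth x16 link per GPU (assumed)
print(lanes_available // lanes_per_gpu)  # 8 GPUs, no PCIe switches needed
```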

I don't even talk about rack computing density where Vega is miles behind.
Covered above, but AMD's 3-petaflop racks seem respectable enough. Plus they aren't locked into deep learning; general-purpose compute still covers most of the HPC market.

Finally, CUDA ecosystem still has no serious answer from AMD...
Guess I didn't realize CUDA was that relevant to HPC, with less than 20% of supercomputers even using a GPU. Is CUDA somehow accelerating the C/C++, Fortran, and other CPU code that constitutes the vast majority of applications, then?

AFAIK, this feature is only exposed in CUDA. I assume this is not something you can merge into the existing DirectX driver (and that Vega wouldn't be able to make use of a similar feature without a game making explicit use of it).
The HBCC demos would suggest it works on DirectX transparently. Plus there's the ability to directly interface with x86. It seems like Vega clusters sit on Infinity Fabric like Zen clusters, with direct access. The only change is less coherent cache, but a CU should be able to read memory pointers with HBCC transparently caching data like the LLC on a CPU.
 
How so? The only difference seems to be that Tensor throws a lot more silicon and ALUs at the problem. Strip all instructions not related to FMA from a SIMD and you have Tensor. ...

The point you appear to be missing is that Tensor cores are distinct from the rest of the ALU/FPU units and can run independently.
 
while technically maybe they did not tell "the press" (I cannot confirm to whom each and every AMD employee talked),
They told everyone: they released a video on the official AMD YouTube channel featuring an AMD marketing employee talking about the Doom 4K demo and showing both the Ultra settings and the fps counter.
 

You seriously think that one motherboard/rack design from one manufacturer will become the industry standard overnight? There are several OCP-compatible designs out there, including AMD's, and AMD already has Vega designs with major server companies.
You don't know how fast Vega is in tensor tasks; you assume it gets absolutely "annihilated".
Vega does have Infinity Fabric to communicate between GPUs and/or CPUs.
AMD's ROCm supports all the major frameworks for machine learning, a variety of languages, etc.
Vega can do 400 TFLOPS in 4U, so how many miles ahead is NVIDIA again? (Back-of-the-envelope below.)
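On that 400 TFLOPS in 4U claim, assuming the rumored ~24.6 TFLOPS packed-FP16 peak per Vega-based Instinct card (an assumption, not a confirmed spec):

```python
# How many Vega cards would 400 TFLOPS in 4U take?
tflops_target = 400
tflops_per_card = 24.6                         # rumored packed-FP16 peak (assumption)
print(round(tflops_target / tflops_per_card))  # ~16 cards in 4U, i.e. 4 per U
```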

The point you appear to be missing is that Tensor cores are distinct from the rest of the ALU/FPU units and can run independently.
Has NVIDIA actually confirmed this? At least earlier it's been either FP64 units or FP32 units, not both at the same time, so is there a reason to believe it's now suddenly FP32+Tensor or FP64+Tensor instead of FP32 or FP64 or Tensor?
 
In one clock, with GCN's existing broadcast/permute capability and regular register accesses?
I wouldn't say existing capability, but the redesign would in theory have four operands to read and broadcast. Writes absorbed by an accumulator and L0 registers.

Is this still with a 64-wide wave?
I've tried to mentally picture how this works, without getting into the relative widths of the units.
I was thinking more along the lines of two or three 64-wide waves per SIMD per cycle, possibly with the cadence. Similar to the Zen FPU with SMT, and a temporal scalar per SIMD running low-frequency (integer, SFU, etc.) instructions along with the traditional scalar work. In theory the RF could shrink, as the L0 would absorb some registers that wouldn't need to be written. That would require some compiler work.

After the initial full-precision multiply, there would need to be a set of additions whose operands are at a stride of 4 in the other dimension and the elements from register C--not a standard 1 cycle operation or handled by the regular write-back path of a SIMD.
Accumulation should bypass the write-back. It may also be possible to interleave matrices. Four Tensor ops in four clocks, but I need to study that more.
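To make that stride-4 access pattern concrete, here's a sketch with the 16 partial products of one output row laid out flat in row-major order; the layout is assumed, purely illustrative:

```python
# One output row of D = A*B + C: the 16 partial products for row i sit
# flat as P[k*4 + j], so each column's reduction strides by 4.
def reduce_row(P, C_row, n=4):
    return [C_row[j] + sum(P[k * n + j] for k in range(n)) for j in range(n)]

A_row = [1, 2, 3, 4]
B = [[1 if k == j else 0 for j in range(4)] for k in range(4)]  # identity
P = [A_row[k] * B[k][j] for k in range(4) for j in range(4)]
print(reduce_row(P, [0, 0, 0, 0]))  # [1, 2, 3, 4] -- row passes through identity
```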

Officially, there is currently no special relationship between Vega and Zeppelin. It's only described as using PCIe for the compute cards. xGMI was on a slide for Vega 20.
I didn't mean to imply that there was, but Naples in theory already does that with Infinity Fabric, so the capability could be there. Even with PCIe the capabilities should be there, along with 64 or 128 lanes to connect all the GPUs. So Naples isn't required; it just happens to be really good for IO as an interconnect with that mesh. AMD leans on the CPU's fabric, as opposed to NVLink directly connecting processors.

If CUDA is not that relevant, AMD's efforts are not that relevant relative to that.
It's sufficiently relevant that AMD's revamped compute platform specifically provisions for translating from CUDA.
One aspect of a larger picture. There is more than just deep learning and GPU work in HPC. The vast majority of that market is CPU clusters with the corresponding code backing it up.
 
If AMD hasn't talked about tensor tasks, I doubt they will be any good at them. Prior to Volta, if AMD had had great tensor performance on Vega, they would have talked about it, because the only other product that is very good at those tasks is Google's TPU.
 
If AMD hasn't talked about tensor tasks, I doubt they will be any good at them. Prior to Volta, if AMD had had great tensor performance on Vega, they would have talked about it, because the only other product that is very good at those tasks is Google's TPU.
AMD hasn't talked about tensor tasks a lot, but they have said they support them. Can't find the quote now, but I think it was in the Financial Analyst Day broadcast.
 
If AMD hasn't talked about tensor tasks, I doubt they will be any good at them. Prior to Volta, if AMD had had great tensor performance on Vega, they would have talked about it, because the only other product that is very good at those tasks is Google's TPU.
Vega should be close to the Pascal P100 (which is currently selling at $5,000+) in tensor math. It has double-rate FP16 and quad-rate INT8, similar to Pascal.

Volta is obviously faster, but it's not available yet and is going to cost even more (rumors say around $10,000). If AMD is slower at inference, they certainly have room to price their product lower than Volta.
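Rough value math under those rumored prices; every number below is a rumor or my assumption, not a confirmed spec or price:

```python
# Solve for the price at which Vega merely matches the P100 on FP16
# TFLOPS per dollar -- anything lower undercuts it on paper.
p100_tflops, p100_price = 21.2, 5000   # FP16 peak, rumored street price
vega_tflops = 25.0                     # assumed packed-FP16 peak

breakeven = vega_tflops * p100_price / p100_tflops
print(f"${breakeven:.0f}")             # ~$5896
```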

The V100 will first appear inside Nvidia's bespoke compute servers. Eight of them will come packed inside the $150,000 (~£150,000) DGX-1 rack-mounted server, which ships in the third quarter of 2017. A 250W PCIe slot version of the V100 is also in the works (probably priced at around £10,000)

source: https://arstechnica.com/gadgets/2017/05/nvidia-tesla-v100-gpu-details/
 