AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

TDP is far more nebulous than TFLOPS figures; AMD would have even more egg on their faces than they did over their Polaris power efficiency numbers if the MI25 doesn't come close to the 25 TFLOPS its name suggests in the various configurations it's offered in.

Even if they are boost numbers that the card would rarely hit, they'd have to be stable enough that AMD are willing to put them on professional cards, so desktop cards should be able to hit them easily as well.

There's a rumor going around that AMD are launching it on May 9th and that the card will have a 1.6 GHz+ boost frequency. That'd be a pretty quiet launch considering their 'make some noise' advertising, and even at those boost figures it wouldn't get past custom 1080 cards once you take into account the confirmed Time Spy score of a Vega variant. I don't think there's going to be some magical driver development that increases its numbers compared to Fiji at the same clock; those slides were marketing, and as nebulous in their performance numbers as TDP is.
Does it need to be faster than a custom 1080? Perhaps custom Vegas will be fast enough for that. Also, in terms of noise, I would wait for the launch to judge how noisy it gets. Price and performance are both important, and if they can come in under the 1080's price with 1080 performance, it will make noise.
I'm hoping they were able to beat 1080 performance, however. We really need two strong GPU makers in the industry, otherwise things will get worse for all of us.
 
I hope it's faster than a custom 1080... A Fury X is already near the 1070 now in some games (kudos to AMD, by the way, very good driver support for old cards). But the product is coming later, and not by a small amount of time (sorry for my English...). Or will they do a Ryzen: not faster, but a very good price/performance ratio? I doubt they can make a "big Vega" "cheap"... I hope it won't be an HD 2900 XT again.
 
I hope it's faster than a custom 1080... A Fury X is already near the 1070 now in some games (kudos to AMD, by the way, very good driver support for old cards). But the product is coming later, and not by a small amount of time (sorry for my English...). Or will they do a Ryzen: not faster, but a very good price/performance ratio? I doubt they can make a "big Vega" "cheap"... I hope it won't be an HD 2900 XT again.
1080 performance, similar power usage, and a low price would be a great trifecta for me. I would jump from my 290.
 
Oh, I didn't even realize that you didn't link wccftech, so it's more likely they copied it off 3dcenter. :LOL:

Anyway, this is likely the card that AMD were showing off in various games, since the same device ID showed up with DOOM and it was judged to be around 1080-level performance, so the performance in these tests is somewhat surprising, even more so when one considers that AMD blocked off parts of the card during such demonstrations, while the person running these tests is likely a tester for AMD and wouldn't have such problems.
 
Here are the 3DMark 11 Performance test numbers, which I think are CPU bottlenecked or have some other problem, since a Fury X scores 3k more on graphics.
Probably geometry, considering the OpenCL Catmull-Clark results. With low levels of tessellation and clocks normalized, the Vega was almost exactly half the performance of Fiji. Those results increased by 3x and 1.2x respectively for Catmull-Clark level 5. So it could be half the geometry pipelines disabled, clock speeds way off their design targets, or simply horrible performance with simple geometry. It's not like an engineering sample needs to be fully enabled for testing.
 
Probably geometry, considering the OpenCL Catmull-Clark results. With low levels of tessellation and clocks normalized, the Vega was almost exactly half the performance of Fiji. Those results increased by 3x and 1.2x respectively for Catmull-Clark level 5. So it could be half the geometry pipelines disabled, clock speeds way off their design targets, or simply horrible performance with simple geometry. It's not like an engineering sample needs to be fully enabled for testing.
That result would be extremely odd, as Vega has all the Polaris geometry pipeline improvements, 2x raw geometry throughput compared to Polaris, and improved geometry load balancing. Half the performance per clock vs Fiji would mean that it behaves 4x worse in a geometry-bound scenario than the raw specs would lead us to believe.

But are we talking about OpenCL benchmark here? OpenCL is not using geometry pipes at all, it is a pure compute API. What does the benchmark actually do?
 
But are we talking about OpenCL benchmark here? OpenCL is not using geometry pipes at all, it is a pure compute API. What does the benchmark actually do?
687F:C1 vs Fury

That's what I'm looking at. Most results are roughly identical when adjusted for clock speed, 1.2GHz versus 1.0GHz. I'm not sure what exactly the test entails (there's no breakdown of the benches), but I'm assuming it's OpenCL (not OpenGL) leveraging the same compute hardware used for tessellation, with a lot of interpolation. So it's at least somewhat representative of geometry performance, minus the culling and setup. Bandwidth seems to make a significant difference, as Fiji saw a slightly larger boost than Polaris going from level 3 to level 5. Polaris 20 regressed ever so slightly. The level 5 results are reasonable; the level 3 results seem to be running at half speed for whatever reason. That's why I'm assuming the cards are being held back beyond just clock speeds. It could be a regression, but it seems an odd one.

With the stated improvements to Vega, I'd have expected the hardware to be more efficient at evaluating those code paths, beyond just culling and distributing the load better, considering the programmable nature of the pipeline and its first stage being removed with Vega.
 
I was just going to post that as well, but wccftech seems to have been looking there too.

Here are the 3DMark 11 Performance test numbers, which I think are CPU bottlenecked or have some other problem, since a Fury X scores 3k more on graphics.

http://www.3dmark.com/3dm11/12148225

If it was CPU bottlenecked then this wouldn't be possible with the R7 1700: http://www.3dmark.com/3dm11/12163982

This Vega part is probably something on the lower end, or underclocked, or it's simply bad drivers.
 
I'm not sure what exactly the test entails (there's no breakdown of the benches), but I'm assuming it's OpenCL (not OpenGL) leveraging the same compute hardware used for tessellation, with a lot of interpolation.
OpenCL doesn't have any access to the hardware tessellator or the GPU geometry pipelines. OpenCL = pure compute kernels, just like CUDA. If this is a pure OpenCL benchmark, then it has nothing to do with hardware tessellation or fixed-function geometry performance.

You can obviously do catmull-clark subdivision by CPU or by a compute shader. You don't need any tessellation hardware for this. The fact that the compute shader benchmark result is "mTriangles/s" doesn't mean that it is using the fixed function triangle processing pipeline.

Interpolation: result = B*A + C*(1-A) = B*A + C - C*A. This is compiled to two multiply-add instructions. If this shader is mostly interpolation (multiply-adds), I doubt it would perform any different than GCN1-4. But we don't yet know whether NCU keeps the 4-cycle cadence with no visible instruction latency (for full rate instructions) and no bank conflicts. Multiply-add needs three input registers. It hits register files pretty heavily. If they have changed the basic GCN architecture with NCU, then all bets are off. In this case they would obviously also need a brand new shader compiler. We have seen 2x+ perf difference with a Maxwell Linux shader compiler, before the compiler added support for operand reuse cache. I would be surprised if GCN5 is a brand new CU architecture. I still believe that NCU is simply their marketing name for heavily power & clock optimized CU: 1.5 GHz clock rate, reduced power usage, support for double/quad rate 16/8 bit ops.
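For illustration, here is the interpolation above written out in plain C (not the benchmark's actual kernel, just a sketch of the algebra): both forms reduce to two multiply-adds, so per-clock throughput should be identical on anything that issues full-rate FMA.

```c
#include <math.h>

/* Lerp written naively: B*A + C*(1-A) -> a subtract, a multiply and a MAD. */
float lerp_naive(float a, float b, float c)
{
    return b * a + c * (1.0f - a);
}

/* Same expression rearranged as B*A + C - C*A -> exactly two multiply-adds:
   t = -C*A + C, then result = B*A + t. Each MAD reads three source registers,
   which is why the register file pressure mentioned above matters. */
float lerp_two_mad(float a, float b, float c)
{
    float t = fmaf(-c, a, c);   /* first MAD:  C - C*A */
    return fmaf(b, a, t);       /* second MAD: B*A + t */
}
```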
 
AMD's OpenCL performance is so random from driver to driver that it's probably best to ignore it completely.

Having said that, a few of these results are very much in agreement with 1200MHz Vega versus 1050MHz Fury X in pure compute terms. Results for my Fury X with 17.4.3 (not freshly booted):

Level-set Simulation 128 - 9522.2
Level-set Simulation 256 - 12283.1
Local Tone Mapping - both crash
Ocean Surface - 3231.7 - Vega scales by theoretical compute
Catmull-Clark Level 3 - 146.7
Catmull-Clark Level 5 - 231.9
N-Body Simulation 128K - 339.9
N-Body Simulation 1024K - 82.3
Vertex Connection and Merging - 10.7
Subsurface Scattering - 5942.3 - Vega scales by theoretical compute
Subsurface Scattering Multiple View - 5499.9 - Vega scales by theoretical compute
TV-L1 Optical Flow - 50.1 - Vega scales by theoretical compute
 
TDP is far more nebulous than TFLOPS figures; AMD would have even more egg on their faces than they did over their Polaris power efficiency numbers if the MI25 doesn't come close to the 25 TFLOPS its name suggests in the various configurations it's offered in.

Even if they are boost numbers that the card would rarely hit, they'd have to be stable enough that AMD are willing to put them on professional cards, so desktop cards should be able to hit them easily as well.

There's a rumor going around that AMD are launching it on May 9th and that the card will have a 1.6 GHz+ boost frequency. That'd be a pretty quiet launch considering their 'make some noise' advertising, and even at those boost figures it wouldn't get past custom 1080 cards once you take into account the confirmed Time Spy score of a Vega variant. I don't think there's going to be some magical driver development that increases its numbers compared to Fiji at the same clock; those slides were marketing, and as nebulous in their performance numbers as TDP is.
There is a relationship between TDP, TFLOPS and frequency, though, that cannot be bent very far. The Nano is a classic example of this, with its rigid power management to stay close to 175W and the impact that has on core frequency, and therefore on real-world versus spec-rated TFLOPS relative to the Fury X, as I outlined in my previous post.
This is important because the Nano (as the MI8) is listed with those higher ratings (not the same as real-world figures), still with its 175W TDP, in the same slide as the MI25.
The Nano has a boost of 1000MHz but often has difficulty even hitting 940MHz in nearly any of the operations tested, unless one changes core parameters, which means affecting the TDP/voltage/etc. This is more relevant to the enterprise/professional world, but it matters because the MI8 is in the same slide as the MI25.
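To make that concrete, here is a rough back-of-the-envelope sketch of the clock/TFLOPS relationship (peak FP32 = ALUs x 2 FLOP per clock, one FMA). Fiji's 4096 ALUs are known; the 4096-ALU figure for Vega 10 and the ~1.53 GHz implied by "25 TFLOPS" (FP16 at double rate, i.e. 12.5 TFLOPS FP32) are only the rumored/assumed numbers, not confirmed specs.

```c
#include <stdio.h>

/* Peak FP32 rate = ALUs * 2 FLOP/clock (one FMA) * clock in GHz, in TFLOPS. */
static double peak_fp32_tflops(int alus, double clock_ghz)
{
    return alus * 2.0 * clock_ghz / 1000.0;
}

int main(void)
{
    printf("Fiji/Nano spec boost (1.00 GHz): %.2f TFLOPS\n",
           peak_fp32_tflops(4096, 1.00));  /* ~8.19, the rated MI8 number       */
    printf("Nano sustained      (0.94 GHz): %.2f TFLOPS\n",
           peak_fp32_tflops(4096, 0.94));  /* ~7.70, closer to real world       */
    printf("Vega rumor          (1.53 GHz): %.2f TFLOPS\n",
           peak_fp32_tflops(4096, 1.53));  /* ~12.5 FP32, i.e. 25 FP16 (rumor)  */
    return 0;
}
```

The gap between the first two lines is exactly the spec-versus-real-world gap being described for the MI8.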

I am a bit concerned for AMD/Vega myself, given the targets committed to before 2017, for the reasons I mentioned earlier (and the HBM2 actually available is a more solid indicator that some of those targets look to have been missed). I think from a technical perspective hitting 1300-1350MHz would be quite an achievement; anything above that would be superb.
Cheers
 
The Nano has a boost of 1000MHz but often has difficulty even hitting 940MHz in nearly any of the operations tested, unless one changes core parameters, which means affecting the TDP/voltage/etc.

The Nano clocks at 850-950MHz in games, but I haven't seen any clock measurements when everything non-essential for compute is completely turned off (e.g. ROPs, TMUs, geometry engines).

Plus, the Polaris 10 MI6 claims 5.7 TFLOPs whereas the RX480 can do >5.8 at its maximum boost clocks (which are very rarely achieved in the reference versions).
They would be conservative for the Polaris 10 performance but overconfident for the other two cards?
 
I think from a technical perspective hitting 1300-1350MHz would be quite an achievement; anything above that would be superb.
Radeon RX 480 STRIX is clocked at 1310 MHz (base) / 1330 MHz (boost). It runs at pretty stable clocks (at least in our game). Stock Radeon 580 is running at 1340 MHz boost clock. AMD marketing has already clearly said that Vega NCU was designed to allow higher clock rate. This has been mentioned by many architecture previews. I'd say 1500 MHz is plausible. But I'd guess it is the boost clock rate, not the base clock rate. It is still a huge improvement over the Fury X 1050 MHz max clock rate.
 
Radeon RX 480 STRIX is clocked at 1310 MHz (base) / 1330 MHz (boost). It runs at pretty stable clocks (at least in our game). Stock Radeon 580 is running at 1340 MHz boost clock. AMD marketing has already clearly said that Vega NCU was designed to allow higher clock rate. This has been mentioned by many architecture previews. I'd say 1500 MHz is plausible. But I'd guess it is the boost clock rate, not the base clock rate. It is still a huge improvement over the Fury X 1050 MHz max clock rate.
Remember the official rating with TDP from AMD: going from a 380 to a 580 gave 29% clock gains, as I laid out in an earlier post.
The TDP/voltage goes out the window with any of the custom designs, hence why I mention the context and importance of the Nano/MI25 and the enterprise/professional world, where this does not happen; the discussion is around the spec rating of the MI25, and we can also include the MI8 (Nano) in that discussion.

If Vega ends up with an official reference TDP rating close to 300W at 1300-1350MHz, custom designs will need to accommodate something higher than that, if that is even possible. Yeah, I appreciate this side of the debate is pure speculation, but I am just looking at trends, and AMD has in the recent past mentioned that their core speeds are worse relative to Nvidia's due to the wide design AMD uses (this came out in an interview over the last six months).

So are you suggesting, then, that 1500MHz would be for custom models only and not the official spec rating, and therefore not applicable to the MI25?
Because you are using 480 clocks that are not part of the standard reference TDP/clocks given by AMD (boost was 1266MHz with a 161W TDP); custom models were hitting 190-200W TDP/TBP and higher voltages (important in the context of real-world versus spec figures, and of the MI8 and MI25, which much of the performance discussion is based upon).
Cheers
 
The Nano clocks at 850-950MHz in games, but I haven't seen any clock measurements when everything non-essential for compute is completely turned off (e.g. ROPs, TMUs, geometry engines).

Plus, the Polaris 10 MI6 claims 5.7 TFLOPs whereas the RX480 can do >5.8 at its maximum boost clocks (which are very rarely achieved in the reference versions).
They would be conservative for the Polaris 10 performance but overconfident for the other two cards?
And that just re-emphasises my point: the reference spec boost-figure TFLOPS was 5.8, and in fact with the increased clocks of the 580, AMD's site now says "up to 6.2 TFLOPs" with a TDP/TBP of 185W, while custom designs are hitting 200-230W.
And they dropped 0.1 TFLOPS for the MI6 with its 150W TDP, relative to the 480 that had a 160W TDP (real world, and AMD adjusted for this) and did not sustain its peak official boost.
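For reference, the Polaris numbers line up with the same simple peak-rate arithmetic. This is just a sketch using the well-known 2304-ALU count for Polaris 10; the MI6 clock printed below is only the value implied by its 5.7 TFLOPS rating, not an official spec.

```c
#include <stdio.h>

int main(void)
{
    const int alus = 2304;                        /* Polaris 10 shader count        */
    printf("RX 480 @ 1266 MHz: %.2f TFLOPS\n",    /* ~5.83 -> the "5.8" figure      */
           alus * 2.0 * 1.266 / 1000.0);
    printf("RX 580 @ 1340 MHz: %.2f TFLOPS\n",    /* ~6.17 -> "up to 6.2 TFLOPs"    */
           alus * 2.0 * 1.340 / 1000.0);
    printf("MI6 5.7 TFLOPS needs ~%.0f MHz\n",    /* ~1237, below the 480's boost   */
           5.7e12 / (alus * 2.0) / 1e6);
    return 0;
}
```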
Let's wait and see, as unfortunately it seems AMD has missed some of the targets given before 2017 for Vega (HBM2 being the biggest indicator). But as I said in earlier posts, AMD is using a boost figure, and that does not necessarily mean guaranteed; and this is not a dig at AMD, as Nvidia via their 'illustrious' leader overpromises at times as well, or the devil is in the detail, such as with the initially reported Drive PX2 specs.
But there are two discussions and perspectives to be had: whether the ratings given in the pre-2017 presentation can be hit with the enterprise/professional MI series, along with the HBM2/bandwidth also given for Vega, and what the consumer cards end up being with custom designs, overclocking, modified voltages, or even underclocking.
Cheers
 
Remember the official rating with TDP from AMD: going from a 380 to a 580 gave 29% clock gains, as I laid out in an earlier post.
The TDP/voltage goes out the window with any of the custom designs, hence why I keep mentioning the context and importance of the Nano/MI25 and the enterprise/professional world, where this does not happen; the discussion is around the spec rating of the MI25, and we can also include the MI8 (Nano) in that discussion.

If Vega ends up with an official reference TDP rating close to 300W at 1300-1350MHz, custom designs will need to accommodate something higher than that, if that is even possible. Yeah, I appreciate this side of the debate is pure speculation, but I am just looking at trends, and AMD has in the recent past mentioned that their core speeds are worse relative to Nvidia's due to the wide design AMD uses (this came out in an interview over the last six months).

So are you suggesting, then, that 1500MHz would be for custom models only and not the official spec rating, and therefore not applicable to the MI25?
Because you are using 480 clocks that are not part of the standard reference TDP/clocks given by AMD (boost was 1266MHz with a 161W TDP); custom models were hitting 190-200W TDP/TBP and higher voltages (important in the context of real-world versus spec figures, and of the MI8 and MI25, which much of the performance discussion is based upon).
Cheers
A number of things.
Clock speed is dependent on logic design.
Clock speed is dependent on layout - a given design can be electrically implemented to minimize interference and allow higher clock speeds (which may lead to lower density and more processing steps). Different libraries are an aspect of this.
Clock speed is dependent on process. As far as I'm aware, we don't even necessarily know where Vega will be fabbed, much less the particulars of the process.
 