AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Apart from the re-selling factor, how different is this from Nvidia using GDDR5X, which only Micron produces?
There's only one provider of HBM2 for AMD cards so far and it's Hynix. There's not a lot of possible confusion from that area.
What's unique to 2.5D packaging is who eats the cost of logistics and of component faults before the parts reach board partners.
Bundling memory in the old days may have introduced some additional logistics and some level of warranty obligation for the rare bad module, which accounts for at least a small amount of extra cost beyond just providing a packaged GPU.
But if a board partner has a less than 100% success rate attaching those items to a PCB, it's not AMD's problem.

The Fury package has a significantly more extensive logistics train: many separate vendors are responsible for individual components, parts have to be shipped between them and AMD, and each step has a less than 100% success rate.
Each piece either reduces the fault rate (mature-process interposer) or places the onus on a partner (known-good stacks), but the integration steps themselves are less than 100%. As the manufacturer driving early implementation, AMD may have eaten a decent amount of the logistics cost of coordinating and shipping between widely separated manufacturing and integration points. Well-established steps like assembling a conventional package have very high yields, whereas back then the overall interposer assembly process only promised something like >95%, I think.
Going by the Hawaii vs Fiji estimate, AMD would be on the hook for $183 for each faulty module, at a significantly higher rate than the failure rate for plain packaging. That rate would preferably be low in absolute terms, but regular packaging yields are so high that any blip at each step is going to be measurably worse.
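As a rough illustration of why that matters (a minimal sketch; the ">95%" assembly yield and the $183-per-module figure come from the posts above, while the 99.9% plain-packaging yield is an assumed placeholder):

#include <stdio.h>

/* Back-of-the-envelope sketch: how a sub-100% assembly yield turns a fixed
 * per-module loss into extra cost per *good* package. All numbers are
 * illustrative, not AMD's actual figures. */
int main(void)
{
    const double module_cost    = 183.0;  /* estimated loss per scrapped HBM module (from the thread) */
    const double plain_yield    = 0.999;  /* assumed yield of conventional packaging */
    const double assembly_yield = 0.95;   /* ">95%" interposer assembly yield cited above */

    /* Expected scrap cost amortized over each good unit. */
    double plain_overhead    = module_cost * (1.0 - plain_yield)    / plain_yield;
    double assembly_overhead = module_cost * (1.0 - assembly_yield) / assembly_yield;

    printf("plain packaging overhead per good unit:     $%.2f\n", plain_overhead);
    printf("interposer assembly overhead per good unit: $%.2f\n", assembly_overhead);
    return 0;
}

Even with those charitable placeholder numbers the gap is roughly $0.18 vs. $9.63 per good unit, which is the "any blip at each step is measurably worse" point.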

One change with HBM2 was provisioning extra signals for redundancy, which seems consistent with integration losses being measurable enough.
The choice of going with 2 stacks, besides possibly making the interposer requirements lower and smaller, may also be a measure for reducing integration losses. Some phenomena like warping are influenced by interposer size and number of components being integrated.
The lead time for all of this on top of the fab+packaging route would presumably be longer as well, and the stakes are higher when it comes to binning and product mix since faults or mispredictions for any individual element get bound together in a single assembly.

This inflexibility seems to explain why AMD may be aiming to double down on scalability with Navi, and to deconstruct things further with chiplets and active interposers in its HPC proposal.
 
If it performs like an OC Fiji, then yeah, wtf happened... I guess the sure thing is it will perform better in heavy tessellation situations (based on the Polaris gains). But it has to perform a lot better in all situations, even clock for clock.
 
Point taken, but I don't think that calling a spade a spade equals name calling.

I don't care what you call it, if it's not graphics related don't post it. If you feel the need to waste your time and "call a spade a spade" on the internet, please feel free to use the PM feature of this forum. It's annoying to everyone else who has to waste their time scrolling past your crap.
 
If it performs like an OC Fiji, then yeah, wtf happened... I guess the sure thing is it will perform better in heavy tessellation situations (based on the Polaris gains). But it has to perform a lot better in all situations, even clock for clock.
Vega has two types of SC (scan converter). Maybe the new one was not enabled in that driver.

/**
 * BinningMode enum
 */

typedef enum BinningMode {
    BINNING_ALLOWED               = 0x00000000,
    FORCE_BINNING_ON              = 0x00000001,
    DISABLE_BINNING_USE_NEW_SC    = 0x00000002,
    DISABLE_BINNING_USE_LEGACY_SC = 0x00000003,
} BinningMode;
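For illustration only, here is how a tool or driver layer might map a user-facing setting onto those values; binning_mode_from_string is a hypothetical helper, not an actual AMD driver function (it assumes the BinningMode enum quoted above is in scope):

#include <string.h>

/* Hypothetical helper mapping a config string onto the BinningMode values
 * quoted above. Illustrative only; not part of any real AMD driver API. */
static BinningMode binning_mode_from_string(const char *s)
{
    if (strcmp(s, "force_on") == 0)
        return FORCE_BINNING_ON;              /* force the binning rasterizer on */
    if (strcmp(s, "off_new_sc") == 0)
        return DISABLE_BINNING_USE_NEW_SC;    /* binning off, new scan converter */
    if (strcmp(s, "off_legacy_sc") == 0)
        return DISABLE_BINNING_USE_LEGACY_SC; /* binning off, legacy scan converter */
    return BINNING_ALLOWED;                   /* default: leave the decision to the driver */
}

The point of quoting the enum is simply that the driver can run with binning disabled on either scan converter, so an early driver defaulting to the legacy path would be consistent with the post above.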
 
If it performs like an OC Fiji, then yeah, wtf happened... I guess the sure thing is it will perform better in heavy tessellation situations (based on the Polaris gains). But it has to perform a lot better in all situations, even clock for clock.
If the rumors are true, Vega clocks are going to be ~40% higher than Fury X. It's not a small increase. Also 4 GB -> 8 GB memory is a big deal.

Fury X is already pretty much tied with GTX 1070 in most games:
http://www.anandtech.com/bench/product/1720?vs=1731

There are obviously exceptions. However, 4 GB -> 8 GB is going to solve some of them, and the improved geometry pipelines from Polaris are going to solve others. Now add a ~40% faster clock rate. The result should already be roughly 40% faster than a GTX 1070, and this is treating it as simply a big, fat, highly clocked Polaris with 8 GB of HBM2.
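To spell out the naive clock-scaling arithmetic behind that estimate (ideal linear scaling, which real games will not fully reach; the 1526 MHz figure is the rumored clock discussed below):

#include <stdio.h>

/* Naive linear clock scaling behind "roughly 40% faster than GTX 1070":
 * if Fury X is roughly tied with the GTX 1070 at 1050 MHz, a Vega that
 * behaves like a Fury X at ~1.5 GHz scales in proportion to clock. */
int main(void)
{
    const double fury_x_clock = 1050.0;  /* MHz */
    const double vega_clock   = 1526.0;  /* MHz, rumored */

    double scaling = vega_clock / fury_x_clock;
    printf("ideal clock scaling over Fury X / GTX 1070: +%.0f%%\n",
           (scaling - 1.0) * 100.0);     /* ~+45% */
    return 0;
}

Bandwidth and other bottlenecks would pull the real gain below that ideal +45%, which is why "roughly 40%" is the working figure.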

Fury X was highly bottlenecked by the geometry pipeline. This can be clearly seen in the difference between 1080p and 4K scores. With the geometry pipeline improvements of Polaris and the further (announced) geometry pipeline improvements of Vega (2x throughput + better load balancing), the new GPU will be much better utilized. ROP caches in Vega sit under the L2 cache. This means fewer cache flushes = fewer stalls (very important for a big GPU with lots of CUs).

Even if the draw stream binning rasterizer were a total failure and ended up being disabled, I am pretty sure we are going to see performance close to GTX 1080... assuming of course that AMD hits the rumored clock rate. That's the only thing I am unsure about. All the technical improvements are certainly the correct ones to solve the Fury X bottlenecks.
 
If the rumors are true, Vega clocks are going to be ~40% higher than Fury X. It's not a small increase. Also 4 GB -> 8 GB memory is a big deal.
I think the ~1.5GHz clocks are already past being just rumors. The TFLOPs numbers for the officially announced MI25 card, together with the number of NCUs present in the Linux drivers, clearly point to clocks in that range.
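That derivation is simple enough to show directly (assuming the 64 NCUs x 64 ALUs configuration from the Linux driver leak and 2 FP32 ops per ALU per clock, i.e. one FMA):

#include <stdio.h>

/* Implied clock from the MI25's announced 12.5 TFLOPs FP32, assuming
 * 64 NCUs x 64 ALUs = 4096 ALUs and 2 FP32 ops (one FMA) per ALU per clock. */
int main(void)
{
    const double tflops_fp32 = 12.5;
    const double alus        = 64.0 * 64.0;  /* 4096 */
    const double ops_per_clk = 2.0;

    double clock_ghz = (tflops_fp32 * 1e12) / (alus * ops_per_clk) / 1e9;
    printf("implied clock: %.3f GHz\n", clock_ghz);  /* ~1.526 GHz */
    return 0;
}

Which is where the ~1.5GHz / 1526 MHz figures in the following posts come from.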
 
I think the ~1.5GHz clocks are already past being just rumors. The TFLOPs numbers for the officially announced MI25 card, together with the number of NCUs present in the Linux drivers, clearly point to clocks in that range.
Then there's nothing to worry about. A 45% higher clocked (1050 MHz -> 1526 MHz) Fury X with 8 GB of HBM2 would already be competitive with GTX 1080. Add all technical improvements that solve Fury X bottlenecks from Polaris and Vega...
 
Then there's nothing to worry about. A 45% higher clocked (1050 MHz -> 1526 MHz) Fury X with 8 GB of HBM2 would already be competitive with GTX 1080. Add all technical improvements that solve Fury X bottlenecks from Polaris and Vega...
The only thing to worry about is whether AMD succeeds in hitting these targets (clock speed and architecture improvements). If they fail on both fronts (say, -20% off the projected clock speed and the rasterizer improvements ending up disabled), it won't be pretty for a die just shy of 500mm² (I'm pretty sure getting only GP104 performance out of it isn't AMD's target). If they fully succeed in both areas, GP102 will get a worthy competitor.
 
The only thing to worry about is whether AMD succeeds in hitting these targets (clock speed and architecture improvements).

But AMD already did a soft-launch of the Radeon Instinct MI25 card with 12.5 TFLOPs FP32, and the MI25 name itself comes from the card doing 25 TFLOPs FP16.
You think AMD would launch these cards without reasonable confidence of reaching the 1.5GHz clocks?



Besides, AMD has been very clear about increasing clocks for Vega cards, compared to previous GCN architectures:

[AMD slide: Vega clock-speed targets compared to previous GCN architectures]


If Polaris is already close to 1.4GHz in the RX580, I'd say 1.5GHz is close to the minimum of what to expect from Vega.
 
3dcenter.org has summarized the proposed Vega architectural changes. Source and speculation links can be found in the article. Apologies for the Google translation below.
http://www.3dcenter.org/news/was-sind-die-bekannten-architektur-verbesserungen-von-amds-vega

Proposed architecture improvements in GCN5 (Vega):

1. Draw Stream Binning Rasterizer (source).
2. The rasterizer will now also support Conservative Rasterization.
3. The instruction buffers are increased again.
4. The shader units (ALUs) should support a (significantly) higher clock rate.
5. The shader units will support FP16 calculations at double the FP32 rate (source).
6. The shader units will support Int8 calculations at quadruple the FP32 rate (source).
7. The geometry engines will raise their theoretical throughput from 1 to 2.75 triangles each (possibly only in special cases), which means 11 triangles per clock rather than 4 as in Fiji (source); see the sketch after this list.
8. The ROPs are now clients of the L2 cache (source), which is also why AMD may support Rasterizer Ordered Views (this could increase the cache capacity requirements).
9. Since the rasterizer now stores tiles in the L2 cache and the ROPs are also L2 clients, the L2 capacity will certainly increase, presumably from 2 MB to 4 MB.
AMD has presumably done more that has not been mentioned yet:
10. This also includes a (partial?) tile-based renderer, as in Nvidia's Maxwell and Pascal chips.
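As a quick sanity check on items 5-7 (the 4 geometry engines, 4096 ALUs and ~1.5 GHz clock below are the commonly rumored figures, not confirmed specs):

#include <stdio.h>

/* Sanity check of the per-clock figures in the list above.
 * Assumed (rumored, not confirmed): 4 geometry engines, 4096 FP32 ALUs, ~1.5 GHz. */
int main(void)
{
    const double geometry_engines = 4.0;
    const double tris_per_engine  = 2.75;        /* item 7: up from 1 per engine in Fiji */
    printf("triangles per clock: %.0f (Fiji: %.0f)\n",
           geometry_engines * tris_per_engine, geometry_engines * 1.0);

    const double alus      = 4096.0;
    const double clock_ghz = 1.5;
    double fp32_tflops = alus * 2.0 * clock_ghz / 1000.0;   /* one FMA = 2 FLOPs */
    printf("FP32: %.1f TFLOPs, FP16 (2x, item 5): %.1f TFLOPs, Int8 (4x, item 6): %.1f TOPS\n",
           fp32_tflops, fp32_tflops * 2.0, fp32_tflops * 4.0);
    return 0;
}

That lines up with the 11 triangles per clock in item 7 and with the MI25's advertised 12.5/25 TFLOPs figures discussed above.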

In general, the architecture enhancements of GCN5 are considered quite potent in our forum - at least theoretically potent (and how much of that theory translates into practice can only be shown in practice). This is also based on the assumption that AMD apparently left enormous potential on the table with Fiji: Fiji's compute throughput exceeds the earlier Hawaii chip by 45% (Radeon R9 Fury X vs Radeon R9 390X), yet the achieved 22% performance gain (in the 4K performance index) is rather meager - and between Hawaii and Fiji there is also the architectural step from GCN2 to GCN3, which does not really show in the performance results. If AMD were "simply" able to solve, with GCN5, the apparent bottleneck that keeps the architecture from scaling to higher shader unit counts, then Vega 10's 4096 shader units would no longer look like "too few".

In this respect, the 15% performance gain from the architecture alone that was last mentioned here is considered a rather conservative estimate (higher clock rates not yet included); some expect more than that +15%. Certainly, to reach Nvidia's GP102 chip, a higher gain purely from the architecture would be better, since even factoring in the +40-50% clock rate gain (which will not translate 1:1 into performance, especially if Vega 10 comes out somewhat bandwidth-limited) it would be quite tight. To reach the GP102 chip from Fiji, 75-80% more performance is necessary - a tall order for a graphics chip with essentially the same basic hardware configuration, but given the large clock rate advantage not entirely impossible. AMD, however, will have to deliver on both points: (for AMD) unusually high clock rates and an architecture gain of 15% or (better) more.

That there is skepticism on this point, and that some users give the "architecture improvements" argument little or no weight, is understandable, however, and is largely AMD's own fault. In the past, AMD has advertised more or less every new graphics chip with "extreme" architecture improvements, even though the real gains at the same clock and the same shader count often had to be searched for with a magnifying glass. Especially for GCN3 (Tonga & Fiji) as well as GCN4 (Polaris), noticeable gains from the graphics chip architecture had been promised beforehand in rather grandiose announcements - and largely disappointed. From that position it is difficult to come back with the "architecture improvements" argument, since some users (understandably) tune out immediately. Based on current knowledge, GCN5 looks like by far the biggest leap within the GCN architecture - a notable performance gain purely from the improved architecture is therefore not only expected, but practically an obligation for AMD.
 
You think AMD would launch these cards without reasonable confidence of reaching the 1.5GHz clocks?
Confidence is one thing, actual products another. Can you buy an MI25 card at 1.5GHz? I don't think so. I know what AMD announced, and it indeed looks promising and could be a really big step forward (as I said: Vega10 could be a strong competitor to GP102). But one always needs to be wary of such promises. AMD has to back them up with solid facts and products.
 
But that number comes straight from the 4096 cores providing 25 TFLOPs.
Yeah, I know, but it seems unclear whether this leads you to believe it would be a 1500 MHz base or boost clock. I am also really curious what AMD has in mind for a product lineup with Vega. Will they have a 1070-like part? A 1080? A Titan Xp? I really hope AMD comes through with Vega; it seems they really worked hard on it.
 
Completely opposed to the "almighty prohibitive costs of HBM" idea that has been pulled out of the ass preached by some people here.
You're probably thinking about me. I appreciate the thought.

But I stand by it.

If the authors of that table can't even get the cost of a mass-produced PCB right ($15??? How about $5...), I don't see why we should trust their much more confidential HBM numbers either.
 
Confidence is one thing, actual products another. Can you buy an MI25 card at 1.5GHz? I don't think so. I know what AMD announced, and it indeed looks promising and could be a really big step forward (as I said: Vega10 could be a strong competitor to GP102). But one always needs to be wary of such promises. AMD has to back them up with solid facts and products.
Yeah, reality does not necessarily reflect targeted specs, and from AMD those figures have usually been boost clocks going back to Fiji.
If one looks at that Radeon Instinct slide with Vega, it also shows the MI8 (Nano) with its official TDP/TBP of 175W and 8.2 FP32 TFLOPs; the Fury X is 8.6 TFLOPs (a 50MHz higher boost clock, at 1050MHz) with a 275W official TDP.
In reality the Nano has pretty strictly controlled power management to remain close to the 175W TDP, which means the clocks vary from 890 to 940MHz (rarely) depending on the workload. Considering the slide is aimed more at the professional/enterprise market, this is what one can expect: they do not push overclocking or going 30% beyond the rated TDP, nor do they bother with undervolting.
The Fury X, by contrast, sustains its 1050MHz all day long.
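To make that comparison concrete (assuming Fiji's 4096 ALUs and 2 FLOPs per ALU per clock), the rated TFLOPs on the slide back out to these clocks:

#include <stdio.h>

/* What the rated TFLOPs imply versus the clocks the Nano actually holds.
 * 4096 ALUs and 2 FLOPs/ALU/clock are the usual Fiji figures. */
int main(void)
{
    const double alus = 4096.0, ops = 2.0;

    double nano_rated_mhz   = 8.2e12 / (alus * ops) / 1e6;   /* ~1000 MHz from 8.2 TFLOPs */
    double fury_x_rated_mhz = 8.6e12 / (alus * ops) / 1e6;   /* ~1050 MHz from 8.6 TFLOPs */

    printf("Nano rated clock:   %.0f MHz (observed: ~890-940 MHz under its 175W cap)\n",
           nano_rated_mhz);
    printf("Fury X rated clock: %.0f MHz (sustained in practice)\n", fury_x_rated_mhz);
    return 0;
}

So the Nano's rated number assumes a clock it rarely holds, while the Fury X's rated number matches what it sustains, which is the distinction being drawn here.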

For Vega, a 1.5GHz clock is closer to a 50% improvement, and tbh I find it rather difficult to believe they can take the GCN architecture and clock it as well as Nvidia did with Maxwell and then Pascal; AMD has mentioned in recent interviews that their wide design is part of the limitation on high clock rates, so it remains to be seen what Vega will do in the real world from an actual core clock perspective with their changes.
Going by official clock speeds, Nvidia does not hit 50% between generations either, and the clock frequency leap from the 380 to the 580 was around 29.5% (going by Videocardz data).
TBH I think AMD would have done well increasing core clock speeds by 30% to 35% on Vega, especially as the MI25 is described as passively cooled, which would give a figure around 1300-1350MHz.
Worth noting that they draw a distinction with the MI8 (Nano), which does not carry the passively cooled comment, and that makes sense.

Not long to find out either way for consumer Vega, but I think the targeted spec is a challenge for AMD as defined. Already the HBM2 bandwidth is less than hoped, being 1.6GHz effective, and recent leaked tests put it at 1.4GHz effective, the same speed at which Samsung launched its HBM2. Then there is the fact that there is no 8-Hi HBM2, and so no 16GB option, which was also originally targeted.
Cheers
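To put numbers on the bandwidth point (assuming the 2-stack, 1024-bit-per-stack HBM2 configuration discussed earlier in the thread):

#include <stdio.h>

/* Bandwidth at the HBM2 data rates mentioned above, assuming 2 stacks of
 * 1024-bit HBM2 (the configuration discussed earlier; not a confirmed spec). */
int main(void)
{
    const double bus_bits = 2.0 * 1024.0;
    const double rates[]  = { 2.0,    /* HBM2 spec maximum, Gbps per pin  */
                              1.6,    /* effective rate mentioned above   */
                              1.4 };  /* rate from the leaked tests       */

    for (int i = 0; i < 3; i++) {
        double gbytes_per_s = bus_bits * rates[i] / 8.0;
        printf("%.1f Gbps per pin -> %.0f GB/s\n", rates[i], gbytes_per_s);
    }
    return 0;
}

So the drop from 1.6 to 1.4 Gbps would mean roughly 410 GB/s down to about 358 GB/s of total bandwidth.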
 
TDP is far more nebulous than TFLOPS figures. AMD would have more egg on their faces if the MI25 doesn't come close to the 25 TFLOPS its name suggests, across the various configurations it's been announced in, than they did over their Polaris power efficiency numbers.

Even if they are boost numbers that the card would rarely hit, they'd have to be stable enough that AMD is willing to put them on professional cards, so desktop cards should easily hit them as well.

There's a rumor going around that AMD is launching it on 9th May and that the card will have a 1.6GHz+ boost frequency. That'd be a pretty quiet launch considering their "make some noise" advertising, and even at those boost figures they wouldn't get past custom GTX 1080 cards when you take into account the confirmed Time Spy score of a Vega variant. I don't think there's going to be some magical driver development that would increase its numbers compared to Fiji at the same clock; those slides were marketing, and as nebulous in their performance numbers as TDP is.
 