AMD Vega Hardware Reviews

Sorry if it wasn't obvious, but the question was rhetorical. It was my rebuttal to your assessment that there isn't much that's different between Vega and GP102. Vega has several of those things, including FP64 at 1/16 vs 1/32 (still incipient performance, but it's there and occupies die space) and a very large L2 cache (you mentioned HBM2 before).
GCN's FP64 rate is tunable, but per their description of the overall architecture it cannot go as low as Nvidia can.
1/16 is the disclosed floor for GCN. Since GCN runs FP64 through common hardware, the area cost of cycling through the existing units 8x slower doesn't point to a large loss. We're debating the two lowest-effort tiers either vendor can manage, and it's on AMD's architecture if even that is still significant.
 
Clearly Vega has a bigger FLOPS count than GP102, so any application that depends purely on FLOP count is going to favor Vega, no doubt about that. Cryptography is the same (for obvious reasons), no surprises there. However, in cases with mixed workloads it's not the same: in the very review you quoted, Vega trails the Titan Xp by a big margin in 3ds Max, AutoCAD, and the whole SPECviewperf/SPECapc lineup of tests. It's also worth pointing out that Vega is blasted beyond its optimal clock/efficiency curve, in other words pushed beyond its limits, just to compete with GP104 in gaming. If Pascal were put under similar conditions, it would pull further ahead.

The difference in FLOPS between the two chips is fairly small, at around 11%, while several compute benchmarks show a gap larger than that. That means Vega is able to use its compute resources more efficiently than GP102 in those cases.
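For reference, here's a back-of-the-envelope sketch of how those peak numbers fall out. The ALU counts are the published ones, but the clocks below are ballpark boost figures I'm assuming for illustration, so the exact percentage gap depends on which SKU and clock you plug in:

```python
# Back-of-the-envelope peak throughput: ALUs x 2 ops/clock (FMA) x clock.
# Clocks are assumed, rough boost figures, purely illustrative.
def peak_tflops(alus, clock_ghz, ops_per_clock=2):
    return alus * ops_per_clock * clock_ghz / 1000.0

vega64    = peak_tflops(4096, 1.55)   # ~12.7 TFLOPS FP32
titan_xp  = peak_tflops(3840, 1.58)   # ~12.1 TFLOPS FP32
gtx1080ti = peak_tflops(3584, 1.58)   # ~11.3 TFLOPS FP32

# FP64 at the rates discussed above: 1/16 on Vega, 1/32 on GP102.
print(f"Vega 64:  FP32 ~{vega64:.1f} TFLOPS, FP64 ~{vega64/16:.2f} TFLOPS")
print(f"Titan Xp: FP32 ~{titan_xp:.1f} TFLOPS, FP64 ~{titan_xp/32:.2f} TFLOPS")
print(f"1080 Ti:  FP32 ~{gtx1080ti:.1f} TFLOPS, FP64 ~{gtx1080ti/32:.2f} TFLOPS")
```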

Concerning the CAD and SPECapc numbers, that review is using Vega 64 drivers. Vega FE drivers do much better as you can see below:
http://www.tomshardware.com/reviews/amd-radeon-vega-frontier-edition-16gb,5128-6.html
 
Question: how does Vega compare to the Nvidia GeForce GTX 1080 Ti and the Nvidia Titan Xp?
DON WOLIGROSKI (AMD): It looks really nice.
link

Since when is "really nice" indicative of anything specific? Is Vega 64 priced the same as the GTX 1080 Ti? No, so you could argue that, for its cost, Vega 64 compares "really nice" with the GTX 1080 Ti. I'm sorry, but you really seem to be grasping at straws here.
 
So you'd have preferred multiple, more specialized dies?

Well, multiple, more specialized dies would have been the way to go, as nVIDIA has shown since Maxwell (which did not even have a compute part). But AMD probably does not have the resources to do that, so, as we say in Portugal, "if you don't have a dog, hunt with a cat". For me, not trying to compete in compute would have been out of the question, though.
 
Vega has several of those things, including FP64 at 1/16 vs 1/32 (still incipient performance, but it's there and occupies die space) and a very large L2 cache (you mentioned HBM2 before).
We've already talked about the area impact of FP64 in Nvidia's architecture. It's very substantial. And thus it makes sense to remove it for its gaming GPUs.

The impact these have on Vega graphics performance is unknown but is certainly not zero. They occupy space on the die that could have been used for things that benefited graphics (e.g. more FP32 units).
That's beside the point. The question is not how much AMD could have added to VEGA. The question is why it performs so poorly on gaming workloads with the resources that it already has.

That doesn't make sense. If there were no performance difference between graphics- and compute-oriented parts, then it would be more profitable to have a single chip covering both markets (less R&D), as it was before Pascal arrived with GP100 and GP102.
That's only true if the reduction in R&D outweighs the reduction in production cost. In the case of Nvidia, that's definitely not the case. So for them it makes total sense to have separate compute and gaming parts from a pure profit point of view, even if the compute parts have zero impact on gaming performance.

I don't see that. Vega 64 gets 46% more FP32 performance than Fury X. The performance increase is exactly in line with this.
Ok. Fair enough.
 
There are? Where? What did I miss?
You can't be serious if you're talking about this.
AMD included wafer shots as background in its more recent presentations, which the other Vega thread has various versions of.
In terms of ballpark estimation of the area consumption of its server features, even the marketing rendition doesn't seem too bad. The area colored in for the graphics/CU region proper seems accurate enough in terms of showing its dominant share. The HBM interface area is accurate enough. The area locked in by miscellaneous hardware and the PCIe interface appears to be consistent with other chips' PCIe blocks.

That leaves some of the blank area in two corners, a fraction of half of the left edge perimeter, and the strip of space between the GPU proper and the HBM blocks. The ISA document and other descriptions point to the areas dedicated to core compute/graphics functionality as being kept within the L2, which limits the impact of the HBCC and fabric on them.
 
AMD included wafer shots as background in its more recent presentations, which the other Vega thread has various versions of.
Where? Honest question because I just googled and googled and I can't find anything remotely usable except for that die shot with some purple and green rectangles shopped over it, which AFAICS doesn't tell anyone how big the units are beneath said rectangles.

So you'd have preferred multiple, more specialized dies?
nVidia did prefer that, and they're the ones making the most money out of GPUs, ergo...

The existence of both GP100 and GP102 is proof that there's a clear segmentation between a compute and a graphics-oriented high-end solution. They have the same number of FP32 ALUs (3840), yet GP100 is ~30% larger and has ~30% more transistors, has to assume lower clocks (1480 MHz) and discard 4 SMs for redundancy even in the highest-powered mezzanine version at 300 W TDP, whereas the GP102-based Titan Xp has all its SMs active at significantly higher clocks (>1700 MHz average) in a 250 W power envelope.

If a top-end gaming/rendering chip was equally efficient for compute, GP100 wouldn't need to exist. If a top-end compute chip was equally efficient for gaming/rendering, GP102 wouldn't need to exist (I'm pretty sure a GTX1080 Ti at $800 would still be very profitable with e.g. 3 HBM2 stacks like the 12GB version of GP100, and it'd get the same 540GB/s bandwidth as a Titan Xp).
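As a rough sanity check on that bandwidth figure, taking GP100's published 1.4 Gb/s-per-pin HBM2 and Titan Xp's 11.4 Gb/s GDDR5X as the inputs (treat the results as approximate):

```python
# Memory bandwidth = bus width (bits) x per-pin data rate (Gb/s) / 8.
def bandwidth_gb_s(bus_width_bits, pin_rate_gbps):
    return bus_width_bits * pin_rate_gbps / 8

hbm2_3_stacks = bandwidth_gb_s(3 * 1024, 1.4)   # ~538 GB/s
titan_xp      = bandwidth_gb_s(384, 11.4)       # ~547 GB/s
print(f"3x HBM2: ~{hbm2_3_stacks:.0f} GB/s, Titan Xp GDDR5X: ~{titan_xp:.0f} GB/s")
```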

V100 just takes this to a new level by including the tensor units that most probably won't see the light of day in gaming/rendering GPUs.


All this might change with multi-die GPUs, which would completely change the economics of GPU design (let's just include everything so we don't have to produce more than one chip), but for the moment the highly segmented (and therefore more focused) offerings from Nvidia are making them a lot of money.
 
Where? Honest question because I just googled and googled and I can't find anything remotely usable except for that die shot with some purple and green rectangles shopped over it, which AFAICS doesn't tell anyone how big the units are beneath said rectangles.

I'm speaking of the wafer shot in the Vega rumors thread.
https://forum.beyond3d.com/posts/1994314/

The marketing picture from AMD isn't that far off in terms of the overall outlines of the core/uncore/IO areas of the GPU.
The way AMD portrayed the sub-blocks isn't particularly great, but that goes to the question of which areas contribute to the server-oriented features. AMD's choices in coloring the marketing picture generally cover areas that are heavily graphics-focused or have significant dual use. The remaining strips of hardware are incremental increases to the area, and elements like the HBCC and fabric sections would, in a non-server part, still have some stripped-down version of themselves present.

The server/enterprise feature blocks don't seem that bad of an additional cost, and the compactness of HBM helps counterbalance them. The size of the GDDR5/GDDR5X interfaces in other AMD or Nvidia GPUs shows there are area penalties they've managed to deal with that Vega avoids.
 
The question is why it performs so poorly on gaming workloads with the resources that it already has.

Then that's a question you should have been asking of all AMD chips since the GCN family started (not even going back to VLIW, since that is already quite distant architecture-wise). They all suffer from the same symptoms, not only Vega.
 
The existence of both GP100 and GP102 is proof that there's a clear segmentation between a compute and a graphics-oriented high-end solution.
Yes.

If a top-end gaming/rendering chip was equally efficient for compute, GP100 wouldn't need to exist.
We're not talking perf/W efficiency here. We're talking architectural efficiency.

Do you have any reason at all to believe that the compute specific extra features of GP100 have a negative impact on its graphics performance?

Is there a negative to having larger register files? To larger cache? To having NVLINKs?

Because that's really what this is about: people claiming that VEGA's gaming performance is lackluster because it's focusing on compute.

If a top-end compute chip was equally efficient for gaming/rendering, GP102 wouldn't need to exist (I'm pretty sure a GTX1080 Ti at $800 would still be very profitable with e.g. 3 HBM2 stacks like the 12GB version of GP100, and it'd get the same 540GB/s bandwidth as a Titan Xp).
No, it's irrelevant whether or not that is still profitable.

The real question is: is it MORE profitable than having a gaming only version with a die size that's 150mm2 smaller and doesn't use HBM?

Given the high volumes of 1080 Ti, I think that answer is a resounding yes.

For Nvidia, it makes total sense from a pure financial point of view to have 2 different versions. They're not looking for profit, they're looking for maximal profit.
 
I didn't write a VEGA white paper that claims that new features will improvise graphics everywhere.

AMD did.

Did the white paper claim that? Really? Let's see...

The introduction to the White Paper states the following:
"The “Vega” architecture is intended to meet today’s needs by embracing several principles: flexible operation, support for large data sets, improved power eficiency, and extremely scalable performance. “Vega” introduces a host of innovative features in pursuit of this vision, which we’ll describe in the following pages. This new architecture promises to revolutionize the way GPUs are used in both established and emerging markets by offering developers new levels of control, flexibility, and scalability." Nowhere is mentioned in the introduction that Vega features are targeted at "brute force" performance.

Regarding the HBCC, it states that "The availability of a higher-capacity pool of hardware-managed storage can help enable game developers to create virtual worlds with higher detail, more realistic animations, and more complex lighting effects without having to worry about exceeding traditional GPU memory capacity limitations." This is not exactly about performance but about raising capacity limitations. While it necessarily has an effect on performance, that is not something that can be seen immediately, since games available today are designed around current limitations.

Concerning the "Next Generation Geometry Engine", the White Paper does say "To meet the needs of both professional graphics and gaming applications, the geometry engines in “Vega” have been tuned for higher polygon throughput by adding new fast paths through the hardware and by avoiding unnecessary processing." Now, it seems we can see the impacts of these changes on CAD and other 3D workstation workloads (where it is competitive with GP102), not just in games. This is a big red flag for me towards the stance that something in Vega that is good for professional graphics/compute is not for gaming. Otherwise why would the gains on geometry be seen in one case and not the other?

Then we have Rapid Packed Math, which, as we all know, can only be relevant if games start making extensive use of FP16. Again, this is a feature whose impact on performance cannot be seen in today's games. AMD recognises this in the paper when it says a) "The programmable compute units at the heart of “Vega” GPUs have been designed to address this changing landscape with the addition of a feature called Rapid Packed Math." and b) "For applications that can leverage this capability, Rapid Packed Math can provide a substantial improvement in compute throughput and energy efficiency."
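As an aside, the core idea behind packed math is easy to sketch: two FP16 values share one 32-bit register lane, so a packed instruction does two operations per lane per clock. The NumPy snippet below only illustrates the packing concept; it is not AMD's actual ISA or intrinsics:

```python
import numpy as np

# Two FP16 operands fit in one 32-bit register lane, so a packed
# instruction can perform two FP16 operations per lane per clock.
a = np.array([1.5, 2.25], dtype=np.float16)
b = np.array([0.5, 4.00], dtype=np.float16)

def pack_fp16_pair(v):
    # Reinterpret the two FP16 values as raw bits and pack them
    # into a single 32-bit word (element 0 low, element 1 high).
    bits = v.view(np.uint16).astype(np.uint32)
    return int(bits[1] << 16) | int(bits[0])

print(hex(pack_fp16_pair(a)), hex(pack_fp16_pair(b)))  # one 32-bit word each
print(a + b)  # a packed add yields both results at once: [2.0, 6.25]
```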

There is also this: "In addition to Rapid Packed Math, the NCU introduces a variety of new 32-bit integer operations that can improve performance and efficiency in specific scenarios. These include a set of eight instructions to accelerate memory address generation and hashing functions (commonly used in cryptographic processing and cryptocurrency mining), as well as new ADD/SUB instructions designed to minimize register usage." This explains why Vega is so strong at cryptography. There are also additional tidbits about improvements focused on video and image processing algorithms for AI.
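To give a feel for the class of work those instructions target, here is a minimal 32-bit FNV-1a hash loop. It is purely illustrative and has nothing to do with how Vega exposes the new instructions, but its inner loop is exactly the kind of 32-bit integer XOR/multiply/mask churn being described:

```python
# Minimal 32-bit FNV-1a hash: the inner loop is nothing but 32-bit
# integer XORs, multiplies and masks (illustrative only).
def fnv1a_32(data: bytes) -> int:
    h = 0x811C9DC5                         # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, kept to 32 bits
    return h

print(hex(fnv1a_32(b"vega")))
```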

After that it goes into the DSBR technology, which is arguably one of the features that could have a bigger impact on graphics after clock speed, and for all we know it's not working in current drivers. Still, the impact as stated by AMD themselves is not huge: "In the case of “Vega” 10, we observed up to 10% higher frame rates". AMD also includes caveats like "Even larger performance improvements are possible when developers submit geometry in a way that maintains screen space locality or in cases where many large overlapping polygons need to be rendered."

Clock speed is next, and we all know the huge impact it obviously has. Still, it's part of the architecture and not just a "natural metric" as some make it seem: "One of the key goals for the “Vega” architecture was achieving higher operating clock speeds than any prior Radeon™ GPU. Put simply, this effort required the design teams to close on higher frequency targets. The simplicity of that statement belies the scope of the task, though. Meeting “Vega’s” substantially tighter timing targets required some level of design effort for virtually every portion of the chip." The white paper goes into a lot of detail on this.

Next is power efficiency, specifically reducing idle power consumption as well as power during video transcoding. It barely mentions power efficiency while performing graphics tasks.
- "Meanwhile, an improved “deep sleep” state allows “Vega” 10 to scale down its clock speeds dramatically at idle in order to achieve substantially lower power consumption."
- "In “Vega” 10, this fabric is clocked separately from the graphics core. As a result, the GPU can maintain high clock speeds in the Infinity Fabric domain in order to facilitate fast DMA data transfers in workloads that feature little to no graphics activity, such as video transcoding."

The white paper concludes with information about "Display and Multimedia Improvements" which is all about Free Sync, HDR, etc, with zero regard to graphics improvements.

So I ask you now, where did you see the White Paper claiming "that new features will improvise graphics everywhere"? Because, AGAIN, I cannot see any of it, just as I could not see your earlier claim that VEGA has worse perf/FLOP than AMD's previous chips. The White Paper is actually quite cautious in what it states, with plenty of context and caveats given. Like I said a few posts ago, it seems that you (and not only you) made up your mind about VEGA even before looking at the data available.
 
This typo sounds sadly appropriate for this whole Vega business at this point.
We're approaching mid-September, and still no 3rd-party boards have even announced sale dates, much less shown up in stores, when in June it was claimed we'd have them in early September.

Dunno what the eff is up with that, because AMD is saying ziplock about it...
 
Since when is "really nice" indicative of anything specific? Is Vega 64 priced the same as the GTX 1080 Ti? No, so you could argue that, for its cost, Vega 64 compares "really nice" with the GTX 1080 Ti. I'm sorry, but you really seem to be grasping at straws here.

If you are still after objective analysis you ought to leave product pricing out.
 
Did the white paper claim that? Really? Let's see...
This is what I get for not first rereading the white paper before making claims about it. :) You make good points. In the white paper, AMD significantly backpedals on all their pre-release marketing hype.

So allow me to introduce AMD's January slides to the mix. (Ignore that it's WCCFTech, I'm using the slides only.)

They're talking about render back-ends that are now clients of the L2 cache, which improves applications that use deferred shading. It is my understanding that there are plenty of games that do this. Do we have any indication that it actually helped? Are there deferred shading applications at all that are not games? (I honestly have no clue.)

They have a slide that says "NCU is optimized for higher clock speed and higher IPC". Higher clock speeds, yes, we can see that. Higher IPC? Since we agree that Vega and Fury perform about the same when corrected for clock: not so much. Would you agree that improved IPC in the shader is something that should benefit all kinds of workloads?

They claimed over 2x peak throughput per clock with their programmable geometry pipeline. We can now say that this was essentially an empty statement. They can point to one corner case where this happens and still say it's true. But was it unreasonable, based on their messaging, to think that Vega would finally fix some of AMD's (alleged) long-standing geometry issues in games? I don't think so.

AMD tried its best to paint a picture of improved Vega efficiency compared to previous generations, and not just for narrow applications.

The white paper was only published after the official Vega reveal. If you consider all the marketing actions of AMD in the year/months before release, do you really think it was unreasonable to think that they'd have improved their architecture efficiency? And that tons of people are non-plussed about Vega because it was all just BS marketing and future promises?

I now know that we simply have to assume that, after all these years, AMD still has no clue about how to make an architecture that's efficient at extracting performance out of its compute units. But let's not apologize for that by claiming that it's one or the other: Nvidia has shown very well that it doesn't have to be the case. Their GPUs perform excellently at games and at pure compute applications.

Now, it seems we can see the impacts of these changes on CAD and other 3D workstation workloads (where it is competitive with GP102), not just in games. This is a big red flag for me towards the stance that something in Vega that is good for professional graphics/compute is not for gaming. Otherwise why would the gains on geometry be seen in one case and not the other?
I may be suffering from silo vision, but how is this different? The behavior above can be simply explained by games having different bottlenecks than professional applications. It's well known that professional applications typically have a shitload of geometry and not a lot of texturing. Games are just the opposite. We already know that Vega is a per-clock regression when it comes to texturing and some ROP operations compared to Fury.
 
If you are still after objective analysis you ought to leave product pricing out.

I was not the one who said it looked "really nice". It was just an example of how it can mean anything or nothing. You cannot conclude anything at all from such a vague answer, admit it. At its limit it's like asking a woman "what's wrong?" and getting "nothing" in return :D
 
This is what I get for not first rereading the white paper before making claims about it. :) You make good points. In the white paper, AMD significantly backpedals on all their pre-release marketing hype.

So allow me to introduce AMD's January slides to the mix. (Ignore that it's WCCFTech, I'm using the slides only.)

They're talking about render back-ends that are now clients of the L2 cache, which improves applications that use deferred shading. It is my understanding that there are plenty of games that do this. Do we have any indication that it actually helped? Are there deferred shading applications at all that are not games? (I honestly have no clue.)

Could this be related to/impacted by the disabled DSBR? Honest question, I have no idea, but the DSBR is related to rendering, so...

They have a slide that says "NCU is optimized for higher clock speed and higher IPC". Higher clock speeds, yes, we can see that. Higher IPC? Since we agree that Vega and Fury perform about the same when corrected for clock: not so much. Would you agree that improved IPC in the shader is something that should benefit all kinds of workloads?

Did you notice that the information about IPC came immediately coupled with FP16, RPM and INT8? I certainly did, and so did AnandTech:

"That said, I do think it’s important not to read too much into this on the last point, especially as AMD has drawn this slide. It’s fairly muddled whether “higher IPC” means a general increase in IPC, or if AMD is counting their packed math formats as the aforementioned IPC gain."

I was not expecting higher IPC in all situations. Why? Because if you look at the slide, the difference is having two ops instead of one, which for me was a blatant clue that it was referring to RPM. The message was poorly communicated, but the image was very revealing.

They claimed over 2x peak throughput per clock with their programmable geometry pipeline. We can now say that this was essentially an empty statement. They can point to one corner case where this happens and still say it's true. But was it unreasonable, based on their messaging, to think that Vega would finally fix some of AMD's (alleged) long-standing geometry issues in games? I don't think so.

Given that Vega is still pretty much GCN, plays by its rules and limitations (e.g. a maximum of 4 ACEs) and still has 4096 "cores", yes, your expectations were too high. Especially when the same number of cores is now responsible for more geometry work than before. Didn't you think that would have a drawback? That the same number of units as Fiji, now spread out over more workloads, would magically find a way to do more with the same, even if running at higher speeds? If you didn't think a drawback would exist, yes, that's silo vision, because you were thinking about the change in geometry without thinking about the impacts elsewhere.

I'm starting to think that's the reason for our disagreement. I had my expectations way more in check. Its performance is pretty much where I expected it to be.

AMD tried its best to paint a picture of improved Vega efficiency compared to previous generations, and not just for narrow applications.

That's the job of marketeers. I learned a long while ago the art of smelling bullshit and thinking critically about what they present, not taking anything they say at face value. As you can see from what I wrote above, my expectations were balanced by this attitude.

The white paper was only published after the official Vega reveal. If you consider all the marketing actions of AMD in the year/months before release, do you really think it was unreasonable to think that they'd have improved their architecture efficiency? And that tons of people are non-plussed about Vega because it was all just BS marketing and future promises?

Honestly, the White Paper is not that different from the slides, with the exception of not mentioning IPC anywhere. I guess they realised the stupidity of calling RPM an increase in IPC, which is really what they meant by it. As for whether those expectations were reasonable, refer to my answer above about expectations.

I now know that we simply have to assume that, after all these years, AMD still has no clue about how to make an architecture that's efficient at extracting performance out of its compute units. But let's not apologize for that by claiming that it's one or the other: Nvidia has shown very well that it doesn't have to be the case. Their GPUs perform excellently at games and at pure compute applications.

Like I said above, it's still GCN. You don't change how an architecture performs in every single situation overnight with just small tweaks and touches here and there, and that's what Vega is: there was no big overhaul, with the exception of RPM and INT8, which are more about AI than gaming. Especially when the chip still has the same number of units. With the exception of consumer Pascal, NVIDIA always changed the SM layouts and unit ratios that affect load balancing, from Fermi to Kepler and from Kepler to Maxwell, looking for optimal performance. I had thought that by now, with 3 iterations of GCN, people would have realised that GCN does not offer the same kind of flexibility NVIDIA's architectures have.

I may be suffering from silo vision, but how is this different? The behavior above can be simply explained by games having different bottlenecks than professional applications. It's well known that professional applications typically have a shitload of geometry and not a lot of texturing. Games are just the opposite. We already know that Vega is a per-clock regression when it comes to texturing and some ROP operations compared to Fury.

Refer to my point about the changes in geometry using more compute resources having a knock-on effect on other functions that also use them.
 