AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

I tend to go with what Ryan at PCPer said with regard to Vega: in theory the best case will be 8 polygons/clock if wrapped by the driver, and the 2.7x figure is with API/coding.
Earlier in his article before mentioning those figures he says:
And we know Wasson and Raja were doing interviews and providing further information after the preview, along with Mike Mantor.
Cheers
And where do the 8 polygons/clock come from, when the slide says 11? I'm not aware that anybody from AMD mentioned 8 prims/clock. Every tech site also just came up with their own ideas (if they didn't just show the slides). AMD actually provided very little information about what is really going on. And I'm pretty sure quite a few didn't get the provided information really straight, even after talking with guys like Mike Mantor.
As said before, when Mantor says that "with the right knowledge you can discard game based primitives at an incredible rate", he isn't talking about the "traditional" geometry pipeline. He is talking about the software culling approach with a primitive shader, which should be able to be even faster, given the size of the shader array (it should actually be able to be in the same order of magnitude as the discard speed for fragments/pixels [potentially several tens per clock as peak]). Therefore, these statements give no hint of how fast Vega will be with existing software (i.e. without culling in a primitive shader).
Even this shows a huge improvement for Polaris though.
No it doesn't. Just consider the ~20-25% higher clockspeed of the RX480 when comparing it with the R9-290 or FuryX and it doesn't show an effect at all in this test.
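To make the clock argument concrete, here is a minimal sketch of normalising a benchmark result by clock speed. All scores and clocks below are hypothetical placeholders, not measured values; only the roughly 20-25% clock gap comes from the post above.

```python
def per_clock(score, clock_mhz):
    """Benchmark score divided by core clock: a crude per-clock throughput figure."""
    return score / clock_mhz


def clock_normalised_gain(new_score, new_clock_mhz, old_score, old_clock_mhz):
    """Ratio of per-clock throughput, new card vs. old card.

    A value of about 1.0 means the higher score is explained by clock speed alone.
    """
    return per_clock(new_score, new_clock_mhz) / per_clock(old_score, old_clock_mhz)


# Hypothetical numbers: the newer card scores 22% higher but also clocks 22% higher.
old_score, old_clock = 100.0, 1000.0
new_score, new_clock = 122.0, 1220.0

gain = clock_normalised_gain(new_score, new_clock, old_score, old_clock)
print(f"per-clock gain: {gain:.2f}x")
```

If the normalised gain stays near 1.0, the test shows no per-clock improvement, which is the point being made about the RX480 result.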
AMD always shows best-case numbers, and they could be talking about tessellated, smaller-than-pixel triangles in this case too. They didn't specify that; all they stated was that they were comparing to Fiji, with no specifics on software/app etc. They actually gave more information on Polaris in their white paper.
This whitepaper was cited here already. It really only makes a case for a pretty specific scenario; it didn't claim to increase the geometry throughput in a more fundamental way, as the Vega slides do.
Sounds like shaders are definitely involved in the process to me. How much and how far is still up to debate.
They are only involved if someone uses the new possibilities with the primitive shader. Otherwise, it shouldn't play a role. The mentioned limit of 11 prims/clock should not apply for a shader based solution (at least not as peak, but if you have a huge amount of unique geometry with some vertex attributes, you run into memory bandwidth constraints ;)).
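A back-of-the-envelope sketch of the memory bandwidth remark: if every triangle brings its own unique vertices, the vertex fetch traffic grows with the primitive rate. The clock speed and the bytes per vertex below are assumptions chosen purely for illustration; only the 11 prims/clock figure comes from the slides.

```python
def vertex_fetch_gb_per_s(prims_per_clock, clock_hz, unique_verts_per_prim, bytes_per_vertex):
    """Vertex data that must be read per second, in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_second = prims_per_clock * clock_hz * unique_verts_per_prim * bytes_per_vertex
    return bytes_per_second / 1e9


# Assumed values (not from AMD): 1.5 GHz core clock, 3 unique vertices per
# triangle (no vertex reuse at all), 32 bytes of attributes per vertex.
demand = vertex_fetch_gb_per_s(
    prims_per_clock=11,
    clock_hz=1.5e9,
    unique_verts_per_prim=3,
    bytes_per_vertex=32,
)
print(f"vertex fetch demand: {demand:.0f} GB/s")
```

With these assumed inputs the demand lands well above what a single graphics card's memory system of this era delivers, so a peak rate like this would only be sustainable with heavy vertex reuse or very lean vertex data.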
I found some cases where Polaris has been shown to achieve performance equal to or above Fiji in some titles; Forza Horizon 3 and Fallout 4 come to mind.
Equal performance isn't 2x the performance (especially considering the clock speed advantage of the RX480). :LOL:
 
You are deciding to say Ryan at PCPer is being deliberately misleading and exaggerating regarding his section on the polygons/clock and the Primitive Shader.
I did the quote to point out Ryan was told directly; have you seen any other article using the wording as part of being informed in the preview "with the right knowledge"?
Anyway, other articles mention up to 2.75x (giving 11 polygons/clock) and some also mention that it requires coding/API support, so the theory of a 2x figure when wrapped fits, and I cannot see how Ryan would make that up.
Earlier he was quoted specifically regarding the Primitive Shader and wrapped in the driver performance, this was by Razor in response to my post where I mentioned reading about it having requirements in a few places.
I know of journalists that had direct conversations with Raja/Mike/Scott pertaining to CES/Preview.
 
You are deciding to say Ryan at PCPer is being deliberately misleading and exaggerating then, and never spoke to AMD s
BTW we know it is not a traditional geometry pipeline, not sure why that is in the discussion now.
One can already cull triangles in shaders before they hit primitive assembly "to achieve incredible rate", and Gipsel has been making this point since the very beginning. Frostbite even had a presentation that was all about culling at GDC 2016. This does not clash with the fact that AMD claims their four geometry engines can handle up to 11 primitives per clock, which is hardware fixed-function.

Nothing is wrong with the words of PCPer or TechReport — neither of them claimed what is being inferred.
 
I know, but this was a continuation of what Ryan at PCPer said earlier and was quoted on, specifically Razor's response to one of my posts.
I did not expect those to be forgotten already, anyway I edited my post to provide more clarity.
To quote again the aspect on the Primitive Shader by Ryan @ PCPer.
This primitive shader type could be implemented by developers by simply wrapping current vertex shader code that would speed up throughput (to that 2x rate) through recognition of the Vega 10 driver packages. Another way this could be utilized is with extensions to current APIs (Vulkan seems like an obvious choice) and the hope is that this kind of shader will be adopted and implemented officially by upcoming API revisions including the next DirectX. AMD views the primitive shader as the natural progression of the geometry engine and the end of standard vertex and geometry shaders. In the end, that will be the complication with this new feature (as well as others)

The 1st part I quoted just recently was to show that he was informed directly "with the right knowledge".
 

CSI PC said:
I did the quote to point out Ryan was told directly
That section contradicts what AMD claims at the very beginning: up to 2x versus over 2x. Moreover, "with the right knowledge" in that context was clearly referring to the use of the primitive shader, not to Ryan being told "with the right knowledge".

Edit: The "directness", on the other hand, is something you inferred. "told it to me" is a very generic context. I could very well sit in the venue's corner, and in this context AMD was still "telling me stuff".

This new shader combines the functions of vertex and geometry shader and, as AMD told it to me, “with the right knowledge” you can discard game based primitives at an incredible rate.
Punctuation matters.
 
Just on the point of the Primitive Shader: it is what enables performance of up to 2.75x, i.e. the 11 polygons/clock, but importantly various sources mention that it needs API coding/an SDK to achieve that.
That is how you get over 2x in the slides, while up to 2x is when wrapped by the driver (going by Ryan's article, from my POV). The performance gains are all theory anyway, and it will be interesting to see real results from the wrapped solution in a game.

And fine, even better: if you accept that "with the right knowledge" refers to the primitive shader, you also need to consider the "as AMD told it to me" that goes with it in the same sentence. His point has validation, and he is the only one to mention the capability of being wrapped and the 2x figure.
 
Let's put an end to this. It's obvious that only with primitive shaders can AMD attain triangle culling at the levels they are talking about, period. There is no question about it: two different AMD employees have stated this to two or more websites, so why is anyone questioning that? Ask AMD why they are saying what they are saying. We don't know how many geometry engines Vega has, nor do we know anything else about it, but looking at Doom's performance, is anyone underwhelmed? I sure am. Shouldn't Doom under Vulkan be a game where AMD cards have 15% more performance than equivalent nV cards? So going by that......
 
These do not prove architectural improvements since we have higher chip clock and more memory in play.
So you are saying Polaris has no architectural advantages over Fiji?

Equal performance isn't 2x the performance (especially considering the clock speed advantage of the RX480). :LOL:
Who said Polaris has 2X the general performance of Fiji? We are discussing Geometry performance here, Polaris is delivering these numbers because of the mentioned Geometry enhancements.

They are only involved if someone uses the new possibilities with the primitive shader. Otherwise, it shouldn't play a role. The mentioned limit of 11 prims/clock should not apply for a shader based solution (at least not as peak, but if you have a huge amount of unique geometry with some vertex attributes, you run into memory bandwidth constraints ;)).
That remains just one theory though, one that is backed up by logic and nothing else, which doesn't make it any more plausible than the other theory (which is also logical and is backed by AMD's own hints). Until we have solid, concrete info, I find both theories valid at this point.
 
Some other info.
Regarding one article that had clarification from Scott Wasson:
Some may not know this (more so 2nd half of the 1st sentence) so thought it worth posting.
AMD’s Wasson clarified further that the primitive shader’s functionality includes a lot of what the DirectX vertex, hull, domain, and geometry shader stages can do but is more flexible about the context it carries and the order in which work is completed.
The front-end also benefits from an improved workgroup distributor, responsible for load balancing across programmable hardware. AMD said this comes from its collaboration with efficiency-sensitive console developers, and that effort is now going to benefit PC gamers, as well.

Cheers
 
... We don't know how many geometry engines Vega has, nor do we know anything else about it, but looking at Doom's performance, is anyone underwhelmed? I sure am. Shouldn't Doom under Vulkan be a game where AMD cards have 15% more performance than equivalent nV cards? So going by that......

It isn't to me, as there is more to a GPU than performance in one or even a few games. Besides, Vulkan will not improve AMD's performance as much in 4K Ultra as it would at lower resolutions.
One more point: AMD themselves said in one of their interviews that they are not putting any effort into optimizing the OpenGL driver for Doom, as they have Vulkan. You can then turn the tables and say that Vulkan for AMD = OpenGL for nVidia.

We simply need more data points and can't assume that what AMD is showcasing is representative of final performance. We all know how 'easy' it is for a GPU manufacturer to steal launch thunder by just updating the BIOS before release and tweaking it slightly one way or another. It would be kind of suicidal to show final performance a few months before release, no matter if good or bad. For all we know, this could be a full Vega chip or a severely cut-down one.

Same really goes for Zen or any other pre-production sample.
 
Well, I agree about more data points, we definitely need that, but at this point I think it's pretty safe to assume AMD is only going to show us the best they can. We saw that with Polaris, with performance and perf/watt for both P10 and P11 (6 months and 8 months prior to launch), and saw that with Fiji and all other prior launches. Unless they are gracing us with a new view of their marketing, lol, I have doubts.
 
So you are saying Polaris has no architectural advantages over Fiji?
Surely you know we were in the context of geometry performance. But why not go there, when it comes to numbers it only has advantage in bandwidth utilization, by any other measure it can be merely a die shrink.
 
The only thing we know is that peak performance is up 2.75x. I guess it is safe to say that this will require primitive shaders. How much Vega will do in other scenarios is unclear.
 
Sorry for double-posting, but it just occurred to me that AMD actually did NOT say Vega was designed with four geometry engines, only that in Vega, four geometry engines could handle 11 polygons. Vega's physical implementation could have more than those 4, if they're trying to understate a characteristic for once. A little far-fetched, I know, but still possible.

The text of the slide says:
New Programmable Geometry Pipeline
Over 2X peak throughput per clock


The respective footnote (full):
Geometry throughput slide: Data based on AMD Engineering design of Vega. Radeon R9 Fury X has 4 geometry engines and a peak of 4 polygons per clock. Vega is designed to handle up to 11 polygons per clock with 4 geometry engines. This represents an increase of 2.6x. VG-3
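For reference, the arithmetic behind the figures thrown around in this thread, as a small sketch. The polygon counts are taken from the footnote above; the 2x figure is the one attributed to the driver-wrapped case in the PCPer article, and the function name is just for illustration.

```python
FURY_X_PEAK_POLYS_PER_CLOCK = 4   # from the footnote
VEGA_PEAK_POLYS_PER_CLOCK = 11    # from the footnote
GEOMETRY_ENGINES = 4              # from the footnote, same count for both


def speedup(new_rate, old_rate):
    """Simple ratio of two peak rates."""
    return new_rate / old_rate


# 11 / 4 is the 2.75x quoted by several sites.
print(speedup(VEGA_PEAK_POLYS_PER_CLOCK, FURY_X_PEAK_POLYS_PER_CLOCK))

# Per geometry engine, the same ratio applies because the engine count is unchanged.
print(VEGA_PEAK_POLYS_PER_CLOCK / GEOMETRY_ENGINES)

# A 2x rate on top of Fury X's 4 polygons/clock gives the 8 polygons/clock discussed earlier.
print(2 * FURY_X_PEAK_POLYS_PER_CLOCK)
```

Note that the plain quotient 11/4 is 2.75, while the footnote itself states 2.6x; the slide material does not explain the difference.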
 
Yep which is why I am wondering if we will see that change with Vega 20 with regards to the number of geometry engines, and a way to increase the CUs/ALUs/etc or just for better load-balance as it will also have 1/2 FP64.
Maybe it will be a technical production milestone in Vega 20 towards Navi *shrug*.
Cheers
 
You are deciding to say Ryan at PCPer is being deliberately misleading and exaggerating regarding his section on the polygons/clock and the Primitive Shader.
Nobody said anything about deliberately misleading anyone. But in my experience it is pretty common (in all fields, i.e. I'm talking in general terms now) that when someone tries to convey some bits of information without explaining the whole context in detail (which one often tends to do, because it is clear to oneself), the audience will try to place the information in some context, but not necessarily the correct one. ;)
As said before, a fixed limit of 11 primitives/clock doesn't make much sense for a shader based solution, which (when fed "with the right knowledge" from a developer) could be discarding "primitives at an incredible rate", which would be even higher than the stated 11 primitives/clock. pTmdfx already linked the Frostbite presentation from the last GDC. There they state with a smiley that they are sure the devs present in the room can come up with code to discard triangles significantly faster with the shader array on the XB1 than the fixed function hardware can handle them (indicating that it is a low barrier). And Vega 10 has more than 5 times as many shader resources as the XB1. As I said: a shader based solution can shoot for discarding tens of triangles per clock as peak, not just 11. And the concrete number of course depends on the specific implementation, so giving such a concrete number doesn't make much sense in the first place.
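A rough sketch of why a shader based peak depends on the implementation rather than on a fixed hardware limit: the discard rate is simply the number of ALU lanes divided by the ALU cost of the culling test. The lane counts (768 for the XB1, 4096 for Vega 10) and the per-triangle instruction costs are assumptions for illustration; this idealised model also ignores memory traffic and scheduling overhead.

```python
XB1_LANES = 768        # assumed: 12 CUs x 64 lanes
VEGA10_LANES = 4096    # assumed: 64 CUs x 64 lanes


def peak_discards_per_clock(lanes, alu_ops_per_triangle):
    """Idealised peak: every lane busy every clock, one ALU op per lane per clock."""
    return lanes / alu_ops_per_triangle


# Ratio of shader resources under the assumed lane counts.
print(f"Vega 10 / XB1 lanes: {VEGA10_LANES / XB1_LANES:.2f}x")

# The peak moves with the cost of the culling shader (hypothetical costs).
for cost in (64, 128, 256):
    rate = peak_discards_per_clock(VEGA10_LANES, cost)
    print(f"{cost:4d} ALU ops/triangle -> {rate:.0f} triangles/clock")
```

Under these assumptions the result stays in the "tens per clock" range for a wide span of shader costs, and it changes as soon as the shader changes, which is why a single fixed number fits the fixed function pipeline better than a shader.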
And reading up a bit on how they are doing this in the Frostbite engine: they basically use a compute shader which processes and culls the geometry. The processed geometry data is then fed into some passthrough shader in the graphics pipeline. What Mike Mantor mentioned was that the primitive shader now gets more flexibility in what data it can access and process, much like a compute shader (i.e. it is not limited to the functionality of the traditional shader stages). And it can be bound to the graphics pipeline as a replacement for the passthrough shader stages that just read the data created by the compute shader from memory, which potentially increases the efficiency of the whole process. That would be a proper context for the stuff Mike Mantor said.
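To illustrate the kind of per-triangle test such a culling pass might run, here is a CPU-side sketch in Python. It is not Frostbite's or AMD's code: the function names, the counter-clockwise front-face convention and the pixel-centre rule are all assumptions made for the example. On a GPU each triangle would be handled by its own shader lane and the survivors written to a compacted index buffer.

```python
import math


def signed_area(tri):
    """Signed area of a 2D screen-space triangle; positive means counter-clockwise."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def snap(value):
    """Round half up, so pixel centres at n + 0.5 separate neighbouring snap values."""
    return math.floor(value + 0.5)


def misses_all_pixel_centres(tri):
    """True if the bounding box spans no pixel centre in x or in y.

    Ties that land exactly on a centre are ignored in this sketch.
    """
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return snap(min(xs)) == snap(max(xs)) or snap(min(ys)) == snap(max(ys))


def cull(triangles):
    """Return the indices of triangles that survive back-face and small-triangle culling."""
    survivors = []
    for index, tri in enumerate(triangles):
        if signed_area(tri) <= 0.0:          # back-facing or zero-area
            continue
        if misses_all_pixel_centres(tri):    # too small to produce any pixel
            continue
        survivors.append(index)
    return survivors


triangles = [
    ((0.0, 0.0), (10.0, 0.0), (0.0, 10.0)),   # front-facing, large
    ((0.0, 0.0), (0.0, 10.0), (10.0, 0.0)),   # same triangle, opposite winding
    ((1.6, 1.6), (1.9, 1.6), (1.6, 1.9)),     # front-facing but sub-pixel
    ((0.0, 0.0), (5.0, 5.0), (10.0, 10.0)),   # degenerate (collinear)
]
print(cull(triangles))
```

Only the compacted survivor list would then be handed to the graphics pipeline, which is the step a primitive shader could in principle fold into the pipeline itself.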

That remains just one theory though, one that is backed up by logic and nothing else, which doesn't make it any more plausible than the other theory (which is also logical and is backed by AMD's own hints). Until we have solid, concrete info, I find both theories valid at this point.
Actually, in my opinion the second one doesn't make sense, as it is backed up neither by logic nor by AMD's slides. It is pretty clear that shader based culling can potentially handle more than 11 triangles per clock as a peak number. And strictly speaking, giving a concrete number for shader based culling is somewhat pointless anyway, as it depends on details of the implementation (the shader and by extension the whole engine) and not on the hardware.
Edit:
On the other hand, the information on the slides most likely pertains to the fixed function pipeline (it doesn't make much sense otherwise). And there AMD explicitly stated 11 triangles per clock with 4 geometry engines (see CarstenS' quote above). A shader based solution would be basically independent of the number of geometry engines. ;)
 