AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

How else would you describe Vega's current uncanny similarity to Fiji in performance per clock, despite having 40% (4.2B) more transistors?
As much as the term "Fiji fallback" feels wrong to people on the inside, it sure looks like every single architectural difference that would make Vega 10 work faster than Fiji clock-for-clock is simply not working in games.

Or maybe Vega is first and foremost built to compete with GP100, not necessarily GP102/GP104. In other words, built for AI, datacenter and render-farm workloads. That's where the big money is anyway.

Vega then needs to fight GP100, GP102 and GP104 with one chip. That's a tall order.
 
Can someone please explain why GCN, and apparently NCU, is limited to a maximum of 4 shader engines? What are the pros and cons of such a limited architecture?
 
sebbbi seems to think the ROP/L2 rework can save a lot of the cache flushes that were previously necessary [strike]specifically on deferred shading engines[/strike]: https://forum.beyond3d.com/posts/1987712/
Note that he posted this before receiving his Vega FE - so it's a guess, however educated it may be.
That's true, and the same goes for the better DCC found in Polaris. But to see a benefit from this you'll need to pit Fury X against Vega FE in a situation where the Fury X is bandwidth starved. In a game.
 
That's true, and the same goes for the better DCC found in Polaris. But to see a benefit from this you'll need to pit Fury X against Vega FE in a situation where the Fury X is bandwidth starved. In a game.
Do frequent cache flushes not also place a heavier tax on maintaining a high GPU occupancy? IOW - more register-state in flight.
 
That's true, and the same goes for the better DCC found in Polaris. But to see a benefit from this you'll need to pit Fury X against Vega FE in a situation where the Fury X is bandwidth starved. In a game.

Sniper Elite 4 at 4K seems to be such a case, where a 38% core clock increase on Vega results in only 4% better performance at relatively low FPS numbers, and Fiji gets similar or better performance than Vega at the same clocks:

ZHF6KKv.png
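As a quick sanity check of those numbers (the 38% clock delta and 4% FPS delta come from the chart above; everything else is just arithmetic), the implied per-clock throughput of Vega in this scenario works out to roughly three quarters of Fiji's:

```python
# Toy arithmetic: if Vega FE runs at a 38% higher core clock than Fury X
# but only delivers 4% more FPS, its per-clock throughput in this
# (presumably bandwidth-starved) scenario is:
clock_ratio = 1.38   # Vega FE clock relative to Fury X
fps_ratio = 1.04     # Vega FE FPS relative to Fury X

perf_per_clock = fps_ratio / clock_ratio
print(f"Vega per-clock throughput vs Fiji: {perf_per_clock:.2f}x")  # ~0.75x
# i.e. ~25% less work per clock than Fiji here, which is what you'd
# expect if both cards are hitting the same memory-bandwidth wall.
```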
 
That's true, and the same goes for the better DCC found in Polaris. But to see a benefit from this you'll need to pit Fury X against Vega FE in a situation where the Fury X is bandwidth starved. In a game.
L2 cache flushes hurt even when you are not bandwidth bound. In this particular example case, the whole GPU needs to wait until the L2 cache flush is done before it can start executing the next shader. I would assume that frequent RT->texture transitions hurt Vega less than Fiji.

DCC obviously also helps, since it allows skipping decompress operations (which stall the GPU for much longer than cache flushes). Publicly available DCC documentation for GCN3/4/5, however, is pretty much non-existent; this is the only thing available: http://gpuopen.com/dcc-overview/. I would like to see a more detailed DCC document for AMD PC hardware in the future.
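sebbbi's point that flushes hurt even without a bandwidth bottleneck can be pictured with a toy timeline model (all numbers invented for illustration): a full L2 flush serializes the pipeline, so its cost is added straight to the frame even when memory bandwidth is sitting idle.

```python
# Toy model (all numbers invented): cost of serializing L2 flushes
# between render passes vs. running passes back to back.
SHADER_TIME_US = 100.0   # time to execute one pass's shaders
FLUSH_TIME_US = 5.0      # full L2 flush; nothing else runs meanwhile
PASSES = 20

# Architecture that must flush L2 at every RT->texture transition:
# the flush is pure serial time inserted between passes.
time_with_flushes = PASSES * SHADER_TIME_US + (PASSES - 1) * FLUSH_TIME_US

# Architecture whose ROPs are clients of L2 (as Vega's are claimed
# to be): no flush needed, passes run back to back.
time_without_flushes = PASSES * SHADER_TIME_US

print(time_with_flushes, time_without_flushes)  # 2095.0 2000.0
```

Even with generously small flush costs, the penalty scales with the number of pass transitions per frame, not with how bandwidth-bound the shaders are.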
 
Can someone please explain why GCN, and apparently NCU, is limited to a maximum of 4 shader engines? What are the pros and cons of such a limited architecture?
You don't need more than 4 shader engines. The best example is comparing Nvidia's GP102 and GP104: if you look at the polygon-output test of the Beyond3D suite, you see no difference between GP102 and GP104 once culling comes into play.
http://www.pcgameshardware.de/Titan...hmark-Tuning-Overclocking-vs-1080-1206879/#a5

http://techreport.com/review/31562/nvidia-geforce-gtx-1080-ti-graphics-card-reviewed/3

So the limitation isn't imposed by the rasterizers; culling is the issue. PCGamesHardware said in the article that they wanted to investigate this behaviour, but they never published an answer!?

Also, if you look at clock-normalized results, Fiji doesn't look so bad there.
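The convergence of GP102 and GP104 once culling kicks in is what you'd expect if triangle throughput is simply the minimum of the per-stage rates, with the culling rate (not the rasterizer count) as the shared ceiling. A toy model, with per-clock rates invented purely for illustration:

```python
# Toy model (rates invented): effective triangle throughput is capped
# by the slowest front-end stage, not by the number of rasterizers.
def tri_throughput(cull_rate, raster_rate):
    """Triangles/clock actually retired: min of the stage rates."""
    return min(cull_rate, raster_rate)

# Suppose culling tops out at 4 tris/clock on both chips, while the
# bigger chip has more rasterizer capacity than the smaller one:
small_chip = tri_throughput(cull_rate=4, raster_rate=4)
big_chip = tri_throughput(cull_rate=4, raster_rate=6)
print(small_chip, big_chip)  # 4 4 -> extra rasterizers don't help
```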
 
Can someone please explain why GCN, and apparently NCU, is limited to a maximum of 4 shader engines? What are the pros and cons of such a limited architecture?

To be clear, I don't think we have any confirmation of a 4 shader engine limit on GCN/NCU except back in the Hawaii days with GCN 2 (1.1 in Anandtech terms).

It's just that every AMD GPU since then has happened to have no more than 4 shader engines.

Earlier this year, Anandtech commented on (and speculated about) the potential removal of the shader engine limit with respect to Vega:

"As some of our more astute readers may recall, when AMD launched the GCN 1.1 they mentioned that at the time, GCN could only scale out to 4 of what AMD called their Shader Engines; the logical workflow partitions within the GPU that bundled together a geometry engine, a rasterizer, CUs, and a set of ROPs. And when the GCN 1.2 Fiji GPU was launched, while AMD didn’t bring up this point again, they still held to a 4 shader engine design, presumably due to the fact that GCN 1.2 did not remove this limitation.

But with Vega however, it looks like that limitation has finally gone away. AMD is teasing that Vega offers an improved load balancing mechanism, which pretty much directly hints that AMD can now efficiently distribute work over more than 4 engines. If so, this would represent a significant change in how the GCN architecture works under the hood, as work distribution is very much all about the “plumbing” of a GPU. Of the few details we do have here, AMD has told us that they are now capable of looking across draw calls and instances, to better split up work between the engines."

Vega%20Final%20Presentation-25.png

If I had to speculate as a naive layman, I would say that if there is some material amount of R&D necessary to remove that limitation, then maybe AMD is betting that they'll transition to MCM before they ever need to build a GPU with more than 4 shader engines. That is, we'll see something like a dual 64CU (or dual ~48CU, etc) card soon enough that it doesn't make sense to waste precious R&D dollars to remove that limitation in the mean time.
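The "improved load balancing" claim can be pictured with a toy scheduler (entirely hypothetical, not AMD's actual mechanism): static round-robin of draws across engines leaves some engines idle when draw costs are skewed, while a greedy least-loaded assignment, the kind of splitting across draws and instances AMD hints at, keeps more engines busy.

```python
import heapq

# Toy model (hypothetical, not AMD's real scheduler): distributing
# draw calls of varying cost across N shader engines.
draws = [90, 10, 10, 10, 80, 10, 10, 10]  # invented per-draw costs

def round_robin(draws, engines):
    """Static assignment: draw i always goes to engine i % N."""
    loads = [0] * engines
    for i, cost in enumerate(draws):
        loads[i % engines] += cost
    return max(loads)  # finish time = busiest engine

def least_loaded(draws, engines):
    """Greedy: biggest draws first, each to the least-loaded engine."""
    heap = [0] * engines
    heapq.heapify(heap)
    for cost in sorted(draws, reverse=True):
        heapq.heappush(heap, heapq.heappop(heap) + cost)
    return max(heap)

print(round_robin(draws, 4), least_loaded(draws, 4))  # 170 90
```

With skewed draw costs the round-robin schedule finishes at 170 units while the balanced one finishes at 90, which is the kind of gap that would motivate reworking the distributor before adding more engines.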
 
Fiji was reticle-size limited, IIRC, in the sense that they maxed out the die size so that the interposer could still be made using a single exposure instead of a double one.
 
The primitive shader (whatever it is) is again something that's not in the DX spec and is presumably something that will have to be explicitly coded for somehow. Are there geometry throughput improvements outside of the primitive shader that we know about?
The primitive shader refers to VS+GS being executed as one shader (the tessellation shader stages also get merged with others; with tessellation there's one shader pre-tessellation and one post-tessellation). This cannot be disabled in the driver; it has to be active at all times. (With extensions exposing this, you could potentially do some things more efficiently.)
It can only make a difference if geometry and/or tessellation shaders are in use, however.
 
About the driver comment: it's normal and completely expected for there to be common code in a GPU driver that applies to some or all of the GPUs a driver supports, alongside the specifics for the GPU being driven. That's hopefully just a given. So it was just a guiding hand to not conflate any commonality with it running the driver for a different ASIC, and then reading things into that.

I'd have said that regardless of working for AMD or not, since the above is true for all GPU vendors.

I want to talk about Vega and RX here as much as everyone else since I'm a GPU enthusiast, but that's not in my wheelhouse (unless you're an NDA'd developer of course!), so I can't go into specifics.
 
I want to talk about Vega and RX here as much as everyone else since I'm a GPU enthusiast, but that's not in my wheelhouse (unless you're an NDA'd developer of course!), so I can't go into specifics.

No problem, send me the NDA papers... I'll sign them right now. (I joke.)
 
About the driver comment: it's normal and completely expected for there to be common code in a GPU driver that applies to some or all of the GPUs a driver supports, alongside the specifics for the GPU being driven. That's hopefully just a given. So it was just a guiding hand to not conflate any commonality with it running the driver for a different ASIC, and then reading things into that.

I'd have said that regardless of working for AMD or not, since the above is true for all GPU vendors.

I want to talk about Vega and RX here as much as everyone else since I'm a GPU enthusiast, but that's not in my wheelhouse (unless you're an NDA'd developer of course!), so I can't go into specifics.

Do you reckon you could convince the appropriate helmsman to drop by?
 
The primitive shader refers to VS+GS being executed as one shader (the tessellation shader stages also get merged with others; with tessellation there's one shader pre-tessellation and one post-tessellation). This cannot be disabled in the driver; it has to be active at all times. (With extensions exposing this, you could potentially do some things more efficiently.)
It can only make a difference if geometry and/or tessellation shaders are in use, however.
Sure, but again this is not likely something that happens automatically. There's a "fast geometry shader" path on NV's side as well, and that's something that has to be coded for specifically (via NvAPI, not standard D3D).

L2 cache flushes hurt even when you are not bandwidth bound. In this particular example case, the whole GPU needs to wait until the L2 cache flush is done before it can start executing the next shader. I would assume that frequent RT->texture transitions hurt Vega less than Fiji.

DCC obviously also helps, since it allows skipping decompress operations (which stall the GPU for much longer than cache flushes). Publicly available DCC documentation for GCN3/4/5, however, is pretty much non-existent; this is the only thing available: http://gpuopen.com/dcc-overview/. I would like to see a more detailed DCC document for AMD PC hardware in the future.
Yes, but how do you spot this in an FPS number? :smile: From a game of which you have no idea how many RT->texture transitions it's doing (or anything else, for that matter). It's basically turning into a discussion about specifically targeted benchmarks just to show that Fury X != Vega FE, which I think is ridiculous on so many levels.
 
Yes, but how do you spot this in an FPS number? :smile: From a game of which you have no idea how many RT->texture transitions it's doing (or anything else, for that matter). It's basically turning into a discussion about specifically targeted benchmarks just to show that Fury X != Vega FE, which I think is ridiculous on so many levels.
Simple: the FPS number is higher when there are fewer stalls and flushes :)

But I will wait for the Vega RX launch reviews. I am sure AMD will spill more details about their architectural changes relevant to gaming then (ROP flushes = gaming).
 
This is what really bothered me about what you said: yes, he can. He has no obligation to say anything and, again, most likely isn't supposed to reveal anything. Your post almost comes off as a demand, and that's not right.

And would you have been satisfied with that?

Not how it was intended. If interpreted as a comment to an equal rather than someone you can make demands of, it's more a casual "come on man, help us out here".

And yes.
 