AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

What if a concept is found to only allow for bad implementations?
Half of the concept's motivation was found to be physically impractical, for example.
The other half has been found wanting by an industry that's spent years trying, and other techniques have been implemented successfully, sometimes with better results.

If Edison had surrendered after his 99th try, he would never have invented the bulb and we would have a society in darkness. Our society is built on the failure and perseverance of others. Failure doesn't mean something is wrong; it just means you have found one way that doesn't work.

By this, and again, I'm not saying BD was what we needed. I'm saying that sometimes you need to take a bet and try. AMD did and lost.

That's just saying that if BD performed up to the level of a vanity SKU like a heavily binned and watercooled future core, things wouldn't be bad. This cannot be done without separating the performance from all the unacceptable things it takes to get it.
Why stop there? If BD launched with Zen's performance during an LN2 suicide run, AMD would have done better as well.

You don't need to take it to extremes...
 
If Edison had surrendered after his 99th try, he would never have invented the bulb and we would have a society in darkness. Our society is built on the failure and perseverance of others.
Probably not, since the motivation for the Edison bulb was a resistive lighting method that was more compact and usable in a home setting than existing arc lamps.
Lighting existed already, and if he had given up someone would have done something similar--since Edison is the one who patented the first commercially viable bulb, not the only one who was trying or the only one to succeed. If we accept that the incandescent bulb was never created (note, modern incandescents use materials that make Edison's implementation quite inferior), there are fluorescent, halogen, LED, and other methods that are actually replacing incandescents anyway.

I'm exaggerating for effect, but the guy trying to scale the method of collecting fireflies in a jar to the level of mass-market home lighting may at some point need to consider it a dead end.

I'm saying that sometimes you need to take a bet and try. AMD did and lost.
The need to take a chance on something is separate from whether the thing chosen was a good one.
Companies can bet on what turns out to be a mistake, or things that can never be competitive.

You don't need to take it to extremes...
It was done for effect, in order to make clear that the premise does not provide a clear reason to pick a performance point, or limits to how far it can be taken.
Commercially, I'm not sure my exaggerated scenario would be much worse.
 
AMD had way more of those hundreds of millions of dollars when BD was being designed. Even if we accept that BD was somehow much more constrained, then it seems having a top-flight executive and money leads to the conclusion of not doing BD.

Let's not forget that at this time AMD had a big problem: the way the whole enterprise was run was a complete mess. You had tons of executives in the US offices, tons of executives in Germany, and R&D offices here and there, where nobody knew what the others were doing.
They wasted so much money and time because the right hand never knew what the left hand was doing. And sadly, even when a clear direction was set, everyone took their own direction (thinking it was in line with the "main direction").

Maybe that is where Lisa has had her biggest impact: before everything else, putting some order into this mess.
 
Maybe that is where Lisa has had her biggest impact: before everything else, putting some order into this mess.

Her hiring as a VP in 2012 was related to improving execution. It's not clear at what level she was involved with the bringing on of Keller or Koduri, as far as her influence on the R&D and engineering processes go. It wouldn't be until 2014 that the job title would change.
Tangentially, bringing on the current CFO has done quite a bit in finding ways to refinance AMD's debt and buy the engineers time, but that was also prior to Oct 2014.
A truer test may need to wait until the long lead times have had time to pass, such as seeing how things work out with the RTG and how designs past Zen manage to improve, since Keller and several senior CPU and SOC designers/engineers were poached under her watch.

On the topic of Vega, I was curious about the peak numbers reported for its triangle throughput. 11 triangles over 4 geometry engines seems like an odd number. Some kind of constraint on the binning process, or a limit to culling if that total has to include culled triangles?
 
Her hiring as a VP in 2012 was related to improving execution. It's not clear at what level she was involved with the bringing on of Keller or Koduri, as far as her influence on the R&D and engineering processes go. It wouldn't be until 2014 that the job title would change.
Tangentially, bringing on the current CFO has done quite a bit in finding ways to refinance AMD's debt and buy the engineers time, but that was also prior to Oct 2014.
A truer test may need to wait until the long lead times have had time to pass, such as seeing how things work out with the RTG and how designs past Zen manage to improve, since Keller and several senior CPU and SOC designers/engineers were poached under her watch.

On the topic of Vega, I was curious about the peak numbers reported for its triangle throughput. 11 triangles over 4 geometry engines seems like an odd number. Some kind of constraint on the binning process, or a limit to culling if that total has to include culled triangles?

It's an average, not a fixed number, certainly due to the triangle culling now being done in hardware instead of in software (when that was used).
 
It's an average, not a fixed number, certainly due to the triangle culling now being done in hardware instead of in software (when that was used).

The Vega footnote slide states: "Vega is designed to handle up to 11 polygons per clock with 4 geometry engines".
That doesn't strike me as being an average.
In addition to that, it was speculated that Vega would allow the GPU to exceed the design limit of 4 shader engines, per Anandtech.
http://www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser/2
However, AMD's slide doesn't seem to support that. That might also mean that other limitations such as CUs per shader engine or RBEs per engine are similarly the same. The purported unit counts for Vega 10 haven't gone beyond Fiji, so it could be relying on the other tweaks and clocks to do more.

On a different line of inquiry:
Earlier in the thread, it was noted that the Primitive Shader seems to hop from vertex to geometry when it serves as a new front-end replacement, seemingly skipping the stages for tessellation.
Perhaps, if this is one of the items where AMD listened to console devs, they figured it would mostly be used without tessellation?
Otherwise, is there a way to call that unit piecemeal?

There is limited customization for using a paired compute and vertex processing step for the PS4 that allows compute to serve as a sieve for the vertex shader, which might be a possible inspiration for this more expanded shader type.
 
[offtopic]

Also, the IBM core was the P6, and by failure I mean in terms of design, where IBM put a lot of money in and then had to throw it away and backtrack, like Intel with the P4.

Please don't use "P6" for Power 6, "P6" universally means the Pentium Pro architecture.

Power 6 was exactly the opposite of what you were claiming:

It was not "brainiac" "super-high-ipc" design. It was in-order very high frequency core, going even further than P4, targeting lower IPC than it's predecessors.

[/offtopic]
 
It was not "brainiac" "super-high-ipc" design. It was in-order very high frequency core, going even further than P4, targeting lower IPC than it's predecessors.
Has any other general purpose CPU core ever reached 5 GHz clock rate (stock clocks)? I don't remember any.
 
just like Nvidia adapted Intel's PixelSync and Microsoft made it a DX12.1 core feature (ROV)
Speaking of which, Intel already incorporated PixelSync into two games, GRID 2 and GRID AutoSport, under the names Advanced Blending and Smoke Shadows. Curiously, I can't run these features with a Pascal GPU; I am guessing they are hidden behind a simple vendor lock.
 
Speaking of which, Intel already incorporated PixelSync into two games, GRID 2 and GRID AutoSport, under the names Advanced Blending and Smoke Shadows. Curiously, I can't run these features with a Pascal GPU; I am guessing they are hidden behind a simple vendor lock.
I don't think GRID 2 and GRID AutoSport are actually DX12 titles? I mean, it gets complicated as to what's available where. ROV is not available in vanilla D3D11, only with D3D11.3 (but for this you still need Win 8.1). Meaning in vanilla D3D11 it's only available through some vendor-specific hacks. And Intel's hacks are not the same as NVidia's or AMD's. So no, it's not just a simple vendor lock.
 
On the topic of Vega, I was curious about the peak numbers reported for triangle throughput for Vega. 11 triangles over 4 geometry engines seems like an odd number. Some kind of constraint on the binning process, or a limit to culling if that total has to include culled triangles?
Might just be relative to a base clock speed. In theory the slides suggest each CU can have different clocks, with the NoC passing data between domains. That's 2.75 polygons per GE per clock. Not unreasonable for a higher-clocked domain. Assuming a base 1.5 GHz clock, that would put the GEs (CUs for primitive shaders?) at ~4.1 GHz. That should be well within the realm of a scalar unit more suited to triangle setup.
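
Spelling out the arithmetic behind that reading (the 1.5 GHz base clock is purely an assumption here, not an AMD figure):

[code]
# Back-of-the-envelope numbers for the "separate clock domain" reading above.
# The 1.5 GHz base clock is an assumed figure, not something AMD has stated.
peak_polys_per_clk = 11        # AMD footnote: "up to 11 polygons per clock"
geometry_engines = 4

per_ge_rate = peak_polys_per_clk / geometry_engines   # 2.75 polygons/clock per engine
base_clock_ghz = 1.5                                  # assumed base clock
implied_ge_clock_ghz = base_clock_ghz * per_ge_rate   # clock needed if each engine set up 1 poly/clock

print(per_ge_rate, implied_ge_clock_ghz)              # 2.75 4.125
[/code]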

The 220W FX-9590 would turbo from its base 4.7 GHz clock to 5.0 GHz, but it was basically a factory-overclocked CPU.

http://www.anandtech.com/show/8316/...l-the-fx9590-and-asrock-990fx-extreme9-review
That was on 32nm as well. On a modern process, 5 GHz would probably be more common.
 
I don't think GRID 2 and GRID AutoSport are actually DX12 titles? I mean, it gets complicated as to what's available where. ROV is not available in vanilla D3D11, only with D3D11.3 (but for this you still need Win 8.1). Meaning in vanilla D3D11 it's only available through some vendor-specific hacks. And Intel's hacks are not the same as NVidia's or AMD's. So no, it's not just a simple vendor lock.
Full list of DX11 vendor hacks:
https://docs.google.com/spreadsheets/d/1J_HIRVlYK8iI4u6AJrCeb66L5W36UDkd9ExSCku9s_o/edit

Intel is the only vendor that supports Rasterizer Ordered Views in DirectX 11.0 (Win7) with their PixelSync backdoor. You need a separate code path for official ROV support in DX11.3 (Win8.1+) / DX12.1 (Win10).

Each backdoor requires a separate code path for each IHV. OpenGL and Vulkan handle extensions much better, as common IHV extensions eventually become standardized ARB extensions. No need to write separate code for each IHV.
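
As a rough illustration of the difference: with GL/Vulkan you can usually write one path against the standardized ARB extension and only fall back to the vendor variants, whereas every DX11 backdoor is its own API. The little helper below is purely illustrative, though the extension names for ordered fragment access are real:

[code]
# Minimal sketch of extension-based feature selection, as in GL/Vulkan.
# Prefer the standardized ARB extension; fall back to vendor-specific ones only if needed.
# The function and its inputs are illustrative; the extension names are real.
def pick_interlock_path(extensions):
    if "GL_ARB_fragment_shader_interlock" in extensions:
        return "arb"      # one standardized path covers every conforming IHV
    if "GL_INTEL_fragment_shader_ordering" in extensions:
        return "intel"    # Intel's vendor precursor (the PixelSync mechanism)
    if "GL_NV_fragment_shader_interlock" in extensions:
        return "nv"
    return "none"         # no ordered access available; pick another algorithm

# An Intel driver exposing both would simply take the ARB path:
print(pick_interlock_path({"GL_ARB_fragment_shader_interlock",
                           "GL_INTEL_fragment_shader_ordering"}))  # arb
[/code]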
 
I was hoping for a way to run them on Windows 10 with an NV GPU; apparently that is indeed not possible, and NV didn't care enough to make these features runnable through driver hacks.
In the DX9 era, backdoors were even more widespread. IHVs mostly supported each other's hacks (http://aras-p.info/texts/D3D9GPUHacks.html). But nowadays every IHV has their own incompatible backdoor API for DX11. That's a shame...
 
The Vega footnote slide states: "Vega is designed to handle up to 11 polygons per clock with 4 geometry engines".
That doesn't strike me as being an average.
In addition to that, it was speculated that Vega would allow the GPU to exceed the design limit of 4 shader engines, per Anandtech.
http://www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser/2

I read somewhere else (or maybe it was an interview, or both) that the 11 polygons figure is due to using the new primitive shader and must be coded for to achieve; in theory, it was mentioned that just using a wrapper the improvement would be up to 2x rather than 2.75x.
But I would like to see that theoretical performance put to the test in a game, and I wonder how many devs will code for it anyway, even ignoring the APIs as they stand, since Vega is pretty niche (a nice solution maybe in a generation or two); without the dev coding for it you are back to the 4 polygons/clock.
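
Taking the quoted multipliers at face value against the 4/clock baseline, the peaks would look like this (peak figures only, not measured game throughput):

[code]
# Rough peak geometry rates implied by the figures quoted in the articles linked in the reply below.
# These are marketing peaks for the whole GPU per clock, not measured game throughput.
baseline   = 4                 # polygons/clock, traditional pipeline (Fiji/Polaris-like)
wrapper    = baseline * 2.0    # up to ~8/clock, driver/wrapper-generated primitive shaders
hand_coded = baseline * 2.75   # up to 11/clock, developer-coded primitive shaders

print(baseline, wrapper, hand_coded)   # 4 8.0 11.0
[/code]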

Regarding the potential improvement beyond 4 shader engines, I'm disappointed it is not implemented now, but let's see what they do with Vega 20.
Cheers
 
I read somewhere else (or maybe it was an interview, or both) that the 11 polygons figure is due to using the new primitive shader and must be coded for to achieve; in theory, it was mentioned that just using a wrapper the improvement would be up to 2x rather than 2.75x.
But I would like to see that theoretical performance put to the test in a game, and I wonder how many devs will code for it anyway, even ignoring the APIs as they stand, since Vega is pretty niche (a nice solution maybe in a generation or two); without the dev coding for it you are back to the 4 polygons/clock.

Regarding the potential improvement beyond 4 shader engines, I'm disappointed it is not implemented now, but let's see what they do with Vega 20.
Cheers


Yeah I read that here
https://www.pcper.com/reviews/Graph...w-Redesigned-Memory-Architecture/Primitive-Sh

The new programmable geometry pipeline on Vega will offer up to 2x the peak throughput per clock compared to previous generations by utilizing a new “primitive shader.” This new shader combines the functions of vertex and geometry shader and, as AMD told it to me, “with the right knowledge” you can discard game based primitives at an incredible rate


It's gotta be programmed for with primitive shaders to get that.

and here too

http://www.shacknews.com/article/98377/ces-2017-amd-says-vega-gpus-not-ready-yet

Vega also uses an improved geometry pipeline that has a throughput of 11 polygons per clock cycle, up from four in AMD’s previous generations of GPU. A new primitive shader stage in Vega’s geometry pipeline allows this 2.75x boost. AMD Vega’s Next-Generation Compute Unit brings 64 shaders which allow for 128 32-bit operations per clock cycle. The Vega’s new rasterizer enables a fetch-once, shade-once approach which culls pixels that are invisible to the final scene. AMD is also changing cache hierarchy with the Vega platform by making the render back-ends clients of the L2 cache which allows for shared point synchronization of pixel and texture memory access.

So if games aren't using primitive shaders, its culling is effectively the same as Polaris.
 
So if games aren't using primitive shaders, its culling is effectively the same as Polaris.
You simply read something into the statements of those who (with quite limited knowledge) tried to interpret AMD's pretty sparse statements. They and we simply don't know what difference the new stuff in Vega makes without using the added flexibility of primitive shaders.

================================

And by the way, the number of shader engines doesn't necessarily determine the cull rate (just think of nV's "polymorph engines"). Maybe one should consider slightly more fundamental changes than just bumping the number of rasterizers (and the draw-stream binning aka tiling rasterizer is a good hint in my opinion). And I'm also quite sure there never was a hard cap of 4 shader engines in AMD GPUs. Most probably it was more of a practical limit, as without major changes the scaling wouldn't allow an efficient use of resources for average cases. But in principle AMD's existing system could be extended to use more rasterizers and shader engines. Why should there be a hard cap at 4 engines? What would break down with 6 engines? Doesn't make any sense to me.
 
In the DX9 era, backdoors were even more widespread. IHVs mostly supported each other's hacks (http://aras-p.info/texts/D3D9GPUHacks.html). But nowadays every IHV has their own incompatible backdoor API for DX11. That's a shame...
Ah, the simpler times... When GPUs were allowed to expose "custom formats" and strange render states. Nowadays this is done by basically replacing DX entry points/interfaces. Impossible to do without NV, AMD and Intel agreeing on the mechanics not to mention that MS probably wouldn't like that.
 
You simply read something into the statements of those who (with quite limited knowledge) tried to interpret AMD's pretty sparse statements. They and we simply don't know what difference the new stuff in Vega makes without using the added flexibility of primitive shaders.

================================

And by the way, the number of shader engines doesn't necessarily determine the cull rate (just think of nV's "polymorph engines"). Maybe one should consider slightly more fundamental changes than just bumping the number of rasterizers (and the draw-stream binning aka tiling rasterizer is a good hint in my opinion). And I'm also quite sure there never was a hard cap of 4 shader engines in AMD GPUs. Most probably it was more of a practical limit, as without major changes the scaling wouldn't allow an efficient use of resources for average cases. But in principle AMD's existing system could be extended to use more rasterizers and shader engines. Why should there be a hard cap at 4 engines? What would break down with 6 engines? Doesn't make any sense to me.


I can show you videos of AMD employees saying these things, so.......where do you think the articles came from lol.

nV has no such limit; its culling is limited by the ROPs it's got. They have fixed-function units to do all of this type of work.

What I think is happening is that AMD is using its shader array, through primitive shaders, and when the pressure on the shader array from doing such work gets too high, the triangle culling count decreases, hence the hard limit of 11 tris culled per clock.

The calculations for what is culled have to be done in the shader array; AMD doesn't seem to have fixed-function units to do this work. nV has had this type of stuff for what, 3 gens now, maybe longer (but it only really became apparent with the GameWorks tessellation libraries)? We know Polaris and prior don't have it, and for Vega they are saying primitive shaders have to be used, so it sounds like it doesn't have it either. It was too late to add something in when they saw nV's advantages, so at least get something going that will work one way or the other. Look at certain games sponsored by AMD: they got developers to make sure the game didn't hit the polygon throughput limits on their cards. DICE comes to mind; they programmed an entire early triangle culling pass in their engine for AMD cards. What is the difference here? Now AMD wants developers to use primitive shaders for this work?
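
For reference, the kind of per-triangle test such a culling pass performs (whether in a DICE-style compute pre-pass or in a primitive shader) looks roughly like the sketch below; the function, winding convention, and threshold are purely illustrative:

[code]
# Illustrative per-triangle cull test of the sort a compute pre-pass or primitive
# shader would run before rasterization. Assumes counter-clockwise = front-facing.
def should_cull(v0, v1, v2, small_area_eps=1e-6):
    # Signed area of the projected triangle: negative means back-facing,
    # near-zero means degenerate; both can be discarded before the rasterizer.
    ax, ay = v1[0] - v0[0], v1[1] - v0[1]
    bx, by = v2[0] - v0[0], v2[1] - v0[1]
    signed_area = ax * by - ay * bx
    return signed_area <= small_area_eps

print(should_cull((0, 0), (0, 1), (1, 0)))  # True  (clockwise -> culled)
print(should_cull((0, 0), (1, 0), (0, 1)))  # False (counter-clockwise -> kept)
[/code]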

AMD couldn't do it the way nV is doing it because I think it would fundamentally change the GCN architecture too much; with the time they had to make those changes, it wouldn't be possible. Vega is a large chip too, so they might not have even had the transistor budget to do it and keep things sane for price...
 