AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

If you use Radeon GPU Profiler with a Vega GPU, you can clearly see that the driver generates primitive shader code for domain shader and geometry shader programs instead of traditional shader code.
Can somebody confirm this? I thought primitive shaders weren't working.
 
Vega internally has fewer distinct hardware stages for executing those shader types, but AMD advertised explicit programmability for higher throughput and several new features, and Rys from AMD even stated there would be implicit acceleration; neither seems to have ever happened for Vega/GFX9.
As it stands, Vega internally just works differently, without any apparent performance improvement.
 
The "automatic primitives" statement was made by this forum's owner in an informal tweet, quickly followed by a "no promises" statement.
Rys isn't in the business of making gratuitous "bold statements". If he wrote that is because that was the plan at the given time of the tweet. Plans change, especially with all the executive juggling that happened at RTG during the past year.

To take some of the burden off his shoulders: I think the company line on automatic use of the features came earlier in the year.
Raja Koduri's AMA in May of 2017 indicated developers wouldn't need to take on extra work to use NGG and primitive shaders.

[Radeon GPU Profiler screenshots]
I only have a Vega 64, so the RX 470 picture is an example from AMD.

The GFX9 Linux driver changes do document merging the same LS→HS and ES→GS stages as the shader types in the profiler shots. One possibility is that those merged stages have been placed under the primitive and surface shader labels by default, or perhaps that's simply what they're called now. If the announcements about the cancellation of the automatic path are true, then maybe the names made it further than the tech did.
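
To make the merging concrete, here's a conceptual sketch (my own illustrative C with hypothetical names, not driver code) of what a merged ES→GS stage amounts to: the two old stages become two phases of a single hardware wave, handing data through on-chip LDS instead of an off-chip ring buffer.

```c
/* Conceptual sketch of a GFX9 "merged" ES+GS stage, per the Linux driver
 * changes: the old ES (vertex/domain) and GS stages run as two phases of
 * one hardware wave, passing data through on-chip LDS instead of an
 * off-chip ring buffer. All names here are hypothetical stand-ins. */
#include <stdio.h>

#define NUM_VERTS 3
static float lds[NUM_VERTS][4];            /* stand-in for on-chip LDS */

static void es_phase(const float in[][4])  /* phase 1: old ES work */
{
    for (int v = 0; v < NUM_VERTS; ++v)
        for (int i = 0; i < 4; ++i)
            lds[v][i] = in[v][i];          /* "transformed" output -> LDS */
}

static void gs_phase(void)                 /* phase 2: old GS work */
{
    for (int v = 0; v < NUM_VERTS; ++v)    /* reads inputs straight from LDS */
        printf("emit vertex %d: %.1f %.1f %.1f %.1f\n",
               v, lds[v][0], lds[v][1], lds[v][2], lds[v][3]);
}

int main(void)
{
    const float tri[NUM_VERTS][4] = {{0,0,0,1},{1,0,0,1},{0,1,0,1}};
    es_phase(tri);  /* on hardware, a wave-wide barrier sits between phases */
    gs_phase();
    return 0;
}
```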

I don't think AMD has given an example of what a primitive shader meeting the functional description would look like. Patents and similar compute-based culling methods offer conceptual analogs, although what an implementation at that end of the pipeline would look like is unclear. For reasons never elaborated, analysis of position-data dependences and tool-based extraction were supposedly straightforward and hard to beat by manual coding, yet those smoothly working functions never saw release. The compute-based predecessors to the tech were an understood and fairly accessible technique, and yet whatever primitive shaders used to plug into the geometry front end was described as being as obscure and intractable as assembly coding requiring intimate knowledge of the GFX9 hardware.
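
For contrast, here's roughly what that "understood and accessible" compute-based predecessor looks like (my own illustrative sketch, not anything from AMD): back-face and zero-area culling with stream compaction, written as a scalar loop where each iteration maps to one GPU thread and the compaction counter to an atomic.

```c
/* Sketch of compute-based triangle culling, the accessible predecessor
 * technique referred to above (illustrative only). Drops back-facing and
 * degenerate triangles and compacts the survivors into a new index buffer;
 * on a GPU each loop iteration is one thread and `kept` an atomic counter.
 * Assumes vertices are already in clip space with w > 0 (clipping omitted). */
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;   /* post-transform position */
typedef struct { unsigned a, b, c; } Tri;    /* one triangle's indices  */

size_t cull_triangles(const Vec4 *pos, const Tri *in, Tri *out, size_t n)
{
    size_t kept = 0;
    for (size_t t = 0; t < n; ++t) {
        Vec4 A = pos[in[t].a], B = pos[in[t].b], C = pos[in[t].c];
        /* Signed screen-space area: <= 0 means back-facing or zero-area,
         * the same test the fixed-function hardware applies much later. */
        float area = (B.x / B.w - A.x / A.w) * (C.y / C.w - A.y / A.w)
                   - (C.x / C.w - A.x / A.w) * (B.y / B.w - A.y / A.w);
        if (area > 0.0f)
            out[kept++] = in[t];             /* survivor -> new index buffer */
    }
    return kept;                             /* draw this many triangles    */
}
```

The whole point of doing this in compute is that culled triangles never reach the geometry front end at all, which is exactly the win primitive shaders were supposed to deliver without the extra pass.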

There are scant references to this in the Vega ISA, although some additions to the GCN ISA include an RBE coverage instruction specifically relevant to Vega 10. There are new message-type instructions used to remove primitives from the buffers used by the geometry engines, and they sound potentially fraught with arcane hazards: they touch a portion of the hardware unused to outside interference and possibly prone to race conditions. Maybe if those showed up in generated code, it might be a primitive shader as described?

I've speculated about some possible reasons why the new method was never deployed, but without details it's hard to say whether that speculation has any weight. Some recent patent claims concerning split-frame rendering seemingly take back some of the culling programmability given to primitive shaders. Possibly the design or concept has taken a turn from what was put into Vega, if other reasons like bugs or performance problems didn't scuttle it first.
 
So apparently both Zen 2 and Vega 20 are on 7nm HPC. Going to be interesting to see Vega 20's clocks. If GF 14nm LPP was as bad as a lot of people think, Vega 20 could have some very respectable clocks.
 
Going to be interesting to see Vega 20's clocks. If GF 14nm LPP was as bad as a lot of people think, Vega 20 could have some very respectable clocks.
Clocks alone are the scary part; 7nm has to bring the power down as well. The original 'Vega slides' presented Vega 10 with a TDP of 225W. On the other hand, the same slides introduced 7nm Vega 20 with a projected TDP reaching 300W.

On the positive side, the dual-Vega V340 is labelled as a 300W card.
 
Clocks alone are the scary part; 7nm has to bring the power down as well. The original 'Vega slides' presented Vega 10 with a TDP of 225W. On the other hand, the same slides introduced 7nm Vega 20 with a projected TDP reaching 300W.

On the positive side, the dual-Vega V340 is labelled as a 300W card.
The projected TDP was between 150W and 300W; 7nm's characteristics weren't fully understood back when that slide was made.
That said, from what I've been hearing, it's probably at the high side of that projection. TSMC's 7nm isn't hitting its promises.

I'm expecting operation around 2.1 GHz at 250W-300W.
My low-end projection is 1.9 GHz and my high end is 2.5 GHz. It depends on the characteristics of 7nm HPC (Ashraf Eassa confirmed that AMD is using the HPC variant) and on the quality of the physical design.
A big part of NVIDIA's advantage comes from their excellent physical design, and I hope AMD starts investing more there.
 
If GF 14nm LPP was as bad as a lot of people think, Vega 20 could have some very respectable clocks.

The same process can do 4+ GHz on znver1 CPUs easily. As you can see, clock rates are defined not only by the process but also by the design. That is why Vega 10 can do ~1650 MHz and Polaris only ~1350 MHz on the same process ...
 
The same process can do 4+ GHz on znver1 CPUs easily. As you can see, clock rates are defined not only by the process but also by the design. That is why Vega 10 can do ~1650 MHz and Polaris only ~1350 MHz on the same process ...

1650 MHz, yes, but not efficiently, and not air-cooled at stock settings.
 
The same process can do 4+ GHz on znver1 CPUs easily. As you can see, clock rates are defined not only by the process but also by the design. That is why Vega 10 can do ~1650 MHz and Polaris only ~1350 MHz on the same process ...
Thanks, as if I totally didn't know that...

But then you have a comparable competitor CPU (pipeline length, execution resources, PRF) on a "comparable" process doing 5 GHz.

Also, Zen and Vega have exactly the same clocking behaviour, which is different from every other CPU/GPU I have owned: the voltage curve goes from near-linear to exponential within about 100 MHz...

I can run big undervolts on Vega up to about 1650 MHz on average, but getting to 1750 MHz requires cranking everything to 11.
 
Thanks, as if I totally didn't know that...

But then you have a comparable competitor CPU (pipeline length, execution resources, PRF) on a "comparable" process doing 5 GHz.

Also, Zen and Vega have exactly the same clocking behaviour, which is different from every other CPU/GPU I have owned: the voltage curve goes from near-linear to exponential within about 100 MHz...

I can run big undervolts on Vega up to about 1650 MHz on average, but getting to 1750 MHz requires cranking everything to 11.
That's just the nature of FinFETs. Pascal cards (if you manage to get past the voltage lock) also exhibit this behavior. Intel CPUs become impossible to cool before they hit that point, but I imagine the same wall would still be there.

You're extremely efficient up to a point, then suddenly you need an explosive amount of voltage for very little gain.
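
To put rough, purely illustrative numbers on it (mine, not measurements): dynamic power scales roughly with V²·f, so going from, say, 1.00 V at 1650 MHz to 1.20 V at 1750 MHz costs about 1.20² × (1750/1650) ≈ 1.53× the power for a ~6% clock gain.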
 
The question is how much of an uplift can AMD get from a big Navi/Vega 7nm chip?

HOCP just did a series of articles comparing various NVIDIA and AMD GPUs across 15 games. The transition from the Fury X, with its limited 4GB of RAM on 28nm, to Vega 64, with its 8GB on 14nm, only resulted in a 30% uplift. Compare that to the 980Ti-to-1080Ti transition, where the uplift was 70%!

It gets worse as you go down the generations: the Fury X is only 20% faster than the 390X, while the 980Ti is 46% faster than the 780Ti.

Granted, NVIDIA's uplift from the 1080Ti to the 2080Ti slowed down, but that's because it was only a 16nm-to-12nm transition. We know nothing about their 7nm uplift. And if AMD only repeats a similar 30% uplift with Navi/Vega 7nm, that puts them on par with the 1080Ti, leaving the 2080Ti and the 7nm NVIDIA flagship untouchable. AMD badly needs to increase their uplift this round.
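
As a rough sanity check on that 30% figure (my own back-of-the-envelope, not from the articles): Vega 64 generally lands around GTX 1080 level, and a 1080 Ti runs roughly 25-30% ahead of a 1080, so a ~30% uplift over Vega 64 would indeed put a 7nm part in 1080 Ti territory, still a full tier behind the 2080 Ti.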

Thoughts?

https://www.hardocp.com/article/2018/09/04/amd_gpu_generational_performance_part_1/1
https://www.hardocp.com/article/2018/08/07/nvidia_gpu_generational_performance_part_2/17
 
To use a quote from the Navi thread:
They hit their top architecture config back in 2013 with Hawaii. The arch simply couldn't scale well past that point.

AMD wasn't able to improve GCN's characteristics in the past 5 years. The results can be seen in graphics workloads. (Compute is a different story, but still, its SW side is ...)
 
To use a quote from the Navi thread:

AMD wasn't able to improve GCN's characteristics in the past 5 years. The results can be seen in graphics workloads. (Compute is a different story, but still, its SW side is ...)

We must be watching different results, then. Could you explain, for example, how Polaris can perform at the level of Hawaii?
 