AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

@Rys what are you measuring in the Beyond3D Suite for polygon output? The triangles handed over to the GPU, or the triangles the GPU actually draws?
 
By default, PCGH present the results for the 100% culled test (using strips, so a modern GPU's peak throughput ideally) and the 50% culled test (using lists). In both cases the geometry is always fully submitted to the GPU with no host-side culling.
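For reference, the reason strips are the peak-throughput case is simple vertex-count arithmetic; here is a minimal sketch of the relation (illustrative only, not the actual test code):

```python
# Vertex counts for N triangles, list vs. strip submission.
# Illustrative arithmetic only; not the actual B3D suite test code.

def list_vertices(num_triangles: int) -> int:
    # A triangle list spells out every triangle with its own three vertices.
    return 3 * num_triangles

def strip_vertices(num_triangles: int) -> int:
    # A strip reuses the previous two vertices: the first triangle costs
    # three vertices, every following one costs a single new vertex.
    return num_triangles + 2

for n in (1, 10, 1_000_000):
    print(n, list_vertices(n), strip_vertices(n))
# For large N a strip needs ~1 vertex per triangle vs. 3 for a list,
# which is why strips expose a GPU's peak primitive throughput.
```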
 


Great, thanks!

Latency and ALU look very good and improved vs Fiji.

Polygons see a massive increase, from 2.2k -> 6k, but that's still well shy of the 1080 (which appears higher than the Titan XP???) at 11k. At 50% culled it's about a 50% increase, from 3.9k -> 5.9k, which is above the Titan XP at 5.4k. Clock for clock at 1050MHz, Vega is 87% faster than Fiji at 100% culled, which is a good improvement, but it still appears way behind Pascal, though clocks seem to be directly related to it, since the 1080 is higher than the Titan XP.
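For anyone wanting to sanity-check those ratios, here's the arithmetic with the chart values quoted above (all in MTris/s; the clock-matched 87% figure comes from separate chart readings not repeated here):

```python
# Sanity-check of the percentage claims above, using the quoted chart
# values (MTris/s). These are readings cited in the post, not new data.

def pct_gain(old: float, new: float) -> float:
    return (new - old) / old * 100.0

print(f"100% culled, Fiji -> Vega FE: +{pct_gain(2200, 6000):.0f}%")       # ~+173%
print(f"50% culled,  Fiji -> Vega FE: +{pct_gain(3900, 5900):.0f}%")       # ~+51%
print(f"GTX 1080 (11k) over Vega FE (6k): +{pct_gain(6000, 11000):.0f}%")  # ~+83%
```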

Vega seems slower on texture fill than Fiji for some reason, though, even at the same clocks: 71k vs. 89k on Fury X.

Memory bandwidth does seem to be a huge killer though, as it's much lower than Fury X for the random texture and only ties it for the black texture. Way behind Pascal and likely limiting it heavily in gaming.

Can't wait to see RX results in a few weeks and see how much of this is going to change, or stay the same.
 
Thank you for the answer, Rys. If you use the drop-down menu on the left, there are more results, like list with 0% culling. In the drop-down menu on the right you will find more GPUs.

What I'm missing is strip with 0% culling. Or does that make no sense?
 
So anyone is welcome to correct me here, but as a layman I see at least two major culprits:

1 - Texel fillrate (per TMU per clock) is pretty terrible compared to Polaris and even Fiji (maybe connected to new ROPs as L2 cache clients?)
2 - Effective bandwidth is actually lower than Fiji

Something strange is going on with geometry performance too, as the promised 2.6x geometry performance boost from the new primitive shaders simply isn't there. It was supposed to be 11 triangles/clock, when in fact we're seeing the same 4 triangles/clock as Fiji. At 1050MHz Vega is hitting close to 4000 MTriangles/s, when the slides suggested it should reach up to 11000 MTris/s at ~1GHz, which would put it above the Pascal cards at its default clocks.
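The triangles-per-clock reading falls straight out of rate = triangles/clock x clock; with the figures quoted above:

```python
# Relation used above: primitive rate (MTris/s) = tris/clock * clock (MHz).
# Figures are the approximate chart readings quoted in this post.

def mtris_per_s(tris_per_clock: float, clock_mhz: float) -> float:
    return tris_per_clock * clock_mhz

print(mtris_per_s(4, 1050))   # 4200  -> matches the "close to 4000" observed at 1050MHz
print(mtris_per_s(11, 1000))  # 11000 -> what the slide's 11 tris/clock would imply at ~1GHz
print(4000 / 1050)            # ~3.8  -> the observed rate back-solves to ~4 tris/clock
```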



Combine this with current Vega FE clocks barely going above Polaris 20's, when slides from January suggested a 2x clock increase, and it's pretty much the perfect storm of anticipated vs. actual performance.
Yes, there is something iffy with the new pixel engine.
 
1 - Texel fillrate (per TMU per clock) is pretty terrible compared to Polaris and even Fiji (maybe connected to new ROPs as L2 cache clients?)
What if Vega doesn't have TMUs? With the 2xFP16 (INT16?) and 4xINT8 rates they could be filtering with the shader cores, with lower bandwidth and/or register pressure slowing things down. With everything seemingly programmable, that makes sense. It could apply to the ROPs as well. It still leaves the question of what's taking up all the space.
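To make the "filtering on the shader cores" idea concrete: a bilinear sample is only three lerps per channel, so the math itself is cheap ALU work; the cost would be in fetching and holding the four texels, i.e. bandwidth and register pressure. A sketch of that arithmetic (pure illustration, not a claim about how Vega actually works):

```python
# What "filtering with the shader cores" would amount to: a bilinear
# fetch reduces to three linear interpolations per channel.
# Purely illustrative; not a claim about how Vega actually works.

def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t

def bilinear(t00: float, t10: float, t01: float, t11: float,
             fx: float, fy: float) -> float:
    top    = lerp(t00, t10, fx)   # blend along x, upper row
    bottom = lerp(t01, t11, fx)   # blend along x, lower row
    return lerp(top, bottom, fy)  # blend the two rows along y

# 3 lerps (~6 FLOPs) per channel, times 4 channels for RGBA8: cheap in
# ALU terms, but the four texels still have to be fetched and held in
# registers, which is where the bandwidth/register-pressure concern comes in.
print(bilinear(0.0, 1.0, 0.0, 1.0, 0.25, 0.5))  # 0.25
```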

2 - Effective bandwidth is actually lower than Fiji
This is probably the real killer. How exactly is this measured? It might be a weird measuring error due to Infinity Fabric. Tying a cluster to a particular channel might not align well with the testing.
 
By default, PCGH present the results for the 100% culled test (using strips, so a modern GPU's peak throughput ideally) and the 50% culled test (using lists). In both cases the geometry is always fully submitted to the GPU with no host-side culling.
The option is there, though, to make the other tests visible - as is the case with almost all other sub-tests. I've only culled a few texturing results which I felt showed redundant information.

Something strange is going on with geometry performance too, as the promised 2.6x geometry performance boost from the new primitive shaders simply isn't there. It was supposed to be 11 triangles/clock, when in fact we're seeing the same 4 triangles/clock as Fiji. At 1050MHz Vega is hitting close to 4000 MTriangles/s, when the slides suggested it should reach up to 11000 MTris/s at ~1GHz, which would put it above the Pascal cards at its default clocks.

AMD's wording was, IIRC, much more carefully chosen. Something along the lines of being able to work on 11 triangles concurrently, as per the footnotes of the slide deck - I vividly remember the discussion here. Edit: looked it up: "Vega is designed to handle up to 11 triangles per clock with 4 geometry engines", and one of the common assumptions here was that this was because the four geometry engines could share information, so that vertices could form an adjacent polygon strip comprised of up to 11 triangles. And the product specs for Vega FE also mention 4 triangles per clock.
 
Latency and ALU look very good and improved vs Fiji.
When looking at the FE clocked at 1050, the results are identical to the Fury X. So while they have obviously improved clock speeds, the first-order pipeline structure seems to be (unsurprisingly) unchanged.

The random texture unit results are strange. This is essentially just another BW test, isn't it?
 
The option is there, though, to make the other tests visible - as is the case with almost all other sub-tests. I've only culled a few texturing results which I felt showed redundant information.

Is there also a polygon test with strips and 0% culling?
 
He says in the video that setting the power management manually disables the GPU's auto voltage, causing it to always pull 1.2 V.
So, during PCGH's testing, the card does indeed throttle down heavily (to about 1269MHz under heavy load). They also found that 1.2 V is used by default for the 1600MHz state. Thus I feel the discussion about other clocks is rather academic at this point. The fact of the matter is that Vega FE consumes such a large amount of power at sustained 1600MHz clocks that, in order for it not to throttle, the Power Target needs to be unlocked, which increases power consumption even further.
http://www.pcgameshardware.de/Vega-...elease-AMD-Radeon-Frontier-Edition-1232684/2/
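As a rough reasoning aid for why the 1600MHz/1.2 V state is so expensive: dynamic power scales roughly with f*V^2. A sketch comparing it to the observed ~1269MHz throttle point - the 1.2 V figure is from PCGH, while the lower-state voltage below is an assumed placeholder, not a measurement:

```python
# First-order CMOS dynamic power model: P ~ C * f * V^2 (the capacitance
# term cancels when comparing two operating points of the same chip).
# 1.2 V @ 1600 MHz is the PCGH-reported default; 1.05 V @ 1269 MHz is an
# assumed placeholder for the throttled state, NOT a measured value.

def relative_dynamic_power(f_mhz: float, v: float, f_ref_mhz: float, v_ref: float) -> float:
    return (f_mhz * v ** 2) / (f_ref_mhz * v_ref ** 2)

print(relative_dynamic_power(1600, 1.20, 1269, 1.05))  # ~1.65x the throttled state
```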
 
To the extent that texture rate and fill BW impact gaming performance, these results invalidate the "it's a pro GPU, not a gaming GPU" argument, and steer the conclusion towards "there's a HW performance issue with a number of units", don't they?

Which leaves the question whether or not it can be solved for the RX.
 
And the product specs for Vega FE also mention 4 triangles per clock.
Given your numbers, we at least know Vega achieves some of its goals in geometry processing (it's 80% faster than Fiji clock for clock in the 100% culled strip test), which defeats the Fiji-drivers argument for good this time.
Which leaves the question whether or not it can be solved for the RX.
AMD hasn't confirmed the number of TMUs yet, even though this should be straightforward information. Supposedly it's the same count as Fiji, unless there are fewer, in which case that could explain the scores we are seeing.
 
Is there also a polygon test with strips and 0% culling?
Not that I know of. I haven't omitted anything except the aforementioned redundant texture fillrate tests.
Edit 17.07.2017: It turns out that, hidden deep in the script files where one script calls tests from another, there are indeed a couple more tests. Now it remains to be seen how useful their results are.
The random texture unit results are strange. This is essentially just another BW test, isn't it?
Yes, compressible (one color) vs. basically non-compressible (random) textures.
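A quick way to see why the two cases behave so differently: a one-color surface collapses to almost nothing under any delta/entropy-style compression, while random texels barely compress at all. A toy illustration using zlib as a stand-in (the real hardware uses block-based delta color compression, not zlib):

```python
# Compressibility of a one-color vs. a random "texture". zlib is only a
# crude stand-in for the GPU's block-based delta color compression; the
# point is just how differently the two workloads stress memory bandwidth.
import os
import zlib

size  = 256 * 256 * 4                          # 256x256 RGBA8 "texture"
solid = bytes([128, 64, 32, 255]) * (256 * 256)
noise = os.urandom(size)

print(len(zlib.compress(solid)) / size)        # tiny fraction -> almost free to move
print(len(zlib.compress(noise)) / size)        # ~1.0 -> full bandwidth cost
```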

Given your numbers, we at least know Vega achieves some of its goals in geometry processing (it's 80% faster than Fiji clock for clock in the 100% culled strip test), which defeats the Fiji-drivers argument for good this time.
That's actually been done with Polaris already.
AMD hasn't confirmed the number of TMUs yet, even though this should be straightforward information. Supposedly it's the same count as Fiji, unless there are fewer, in which case that could explain the scores we are seeing.
No, they did not. I guess everyone automatically assumed the "quad TMU per CU" arrangement did not change. When I first saw the results, and repeated re-runs showed the same numbers, I wondered about possible ineffectiveness of, or issues with, the texture cache. Nvidia did unify that with the L1 data cache... maybe there's unsolved contention here.
 
Something along the lines of being able to work on 11 triangles concurrently
I'll need to double-check, but I recall from the drivers that the scalars used 5 of 16 registers for addressing each wave. That could be where the 11 triangles come from (16 - 5 = 11). A broadcasting limitation to the VALUs, perhaps?

Did that test suite have any of the filtering tests from way back? I'm curious how well that aligns with the theoretical FLOPS.
 
You lost me here.
I recall some tests from a (long) while ago testing filtering on various texture formats: depth, HDR, INT8 rates etc. for point sampling, bi/trilinear, anisotropic, and so on. The theory being that the TMUs had better filtering capability than the ALUs. It might confirm a lack of TMUs if the different rates align with the ALU ratios.
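For reference, the ALU cost of the standard filter kernels scales in a predictable pattern, which is what "rates aligning with the ALU ratios" would look like. Interpolation counts per sample and channel (standard definitions, nothing Vega-specific; the aniso figure is a rough approximation):

```python
# Lerps per sample, per channel, for the standard filter kernels: the
# ratio pattern that ALU-based filtering would be expected to follow.
lerps_per_sample = {
    "point":            0,   # nearest texel, no blending
    "bilinear":         3,   # 2x2 footprint: two x-lerps + one y-lerp
    "trilinear":        7,   # two bilinear taps + one lerp between mip levels
    "2x aniso (tri)":  14,   # roughly two trilinear probes along the anisotropy axis
}
for mode, lerps in lerps_per_sample.items():
    print(f"{mode:>15}: {lerps} lerps")
```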
 
I think it might be pointless to discuss Vega FE as if it were a finished product and the current findings were representative.

My theory is that AMD put out Vega FE to do what it can currently do, for whatever reason. It was not ready for gaming; they didn't have whatever software supports most of the new gaming-relevant hardware ready for launch. This can include the BIOS.
 
That's actually been done with Polaris already.
Don't use these results, it's an old Polaris driver!

AMD hasn't confirmed the number of TMUs yet, even though this should be straightforward information. Supposedly it's the same count as Fiji, unless there are fewer, in which case that could explain the scores we are seeing.
That's not true if the texture units are held back by memory BW, as can be seen from the fact that the numbers are the same for the FE clocked at 1050 and at 1600.
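Spelling the reasoning out: if the TMUs themselves were the limit, texel rate should scale with core clock (~1.52x from 1050 to 1600MHz); identical numbers at both clocks point at a limiter that doesn't move with core clock, i.e. memory bandwidth. A small sketch of that check:

```python
# If a unit is the bottleneck, its throughput scales with its own clock.
# Identical texel-rate results at 1050 and 1600 MHz therefore point away
# from the TMUs and towards something clock-independent (memory BW).

def expected_scaling(clock_a_mhz: float, clock_b_mhz: float) -> float:
    return clock_b_mhz / clock_a_mhz

print(expected_scaling(1050, 1600))  # ~1.52x expected if TMU-bound
# Observed: ~1.0x (same numbers at both clocks) -> consistent with a
# bandwidth-limited texture fill rather than a lower TMU count.
```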

I think it might be pointless to discuss Vega FE as if it were a finished product and the current findings were representative.
It's an expensive, finished, released-to-market product. Even if you don't want to draw RX Vega conclusions, the results are still interesting on their own.

My theory is that AMD put out Vega FE to do what it can currently do, for whatever reason. It was not ready for gaming; they didn't have whatever software supports most of the new gaming-relevant hardware ready for launch.
You'd have a point if only fill rate were an issue: you could blame the tiler not being enabled for that. Maybe.
But that doesn't hold for the texture units. AFAIK, there are no tiler consequences there.
 
I recall some tests from a (long) while ago testing filtering on various texture formats: depth, HDR, INT8 rates etc. for point sampling, bi/trilinear, anisotropic, and so on. The theory being that the TMUs had better filtering capability than the ALUs. It might confirm a lack of TMUs if the different rates align with the ALU ratios.
Alas, that's a rather ancient OpenGL test. I'll see if I can run it tomorrow in the office. But IIRC the results were... strange for a couple of other cards a few years back, so I stopped using it on a regular basis. I still don't see, however, how I could correlate certain filtering modes to the ALUs, unless the results between filtering modes differ wildly from those on Fiji/Polaris - which the ones tested with the modern B3D suite do not indicate.
 