AMD Vega Hardware Reviews

It would have been cool if Vega were an amazing product that forced NV to bring GP100 to us. Ryzen vs Intel style apocalyptic stuff. :) Maybe if it had been a bit faster than GP102.
 
Abysmal? At worst it lost by just a tad, and even then probably mostly due to lower memory bandwidth (no, they didn't compensate for that); at best it beat Fury X silly. I'm not sure how that's "abysmal".

Losing by a tad, which it did in games, is an awful result.
Fiji + Polaris improvements + Vega improvements (excluding clock speed increase and FP16) = worse than Fiji???

Of course, Vega FE may not have had good drivers, which is why I'm asking about RX.

As for bandwidth, HBM2 is more efficient than HBM1 and should make up for the small loss in peak throughput. It was also AMD's gamble to cut the number of stacks in half and rely on Hynix and Samsung being able to double the memory speed.
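For reference, the peak figures behind this point can be worked out directly. The sketch below assumes publicly listed specs that are not stated in the thread (Fury X: four HBM1 stacks at 1.0 Gbps per pin; RX Vega 64: two HBM2 stacks at 1.89 Gbps per pin; a 1024-bit interface per stack). The function name is purely illustrative.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) * per-pin data rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8.0


# Assumed specs (not from the thread): 1024-bit interface per HBM stack.
BITS_PER_STACK = 1024

fury_x = peak_bandwidth_gbs(4 * BITS_PER_STACK, 1.0)    # 4 stacks of HBM1
vega_64 = peak_bandwidth_gbs(2 * BITS_PER_STACK, 1.89)  # 2 stacks of HBM2

# Fraction of Fury X peak bandwidth that Vega 64 gives up under these assumptions.
deficit = 1.0 - vega_64 / fury_x

print(f"Fury X:  {fury_x:.2f} GB/s")
print(f"Vega 64: {vega_64:.2f} GB/s")
print(f"Deficit: {deficit:.1%}")
```

Under those assumptions the peak deficit is in the mid single digits, which is the "small loss in peak throughput" being discussed; it says nothing about effective bandwidth, which depends on efficiency and compression.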
 
You can't cull generically before tessellation. You don't have any triangles until after DS.
You could if you established an approximation of an object and quickly transformed that, culling entire patches, strips, etc. at a coarse level if they were nowhere close to being in the scene. The application should do that, but better to assume otherwise, as that would be a huge win.
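As a rough illustration of the coarse-level idea (not of anything Vega's hardware or driver is known to do), here is a minimal CPU-side sketch that rejects whole patches by testing a bounding sphere against frustum planes. The plane convention, the dictionary layout, and all names are assumptions made for the example; the radius is assumed to already include the maximum displacement tessellation could add.

```python
from typing import Dict, List, Sequence, Tuple

# Plane as (nx, ny, nz, d) with the normal pointing into the frustum:
# a point p is on the inside when dot(n, p) + d >= 0.
Plane = Tuple[float, float, float, float]


def sphere_outside_frustum(center: Sequence[float], radius: float,
                           planes: Sequence[Plane]) -> bool:
    """True if the sphere lies entirely behind at least one plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return True
    return False


def cull_patches(patches: List[Dict], planes: Sequence[Plane]) -> List[Dict]:
    """Keep only patches whose conservative bound may touch the frustum."""
    return [p for p in patches
            if not sphere_outside_frustum(p["center"], p["radius"], planes)]


# Hypothetical box-shaped "frustum": -1 <= x, y, z <= 1.
box_planes: List[Plane] = [
    (1, 0, 0, 1), (-1, 0, 0, 1),
    (0, 1, 0, 1), (0, -1, 0, 1),
    (0, 0, 1, 1), (0, 0, -1, 1),
]

patches = [
    {"name": "inside", "center": (0.0, 0.0, 0.0), "radius": 0.5},
    {"name": "straddling", "center": (1.5, 0.0, 0.0), "radius": 1.0},
    {"name": "far_away", "center": (5.0, 0.0, 0.0), "radius": 1.0},
]

kept = [p["name"] for p in cull_patches(patches, box_planes)]
print(kept)
```

The test is conservative: a patch that straddles a plane is kept, so only patches that cannot contribute any triangles are dropped before tessellation would run.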

It would have been cool if Vega were an amazing product that forced NV to bring GP100 to us. Ryzen vs Intel style apocalyptic stuff. :) Maybe if it had been a bit faster than GP102.
Always the x2 with Infinity! Would actually leave Vega in its comfort zone. Besides, why even bother with all the PCIe lanes on Threadripper if you don't intend to fill them with GPUs? Or 2P Epyc with a stupid configuration.
 
Does anyone know of anyone who tested RX Vega vs Fury X at the same clock speed? I know GamersNexus did the test with Vega FE (and those results were abysmal).

ComputerBase did: a 6% improvement. They also tested with HBCC on and off, with no improvement when averaged over all titles:

https://www.computerbase.de/2017-08/radeon-rx-vega-64-56-test/7/

The card looks to be bandwidth starved; most people are recommending leaving the core alone and just overclocking the memory. MSAA performance is also bad, which seems to be the reason why AMD is ahead in a game at one site while behind at another.

Some people are getting 19xx MHz on core overclocks; not sure if it's a bug or not.

Edit: almost 2 GHz,

http://www.3dmark.com/fs/13372488
 
It might be worth systematically reviewing the behavior of using MSAA. Multi-sampling lends itself well to compression, which presumably would help mitigate its effect if HBM bandwidth were the issue.

The commentary was that various shader-based techniques were faster; did that include hybrid modes that modified sample count or patterns?
It might reflect a change in the burden it places on other parts of the GPU besides the memory bus, like the tiling method, L2, or perhaps how optimistic the primitive shader can be for culling.
 
ComputerBase did: a 6% improvement. They also tested with HBCC on and off, with no improvement when averaged over all titles:

https://www.computerbase.de/2017-08/radeon-rx-vega-64-56-test/7/

The card looks to be bandwidth starved; most people are recommending leaving the core alone and just overclocking the memory. MSAA performance is also bad, which seems to be the reason why AMD is ahead in a game at one site while behind at another.

Are we sure that it's bandwidth starved?

Because GN's Vega 56 review suggests otherwise. They get a 12% increase from raising the power limit. Then they overclock the HBM by nearly 20%, which results in a pathetic 3.6% improvement. That's not memory bandwidth bound.
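The arithmetic behind that argument can be sketched as a simple scaling ratio: performance gain divided by the gain in the resource being overclocked. A fully bandwidth-bound workload would sit near 1.0. The function name is illustrative, and the inputs are only the approximate figures quoted in this post ("nearly 20%" is treated as 20%).

```python
def scaling_efficiency(resource_gain_pct: float, perf_gain_pct: float) -> float:
    """Ratio of performance gain to resource gain.

    ~1.0 means performance tracks the resource almost linearly (bound by it);
    values near 0 mean the resource is not the limiting factor in that test.
    """
    if resource_gain_pct <= 0:
        raise ValueError("resource gain must be positive")
    return perf_gain_pct / resource_gain_pct


# Figures as quoted above: ~20% HBM overclock, 3.6% performance improvement.
hbm_efficiency = scaling_efficiency(20.0, 3.6)
print(f"HBM overclock scaling efficiency: {hbm_efficiency:.2f}")
```

The same ratio cannot be computed for the 12% power-limit gain from the numbers given here, since the size of the power-limit increase is not quoted.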
 
Some people are getting 19xx MHz on core overclocks; not sure if it's a bug or not.

Edit: almost 2 GHz,
Going by FE overclocking, those were a fluke, as the card starts inserting NOPs and otherwise idling. So clocks go higher and performance decreases. You have to actually test something to confirm the overclock is productive. Pascal does something similar.

Out of curiosity, has anyone tried overclocking the PCIe bus? Just to see if it's tied to Infinity and holding back memory speeds. Sounds strange, but the lanes are somewhat interchangeable. The whitepaper did say "control over operating frequencies with a third clock domain, beyond the graphics core and memory domains."

It might reflect a change in the burden it places on other parts of the GPU besides the memory bus, like the tiling method, L2, or perhaps how optimistic the primitive shader can be for culling.
I'm guessing it messed up the tile size calculation, a setting that could be off even without AA adding samples, and so spilled to RAM.
 
Are we sure that it's bandwidth starved?

Because GN's Vega 56 review suggests otherwise. They get a 12% increase from raising the power limit. Then they overclock the HBM by nearly 20%, which results in a pathetic 3.6% improvement. That's not memory bandwidth bound.
If the card is running at the limit of performance due to power related throttling, then increased bandwidth should make nearly no difference to performance.

If you want to quote data and have a serious discussion, it's best to link to the data.

Honestly, I think Vega is a waste of time, so I'm not going to go googling to find the data you've referenced or to look for other data that supports or contradicts it.
 
What I'm saying is even a vertex shader gets turned into a primitive shader by the driver, therefore primitive shaders are almost always used. Unless you find a way to skip it.
In Vega running the primitive shader is not required. You can still run a VS like with previous architectures.

Does Vega still have the primitive discard accelerator introduced in Polaris or would the primitive shaders (in theory) make that obsolete?
They coexist, though a primitive shader can make the culling further down the pipeline unnecessary, depending on whether the algorithms match exactly.
 

I'm taking that to mean AMD ultimately makes everything a primitive shader for the purpose of driver-side optimizations, even if it only performs the standard pipeline functions. Therefore primitive shaders are always enabled. I don't recall Mantor mentioning primitive shaders specifically in that interview, only laying out an optimization (deferred attribute interpolation) that would only work with a primitive shader. So I'm deducing primitive shaders are enabled, only usable by AMD currently, and, given the shape of everything, not optimal. I realize I'm hedging a lot there, but that's what's been presented that I've seen. Like I said above, it's not clear where the stages begin and end, as they don't follow the standard pipeline structure. In the Linux driver they were creating giant monolithic shaders spanning many steps, so primitive and DSBR may very well be the same shader.
I am aware of Rys's tweet, but I also see the footnotes in the architecture whitepaper and ComputerBase's assertion.

DSBR, according to the whitepaper, works with the Energy benchmark. So there's at least that.
I would be the last one to argue that in the case of Vega FE's launch driver and, with very high probability, also the currently available drivers for RX Vega.

welp the 56 is cheaper than i can find a 1070 for and is faster […]
To be fair, tell us where you could find a Vega 56 already. I want one too!

--
FWIW, I've spent the better part of Monday getting Vega to run at mostly constant, repeatable clock rates for the few B3D Suite tests that compare clock for clock to Fiji. I had to adjust for almost every single test via trial and error, with WattMan and GPU-Z reporting different clock speeds, or with the ETH hash rate repeatably jumping up by 2 MH/s when I undervolted the memory on Vega 56. I can only guess how people are able to run their entire benchmark suites at certain overclock speeds.
 
Buildzoid on Reddit is where I saw it, along with a few other accounts as linked above. He took an FE straight to 1900 MHz with no performance impact. I only have my phone at the moment, so I can't find it.

I am aware of Rys's tweet, but I also see the footnotes in the architecture whitepaper and ComputerBase's assertion.
I realize that, but there seems to be a distinction between what optimizations are enabled. If the most basic primitive shader is a simple merger of the first two stages then primitive shaders are "enabled". Those shaders with the 17 primitives per clock, improved culling, deferred attributes, etc may not be. Hence we see conflicting results based on that definition. Being enabled for only one test is a rather weak example of enabled, but does meet the criteria. So one person gets a yes, another a no, but "no" in regards to increased primitive rate or other ability.
 
Going by FE overclocking, those were a fluke, as the card starts inserting NOPs and otherwise idling. So clocks go higher and performance decreases. You have to actually test something to confirm the overclock is productive. Pascal does something similar.
This is interesting. I have never heard of anything like this happening on NVIDIA cards (or any other cards for that matter), though I do remember reading that GDDR5 had a feature like that.
 
Are we sure that it's bandwidth starved?
ComputerBase's testing on overclocking memory alone, core clock alone, or both seems to indicate that overclocking HBM2 alone produces significant gains across 17 games.

[Image: 5uBstbg.png]
 
They use the 150% Power Target for OC, so you'd have to select the RX Vega 64 Max as baseline.
Based on that, two titles score +6% and +7% respectively (Titanfall 2 and Watch Dogs 2); the rest gain 1-5%.
Fair enough, but it is still gaining more from memory overclocking than core overclocking. Do you not think that Vega is bandwidth starved then?
 
Fair enough, but it is still gaining more from memory overclocking than core overclocking. Do you not think that Vega is bandwidth starved then?
Given the fun I had trying to apply WattMan settings and get the card to actually behave accordingly, I would reserve my personal judgement, if I may, until that tool works as intended for Vega.

I would also say it depends very much on the use case. For Ethereum mining, I would concede that Vega is indeed very much limited by memory transfer rate and timings. ;)
 