AMD Vega Hardware Reviews

silent_guy · Aug 15, 2017

The fact that some memory BW tests considerably improved between the FE and the RX version makes me wonder if we're looking at some infinity fabric issue.

The whitepaper states that everything is connected through it, including L2 caches and HBM memory controllers. I assume that included the high BW data link and not just configuration bus.

That surprises me a little bit if L2 and MC are usually joined at the hip.

A very high BW fabric sounds like the thing that needs a lot of tuning, with many corner cases where things can go wrong, but impossible to fix with an ECO.

yuri · Aug 15, 2017

Malo said:
imo the decision to stay with the same 4 compute engine and rop ratio as Fiji has killed Vega and makes any new GCN doa as well.

There were 4 SEs + 64 ROPs on Hawaii already. Sure, compute is on the rise and the ROPs got improved, but still...

CarstenS · Aug 15, 2017

ToTTenTranz said:
You'll have to compare the R9 Nano to the Vega Nano, not the high-end/high-consumption cards based on Vega 10.

The discussion revolved around whether or not Vega was beyond its optimal power curve. One means of judging that was a comparison with Fury X. But when people demand that Vega be compared using a power profile that explicitly narrows this gap to the ideal point, either the comparison to a card that is still beyond that point becomes invalid or the whole point becomes moot.

For default power profiles, I agree with you that the comparison should be to Fury X - or R9 Nano to Vega Nano, for that matter.

Rasterizer said:
Thanks to you as well. Do either of you have any ideas for why Vega 64 still shows a significant regression in effective texture bandwidth vs. Fiji? Or to what extent primitive shaders, primitive culling and DSBR are actually working in drivers at the moment (which may be a closely related question)?

I think I have expressed a suspicion already. Maybe it is due to the very short duration of that specific test and Vega's progression through the DPM states. I have not yet found a way to increase the duration of this particular micro benchmark.

sir doris · Aug 15, 2017

Malo said:
https://www.overclock3d.net/news/gpu_displays/amd_s_rx_64_launch_pricing_was_only_for_early_sales/1

WTF. So the $499 price for Vega 64 was only a launch discount for under an hour and now the actual price is $599? That's disgusting.

That is poor form. I was very tempted by the £450 Vega 64s, but in the end decided not to 'cause I want to continue (very small time) mining for a little while. I really can't see them selling for the £550 - £600 being asked for the standard air cooled Vega 64s now. Unsurprisingly there are still some available for those prices. Maybe AMD is expecting a decent increase in performance soon...

Anarchist4000 · Aug 15, 2017

Malo said:
https://www.overclock3d.net/news/gpu_displays/amd_s_rx_64_launch_pricing_was_only_for_early_sales/1

WTF. So the $499 price for Vega 64 was only a launch discount for under an hour and now the actual price is $599? That's disgusting.

Supply and demand. Might not be popular, but if selling everything it's economics. Just need to hope AMD is making a lot of cards. Can't fault them for making more money if production is at capacity.

Moloch · Aug 15, 2017

mrcorbo said:
What are you talking about? You can (and I do) use DXVA decoding in LAV and output that via madVR just fine. Have you tried software decoding HEVC or VP9 with high-bitrates? Not so great. Also, streaming video. I can watch even 8K (non-HDR) videos on YouTube on my 1060 without a single stutter. Try that with CPU decoding.

I knew that, I hadn't had my coffee before commenting

Given the (low) bitrates YouTube uses, I'm not sure you make a case for 8K streaming however; I can do that on my 980 Ti, which obviously doesn't support 8K HEVC.

Geeforcer · Aug 15, 2017

Anarchist4000 said:
Supply and demand. Might not be popular, but if selling everything it's economics. Just need to hope AMD is making a lot of cards. Can't fault them for making more money if production is at capacity.

That would depend: is the price set by vendors selling above the MSRP, or is this the actual IHV pricing strategy: launch_price = MSRP - if(today() < launch_date + 10 or units_sold < 1,000, 100, 0)? The former is market driven, the latter is premeditated.
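The spreadsheet-style formula in this post can be written out as a small Python sketch. The function name, parameter names and the example dates and prices below are illustrative assumptions; only the 10-day window, the 1,000-unit threshold and the $100 discount come from the post itself, and nothing here is a confirmed AMD policy.

```python
from datetime import date, timedelta


def launch_price(msrp, today, launch_date, units_sold,
                 window_days=10, unit_cap=1000, discount=100):
    """Hypothetical 'premeditated' pricing rule from the post.

    The discount applies while EITHER condition holds: we are still
    inside the launch window, OR fewer than `unit_cap` units have sold.
    """
    in_window = today < launch_date + timedelta(days=window_days)
    under_cap = units_sold < unit_cap
    return msrp - (discount if (in_window or under_cap) else 0)


# Illustrative values only (assumed MSRP of 599 and an assumed launch date).
launch = date(2017, 8, 14)
print(launch_price(599, date(2017, 8, 14), launch, units_sold=0))     # 499
print(launch_price(599, date(2017, 9, 1), launch, units_sold=5000))   # 599
```

Because the two conditions are joined with "or", the discount in this sketch only disappears once the window has passed and the unit threshold has been reached.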

3dilettante · Aug 15, 2017

silent_guy said:
The fact that some memory BW tests considerably improved between the FE and the RX version makes me wonder if we're looking at some infinity fabric issue.

The whitepaper states that everything is connected through it, including L2 caches and HBM memory controllers. I assume that included the high BW data link and not just configuration bus.

That surprises me a little bit if L2 and MC are usually joined at the hip.

A very high BW fabric sounds like the thing that needs a lot of tuning, with many corner cases where things can go wrong, but impossible to fix with an ECO.

The fabric was described as being implemented as a mesh in Vega, and having it sit just outside the L2 and between the slices and channels and other hardware was where I thought it could be placed with the least disruption.
The whitepaper seems to indicate the traditional L1/L2 crossbar between the CUs was maintained.

Some of the patch notes were uncertain about the number of memory channels being listed for the architecture but the L2 was listed as having 16 slices and it's 16 channels of HBM2. If Vega's mesh has the L2 slices on one side and the channels on the other, it seems like a relatively straight trip most of the time.

The fact that the fabric is its own clock domain versus the GPU component and the HBM domain may be complicating factors. It would seem from Ryzen's example that the fabric and memory controllers are something that can improve in a matter of months with firmware updates.

seahawk · Aug 15, 2017

Seeing the real performance of VEGA seems like waiting for Godot

Deleted member 2197 · Aug 15, 2017

Geeforcer said:
That would depend: is the price set by vendors selling above the MSRP, or is this the actual IHV pricing strategy: launch_price = MSRP - if(today() < launch_date + 10 or units_sold < 1,000, 100, 0)? The former is market driven, the latter is premeditated.

If vendors were selling above MSRP, there would be no uniform increase. If it were an IHV strategy, prices would uniformly increase by a certain amount ($100).

Esrever · Aug 15, 2017

DrYesterday said:
Ryzen seems to do fine on the GF 14nm process. The 1800X, in a Prime95 torture test, only draws 112W @ 3.8GHz. Is it possible that the GF process is good for CPUs & poor for GPUs? Or did AMD just do a poor job with Vega?

Ryzen couldn't push the power envelope even if it wanted to. 4 GHz is about the max clock for the process no matter how much voltage you put in, which limits what AMD can do to push performance by sacrificing power consumption.

Anarchist4000 · Aug 15, 2017

Geeforcer said:
That would depend: is the price set by vendors selling above the MSRP, or is this the actual IHV pricing strategy: launch_price = MSRP - if(today() < launch_date + 10 or units_sold < 1,000, 100, 0)? The former is market driven, the latter is premeditated.

Gibbo said IHV pricing with initial rebates for the launch prices was higher and he argued for lower prices. So it's AMD making more from miners in this case and he suggested prices may increase in the future. Leaving consumers to the bundles. May also be a change to warranty terms because of mining, but speculating there.

He also limited to 3pc for the first 10m and backed down to 1pc because they were moving so fast. Posted in a forum thread so you can go find it if interested. Don't have a link handy.

Anarchist4000 · Aug 15, 2017

Esrever said:
Ryzen couldn't push the power envelope even if it wanted to. 4 GHz is about the max clock for the process no matter how much voltage you put in, which limits what AMD can do to push performance by sacrificing power consumption.

Not necessarily, Threadripper can overclock higher, which seems odd, but that's where the higher binned chips went. Best likely going to Epyc. Still only hitting low 4GHz, but a lot for 16 cores.

Kaotik · Aug 15, 2017

Esrever said:
Ryzen couldn't push the power envelope even if it wanted to. 4 GHz is about the max clock for the process no matter how much voltage you put in, which limits what AMD can do to push performance by sacrificing power consumption.

Now now, with LN2 you can push a Threadripper to 5.5 GHz stable enough for some benching (ye ye, I know, LN2 changes things, but regardless it's possible)

Anarchist4000 · Aug 15, 2017

3dilettante said:
The fact that the fabric is its own clock domain versus the GPU component and the HBM domain may be complicating factors. It would seem from Ryzen's example that the fabric and memory controllers are something that can improve in a matter of months with firmware updates.

With a mesh it's probably ok so long as the mesh is always clocked higher. Links would generally be idle with no contention. Still a weird design as I thought they said it was connecting GPU clusters. This is more like a GPU with some other components attached versus an MCM style chip like Ryzen.

Rootax · Aug 15, 2017

Do we have any knowledge about the efficiency of Infinity Fabric links? Is this a good technology, or is it hell to manufacture chips with this tech in them? How much power does it need for a very high bandwidth use like Vega, etc.? Can Infinity Fabric be what's wrong with Vega?

mrcorbo · Aug 15, 2017

Moloch said:
I knew that, I hadn't had my coffee before commenting
Given the (low) bitrates YouTube uses, I'm not sure you make a case for 8K streaming however; I can do that on my 980 Ti, which obviously doesn't support 8K HEVC.

LOL, I can relate to that.

I have found some clips on YouTube that my system (6-core/12-thread @ 4 GHz) couldn't play back smoothly in a browser that didn't support hardware acceleration, but played just fine in one that did. I'll see if I can dig one up when I get home. 4K+HDR videos can make my system struggle as well since VP9 profile 2 isn't accelerated on the 1060.

Even when playback performance is OK, it's still nice to not have to spend that many CPU cycles on something that a dedicated decoder can do with a couple of watts and without breaking a sweat.

What's interesting is that while AMD spent zero engineering resources on raising their video decode support to be on par with Nvidia and Intel, they did spend some on enabling encode/decode virtualization support, something that is super important in the consumer space.

With Radeon Virtualized Encoding, “Vega” 10 GPUs can provide hardware-encoding acceleration for up to 16 simultaneous user sessions. This capability should make “Vega” 10 especially well-suited to hosting sessions in multi-user virtualized applications with graphically intensive workloads, such as enterprise remote workstations and cloud gaming.

Anarchist4000 · Aug 15, 2017

Rootax said:
Do we have any knowledge about the efficiency of Infinity Fabric links? Is this a good technology, or is it hell to manufacture chips with this tech in them? How much power does it need for a very high bandwidth use like Vega, etc.? Can Infinity Fabric be what's wrong with Vega?

It could be more efficient, but it's simple SERDES like PCIE and most interconnects. Also why it doubles for PCIE lanes on Epyc. Only difference is protocol and I doubt Infinity is the issue here. Issue is more likely compiler and drivers being immature with voltage pushed higher than intended. They shipped what they had as mining and compute were working relatively well. I'd even question if the chip was designed with high density libraries for lower clockspeeds. Then add silicon in higher margin markets from there. As a large APU it would likely be rather efficient at nominal voltages.

3dilettante · Aug 15, 2017

Anarchist4000 said:
With a mesh it's probably ok so long as the mesh is always clocked higher. Links would generally be idle with no contention. Still a weird design as I thought they said it was connecting GPU clusters. This is more like a GPU with some other components attached versus an MCM style chip like Ryzen.

AMD did not say much beyond indicating Vega had the fabric implemented as a mesh, and that it matched the full bandwidth of the memory controller.
Indicating that its bandwidth is on the order of the HBM interface gives a general ceiling and floor to its bandwidth.
I had doubts it would integrate between the CUs and L2 because it would have been a massive drop in bandwidth.
At 16 slices and say 1.7 GHz, the mesh would have strangled the CU array with bandwidth 3.4x too low.
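The 3.4x figure can be reproduced with a short back-of-the-envelope calculation. The 16 slices and 1.7 GHz come from the post; the 64 bytes per clock per L2 slice and the 512 GB/s mesh bandwidth (a 2048-bit HBM2 interface at a nominal 2 Gbps per pin) are assumptions, picked here because they happen to yield the quoted ratio. The post does not state them.

```python
# Values taken from the post.
L2_SLICES = 16
CLOCK_GHZ = 1.7

# Assumptions, not stated in the post.
BYTES_PER_CLK_PER_SLICE = 64   # one 64-byte line per slice per clock
MESH_GB_PER_S = 512.0          # mesh matched to nominal HBM2 bandwidth


def l2_bandwidth_gb_per_s(slices, bytes_per_clk, clock_ghz):
    """Aggregate L2 bandwidth in GB/s (bytes/clk * Gclk/s = GB/s)."""
    return slices * bytes_per_clk * clock_ghz


l2 = l2_bandwidth_gb_per_s(L2_SLICES, BYTES_PER_CLK_PER_SLICE, CLOCK_GHZ)
ratio = l2 / MESH_GB_PER_S
print(f"L2 side: {l2:.1f} GB/s, mesh: {MESH_GB_PER_S:.1f} GB/s, "
      f"shortfall: {ratio:.1f}x")
```

Under these assumptions the L2 side works out to about 1740.8 GB/s, or roughly 3.4 times what a mesh sized to the memory interface could carry.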

Looking at the strip of hardware that appears to match AMD's diagrams for the IF section, it runs across one side of the GPU between the ROPs and HBCC. There appears to be one rectangle per channel, which may make that one of the broader sides of the mesh, with the L2 segments being on the other. The remainder could hang off the narrow end.
Creating enough blocks to handle 64 CUs would have plugged multiple times that area right in the middle of the GPU.

Rootax said:
Do we have any knowledge about the efficiency of Infinity Fabric links? Is this a good technology, or is it hell to manufacture chips with this tech in them? How much power does it need for a very high bandwidth use like Vega, etc.? Can Infinity Fabric be what's wrong with Vega?

Infinity Fabric uses a superset of HyperTransport, which has 8-12 bytes of command overhead per 64-byte transfer. The fabric likely has more complexity in its individual switch points, given its variable topology composed of point-to-point links. The independent clocking likely injects some latency (shouldn't matter that much in this instance).
I would assume this is all measurably higher-overhead than the straight and generally dumb direct link from past GPUs, although in a forward-looking sense it can provide more options than said dumb link.
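To put the 8-12 byte command overhead in perspective, here is a minimal sketch of the resulting payload efficiency. It assumes the command bytes are sent in addition to the 64-byte payload and ignores any other framing, CRC or flow-control traffic, so treat the numbers as a rough upper bound rather than a measured figure.

```python
def payload_efficiency(payload_bytes, overhead_bytes):
    """Fraction of link traffic that is payload, assuming the command
    overhead is carried on top of the payload (an assumption)."""
    return payload_bytes / (payload_bytes + overhead_bytes)


for overhead in (8, 12):
    eff = payload_efficiency(64, overhead)
    print(f"{overhead:2d} B overhead per 64 B transfer -> {eff:.1%} payload")
```

With these assumptions the link spends roughly 84% to 89% of its raw bandwidth on data, which is overhead a simple dedicated L2-to-memory-controller link would not pay.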

The overhead aside, it seems like the fabric is robust enough, and its protocol and implementation should be well-understood given its mature roots. At least in theory, Vega 10 shouldn't be really stretching many of its capabilities.

swaaye · Aug 15, 2017

HardOCP noticed some large performance drops with MSAA and SSAA that significantly affected the card's positioning in results. Deus Ex, DIRT 4 and Tomb Raider were tested. Any other reviews look at MSAA performance? Seems a little curious for a card with gobs of memory bandwidth. So R600.

https://www.hardocp.com/article/2017/08/14/amd_radeon_rx_vega_64_video_card_review/17

Based on our testing there is indication that MSAA is detrimental to AMD Radeon RX Vega 64 performance in a big way. In three separate games, enabling MSAA drastically reduced performance on AMD Radeon RX Vega 64 and the GTX 1080 was faster with MSAA enabled. In Deus Ex: Mankind Divided we enabled 2X MSAA at 1440p with the highest in-game settings. The GeForce GTX 1080 was faster with 2X MSAA enabled. However, without MSAA, the AMD Radeon RX Vega 64 was faster. It seems MSAA drastically affected performance on AMD Radeon RX Vega 64.

In Rise of the Tomb Raider we enabled 2X SSAA at 1440p. Once again, we see AMD Radeon RX Vega 64 drop in performance. GeForce GTX 1080 was faster with 2X SSAA compared to Radeon RX Vega 64 with SSAA. Finally, Dirt 4, which is playable at 8X MSAA, was faster on GTX 1080.

Hardware Canucks ran DXMD with 2x MSAA but in D3D 12 mode instead of D3D 11. This has Vega slightly ahead. Performance with the game overall is higher with D3D 11, however, so I'm not sure what to make of that.
http://www.hardwarecanucks.com/foru...-rx-vega-64-vega-56-performance-review-6.html
