AMD Navi Product Reviews and Previews: (5500, 5600 XT, 5700, 5700 XT)

Power limit up to +90%, and also higher voltages up to 1.25 V. Does your card actually hit those clocks under load? I found overvolting to 1.25 V to be necessary for sustained clocks above 2100 MHz. In some games a lower voltage is fine, but those on the heavier side (Ashes of the Singularity is a prime example) demand 1.25 V for the GPU to remain stable.

Weird, my card came with 1.238 V as the max and 1.2 V as the default out of the box. It does around 2100 MHz when set to 2150 MHz in WattMan, give or take 20 MHz. More voltage doesn't help in any appreciable way; the frequency graphs hover similarly around 2100 MHz.

I tested with Superposition, Heaven and Doom 2016; it's stable so far at 2150 MHz, 1.141 V and 900 MHz on the memory. I played some Warframe too lol.
 
That's a lot of production variability, even beyond the upcoming factory-overclocked 5700 XT, for which I assume they already bin.
 
I've tried to replicate what some of the reviewers achieved on the RX 5700 XT with software PowerPlay tables, and the results were quite disappointing. The extra power limit helped boost the clocks from 1750-1850 MHz all the way into the 2100+ MHz region, but that only amounted to gains of roughly 5% at best in gaming performance. Does anybody else have access to Navi cards to corroborate my results?
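In case anyone wants to try the same route, here's a minimal sketch of how the soft PowerPlay table gets applied on Windows, as I understand it from the community tools. The registry path uses the standard display-adapter class GUID, but the adapter instance (0000 here) and the table blob itself (navi10_spt.bin) are placeholders you'd have to supply for your own system; back up the key first and double-check the PP_PhmSoftPowerPlayTable value name against whatever tool generated your table.

```python
# Minimal sketch: apply a soft PowerPlay table on Windows (run as administrator).
# Assumptions: your adapter lives under instance "0000" of the display class key,
# and "navi10_spt.bin" is a PowerPlay table blob you prepared yourself.
import winreg

ADAPTER_KEY = (r"SYSTEM\CurrentControlSet\Control\Class"
               r"\{4d36e968-e325-11ce-bfc1-08002be10318}\0000")

with open("navi10_spt.bin", "rb") as f:   # hypothetical table file
    table = f.read()

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ADAPTER_KEY, 0,
                    winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "PP_PhmSoftPowerPlayTable", 0,
                      winreg.REG_BINARY, table)

# A driver restart (or reboot) is needed before the new limits take effect.
```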

I'm inclined to blame the driver, considering that switching from Adrenalin 19.5.2 to 19.7.1/2 also had a negative impact on GCN performance, quite significant in some games such as Final Fantasy XV.

This review, with a ~2.2 GHz core, also sees a similar lack of scaling with core clock. It looks like either the reported clocks are misleading or memory bandwidth is now a huge bottleneck.


If it's the latter, then AMD could have a decent refresh with higher clocks on a more mature process and faster GDDR6.
 
The IHVs try to maximise profits, but they also have to produce cards that are attractive on the market and actually sell in order to have any profits at all. And in today's compressed market, very small differences make a large difference in perception: an 11% price reduction or a 10% performance increase has a huge impact on how a product is perceived through the media. So it makes sense on many levels that the RX 5700 and 5700 XT come balanced for performance and cost. The upper card in particular pushes power draw for clocks, and it doesn't get the bandwidth to proportionally support them. So it makes perfect sense that if you push the core clocks even further, still without correspondingly scaling the memory system that feeds them, that bandwidth constraint will bite into performance.
There is no wrongdoing here, really. It is just a natural consequence, and it will be more apparent in more bandwidth-constrained cases.
 
Well, the RTX 2080 does just fine with the same RAM configuration as the RX 5700 XT. If memory bandwidth is indeed the limiting factor for Navi at higher clocks, it has something to do with the RDNA architecture itself. RDNA is inherently more vulnerable to stalls caused by off-chip memory transactions compared to GCN. In GCN, one of the benefits of the 4-cycles-per-wavefront instruction rate is hiding RAM latency (as opposed to the native one-wavefront-per-cycle mode in RDNA). AMD beefed up the whole cache architecture in Navi to account for that, but it may still be less efficient in that regard than Nvidia's Turing. Still, I wouldn't dismiss the possibility of significant driver optimization in the future. In particular, the driver gets to choose whether shaders are compiled for 32-wide or 64-wide wavefronts, the latter being theoretically less demanding on memory bandwidth. I also observed pretty poor 99th-percentile frame times in some games on the overclocked RX 5700 XT beyond 2100 MHz, which to me screams driver problems.
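To illustrate why the issue cadence matters, here's a deliberately crude back-of-the-envelope model, not a description of the actual schedulers: it just counts how many cycles of independent ALU work the other resident waves can put on a SIMD while one wave waits on memory, using the 4-cycle wave64 cadence of GCN versus the 1-cycle wave32 cadence of RDNA. The wave count, instruction count and 300-cycle latency are made-up illustration numbers.

```python
# Crude latency-hiding illustration (made-up numbers, not real scheduler behaviour).
def cycles_hidden(resident_waves, alu_instrs_per_wave, cycles_per_instr):
    # While one wave waits on memory, the remaining resident waves can keep the
    # SIMD busy with their own independent ALU instructions.
    return (resident_waves - 1) * alu_instrs_per_wave * cycles_per_instr

MEM_LATENCY = 300  # assumed round-trip latency in cycles

gcn  = cycles_hidden(10, 12, 4)  # wave64 occupies a GCN SIMD16 for 4 cycles per instruction
rdna = cycles_hidden(10, 12, 1)  # wave32 issues on an RDNA SIMD32 every cycle

for name, hidden in (("GCN", gcn), ("RDNA", rdna)):
    verdict = "latency covered" if hidden >= MEM_LATENCY else "SIMD stalls"
    print(f"{name}: {hidden} cycles of ALU work vs {MEM_LATENCY} cycles of latency -> {verdict}")
```

In reality RDNA can keep more waves resident per SIMD and leans on its larger caches, which is exactly the trade-off being discussed; the point is only that each in-flight instruction buys fewer cycles of cover.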
 
Before getting into a discussion, let's make sure that we are on the same page. There is no "limiting factor" brick wall when rendering a game. Various parts of the process are limited by different factors, making total performance a statistical phenomenon. Increasing ALU performance will always bring benefits, but they will diminish the further up you go, as other limitations increasingly dominate. The same goes for the other factors. If you doubled the bandwidth, you wouldn't double the performance, because your change would only affect the bandwidth-limited parts of the total process.
OK, if this is the case (it is), let's look at the problem at hand. If we consider the RX 5700 XT, it has the same bandwidth available to it as the RX 5700, but 18-20% higher nominal (note: nominal is the problem area) ALU capability. It provides 13% or so higher performance, depending on game and settings. So if you push the ALU capability up another 20%, how much of an improvement would you typically expect? Well, the answer HAS to be "less than 13%", because the proportion of frame time spent on the bandwidth-limited parts has to increase. Just how much less will again depend on the specific settings/game.
Two factors that make a detailed analysis tricky: actual frequencies and frequency differences depend on game, cooler, and temperature, so they are no longer fixed. Also, drivers.
Those two factors come into play when trying to figure out if, and if so to what degree, Nvidia cards are different (your statement that the RTX 2080 is "just fine"). The same formal reasoning still applies: the more you increase ALU capability, the larger the proportion of time spent elsewhere. Diminishing returns.
The RTX 2060 Super and RTX 2080 have the same bandwidth, and a difference in (again, nominal) ALU capability of 47.5% (Founders Edition). The same caveats regarding titles and settings apply. The average performance difference is approximately 25%, going by sites that run a large number of benchmarks and average them.
Comparing the RTX 2070 Super to the RTX 2080 FE, we see a difference in nominal ALU of 17% and a performance difference of 8%, again using averages across several dozen tests.
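To put a number on the "has to be less than 13%" point, here's a toy two-bucket frame model, an illustration rather than a claim about how the hardware behaves: a frame is split into an ALU-bound part and a bandwidth-bound part, the split is fitted to one observed (nominal ALU ratio, performance ratio) pair, and then the model is asked what a further ALU bump returns at fixed bandwidth. The 1.19 ALU ratio is the midpoint of the 18-20% quoted above, the Turing figures are the averages quoted above, and the derived splits are artifacts of the model, not measurements.

```python
# Toy two-bucket model: frame_time = alu_fraction / alu_ratio + bw_fraction.
def fit_bw_fraction(alu_ratio, perf_ratio):
    # Baseline frame time is 1 = a + b; faster card: a / alu_ratio + b = 1 / perf_ratio.
    a = (1 - 1 / perf_ratio) / (1 - 1 / alu_ratio)  # ALU-bound fraction of the baseline frame
    return 1 - a                                    # bandwidth-bound fraction

def predicted_gain(alu_ratio, bw_fraction):
    # Gain from scaling only the ALU-bound part, bandwidth held constant.
    a = 1 - bw_fraction
    return 1 / (a / alu_ratio + bw_fraction) - 1

# RX 5700 -> RX 5700 XT: ~+19% nominal ALU bought ~+13% performance.
bw_5700 = fit_bw_fraction(1.19, 1.13)
# Re-express the split at the XT's operating point before extrapolating further.
a_xt = (1 - bw_5700) / 1.19
bw_xt = bw_5700 / (a_xt + bw_5700)
print(f"implied bandwidth-bound share: 5700 ~{bw_5700:.0%}, 5700 XT ~{bw_xt:.0%}")
print(f"+20% more ALU on the XT -> ~{predicted_gain(1.20, bw_xt):+.1%}, already below 13%")

# Turing pairs at identical bandwidth, from the averages quoted above.
for name, alu, perf in (("2060S -> 2080", 1.475, 1.25), ("2070S -> 2080 FE", 1.17, 1.08)):
    print(f"{name}: implied bandwidth-bound share ~{fit_bw_fraction(alu, perf):.0%}")
```

That the two Turing pairs imply noticeably different splits is itself a reminder of how crude the model is, which is rather the point of the caveats above.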

Conclusion: I just don't see much of a difference here. Actually, it suggests that the RTX 2080 is more bandwidth-limited than the RX 5700 XT, which makes sense given its higher ALU capability, but the data is vague enough that drawing conclusions from a few percent either way would be folly.

 
That's a really nice explanation of the difficulties you face when doing any graphics performance analysis, especially on PC. There are a couple of other things worth throwing into the mix as food for thought: you're not the only thing using the GPU when you play a game on Windows, and the OS and other GPU users can and will do things that affect the running performance of your game; and doing anything to alleviate a particular intra- (or inter-)frame limit will potentially (and often) cause all the following limits to change in some way, because GPUs are largely dynamically executing machines with many classes of intra-block pressure that can't always be categorised or modelled as nicely as you'd think. "ALU" limits manifest in myriad ways, and the same goes for any of the other big buckets of GPU performance we like to talk about.
 
I finally pulled the trigger and ordered my Ryzen 7 3700X, 2x8 GB DDR4-3600 B-die and an Asus X570 Prime Pro (yes, I decided the Intel NIC plus the nice looks and good VRM are worth the extra money). Some of the parts are in short supply, so I might be upgrading my current rig in the order they arrive (except for the mobo, obviously).
But then I need a new video card, which is the main issue with my current rig (and I can't send it to be checked by DeepCool before I get a replacement; their AIO leaked on it, which is 99.99% the source of the issues). I've pretty much decided on the RX 5700 XT, hopping over to the red side for a change after two NVIDIAs, but the wait for the custom models seems like forever now that the rest of the parts are ordered.
Should I just jump the gun and get the reference card? Is the cooler really that bad? Getting a 3rd-party cooler for it would probably break my budget.
 
I'm considering the reference-cooled 5700 XT also. My last reference-cooled card was a 5870. I didn't have much choice, as my 4870X2 died and I needed something equivalent to replace it (the 5870 had just come out, with no competition from Nvidia). It was OK and not as loud as the also-reference 4870X2, but I remember how much cooler and quieter my next open-air-cooled card was.

I just hope the prices of the AIB-cooled models are reasonable over here. That's the big unknown for me. The 2070 Super is $300 (50%) more, which is ridiculous, so I'm sure they will still be a better deal, but I'm hoping they don't jack up the prices too much.
 
I put the Arctic Cooling Accelero III on mine; works great. Extremely quiet and cool. If you can find it for a reasonable price then it's a good solution, but the AIB cards will still probably be less overall, and some just as good, like the HIS IceQ line.
 
Should I just jump the gun and get the reference card? Is the cooler really that bad? Getting a 3rd-party cooler for it would probably break my budget.

I would wait for the custom models. The 5700 XT seems to have some OC potential with a better cooler.
 
Conclusion: I just don't see much of a difference here. Actually, it suggests that the RTX 2080 is more bandwidth-limited than the RX 5700 XT, which makes sense given its higher ALU capability, but the data is vague enough that drawing conclusions from a few percent either way would be folly.

This is a great topic for detailed investigation. On the Nvidia side I only have some preliminary calculations using benchmark results of my own, averaged across 12 games, all at 1440p, as otherwise the vanilla RTX 2060 would risk running into its memory-size limitation. Judging by this data, as soon as we get into TU104 territory, a lot of the theoretical ALU capability starts getting wasted. Not taking the actual GPU clocks in individual games into account obviously prevents drawing a solid conclusion. Still, the data is enough to show the impact of the extra RAM bandwidth that the RTX 2060 Super and RTX 2070 have over the vanilla RTX 2060, and to put the TU104 cards' memory bandwidth into question, as you suggested.

[Attachment: turing_scaling.png]

As for Navi, I made a detailed analysis including clock speeds in various games. Both cards, when overclocked, were also running with a memory OC: up to 14.4 Gbps for the XT and 14.8 Gbps for the base 5700 model. To me it shows that even the base RX 5700 scales pretty poorly despite being able to reach higher memory clocks (the latter being a case of silicon lottery, I guess).

[Attachment: navi_clock_scaling.png]
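For anyone wanting to reproduce the chart, the per-game calculation is just this (the numbers here are placeholders for illustration, not my measured data):

```python
# Express the performance gain as a fraction of the clock gain for one game.
# Placeholder numbers for illustration; the real data is in the chart above.
stock = {"core_mhz": 1880, "avg_fps": 100.0}
oc    = {"core_mhz": 2100, "avg_fps": 105.5}

clock_gain = oc["core_mhz"] / stock["core_mhz"] - 1   # ~+11.7%
perf_gain  = oc["avg_fps"]  / stock["avg_fps"]  - 1   # ~+5.5%
scaling    = perf_gain / clock_gain                   # share of the clock bump actually realised

print(f"clock {clock_gain:+.1%}, perf {perf_gain:+.1%}, scaling efficiency {scaling:.0%}")
```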
 
Wow, 2.1 GHz on an AMD GPU!

Why is the perf/clock higher on the 5700? Because of the higher memory overclock? Or because the clocks are overall lower?
 
You probably mean the performance-gain-to-clock-gain ratio. Well, that's what we are trying to figure out in the first place. The throughput difference between 14.8 and 14.4 Gbps GDDR6 is less than 3%. It's likely that neither is enough to feed Navi 10 at clocks approaching 2 GHz and beyond, on top of the rushed-out and probably poorly optimized driver.
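Spelling out that sub-3% figure (both cards use a 256-bit bus):

```python
# GDDR6 bandwidth on a 256-bit bus: data rate (Gbps per pin) * bus width / 8 bits per byte.
BUS_WIDTH_BITS = 256

def bandwidth_gbs(gbps_per_pin: float) -> float:
    return gbps_per_pin * BUS_WIDTH_BITS / 8

bw_xt   = bandwidth_gbs(14.4)   # overclocked 5700 XT memory
bw_5700 = bandwidth_gbs(14.8)   # overclocked 5700 memory

print(f"14.4 Gbps -> {bw_xt:.1f} GB/s, 14.8 Gbps -> {bw_5700:.1f} GB/s, "
      f"difference {(bw_5700 / bw_xt - 1):.1%}")
# 14.4 Gbps -> 460.8 GB/s, 14.8 Gbps -> 473.6 GB/s, difference 2.8%
```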
 