AMD GPU gets better with 'age'

Right, so we have to wait 2 to 3 years for games to be made that run better on AMD hardware, is that it? Let's buy an AMD card and wait two years? Sorry, that just doesn't sell video cards.

Gotta come out with products that perform better on games that are out at that time.
 


Yes, that is the second time you posted that. I am sure a person who buys an AMD card loves waiting two years to say it has finally caught up to or beaten the competition, and is happy with their purchase.

If they can't show it from day one, at release, that's what defines their cards.

Saying that drivers will do the trick, or that future games will be better optimized for them, just doesn't sell cards. And anyone who is willing to take that risk, it's their money; they can take that risk and possibly waste it, or, if they were lucky enough to keep their card long enough to see the benefits instead of upgrading to a better card, well, there ya have it.
 


The point was to try to move all that whining to the appropriate thread.


But I guess the BFFs love all the AMD-hating content in this thread now, so they're just not letting it go, not even at 254 pages and 5060 posts.
Oh well, whatever.
 
Ah, I should have clarified that. I asked for something to show your line of thought, and I thought that was why you linked that thread.
 
So, which thread were you posting in originally? The link you gave is to this thread, but it doesn't match the dimensions you've described. Perhaps this is part of the problem?
 

His post was accurate when it was made. At some later point in time, the moderators cleaned up some of the original thread by moving some of those messages into this very thread.
 
Radeon 7970 is still competitive in compute shader performance. It beats GTX 780 in many cases. Compute shader usage in games has increased steadily in recent years. Just two years ago many cross-platform developers still shipped their games for both last gen + current gen. Xbox 360 and PS3 didn't support compute shaders. Core rendering techniques were mostly vertex+pixel shader based. Compute shaders were mostly used for some high end effects (for current gen consoles + PC). In the last two years this situation has changed rapidly.

GCN architecture is the core of both current gen consoles. Cross-platform developers optimize their engines very well for GCN. This has of course also improved the performance of PC GCN cards. Nvidia spends significantly more of its resources on hand-tuning its drivers for each new AAA title. GTX 5xx, 6xx and 7xx are no longer as important as 9xx and 10xx, so there will not be as many game-specific optimizations for these old cards anymore. These two things combined affect the relative performance of new games on older cards: AMD gains a little bit from better console developer optimizations, and Nvidia's old GPUs lose a little bit as there are not as many game-specific driver optimizations anymore.

Personally I see Polaris (RX 470) as a fixed Radeon 7970. Similar raw specs. Compute performance is almost identical. The awkward geometry bottleneck is finally gone, increasing the game performance a lot.
 

A great summary. I must admit that I'm sorely tempted to pick up an AMD GPU on my next upgrade cycle due to what appears to be the ever-increasing synergy between AMD GPUs and game optimisation (because of consoles). I'm tied to Nvidia now because of 3D Vision (which I absolutely adore, when it works), but eventually I'm going to want to move to a 4K HDR variable refresh rate screen, and I'm fairly certain they won't be supporting 3D Vision by that point, which will break my tie-in to Nvidia. NV still owns the high end of course, but unless you're going to spend megabucks, that shouldn't really matter.
 
GCN does even better in future games. GCN is a fully bindless architecture (except for index buffers). It doesn't have any fast-path hardware for vertex fetch, constant buffers or static binding. This must hurt a bit in current games; GCN pays a little bit extra for more flexibility. Other GPUs will likely lose a little bit of performance from bindless resources (especially older ones). As far as I know, no PC game yet uses bindless resources, so we need to wait a bit to see the results. AMD is also exactly as fast reading data from a raw/structured buffer as from a constant buffer. Nvidia's performance drops. Constant buffers cannot be directly written by compute shaders, so if you want to hit the fast path on Nvidia, you need extra copy operations (= more GPU stalls). Not a big problem in old games, but compute shader based pipelines are getting more common.

SM 6.0 will also expose lane swizzles and some other features found on GCN based consoles. This allows more direct porting of console shaders to PC. Again this helps GCN, since these shaders are already optimized for it.
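To make the cross-lane point concrete, here's a rough CUDA analogue of what a lane swizzle buys you (this is not the SM 6.0 HLSL syntax, just the same idea with illustrative names): a per-warp reduction done entirely with register-to-register shuffles, with no round trip through LDS or memory.

Code:
// Rough CUDA analogue of a cross-lane (lane swizzle) reduction. Every lane
// ends up with the sum of all 32 lanes in its warp, without touching shared
// memory and without any barriers.
__device__ float warpReduceSum(float v)
{
    // Butterfly pattern: exchange with the lane whose index differs by 16, 8, 4, 2, 1.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_xor_sync(0xffffffffu, v, offset);
    return v;
}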
 
https://developer.nvidia.com/content/how-about-constant-buffers

Working around the limitations of constant buffers (tiny size + no writes from the GPU) is still very important on Nvidia hardware. In that example the lighting pass got up to 33% faster using constant buffers. AMD, on the other hand, gives exactly the same performance with constant buffers and structured buffers. It is all just memory for AMD. Both buffer types get loaded into scalar registers if the access is coherent and into vector registers if the access is divergent (same for vertex buffers, raw buffers and typed buffers).

Console devs have no reason to use constant buffers anymore (extra limitations with no gain). The only reason to use them is that the PC port will run faster on Nvidia hardware. But do all devs have the motivation to program multiple code paths solely for this reason? Constant buffer limitations are getting more severe as time goes by.
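For readers who don't write HLSL, here is a CUDA sketch of the same trade-off (hypothetical names, not the code from the Nvidia post): coherent reads from __constant__ memory are served by Nvidia's constant cache and broadcast to the whole warp, while the same data behind an ordinary pointer (roughly the structured buffer case) takes the normal load path.

Code:
// CUDA sketch of constant-buffer-style vs structured-buffer-style access
// (hypothetical names). The loop index is the same for every thread, so the
// access is coherent.
struct LightParams { float4 color; float4 positionRadius; };

__constant__ LightParams g_lightsCB[256];   // "constant buffer": small, not writable by kernels

__global__ void accumulateLights(float4* out, const LightParams* lightsSB,
                                 int lightCount, int numPixels, bool useCB)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;
    float3 acc = make_float3(0.0f, 0.0f, 0.0f);
    for (int l = 0; l < lightCount; ++l)    // coherent: l is identical across the warp
    {
        float4 c = useCB ? g_lightsCB[l].color : lightsSB[l].color;   // constant cache vs normal load
        acc.x += c.x; acc.y += c.y; acc.z += c.z;
    }
    out[i] = make_float4(acc.x, acc.y, acc.z, 1.0f);
}

On GCN the equivalent access pattern compiles to the same scalar loads either way, which is the "it is all just memory" point above.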
 

Even though it sucks, the bright side is that it can be really fast when the data is small enough on Nvidia. I assume it's actually LDS memory that is used there, and it might be part of the reason why graphics and compute together are a pain on Nvidia; they simply optimized themselves into a corner.

It would be cute to have a flag to tell the AMD driver to pre-load the CB contents into LDS. No bank conflicts on read AFAIR. The LDS can be persistent, having a lifetime as long as there are waves of that kind on a CU. Just as a workaround until we get a LDS model for the graphics pipeline stages ... one can dream. ;)
 
Even though it sucks, the bright side is that it can be really fast when the data is small enough on Nvidia.
AMD Terascale also had separate constant buffer hardware. I measured up to a 6x performance difference on Radeon 5870 (CB vs structured, coherent access). CB read (coherent) was also 2x faster than LDS read. It was really hard to get heavily compute shader based code running fast on Terascale.
I assume it's actually LDS memory that is used there, and it might be part of the reason why graphics and compute together are a pain on Nvidia; they simply optimized themselves into a corner.
Sounds plausible. I have also heard other guesses, like using LDS memory for storing temporary graphics stage outputs. Nvidia's graphics pipeline load balancing has been better than AMD's, and having more on-chip memory is always better for purposes like this. If you look at the pixel shader epilogue generated by AMD's shader compiler, you also notice that transformed vertices are loaded from LDS (before interpolating them). So AMD also uses some LDS for graphics pipeline purposes. Example (page 42+): https://michaldrobot.files.wordpress.com/2014/05/gcn_alu_opt_digitaldragons2014.pdf.
It would be cute to have a flag to tell the AMD driver to pre-load the CB contents into LDS. No bank conflicts on read AFAIR. The LDS can be persistent, having a lifetime as long as there are waves of that kind on a CU. Just as a workaround until we get a LDS model for the graphics pipeline stages ... one can dream. ;)
I have done this manually several times. It is faster to load a lookup table into LDS than to index it directly (incoherently) from memory, both on Nvidia and AMD. As long as your thread groups are big enough, there isn't much overhead in loading the lookup table into LDS once per group (one 100% coalesced load instruction per thread from L1/L2 cache). Sharing constant data loaded into LDS across thread groups is an interesting topic of discussion. I have also found other use cases for it. It would definitely bring some gains, but it would complicate LDS resource management a lot (potential on-chip memory fragmentation, resource starvation, etc.).
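A minimal CUDA sketch of that pattern (hypothetical names; it assumes a 256-entry table and a matching group size): one coalesced load per thread, one barrier, and after that every incoherent lookup is served from LDS.

Code:
// Load a lookup table into LDS (shared memory) once per thread group, then do
// the incoherent lookups from LDS instead of from memory.
// Assumes the kernel is launched with blockDim.x == 256.
__global__ void tableLookup(const float* __restrict__ table,   // 256-entry table in memory
                            const int*   __restrict__ indices,
                            float* out, int n)
{
    __shared__ float ldsTable[256];

    ldsTable[threadIdx.x] = table[threadIdx.x];   // one fully coalesced load per group
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = ldsTable[indices[i] & 255];      // incoherent index, now served from LDS
}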
 
Are the bottlenecks you mention for Maxwell and earlier as painful for Pascal?
Don't know. My dev computer still has a GTX 980. Nvidia hasn't been as open as AMD about their GPU architecture lately. I would hope that we get documents like this from Nvidia: http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf.

A last gen example: we immediately noticed that Maxwell (GTX 980) was way superior to Kepler (GTX 780) in shared memory (LDS) atomics. Kepler was super slow compared to AMD GCN (GTX 780 was roughly 3x slower than Radeon 7970). Maxwell fixed this issue. Now AMD and Nvidia are roughly equally fast in LDS atomics.

Nvidia dev blog has some benchmark numbers about this (Titan = Kepler, Titan X = Maxwell):
https://devblogs.nvidia.com/parallelforall/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell/
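A simplified CUDA sketch of the shared-atomics approach that post benchmarks (256 bins, hypothetical names, not their actual code). The hot atomicAdd is on shared memory, i.e. exactly the LDS atomic that was so slow on Kepler; only the final flush touches global memory.

Code:
// Per-block histogram in shared memory, flushed to global memory at the end.
// Assumes blockDim.x == 256 (one bin per thread for the clear and the flush).
__global__ void histogram256(const unsigned char* data, unsigned int* bins, int n)
{
    __shared__ unsigned int localBins[256];

    localBins[threadIdx.x] = 0;                   // clear the per-block histogram
    __syncthreads();

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&localBins[data[i]], 1u);       // shared memory (LDS) atomic
    __syncthreads();

    atomicAdd(&bins[threadIdx.x], localBins[threadIdx.x]);   // flush to global memory
}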

Gamers/reviewers really do not need to know about highly specific technical bottlenecks like these. But we developers definitely need all the fine technical details in order to navigate around these bottlenecks. It seems that GPU manufacturers like to speak about these bottlenecks only after they release new hardware that fixes them :)

AMD also has some clear bottlenecks left. RGBA16 (both unorm and float) bilinear filtering is half rate, while Nvidia and Intel have been full rate for a long time already. AMD's filtering hardware has better subpixel interpolation quality when zoomed up massively, but that is almost never relevant; 2x better performance would be highly relevant. Most games nowadays use 16-bit float textures extensively.
 
It would be cute to have a flag to tell the AMD driver to pre-load the CB contents into LDS. No bank conflicts on read AFAIR. The LDS can be persistent, having a lifetime as long as there are waves of that kind on a CU. Just as a workaround until we get a LDS model for the graphics pipeline stages ... one can dream. ;)
You can get bank conflicts on reads and writes to LDS.

What do you mean by LDS model for the graphics pipeline stages?
 