Vulkan is a GCN low-level construct?

From the same review, it looks like in a GPU-limited scenario the AMD driver works just as well:

[Image: benchmark chart showing a GPU-limited scenario]



Though this isn't Vulkan, so unless we're also trying to make DX12 into a GCN construct, the point is a bit moot.

I actually saw some benchmarks of DOOM running in Vulkan on different older CPUs comparing a 1060 and a 480.
[Image: GTX 1060 vs RX 480 DOOM Vulkan benchmark chart]

http://www.hardwareunboxed.com/gtx-1060-vs-rx-480-in-6-year-old-amd-and-intel-computers/


^This is before 372.54

372.54 brought driver support for Vulkan runtime version 1.0.0.17 and with it reduced CPU render times by a factor of 4 in one benchmark I saw, but obviously the effects of that need to be seen using higher-end graphics cards.

[Image: DOOM Vulkan benchmark chart]

http://www.sweclockers.com/test/22533-snabbtest-doom-med-vulkan
 
372.54 brought driver support for Vulkan runtime version 1.0.0.17 and with it reduced CPU render times by a factor of 4 in one benchmark I saw, but obviously the effects of that need to be seen using higher-end graphics cards.
How did they know it reduced CPU render time? These new results didn't show any significant change from before.
 
Have you tried Quake 4 on a GTX 1080? I have and it's atrocious.
Is it the bug that causes the game to only load low resolution textures? The Steam version of the game has some kind of bug that causes this. It can be fixed with a few config variables.
 
How did they know it reduced CPU render time? These new results didn't show any significant change from before.

The performance gains from Vulkan are a lot bigger than what I measured when the Vulkan update first hit the game.

CPU render times can be seen using the in-game performance monitor; there's a video on YouTube of DOOM's Vulkan renderer running on two different drivers:

 
The performance gains from Vulkan are a lot bigger than what I measured when the Vulkan update first hit the game.

CPU render times can be seen using the in-game performance monitor; there's a video on YouTube of DOOM's Vulkan renderer running on two different drivers:


Whoa... 3 ms CPU time, down from 9?
 
Both Maxwell (980 Ti) and Pascal (1080) flagship GPUs get a huge +30% performance boost from Vulkan. This pretty much proves that Vulkan is not a "GCN construct". Slower Maxwell and Pascal cards show smaller gains simply because the game seems to be getting close to 100% GPU bound on those GPUs (there's free CPU cycles to spare). Kepler (780 Ti) seems to be an anomaly. Could be that Kepler drivers are not yet fully optimized for Vulkan. Kepler is likely not a high priority to Nvidia anymore. Fermi doesn't even have any Vulkan or DX12 drivers.
 
Kepler (780 Ti) seems to be an anomaly. Could be that Kepler drivers are not yet fully optimized for Vulkan. Kepler is likely not a high priority to Nvidia anymore. Fermi doesn't even have any Vulkan or DX12 drivers.
Kepler behaves like any 3GB card running Vulkan would: a massive reduction in fps for no apparent reason. It's probably a behavior remnant of the old Mantle structure where performance dropped on cards with average memory sizes. Here we see the 3GB 1060 taking a 30% hit under Vulkan compared to the 6GB 1060, despite running exactly the same as the 6GB under OpenGL.

[Image: DOOM benchmark chart, GTX 1060 3GB vs 6GB]

http://www.techspot.com/review/1237-msi-geforce-gtx-1060-3gb/page2.html
 
Whoa... 3 ms CPU time, down from 9?

Yeah, serious business eh?

Both Maxwell (980 Ti) and Pascal (1080) flagship GPUs get a huge +30% performance boost from Vulkan. This pretty much proves that Vulkan is not a "GCN construct". Slower Maxwell and Pascal cards show smaller gains simply because the game seems to be getting close to 100% GPU bound on those GPUs (there's free CPU cycles to spare). Kepler (780 Ti) seems to be an anomaly. Could be that Kepler drivers are not yet fully optimized for Vulkan. Kepler is likely not a high priority to Nvidia anymore. Fermi doesn't even have any Vulkan or DX12 drivers.

Meh, I feel like it's becoming far too common for people to just dismiss a whole API because of one title/benchmark. GCN cards gained up to 40% when the Vulkan update for DOOM first hit, and all of a sudden Vulkan is Mantle (people confused GCN intrinsics with some kind of inherent low-level GCN optimization in Vulkan) and blah blah blah.

Seems to me like when the GPU is the bottleneck they perform almost identically (whereas for GCN there seems to be a slight Vulkan advantage, possibly due to intrinsics being used).

Some recent games have had me really confused. Deus Ex Mankind Divided performs really strangely. In the built-in benchmark the Fury X is something like 25% faster (seems to track compute throughput), yet in the game a reference (1200 MHz) 980 Ti outperforms it by 5-10% when the FPS is low, and the Fury X appears to lead when the FPS is high. CPU overhead seems higher on the Nvidia side as well, and this is DX11. Weird stuff.

I usually expect NV cards to be on top when the FPS is high; higher geometry throughput and less CPU overhead, generally speaking. Deus Ex is opposite day.

Doesn't Doom have a frame limit at 200?

Yeah it does

Kepler behaves like any 3GB card running Vulkan would: a massive reduction in fps for no apparent reason. It's probably a behavior remnant of the old Mantle structure where performance dropped on cards with average memory sizes. Here we see the 3GB 1060 taking a 30% hit under Vulkan compared to the 6GB 1060, despite running exactly the same as the 6GB under OpenGL.

[Image: DOOM benchmark chart, GTX 1060 3GB vs 6GB]

http://www.techspot.com/review/1237-msi-geforce-gtx-1060-3gb/page2.html


I've noticed games that support both DX11 and DX12 have less efficient memory management using DX12: lots more VRAM used, and the potential for stutter increases. Annoying. Any good reason why that is?

Wow, those results are very different from the sweclockers ones >_>
 
Kepler behaves like any 3GB card running Vulkan would: a massive reduction in fps for no apparent reason. It's probably a behavior remnant of the old Mantle structure where performance dropped on cards with average memory sizes. Here we see the 3GB 1060 taking a 30% hit under Vulkan compared to the 6GB 1060, despite running exactly the same as the 6GB under OpenGL.
3GB memory explains this perfectly.

High level APIs, such as DirectX and OpenGL, track the residency of each resource separately and move them automatically/repeatedly in/out of GPU memory based on accesses. Residency tracking is one of the big performance hogs of the high level APIs. It also causes random frame drops, as the information about a missing resource arrives very late (bind resource -> draw). If a resource is not resident, it must be immediately loaded to GPU memory (and old resources must be kicked out according to an LRU caching policy). If enough resources are missing on the same frame, there will be a stall -> frame drop.

In DirectX 11 and OpenGL you could over-commit GPU memory without big problems. All you got was some random single frame drops. The driver automatically handled resource switching. However, in DX12 and Vulkan the developer needs to manually implement resource management. It is easiest just to allocate a big chunk of GPU memory and load all the common level assets there permanently. Of course there's some residency tracking for high-detail (close-up) streamed resources, but generally there's much more "pinned" GPU data in DX12/Vulkan applications compared to DX11/OpenGL. This is similar to console resource management (the game has a guaranteed memory amount). This is however problematic on PC if the developer has designed the game (ultra settings in this case) around a larger memory budget (such as 4GB). Doom's Vulkan version doesn't even start on 2 GB graphics cards, while the OpenGL version runs just fine (albeit with some frame rate issues).
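To make that concrete, here's a minimal C++/Vulkan sketch of the "allocate one big block and pin the common assets" pattern described above. The device handle, memory type index and budget size are placeholder assumptions for illustration; this is the general technique, not id Software's actual allocator:

```cpp
// Minimal sketch (assumptions: 'device' is a valid VkDevice and 'memoryTypeIndex'
// refers to a DEVICE_LOCAL memory type). Illustrates pinning one large block,
// not any specific engine's code.
#include <vulkan/vulkan.h>
#include <cstdint>

struct PinnedPool {
    VkDeviceMemory memory   = VK_NULL_HANDLE;
    VkDeviceSize   capacity = 0;
    VkDeviceSize   head     = 0;   // simple linear sub-allocator
};

// Allocate one large device-local block up front; common level assets live here
// permanently instead of being paged in/out by the driver.
bool createPinnedPool(VkDevice device, uint32_t memoryTypeIndex,
                      VkDeviceSize bytes, PinnedPool& pool)
{
    VkMemoryAllocateInfo info{};
    info.sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    info.allocationSize  = bytes;            // e.g. most of the card's VRAM budget
    info.memoryTypeIndex = memoryTypeIndex;  // a DEVICE_LOCAL type

    if (vkAllocateMemory(device, &info, nullptr, &pool.memory) != VK_SUCCESS)
        return false;                        // over-commit fails hard here,
                                             // unlike DX11/OpenGL paging
    pool.capacity = bytes;
    return true;
}

// Sub-allocate a buffer's backing store from the pinned block.
bool bindFromPool(VkDevice device, PinnedPool& pool, VkBuffer buffer)
{
    VkMemoryRequirements req{};
    vkGetBufferMemoryRequirements(device, buffer, &req);

    // Round the linear allocator head up to the required alignment.
    VkDeviceSize offset = (pool.head + req.alignment - 1) & ~(req.alignment - 1);
    if (offset + req.size > pool.capacity)
        return false;                        // budget exceeded: the app must handle it

    if (vkBindBufferMemory(device, buffer, pool.memory, offset) != VK_SUCCESS)
        return false;
    pool.head = offset + req.size;
    return true;
}
```

The key difference from DX11/OpenGL shows up in the failure paths: when the pinned budget is exceeded, the application has to handle it explicitly instead of the driver silently paging resources in and out.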
 
Seems to me like when the GPU is the bottleneck they perform almost identically (whereas for GCN there seems to be a slight Vulkan advantage, possibly due to intrinsics being used).
Async compute gives GCN a slight GPU performance edge over Nvidia in Vulkan and DX12. Both Nvidia and AMD offer various intrinsics (such as wave/warp operations) as Vulkan extensions. id Software hasn't mentioned using IHV-specific intrinsics, but they have talked highly about async compute. This would single-handedly explain why AMD gains around 10% more performance from Vulkan than Nvidia.
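As a rough illustration of the async compute part (a hypothetical helper in C++ against the Vulkan API; nothing here is taken from id Software's code), an engine would typically look for a compute-only queue family so compute work can overlap the graphics queue:

```cpp
// Minimal sketch (assumption: 'physicalDevice' is a valid VkPhysicalDevice).
// Finds a compute-capable queue family that does not support graphics, so
// compute dispatches submitted there can run alongside the graphics queue.
#include <vulkan/vulkan.h>
#include <cstdint>
#include <vector>

// Returns the index of a compute-only queue family, or UINT32_MAX if the
// device exposes no dedicated async compute queue family.
uint32_t findAsyncComputeQueueFamily(VkPhysicalDevice physicalDevice)
{
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &count, nullptr);

    std::vector<VkQueueFamilyProperties> families(count);
    vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &count, families.data());

    for (uint32_t i = 0; i < count; ++i) {
        const VkQueueFlags flags = families[i].queueFlags;
        if ((flags & VK_QUEUE_COMPUTE_BIT) && !(flags & VK_QUEUE_GRAPHICS_BIT))
            return i;   // dedicated compute family: work here can overlap graphics
    }
    return UINT32_MAX;  // fall back to the graphics queue (no async overlap)
}
```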
 
I've noticed games that support both DX11 and DX12 have less efficient memory management using DX12: lots more VRAM used, and the potential for stutter increases. Annoying. Any good reason why that is?
I think the key issue there is BOTH DX11/12. The resource models are different, so DX12 gets tacked on. For an exclusive DX12 environment there are other strategies that could be used to address the concerns.

Meh, I feel like it's becoming far too common for people to just dismiss a whole API because of one title/benchmark. GCN cards gained up to 40% when the Vulkan update for DOOM first hit, and all of a sudden Vulkan is Mantle (people confused GCN intrinsics with some kind of inherent low-level GCN optimization in Vulkan) and blah blah blah.
Definitely agree with this. It's easier to just assume AMD had superior driver support coming from Mantle, which shouldn't come as a surprise. A lot of Nvidia's optimizations for DX11 would need to be reworked, but once complete I'd expect solid results. Results that now look to be arriving in Doom.

id Software hasn't mentioned using IHV-specific intrinsics, but they have talked highly about async compute.
I thought one of their developers stated they did use some intrinsics for TSSAA. In this case it was passing data between lanes to reduce bandwidth while compositing. Do the SSAA and pass the result to the neighbors or something.
 
Regarding the thread topic: I think yeah, DirectX 12 and Vulkan are GCN low-level constructs, but in a very positive way. They took what they could from AMD's pioneering work with Mantle, which of course was tailored to GCN. As a result, many concepts in DX12 and Vulkan seem to have similar effects on GCN cards as the use of Mantle would have had. So, the dollars AMD spent on Mantle (i.e. flying Johan from Stockholm back and forth... j/k) were pretty well invested.

The only caveat I have is that from the outside you could be tempted to think that, with the effort for Mantle and the prospect of an "automatic" performance uplift with DX12/Vulkan, AMD terribly neglected their basic DX11 driver architecture, just doing basic bug fixing for newer titles as necessary. That's the impression you could also get when viewing the dramatic performance uplifts going from DX11 to 12 in games like Ashes for AMD cards.

The fact that more than a single one of the new DX12 and Vulkan titles had birthing problems did not really help to convey the positive side of the message AMD was trying to instill.
 
I don't think they neglected their DX11 driver (I know that is not what you're saying, Carsten; rather, I am just arguing against the concept in general). It's more likely that it had too much... I'm not sure what the correct term here is, code rot? Baggage?

It's not like this is a new thing at AMD's graphics division. Recall the much-vaunted 100% OpenGL rewrite from quite a few years back that wound up being slowly phased in instead and ultimately didn't change that much, compared to the amount of hype AMD both generated themselves and allowed to build? I think a similar thing likely happened there: too much to do, so the effort ultimately fell short.
 
Yeah, serious business eh?

Meh, I feel like it's becoming far too common for people to just dismiss a whole API because of one title/benchmark. GCN cards gained up to 40% when the Vulkan update for DOOM first hit, and all of a sudden Vulkan is Mantle (people confused GCN intrinsics with some kind of inherent low-level GCN optimization in Vulkan) and blah blah blah.

Seems to me like when the GPU is the bottleneck they perform almost identically (whereas for GCN there seems to be a slight Vulkan advantage, possibly due to intrinsics being used).

Some recent games have had me really confused. Deus Ex Mankind Divided performs really strangely. In the built-in benchmark the Fury X is something like 25% faster (seems to track compute throughput), yet in the game a reference (1200 MHz) 980 Ti outperforms it by 5-10% when the FPS is low, and the Fury X appears to lead when the FPS is high. CPU overhead seems higher on the Nvidia side as well, and this is DX11. Weird stuff.

I usually expect NV cards to be on top when the FPS is high; higher geometry throughput and less CPU overhead, generally speaking. Deus Ex is opposite day.

CPU overhead is higher on Nvidia; AMD's problem is that they have a single-threaded driver that gets overwhelmed on CPUs with low IPC, AMD's own and i3s.

AMD cards usually use less CPU than the corresponding Nvidia cards in the comparisons here.

 
Doom does use AMD GCN intrinsics, actually. AMD was quite open and proud of Doom being the first title to make use of them, and rightfully so.
https://community.bethesda.net/thread/59229?start=0&tstart=0

Why would anyone not want to use the resources that are present if they could?
DX9 and DX11 also had lots of IHV-specific extensions (API backdoors) and OpenGL has IHV-specific extensions (official, well-documented ones). Most developers didn't use these IHV-specific extensions, because you'd need to write, maintain, and test multiple different code paths. There are no guarantees either that an extension works properly with future hardware from the same IHV. Once you start writing hardware-specific code, your maintenance cost will increase. Consoles are an exception to this rule, since console hardware stays unchanged for many years. We see a new PC GPU generation at least every 2 years.

You only want to write hardware-specific code on PC if that gives you big gains and if it helps with more than one GPU. Fortunately the AMD GCN architecture has been used for a long time. There's a big user base available. Also, with these intrinsics you can port your console GCN code to PC, reducing the cost of writing the hardware-specific code.
I thought one of their developers stated they did use some intrinsics for TSSAA. In this case it was passing data between lanes to reduce bandwidth while compositing. Do the SSAA and pass the result to the neighbors or something.
Now that you mentioned it, I remember it as well. Most likely they just ported their cross-lane-optimized console TSSAA code to PC GCN. Nvidia has similar cross-lane intrinsics available now (https://developer.nvidia.com/reading-between-threads-shader-intrinsics).
CPU overhead is higher on Nvidia; AMD's problem is that they have a single-threaded driver that gets overwhelmed on CPUs with low IPC
It's true that the total DX11 CPU overhead might be slightly higher on Nvidia, but as long as you have a 4-core CPU or more (and/or hyperthreading), the driver distributes the workload nicely. AMD's DX11 driver taxes the single (application) render thread heavily. The render thread tends to be the bottleneck even without the driver taking part of the execution time.

Both approaches however are highly wasteful. There's just too much resource bookkeeping and translation going on. Vulkan and DX12 reduce this overhead to a minimum and allow you to either use one render thread or split work across multiple threads yourself (depending on which suits your application best). Extra driver worker threads are not needed.
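For illustration, a hedged C++/Vulkan sketch of the "split work across multiple threads yourself" option: one command pool per worker thread, parallel recording, a single submit from the main thread. The function name and parameters are invented for the example; cleanup and synchronization with the rest of the frame are omitted:

```cpp
// Minimal sketch (assumptions: 'device' is a valid VkDevice, 'queue' a graphics
// VkQueue, 'queueFamilyIndex' its family). Command pools are externally
// synchronized, so one pool per worker thread lets all workers record command
// buffers in parallel without locks and without hidden driver worker threads.
#include <vulkan/vulkan.h>
#include <thread>
#include <vector>

void recordFrameInParallel(VkDevice device, VkQueue queue,
                           uint32_t queueFamilyIndex, uint32_t workerCount)
{
    std::vector<VkCommandPool>   pools(workerCount);
    std::vector<VkCommandBuffer> cmds(workerCount);
    std::vector<std::thread>     workers;

    for (uint32_t i = 0; i < workerCount; ++i) {
        // Per-thread command pool.
        VkCommandPoolCreateInfo poolInfo{};
        poolInfo.sType            = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
        poolInfo.queueFamilyIndex = queueFamilyIndex;
        vkCreateCommandPool(device, &poolInfo, nullptr, &pools[i]);

        // One primary command buffer per worker for this frame slice.
        VkCommandBufferAllocateInfo allocInfo{};
        allocInfo.sType              = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
        allocInfo.commandPool        = pools[i];
        allocInfo.level              = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
        allocInfo.commandBufferCount = 1;
        vkAllocateCommandBuffers(device, &allocInfo, &cmds[i]);

        workers.emplace_back([&, i] {
            VkCommandBufferBeginInfo begin{};
            begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
            vkBeginCommandBuffer(cmds[i], &begin);
            // ... record this worker's share of the frame's draw/dispatch calls ...
            vkEndCommandBuffer(cmds[i]);
        });
    }
    for (auto& t : workers) t.join();

    // A single submit from the main thread; queues are also externally synchronized.
    VkSubmitInfo submit{};
    submit.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submit.commandBufferCount = workerCount;
    submit.pCommandBuffers    = cmds.data();
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
}
```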
 
CPU overhead is higher on Nvidia; AMD's problem is that they have a single-threaded driver that gets overwhelmed on CPUs with low IPC, AMD's own and i3s.
AMD's DX11 overhead has been found to be significantly higher in synthetic tests as well, even on high-end overclocked CPUs.
AMD cards usually use less CPU than the corresponding Nvidia cards in the comparisons here.
That remains a single comparison; see others from the same channel for opposite results.
 