DX12 Performance Discussion And Analysis Thread

It's interesting to see that async compute seems to be more widely used on consoles, even in Far Cry Primal. Same for Doom. But it seems it isn't enabled in Doom or FCP in the PC versions?

Maybe they need Vulkan for Doom. Hopefully it comes out relatively soon...
 
In general I don't see much point in posting things from them, but in this case:
http://wccftech.com/async-compute-p...tting-performance-target-in-doom-on-consoles/

It's interesting to see that async compute seems to be more widely used on consoles, even in Far Cry Primal. Same for Doom. But it seems it isn't enabled in Doom or FCP in the PC versions?
Does OpenGL have the necessary extensions to use async anyway? The Vulkan version probably will have it, I think, especially since id was on board at the RX 480 show.
 
In general I don't see much point in posting things from them, but in this case:
http://wccftech.com/async-compute-p...tting-performance-target-in-doom-on-consoles/

It's interesting to see that async compute seems to be more widely used on consoles, even in Far Cry Primal. Same for Doom. But it seems it isn't enabled in Doom or FCP in the PC versions?

Yeah, developers love Async Compute, especially the ones that have to also release console versions of their games. Looking forward to seeing it and Dx12 get more use in the future. Also nice to see AMD's GPU Intrinsics getting a lot of praise.

Regards,
SB
 
It's a lot harder to see gains from async compute on PC than on consoles, as you have to tune for a gazillion different configurations. Also, if most GPUs (Nvidia / Intel) don't support it, then there's no incentive for game devs to add it for PC.
 
I believe it was intended to be a perf feature. Just supporting it but not seeing any benefits on all platforms wouldn't do any favors to its wide adoption. Or you could argue they already coded it for consoles so it's just a matter of enabling a "flag" with some added scalability for PCs.
 
I believe it was intended to be a perf feature. Just supporting it but not seeing any benefits on all platforms wouldn't do any favors to its wide adoption. Or you could argue they already coded it for consoles so it's just a matter of enabling a "flag" with some added scalability for PCs.

Well, some developers talk about a 10-15% performance gain in PC games (the Deus Ex developers, for instance). But not all games will benefit from it in the same way, and above all developers need to know how to achieve it and how to work with it. It's not a magic button. And traditionally, since consoles have always been more limited in power, developers try everything they can to shave a few ms of render time here and there.



(You can start at page 53 for some examples, even though the async chapters start earlier.)
http://32ipi028l5q82yhj72224m8j.wpe...oads/2016/03/d3d12_vulkan_lessons_learned.pdf


Anyway, one question: could DX12 support, or does it already support, out-of-order rasterization? (I don't know where Nvidia stands on this, but AMD has added it to Vulkan.) http://gpuopen.com/unlock-the-rasterizer-with-out-of-order-rasterization/
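
For reference, on the Vulkan side this is exposed through the VK_AMD_rasterization_order extension: you chain one extra struct into the pipeline's rasterization state. A minimal sketch, assuming the device extension is enabled and omitting the rest of the pipeline-creation boilerplate:

Code:
#include <vulkan/vulkan.h>

// Request relaxed (out-of-order) rasterization on a per-pipeline basis.
// Requires the VK_AMD_rasterization_order device extension to be enabled.
VkPipelineRasterizationStateRasterizationOrderAMD rasterOrder = {};
rasterOrder.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_RASTERIZATION_ORDER_AMD;
rasterOrder.rasterizationOrder = VK_RASTERIZATION_ORDER_RELAXED_AMD;

VkPipelineRasterizationStateCreateInfo rasterState = {};
rasterState.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
rasterState.pNext = &rasterOrder;   // chain the AMD struct in here
rasterState.polygonMode = VK_POLYGON_MODE_FILL;
rasterState.cullMode = VK_CULL_MODE_BACK_BIT;
rasterState.frontFace = VK_FRONT_FACE_COUNTER_CLOCKWISE;
rasterState.lineWidth = 1.0f;

// rasterState then goes into VkGraphicsPipelineCreateInfo::pRasterizationState.

The linked GPUOpen article discusses when relaxing the ordering is actually safe.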
 
Additionally, according to Oxide Games, it wasn't difficult nor did it take long to enable Async Compute in AOTS.

So it's most likely a myth that it is difficult to use on PC. Performance is going to vary on a game-by-game basis, however. Oxide got a 15% performance boost with AOTS. Doom got an 18.75% to 31.25% performance boost on consoles (3 ms to 5 ms out of a 16 ms rendering time, although that may be with AMD Intrinsics in addition to Async Compute), if I'm reading that correctly. Doom doesn't use Dx12, however, so we'll have to wait until the Vulkan version hits to see how and if Async Compute is enabled for Doom on PC.
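
For context, the mechanical side of "enabling" it in D3D12 really is small; the hard part is finding work that overlaps well. A rough sketch, assuming an existing device, graphicsQueue, fence/fenceValue and an already recorded compute command list (asyncComputeList) — all hypothetical names:

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// A second queue dedicated to compute. Work submitted here may overlap with
// work on the direct (graphics) queue.
D3D12_COMMAND_QUEUE_DESC computeDesc = {};
computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

ComPtr<ID3D12CommandQueue> computeQueue;
device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

// Submit the compute work (e.g. light culling, SSAO, particles).
ID3D12CommandList* lists[] = { asyncComputeList.Get() };
computeQueue->ExecuteCommandLists(1, lists);

// Express the one dependency that matters: graphics waits for the compute
// results before it consumes them.
computeQueue->Signal(fence.Get(), ++fenceValue);
graphicsQueue->Wait(fence.Get(), fenceValue);

Whether that overlap actually buys anything back is the per-GPU, per-game question.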

id has traditionally partnered closely with Nvidia, so if anyone is going to use Async Compute well on Nvidia hardware, it is likely to be them.

Regards,
SB
 
Additionally, according to Oxide Games, it wasn't difficult nor did it take long to enable Async Compute in AOTS.

So it's most likely a myth that it is difficult to use on PC. Performance is going to vary on a game-by-game basis, however. Oxide got a 15% performance boost with AOTS. Doom got an 18.75% to 31.25% performance boost on consoles (3 ms to 5 ms out of a 16 ms rendering time, although that may be with AMD Intrinsics in addition to Async Compute), if I'm reading that correctly. Doom doesn't use Dx12, however, so we'll have to wait until the Vulkan version hits to see how and if Async Compute is enabled for Doom on PC.

id has traditionally partnered closely with Nvidia, so if anyone is going to use Async Compute well on Nvidia hardware, it is likely to be them.

Regards,
SB

Indeed, for Doom I think we will need to wait for Vulkan; as for Far Cry, I doubt it will be patched. (I don't know why I thought it used DX12, but no, it's DX11 only.)
 
Additionally, according to Oxide Games, it wasn't difficult nor did it take long to enable Async Compute in AOTS.

So it's most likely a myth that it is difficult to use on PC. Performance is going to vary on a game-by-game basis, however. Oxide got a 15% performance boost with AOTS. Doom got an 18.75% to 31.25% performance boost on consoles (3 ms to 5 ms out of a 16 ms rendering time, although that may be with AMD Intrinsics in addition to Async Compute), if I'm reading that correctly. Doom doesn't use Dx12, however, so we'll have to wait until the Vulkan version hits to see how and if Async Compute is enabled for Doom on PC.
Sure, it's easy to implement. What happens with that implementation is a whole different matter.
You can claim a percentage gain on consoles, where you only have one GPU. On PC, performance isn't going to vary just on a game-to-game basis. It varies on a GPU-to-GPU basis as well, and that's not just between AMD and NV GPUs, but within AMD's GPU lineup as well. It's not going to be the same gain between the R9 390X, Fury and Fury X, because these chips have the same amount of graphics resources but different amounts of compute resources (all three have 64 ROPs, with 44, 56 and 64 CUs respectively).
 
I believe it was intended to be a perf feature. Just supporting it but not seeing any benefits on all platforms wouldn't do any favors to its wide adoption. Or you could argue they already coded it for consoles so it's just a matter of enabling a "flag" with some added scalability for PCs.

:no: Too many times this topic has come up, and too many times people seem confused about what is going on. AMD did a huge marketing dump on async, and too many people fell for it. If anyone thinks GPUs are more than 10% underutilized, either the program is crap, or the hardware is crap, or the drivers are crap!

This is all I'm going to say on this.
 
:no: Too many times this topic has come up, and too many times people seem confused about what is going on. AMD did a huge marketing dump on async, and too many people fell for it. If anyone thinks GPUs are more than 10% underutilized, either the program is crap, or the hardware is crap, or the drivers are crap!

This is all I'm going to say on this.
I still can't get over AMD's technically incorrect marketing dump. The feature has literally nothing to do with asynchrony. Rather, it is all about concurrency. These are completely different concepts. AMD did the world a big disservice by conflating them.
 
Well, that is the way marketing is, though. I can't blame them for taking advantage of the situation, but I do blame the people who fell for it and still talk about it as if AMD's marketing were correct.
 
:no: Too many times this topic has come up, and too many times people seem confused about what is going on. AMD did a huge marketing dump on async, and too many people fell for it. If anyone thinks GPUs are more than 10% underutilized, either the program is crap, or the hardware is crap, or the drivers are crap!

If you do shadow-mapping, then basically every chip of the last decade is severely underutilized; sometimes you use only 10% of the chip. And because there is only one rasterizer state, it in itself can't be sped up (through concurrency).

I still can't get over AMD's technically incorrect marketing dump. The feature has literally nothing to do with asynchrony. Rather, it is all about concurrency. These are completely different concepts. AMD did the world a big disservice by conflating them.

You don't understand where the naming originated from. Under DX11 everything you do is executed in order, and it finishes in order. Thus it's entirely synchronous from a command-stream perspective. The hardware engineers transitioned the hardware to extract CSP (command-stream parallelism, similar to instruction-level parallelism, ILP) from the serial command stream. That only brings you so far, so they added software support to allow us developers to explicitly specify when in-order execution is not important, in effect handling the synchronization ourselves instead of relying on an implicitly specified synchronization. The ACEs allow compute command streams to be executed in any order, at any time, and thus asynchronously.

And if you two don't grasp what exactly it is (not a Swiss Army knife, but a tool, and as much a natural evolutionary step as superscalar, out-of-order, SMT-capable, multi-core CPUs), and if you cannot contextualize when, why and how it is or isn't used in the various engines, then you shouldn't make so much unfounded noise.
 
If you do shadow-mapping, then basically every chip of the last decade is severely underutilized; sometimes you use only 10% of the chip. And because there is only one rasterizer state, it in itself can't be sped up (through concurrency).

Is this really true? We had no nVidia card with more compute performance than the Fury X until the GTX 1080. And that card, running DX11, is faster than a Fury X running DX12 in Hitman and Ashes...
 
If you do shadow-mapping, then basically every chip of the last decade is severely underutilized; sometimes you use only 10% of the chip. And because there is only one rasterizer state, it in itself can't be sped up (through concurrency).

Hell no. Yeah, there are some problems with concurrency with shadow maps, but not to that degree. If they had such an effect, they wouldn't be used at all. I'm sure there are ways to sidestep the underutilization problem.

You don't understand where the naming originated from. Under DX11 everything you do is executed in order, and it finishes in order. Thus it's entirely synchronous from a command-stream perspective. The hardware engineers transitioned the hardware to extract CSP (command-stream parallelism, similar to instruction-level parallelism, ILP) from the serial command stream. That only brings you so far, so they added software support to allow us developers to explicitly specify when in-order execution is not important, in effect handling the synchronization ourselves instead of relying on an implicitly specified synchronization. The ACEs allow compute command streams to be executed in any order, at any time, and thus asynchronously.

Is it really the ACE that allows asynchronicity?

And if you two don't grasp what exactly it is (not a Swiss Army knife, but a tool, and as much a natural evolutionary step as superscalar, out-of-order, SMT-capable, multi-core CPUs), and if you cannot contextualize when, why and how it is or isn't used in the various engines, then you shouldn't make so much unfounded noise.


I agree it's a tool, but I never once downplayed async compute. I have only stated that you are only going to get a certain amount of performance out of it. If shadow maps created such underutilization as you stated (10% utilization, 90% underutilization), games wouldn't be using shadow maps, because that defeats the purpose of having faster hardware. That is a software problem; the programmer had better figure out how to get his/her engine to have better utilization. Because you aren't going to fill up 90% of your ALUs with compute tasks, not now and not in the past. As I stated, either the software is crap, the hardware is crap, or the drivers are crap lol.
 
There's another point that has been known for years and yet ignored in the discussion about how much performance there is to be gained...
GTX 980 has 64 ROPs and 16 SMs. A single SM can supply at most 4 pixels per clock on average, which has the funny side effect that cards like the GTX 970 or GTX 980 Ti can't reach their peak fillrate even with the simplest of pixel shaders! What is there to gain by squeezing some totally unrelated compute stuff in there? GP104 is basically the first one that breaks this.
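
To spell the arithmetic out (using the commonly quoted configurations, so treat the exact figures as an illustration): a GTX 970 has 13 SMs enabled against 56 ROPs, so its SMs can feed at most 13 × 4 = 52 pixels per clock versus 56 pixels per clock of ROP throughput; a GTX 980 Ti has 22 SMs against 96 ROPs, so 88 versus 96. The fully enabled GTX 980 is the break-even case: 16 × 4 = 64 against 64 ROPs.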
 
Yes, they don't need to bother the command processor; they can handle dispatching the compute queues for execution by themselves.


I think the command processor has a lot to do with what the ACEs do. The ACEs by themselves don't create the ability to do asynchronicity.
 