DX12 Performance Discussion And Analysis Thread

I think the command processor has a lot to do with what the ACEs do. The ACEs by themselves don't create the ability to do asynchronicity.
I'm quite sure, without double-checking, that according to AMD the command processor can handle graphics and compute queues, while the ACEs can independently handle compute queues at the same time.
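Roughly what that maps to on the API side, as a minimal D3D12 sketch (error handling omitted, and the ID3D12Device is assumed to already exist): a DIRECT queue feeds the graphics command processor, while a separate COMPUTE queue is the kind of stream the ACEs can pick up independently.

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create one graphics ("direct") queue and one compute queue on the same device.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    // Direct queue: accepts draw, dispatch and copy commands.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

    // Compute queue: dispatch and copy only; this is the queue type that can be
    // serviced independently of the graphics work.
    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));
}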
 
The GCN command processor is responsible for receiving high-level API commands from the driver and mapping them onto the different processing pipelines. There are two main pipelines in GCN. The Asynchronous Compute Engines (ACE) are responsible for managing compute shaders, while a graphics command processor handles graphics shaders and fixed function hardware. Each ACE can handle a parallel stream of commands, and the graphics command processor can have a separate command stream for each shader type, creating an abundance of work to take advantage of GCN's multi-tasking.

That is from AMD's GCN whitepaper.

pretty much all three processors must work in unison to do asynchronicity.
 
That is from AMD's GCN whitepaper.

pretty much all three processors must work in unison to do asynchronicity.

Technically, it might be argued that if they were doing things in unison, it might not be asynchronous.

As for the broader debate over whether ACEs make it possible: asynchronous behavior, from the standpoint of the software, doesn't need independent processors any more than asynchronous functionality did back when there was only one CPU core in a system. A processor can be made to juggle multiple queues if need be, which actually happens in the case of runlist execution with HWS and virtualization handling.

That appears to have been AMD's choice in this case, but the presence of multiple other vendors with DX12 support shows it wasn't the only one.
The ACEs are rather over-engineered for the purpose DX12 happens to use them for.
 
Lazy Devs.
Lazy Hardware Devs.
Lazy Devs.

Got it.


Well, that wasn't what I meant; I actually meant the opposite: they are not underutilized by more than 10% if things are done right, which they seem to be, and I haven't seen anything that would show that the hardware is crap, the drivers are crap, or the software is crap.
 
:no: Too many times this topic has come up, and too many times people seem confused about what is going on. AMD did a huge marketing dump on async, and too many people fell for it. If anyone thinks GPUs are more than 10% underutilized, then either the program is crap, or the hardware is crap, or the drivers are crap!

This is all I'm going to say on this.

All games run drawcalls which cannot use 100% of your HW units - you will be limited by either shaders, memory bandwidth, geometry processing or some obscure FF unit like blending. No drawcall will utilize 100% of all your GPU units, as each has a different workload. For example, postprocessing shaders are mostly bandwidth limited, which means your compute units could do more while waiting for memory to be fetched.

Devs have gotten as much as a 6-7 ms perf improvement using Async Compute, which is HUGE! And no, AMD didn't do any marketing dump on async - those are real gains achieved in real games. It paves the way to do more stuff on the GPU, like culling, particles etc.
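To make the overlap idea concrete, here is a rough D3D12 sketch (not taken from any of the games mentioned; everything named here - device, computeAllocator, particlePSO, particleRootSig, particleBufferGpuVa, numParticles, frameLists and the two queues - is a made-up placeholder): a particle-update dispatch is recorded on a compute list and submitted on the compute queue, with nothing forcing it to wait for the graphics submission, so the GPU is free to run the two side by side.

Code:
// Assumes the headers/ComPtr helper and the two queues from the earlier snippet.
ComPtr<ID3D12GraphicsCommandList> computeList;
device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE,
                          computeAllocator.Get(), particlePSO.Get(),
                          IID_PPV_ARGS(&computeList));
computeList->SetComputeRootSignature(particleRootSig.Get());
computeList->SetComputeRootUnorderedAccessView(0, particleBufferGpuVa);
computeList->Dispatch(numParticles / 256, 1, 1); // assumes 256 threads per group
computeList->Close();

ID3D12CommandList* asyncLists[] = { computeList.Get() };
computeQueue->ExecuteCommandLists(1, asyncLists);   // compute work
graphicsQueue->ExecuteCommandLists(1, frameLists);  // graphics work, no fence between them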
 
pretty much all three processors must work in unison to do asynchronicity.
Please don't use the term "asynchronous", when what you really mean is simultaneous/parallel execution.

Asynchronicity is not an attribute of the hardware. It's an attribute of the API.
It only means that the order of execution is not defined implicitly by the invocation pattern, but instead modeled explicitly by the use of signals and fences.
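In D3D12 terms that explicit ordering looks something like the sketch below (just an illustration; device, graphicsQueue, computeQueue and the command list arrays are assumed to exist): the dependency is stated with a fence the compute queue waits on, rather than implied by submission order, and neither call blocks the CPU.

Code:
// Express "compute may start only after the G-buffer pass" with a fence.
void SubmitWithFence(ID3D12Device* device,
                     ID3D12CommandQueue* graphicsQueue,
                     ID3D12CommandQueue* computeQueue,
                     ID3D12CommandList* const* gbufferLists, UINT numGfxLists,
                     ID3D12CommandList* const* asyncLists,  UINT numComputeLists)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    const UINT64 gbufferDone = 1;

    graphicsQueue->ExecuteCommandLists(numGfxLists, gbufferLists);
    graphicsQueue->Signal(fence.Get(), gbufferDone);   // GPU-side signal

    computeQueue->Wait(fence.Get(), gbufferDone);      // GPU-side wait, CPU not blocked
    computeQueue->ExecuteCommandLists(numComputeLists, asyncLists);
}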

This can be implemented either with cooperative scheduling on fences, or with a sufficient number of monitors in hardware when opting for simultaneous execution or low-latency scheduling.
In the first case, the hardware does not need any support for that at all.

Even GCN "degrades" to cooperative scheduling if you exceed the number of monitored queues. Even though I have yet to see a legit real life example where that actually happened...
Well, that wasn't what I meant; I actually meant the opposite: they are not underutilized by more than 10% if things are done right, which they seem to be, and I haven't seen anything that would show that the hardware is crap, the drivers are crap, or the software is crap.
Doing things "right" isn't easy. At least if you define "right" as achieving a constant utilization of all possible bottlenecks, while also keeping the working set below cache sizes and the like.

This has been said a couple of times in this and other threads. The current design of the render paths is still a straightforward evolution from the old fixed-function setup, where you would treat the rendering process as a set of operations, each applied sequentially to the whole frame. We have yet to see a widespread move over to tile-based renderers, and a departure from the use of overly expensive full screen-space effects.

As it stands, you just can't achieve an even/constant load on all subsystems of the GPU.
 
All games run drawcalls which cannot use 100% of your HW units - you will be limited by either shaders, memory bandwidth, geometry processing or some obscure FF unit like blending. No drawcall will utilize 100% of all your GPU units, as each has a different workload. For example, postprocessing shaders are mostly bandwidth limited, which means your compute units could do more while waiting for memory to be fetched.

Devs have gotten as much as a 6-7 ms perf improvement using Async Compute, which is HUGE! And no, AMD didn't do any marketing dump on async - those are real gains achieved in real games. It paves the way to do more stuff on the GPU, like culling, particles etc.


What do draw calls have to do with async compute? I thought we were talking about async compute only.

6-7 ms, for which systems? Without that context, the latency savings are meaningless.
 
Please don't use the term "asynchronous", when what you really mean is simultaneous/parallel execution.

Asynchronicity is not an attribute of the hardware. It's an attribute of the API.
It only means that the order of execution is not defined implicitly by the invocation pattern, but instead modeled explicitly by the use of signals and fences.

This can be implemented either with cooperative scheduling on fences, or with a sufficient number of monitors in hardware when opting for simultaneous execution or low-latency scheduling.
In the first case, the hardware does not need any support for that at all.

Even GCN "degrades" to cooperative scheduling if you exceed the number of monitored queues. Even though I have yet to see a legit real life example where that actually happened...

Doing things "right" isn't easy. At least if you define "right" as achieving a constant utilization of all possible bottlenecks, while also keeping the working set below cache sizes and the like.

This has been said a couple of times in this and other threads. The current design of the render paths is still a straightforward evolution from the old fixed-function setup, where you would treat the rendering process as a set of operations, each applied sequentially to the whole frame. We have yet to see a widespread move over to tile-based renderers, and a departure from the use of overly expensive full screen-space effects.

As it stands, you just can't achieve an even/constant load on all subsystems of the GPU.


LOL sorry just going along the lines of the discussion, but yeah.

I completely agree about it being an attribute of the API, but of course the hardware is made based on the API.

Yeah, you won't get constant utilization with today's hardware, nor will it really ever happen, even with tile-based rendering, but it should be considerably better.
 
Do you guys think DX12 will be a bigger benefit for AMD in general because of the Xbox One? DX12 opens up closer-to-hardware coding, and since game developers already develop for GCN on the XBO, won't the code fit GCN on PC better than Nvidia architectures? I'm thinking the developers have a big enough incentive (Nvidia market share) and support from Nvidia to make optimized code for Nvidia GPUs that there won't be any big difference, except for bad console ports.
 
Do you guys think DX12 will be a bigger benefit for AMD in general because of the Xbox One? DX12 opens up closer-to-hardware coding, and since game developers already develop for GCN on the XBO, won't the code fit GCN on PC better than Nvidia architectures? I'm thinking the developers have a big enough incentive (Nvidia market share) and support from Nvidia to make optimized code for Nvidia GPUs that there won't be any big difference, except for bad console ports.
It's not only that, but also the fact that all of GCN simply couldn't be utilized under DX11, and that AMD's DX11 drivers don't support multithreading properly, making the cards more CPU-dependent.
 
Umm, you can only send work to the GPU in drawcalls - even compute shaders are converted to drawcalls at the ISA level. Async compute/shaders is just a scheduling optimization to run them asynchronously and better utilize GPU units.

Regarding 6-7 ms, devs of Tomorrow's children claim this : https://twitter.com/selfresonating/status/738470011065372672


Right, but now with DX12 you don't need to worry as much about draw call counts anyway ;)

And again, we don't know anything about the system/systems they are talking about.
 
Do you guys think DX12 will be a bigger benefit for AMD in general because of the Xbox One? DX12 opens up closer-to-hardware coding, and since game developers already develop for GCN on the XBO, won't the code fit GCN on PC better than Nvidia architectures? I'm thinking the developers have a big enough incentive (Nvidia market share) and support from Nvidia to make optimized code for Nvidia GPUs that there won't be any big difference, except for bad console ports.

Doing ports is one thing, but doing bad ports is another. Will devs that choose to do a PC port stick with no optimizations for nV hardware, when nV has so much market share they can't be ignored? I don't buy the idea that console wins will drive PC market share through games optimized for said hardware. It never worked in the past, and it won't really work now. You might see slight fluctuations, but nothing major. nV didn't build their market share based on consoles, did they?
 
The command buffer generated by the driver is just a bunch of GPU-consumable commands, which includes state changes and GPU ISA (not OpenGL calls). It is read by the command processor to tell each unit what it will be doing. At that level it's just a bunch of instructions for the GPU, and there is no difference between a graphics drawcall and a compute shader.
 
Well, compute shaders can issue additional draw calls, but I don't think it happens all the time. You can't draw anything from a compute shader without a draw call being issued for the vertex or pixel shaders first.
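For what it's worth, the usual way a compute shader ends up "issuing" draws in D3D12 is indirectly: it writes D3D12_DRAW_ARGUMENTS into a buffer that ExecuteIndirect later consumes on a graphics command list. Rough sketch below (argBuffer, countBuffer and maxDraws are made-up placeholders, not anything from this thread):

Code:
// Draw whatever a previous compute pass (e.g. GPU culling) left in argBuffer.
void DrawFromComputeResults(ID3D12Device* device,
                            ID3D12GraphicsCommandList* gfxList,
                            ID3D12Resource* argBuffer,    // filled by a compute shader
                            ID3D12Resource* countBuffer,  // draw count, also GPU-written
                            UINT maxDraws)
{
    // Command signature describing one plain (non-indexed) draw per argument slot.
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride       = sizeof(D3D12_DRAW_ARGUMENTS);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs   = &arg;

    ComPtr<ID3D12CommandSignature> cmdSig;
    device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&cmdSig));

    // The CPU never reads the argument buffer; the GPU draws what the compute pass kept.
    gfxList->ExecuteIndirect(cmdSig.Get(), maxDraws, argBuffer, 0, countBuffer, 0);
}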
 