DX12 Performance Discussion And Analysis Thread

I hadn't seen the divergent AMD vs. Nvidia recommendations until now.

edit - I also hadn't seen this:
  • Scheduling latency on the operating system's side is about 60 microseconds, so developers should put at least that much work into each submission; otherwise the remainder of the 60 microseconds is wasted idling.
 
Shader Model 6? The shader model that shipped with DirectX 12 is 5.1, which is essentially SM 5.0 plus dynamic resource indexing (oh, and Root Signatures via HLSL)... Yeah, we haven't had a truly new shader model since SM 4.0...
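For reference, that one genuinely new piece in SM 5.1 is dynamic indexing of descriptor arrays. A minimal sketch of what that looks like (all names here are illustrative, not from any shipping codebase):

```hlsl
// SM 5.1: index an unbounded texture array with a value read per draw.
Texture2D    gMaterials[]   : register(t0);
SamplerState gLinearSampler : register(s0);

struct PerDrawData { uint materialIndex; };
ConstantBuffer<PerDrawData> gDraw : register(b0);

float4 SampleMaterial(float2 uv)
{
    // NonUniformResourceIndex is required when the index may differ per lane.
    return gMaterials[NonUniformResourceIndex(gDraw.materialIndex)]
               .Sample(gLinearSampler, uv);
}
```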

More here: http://www.dualshockers.com/2016/03...-coming-in-2017-as-microsoft-shares-new-info/
 
Cool, I hope to see compute shader improvements!
I was only aware of some consideration of a new feature level (FL) for 10Level9-class hardware..

FL 12_0+ only? I need to investigate.. Maybe OpenCL 2.x-like features are required..
 
Big thumbs up for "wave-level operations" and "64-bit uints". Templates also make it possible to write more generic shader libraries (similar to Thrust on CUDA). I am happy :)
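To make that concrete, here is a minimal sketch of what a wave-level reduction could look like, assuming intrinsics along the lines of the announcement (names like WaveActiveSum are from the SM6 proposal; the buffers are mine):

```hlsl
StructuredBuffer<uint>   gInput : register(t0);
RWStructuredBuffer<uint> gTotal : register(u0);

[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    uint value   = gInput[dtid.x];
    uint waveSum = WaveActiveSum(value);    // cross-lane add, no LDS, no barrier
    if (WaveIsFirstLane())
        InterlockedAdd(gTotal[0], waveSum); // one atomic per wave instead of per lane
}
```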

Hopefully we also get 64-bit atomics. A Media Molecule-style point renderer needs them...
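If they do arrive, the point-renderer trick would look roughly like this: pack depth into the high bits and color into the low bits, then let a single atomic min do the depth test and the write in one step. A sketch, assuming a 64-bit type and a 64-bit InterlockedMin that do not exist in SM 5.1:

```hlsl
// Hypothetical: RWStructuredBuffer<uint64_t> and 64-bit InterlockedMin are
// exactly the missing pieces being wished for here.
RWStructuredBuffer<uint64_t> gDepthColor : register(u0); // one entry per pixel

void SplatPoint(uint pixelIndex, float depth01, uint packedColor)
{
    // Depth in the high 32 bits, color in the low 32: the numerically
    // smallest packed value is the nearest point.
    uint     depthBits = (uint)(saturate(depth01) * 4294967295.0);
    uint64_t packed    = ((uint64_t)depthBits << 32) | packedColor;
    InterlockedMin(gDepthColor[pixelIndex], packed);
}
```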

I am hoping "lambdas" means that we can write whole shaders as lambdas and have one shader dispatch any number of different shaders, similar to CUDA dynamic parallelism (and the comparable OpenCL 2.0+ device-side enqueue). This is one of the biggest missing features in DirectCompute compared to CUDA and OpenCL.
 
Nope, loop semantics allow easier unrolling, where goto does not.
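A trivial sketch of why: a bounded, structured loop can honor an [unroll] hint, while equivalent goto-based control flow gives the compiler no such structure to work with:

```hlsl
float4 FilterTaps(float4 taps[4], float weights[4])
{
    float4 sum = 0;
    [unroll]                        // compiler can expand this statically
    for (int i = 0; i < 4; ++i)
        sum += taps[i] * weights[i];
    return sum;
}
```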

Virtuals most of the time add useless extra indirections and debugging headaches.
 
We're talking about "need", as in "must have". We don't "need" anything beyond a brain and a hex editor (plus the unmentioned obvious).
Sometimes I prefer that other people take some work off me; I don't want to write a C++ compiler with my hex editor just to feel comfortable while programming. "virtual" is a tool, and anyone who wants virtual has a right to ask for it; otherwise they will make virtual work anyway, with their brain and the hex editor, regardless of what the language officially supports.

My point being: there is no objective argument for or against any of the mentioned "convenience helpers", like virtual/if/while/etc. Whenever it helps a subset of the audience, it's valid to include it.
 
We are talking about HLSL, a tool for writing shaders. What would be the impact of dynamic binding on a shader? Devs must already be careful when setting and updating many things, like the root argument layout (I am not talking about the GCN 15/16-slot limits), and on top of that there would be virtual methods? No, thank you.
I am not going to fight a programming-paradigm war, but I really do not see the purpose of virtuals in HLSL.
 
What's the worst a virtual could cause? Divergent branches in a wave? And that's only if you are stupid enough to actually mix objects of different derived types in a single buffer.

The one level of indirection it adds by default (and even that only if the compiler can't eliminate it through strong type assertions) shouldn't be all too bad.

Apart from that, virtuals are mostly a tool for reducing complexity when using OOP.
Sometimes you just need a clean separation of different implementations for the sake of managing complexity, and a compiler-managed vtable is a much cleaner expression of that than a handcrafted jump table combined with a hand-passed state variable, as sketched below.
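For comparison, this is roughly the hand-rolled version a vtable would replace (all types and names are illustrative):

```hlsl
struct Surface
{
    uint   shadingModel;  // the hand-passed "state variable"
    float3 albedo;
    float3 normal;
};

float3 ShadeLambert(Surface s, float3 lightDir)
{
    return s.albedo * saturate(dot(s.normal, lightDir));
}

float3 ShadeUnlit(Surface s, float3 lightDir)
{
    return s.albedo;
}

float3 Shade(Surface s, float3 lightDir)
{
    // With language-level virtuals the compiler would maintain this
    // dispatch for us; here it is spelled out by hand.
    switch (s.shadingModel)
    {
        case 0:  return ShadeLambert(s, lightDir);
        default: return ShadeUnlit(s, lightDir);
    }
}
```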
 
It is late, so I have not had a chance to look at this myself, but the sponsored-talk presentations from GDC 2016 are now available, hooray :)
Here is the one I mentioned earlier that looked interesting: it discusses Intel's collaboration with Avalanche on Just Cause 3 and implementing CR/ROV, along with other aspects of DX12.
https://software.intel.com/sites/de...tions-and-DirectX-features-in-JC3_v0-92_X.pdf
Cheers
 
I was having a discussion on another forum, and this came up. The context is async compute performance across different generations, different IHVs, and future products: hardware support for predicting workloads might help, so the programmer would need to do less work to get good async performance across different hardware.

The person I'm talking to stated this:

That's one area where the queue priorities would probably help. There is likely some control, albeit in drivers, over work distribution: limit graphics to 70% occupancy and reserve 30% for compute, for example. This will probably get exposed in Vulkan; DX12 is another matter (one compute queue). Those features should make async somewhat self-tuning based on the hardware. I know the ACEs are programmable (by drivers). It would make sense that the work distributor could be configured as well: score shaders by their tex/memory-to-math ratio and attempt to balance all the compute units.

My inclination is that the ACEs and the work distributor aren't smart enough to do such things: drivers would have to be overhauled and made much more complex to accommodate current and upcoming hardware, and making these kinds of predictions in hardware would cost a significant number of transistors, so we probably won't see it any time soon. I cited Intel's experience with branch prediction and hyper-threading, as it is akin to async. I would like to know if there is anything I'm missing. I can see this happening in the future, but I don't see it happening right now.
 
My inclination is that the ACEs and the work distributor aren't smart enough to do such things.
Depends on the revision of the GCN architecture.

1.0 certainly isn't; the ACEs were not programmable back then at all.
1.1 is programmable, but the space is limited, and that space is needed for the queue-decoding logic.
1.2 might be able to do such a thing, but I'm not entirely sure what the HWS / "new" ACE units on Tonga and Fiji are actually doing right now.
 
The GDC 2016 slides on DirectX and Vulkan are available for download now: http://gpuopen.com/gdc16-wrapup-presentations/

Including :

D3D12 & Vulkan: Lessons Learned
Matthaeus Chajdas (AMD)

Practical DirectX 12 – Programming Model and Hardware Capabilities
Gareth Thomas (AMD), Alex Dunn (NVIDIA)

Vulkan Fast Paths
Graham Sellers (AMD), Timothy Lottes (AMD), Matthaeus Chajdas (AMD)

Let Your Game Shine – Optimizing DirectX 12 and Vulkan Performance with AMD CodeXL
Doron Ofek (AMD)

Right on Queue: Advanced DirectX 12 Programming
Stephan Hodes (AMD), Dave Oldcorn (AMD), Dan Baker (Oxide)
 