DirectX 12: The future of it within the console gaming space (specifically the XB1)

This is a nice iterative improvement over the OpenGL 4.4/4.5 multi-draw indirect.

You don't need to ask how happy I am that this feature has finally made it to DirectX :). And we got asynchronous compute (multi-engine) too :)
 
That is an RTS I want to play. Lol, the battles look badass. Reminds me of Total Annihilation, a bit of Dark Reign, and some stuff I've never seen before.
 
Andrew: Do you know whether the modified Intel asteroids demo (with ExecuteIndirect) shown by Max in his presentation was running on an Intel (Broadwell) GPU? If so, then I must thank you for the very nice ExecuteIndirect performance. It was good to see some (10%+) GPU gains as well (the huge CPU gains were obviously expected).
 
Core i7-4770R
It is the Haswell flagship GPU. Broadwell eDRAM versions have not yet been announced.

Intel® Iris™ Pro Graphics 5200
Graphics Base Frequency: 200 MHz
Graphics Max Dynamic Frequency: 1.3 GHz

It would be interesting to see the GPU clock rate. It might be that the greatly reduced CPU utilization allows the GPU to run at a slightly higher clock speed when ExecuteIndirect is used. It is great to see that bindless makes Intel GPUs faster as well, and that ExecuteIndirect (/MDI) is also beneficial on Intel GPUs. We already know from the OpenGL 4.4 benchmarks that MDI is the fastest way to push draw calls on NVIDIA and AMD GPUs (assuming you don't need to change state/bindings/shaders).

ExecuteIndirect adds support for binding changes, so it is applicable in more situations than MDI. What we are still lacking is support for state and shader changes. NVIDIA has a new custom extension that allows limited state changes by the GPU between the draws. The future seems clear: soon the GPU will be able to do all the necessary steps to render complex scenes by itself.
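To make the difference from plain MDI concrete, here is a toy CPU-side model of what one indirect call with per-draw binding changes does. It is only a sketch: `IndirectCommand`, `DrawRecord` and `executeIndirectModel` are names I made up for illustration, not D3D12 types, and the real API describes the record layout through a command signature rather than a C++ struct. The point is just that each record in the argument buffer carries a binding change plus draw arguments, and the number of records actually processed is capped by a count the GPU wrote itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record layout: one binding change followed by draw arguments.
// With plain MDI the record would contain only the draw arguments.
struct IndirectCommand {
    std::uint32_t constantBufferIndex; // stands in for a per-draw binding change
    std::uint32_t vertexCount;
    std::uint32_t instanceCount;
};

// What the toy "GPU" ends up drawing for each processed record.
struct DrawRecord {
    std::uint32_t boundConstantBuffer;
    std::uint32_t vertices;
};

// Toy model of the loop a command processor runs for a single indirect call.
// gpuWrittenCount plays the role of a count buffer filled by a compute pass.
std::vector<DrawRecord> executeIndirectModel(
    const std::vector<IndirectCommand>& argumentBuffer,
    std::uint32_t maxCommandCount,
    std::uint32_t gpuWrittenCount)
{
    std::vector<DrawRecord> log;
    const std::uint32_t n = std::min({
        maxCommandCount,
        gpuWrittenCount,
        static_cast<std::uint32_t>(argumentBuffer.size())});
    for (std::uint32_t i = 0; i < n; ++i) {
        const IndirectCommand& cmd = argumentBuffer[i];
        // The binding changes per draw, without the CPU touching anything.
        const std::uint32_t bound = cmd.constantBufferIndex;
        log.push_back({bound, cmd.vertexCount * cmd.instanceCount});
    }
    return log;
}
```

Calling `executeIndirectModel(args, 8, 2)` on a three-record buffer processes only the first two records, each with its own binding, which is the part MDI alone cannot express.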

Another great thing Max mentioned in his DX12 presentation is predicated rendering. The GPU can finally read predicates from GPU buffers, meaning that the GPU can instruct itself to skip over sections of the command buffer based on compute shader calculation results. This means that we basically have a simple form of command buffer branches. ExecuteIndirect implements a simple form of command buffer loops. If ExecuteIndirect gets expanded in the future to allow looping over arbitrary command buffers, we are already very close to the goal of a fully programmable command processor. I need to start writing a DX13 wish list already. Hopefully the hardware vendors are listening :)
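The "predication as a branch" idea can be sketched with a tiny interpreter over a command list. Again, this is a toy model and not the D3D12 API: `Op`, `Command` and `runCommandBuffer` are hypothetical names, and the convention that a zero predicate means "skip" is just the one I picked for the sketch. The predicate buffer stands in for values a compute shader would have written on the GPU.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical command set for the toy command buffer.
enum class Op { SetPredicate, ClearPredicate, Draw };

struct Command {
    Op op;
    std::uint32_t value; // predicate slot for SetPredicate, object id for Draw
};

// Walks the command buffer the way a command processor might.
// Toy convention (an assumption): draws are skipped while the selected
// predicate value is zero.
std::vector<std::uint32_t> runCommandBuffer(
    const std::vector<Command>& commands,
    const std::vector<std::uint64_t>& predicateBuffer)
{
    std::vector<std::uint32_t> drawn;
    bool skipping = false;
    for (const Command& c : commands) {
        switch (c.op) {
        case Op::SetPredicate:
            // The "branch": the condition is read from a GPU-written buffer.
            skipping = (predicateBuffer.at(c.value) == 0);
            break;
        case Op::ClearPredicate:
            skipping = false;
            break;
        case Op::Draw:
            if (!skipping) {
                drawn.push_back(c.value);
            }
            break;
        }
    }
    return drawn;
}
```

With predicates `{0, 1}`, a section guarded by slot 0 is skipped entirely while a section guarded by slot 1 runs, and no CPU readback is involved in the decision.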
 
Pretty interesting how UAVs, in at least one D3D11 driver, relied upon heavy CPU analysis/optimisation (to determine parallelism regardless of barriers) to get a big boost in performance.
 
Should have been named "Improvements to CPU/GPU efficiency in DirectX 12" :)

  • Parallel execution engines - 11:30-18:00
  • GPU efficiency (queries, predication, execute indirect) - 18:00-25:30
  • CPU overhead (resource binding, multithreading) - 26:05-36:15
  • Ashes of the Singularity demo - 36:15-45:15
    Dan Baker, Graphics Architect, Oxide Games
He talks about actually using all these features to draw thousands of game objects...
 