DirectX 12 API Preview

Is it true that writing better code and drawing things in batches would overcome every benefit D3D12 gains from its low-cost draw calls?
No. The API and driver model have been redesigned to enable multithreading.

First, any Direct3D 12 GPU has to support virtual address space for addressing resources in physical memory. All GPUs should support page table translation, and more advanced GPUs support a virtual address space shared with the CPU. Previously, the kernel-mode driver had to verify and convert ("patch") all such addresses before submitting the batch to the GPU. This incurred significant overhead and prevented multithreaded operation - only one thread could submit graphics commands to the kernel-mode driver.
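
To make that concrete, here's a minimal hedged sketch (the `constantBuffer` and `cmdList` objects are assumed to exist): because the GPU itself understands virtual addresses, the application can read a resource's GPU virtual address and bind it directly, with no kernel-mode patching pass in between.

    // Hedged sketch; d3d12.h assumed included, `constantBuffer` (ID3D12Resource*)
    // and `cmdList` (ID3D12GraphicsCommandList*) are hypothetical app objects.
    // The GPU understands virtual addresses, so the app binds the address
    // directly; the kernel-mode driver no longer patches it at submit time.
    D3D12_GPU_VIRTUAL_ADDRESS cbAddress = constantBuffer->GetGPUVirtualAddress();

    // Root parameter 0 is assumed to be declared as a root CBV in the root signature.
    cmdList->SetGraphicsRootConstantBufferView(0, cbAddress);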

Second, the new command lists and pipeline states are read-only (immutable), which eliminates read-write hazards and the need for inter-thread synchronisation. This enables multithreaded command submission to the user-mode driver.
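
As an illustration, here's a hedged C++ sketch of pipeline state creation (`device`, `rootSignature` and the shader blobs are assumptions): the whole configuration is validated and baked into one object up front, and afterwards the object can only be bound, never modified.

    // Hedged sketch; `device`, `rootSignature`, `vsBlob` and `psBlob` are assumed.
    // All shader and fixed-function state is validated and baked once into an
    // immutable object, so any thread can bind it later without locking.
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = {};
    desc.pRootSignature = rootSignature;
    desc.VS = { vsBlob->GetBufferPointer(), vsBlob->GetBufferSize() };
    desc.PS = { psBlob->GetBufferPointer(), psBlob->GetBufferSize() };
    desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
    desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;
    desc.BlendState.RenderTarget[0].RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;
    desc.SampleMask = UINT_MAX;
    desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets = 1;
    desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;

    ID3D12PipelineState* pso = nullptr;
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
    // `pso` is now read-only: to change any state, build another PSO.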

These two changes are essentially what makes Direct3D 12 much more CPU efficient while at the same time allowing 10x more draw calls; other things like explicit resource management and pipeline state management also relieve the runtime and driver from doing a lot of the guesswork required by "automatic" management.

[Attached image: CPU usage comparison chart (cpucompare.png)]

https://devblogs.microsoft.com/directx/directx-12/

Is it also a purely technical limitation of D3D11 which no amount of optimization can overcome?
Exactly, it's a problem with the design of the API and driver model.

BTW, Microsoft, Nvidia and ATI have been planning improvements to virtual memory support since at least 2006/Vista; not sure why it took them so long...
 
Will not being forced to have your engine optimize for drawcall batching, or perhaps the better CPU/GPU integration, also allow more dynamic everything? E.g. dynamic environmental destruction, mesh morphing and whatnot? Or was DirectX never the bottleneck there?
 
What do you mean, "not being forced to have your engine optimize for drawcall batching"? You pretty much are being forced to use command lists/bundles and command queues; this is now the only possible way to submit graphics commands. You are also humbly advised to use pretty long command lists and multithreaded draw call submission (i.e. multiple command queues in separate threads) to make better use of driver- and runtime-level parallelism.
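
For illustration, here's a minimal sketch of that pattern (`device`, `queue` and the per-thread `allocators` array are assumptions): each thread records its own command list, and the main thread submits them all in an order it controls.

    // Hedged sketch (needs <thread> and <vector>); `device`, `queue` and the
    // per-thread `allocators[i]` command allocators are assumed to exist.
    constexpr int kThreads = 4;
    ID3D12GraphicsCommandList* lists[kThreads] = {};
    std::vector<std::thread> workers;

    for (int i = 0; i < kThreads; ++i) {
        workers.emplace_back([&, i] {
            // One list + allocator per thread: no cross-thread synchronisation.
            device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                      allocators[i], nullptr, IID_PPV_ARGS(&lists[i]));
            // ... record this thread's share of the draw calls ...
            lists[i]->Close();
        });
    }
    for (auto& t : workers) t.join();

    // Submission order stays explicit and under app control.
    ID3D12CommandList* raw[kThreads];
    for (int i = 0; i < kThreads; ++i) raw[i] = lists[i];
    queue->ExecuteCommandLists(kThreads, raw);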

https://channel9.msdn.com/Events/Build/2014/3-564
Direct3D 12 API Preview
0:32:10-0:41:15 bundles
0:41:15-0:51:10 command lists and command queues
0:53:10-0:58:40 CPU parallelism
0:58:45-1:00:00 Direct3D/UMD/KMD profiling
 
Will not being forced to have your engine optimize for drawcall batching
I think sorting by shader will still be a thing (unless you can go the ubershader route); IIRC changing shaders can still be costly... someone please correct me if I'm wrong. I do think more draw calls open up a few avenues in regards to techniques, though. I don't know enough about dynamic geometry to comment, and again, no experience, but I think environmental destruction was more of a memory and fillrate issue... again, someone please correct me if I'm wrong.
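
As a trivial sketch of what "sort by shader" means in practice (the `DrawRecord` struct is hypothetical): order the frame's draws by pipeline state so a shader switch happens once per PSO rather than once per draw.

    // Hedged sketch (needs <algorithm> and <vector>); `DrawRecord` is a
    // hypothetical per-draw struct. Sorting by PSO means the pipeline is
    // switched once per shader combination instead of once per draw.
    struct DrawRecord {
        ID3D12PipelineState* pso;    // shader/state combination
        int                  meshId; // placeholder payload
    };

    void SortByShader(std::vector<DrawRecord>& draws) {
        std::sort(draws.begin(), draws.end(),
                  [](const DrawRecord& a, const DrawRecord& b) { return a.pso < b.pso; });
    }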

You pretty much are being forced to use bundles
Actually I remember reading somewhere that if you don't have to use bundles, don't.
 
Is it true that D3D12's lower cost of draw calls only helps bad console ports and bad coding in general? Is it true that writing better code and drawing things in batches would overcome every benefit D3D12 gains from its low-cost draw calls?
It depends heavily on engine architecture. Some engines give technical artists lots of freedom to do whatever they want: technical artists can freely configure rendering passes and shaders with visual flow graph editors. Tools like this produce a huge number of shader permutations and lots of different materials with various inputs. It is difficult to automatically batch data sets like this. If your tools and engine architecture prioritize technical artist productivity, ease of prototyping and iteration time over technical factors (such as the number of draw calls, rendering passes and shader permutations), DirectX 12 is a huge improvement over DirectX 11.

Some engine pipelines are heavily programmer-oriented. All data flow is highly optimized (bit-packed data) and rendering passes are tightly coupled together to reduce extra memory traffic. There's not much freedom for artists to add new rendering passes, modify data layouts or add completely new kinds of shaders. Pipelines like this tend to have a small number of shader permutations (usually physically based) and use deferred rendering (and other techniques that impose some limits on the input and output data and on draw call ordering). Lighting and post-processing are nowadays usually done with (highly optimized) compute shaders (authored solely by programmers). It is much easier to batch pipelines like this. In some extreme cases a pipeline like this doesn't need more than a single draw call to render the entire visible scene, and in that case DirectX 12's cheap draw calls do not bring much (if any) performance gain. Other DirectX 12 features such as ExecuteIndirect, asynchronous compute, ROVs and conservative rasterization might however still be very useful.
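
For reference, here is a hedged sketch of ExecuteIndirect (`argumentBuffer` and `countBuffer` are assumed GPU buffers): a command signature describes the argument layout, and a single CPU call then replays a GPU-filled buffer of draw commands.

    // Hedged sketch; `device`, `cmdList`, `argumentBuffer` and `countBuffer`
    // are assumed. The signature says "each record is one indexed draw";
    // the GPU fills argumentBuffer/countBuffer, the CPU submits one call.
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs = &arg;

    ID3D12CommandSignature* signature = nullptr;
    device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&signature));

    // Replays up to 4096 draws; the actual count is read from countBuffer on the GPU.
    cmdList->ExecuteIndirect(signature, 4096, argumentBuffer, 0, countBuffer, 0);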

DirectX 12 is a huge improvement for console porting in general. Finally we can do low-level GPU memory/resource management on PC. We can use our own resource management systems, hand-optimized for our engine's data streaming model. With DirectX 11 you had to pray that the driver was clever enough to do the right thing. This often resulted in stuttering and random frame rate spikes on PC. Players complained about bad porting, and GPU manufacturers had to create case-by-case optimizations to their drivers. Graphics programmers had to maintain and optimize two completely different resource management models. Comparing DirectX 11 to Java is a nice way to explain the problems: you have no way to optimize the memory layout, garbage collection causes random stalls, and each virtual machine behaves differently (with regard to these two). A common tip for making a game run smoothly in a garbage-collected language is to avoid memory allocations altogether while a level is running. Similarly, GPU manufacturers recommend creating all your DirectX 11 graphics resources at loading time, because the driver will cause stalls otherwise.
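
A minimal sketch of the low-level control described above (sizes and flags are assumptions): the application allocates one big heap up front and places resources into it at offsets it chooses, instead of hoping the driver guesses right.

    // Hedged sketch; `device` is assumed, sizes/flags are illustrative only.
    // The app reserves one big heap and sub-allocates at offsets it chooses,
    // instead of letting the driver guess (the D3D11 model).
    D3D12_HEAP_DESC heapDesc = {};
    heapDesc.SizeInBytes = 256u * 1024 * 1024;                 // assumed streaming budget
    heapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;
    heapDesc.Flags = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS;

    ID3D12Heap* heap = nullptr;
    device->CreateHeap(&heapDesc, IID_PPV_ARGS(&heap));

    D3D12_RESOURCE_DESC bufDesc = {};
    bufDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
    bufDesc.Width = 4u * 1024 * 1024;                          // assumed sub-allocation size
    bufDesc.Height = 1;
    bufDesc.DepthOrArraySize = 1;
    bufDesc.MipLevels = 1;
    bufDesc.SampleDesc.Count = 1;
    bufDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;           // required for buffers

    ID3D12Resource* buffer = nullptr;
    device->CreatePlacedResource(heap, /*HeapOffset*/ 0, &bufDesc,
                                 D3D12_RESOURCE_STATE_COMMON, nullptr,
                                 IID_PPV_ARGS(&buffer));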
 
Then you were giving a wrong answer
I couldn't find the main source I was looking for, but here are two of the sources I was basing my statement on:
Efficient-Rendering-with-DirectX-12-on-Intel-Graphics.pdf - page 20
Getting-the-best-out-of-D3D12.ppsx - slide 25

to the question that nobody asked.
I was responding to your saying we're being forced to use bundles... sorry for trying to be informative and accurate.
 
Efficient-Rendering-with-DirectX-12-on-Intel-Graphics.pdf - page 20
Getting-the-best-out-of-D3D12.ppsx - slide 25
It doesn't really say that bundles should be avoided at all costs.

Bundles
Reusable command lists to further lower CPU overhead

  • Some minimal state inheritance is allowed
    • Some patching may occur at submission time
    • If you don’t need to inherit something, set it (again) in the bundle
  • Overhead is already very low in DirectX 12
    • Need ~10+ draws to make bundles a win on Haswell/Broadwell
    • Only consider bundles if you have lots of static draws that can’t reasonably be combined (via instancing or similar)
    • Don’t add any GPU overhead/indirections to enable bundles!
Bundle Advice
  • Aim for a moderate size (~12 draws)
    • Some potential overhead with setup
  • Limit resource binding inheritance when possible
    • Enables more complete cooking of bundle
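
For reference, a minimal sketch of how a bundle is recorded and replayed (`bundleAllocator`, `pso` and `directList` are assumptions); per the slides above, set state explicitly rather than inheriting it, and only bother for batches of static draws.

    // Hedged sketch; `device`, `bundleAllocator` (BUNDLE-type allocator),
    // `pso` and `directList` are assumed. Recorded once, replayed every frame.
    ID3D12GraphicsCommandList* bundle = nullptr;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_BUNDLE,
                              bundleAllocator, pso, IID_PPV_ARGS(&bundle));

    // Per the slides: set state explicitly instead of inheriting it, so the
    // driver can "cook" the bundle completely at record time.
    bundle->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    // ... bind vertex/index buffers and record the ~12 static draws ...
    bundle->Close();

    // Replay from a direct command list each frame.
    directList->ExecuteBundle(bundle);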
I was responding to your saying to being forced to use bundles...
No, I said "forced to use bundles, command lists and command queues".

sorry for trying to be informative and accurate.
It's OK, not everybody's forte.
 
It doesn't really say that bundles should be avoided at all costs.
FOR THE SECOND TIME, that's NOT what I said. The documents I posted imply there is setup overhead, and submission overhead when inheriting bindings (driver patching required). In addition, they suggest around 12 draws per bundle, and to use instancing or a similar technique when practical. So in other words, don't automatically use bundles unless certain conditions are met.

No, I said "forced to use bundles, command lists and command queues".
You're hardly being forced to use bundles when there are guidelines for when to and when not to use them.

It's OK, not everybody's forte.
I love how you changed the font size, real classy.
 
"Stay on target !"-sic

(I'd like not to use my ban hammer, and I don't like using my delete hammer either !)
 
I said, in response to the original question, that you cannot avoid "being forced to have your engine optimize for drawcall batching", because it's an essential part of the API and you are now "being forced" to use this "batching" - specifically command lists and/or bundles attached to multiple command queues. If I didn't add two dozen explanatory footnotes, it's because I felt that this was not what the original poster wanted to know, but you probably know better.
 
Along with Just Cause 3, we now have another game announcing DX12 integration that makes use of the FL 12_1 features exclusive to Maxwell and Skylake. It looks like FL12_1 may end up being used more extensively, and thus being more important, than a lot of us expected early on. Hopefully Arctic Islands provides a full implementation.

http://www.dsogaming.com/news/codem...ter-ordered-views-conservative-rasterization/
 
GPU manufacturers had to create case-by-case optimizations to their drivers
Does DX12 mean we will see an end to this, and to drivers that are several hundred MB in size because they contain some huge case statement like
case exe name of
aaa.exe then
........
bbb.exe then
........
etc.
 
Does DX12 mean we will see an end to this, and to drivers that are several hundred MB in size because they contain some huge case statement like
case exe name of
aaa.exe then
........
bbb.exe then
........
etc.
As I understand it, that should be the result. There are no older versions of DX12, and if the spec is strict, code should either work or not work. A lot of the old mistakes with APIs shouldn't carry forward (though mistakes can happen).

This would benefit other manufacturers in the GPU space, since it's way too costly to have coding teams creating drivers specifically for poorly coded games.
 
Does DX12 mean we will see an end to this, and to drivers that are several hundred MB in size because they contain some huge case statement like
They're several hundred megabytes in size not because of app-specific code, but because there are too damn many APIs and SKUs :) So no, DX12 only makes that worse.

I'm actually not sure why, in the days of Windows Update, we haven't gone back to non-"unified" drivers... it doesn't make a whole lot of sense these days to be downloading code for a GPU you don't even have.
 
They're several hundred megabytes in size not because of app-specific code, but because there are too damn many APIs and SKUs :) So no, DX12 only makes that worse.

I'm actually not sure why, in the days of Windows Update, we haven't gone back to non-"unified" drivers... it doesn't make a whole lot of sense these days to be downloading code for a GPU you don't even have.
On the size of the driver update I'm on board, but the frequency of the updates (one per release of a large AAA game) is something else though, no?
 
I'm actually not sure why, in the days of Windows Update, we haven't gone back to non-"unified" drivers... it doesn't make a whole lot of sense these days to be downloading code for a GPU you don't even have.
Users have clearly shown Nvidia that driver packages are confusing, and that you need to give them your e-mail address to get the latest gamer drivers.
 