DirectX 12: The future of it within the console gaming space (specifically the XB1)

It doesn't matter. The problem is that the benchmark for a game that was proven to gain a lot from deferred contexts (DC) was not run with DC enabled, i.e. the benchmark differences are inflated.
My problem with DX12 and all the praise it gets is that it's not different from Mantle (in my opinion, it's 99.999% Mantle) and the only reason for its existence is Nvidia and Microsoft not willing to use Mantle.

It's not Microsoft's fault if they decide not to run the benchmarks the way you want to see them. Users on this forum have run the tests themselves, and the results are the following:
27 fps with deferred contexts and 23 fps without them.

A mere ~20% increase in performance compared to what DX12 delivers. In most cases 20% is already considered an impressive boost, yet DX12 is bringing anywhere between 300-500% on this particular benchmark.

Your problem with DX12 and all the praise it gets has nothing to do with Mantle; this is a thread about DX12, not DX12 vs Mantle, and not about Mantle. Besides, no one in this thread ever said anything about Mantle being a slouch either, but it's understandable that on an Xbox sub-forum about DX12 we... talk about DX12 - because Xbox isn't running Mantle.

Your views of the political landscape of graphics APIs are not correlated with the performance benchmarks that have been showcased unless you can prove a link between the results Anandtech displayed and money being handed out by Microsoft. And Anandtech tells you outright that DX11 deferred contexts are not enabled - as opposed to lying about it; this is not an elaborate attempt to boost DX12's image.
 
I concur. Opinions on the Mantle vs DX12 propaganda don't belong here. Any reference to Mantle should be in relation to what it could bring to XB1 (using an AMD APU) and compare/contrast DX12 in that same context. There is, to be fair, a second AMD APU-based platform that won't be seeing DX12, so understanding the similarities and differences and how they affect the game development landscape is important in that respect - i.e. will DX12 methods be portable to PS4, or will devs face a multiplatform schism the likes of which we never would have predicted given the incredible similarities in architectures?

As gamers, we want better games, and hopefully the new APU methods will enable that. For devs, better performance is always welcome, but they also want to see less hassle and aggro too. If one code base can serve the three platforms and still bring to bear the modern API benefits, that'd be awesome. That's worth discussing.

What the rest of the graphics industry feels about DX12 and AMD is irrelevant to this forum.
 
My problem with DX12 and all the praise it gets is that it's not different from Mantle (in my opinion, it's 99.999% Mantle) and the only reason for its existence is Nvidia and Microsoft not willing to use Mantle.

Shockingly enough, groups with similar education and experience, when seeking to solve the same problem with the same tools, tend to arrive at similar solutions. However, it is pretty naïve to equate AMD hacking around with Mantle with Microsoft committing to a new version of DirectX and delivering it within parameters that meet the requirements placed on it. AMD seems to be stuck releasing the SDK ("coming really soon now" for a while), and can't even get things to be reliable and predictable between minor iterations of their hardware (see Tonga being not that great with Mantle - yes, yes, a patch might / will fix that). Do you honestly find the TAM associated with GCN 1.x (something here) and that associated with modern AMD, NVIDIA, Intel, QC, Imagination GPUs and the Xbox One (to name a few) comparable? Also, do you really think that the intricacies of addressing the latter could have been / were solved by AMD and just copy-pasted in a few months by the team doing DirectX? If so, why?
 
While I don't know if the Star Swarm benchmarks from Anandtech are really going to be representative of most D3D applications, it is curious that DX12 does seem to provide performance benefits beyond CPU-limited situations. The results in a GPU-limited scenario are not what I was expecting. The power measurement is also interesting. I wonder if next year's games on Xbox One will have measurable power consumption increases. Xbox One is most likely near CPU-limited in most games, and it has a very low-end multi-core CPU, so based on those results it seems to be an ideal scenario to gain some improvements from D3D12. Hopefully Microsoft will allow some devs leeway with the NDA to promote DX12. Maybe we'll get some first-hand insight into what the improvements are over the Xbox One's D3D11 with fast path semantics.
 
Dx12 on Android, or what are you saying? Wouldn't that rather be glNext?

EDIT: Wahh, this came way too late. Was asking sebbbi this.
 
... and Intel, Imagination (PowerVR) and Qualcomm. All of these companies are involved in DirectX 12. I think it is extremely valuable that Microsoft is pushing this highly efficient API also to mobile devices. Mobile devices gain the most from an efficient low level API.

I wish I was a fly on the wall in all the corporate meetings on Mantle or the reactions to it. There seems to be a clear PR emphasis with DX12 that puts Intel, Nvidia, IMG, and whatnot over AMD. The standard's deliberations would have included AMD, so what prompted AMD to devote so much to Mantle?

Corporate shenanigans seem to be the basis of any answer, but that can overlap with any number of scenarios, from marketing to design gridlock.
 
Most here know this, but I'll point it out again anyway: DirectX is critical to Windows 10. MS really has doubled down on HW acceleration in its modern stack, WinRT. The majority of its 500,000+ store apps/games on phone/tablet/PC are CPU bound, and they should get improvements because the underlying frameworks/architectures they use will pivot on DX12.

WinRT XAML uses D2D/DComp/DWrite, so you would expect these XAML apps to get immediate improvements if XAML pivots on DX12, which is highly likely.

Similarly, Internet Explorer and its rendering engine Trident heavily use D2D/DWrite/DComp too; I fully expect them to also pivot and take advantage of DX12. So all modern web apps (WinJS) and websites should get immediate improvements.

So whilst we concentrate on AAA games in our arguments, it's these 'modern' CPU-bound store apps/games that will probably get the immediate improvements, and in many cases no game/app code needs to change if the framework underneath does it right.
 
While I don't know if the Star Swarm benchmarks from Anandtech are really going to be representative of most D3D applications, it is curious that DX12 does seem to provide performance benefits beyond CPU-limited situations. The results in a GPU-limited scenario are not what I was expecting. The power measurement is also interesting. I wonder if next year's games on Xbox One will have measurable power consumption increases. Xbox One is most likely near CPU-limited in most games, and it has a very low-end multi-core CPU, so based on those results it seems to be an ideal scenario to gain some improvements from D3D12. Hopefully Microsoft will allow some devs leeway with the NDA to promote DX12. Maybe we'll get some first-hand insight into what the improvements are over the Xbox One's D3D11 with fast path semantics.

I'm hoping someone could clarify more. I posed a question earlier but later edited it out, thinking I knew the answer, but the reality is that I don't.

I'm confused about the performance of all GPUs in regard to the Star Swarm demo. If the demo were made using very large batches or instances, many of you would agree that from a performance perspective it would do well: the pressure from draw calls is removed, and at the same time you are allowed to optimize the batches to fit the GPU, in this case perhaps looking to fill warps/waves to their fullest.

What I don't understand is that with Mantle/DX12, the GPUs accept a ton more draw calls in parallel and all of a sudden the CPU limiter is gone - what happens then? Isn't each warp/wave now just being filled partially? How is in-order execution still happening? With all the CPU cores slamming the GPU, how could it possibly be so efficient at scheduling that each wave/warp is full before processing, that all the caches are getting hit and are there at the right time, and that it's fetching from memory before calculations need to be done, reducing GPU stalls?

That being my question: do the inefficiencies of dealing with a large number of draw calls still exist? Or is it performing similarly to batched/instanced jobs? Could we not predict that if all the ships were batched or instanced a specific way, the performance would be higher?

Is this where the concept of hardware graphics contexts comes into play? So the GPU is swapping between contexts much like how an OS will switch between threads?
 
That being my question: do the inefficiencies of dealing with a large number of draw calls still exist? Or is it performing similarly to batched/instanced jobs? Could we not predict that if all the ships were batched or instanced a specific way, the performance would be higher?

There's no need for a prediction. There is an explicit two-pass batch coalescing option in the Mantle path that combines batches. It is pointed out as the probable reason why the Mantle path beats AMD's DX12 path and why there is a regression at two cores. (edit: On the GPU scaling test between DX12 and Mantle)
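For anyone curious what batch coalescing could look like, here is a minimal, hypothetical C++ sketch of the idea (not the actual Star Swarm / Mantle code; all names are illustrative): draws are first ordered so identical state ends up adjacent, then neighbouring draws that share state and reference contiguous index ranges are merged, so fewer, larger draws reach the GPU.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical representation of a queued draw; the real engine's data layout is unknown.
struct Draw {
    uint64_t stateHash;   // identity of pipeline state + bindings
    uint32_t firstIndex;  // start of this draw's index range
    uint32_t indexCount;  // number of indices in this draw
};

// Pass 1: sort so draws with identical state become adjacent.
// Pass 2: merge neighbours with the same state and contiguous index ranges.
std::vector<Draw> CoalesceBatches(std::vector<Draw> draws)
{
    std::stable_sort(draws.begin(), draws.end(),
                     [](const Draw& a, const Draw& b) { return a.stateHash < b.stateHash; });

    std::vector<Draw> merged;
    for (const Draw& d : draws) {
        if (!merged.empty() &&
            merged.back().stateHash == d.stateHash &&
            merged.back().firstIndex + merged.back().indexCount == d.firstIndex) {
            merged.back().indexCount += d.indexCount;  // extend the previous draw
        } else {
            merged.push_back(d);                       // start a new draw
        }
    }
    return merged;
}
```

The trade-off the benchmark exposes follows directly from this: coalescing relieves the GPU front end, but it is an extra CPU pass, which would explain a regression at low core counts.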
 
There's no need for a prediction. There is an explicit two-pass batch coalescing option in the Mantle path that combines batches. It is pointed out as the probable reason why the Mantle path beats AMD's DX12 path and why there is a regression at two cores. (edit: On the GPU scaling test between DX12 and Mantle)

Oh right, that makes sense. So the benefits of enabling a lot of draw calls also come with a detriment. So for developers a fine balance needs to be struck between performance (larger batches, but shared shaders and materials) and graphical fidelity (individuality of shaders and materials).
 
As far as I understand, there is a GPU bottleneck in the command processor. Is that possibly the reason for Xbox One's multiple command processors?

A typical game will not be running large numbers of small draws. Indeed, it's a stupid idea especially on GCN. The only reason they'd do it is to handle these CPU benchmarks. You shouldn't be surprised to find programs written in unique ways exposing bounds that are completely uninteresting to nearly everyone else.
 
So for developers a fine balance needs to be struck between performance (larger batches, but shared shaders and materials) and graphical fidelity (individuality of shaders and materials).
There are techniques that allow batching without sacrificing texture/material variety, such as virtual texturing and bindless textures. There's also lots of different techniques that reduce the count of shader permutations (deferred shading for example).
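As a rough illustration of that batching idea (my own sketch, not sebbbi's engine; all names are made up): if every object carries a material index and the shader uses it to fetch from a texture array or bindless handle table, thousands of visually distinct objects can share a single instanced draw call.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-instance record: material variety comes from an index,
// not from extra draw calls or per-object texture bindings.
struct InstanceData {
    float    world[16];      // per-object transform (filled in by the game)
    uint32_t materialIndex;  // selects a layer in a texture array / bindless table
};

// Build one instance buffer for every object that shares a mesh and shader;
// the whole set is then rendered with a single instanced draw.
std::vector<InstanceData> BuildInstanceBuffer(const std::vector<uint32_t>& materialIndices)
{
    std::vector<InstanceData> instances;
    instances.reserve(materialIndices.size());
    for (uint32_t m : materialIndices) {
        InstanceData d = {};
        d.materialIndex = m;
        instances.push_back(d);
    }
    return instances;
}
```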
 
There are techniques that allow batching without sacrificing texture/material variety, such as virtual texturing and bindless textures. There's also lots of different techniques that reduce the count of shader permutations (deferred shading for example).

Thanks sebbbi. From the sounds of it, enabling an enormous number of draw calls is not that great of a benefit; many workarounds exist for today's types of games. I suppose indie/smaller developers will benefit from not having to optimize as well as AAA developers know how to. If you peruse the Unity forums or whatnot, a lot of people seem to get draw-call limited faster than they expected.

But I guess my question is, why is the focus so much on draw calls - is there a future mode of programming that will require it? Is there a particular type of work that requires a large number of dispatches and doesn't have many effective ways around it?
 
With all this discussion of "draw calls", it's probably a good time to bring back this post from 2013, seeing as we are now here...

"2013: The Road to One Million Draws"

Imagine a scenario where every leaf on a tree could be a separate, independent draw, or where every soldier in an army could be unique but rendered with a single OpenGL command. That is what we’re trying to achieve here and is a fairly attainable goal with a modest investment of effort. Future GPU and API features will allow us to push more and more of scene graph management, state change processing and high level decision making to the GPU, improving performance further.
 
Well, I think there is a question that is missing here: how does the ability to give the GPU a huge load of draw calls affect GPU performance? We see that the CPU is now more or less jobless (relative to the draw call counts of current games). As far as I know, developers try to reduce draw calls as much as possible so they don't get slowdowns because of the CPU. So what is the difference for the GPU between getting many draw calls instead of few? Does this also provide the ability to work on a draw call with fewer resources used on the GPU (e.g. because of smaller draw calls, less preparation needed, fewer dependencies, ...)?


Funny thing about this... I found a really, really old NVIDIA presentation:
http://www.nvidia.com/docs/IO/8228/BatchBatchBatch.pdf
 
 
Well, I think there is a question that is missing here: how does the ability to give the GPU a huge load of draw calls affect GPU performance? We see that the CPU is now more or less jobless (relative to the draw call counts of current games). As far as I know, developers try to reduce draw calls as much as possible so they don't get slowdowns because of the CPU. So what is the difference for the GPU between getting many draw calls instead of few? Does this also provide the ability to work on a draw call with fewer resources used on the GPU (e.g. because of smaller draw calls, less preparation needed, fewer dependencies, ...)?
It depends. If the draw calls have identical state and are larger than 64 vertices, the GPU can be fully fed all the time. However, if each of your draw calls accesses separate vertices from memory, there will be cache stalls at the beginning of each draw call (bigger draw calls amortize this startup cost). If there are state changes between the draw calls, you need larger draw calls to reach peak GPU utilization, because GPUs cannot render an unlimited number of parallel draw calls with different state (each state needs to be stored somewhere and there is limited space). Some state changes cause various pipeline flushes and other nasty things. For example, you do not want to change the depth/stencil modes, render targets or viewport/scissor often.

Basically you get the best performance by either doing a tight loop on the CPU side (no state changes, just change the vertex start index) or by doing a single big multi-draw indirect call (= a GPU-side command buffer loop, where the GPU itself reads the draw call parameters directly from GPU memory). Multi-draw indirect is basically free for the CPU, while the cost of the CPU-side tight loop depends on the graphics API. If you look at the Anandtech benchmarks, it seems that a dual core CPU is enough to make the GPU command processor the bottleneck in DirectX 12 and Mantle (on current Nvidia and AMD hardware). However, it's not possible to draw definite conclusions from an unfinished graphics engine running on an unfinished graphics API and unfinished drivers.
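To make the multi-draw indirect idea concrete, here is a minimal OpenGL 4.3 sketch (OpenGL chosen purely for illustration, and the loader and helper names are my assumptions): the CPU issues one call, and the GPU command processor reads each draw's parameters straight from a buffer in GPU memory.

```cpp
#include <GL/glew.h>   // assumes GLEW (or another loader) provides the GL 4.3 entry points
#include <cstddef>

// Command layout mandated by OpenGL for indirect indexed draws.
struct DrawElementsIndirectCommand {
    GLuint count;          // indices in this draw
    GLuint instanceCount;  // instances in this draw
    GLuint firstIndex;     // offset into the bound index buffer
    GLint  baseVertex;     // value added to each index
    GLuint baseInstance;   // handy for per-draw lookups in the shader
};

// 'indirectBuffer' already contains drawCount commands, written once up-front
// by the CPU or by a compute shader (so the CPU never touches them per frame).
void SubmitIndirectDraws(GLuint indirectBuffer, GLsizei drawCount)
{
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
    // One CPU call -> drawCount GPU draws, parameters fetched from GPU memory.
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                nullptr,  // byte offset 0 into the bound buffer
                                drawCount,
                                sizeof(DrawElementsIndirectCommand));
}
```

This is the "single big multi-draw indirect call" case described above; D3D12's ExecuteIndirect plays a similar role, though the details differ.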
 