DirectX 12: The future of it within the console gaming space (specifically the XB1)

Well, I think there is a question missing here: how does the ability to give the GPU a huge load of draw calls affect GPU performance? We see that the CPU is now more or less jobless (relative to the draw call counts of current games). As far as I know, developers try to reduce draw calls as much as possible so they don't get slowdowns because of the CPU. So what is the difference for the GPU between getting many draw calls instead of fewer? Does this also give it the ability to work on a draw call with fewer resources used on the GPU (e.g. because of smaller draw calls, less preparation needed, fewer dependencies, ...)?
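To make the CPU-vs-GPU distinction concrete, here is a minimal D3D11-flavoured sketch of the two submission patterns the thread keeps contrasting: one draw per object versus one instanced draw for the whole batch. It assumes an already-initialized device context with shaders and buffers bound; the function and parameter names are hypothetical, purely for illustration.

```cpp
// Minimal sketch (not a full app): assumes a working D3D11 device context
// with input layout, shaders, vertex/index buffers already bound.
#include <d3d11.h>

// The "classic" pattern: one draw call per object. Each call goes through
// runtime + driver validation on the CPU, so the cost scales with the number
// of objects even if the GPU work per call is tiny.
void DrawPerObject(ID3D11DeviceContext* ctx, unsigned objectCount, unsigned indicesPerObject)
{
    for (unsigned i = 0; i < objectCount; ++i)
    {
        // In a real renderer you would also update per-object constants here,
        // adding further CPU-side cost per call.
        ctx->DrawIndexed(indicesPerObject, 0, 0);
    }
}

// Batched alternative: the same geometry drawn as instances in a single call.
// The GPU still processes the same triangles, but the CPU submits one call
// instead of objectCount calls.
void DrawBatched(ID3D11DeviceContext* ctx, unsigned objectCount, unsigned indicesPerObject)
{
    ctx->DrawIndexedInstanced(indicesPerObject, objectCount, 0, 0, 0);
}
```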


Funny thing about this... I found a really, really old NVIDIA presentation:
http://www.nvidia.com/docs/IO/8228/BatchBatchBatch.pdf

We may need Sebbbi to kindly shed some light on that. My experience was only with XNA on the 360, and it was easy to get slowdowns from too many draw calls (granted, I was doing that on purpose). However, with a project that was rather heavy on particles, I did have to make optimizations to avoid some hiccups during the gorier sections with lots of AI fighting you. I need to finish that game or port it to Unity. LOL

And Edit - Thanks Sebbbi!
 
But -- Max already posts here. Max... come talk to us first ;)
I'm sure we can ask the right questions and get the right tools together for an interview.
 
However, it is pretty naïve to assume that AMD hacking around with Mantle is similar to Microsoft committing to a new version of DirectX

DX10 was out in 2006. It was obvious then that DX10 was not the solution, and AFAIK a lot of developers were begging for the feature set now known as DX12 (mostly reducing CPU load, draw call overhead, etc.) to in fact be in DX10, and then for the same to happen in DX11. And only now, in 2015, for some strange reason, do these features suddenly appear. My understanding would be that MSFT "committed to DX12" solely because a certain single-vendor solution was being heavily developed.
More than that, I've seen numerous developers trying to persuade MSFT that the change was needed, but it didn't happen up until now.
More than that, I think you can even find some of my posts from 8-9 years ago on this very forum about how many problems DX9 and DX10 have with CPU load.
 
because the GPUs cannot render an unlimited number of parallel draw calls with different state

Obviously they can. At a low level, the GPU has a "draw call per triangle strip" model. The "state change" is an emulated feature that does not exist in hardware. The hardware has no state and doesn't care about any state. Constants and "registers" (in their D3D sense) are totally artificial constructs that do not exist in modern hardware. At a low level, your granularity is limited solely by wavefronts and similar code running per-vertex and per-pixel in each ALU when a particular thread/context is running.

by doing a single big multi-draw indirect call

Which is the best solution under current circumstances. But AFAIK it still has quite a lot of problems in specific drivers.
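For readers who haven't seen it, here is a rough sketch of what a multi-draw indirect submission looks like, in OpenGL 4.3 terms (D3D12 exposes the same idea via ExecuteIndirect). The buffer and function names are made up for the example; only the GL entry points and the command layout come from the spec.

```cpp
// Rough sketch: assumes a VAO, shader program and index buffer are already
// bound, and that 'indirectBuffer' already holds the packed commands.
#include <glad/glad.h>  // or any other GL 4.3 loader

// Layout mandated by the GL spec for indexed indirect draws.
struct DrawElementsIndirectCommand
{
    GLuint count;          // indices in this draw
    GLuint instanceCount;  // usually 1
    GLuint firstIndex;     // offset into the index buffer
    GLuint baseVertex;     // offset added to each index
    GLuint baseInstance;   // lets the shader pick per-draw data
};

void SubmitIndirect(GLuint indirectBuffer, GLsizei drawCount)
{
    // The command structs live in a GPU buffer, so a compute shader (or the
    // CPU) can fill them in, e.g. after culling, without per-draw API calls.
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                nullptr,    // read commands from offset 0
                                drawCount,  // number of packed commands
                                0);         // 0 = tightly packed commands
}
```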
 
I feel quite sorry for AMD.

They have just about everything against them ... Intel's money, process advantage, and multi-billion incentive scheme to not use AMD products like Kaveri; Nvidia's R&D budget; high turnover of management; etc., etc. ... yet they still manage to drive forward a much-needed initiative like Mantle, and then everyone else kicks them in the shins, takes the project and runs off with it.

Up next: Nvidia and Intel take Adaptive-Sync via DisplayPort and run off with it.
 
Since we are on the topic of draw calls, I read a page from the Mantle thread saying that many small draw calls will perform the same as batched ones - state validation only occurs if the state is changed; otherwise it leaves things as is. Is this correct?

It also brings up culling, as using more draw calls to build something allows for finer-grained culling. I didn't even know! I always thought the GPU would just discard the triangles it can't see; I didn't know it was the whole mesh or none of the mesh.

How do batches and culling work? If you batch 100 trees in the area but 50 are covered by a mountain, are you still forced to draw all 100?

I'm going to research lol, maybe someone will beat me to it.

edit: so you can split a mesh if an object is partially/fully occluded, then gather up the remaining pieces to render. Man, this sounds like a lot of work
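A minimal sketch of the idea in the post above, shown for the simplest case of CPU-side frustum culling (occlusion against a mountain works the same way conceptually, just with a different visibility test, e.g. against a depth pyramid): split the batch into sub-meshes with their own bounding spheres, keep only the visible ones, and rebuild the draw list from those. All struct and function names here are invented for the example.

```cpp
// Toy illustration: instead of drawing all 100 trees or none, test each
// tree's bounding sphere and only submit the survivors (e.g. as entries
// in an indirect-draw buffer).
#include <vector>

struct Plane  { float nx, ny, nz, d; };    // plane equation: n.p + d = 0
struct Sphere { float x, y, z, radius; };  // bounding sphere per sub-mesh

// A sphere is invisible if it is fully behind any of the 6 frustum planes.
bool IsVisible(const Sphere& s, const Plane (&frustum)[6])
{
    for (const Plane& p : frustum)
    {
        float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius)
            return false;
    }
    return true;
}

// Returns the indices of the sub-meshes that survive culling; these are
// what you would then batch into a single (multi-)draw.
std::vector<size_t> CullSubMeshes(const std::vector<Sphere>& bounds,
                                  const Plane (&frustum)[6])
{
    std::vector<size_t> visible;
    for (size_t i = 0; i < bounds.size(); ++i)
        if (IsVisible(bounds[i], frustum))
            visible.push_back(i);
    return visible;
}
```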
 
I feel quite sorry for AMD.

They have just about everything against them ... Intel's money, process advantage, and multi-billion incentive scheme to not use AMD products like Kaveri; Nvidia's R&D budget; high turnover of management; etc., etc. ... yet they still manage to drive forward a much-needed initiative like Mantle, and then everyone else kicks them in the shins, takes the project and runs off with it.

Up next: Nvidia and Intel take Adaptive-Sync via DisplayPort and run off with it.

I thought DX12 was long in development, before Mantle, and that AMD was able to develop Mantle in the interim using some of the DX12 features they already knew about from being partners?
 
I thought DX12 was long in development, before Mantle, and that AMD was able to develop Mantle in the interim using some of the DX12 features they already knew about from being partners?
The only indication of that was some NVIDIA rep claiming as much, which tbh sounded like they had to give the impression that "we were doing it before AMD did".
 
I thought DX12 was long in development, before Mantle, and that AMD was able to develop Mantle in the interim using some of the DX12 features they already knew about from being partners?

From what we know, it's more that Mantle's features and design have ended up in DX12...
 
Obviously they can. At a low level, the GPU has a "draw call per triangle strip" model. The "state change" is an emulated feature that does not exist in hardware. The hardware has no state and doesn't care about any state. Constants and "registers" (in their D3D sense) are totally artificial constructs that do not exist in modern hardware. At a low level, your granularity is limited solely by wavefronts and similar code running per-vertex and per-pixel in each ALU when a particular thread/context is running.
You need to think of the entire GPU. Shader state is emulated, but parts of the fixed-function pipeline have state limits. This is probably what sebbbi was referring to.
 
From what we know, it's more that Mantle's features and design have ended up in DX12...
A bit OT for this, but glNext is Mantle (or a fork of it), so it will be interesting to watch all three and see how close the performance is on console, assuming glNext is used on the other console. Interesting stuff for sure; I look forward to what it brings.
 
A bit OT for this, but glNext is Mantle (or a fork of it), so it will be interesting to watch all three and see how close the performance is on console, assuming glNext is used on the other console. Interesting stuff for sure; I look forward to what it brings.
If glNext is a fork, that would explain a lot of the behaviour from AMD. I'm not sure if it's going to be implemented on PS4 alongside their GNM/GNMX, or if it'll impact their own development, but they do support OpenGL today, so it would make sense to.
 
Obviously they can. At a low level, the GPU has a "draw call per triangle strip" model. The "state change" is an emulated feature that does not exist in hardware. The hardware has no state and doesn't care about any state. Constants and "registers" (in their D3D sense) are totally artificial constructs that do not exist in modern hardware. At a low level, your granularity is limited solely by wavefronts and similar code running per-vertex and per-pixel in each ALU when a particular thread/context is running.
This is true for compute shaders on GCN. Compute is fully bindless. Resource descriptors are loaded to scalar registers of the CU. Each wave could be running a different compute shader. However this is not true for all the rasterization state. Try for example to change your scissor rectangle between every draw call and check the CU occupancy (hint: it's not going to look pretty). GCN is fully bindless, but it is not stateless.
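To make the scissor example concrete, here is what that anti-pattern looks like in D3D12 terms. This is a rough sketch assuming a command list that is already recording with its pipeline state and root signature set; it illustrates the pattern sebbbi warns against, not something you'd want to ship.

```cpp
// Sketch of per-draw fixed-function state churn: toggling the scissor rect
// between every small draw.
#include <windows.h>
#include <d3d12.h>

void DrawWithPerDrawScissor(ID3D12GraphicsCommandList* cmd,
                            const D3D12_RECT* scissors,
                            unsigned drawCount,
                            unsigned indicesPerDraw)
{
    for (unsigned i = 0; i < drawCount; ++i)
    {
        // Each scissor change is a fixed-function state update; per the post
        // above, doing this between every draw hurts CU occupancy on GCN,
        // because the rasterizer state is not stateless the way shader
        // resources are.
        cmd->RSSetScissorRects(1, &scissors[i]);
        cmd->DrawIndexedInstanced(indicesPerDraw, 1, 0, 0, 0);
    }
}
```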
 
Well, we should have more information really soon, as Valve and AMD will present "glNext" at GDC next month.
 
I thought DX12 was long in development, before Mantle, and that AMD was able to develop Mantle in the interim using some of the DX12 features they already knew about from being partners?

So much goes on behind the scenes of API development that I'm not sure we will ever know the full story, or whether, given the number of players and viewpoints, we can.
It could very well be that there was some kind of iteration on DX being worked on by stakeholders. There would have been multiple visions of where it should be taken, and we have a long history of sub-versioning of DX and rumored late-stage regressions in feature sets that hint at the difficulty of resolving conflicts when different vendors either lack hardware capability now (and you can ill afford to abandon significant players) or are at odds over what direction the future should take.
A lot of things can be done to shift things one way or another, and each party can cite their fraction of the overall story to justify their desired narrative.
 
I'm not sure if it's going to be implemented on PS4 alongside their GNM/GNMX, or if it'll impact their own development, but they do support OpenGL today, so it would make sense to.
I'd be surprised if an additional API found its way onto PS4. I don't even believe the PS4 supports OpenGL ES (unlike the PS3), and the only reason Sony would add a third API is if it were widely used already. It'll probably take a while to gain traction, and unlike with the PS3, I've read zero complaints about the PS4 (or specifically the GNM/GNMX APIs) being difficult to work with or adapt to.
 
Try for example to change your scissor rectangle between every draw call and check the CU occupancy (hint: it's not going to look pretty).

It depends on whether it's because of driver behaviour or the rasterizer itself. I would still bet on the driver here.
Although it could also be the rasterizer; we do know the rasterizer does some pretty non-obvious stuff there (cache misses because of scan order, anyone?).
But OK, I concur, the FFP parts are still a problem; let's hope they are removed soon.
 
A bit OT for this, but glNext is Mantle (or a fork of it), so it will be interesting to watch all three and see how close the performance is on console, assuming glNext is used on the other console. Interesting stuff for sure; I look forward to what it brings.
Is there an actual source for this?
 
A couple of people have mentioned feeling sorry for AMD because of DX12. But I wonder if all of this actually plays into their hands and was a calculated risk. I've read that AMD's weaker CPU cores perform closer to Intel's when programs are designed for more parallel computing. If Windows 10, as well as DX12, does more to take advantage of that, wouldn't that make AMD APUs a very good purchase once they get more tablet and ultrabook design wins? Especially since they should be able to get Windows free, based on price, under Microsoft's new program. Add the new single-motherboard design and they could have some really cheap machines that would really push Intel, if Carrizo or something later really hits good marks.
 