DirectX 12: Its future in the console gaming space (specifically the XB1)

Hello all,

I think the whole idea of DX12, glNext, or whatever new multi-threaded API we're talking about is mainly to provide a low-cost performance boost for legacy engines. In my opinion the industry will move on (some already have) to using the GPU to run the scene graph and submit work for itself, and that doesn't require any sophisticated threading on the CPU side (in the API, I mean). I'm not exactly sure whether the XB1's GPU is capable of this (I'd guess it is), but to me it seems a bit pointless nowadays to put so much emphasis on multi-threaded CPU submission when that's not where the biggest wins are. If the console GPUs are NOT capable of doing it, then it might be worth it; otherwise it's just a stopgap.
 

This was actually discussed a few posts earlier as making the most sense, since the GPU needs to know where everything is anyway. I've never heard of a GPU submitting its own work, though. But from Sebbbi's posts earlier it makes the most sense to move that way.
 
I think Nvidia talked a bit about this with Kepler/Maxwell and dynamic parallelism, but I'm not sure how extensive it was.

Dynamic Parallelism in CUDA is supported via an extension to the CUDA programming model that enables a CUDA kernel to create and synchronize new nested work. Basically, a child CUDA Kernel can be called from within a parent CUDA kernel and then optionally synchronize on the completion of that child CUDA Kernel. The parent CUDA kernel can consume the output produced from the child CUDA Kernel, all without CPU involvement.

Wicked.
Slide deck on how it works. this is so cool
http://on-demand.gputechconf.com/gtc/2012/presentations/S0338-GTC2012-CUDA-Programming-Model.pdf
 

Definitely seems like a good candidate for DX12.
 
This was actually discussed a few posts earlier as making the most sense, since the GPU needs to know where everything is anyway. I've never heard of a GPU submitting its own work, though. But from Sebbbi's posts earlier it makes the most sense to move that way.

Sorry, I must have missed it.
It's almost doable with the draw*indirect family (assuming everything else is bindless or otherwise identically stated); the only missing bit is that you still have to manually submit the "dummy" calls, although I'm inclined to assume those are already a lot more lightweight since no state change occurs between them. Now that I think about it, I'd be surprised if it couldn't be done properly on consoles already.
 
No worries. It is a long thread.

I'm not sure if consoles can do it; what you're describing sounds like bundles to me.

Direct3D 12 introduces a new model for work submission based on command lists that contain the entirety of information needed to execute a particular workload on the GPU. Each new command list contains information such as which PSO to use, what texture and buffer resources are needed, and the arguments to all draw calls. Because each command list is self-contained and inherits no state, the driver can pre-compute all necessary GPU commands up-front and in a free-threaded manner. The only serial process necessary is the final submission of command lists to the GPU via the command queue, which is a highly efficient process.
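To make that concrete, here is a minimal sketch of recording and submitting one self-contained command list. It assumes device, commandQueue, pso, rootSig, and vbView were created elsewhere (the names are placeholders), and render-target setup, barriers, and fencing are omitted.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch only: device, commandQueue, pso, rootSig and vbView are assumed to exist.
void RecordAndSubmit(ID3D12Device* device, ID3D12CommandQueue* commandQueue,
                     ID3D12PipelineState* pso, ID3D12RootSignature* rootSig,
                     const D3D12_VERTEX_BUFFER_VIEW& vbView, UINT vertexCount)
{
    ComPtr<ID3D12CommandAllocator> allocator;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&allocator));

    // Recording can happen on any worker thread; the list carries all of its own state.
    ComPtr<ID3D12GraphicsCommandList> cmdList;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, allocator.Get(), pso,
                              IID_PPV_ARGS(&cmdList));
    cmdList->SetGraphicsRootSignature(rootSig);
    cmdList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    cmdList->IASetVertexBuffers(0, 1, &vbView);
    cmdList->DrawInstanced(vertexCount, 1, 0, 0);
    cmdList->Close();

    // The only serialized step: hand the finished list to the command queue.
    ID3D12CommandList* lists[] = { cmdList.Get() };
    commandQueue->ExecuteCommandLists(1, lists);
}
```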

In addition to command lists, Direct3D 12 also introduces a second level of work pre-computation, bundles. Unlike command lists which are completely self-contained and typically constructed, submitted once, and discarded, bundles provide a form of state inheritance which permits reuse. For example, if a game wants to draw two character models with different textures, one approach is to record a command list with two sets of identical draw calls. But another approach is to “record” one bundle that draws a single character model, then “play back” the bundle twice on the command list using different resources. In the latter case, the driver only has to compute the appropriate instructions once, and creating the command list essentially amounts to two low-cost function calls.
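And a companion sketch of the two-characters example from the quote: record one bundle that draws the character, then play it back twice with a different texture table bound on the calling command list each time. It continues the sketch above, leans on the state inheritance described in the quote (bindings set on the calling list are visible to the bundle), and all names (cmdList, srvHeap, characterVbView, textureTableA/B) are placeholders.

```cpp
// Sketch only: bundle recorded once, played back twice with different resources.
ComPtr<ID3D12CommandAllocator> bundleAlloc;
device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_BUNDLE, IID_PPV_ARGS(&bundleAlloc));

ComPtr<ID3D12GraphicsCommandList> bundle;
device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_BUNDLE, bundleAlloc.Get(), pso,
                          IID_PPV_ARGS(&bundle));
bundle->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
bundle->IASetVertexBuffers(0, 1, &characterVbView);
bundle->DrawInstanced(characterVertexCount, 1, 0, 0);
bundle->Close();

// On the direct command list: bind a different texture table before each playback.
// textureTableA/B are D3D12_GPU_DESCRIPTOR_HANDLEs pointing into srvHeap.
cmdList->SetGraphicsRootSignature(rootSig);
ID3D12DescriptorHeap* heaps[] = { srvHeap };
cmdList->SetDescriptorHeaps(1, heaps);
cmdList->SetGraphicsRootDescriptorTable(0, textureTableA);  // first character's texture
cmdList->ExecuteBundle(bundle.Get());
cmdList->SetGraphicsRootDescriptorTable(0, textureTableB);  // second character's texture
cmdList->ExecuteBundle(bundle.Get());
```

The point of the design is that the driver translates the bundle's draws once; each ExecuteBundle is then close to a function call in cost.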
 
I just don't think AMD handed over the GCN & Jaguar IP and told MS "do whatever you want with it".

The "major shift from past" refers, IMO in that context, to the fact it's just MS & AMD now, while before it was MS + 2 others.

Just look at the things Sony "did" with the PS4 APU: for example, the ACEs supporting eight queues each was supposedly Sony's idea, yet it's in all GCN 1.1 GPUs/APUs, including the XB1's. The same goes for the other features pointed out as "Sony did this" that turned out to be in other GCN 1.1 chips as well. I don't think MS is any different in this regard.
 
Dynamic Parallelism in CUDA is supported via an extension to the CUDA programming model that enables a CUDA kernel to create and synchronize new nested work. Basically, a child CUDA Kernel can be called from within a parent CUDA kernel and then optionally synchronize on the completion of that child CUDA Kernel. The parent CUDA kernel can consume the output produced from the child CUDA Kernel, all without CPU involvement.

Wicked.
Slide deck on how it works. this is so cool
http://on-demand.gputechconf.com/gtc/2012/presentations/S0338-GTC2012-CUDA-Programming-Model.pdf
OpenCL 2.0 introduced similar vendor-neutral functionality. Both AMD (GCN) and Intel (Broadwell) already support it. Nvidia doesn't yet have OpenCL 2.0 drivers (but the hardware definitely supports the feature, as this CUDA feature is a superset of the OpenCL 2.0 equivalent).
It's almost doable with the draw*indirect family (assuming everything else is bindless or otherwise identically stated); the only missing bit is that you still have to manually submit the "dummy" calls, although I'm inclined to assume those are already a lot more lightweight since no state change occurs between them. Now that I think about it, I'd be surprised if it couldn't be done properly on consoles already.
OpenGL 4.4 introduced an extended version of multi-draw indirect. With this new standard ARB extension (https://www.opengl.org/registry/specs/ARB/indirect_parameters.txt) the GPU can read the draw call count from a GPU buffer, meaning that you don't need empty (dummy) draw calls anymore (assuming you do draw call array compaction on the GPU / use append buffers). At least AMD (GCN) and Nvidia (Kepler, Maxwell) support this. I am not sure about Intel Broadwell, since the current Broadwell drivers only support up to OpenGL 4.3.
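As a rough sketch of what that looks like, assuming an OpenGL 4.4 context with ARB_indirect_parameters, a loader such as GLEW, and a compute pass that has already compacted the draw records and written the surviving count on the GPU (buffer names are placeholders):

```cpp
#include <GL/glew.h>

// Layout of each draw record consumed by glMultiDrawElementsIndirect*.
struct DrawElementsIndirectCommand {
    GLuint count;          // index count for this draw
    GLuint instanceCount;
    GLuint firstIndex;
    GLuint baseVertex;
    GLuint baseInstance;
};

// indirectBuf: compacted DrawElementsIndirectCommand records, written by a compute shader.
// paramBuf:    a single GLuint holding how many records survived GPU culling.
void SubmitGpuDrivenDraws(GLuint indirectBuf, GLuint paramBuf, GLsizei maxDraws)
{
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuf);
    glBindBuffer(GL_PARAMETER_BUFFER_ARB, paramBuf);

    // The CPU only supplies an upper bound; the real draw count is read from paramBuf
    // on the GPU, so no empty "dummy" draws are needed.
    glMultiDrawElementsIndirectCountARB(GL_TRIANGLES, GL_UNSIGNED_INT,
                                        nullptr,   // byte offset into indirectBuf
                                        0,         // byte offset into paramBuf
                                        maxDraws,
                                        sizeof(DrawElementsIndirectCommand));
}
```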
 
The XB1 does support something similar to multi-draw indirect in its XDK.

Every core in the CPU can now talk to the GPU, filling all of its cores, and the GPU can also feed its own cores with work, using the new "multi-draw indirect" APIs in the XDK:

New MultiDraw APIs (May 2014)
New APIs for dispatching multiple draws with a single call are now available on the ID3D11DeviceContextX interface. The new methods are ID3D11DeviceContextX::MultiDrawIndexedInstancedIndirect, ID3D11DeviceContextX::MultiDrawInstancedIndirect, ID3D11DeviceContextX::MultiDrawIndexedInstancedIndirectAuto, and ID3D11DeviceContextX::MultiDrawInstancedIndirectAuto. The new APIs' functionality is similar to OpenGL's multi_draw_indirect.

For more information, see Multi-Draw calls.


Ref: https://forum.beyond3d.com/posts/1819756/
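The ID3D11DeviceContextX calls above are Xbox-specific, so as a rough public point of comparison, here is a sketch of the D3D12 equivalent, ExecuteIndirect, driven by a GPU-written argument buffer and draw count. All resource names (argBuffer, countBuffer, cmdList, maxDraws) are placeholders and the surrounding setup is omitted.

```cpp
// Sketch only: a command signature that treats each record as one indexed draw.
D3D12_INDIRECT_ARGUMENT_DESC arg = {};
arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
sigDesc.ByteStride       = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
sigDesc.NumArgumentDescs = 1;
sigDesc.pArgumentDescs   = &arg;

ComPtr<ID3D12CommandSignature> cmdSig;
device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&cmdSig));

// argBuffer holds D3D12_DRAW_INDEXED_ARGUMENTS records written by a compute shader;
// countBuffer holds a single UINT with the number of draws that survived GPU culling.
cmdList->ExecuteIndirect(cmdSig.Get(), maxDraws,
                         argBuffer, 0,     // argument buffer + offset
                         countBuffer, 0);  // count buffer + offset
```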
 
I do like the trend of developers being able to treat multi-GPU setups as a single entity using Mantle and DX12. If this catches on, PC rigs will last a long time.
 

I guess it depends how much extra work is required from the developer to support that feature in a game. Would it just be a generic dual-GPU path, or would it need to be specific to the vendor, the architecture, or even the overall performance of the individual GPUs? If it's either of the first two, I can imagine it being pretty popular, at least in the big game engines.
 
Is anyone else a little annoyed at Brad Wardell's disingenuous intentions with his DX12 statements?

He's making it sound like DX12 will lead to huge performance increases for every scene being rendered once game engines fully support DX12 and the second round of DX12 games arrives circa 2017-18. If a scene uses lots of draw calls it will alleviate performance bottlenecks, but not every scene will do that, and I can't envision 90% of rendered scenes utilizing 100,000 draw calls per frame. An RTS can benefit, sure. A scene with thousands of low-poly combatants fighting in the background, like in Ryse 3, while you fight as the protagonist in the foreground, sure. But are devs going to have a unique leaf mesh for each leaf on a tree or blade of grass? No. So when you are walking down a corridor in Battlefield 2018, will performance be 40% to 500% greater on DX12 than DX11? No.


It would be like disingenuously stating that MotorStorm: Apocalypse shows the PS3 is twice as powerful as the Xbox 360 because if M:A ran on the Xbox 360 it would run at a much lower fps, when the reality is the game was designed to push the Cell processor for physics and to exploit the PS3's superior memory bandwidth to the Cell.
 
It is a little annoying, but I will reserve my opinion until I see his company's product. If Oxide Games shows us something we've never seen before, I'll have no choice but to agree with him that a new generation in rendering is coming, and those exciting times will be well documented on this forum. If not, and we see something that DX11 could pull off with some talented folks, then I'm ready to write off his word entirely.
 
I applaud what Brad and Nitrous are doing ... any developer/researcher putting their reputations/lives on the line for what they believe, and in the process pushing the state of the art forward ...

And yes, I believe pushing to a million-plus draws is a worthy endeavour. Whatever you choose to do with those draws, and in whatever way you choose to bundle them, is up to you, but achieving it is something I definitely support!
 
I applaud them as well; I just don't appreciate the multiple broad, sweeping statements he's made about gargantuan performance increases, which are giving hundreds of thousands of readers false expectations about DX12. It's like appreciating Ford's EcoBoost engines and what they offer in a vehicle, but not appreciating Ford fooling most consumers with exaggerated claims about their benefits.
 
He's actually riding DX12 and Mantle to market his game as the premier new-API game. Since he's the only one committed to it, he's taking the spotlight for himself. If there were other developers that could, they would too. It's the right time to piggyback if you are not a well-known company.

Free and great marketing, right? All eyes are on Brad right now. I'd never heard of his company until this year.

And if I had to choose a piece of hardware to make a hoopla about, it would be the Xbox. Look at the view counts on the threads here: the Xbox ones have the most. People are genuinely interested in this hardware despite how completely it's overshadowed in sales by its competitor.

I don't even think the Oxide Games product is for Xbox, LOL. Man, if he announces the new Star Control on the new engine, I'm all in.
 