DirectX 12: Its future in the console gaming space (specifically the XB1)

D3D12 support in SharpDX

It's great to see that MS is working with the open community! It seems the SharpDX guys have access to DX12 and already have an internal version of their SharpDX D3D12 APIs ready...


SharpDX: Good news! Direct3D12 is being successfully integrated into a SharpDX internal build and will be ready for the official dev release of d3d12

Twitter : http://www.twitter.com/sharpdx2/status/469647402031259648


P.S. SharpDX is a great library for writing managed apps that want to use the DirectX and Media Foundation APIs; I've used it many times. It's a very thin wrapper on top of the native COM-based DX/MF libraries, and it's super fast.


I'm hoping we'll see the D2D, DirectWrite, DirectComposition, etc. and Media Foundation APIs revved to 12 soon!
 
What is a managed app?

An app written for .NET (Microsoft's runtime or Mono). The app is distributed as virtual assembly language (MSIL, conceptually similar to Java bytecode, executed by a stack-based virtual machine). At runtime the real machine code is generated on demand and executed on the fly by the JIT engine.
Because of these properties, and because it is compiled down to real machine code and can therefore be quite fast, it is sometimes used as a scripting engine, e.g. by Unity.

You can also precompile the app to a native binary, which is what happens when you distribute it for, e.g., iPhone, since iOS does not allow runtime code generation.
 

And with the latest Windows 8.1 Store app model, .NET Core (WinRT) managed apps are natively compiled using the C++ optimizing compiler back end, removing the JIT from the story altogether.

Store apps are now statically compiled as much as possible, and what can't be statically compiled (dynamic and reflection code) is described in a metadata file.

The goal with native compilation was to statically compile as much as possible into the EXE/DLL, including the CLR, the GC, the BCLs, third-party libraries, etc. The generated EXE/DLLs contain absolutely everything needed to run the app; what isn't needed is "tree-shaken" out, and the package you submit stays as architecture-agnostic as possible (MSIL).

The Store also does some fancy "cloud compilation" that turns MSIL into MDIL. This MDIL image of your application is what the Store deploys to a device, and on the device a final "linking" step binds it to the architecture/platform specifics of that device.



If you're interested in the tool chain for .NET managed WinRT store apps, this is a good post:

http://blogs.msdn.com/b/dotnet/archive/2014/05/09/the-net-native-tool-chain.aspx

P.S. The HSA software stack and tool chain are remarkably similar, almost as if MS influenced it ;)
 
Some great comments from Max McMullen on D3D12

Max McMullen (MS dev)

To Jason's initial post:

Some of the high level details were already revealed to respond to your post but it's quite a jump to get to the API details you probably want to hear. D3D 12 doesn't have strongly typed memory allocations like D3D 11, which strictly limited the dimensionality and usage of memory at creation time. On 12, the main memory allocation parameters are CPU access/cacheability and GPU locality vs CPU locality. Some examples:

Dynamic vertex buffers in 11 would be an application managed ring buffer of memory in 12, allocated with write combined CPU cacheability and CPU locality.
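
As an illustration of that point, here is roughly what such an application-managed ring buffer looks like in the D3D12 API that eventually shipped. The RingBuffer struct and CreateRingBuffer function are my own sketch (error handling and fence-based reuse tracking omitted), not part of the quoted post.

```cpp
// Sketch: a dynamic-vertex-buffer replacement as an app-managed ring buffer
// in an UPLOAD heap (CPU-local, write-combined memory).
#include <windows.h>
#include <d3d12.h>
#include <cstdint>

struct RingBuffer {
    ID3D12Resource* buffer  = nullptr;
    uint8_t*        cpuBase = nullptr;   // persistently mapped pointer
    uint64_t        size    = 0;
    uint64_t        head    = 0;         // app advances this per allocation
};

bool CreateRingBuffer(ID3D12Device* device, uint64_t size, RingBuffer& rb)
{
    D3D12_HEAP_PROPERTIES heap = {};
    heap.Type = D3D12_HEAP_TYPE_UPLOAD;   // CPU locality, write-combined pages

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = size;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.Format           = DXGI_FORMAT_UNKNOWN;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    if (FAILED(device->CreateCommittedResource(
            &heap, D3D12_HEAP_FLAG_NONE, &desc,
            D3D12_RESOURCE_STATE_GENERIC_READ, nullptr,
            IID_PPV_ARGS(&rb.buffer))))
        return false;

    rb.size = size;
    // Map once and keep the pointer; the app sub-allocates vertex data at
    // 'head' each frame and uses fences to know when a region is reusable.
    return SUCCEEDED(rb.buffer->Map(0, nullptr,
                                    reinterpret_cast<void**>(&rb.cpuBase)));
}
```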

11 style default 2D textures do not have CPU access and have GPU locality. 12 will also expose the ability to map multidimensional GPU local resources, useful for reading out the results of a reduction operation with low-latency for example. In this case it would be write combined CPU access with GPU locality. In the GDC unveil of D3D 12 this was briefly mentioned in a slide, called "map default" or "swizzled texture access" IIRC.
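
For reference, the nearest analogue of that "map default" idea in the D3D12 API as it shipped is a CUSTOM heap combining CPU write-combined pages with the GPU-local memory pool. This is my interpretation of the remark, not something spelled out in the quote, and it assumes a discrete adapter that exposes CPU-visible GPU-local memory.

```cpp
#include <d3d12.h>

// Heap properties expressing "write-combined CPU access with GPU locality".
// Resources created with these properties can be Map()'d directly, e.g. to
// read out the result of a reduction with low latency.
inline D3D12_HEAP_PROPERTIES MappableGpuLocalHeapProps()
{
    D3D12_HEAP_PROPERTIES props = {};
    props.Type                 = D3D12_HEAP_TYPE_CUSTOM;
    props.CPUPageProperty      = D3D12_CPU_PAGE_PROPERTY_WRITE_COMBINE; // CPU can map it
    props.MemoryPoolPreference = D3D12_MEMORY_POOL_L1;                  // lives in GPU-local memory
    return props;
}
```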

Cacheability and locality will not be mutable properties of memory allocations but 12 will allow memory of those given properties to be retasked for multiple resource types (1D/2D/3D, VB/Texture/UAV/..., width/height/depth, etc). More details later this year....
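
And here is a sketch of what retasking memory across resource types looks like with the placed-resource API that shipped. The sizes and formats are arbitrary, the device pointer is assumed valid, it requires resource heap tier 2 hardware, and the aliasing barriers needed when switching between the two resources are omitted.

```cpp
#include <windows.h>
#include <d3d12.h>

// One GPU-local heap whose memory is retasked between a 2D texture and a
// UAV buffer at different points in the frame.
void RetaskHeapMemory(ID3D12Device* device)
{
    D3D12_HEAP_DESC heapDesc = {};
    heapDesc.SizeInBytes     = 64 * 1024 * 1024;
    heapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;  // GPU locality, no CPU access
    heapDesc.Flags           = D3D12_HEAP_FLAG_ALLOW_ALL_BUFFERS_AND_TEXTURES;

    ID3D12Heap* heap = nullptr;
    device->CreateHeap(&heapDesc, IID_PPV_ARGS(&heap));

    // A 2D texture placed at offset 0 of the heap...
    D3D12_RESOURCE_DESC tex = {};
    tex.Dimension        = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
    tex.Width            = 1024;
    tex.Height           = 1024;
    tex.DepthOrArraySize = 1;
    tex.MipLevels        = 1;
    tex.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
    tex.SampleDesc.Count = 1;

    ID3D12Resource* texture = nullptr;
    device->CreatePlacedResource(heap, 0, &tex, D3D12_RESOURCE_STATE_COPY_DEST,
                                 nullptr, IID_PPV_ARGS(&texture));

    // ...and later a UAV buffer placed over the same memory.
    D3D12_RESOURCE_DESC buf = {};
    buf.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    buf.Width            = heapDesc.SizeInBytes;
    buf.Height           = 1;
    buf.DepthOrArraySize = 1;
    buf.MipLevels        = 1;
    buf.Format           = DXGI_FORMAT_UNKNOWN;
    buf.SampleDesc.Count = 1;
    buf.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
    buf.Flags            = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;

    ID3D12Resource* uavBuffer = nullptr;
    device->CreatePlacedResource(heap, 0, &buf, D3D12_RESOURCE_STATE_UNORDERED_ACCESS,
                                 nullptr, IID_PPV_ARGS(&uavBuffer));
}
```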

D3D 12 will have multiple methods for moving data between CPU & GPU, each serving different scenarios/performance requirements. More details later this year... :)


To Alessio1989's reply:

I expect the feature level/cap evolution to remain the same. D3D will expose some new features as independent caps and simultaneously bake sets of common caps together into a new feature level to guarantee support and reduce the implementation/testing matrix for developers. It's the best of both worlds between D3D9 and D3D10+. 9 allowed fine-grained feature additions without forcing hardware vendors to perfectly align on feature set but created an unsupportable mess of combinations. 10 allowed developers to rely on a combination of features but tended to delay API support for hardware features until most GPU vendors had built or nearly built that combination in hardware. 11 & 12 have evolved to have caps for initial exposure with feature levels baking in a common set over time.
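
To make the caps-plus-feature-levels split concrete, this is how it surfaces in the D3D12 API as shipped; a sketch only, assuming a valid device pointer.

```cpp
#include <windows.h>
#include <d3d12.h>

// Query both the coarse-grained feature level and the fine-grained caps.
void QueryCaps(ID3D12Device* device)
{
    // Baked-in sets: which feature level does the hardware meet?
    D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_12_0
    };
    D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
    levels.NumFeatureLevels        = sizeof(requested) / sizeof(requested[0]);
    levels.pFeatureLevelsRequested = requested;
    device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS, &levels, sizeof(levels));
    // -> levels.MaxSupportedFeatureLevel

    // Independent caps exposed outside the feature level.
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &options, sizeof(options));
    // -> e.g. options.ResourceHeapTier, options.TiledResourcesTier,
    //         options.ConservativeRasterizationTier
}
```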

Ref from another forum (let me know if this link violates the terms of Beyond3D and I'll remove it): http://www.gamedev.net/topic/656346-direct3d-12-staging-resources/#entry5153199
 
But if you want to use async compute for graphics (simultaneously with other graphics jobs) you need to sync its beginning and end with the graphics pipeline, right? (See Graham's post here and the BF4 presentation here, pages 35-43.)

Well no, not really. I should probably clarify somewhat :)

Think of it as data sync more than anything. You kick a compute job once the data it needs is available (or for whatever other reason), and you sync when you need to access the data it has written.

Typically you would set things up such that the async job has a very high probability of completing naturally before the results are needed by a later process, so while there is a sync point it doesn't mean there is always waiting going on. Just in the unusual cases where it hasn't completed would a potential stall occur. The sync point doesn't have to be at the beginning/end of the pipe either, as it'll just go into the command list like other graphics commands.

'Sync' is perhaps the wrong word. It implies a two-way operation. Basically, all that happens is that when you have finished writing your data, you can add some code that sets a flag in memory saying "data is ready". This is immediate and the task will carry on (or finish, etc.). When the GPU is processing the command list, it may see a "make sure the 'data is ready' flag is set, and if it isn't then wait" command (it could also see a command to set a flag, etc.). If you do things right it almost never has to wait, so the sync overhead is basically zero and no stalling occurs.

That's at least a fairly abstract view of what goes on.....
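
Expressed with the fence API that D3D12 eventually shipped, the "data is ready" flag described above looks roughly like this. The queue and command-list objects are assumed to already exist, and real code would reuse one fence and increment its value per job rather than creating a new one each time.

```cpp
#include <windows.h>
#include <d3d12.h>

// Async compute producing data that a later graphics submission consumes.
void KickComputeThenConsume(ID3D12Device* device,
                            ID3D12CommandQueue* computeQueue,
                            ID3D12CommandList*  computeWork,
                            ID3D12CommandQueue* graphicsQueue,
                            ID3D12CommandList*  graphicsWork)
{
    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick the compute job; when it finishes, the queue sets the
    // "data is ready" flag (fence value -> 1).
    computeQueue->ExecuteCommandLists(1, &computeWork);
    computeQueue->Signal(fence, 1);

    // Graphics queue: "make sure the flag is set, and if it isn't, wait".
    // If the compute job already finished, this costs essentially nothing.
    graphicsQueue->Wait(fence, 1);
    graphicsQueue->ExecuteCommandLists(1, &graphicsWork);
}
```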
 

Really informative post, thanks.
 
DX12 to Allow Better Use of CPU Cores, Xbox One Already Has Similar API For Freeing Up CPU



We know that DX12 will have a fairly strong impact on PC games when it launches late next year, but will it have the same technical impact on the Xbox One? Interestingly, Engel revealed that, “The Xbox One already has an API which is similar to DirectX 12. So Microsoft implemented a driver that is similar to DirectX 12 already on the Xbox One. That freed up a lot of CPU time.”

http://gamingbolt.com/dx12-to-allow...ne-already-has-similar-api-for-freeing-up-cpu
 
I think the only surprise there is that anyone still thought the Xbox might get a big performance boost from DX12. Interesting to see another dev stating that Mantle will be (a bit) faster, though. I doubt we'll ever see direct benchmarks.
 
Seeing as the only DX API that XB1 devs code against at the moment is DX11.X, I can imagine there being an improvement once they start using the DX12 (D3D/D2D/DirectWrite/DirectComposition, etc.) API surface area...

Yes, the XB1 D3D driver underneath may already be the optimized "mono" driver, but I'm assuming the D3D12 driver will again be different and hopefully even more optimized (or, as the DX team would have us believe, a radically thinner driver with much less functionality, leaving it up to the application to do more).
 

PS4 can reach 30k draw calls per frame (which is what Infamous: Second Son had) without too much work, according to Sucker Punch. I'd assume the Xbox One, with its current custom DX11.X, would be similar.

That's only a 50-100% increase over last generation. It would suggest they aren't multithreading draw calls.

Mantle can achieve 100k+ draw calls per frame on an FX-8350 in the Star Swarm demo. DX12 on the Xbox One's 8-core Jaguar should offer something at least a fair bit larger than 30k as well (lower clocks and slightly less IPC, but more cores, DX12 bundles, and even lower overhead compared with a Windows PC).
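
For context on where the bigger numbers come from, here is a minimal sketch of multithreaded command-list recording in D3D12 as shipped, which (along with bundles) is the main mechanism for spreading draw-call submission across the Jaguar cores. RecordDraws is a hypothetical per-thread recording function; pipeline state setup, per-frame allocator reuse, and cleanup are omitted.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <thread>
#include <vector>

// Record one command list per worker thread, then submit them all at once.
void RecordFrame(ID3D12Device* device, ID3D12CommandQueue* queue, unsigned numThreads)
{
    std::vector<ID3D12GraphicsCommandList*> lists(numThreads);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            ID3D12CommandAllocator* alloc = nullptr;
            device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                           IID_PPV_ARGS(&alloc));
            device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                      alloc, nullptr, IID_PPV_ARGS(&lists[t]));
            // RecordDraws(lists[t], t);   // this thread's slice of the draw calls
            lists[t]->Close();
        });
    }
    for (auto& w : workers) w.join();

    // Submission is cheap; the expensive validation already happened on the
    // worker threads at record time.
    queue->ExecuteCommandLists(numThreads,
        reinterpret_cast<ID3D12CommandList* const*>(lists.data()));
}
```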
 