DirectX 12: The future of it within the console gaming space (specifically the XB1)

Just Cause 3 is Avalanche Studios' latest addition to the mind (and lots of other stuff) blowing open world action-adventure series Just Cause. Published by Square Enix, the game released on December 1, 2015 for Microsoft Windows, PlayStation 4 and Xbox One. During the development of the game, Intel and Avalanche worked closely together to optimize the game for Iris graphics, but also to make use of DX12's features to improve performance even further and bring additional visual quality to the game!
This session will cover the changes Avalanche made to their engine to match DX12's pipeline, the implementation process and best practices from DX12. We'll also showcase and discuss the PC exclusive features enabled thanks to DX12 such as Ordered Independent Transparency, and G-buffer blending using Raster Ordered Views and light assignment for clustered shading using Conservative Rasterization.

http://schedule.gdconf.com/session/...ce-just-cause-3-case-study-presented-by-intel
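The "Ordered Independent Transparency" mentioned in the blurb addresses the classic problem that plain alpha blending is order-dependent: unsorted transparent geometry produces different colors depending on draw order, and ROVs let the pixel shader serialize per-pixel accesses in primitive order so the result is deterministic. A minimal Python sketch of the underlying order-dependence (illustrative only, not Avalanche's code; `blend_over` and the sample colors are made up for the example):

```python
def blend_over(dst, src):
    """Standard non-premultiplied 'over' blend: src drawn on top of dst.
    Colors are (r, g, b, a) tuples."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    out_rgb = tuple((cs * sa + cd * da * (1.0 - sa)) / out_a
                    for cs, cd in zip((sr, sg, sb), (dr, dg, db)))
    return (*out_rgb, out_a)

# Two half-transparent fragments over an opaque black background:
background = (0.0, 0.0, 0.0, 1.0)
red = (1.0, 0.0, 0.0, 0.5)
green = (0.0, 1.0, 0.0, 0.5)

# Blending the same two fragments in different orders gives different
# results -- this order-dependence is what OIT removes.
green_then_red = blend_over(blend_over(background, green), red)
red_then_green = blend_over(blend_over(background, red), green)
assert green_then_red != red_then_green
```

An ROV-based OIT implementation keeps a small per-pixel fragment list and resolves it in a well-defined order, removing exactly this ambiguity.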
 
DX != feature level.
We only have DX12, but with four different feature levels under it: 11_0, 11_1, 12_0 and 12_1.

Just Cause 3 may target FL12_1, or they may target 11_0 and check for both features manually.
ROVs are only supported on Haswell, Broadwell, Skylake and Maxwell v2.
Conservative rasterization is supported on Maxwell v2 (Tier 1) and Skylake (Tier 3).
I wonder which tier, or maybe even different tiers, JC3 will use.
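For reference, the per-feature checks described here go through ID3D12Device::CheckFeatureSupport with D3D12_FEATURE_DATA_D3D12_OPTIONS, whose ROVsSupported and ConservativeRasterizationTier members report the two capabilities independently of the feature level. A rough Python sketch of that decision logic (the capability table and path names are hypothetical stand-ins, not real driver output):

```python
# Hypothetical capability table standing in for what D3D12's
# ID3D12Device::CheckFeatureSupport reports via D3D12_FEATURE_DATA_D3D12_OPTIONS
# (ROVsSupported, ConservativeRasterizationTier). Values follow the post above.
GPUS = {
    "Haswell":    {"rovs": True,  "cr_tier": 0},
    "Broadwell":  {"rovs": True,  "cr_tier": 0},
    "Skylake":    {"rovs": True,  "cr_tier": 3},
    "Maxwell v2": {"rovs": True,  "cr_tier": 1},
    "GCN":        {"rovs": False, "cr_tier": 0},
}

def pick_paths(caps):
    """Choose code paths per feature rather than per feature level,
    as an engine targeting FL11_0 hardware would have to do anyway."""
    oit = "rov_oit" if caps["rovs"] else "fallback_oit"
    # Tier 1 conservative rasterization already suffices for binning
    # lights into clusters; higher tiers tighten the guarantees.
    lights = "conservative_raster_binning" if caps["cr_tier"] >= 1 else "compute_binning"
    return oit, lights
```

This is why "maybe they will use 11_0 and check for both features manually" is plausible: nothing forces an engine to gate these effects on FL12_1.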
 
DX != feature level.
We only have DX12, but with four different feature levels under it: 11_0, 11_1, 12_0 and 12_1.

Just Cause 3 may target FL12_1, or they may target 11_0 and check for both features manually.
ROVs are only supported on Haswell, Broadwell, Skylake and Maxwell v2.
Conservative rasterization is supported on Maxwell v2 (Tier 1) and Skylake (Tier 3).
I wonder which tier, or maybe even different tiers, JC3 will use.
Wow, I didn't know Skylake was Tier 3! I guess they solved those fringe cases fairly quickly.
 
Codemasters present a post-mortem on their new rendering engine used for F1 2015, detailing how they balanced the apparently opposing goals of optimizing for mainstream processor graphics, high-end multi-core and DX12. The F1 2015 engine is Codemasters' first to target the eighth generation of consoles and PCs, with a new engine architecture designed from scratch to distribute the game's workload across many cores, making it a great candidate for DX12 and for utilising the processing power of high-end PCs. This session will show the enhanced visuals created using a threaded, CPU-based particle system without increasing GPU demands, and also cover the changes made to the engine while moving from DX11 to DX12. We will also discuss the graphics effects added using the new DX12 features Raster Ordered Views (AVSM and decal blending) and Conservative Rasterization (voxel-based ray tracing), adding even greater realism to the F1 world.

http://schedule.gdconf.com/session/...ng-for-multi-core-and-dx12-presented-by-intel
 
Takeaway

An insight into the main architectural changes needed to move successfully to DX12 and realise a performance benefit, together with an understanding of some of the new effects possible with feature level 12 capable hardware.
An understanding of how to balance CPU and GPU workloads to get the best out of modern PC hardware, offering improved visuals and more interactive environments.
Any relevance to DX12 on consoles?
 
Look at the docs further - does this take a parameter at all, such as a value to add to the counter? If it can only add 1 for a given lane or not (i.e. it can only add the execution mask, like a regular IncrementCounter), that is still useful for pack, but limits its utility for general-purpose scan, right?
http://amd-dev.wpengine.netdna-cdn..../07/AMD_GCN3_Instruction_Set_Architecture.pdf

According to the GCN3 ISA document "data share instructions" take two VGPR inputs and return one VGPR output. So it could take up to two parameters. Seems that you need to ask AMD for a better document.

We were actually discussing today whether it is possible to do similar stuff in a pixel shader with ROVs. But the problem is that you don't have any thread communication model other than ddx/ddy (and ddx_fine and ddy_fine). So you would need to do the work in 4-lane groups... That would be absolutely too slow :(
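For the curious, the ddx/ddy trick alluded to here works because fine derivatives are plain differences within a 2x2 pixel quad, so a lane that knows its position in the quad can reconstruct its neighbor's value. A toy Python model of that exchange, assuming HLSL ddx_fine semantics (the quad layout and helper names are made up for illustration):

```python
def ddx_fine(quad):
    """Per-row horizontal derivative of a 2x2 quad, modeling HLSL ddx_fine.
    quad is [[v00, v10], [v01, v11]], indexed quad[y][x]; every lane in a
    row sees the same difference v[right] - v[left]."""
    return [[row[1] - row[0], row[1] - row[0]] for row in quad]

def horizontal_neighbor(quad, x, y):
    """A lane recovers its horizontal quad-neighbor's value from its own
    value plus/minus the derivative -- the only cross-lane channel a pixel
    shader has without wave intrinsics. The lane knows whether it is the
    left (x == 0) or right (x == 1) pixel from its screen coordinate."""
    d = ddx_fine(quad)[y][x]
    return quad[y][x] + d if x == 0 else quad[y][x] - d
```

Combining this with ddy_fine gives all four values of the quad to every lane, but only within the quad, which is why the poster concludes 4-lane groups are the ceiling.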
 
We were actually discussing today whether it is possible to do similar stuff in a pixel shader with ROVs. But the problem is that you don't have any thread communication model other than ddx/ddy (and ddx_fine and ddy_fine). So you would need to do the work in 4-lane groups... That would be absolutely too slow :(
I don't think you would want to use the ROV mechanism for anything that isn't rasterizer/triangle-driven in the first place as the scheduling would be suboptimal (open question of how optimal the ordered counter mechanism is for various kernels). Stuff like general purpose scan and pack works decently well enough in compute in any case - it's not like ROVs where if there is no hardware support you really have no reasonable fallback.
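To make the pack discussion concrete, here is a hedged Python sketch of the two flavors being contrasted: an order-preserving pack built on an exclusive scan of the keep mask, versus compaction through a shared atomic counter, where output order depends on lane arrival order (which is what an ordered counter would fix). Function names and data are illustrative only:

```python
def pack_with_scan(values, keep):
    """Order-preserving pack: each kept lane writes at the exclusive
    prefix sum of the keep mask -- the pattern a compute-shader pack
    built on a scan mirrors."""
    out = [None] * sum(keep)
    offset = 0
    for v, k in zip(values, keep):
        if k:
            out[offset] = v
        offset += k  # running exclusive scan of the mask
    return out

def pack_with_atomic(values, keep, arrival_order):
    """Pack via a shared atomic counter: the output is compact, but its
    order depends on which lanes/waves reach the counter first."""
    out = []
    for i in arrival_order:  # nondeterministic in real hardware
        if keep[i]:
            out.append(values[i])
    return out
```

Both produce the same set of elements; only the scan-based version (or an ordered counter) guarantees the same order every frame.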
 
Anyone remember these choice quotes?

With relatively little effort by developers, upcoming Xbox One games, PC Games and Windows Phone games will see a doubling in graphics performance.

Suddenly, that Xbox One game that struggled at 720p will be able to reach fantastic performance at 1080p. For developers, this is a game changer.

The results are spectacular. Not just in theory but in practice (full disclosure: I am involved with the Star Swarm demo which makes use of this kind of technology.)
http://www.neowin.net/news/directx-12-a-game-changer-for-xbox-one
 
Anyone remember these choice quotes?

With relatively little effort by developers, upcoming Xbox One games, PC Games and Windows Phone games will see a doubling in graphics performance.

Suddenly, that Xbox One game that struggled at 720p will be able to reach fantastic performance at 1080p. For developers, this is a game changer.

The results are spectacular. Not just in theory but in practice (full disclosure: I am involved with the Star Swarm demo which makes use of this kind of technology.)
http://www.neowin.net/news/directx-12-a-game-changer-for-xbox-one
We discussed a lot of this earlier in the thread, to be honest. I think we agreed that DX12 would bring little in terms of performance for the Xbox One; the exception would have been if the Xbox One supported FL12_1, which would be a different case.

By the //Build/ conference I had found out it was not, and that pretty much answered any remaining unknowns.
 
I don't think you would want to use the ROV mechanism for anything that isn't rasterizer/triangle-driven in the first place as the scheduling would be suboptimal (open question of how optimal the ordered counter mechanism is for various kernels). Stuff like general purpose scan and pack works decently well enough in compute in any case - it's not like ROVs where if there is no hardware support you really have no reasonable fallback.
It was a thought experiment really, nothing meant for production. We always like to invent new ways to abuse GPU features in ways they were not meant to be used :)

Currently there is no fast way to implement (local or global) prefix sum (or fast radix sort) in DirectCompute. You easily lose 50% of your cycles. HLSL lacks many important features present in CUDA, OpenCL 2.1 and consoles. Hopefully Microsoft will prioritize compute improvements in the next DirectX version (CS_6_0).

For more ordered atomics use cases, see page 50 of this Media Molecule presentation:
http://advances.realtimerendering.com/s2015/AlexEvans_SIGGRAPH-2015-sml.pdf
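As context for the prefix-sum complaint, the standard work-efficient (Blelloch) exclusive scan has an up-sweep/down-sweep structure that a groupshared-memory DirectCompute implementation would mirror; the pain described above is that HLSL lacks the wave/subgroup intrinsics that make the intra-wave portion cheap in CUDA, OpenCL 2.x and on consoles. A sequential Python sketch of the algorithm's structure (illustrative, not tuned GPU code):

```python
def blelloch_exclusive_scan(a):
    """Work-efficient exclusive prefix sum (up-sweep / down-sweep).
    Assumes len(a) is a power of two, as a GPU workgroup version would."""
    x = list(a)
    n = len(x)
    # Up-sweep: build partial sums in a balanced tree. On a GPU each
    # level is one barrier-separated pass over groupshared memory.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] += x[i - d]
        d *= 2
    # Down-sweep: clear the root, then push sums back down the tree,
    # turning partial sums into exclusive prefix sums.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            x[i - d], x[i] = x[i], x[i] + x[i - d]
        d //= 2
    return x
```

Each `while` level maps to a barrier on the GPU, which is exactly where the lack of wave-level intrinsics in HLSL costs cycles relative to CUDA/OpenCL implementations that do the bottom levels of the tree within a warp for free.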
 
Currently there is no fast way to implement (local or global) prefix sum (or fast radix sort) in DirectCompute. You easily lose 50% of your cycles. HLSL lacks many important features present in CUDA, OpenCL 2.1 and consoles. Hopefully Microsoft will prioritize compute improvements in the next DirectX version (CS_6_0).
When you say "lose 50% of your cycles", these use cases are fundamentally going to be bandwidth-limited to start with, so I'm not sure exactly what you're comparing. Are you saying the best prefix sum you can write in DirectCompute is 50% the speed of the best one in OpenCL or similar? I'd be a little bit surprised by that claim; while there's probably a delta - particularly if you are optimizing for one particular piece of hardware (which really is the rub with most of these languages... performance portability sucks) - 2x seems a bit high.

The MM use case is a good one: it's similar to what ROVs provide to "streaming compression" style algorithms per pixel, in that it's primarily about maintaining determinism (for scheduling or temporal coherence). Ultimately I think a language with more tools on the scheduling front (think Cilk fork/join and hyper-objects, which are similarly deterministic but more flexible) is what you want, but the ability to order something globally is a good tool in the meantime.
 
We discussed a lot of this earlier in the thread, to be honest. I think we agreed that DX12 would bring little in terms of performance for the Xbox One; the exception would have been if the Xbox One supported FL12_1, which would be a different case.

By the //Build/ conference I had found out it was not, and that pretty much answered any remaining unknowns.
Did you attend the //Build conference? Is there a link to the info? This slide has been shown a few times; do you think the person who presented it made an error, or that it's a fake? The link contains the whole presentation.

http://www.slideshare.net/jaimer/windows10-gamedevoverviewexcludingvideos

What do you think about Phil's comment about knowing what DX12 was doing while designing the XB1? His response strongly implies that the XB1 has "all the tweaks" to be full DX12, which "requires new hardware". MS did say that they highly customized the command processors in a way that isn't available on current GPUs, and that they want to take this and integrate it back into mainstream PCs.



http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview


It's not the first time that MS has customized something with a GPU (or APU) maker around ideas that are not available on mainstream GPUs. For example, the Xbox 360 GPU introduced the Unified Shader Architecture, which was NOT available on PC GPUs when the console released on November 22, 2005. It took 6 months or more for other GPU makers like ATI, NVIDIA and Intel to ship PC hardware with a Unified Shader Architecture.
 
Did you attend the //Build conference? Is there a link to the info? This slide has been shown a few times; do you think the person who presented it made an error, or that it's a fake?
A) Firstly, the slide doesn't actually represent or show anything that indicates the Xbox One is FL12_1. It just so happens that the Xbox One picture is beside that enumeration; it's likely meant to represent consoles in general. It does not explicitly state what the Xbox is.

B) Yes, I did attend. I spoke to the D3D team to confirm my suspicions, as well as other MS employees. Granted, one particular presenter told me it was; but after spending a lot of time with another presenter, and given his position, he said explicitly: yeah, it's not FL12_1. HoloLens is great.

C) I said nothing about it (yet) until AMD announced the feature levels of all of their cards, at which point the world found out that all of AMD's GCN cards were only FL12_0, and that only Maxwell 2 was above FL12_0 at the time.

D) Fury X is still FL12_0.

E) Lastly, you haven't posted anything I haven't seen or discussed before. I really did try to learn things about the Xbox, as you can see from my posts in the Xbox tech thread as well as this one. Everything you highlighted, I've highlighted. I've been through the Xbox SDK as much as misterxmedia has.

There's no obscurity here. If the Xbox One had FL12_1 we'd have seen it used a long time ago; the benefits are too obvious, especially for console exclusives. You don't need DX12 to use FL12_1 features, certainly not at a console level, and there is no reason to hold them back for 3 years...
 
A) Firstly, the slide doesn't actually represent or show anything that indicates the Xbox One is FL12_1. It just so happens that the Xbox One picture is beside that enumeration; it's likely meant to represent consoles in general. It does not explicitly state what the Xbox is.

B) Yes, I did attend. I spoke to the D3D team to confirm my suspicions, as well as other MS employees. Granted, one particular presenter told me it was; but after spending a lot of time with another presenter, and given his position, he said explicitly: yeah, it's not FL12_1. HoloLens is great.

C) I said nothing about it (yet) until AMD announced the feature levels of all of their cards, at which point the world found out that all of AMD's GCN cards were only FL12_0, and that only Maxwell 2 was above FL12_0 at the time.

D) Fury X is still FL12_0.

E) Lastly, you haven't posted anything I haven't seen or discussed before. I really did try to learn things about the Xbox, as you can see from my posts in the Xbox tech thread as well as this one. Everything you highlighted, I've highlighted. I've been through the Xbox SDK as much as misterxmedia has.

There's no obscurity here. If the Xbox One had FL12_1 we'd have seen it used a long time ago; the benefits are too obvious, especially for console exclusives. You don't need DX12 to use FL12_1 features, certainly not at a console level, and there is no reason to hold them back for 3 years...

There is nothing mysterious in the Xbox One and PS4. AMD's latest PC cards are more advanced in HSA (Carrizo) or in the GCN 3 ISA... And no AMD card has conservative rasterization or ROVs...

The only advantage consoles have is on the API/software side, with some functionality not exposed in DX11 or DX12, plus fixed hardware.

Between leaks and dev presentations at GDC or SIGGRAPH, I think the two consoles have no big secrets anymore...

Speculation is interesting for the next generation.

The only interesting thing will be the improvements developers create for 2016/2017 games with asynchronous compute, or other functionality and techniques found in the GCN ISA...
 
I know it's not console related, but it is DX12.
When I said I wished there were no feature levels and a GPU was either DX12 or not,
I was told by B3D readers that having to figure out whether the card was DX12 or 12.1 or whatever was a good thing.

PC exclusive features enabled thanks to DX12 such as Ordered Independent Transparency,

Is that the same OIT that was in GRID 2 and exclusive to the HD 5200 (Iris Pro)?
 
It's not the first time that MS has customized something with a GPU (or APU) maker around ideas that are not available on mainstream GPUs. For example, the Xbox 360 GPU introduced the Unified Shader Architecture, which was NOT available on PC GPUs when the console released on November 22, 2005. It took 6 months or more for other GPU makers like ATI, NVIDIA and Intel to ship PC hardware with a Unified Shader Architecture.
Just before this gets any more ridiculous: the Xbox 360's GPU was made by ATI. It was an offspring of the ill-fated R400 project, called (depending on who you ask) R500 or C1.
 