DirectX 12: The future of it within the console gaming space (specifically the XB1)

Is this guy saying that only 1 core is emitting draw calls on the XB1 right now (or the GPU accepting only one source)? I would think that the XB1 would have allowed a fair amount of flexibility already on the subject.
It depends on the level of optimization in Direct3D 11.X, but fundamentally only one thread is allowed to issue draw calls.

On the PC, Direct3D 11 allows free thread-safe multithreading for resource creation, but draw calls are not thread-safe - only a single rendering thread can directly interact with a D3D11 device (and, through it, the kernel-mode driver) in the so-called "immediate context". However, Direct3D 11 also allows additional rendering threads to run in a "deferred context" - these threads can issue draw calls and state changes and record them into "command lists", but in the end these "deferred" commands have to be executed in the "immediate" rendering context, which is not thread-safe - there is no parallel processing at this final stage and these lists have to be serialized.

There is a good explanatory article here http://code4k.blogspot.ru/2011/11/direct3d11-multithreading-micro.html
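For illustration, here's a minimal sketch of that deferred/immediate split (not from the article - device creation, error handling and the actual draw state are omitted):

Code:
// Each worker thread records into its own deferred context; only the single
// immediate context may execute the recorded command lists.
#include <d3d11.h>
#include <vector>

void RecordOnWorkerThread(ID3D11Device* pDevice, ID3D11CommandList** ppCommandList)
{
    // A deferred context is private to this thread.
    ID3D11DeviceContext* pDeferred = nullptr;
    pDevice->CreateDeferredContext(0, &pDeferred);

    // Record state changes and draw calls here, e.g.:
    // pDeferred->IASetVertexBuffers(...);
    // pDeferred->Draw(...);

    // Recording ends with a command list handed back to the render thread.
    pDeferred->FinishCommandList(FALSE, ppCommandList);
    pDeferred->Release();
}

void SubmitOnRenderThread(ID3D11DeviceContext* pImmediate,
                          const std::vector<ID3D11CommandList*>& lists)
{
    // This final step is inherently serial: only one thread may touch the
    // immediate context, so the lists are played back one by one.
    for (ID3D11CommandList* pList : lists)
        pImmediate->ExecuteCommandList(pList, FALSE);
}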

This was pretty much the same on Xbox 360 - you had a 3-core, 6-thread CPU, and a typical game would have a single rendering thread, a thread for game-state updates and AI, then audio streaming, file decompression, procedural textures and geometry, etc. - whatever the developers could implement with simple per-frame sync, without using heavyweight inter-process synchronization techniques which cause more problems than they resolve.


Direct3D 12, on the other hand, allows multiple rendering threads since [post=1836199]all rendering work is performed in the user-mode driver[/post] and only final presentation is handled in the main rendering thread which talks to the OS kernel. This is possible because draw calls and state changes are reorganized to be immutable (i.e. read-only), so they are inherently thread-safe and there is no need to use mutexes or locks. And any resource management is explicitly performed by the application, not by the kernel-mode driver that talks to the actual hardware, so there is no need to sync the device state between multiple rendering threads either.
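As a rough sketch of what that model looks like in application code (illustrative only - device, queue and pipeline-state setup are omitted, and the names are made up for this example):

Code:
// N worker threads each record their own D3D12 command list in parallel;
// the main thread only collects the closed lists and submits them.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

struct WorkerRecorder
{
    ComPtr<ID3D12CommandAllocator>    allocator;
    ComPtr<ID3D12GraphicsCommandList> list;
};

// Runs on a worker thread. No locks are needed: recording only touches
// objects owned by this thread.
void RecordWorker(ID3D12Device* device, WorkerRecorder& w)
{
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                   IID_PPV_ARGS(&w.allocator));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                              w.allocator.Get(), nullptr, IID_PPV_ARGS(&w.list));
    // w.list->SetPipelineState(...);
    // w.list->DrawInstanced(...);
    w.list->Close();
}

// Runs on the main thread once all workers have finished recording.
void Submit(ID3D12CommandQueue* queue, const std::vector<WorkerRecorder>& workers)
{
    std::vector<ID3D12CommandList*> lists;
    for (const auto& w : workers)
        lists.push_back(w.list.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(lists.size()), lists.data());
}

The point is that recording scales with the number of cores, while the single submission call at the end stays cheap.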



Code:
Times, ms        Total              GFX-only
              D3D11   D3D12       D3D11   D3D12
Thread 0      7.88    3.80        5.73    1.17
Thread 1      3.08    2.50        0.35    0.81
Thread 2      2.84    2.46        0.34    0.69
Thread 3      2.63    2.45        0.23    0.65
Total        16.42   11.21        6.65    3.32
 
I'd virtually written him off as a kook but the lack of credible devs correcting him is making me begin to wonder.

DX12 is still early days. Anyone with access to it or more in-depth knowledge is surely NDA'd. Even talking about it publicly via tweets is a little risky.

However, my understanding is that Brad Wardell is a businessman and not an engineer and that article fully reinforces my understanding that he is not an engineer.
 
DX12 is still early days. Anyone with access to it or more in-depth knowledge is surely NDA'd. Even talking about it publicly via tweets is a little risky.

However, my understanding is that Brad Wardell is a businessman and not an engineer and that article fully reinforces my understanding that he is not an engineer.

More likely he just over-embellishes; he is an engineer, given his degree.
 
DX12 is still early days. Anyone with access to it or more in-depth knowledge is surely NDA'd. Even talking about it publicly via tweets is a little risky.

However, my understanding is that Brad Wardell is a businessman and not an engineer and that article fully reinforces my understanding that he is not an engineer.

He's also under NDA with MS, as am I for certain technologies. That doesn't stop me from making high-level (without detail) remarks about said NDA'd tech.

Brad seems very knowledgeable and his tech demo is very interesting, and the Oxide engine devs have a very impressive next-gen engine...
 
It depends on the level of optimization in Direct3D 11.X, but fundamentally only one thread is allowed to issue draw calls.

On the PC, Direct3D 11 allows free thread-safe multithreading for resource creation, but draw calls are not thread-safe - only a single rendering thread can directly interact with a D3D11 device (and, through it, the kernel-mode driver) in the so-called "immediate context". However, Direct3D 11 also allows additional rendering threads to run in a "deferred context" - these threads can issue draw calls and state changes and record them into "command lists", but in the end these "deferred" commands have to be executed in the "immediate" rendering context, which is not thread-safe - there is no parallel processing at this final stage and these lists have to be serialized.

There is a good explanatory article here http://code4k.blogspot.ru/2011/11/direct3d11-multithreading-micro.html

This was pretty much the same on Xbox 360 - you had a 3-core, 6-thread CPU, and a typical game would have a single rendering thread, a thread for game-state updates and AI, then audio streaming, file decompression, procedural textures and geometry, etc. - whatever the developers could implement with simple per-frame sync, without using heavyweight inter-process synchronization techniques which cause more problems than they resolve.


Direct3D 12, on the other hand, allows multiple rendering threads since [post=1836199]all rendering work is performed in the user-mode driver[/post] and only final presentation is handled in the main rendering thread which talks to the OS kernel. This is possible because draw calls and state changes are reorganized to be immutable (i.e. read-only), so they are inherently thread-safe and there is no need to use mutexes or locks. And any resource management is explicitly performed by the application, not by the kernel-mode driver that talks to the actual hardware, so there is no need to sync the device state between multiple rendering threads either.

..Neat chart inserted here.

Whew, thanks for that response. Single-threaded behavior on a PC, sure - with all of the different hardware and driver interactions it makes sense to keep things safe rather than speedy - but having multiple threads locked out for a fixed hardware platform seemed a bit non-obvious.

The bolded part was particularly apt. Wonder if the amount of control afforded by the addition of hUMA or whatever it's called will allow for a more predictable response to interprocess syncing or are we just gonna have to wait for a language more amenable to such things ... at least one that doesn't run on the Java VM ;-) Haskell on the console !! :LOL:
 
I don't pretend to be versed in virtualization, but in operating-system-level virtualization, isolation and protection are solution-dependent.

In solutions like OpenVZ, all the partitions share the same kernel.

Obviously performance is important, and OS-level virtualization can provide near-native performance while still offering some level of isolation and protection.
...you'd have to make Windows able to run nicely and stably in a different ring, as you don't want a kernel exploit in Windows to ruin your business. A VM is far easier, then.

Given the hardware advantage of the PS4, how much more performance can MS afford to sacrifice to virtualization overhead?

A thin hypervisor can take very few resources - you can write one in a few hundred lines (well, a skeleton one). What kills your VM performance is mostly (paging aside) the VM enter/exit.
It happens, if I recall correctly, mainly on purpose (VM calls), on non-reflected interrupts, and on privileged-instruction emulation (though that would only be needed in the Windows OS partition).

I would be very surprised if such overhead were measured in two digits.
 
but having multiple threads locked out for a fixed hardware platform seemed a bit non-obvious.
Yep, it's because the programming model remains the same as with D3D11.2 on the PC.

As we already know, [post=1840169]there are command bundles and reduced resource creation overhead[/post] on the Xbox One, however porting these improvements to the PC would require WDDM 2.0, which is the driver model beneath D3D12.

if the amount of control afforded by the addition of hUMA or whatever it's called will allow for a more predictable response to interprocess syncing or are we just gonna have to wait for a language more amenable to such things
These are two separate issues - the first is avoiding inter-process synchronization as much as possible by separating the algorithm into several independent chunks that can run in parallel; the second is doing the still-necessary bits of synchronization as efficiently as possible.
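A minimal sketch of that first idea - independent chunks with a single per-frame join as the only synchronization point (the chunk contents here are just placeholders):

Code:
// Fan-out/fan-in per frame: each task owns a disjoint slice of the data,
// so no locks are needed while the tasks run; the only sync is the join.
#include <cstdio>
#include <future>
#include <vector>

void SimulateChunk(int first, int last)
{
    // Placeholder for per-entity game-state updates on [first, last).
    std::printf("updating entities %d..%d\n", first, last);
}

void UpdateFrame(int entityCount, int chunkCount)
{
    std::vector<std::future<void>> tasks;
    const int perChunk = entityCount / chunkCount;

    for (int c = 0; c < chunkCount; ++c)
    {
        const int first = c * perChunk;
        const int last  = (c == chunkCount - 1) ? entityCount : first + perChunk;
        tasks.push_back(std::async(std::launch::async, SimulateChunk, first, last));
    }

    // The per-frame join is the only synchronization between the chunks.
    for (auto& t : tasks)
        t.get();
}

int main()
{
    UpdateFrame(1000, 4);   // e.g. 1000 entities split across 4 chunks
}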

Direct3D 12 provides the solution to the first problem, that is streamlining the API for parallel CPU processing as much as possible.


Providing effective multi-processor (or multi-core) access to the shared memory is the second part of the equation.

NUMA is not really about inter-process synchronization on desktop computers - it was designed for massively multiprocessor or cloud-based computing, where the same algorithm has to run on many independent computing nodes with a different data set on each node and then sync the results to some other nodes, and most of these nodes are non-local, i.e. they are running on another computer or cluster connected by a high-speed network.

On the desktop/console, it's much more efficient to provide a wide high-speed memory connection and a L3/L4 cache to connect L1/L2 caches on each core.
 
DX12 is still early days. Anyone with access to it or more in-depth knowledge is surely NDA'd. Even talking about it publicly via tweets is a little risky.

However, my understanding is that Brad Wardell is a businessman and not an engineer and that article fully reinforces my understanding that he is not an engineer.

The above couldn't be further from the truth. Brad programmed one of the top 5 all-time games by himself - the original GalCiv. He also programmed innumerable productivity apps. Given his AI work, I tend to believe his comments on CPU utilization. On the GPU side, well, Stardock has never been known for cutting-edge graphics. Given that he is focusing on CPU optimization as it relates to improved GPU utilization, I will take his word for it. None of the so-called experts in the links above said anything to contradict what he said, so I am not sure why people are pointing to that as proof he is kooky.
 
The above couldn't be further from the truth. Brad programmed one of the top 5 all-time games by himself - the original GalCiv. He also programmed innumerable productivity apps. Given his AI work, I tend to believe his comments on CPU utilization. On the GPU side, well, Stardock has never been known for cutting-edge graphics. Given that he is focusing on CPU optimization as it relates to improved GPU utilization, I will take his word for it. None of the so-called experts in the links above said anything to contradict what he said, so I am not sure why people are pointing to that as proof he is kooky.
The more you read about Mantle, the more we see it achieves many of its optimizations in a similar fashion to DX12. They are very similar in that they have superior multithreaded scaling compared to DX11, split the command buffer between multiple cores, bind descriptor tables, etc., to reduce the overhead of draw calls.
From the results we've seen with Brad's Star Swarm demo, it's the only game that sees a massive performance boost (outside of improved SLI/Crossfire implementations) on Mantle.
Brad develops large strategy/simulation games, and Star Swarm involves huge simulations rendering thousands of objects with multiple materials per object and thousands of variables that need to be updated every frame. The draw calls normally swamp the CPU.
All these other games, from Ryse/Forza/BF4/CoD, will hit a GPU bottleneck somewhere in the graphics pipeline, and likely a DDR3 bandwidth bottleneck (ESRAM may be fine though), before they can allow for a 50% fps increase.
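For reference, this is roughly what "binding a descriptor table" means in D3D12 terms - a minimal sketch of a root signature with one table of shader resource views (heap creation and descriptor writes are omitted; the names are illustrative):

Code:
// Declaring a descriptor table in a D3D12 root signature: a range of SRVs
// (t0..t3) is grouped so the whole set binds with one call at draw time.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12RootSignature> CreateRootSignatureWithTable(ID3D12Device* device)
{
    D3D12_DESCRIPTOR_RANGE range = {};
    range.RangeType          = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
    range.NumDescriptors     = 4;                       // t0..t3
    range.BaseShaderRegister = 0;
    range.OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;

    D3D12_ROOT_PARAMETER param = {};
    param.ParameterType    = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
    param.DescriptorTable.NumDescriptorRanges = 1;
    param.DescriptorTable.pDescriptorRanges   = &range;
    param.ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;

    D3D12_ROOT_SIGNATURE_DESC desc = {};
    desc.NumParameters = 1;
    desc.pParameters   = &param;
    desc.Flags = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;

    ComPtr<ID3DBlob> blob, error;
    D3D12SerializeRootSignature(&desc, D3D_ROOT_SIGNATURE_VERSION_1, &blob, &error);

    ComPtr<ID3D12RootSignature> rootSig;
    device->CreateRootSignature(0, blob->GetBufferPointer(), blob->GetBufferSize(),
                                IID_PPV_ARGS(&rootSig));
    return rootSig;
}

// At draw time the whole table is bound in one call, e.g.:
// commandList->SetGraphicsRootDescriptorTable(0, gpuHandleOfFirstSRV);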
 
The more you read about Mantle, the more we see it achieves many of its optimizations in a similar fashion to DX12. They are very similar in that they have superior multithreaded scaling compared to DX11, split the command buffer between multiple cores, bind descriptor tables, etc., to reduce the overhead of draw calls.
From the results we've seen with Brad's Star Swarm demo, it's the only game that sees a massive performance boost (outside of improved SLI/Crossfire implementations) on Mantle.
Brad develops large strategy/simulation games, and Star Swarm involves huge simulations rendering thousands of objects with multiple materials per object and thousands of variables that need to be updated every frame. The draw calls normally swamp the CPU.
All these other games, from Ryse/Forza/BF4/CoD, will hit a GPU bottleneck somewhere in the graphics pipeline, and likely a DDR3 bandwidth bottleneck (ESRAM may be fine though), before they can allow for a 50% fps increase.
Well, yes, I don't think we'd get a 50% fps increase in a title like Ryse (we could get a little increase right now from newer drivers and some of that 10% that was not yet available, if the game were patched with those newer drivers), but games like Ryse should hold 30 fps with those optimizations (which could amount to 50% in some situations).
 
Quite. The 360 interview did allude to the CPU being a problem as well. Especially in Last Titan Standing, the spikes seem to indicate some inherent issue there.

Kind of curious as to why the titans themselves would be so problematic.
 
I don't understand how Microsoft, who just released the Xbox One, is now implementing a new DX into the Xbone architecture that developers have to redesign for? All that says to me is confirmation that MS scrambled to get DX12 onto the scene much faster than previously intended due to Mantle, as I really can't believe they would release the Xbone and then move to a newer API immediately.
 
I don't understand how Microsoft, who just released the Xbox One, is now implementing a new DX into the Xbone architecture that developers have to redesign for?

The alternative was delaying the Xbox One until DirectX 12 was ready for prime time on Xbox and PC :nope:
 
The draw calls normally swamp the CPU.
All these other games, from Ryse/Forza/BF4/CoD, will hit a GPU bottleneck somewhere in the graphics pipeline, and likely a DDR3 bandwidth bottleneck (ESRAM may be fine though), before they can allow for a 50% fps increase.

Why stop at 50%?
The claim for bringing over DX12 was 100%.
 
I don't understand how Microsoft, who just released the Xbox One, is now implementing a new DX into the Xbone architecture that developers have to redesign for?
Who said they have to redesign anything?

If developers are fine with Direct3D 11.2/11.X, that's OK - nobody is deprecating Direct3D 11, and performance should improve with new Xbox One SDK releases (and new Windows releases as well, when they [post=1840198]port D3D11.X features from Xbox One[/post] and move to the [post=1841448]lightweight WDDM 2.0 driver model[/post]).

But if you absolutely have to squeeze additional bits of performance from the existing hardware, now you can do that with Direct3D 12.
 
None of the so-called experts in the links above said anything to contradict what he said, so I am not sure why people are pointing to that as proof he is kooky.

How can you say that?

Sure they did. Anyway, one of them was Keith Judge, a programmer on Unreal Engine 4, which will support DirectX 12, and he was baffled by Wardell's statement.

Wardell has contradicted himself in recent days.
Responding to questions on Twitter, he's now suggesting that DirectX 12 won't close the gap between the PS4 and the XOne. That contradicts his statement that it would give the XOne 2x the performance in most games.
 