DirectX 12: The future of it within the console gaming space (specifically the XB1)

While I don't want to take anything away from what was said, he is talking about the patches coming down the line soon. The thread is about frame rate, and he is saying that the next patches will bring a 5% boost.
 
Could be a while as well. If Windows 10 is coming to Xbox after it hits PC, we are looking at near-fall or fall timing.
 
DirectX 12 could come to Xbox One sooner, though.
I thought it was linked to WDDM 2.0? I know we have had this exact discussion somewhere on these boards, about whether the XBO could do DX12 without W10, but I don't think we as a board came to a consensus on it. There were definitely points on both sides. I think it requires it, lol. But who knows, I could be totally wrong.
 
The OS is delivered with the game, as the whole game is a virtual machine image. So Win10/DX12 could come with any game, but I doubt it would before Microsoft makes it official. This is also why they can change the system OS without affecting any old game. It is all just virtualized.
 
I started following that Project CARS thread a bit more, and I'll be honest, I'm a little baffled. I thought I had figured out everything about GNM after reading the Naughty Dog slide decks, and I had fully assumed that it contained all the major features of DX12, if not access to more.

So... I read this from the developer
http://forum.projectcarsgame.com/showthread.php?27276-A-question-to-the-devs-about-the-frame-rate-issue&p=942715&viewfull=1#post942715

and that leads me to believe that perhaps GNM does have some limitations somewhere? What major feature could it be missing? I assume it's the multithreaded command buffer, but I swear I was looking at one with the Naughty Dog slides.
 
This guy also says: "It gives us more control over multi-threading which in turn has less CPU overhead. Our engine is massively multi-threaded.

The render team can talk more on the technicalities as it's rocket science to me."

Assuming that he is right though for the sake of it, then it would confirm PCars runs GNMX, which is Sony's path for easy DirectX ports ...
 
Yep, that's a reasonable assumption. Why would DirectX 12 tech improve the PS4 game if the devs weren't using GNMX, given that we already know draw calls using GNM are light as a feather on the CPU?

This is interesting too:
Ian Bell said:
Also, and less impressively, the 5ish% improvement we've found on Xbox One also adds a little to PS4.

The best part though is here, I love this:

Random Guy said:
Can you at least say whether the PS4 is using 6 cores or more?

Ian Bell said:
We're using every core

:LOL:
 
Hmmm. Not sure. I've been posting less because I've finally learned my lesson from what was said earlier: deciphering the truth from a long game of telephone is like trying to make gold from copper and bronze.

Without a proper source to be believed, we are just left with guessing. I do not see why PCars would be running GNMX to be honest. They wrote earlier in that thread that their render team is amazing and they have really spent a lot of time optimizing the game for all platforms. GNM seems like a simple and obvious win but I could be wrong. They do cite that there is so much more that could be done with their platform. But I am thinking they are referring to async compute.
 
I see folks using Phil Spencer's comments about the Bone's gains from DX12 being modest as a reason that the Project CARS gains of 30-40% can't be true. That is nonsense. 30% is the very low end of modest at best. Even 2x performance would still be modest gains.
 
A 30% improvement to the frame rate would be massive, not modest. Now if "performance" is really just a reference to a specific part of the render pipeline and the resultant frame rate improvement is 0-5% depending on the scene, then that could be considered modest.
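
To put rough numbers on that distinction, here is a back-of-the-envelope sketch in Python. The 33.3 ms frame and the guess that API overhead is 15% of the frame are made-up illustrative figures, not measurements from Project CARS or any other game.

```python
def frame_rate_gain(frame_ms, affected_fraction, speedup):
    """Overall frame-rate gain when only part of the frame gets faster.

    frame_ms          -- original frame time in milliseconds (assumed)
    affected_fraction -- share of the frame spent in the part that improves
    speedup           -- how much faster that part becomes (1.3 = 30% faster)
    """
    affected = frame_ms * affected_fraction
    untouched = frame_ms - affected
    new_frame_ms = untouched + affected / speedup
    return frame_ms / new_frame_ms - 1.0


# Hypothetical numbers: a 30 fps frame where 15% of the time is API overhead.
whole_frame = frame_rate_gain(33.3, 1.0, 1.3)   # everything 30% faster
one_stage = frame_rate_gain(33.3, 0.15, 1.3)    # only the overhead 30% faster

print(f"whole frame 30% faster: {whole_frame:+.1%} fps")
print(f"one stage 30% faster:   {one_stage:+.1%} fps")
```

With those assumed inputs, speeding up the whole frame by 30% gives 30% more frames per second, while speeding up only the 15% slice gives about 3.6%, which lands in the "modest" range.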
 
Eh, DirectX on Xbox has never been just DirectX save perhaps on the original one. 360 already wasn't drawcall limited at all, for instance.
 
I see folks using Phil Spencer's comments about the Bone's gains from DX12 being modest as a reason that the Project CARS gains of 30-40% can't be true. That is nonsense. 30% is the very low end of modest at best. Even 2x performance would still be modest gains.
You'd consider a 30% pay rise a modest increase? 30% less revenue in a year a modest decrease? 30% faster on your personal best lap time a modest improvement? You're a difficult man to please! (Now substitute those 30% with 100% as per your notion of doubling being modest!).
 
Hmmm. Not sure. I've been posting less because I've finally learned my lesson from what was said earlier: deciphering the truth from a long game of telephone is like trying to make gold from copper and bronze.

Without a proper source to be believed, we are just left with guessing. I do not see why PCars would be running GNMX to be honest. They wrote earlier in that thread that their render team is amazing and they have really spent a lot of time optimizing the game for all platforms. GNM seems like a simple and obvious win but I could be wrong. They do cite that there is so much more that could be done with their platform. But I am thinking they are referring to async compute.
From my understanding, you can use both within the engine/game and drop into GNM only for the parts you need. When porting from a DX base, pretty much everyone will want the speed GNMX brings in getting things up and running, as it emulates the DX path.

Using the lower-level API gets you more performance, but it costs more time, code practice, etc., so like most things it will be decided case by case.
 
Eh, DirectX on Xbox has never been just DirectX save perhaps on the original one. 360 already wasn't drawcall limited at all, for instance.
I'm going to put myself out here a bit, I'm probably not perfect on what I'm about to say, but I think this is along the right lines of it.

Being draw call limited or not isn't the full picture though. Being able to submit in parallel is fundamentally a major advantage over serial submission.
I don't think this was shown in any slides I took of the DX12 presentation, and I'm going to use some incorrect terminology, but after the last DX12 presentation I had a chance to speak with Max about multi-adapter.

On the topic of dGPU with iGPU, we talked about how DX12 has the ability to signal when the dGPU has completed its task so that the iGPU can take over. This is shown in the slides, where the dGPU hands off work to the iGPU for post-processing while it moves on to the next frame. The key here isn't speed; the key with DX12 is the signalling. The API can signal the next worker to perform work. What you didn't see in the slides is that the GPU can signal the CPU to work as well, not just GPU to GPU. This is like getting the GPU to issue its own draw call commands, though it's doing it through the CPU. Let me expand.

The GPU can complete some work. Take tile-based renderers, for instance: when the GPU completes one subsection of the frame, it can signal the CPU to continue. Continue what, though? Well, you could signal the CPU to issue async copy commands from another thread while the render threads are still submitting draw calls for the rest of the image. Each tile won't finish at the same time either, right? But because the tile is complete and I want to make space in ESRAM, I can copy that tile over to DDR3 by signalling the CPU that the job is done and that it should act now that this tile is complete. I could also have the CPU simultaneously kick off the copy-in of the next resource. In serial submission, draw call bound or not, this wouldn't happen, because the copy would be appended at the back of the command buffer, unless you ask the single core that is submitting draw calls to stop and keep checking for these commands to pop up.
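
For what it's worth, the signalling pattern can be sketched as a small simulation. This is plain Python threading standing in for the idea, not D3D12 code: the "GPU" and "copy" workers, the tile count, and the log messages are all invented for illustration, and a `threading.Event` plays the role of a per-tile fence.

```python
import queue
import threading

TILE_COUNT = 4  # hypothetical number of screen tiles


def gpu_worker(fence_events, log):
    """Stand-in for the GPU: finishes tiles and signals a fence per tile."""
    for tile in range(TILE_COUNT):
        log.put(f"gpu: rendered tile {tile}")
        fence_events[tile].set()  # "signal": this tile is complete


def copy_worker(fence_events, log):
    """Stand-in for a CPU thread that reacts to each signal with a copy."""
    for tile in range(TILE_COUNT):
        fence_events[tile].wait()  # sleep until the GPU signals this tile
        log.put(f"cpu: copy tile {tile} out of fast memory")


def run():
    fence_events = [threading.Event() for _ in range(TILE_COUNT)]
    log = queue.Queue()
    threads = [
        threading.Thread(target=copy_worker, args=(fence_events, log)),
        threading.Thread(target=gpu_worker, args=(fence_events, log)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [log.get() for _ in range(log.qsize())]


events = run()
for line in events:
    print(line)
```

The interleaving varies from run to run, but each copy only ever appears after its own tile has been rendered, and the copy thread never has to poll: it sleeps until it is signalled, while the "GPU" carries on with the remaining tiles.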

This is just talking about async copy. We haven't discussed what can be done with async compute signalling (I lack the knowledge to make a statement here).
There are things to gain here, and I'm not sure what this brings to the table in performance increases, but it definitely allows some crazy control over the pipeline and how things happen. These types of performance gains will apply to Xbox, because I know for a fact that DX11, whatever version of it, cannot perform these feats. DX11.Xbox can do special functions of DX12 (like async compute, low-overhead API, and some other FL12 features), but it's not doing this core part; it is stuck with immediate and deferred contexts.

I do wonder if this is what PCars is referring to, but it's impossible to know.
 
From my understanding, you can use both within the engine/game and drop into GNM only for the parts you need. When porting from a DX base, pretty much everyone will want the speed GNMX brings in getting things up and running, as it emulates the DX path.

Using the lower-level API gets you more performance, but it costs more time, code practice, etc., so like most things it will be decided case by case.
Yes, this is my understanding too.
But I also believe they didn't leave that on the table. Xbox used to support DX11.X and DX Fast Semantics (which is the low level variant of the API). In the SDK documents, they did eventually mention that they were going to force all development over to Fast Semantics, and I imagine the same will be with DX12. So if PCars is on Fast Semantics I'm willing to bet that PS4 is using GNM. It's an opinion of course, but I think what SMS has managed with their game and the frame rates that it's running, with the load on the CPU is definitely indicative that this is not a GNMX game.
 
I thought it was linked to WDDM 2.0? I know we have had this exact discussion somewhere on these boards, about whether the XBO could do DX12 without W10, but I don't think we as a board came to a consensus on it. There were definitely points on both sides. I think it requires it, lol. But who knows, I could be totally wrong.

Isn't it WDDM 2.1 now?

http://download.microsoft.com/download/5/b/9/5b97017b-e28a-4bae-ba48-174cf47d23cd/pri103_wh06.ppt


WDDM 2.1

Everything WDDM v2.0 GPU can do

Fine grained context switching

Can preempt mid pixel

Doesn’t stall GPU on page fault

True preemptive multi-tasking

Ultimate flexibility for the GPU

GPU can be used for any scenarios without impact on the desktop

Context Switch Guarantees

Pre WDDM v2.1 (XP, v1.0, v2.0)

No guarantee

VERY long shader, VERY large triangle slow to switch

expected performance

Relatively coarse switching for XP and v1.0

v2.0: Good average/typical switch time

WDDM v2.1

Guaranteed to context switch

Same average/typical switch time as v2.0

Much better switch time on applications with long shaders

WDDM v2.x Efficiencies

WDDM v1.0

User Mode Driver (UMD) creates GPU-specific command buffer

KMD patches addresses

Copies to GPU visible DMA buffer

WDDM v2.0 and 2.1

UMD creates DMA buffer directly in GPU memory

No copy, no patch, fast and efficient
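
As a rough way to picture the difference between the two submission paths on this slide, here is a toy model in Python. It is only an illustration: the dictionaries, addresses, and function names are invented and do not correspond to real WDDM driver interfaces, and the ordering of the patch and copy steps is simplified.

```python
# Toy model only: the names and data layout here are invented for illustration
# and do not correspond to real WDDM driver interfaces.

PHYSICAL = {"vertex_buf": 0x1000, "texture": 0x8000}    # where memory sits now
VIRTUAL = {"vertex_buf": 0xA000, "texture": 0xB000}     # per-process GPU VA


def submit_v1(commands):
    """WDDM v1.0 style: UMD builds, KMD patches, then copies."""
    umd_buffer = [(op, handle) for op, handle in commands]
    patched = [(op, PHYSICAL[handle]) for op, handle in umd_buffer]  # KMD patch
    dma_buffer = list(patched)                                       # KMD copy
    return dma_buffer, {"patches": len(patched), "copies": 1}


def submit_v2(commands):
    """WDDM v2.x style: UMD writes virtual addresses straight into the buffer."""
    dma_buffer = [(op, VIRTUAL[handle]) for op, handle in commands]
    return dma_buffer, {"patches": 0, "copies": 0}


frame = [("draw", "vertex_buf"), ("sample", "texture")]
print(submit_v1(frame)[1])
print(submit_v2(frame)[1])
```

In the v1.0-style path, every command that references memory costs a kernel-side patch plus a buffer copy; in the v2.x-style path, the buffer is usable as written because it only contains virtual addresses.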

Performance – Memory Footprint


WDDM v1.0

No demand fault (page or surface)

Entire surfaces resident – coarse grained

OS must guarantee residence – CPU overhead

WDDM v2.0

Surface fault – supports load on bind

GPU switches to new context, no stalling

Fault and stall – permits partial eviction

GPU stalls waiting for missing page

WDDM v2.1

Page fault – permits partial eviction/residence

GPU switches to new context, no stalling

WDDM v2.1 Page Faulting

Finally, full fledged page faulting with context switching!

GPUs support general page faulting and virtual memory per process

On a page fault, GPU context switches to next run list entry

Context switch is “immediate”

OS can partially populate allocations to reduce an app’s working set

GPU faults on non-resident page access

GPU context switches to next run list entry
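
The "fault, then switch to the next run list entry" behaviour described on this slide can be pictured with a toy scheduler. This is a simplified Python sketch under invented assumptions: the context names and page labels are hypothetical, and paging the data in is modeled as instantaneous, which real hardware and drivers obviously do not do.

```python
from collections import deque


def run_gpu(run_list, resident_pages):
    """Toy scheduler: on a page fault, switch to the next run list entry.

    run_list       -- list of (context_name, pages_it_touches)
    resident_pages -- set of pages currently in GPU memory
    Returns the order in which contexts finished.
    """
    pending = deque(run_list)
    finished = []
    while pending:
        name, pages = pending.popleft()
        missing = [p for p in pages if p not in resident_pages]
        if missing:
            # Fault: the OS pages in the missing data while the GPU moves on
            # to the next entry instead of stalling (modeled as instantaneous).
            resident_pages.update(missing)
            pending.append((name, pages))
            continue
        finished.append(name)
    return finished


order = run_gpu(
    [("game", ["a", "b"]), ("desktop", ["c"])],
    resident_pages={"c"},
)
print(order)
```

Here the "game" context faults on its first attempt, so the "desktop" context runs and finishes first; the game is retried once its pages are resident. A fault-and-stall design would instead leave the GPU idle until the game's pages arrived.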

WDDM v2.x Robustness

WDDM v2.x increases OS robustness

GPU uses virtual addressing instead of physical

Kernel mode driver (KMD) no longer patches DMA buffers with physical addresses

User Mode Driver (UMD) builds DMA buffer

KMD no longer validates command buffer

KMD no longer copies cmd buffer to DMA buffer

No DMA buffer splitting

UMD no longer identifies split points

OS no longer splits DMA buffers to fit resources

Privileged Operations

DMA buffers created in user mode cannot compromise the system

Can’t access memory belonging to other processes

Can’t interfere with correct and robust operation

Certain GPU operations are privileged and only available to KMD-built DMA buffers; examples include

Display settings

GPU configuration

Context switching controls

UMD-created DMA buffers cannot perform privileged operations
 