DirectX 12: The future of it within the console gaming space (specifically the XB1)

Yup I can't see a 100% performance boost on XB1 from DX12 unless MS gave the XB1 API development to 2 interns with a kegger and said 'have at it, your deadline is tomorrow by 5'.
That's not even the most ridiculous part.

Even if the CPU were to magically run twice as efficiently with an API update, the notion that you'd suddenly be able to double the resolution makes absolutely no sense. It's like people think that the Jaguar needs to do some kind of snazzy secret handshake with the GPU every time a pixel gets rasterized.
 
That's not even the most ridiculous part.

Even if the CPU were to magically run twice as efficiently with an API update, the notion that you'd suddenly be able to double the resolution makes absolutely no sense. It's like people think that the Jaguar needs to do some kind of snazzy secret handshake with the GPU every time a pixel gets rasterized.
We can look at Mantle for a recent example of improved CPU cost. There the promise (*) was a 9x speedup, which resulted in an increase of ~16% in Battlefield 4. Scaling a 2x promise against that 9x, we can guess a ~3% increase, so 31fps instead of 30fps. Better than a kick in the face.
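Spelling that out as a back-of-the-envelope calculation (my own working of the poster's numbers, assuming the delivered gain scales roughly in proportion to the promised speedup):

\[
\frac{2\times}{9\times}\times 16\% \approx 3.6\%, \qquad 30\ \text{fps}\times 1.036 \approx 31\ \text{fps}
\]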

(*) past promises do have a habit of biting you later
 
Not saying that I agree with Wardell. However, your math above is rather off. The correct comparison is 13 CUs (12 × 1.08) with 50% MORE potential bandwidth (272 GB/s) versus PS4. Plus, since the XB1 GPU is a DX12 part running DX11, substantial gains should be possible.

XB1 is not running DX11; it uses a bespoke API (commonly referred to as the XB1 API). The XB1 API is much closer to the tin than DX11 by necessity, in the same way that the Xbox 360 API is not any of the PC DX variants either.

Your comparison is deeply flawed: there are 12 CUs in XB1, and API improvements do not add bandwidth, that is system design. Where an API does affect bandwidth it is merely by being inefficient and wasting it; to put it another way, a perfect API has a multiplier of 1.0 to system bandwidth.
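Putting that multiplier point in formula form (just restating the claim above, nothing new): an API can only lose bandwidth, never add it:

\[
BW_{\text{effective}} = \eta_{\text{API}} \times BW_{\text{peak}}, \qquad 0 < \eta_{\text{API}} \le 1
\]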
 
Doesn't it really depend on where the bottleneck is and if the redesign of the API removes such a bottleneck?

The 8-core Jaguar isn't exactly lighting the world on fire, and going from DX11 to Mantle we see how much DX11's multi-threaded performance sucks.

[Image: Battlefield 4 "Siege of Shanghai" benchmark chart]

Here we almost get a doubling of frames at 720p and 1080p. The only thing that's changed is the API. I've made some graphics card upgrades back in the day that didn't even give me that performance jump.
 
Doesn't it really depend on where the bottleneck is and if the redesign of the API removes such a bottleneck?

The 8-core Jaguar isn't exactly lighting the world on fire, and going from DX11 to Mantle we see how much DX11's multi-threaded performance sucks.

[Image: Battlefield 4 "Siege of Shanghai" benchmark chart]

Here we almost get a doubling of frames at 720p and 1080p. The only thing that's changed is the API. I've made some graphics card upgrades back in the day that didn't even give me that performance jump.


So from what I understand of this graph, at the point at which the D3D benchmarks meet the Mantle benchmarks it is truly benchmarking the GPU. The rest of the time the GPU was sitting around with cycles to spare because the CPU was too slow with the draw calls?
 
So from what I understand of this graph, at the point at which the D3D benchmarks meet the Mantle benchmarks it is truly benchmarking the GPU. The rest of the time the GPU was sitting around with cycles to spare because the CPU was too slow with the draw calls?

Exactly. Right now Mantle is about as far from the metal as DirectX is. Only optimized draw calls so far, nothing really special. But in the future the lower-level API should be used. Then we may get an additional speed boost from Mantle. The same may apply to DirectX. But the really interesting thing about this is that the CPU should have more time to process other stuff.
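As a toy illustration of that point (my own sketch with made-up numbers, not anything measured from the chart): a frame can't finish faster than the slower of the CPU submission work and the GPU work, so an API change only shows up in frame rate while the CPU side is the longer one.

Code:
#include <algorithm>
#include <cstdio>

// Toy model: frame time is bounded by whichever side is slower.
// All numbers below are hypothetical, for illustration only.
static double fps(double cpu_ms, double gpu_ms)
{
    return 1000.0 / std::max(cpu_ms, gpu_ms);
}

int main()
{
    const double gpu_ms        = 25.0; // hypothetical GPU cost per frame
    const double cpu_dx11_ms   = 40.0; // hypothetical CPU submission cost under a heavy API
    const double cpu_mantle_ms = 12.0; // hypothetical cost with a leaner API

    // CPU-bound case: the API change shows up almost 1:1 in frame rate.
    std::printf("Heavy API: %.1f fps\n", fps(cpu_dx11_ms, gpu_ms));   // ~25 fps
    std::printf("Lean API:  %.1f fps\n", fps(cpu_mantle_ms, gpu_ms)); // ~40 fps, now GPU-bound

    // Once GPU-bound, further CPU savings change nothing: this is where
    // the D3D and Mantle curves meet in the chart.
    std::printf("Lean API, faster CPU: %.1f fps\n", fps(6.0, gpu_ms)); // still ~40 fps
    return 0;
}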
 
Doesn't it really depend on where the bottleneck is and if the redesign of the API removes such a bottleneck?

The 8-core Jaguar isn't exactly lighting the world on fire, and going from DX11 to Mantle we see how much DX11's multi-threaded performance sucks.

[Image: Battlefield 4 "Siege of Shanghai" benchmark chart]

Here we almost get a doubling of frames at 720p and 1080p. The only thing that's changed is the API. I've made some graphics card upgrades back in the day that didn't even give me that performance jump.

Can you mention the CPU and GPU models for this graph?
 
Doesn't it really depend on where the bottleneck is and if the redesign of the API removes such a bottleneck?
Yep.

Here we almost get a doubling of frames at 720p and 1080p. The only thing that's changed is the API. I've made some graphics card upgrades back in the day that didn't even give me that performance jump.
Those are some very impressive improvements. However, those bottlenecks shouldn't exist on a console. They don't exist on PS4. They shouldn't exist on XB1, which has an optimised console API and not Windows DX.
 
Doesn't it really depend on where the bottleneck is and if the redesign of the API removes such a bottleneck?

The 8-core Jaguar isn't exactly lighting the world on fire, and going from DX11 to Mantle we see how much DX11's multi-threaded performance sucks.

[Image: Battlefield 4 "Siege of Shanghai" benchmark chart]

Here we almost get a doubling of frames at 720p and 1080p. The only thing that's changed is the API. I've made some graphics card upgrades back in the day that didn't even give me that performance jump.

That assumes DX11.X in the XB1 isn't already built in such a way as to take similar advantage of the hardware to Mantle, which is a big assumption to make.

It also assumes that the current bottleneck for XB1 games is the CPU rather than the GPU. And if that were the case then that would also hold true for the PS4, and since the PS4 has a slightly slower CPU, it would in fact be running these games slower than the XB1 rather than faster.
 
That assumes DX11.X in the XB1 isn't already built in such a way as to take similar advantage of the hardware to Mantle, which is a big assumption to make.

It also assumes that the current bottleneck for XB1 games is the CPU rather than the GPU. And if that were the case then that would also hold true for the PS4, and since the PS4 has a slightly slower CPU, it would in fact be running these games slower than the XB1 rather than faster.

Ignoring X1 for the moment, because I don't think the current crop of games out today had enough time to architect their games properly for ESRAM: on the PS4 side there have been many indications that they are indeed CPU bound. If I recall, Sucker Punch indicated that the CPU was their largest bottleneck. I also recall reading a review of some PS4 games that indicated a drop in resolution would not have improved the frame rate, indicating that the bottleneck was not on the GPU side of things. Citation needed here, which I am looking for at the moment.

And as for the PS4 having a slower CPU than the X1, could it be all the hypervisor overhead that the X1 needs to hurdle through while the PS4 does not, resulting in a net positive for the PS4?

The arrival of AMD's Mantle API has put a lot of focus on DirectX 11's API and driver overhead, and Respawn's solution to this is simple enough while highlighting that this is an issue developers need to work with.

"Currently we're running it so that we leave one discrete core free so that DX and the driver can let the CPU do its stuff. Originally we were using all the cores that were available - up to eight cores on a typical PC. We were fighting with the driver wanting to use the CPU whenever it wanted so we had to leave a core for the driver to do its stuff," Baker adds.

Going forward, the quest to parallelise the game's systems continues - particle rendering is set to get the multi-threaded treatment, while the physics code will be looked at again in order to get better synchronisation across multiple cores.

From the Titanfall tech talk, there is evidence that DX11.X on X1 does not have any of the multithreaded draw call performance of Mantle or D3D12. 1.6GHz is hella slow for a single thread to power a dense render function.
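To give a feel for why that matters, here's a crude budget calculator (every number is a hypothetical placeholder I picked for illustration, not a measurement of XB1 or any driver):

Code:
#include <cstdio>

// Back-of-the-envelope draw call budget for a single submission thread.
// All constants are made-up assumptions for illustration only.
int main()
{
    const double frame_budget_ms  = 1000.0 / 60.0; // 60 fps target
    const double cost_per_draw_us = 25.0;          // hypothetical CPU cost per draw call
                                                   // on one slow core (API + driver work)
    const double other_work_ms    = 8.0;           // hypothetical gameplay/engine work on that core

    const double submit_budget_ms = frame_budget_ms - other_work_ms;
    const double max_draws_single = submit_budget_ms * 1000.0 / cost_per_draw_us;

    std::printf("Single-threaded submission: ~%.0f draw calls/frame\n", max_draws_single);

    // If the same recording work could be spread over N cores (Mantle/D3D12 style),
    // the ceiling scales roughly with the number of cores doing the recording.
    for (int cores = 2; cores <= 6; ++cores)
        std::printf("%d cores recording: ~%.0f draw calls/frame\n",
                    cores, max_draws_single * cores);
    return 0;
}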
 
Ignoring X1 for the moment, because I don't think the current crop of games out today had enough time to architect their games properly for ESRAM: on the PS4 side there have been many indications that they are indeed CPU bound. If I recall, Sucker Punch indicated that the CPU was their largest bottleneck. I also recall reading a review of some PS4 games that indicated a drop in resolution would not have improved the frame rate, indicating that the bottleneck was not on the GPU side of things. Citation needed here, which I am looking for at the moment.
The reason the PS4's CPU would be a bottleneck is because it's not very fast, and not because it has a clunky API weighing it down. Proof against this would be to find a similar SOC in a PC running a cross platform game and comparing results with and without Mantle versus PS4. If it's comparable to PS4 without Mantle, and much faster with Mantle, that'll show PS4's current API is as suboptimal as DX11 on PC - a first for consoles.
 
The reason the PS4's CPU would be a bottleneck is because it's not very fast, and not because it has a clunky API weighing it down. Proof against this would be to find a similar SOC in a PC running a cross platform game and comparing results with and without Mantle versus PS4. If it's comparable to PS4 without Mantle, and much faster with Mantle, that'll show PS4's current API is as suboptimal as DX11 on PC - a first for consoles.

Indeed. But I guess it comes down to what you define as overhead. As for post 126, I wouldn't call it traditional overhead but multithreaded performance; not quite the same as just reducing the overhead of a draw call.

I don't know much about the PS4's graphics API, but if I recall correctly OpenGL can't do multithreaded draw calls, and the PS4 leverages OpenGL. If the PS4 cannot leverage multithreaded draw calls, then in theory it should benefit from Mantle?
 
PS4 doesn't leverage OpenGL. It uses GNM.

DF Interview said:
"At the lowest level there's an API called GNM. That gives you nearly full control of the GPU. It gives you a lot of potential power and flexibility on how you program things. Driving the GPU at that level means more work."
More here. It's worth noting that the easier API, GNMX, has a 'significant CPU overhead', so it's true that games on PS4 could be quite hampered by API choice. I suppose that in theory, if DX12-whatever on XB1 is far more efficient than GNMX on PS4, that could make up a performance deficit, but GNM-based titles will see little overhead on PS4.
 
PS4 doesn't leverage OpenGL. It uses GNM.

More here. It's worth noting that the easier API, GNMX, has a 'significant CPU overhead', so it's true that games on PS4 could be quite hampered by API choice. I suppose that in theory, if DX12-whatever on XB1 is far more efficient than GNMX on PS4, that could make up a performance deficit, but GNM-based titles will see little overhead on PS4.

According to Microsoft, DX12 will have two levels of functionality: one low level, and the other a superset of DX11 rendering functionality.
 
Ignoring X1 for the moment, because I don't think the current crop of games out today had enough time to architect their games properly for ESRAM: on the PS4 side there have been many indications that they are indeed CPU bound. If I recall, Sucker Punch indicated that the CPU was their largest bottleneck. I also recall reading a review of some PS4 games that indicated a drop in resolution would not have improved the frame rate, indicating that the bottleneck was not on the GPU side of things. Citation needed here, which I am looking for at the moment.

And as for the PS4 having a slower CPU than the X1, could it be all the hypervisor overhead that the X1 needs to hurdle through while the PS4 does not, resulting in a net positive for the PS4?


From the Titanfall tech talk, there is evidence that DX11.X on X1 does not have any of the multithreaded draw call performance of Mantle or D3D12. 1.6GHz is hella slow for a single thread to power a dense render function.

Borrowing from mosen's earlier post in this thread:

Looking ahead, we’re stoked about the releases of Xbox One and Windows 8.1! The process that got us to this point will continue to drive our future releases. We are getting excellent feedback from the industry around the areas that are most important for future API development, and that feedback is directly informing our Direct3D development direction. We’re continually innovating in areas of performance, functionality and debug and performance tooling for Xbox One. We’re also working with our ISV and IHV partners on future efforts, including bringing the lightweight runtime and tooling capabilities of the Xbox One Direct3D implementation to Windows, and identifying the next generation of advanced 3D graphics technologies.

With Xbox One we have also made significant enhancements to the implementation of Direct3D 11, especially in the area of runtime overhead. The result is a very streamlined, “close to metal” level of runtime performance.

When we started off the project, AMD already had a very nice DX11 design. The API on top, yeah I think we'll see a big benefit. We've been doing a lot of work to remove a lot of the overhead in terms of the implementation and for a console we can go and make it so that when you call a D3D API it writes directly to the command buffer to update the GPU registers right there in that API function without making any other function calls. There's not layers and layers of software. We did a lot of work in that respect.

We also took the opportunity to go and highly customise the command processor on the GPU. Again concentrating on CPU performance... The command processor block's interface is a very key component in making the CPU overhead of graphics quite efficient. We know the AMD architecture pretty well - we had AMD graphics on the Xbox 360 and there were a number of features we used there. We had features like pre-compiled command buffers where developers would go and pre-build a lot of their states at the object level where they would [simply] say, "run this". We implemented it on Xbox 360 and had a whole lot of ideas on how to make that more efficient [and with] a cleaner API, so we took that opportunity with Xbox One and with our customised command processor we've created extensions on top of D3D which fit very nicely into the D3D model and this is something that we'd like to integrate back into mainline 3D on the PC too - this small, very low-level, very efficient object-orientated submission of your draw [and state] commands.

It's pretty clear MS have already done a lot of work on DX11.X to reduce the overhead over the PC instance of DX11. So I think it's pretty unrealistic to expect DX12 to bring huge gains, or at least nothing on the order of what it and Mantle will bring to the PC over DX11.
 
Borrowing from mosen's earlier post in this thread:
It's pretty clear MS have already done a lot of work on DX11.X to reduce the overhead over the PC instance of DX11. So I think it's pretty unrealistic to expect DX12 to bring huge gains, or at least nothing on the order of what it and Mantle will bring to the PC over DX11.

I'm definitely not disagreeing with any of the close-to-the-metal API points for X1 and PS4. But I seem to be at some point of confusion, or at the least I recognize I am the odd man out in seeing the discrepancy here:
I thought Mantle provided:

A) low overhead for draw calls, direct to the metal
B) multithreaded draw calls leveraged by multi-core CPUs to bring the number of draw calls closer to 100,000 per frame, which was what the Star Swarm demo was about.

I am under the assumption they are separate features, but that (A) assists (B) in reaching the goal of 100K draw calls due to lower overhead. (B) is a separate feature set on its own, as (A) would be beneficial in both single-threaded and multithreaded renderers, while (B) is just about improving multithreaded rendering, which earlier in this thread both sebbi and MJP pointed out was so bad in DX11 that it wasn't worthwhile pursuing for the most part.

I think everyone agrees that (A) is found on both consoles, but I don't see evidence that (B) is, unless there is something I am missing.

If (B) is not leveraged on either console, then in scenarios where draw call counts get enormous, a weak CPU should see heavy dips in frame rate; characteristic of the Digital Foundry articles on the launch window games.
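For reference, this is roughly what (B) looks like in public D3D12 terms, based on what Microsoft has shown so far; a minimal sketch of parallel command list recording, not anything confirmed for either console's API. The actual draw state (PSO, root signature, resources) is omitted to keep it short:

Code:
#pragma comment(lib, "d3d12.lib")
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    D3D12_COMMAND_QUEUE_DESC qdesc = {};
    qdesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&qdesc, IID_PPV_ARGS(&queue));

    const int kThreads = 4;
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(kThreads);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(kThreads);

    // One allocator + command list per worker thread: allocators are not
    // thread-safe, so each thread records into its own.
    for (int i = 0; i < kThreads; ++i) {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
    }

    // Each thread records its slice of the frame's draw calls in parallel.
    // (Real code would set a PSO, root signature, and issue the draws here;
    // the point is only that recording happens on N cores at once.)
    std::vector<std::thread> workers;
    for (int i = 0; i < kThreads; ++i)
        workers.emplace_back([&, i] {
            // ... record this thread's portion of the scene ...
            lists[i]->Close();
        });
    for (auto& w : workers) w.join();

    // Submission itself is a single cheap call on one thread.
    ID3D12CommandList* raw[kThreads];
    for (int i = 0; i < kThreads; ++i) raw[i] = lists[i].Get();
    queue->ExecuteCommandLists(kThreads, raw);
    return 0;
}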
 
Draw calls aren't a problem on PS4 at least. Infamous SS developers already noted that the draw calls it pushes would cause issues for PCs on DX11. Probably an exaggeration, but it does highlight that this must be a PS4 strength as compared to DX11.
 
Draw calls aren't a problem on PS4 at least. Infamous SS developers already noted that the draw calls it pushes would cause issues for PCs on DX11. Probably an exaggeration, but it does highlight that this must be a PS4 strength as compared to DX11.

Thanks, yes, in this context I believe so as well.
 
I don't know whether what I'm going to say is true or not (please correct me if I'm wrong). I think we have had part of the answer in our hands for the last 3-4 days: we got some slides about DX12 from GDC 2014 and Build 2014 in this short period of time. Take a closer look at this slide:

[slide image]


Some features are already on XB1:

1) Bundles, which are part of "CPU Overhead: Redundant Render Commands" (page 26).
[slide image]


2) Nearly zero D3D resource overhead, which should be part of "Direct3D 12 – Command Creation Parallelism" (page 33).

The GDC slide implies that these are parts of DX12 which can be found on XB1 today.

Features that aren't on XB1:

1) Pipeline State Objects (PSOs).
[slide image]


2) Resource Binding.

[slide image]


These features will be available on XB1 later. Also, "Descriptor Heaps & Tables", which is a sort of bindless rendering (page 19, under "CPU Overhead: Redundant Resource Binding"), would be possible only on GPUs that are fully DX11.2 capable (tier 2) and beyond. Considering that both DX11.2 and DX12 were announced for XB1 and the DX team is prototyping DX12 on XB1 HW right now, it's likely that Descriptor Heaps & Tables will be available on XB1, too.

[slide image]

So based on these findings (correct/forgive me if I'm wrong) I think the impact of DX12 on XB1 would be considerable. Everyone can download/see the PPT/presentation video files from here.
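For anyone who hasn't seen the API shape, here's a minimal sketch (my own, pieced together from the public D3D12 material, so treat it as illustrative rather than XB1 code) of the two pieces being discussed: a shader-visible descriptor heap (the "resource tables") and a bundle that is recorded once and replayed each frame:

Code:
#pragma comment(lib, "d3d12.lib")
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    // Descriptor heap: a table of resource views the GPU reads directly,
    // instead of the driver re-validating bindings on every draw.
    D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
    heapDesc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    heapDesc.NumDescriptors = 256;
    heapDesc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
    ComPtr<ID3D12DescriptorHeap> heap;
    device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&heap));

    // Bundle: a small command list recorded once and re-executed cheaply.
    ComPtr<ID3D12CommandAllocator>    bundleAlloc;
    ComPtr<ID3D12GraphicsCommandList> bundle;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_BUNDLE,
                                   IID_PPV_ARGS(&bundleAlloc));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_BUNDLE,
                              bundleAlloc.Get(), nullptr, IID_PPV_ARGS(&bundle));
    // ... set pipeline state / root signature / descriptor tables and issue
    //     the draws for some object here (omitted: needs compiled shaders) ...
    bundle->Close();

    // Per frame, a direct command list just replays the pre-recorded bundle.
    ComPtr<ID3D12CommandAllocator>    frameAlloc;
    ComPtr<ID3D12GraphicsCommandList> frameList;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                   IID_PPV_ARGS(&frameAlloc));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                              frameAlloc.Get(), nullptr, IID_PPV_ARGS(&frameList));
    ID3D12DescriptorHeap* heaps[] = { heap.Get() };
    frameList->SetDescriptorHeaps(1, heaps);   // bind the resource table
    frameList->ExecuteBundle(bundle.Get());    // replay the pre-built commands
    frameList->Close();
    return 0;
}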

Edit: According to the presenter, "Descriptor Heaps & Tables" are essential for using Bundles in DX12. The presenter at GDC 2014 called them resource tables:

In this video (2:21) the presenter talks about resource tables for creating a complex set of resources, which are already available on XB1 while they're new to Direct3D 12, and are useful alongside bundles for more efficiency.
http://forum.beyond3d.com/showpost.php?p=1838040&postcount=7747

So can I say that XB1 is a "Tier 2" DX11.2 capable GPU?
 