DirectX 12: Its future in the console gaming space (specifically the Xbox One)

The question isn't really about how low-level the APIs are. GNM is probably as low-level as you can get, since it is specific to that device. The question I'm curious about is: is the GNM API currently capable of all of the features that we are seeing in DirectX 12 and Vulkan? The main one being the ability to utilize multiple CPU cores efficiently to feed the GPU.
Why wouldn't it? This was always a limitation of software APIs rather than the hardware. GNM looks to have little-to-no software API overhead, and I can't imagine why AMD/Sony would architect an APU where access to the GPU command buffers was limited to specific cores.
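For what it's worth, here's a minimal sketch (mine, not from any console SDK) of the multi-core submission pattern the new PC APIs expose: each worker thread records into its own command list with no locking, and only the final submission to the queue is serialized. It assumes the device, queue, allocators, and lists were created during setup.

```cpp
#include <d3d12.h>

// Each worker thread owns its own allocator + command list, so recording
// draw calls needs no synchronization with other threads.
void RecordSceneSlice(ID3D12CommandAllocator* alloc,
                      ID3D12GraphicsCommandList* list)
{
    alloc->Reset();
    list->Reset(alloc, nullptr);   // nullptr = no initial pipeline state
    // ... record this thread's share of the frame's draw calls ...
    list->Close();
}

// Main thread, after all workers have finished recording:
void SubmitFrame(ID3D12CommandQueue* queue,
                 ID3D12GraphicsCommandList* a,
                 ID3D12GraphicsCommandList* b)
{
    ID3D12CommandList* lists[] = { a, b };
    queue->ExecuteCommandLists(2, lists);   // one cheap, ordered submission
}
```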
 
There are differences between the DX12 API and the Xbox One API. For example, looking at the leaked SDK doc, Xbox One seems to use the old DX11 DispatchIndirect and DrawIndirect API calls instead of the DX12 ExecuteIndirect. Unlike Xbox 360, Xbox One seemed to start with a very vanilla/PC implementation of the D3D API. Things improved over time, and "fast semantics" were added, but the implementation of the D3D API still appears different from DX12. Whether that means DX12 will bring performance improvements, I can't say, but I don't think the API on Xbox One and D3D12 are equivalent.
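For anyone who hasn't seen the two call styles side by side, a rough sketch (variable names are mine; the D3D12 command signature is assumed to have been created earlier with CreateCommandSignature):

```cpp
// DX11-style indirect calls (what the leaked XDK doc appears to describe):
// each call consumes exactly one argument record from a GPU buffer.
ctx->DrawInstancedIndirect(argsBuffer11, 0);   // one draw's args at offset 0
ctx->DispatchIndirect(argsBuffer11, 0);        // one dispatch's args

// DX12 ExecuteIndirect: a single call consumes a whole array of
// GPU-written command records, optionally with a GPU-written count.
cmdList->ExecuteIndirect(
    cmdSignature,   // ID3D12CommandSignature*: layout of each record
    maxCommands,    // upper bound on records to execute
    argsBuffer12,   // ID3D12Resource* holding the packed records
    0,              // byte offset into the argument buffer
    countBuffer,    // optional: GPU-written actual count (or nullptr)
    0);             // byte offset into the count buffer
```

The second form is what lets the GPU generate whole batches of its own work in one submission.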
 
I don't think the API on Xbox One and D3D12 are equivalent.

I wrote my first DX12 application. Rudimentary: it paints the screen cornflower blue...

I know the DX11 API pretty well, and have also gone through the XDK in depth.

Even with the rudimentary pipeline I set up, it's very, VERY different from DX11...

[sample DX12 code attachment]
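(The attachment isn't shown, so here's a sketch of my own of the per-frame heart of such an app, assuming the device, swap chain, command allocator/list, queue, and RTV handle were all created during setup; the setup itself is the genuinely long part.)

```cpp
// Per-frame: record and submit a command list that just clears the
// back buffer to cornflower blue.
void RenderFrame(ID3D12CommandAllocator* alloc,
                 ID3D12GraphicsCommandList* cl,
                 ID3D12CommandQueue* queue,
                 ID3D12Resource* backBuffer,
                 D3D12_CPU_DESCRIPTOR_HANDLE rtv)
{
    alloc->Reset();
    cl->Reset(alloc, nullptr);

    // DX12 makes you manage resource state transitions yourself.
    D3D12_RESOURCE_BARRIER b = {};
    b.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    b.Transition.pResource   = backBuffer;
    b.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    b.Transition.StateBefore = D3D12_RESOURCE_STATE_PRESENT;
    b.Transition.StateAfter  = D3D12_RESOURCE_STATE_RENDER_TARGET;
    cl->ResourceBarrier(1, &b);

    const float cornflowerBlue[4] = { 0.392f, 0.584f, 0.929f, 1.0f };
    cl->ClearRenderTargetView(rtv, cornflowerBlue, 0, nullptr);

    // Transition back so the swap chain can present.
    b.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    b.Transition.StateAfter  = D3D12_RESOURCE_STATE_PRESENT;
    cl->ResourceBarrier(1, &b);

    cl->Close();
    ID3D12CommandList* lists[] = { cl };
    queue->ExecuteCommandLists(1, lists);
    // (fence signal/wait and swapChain->Present() omitted)
}
```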

I was told many, MANY months ago by friends that doing something as simple as rendering a triangle is so much different from anything we ever had to do with DX11... and I can see now that they were right :)
 
A trusted dev (Matt) says on GAF that the improvement to the Xbox One API is huge, much bigger than for the PS4 API, but it was needed. Sony's API was ready before Microsoft's. I think the results will be visible in the end-of-2015 lineup and after.
 
But wasn't the porting super easy? :cry:

D3D12 is very different from D3D11 (from what I've seen)... BUT MS has provided something called "Direct3D 11on12" that is meant to ease porting DX11 code to DX12...

You can choose to break from the past and go all-in with the DX12 APIs exclusively... or you can use "11on12" :)

P.S. Surprisingly, DX12 is still COM-based ;)
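For the curious, the 11on12 entry point is D3D11On12CreateDevice: you hand it a real D3D12 device and command queue and get back a D3D11 device and immediate context that existing code can keep using. A rough sketch, error handling omitted (dev12/queue12 are assumed to exist already):

```cpp
#include <d3d11on12.h>

ID3D11Device*        dev11 = nullptr;
ID3D11DeviceContext* ctx11 = nullptr;
IUnknown* queues[] = { queue12 };   // the D3D12 queue(s) to submit through

D3D11On12CreateDevice(
    dev12,                            // the real D3D12 device underneath
    D3D11_CREATE_DEVICE_BGRA_SUPPORT, // D3D11 device creation flags
    nullptr, 0,                       // default feature levels
    queues, 1,
    0,                                // node mask (single GPU)
    &dev11, &ctx11,
    nullptr);                         // chosen feature level (not needed here)

// Old DX11 rendering code now runs through dev11/ctx11, while new DX12
// code shares the same underlying device and resources.
```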
 
Why wouldn't it? This was always a limitation of software APIs rather than the hardware. GNM looks to have little-to-no software API overhead, and I can't imagine why AMD/Sony would architect an APU where access to the GPU command buffers was limited to specific cores.
I assumed the same. But there's that GDC slide; while it's not confirmed proof, I don't see why it couldn't dispatch all the compute shader calls on multiple cores. Unless, of course, they weren't trying to be multithreaded on purpose.

Little-to-no overhead and providing tools to developers are different things. We could just have developers write in assembly and get no overhead, but the project would never be done.

Efficiency is still the key; at the end of the day you want the best performance out in the shortest amount of time.
 
I assumed the same. But there's that GDC slide; while it's not confirmed proof, I don't see why it couldn't dispatch all the compute shader calls on multiple cores. Unless, of course, they weren't trying to be multithreaded on purpose.

Could you link or point to the slide in question?
 
Could you link or point to the slide in question?
My assumption is that the charts are for Xbox One and PS4, since that is the name of the slide deck.

http://www.gdcvault.com/play/1020939/Efficient-Usage-of-Compute-Shaders

It begins at slide 22 with their first attempts to get compute shaders to run. Results are shown, and they explicitly write 'Too Many Dispatch'.

According to AnandTech, it did not take six cores submitting work to bottleneck the GCP (graphics command processor) of their AMD card in the Oxide demo. Furthermore, their findings show that the Oxide demo had completed its run of both AI and mass draw calls with just two cores. To be fair, it's not an apples-to-apples comparison, but the PS4 and Xbox One aren't running a 290X or a 980 respectively, so it's not like those consoles could effectively handle the load of a Core i7 anyway.

So the behavior I'm looking at on slide 24, all CPU and no GPU, is, at least in my opinion, characteristic of a single-threaded renderer. I don't think six Jaguar cores @ 1.6 GHz would be this far behind in a multithreaded renderer. I posted this earlier, but it was lost in the noise.

In any event, I think it's easy to agree that this slide applies to Xbox One no matter what. The question is whether it applies to PS4.
 
My assumption is that the charts are for Xbox One and PS4, since that is the name of the slide deck.

http://www.gdcvault.com/play/1020939/Efficient-Usage-of-Compute-Shaders

It begins at slide 22 with their first attempts to get compute shaders to run. Results are shown, and they explicitly write 'Too Many Dispatch'.

According to AnandTech, it did not take six cores submitting work to bottleneck the GCP (graphics command processor) of their AMD card in the Oxide demo. Furthermore, their findings show that the Oxide demo had completed its run of both AI and mass draw calls with just two cores. To be fair, it's not an apples-to-apples comparison, but the PS4 and Xbox One aren't running a 290X or a 980 respectively, so it's not like those consoles could effectively handle the load of a Core i7 anyway.

So the behavior I'm looking at on slide 24, all CPU and no GPU, is, at least in my opinion, characteristic of a single-threaded renderer. I don't think six Jaguar cores @ 1.6 GHz would be this far behind in a multithreaded renderer. I posted this earlier, but it was lost in the noise.

In any event, I think it's easy to agree that this slide applies to Xbox One no matter what. The question is whether it applies to PS4.

They chose the PS4 version as the reference in the presentation...
 
They chose the PS4 version as the reference in the presentation...
Please expand on this with at least some references we can look at, and please accompany them with some text that helps us understand what you're thinking. You've written a lot over the last 20 posts or so, but none of your posts have supporting evidence to back your claims. If you are making educated guesses or stating a fact, at least let us know which you're doing, and what evidence would support your guess.

i.e. claims like this:
The draw call limit on PS3 and Xbox 360 comes from limited power; same thing for the new consoles with a good low-level API...

or this

The draw call limit and the multithreaded command buffer limit are a PC problem coming from the DirectX implementation before DX12, not a console problem: not on PS3, not on 360, not on PS4, and maybe not with the low-level API of Xbox One.

Yet the evidence here shows what the PS4 can do with a low-level API (and I'll just copy and paste this from DualShockers; I'm not putting much effort in here):
The PS4’s CPU is defined “decent,” able to handle 30,000 draws instanced in 10,000 actual draw calls, 100-400 asynchronous raycasts per frame, 50-100 animated characters with 300+ bones, even if prefetch is not a replacement for the SPU’s direct memory access.

It was also hard to max out the CPU (while we learned in a previous interview that the GPU is used at its maximum capacity most of the time), with 50-70% used for main jobs and 5-16% for other threads.
Source: http://www.dualshockers.com/2014/04...am-cpu-and-gpu-compute-to-make-our-jaws-drop/

30K draws on PS4, instanced into 10K actual draw calls. I assume that is per frame.
At 30 frames per second, that's about 900,000 draw calls per second.
This is multithreaded performance:
[3DMark API Overhead chart: draw calls per second for DX11 single-threaded, DX11 multithreaded, and DX12]


That puts GNM about 50% better than DX11 single-threaded/multithreaded, and at about 1/5 the performance of DX12 multithreaded.
 
Please expand on this with at least some references we can look at, and please accompany them with some text that helps us understand what you're thinking. You've written a lot over the last 20 posts or so, but none of your posts have supporting evidence to back your claims. If you are making educated guesses or stating a fact, at least let us know which you're doing, and what evidence would support your guess.


First, I don't think using a PC benchmark of Oxide to draw conclusions about consoles is very scientific. Page 70: the PS4 version. Apart from a few slides, there is little mention of the Xbox One version in the Ubi deck. ;)
 
And draw calls on consoles are an old subject. You can use the search function; PS3 and Xbox 360 were better than PC for a long time.

In the Infamous presentation they gave the number of draw calls for the title on PS4 using the Jaguar CPU: approximately 30,000. That's ahead of what I have seen on DirectX 11 with better CPUs than the Jaguar...
 
And draw calls on consoles are an old subject. You can use the search function; PS3 and Xbox 360 were better than PC.

In the Infamous presentation they gave the number of draw calls for the title on PS4 using the Jaguar CPU: 30,000. That's ahead of what I have seen on DirectX 11 with better CPUs than the Jaguar...
You still haven't convinced me that GNM has all these features, or that it won't need them because of its low-level nature.

Are you claiming that 30,000 draw calls compete with 4.5 million draw calls? Or that we'll never need more than 30,000 draw calls? Because that's the change that could happen.

An 8-core Jaguar @ 1.6 GHz vs. a 4-core 3.3 GHz processor seems fair if all those cores are going to be used.

First, I don't think using a PC benchmark of Oxide to draw conclusions about consoles is very scientific. Page 70: the PS4 version. Apart from a few slides, there is little mention of the Xbox One version in the Ubi deck. ;)
I've posted 3DMark.

So now the Ubisoft slide is out because you don't care to admit it as evidence. Sure, so now we're comparing 3DMark (4.5 million per second) vs. Infamous Second Son numbers (900,000 per second).

While I'm unsure how this applies to games, or even to performance, you can see the two numbers do not line up, and to suggest 4.5M and 900K are equivalent is wrong. Therefore, it is my strong opinion that GNM does NOT have multithreaded rendering in the same way that DX12/Vulkan do.
 
You still haven't convinced me that GNM has all these features, or that it won't need them because of its low-level nature.

Are you claiming that 30,000 draw calls compete with 4.5 million draw calls? Or that we'll never need more than 30,000 draw calls? Because that's the change that could happen.

An 8-core Jaguar @ 1.6 GHz vs. a 4-core 3.3 GHz processor seems fair if all those cores are going to be used.

The 30,000 draw calls figure is from a game, not a benchmark where the CPU doesn't do anything else.

We will see much better numbers on PS4 and Xbox One when GPUs feed themselves with GPU-generated commands, like in the Oxide engine or other true current-gen engines.
 
And I never said that multithreaded command buffers are not needed. I just said they are there in GNM...

Like they are in Mantle, Vulkan, or DirectX 12.
 
My assumption is that the charts are for Xbox One and PS4, since that is the name of the slide deck.

http://www.gdcvault.com/play/1020939/Efficient-Usage-of-Compute-Shaders

It begins at slide 22 with their first attempts to get compute shaders to run. Results are shown, and they explicitly write 'Too Many Dispatch'.
Thanks, but I don't think this is indicative of GNM being unable (or able) to take commands from multiple cores.

The whole presentation is really the dev team's journey to solve a problem (porting cloth simulation from CPU to GPU), sharing their experiences on the path to a decent solution. The "too many dispatch" slide is really the fundamental lesson that the CPU will bottleneck when trying to send a metric ton of small identical jobs to the GPU, and that it's better for some of the small jobs to be bundled together. I think this has always been the conventional wisdom: e.g. if you wanted four pieces of paper from the cupboard, you wouldn't go to the cupboard four times collecting one piece each time; you would go once and pick up four pieces of paper.

Ubisoft included this in the presentation because data on what doesn't work is as valuable as data on what does work.
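In API terms, the lesson looks roughly like this sketch (mine, not Ubisoft's actual code; DX11-style compute used purely for illustration):

```cpp
// Naive: one tiny Dispatch per cloth patch. Submission overhead on the
// CPU dominates; this is the "too many dispatch" wall.
for (UINT i = 0; i < numPatches; ++i) {
    ctx->CSSetConstantBuffers(0, 1, &perPatchCB[i]);
    ctx->Dispatch(1, 1, 1);             // one trip to the cupboard per patch
}

// Batched: a single Dispatch covers every patch; the shader picks its
// patch's data out of a structured buffer by thread-group ID.
ctx->CSSetShaderResources(0, 1, &allPatchesSRV);
ctx->Dispatch(numPatches, 1, 1);        // one trip, same total GPU work
```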
 