DirectX 12: The future of it within the console gaming space (specifically the XB1)


Well, if I understand this correctly, the multiple command processors in the Xbone will lead to better GPU usage. It would be interesting to know whether the current DirectX on the console can handle this right now. If draw calls are almost single-threaded right now, I would not be surprised if the extra command processors go unused until some big software update, e.g. DX12.
 
You're suggesting people are currently unable to use the full capability of the hardware? Seems rather unlikely.
 

Well, multiple command processors would only really work if the CPU feeds them at the same time. According to the document, those command processors could be used to compute multiple things on the same "hardware" if those two operations use different parts of it. This only means the hardware is used more efficiently.
And since it seems the currently implemented DirectX is still bound to one CPU thread, the command processors may not be used that way right now. That may just be because the software side of the Xbone was not ready; they are way behind schedule.

from that linked document:
For example, having multiple command processors would allow rendering shadows at the same time as filling G-Buffers or shading the previous frame. Having such drastically different tasks live on the GPU at the same time could make a better usage of the GPU as both tasks will probably have different hardware bottleneck.
But it sounds like this depends heavily on the developer optimizing for it.
It would not add unused hardware or anything, it would just use the current hardware more efficiently.
 
*ahem* Let's not be stupid and trolling with such statements as claiming others are talking as if XboxOne has 5tflops. Especially when they are talking about improved efficiency. Do so at your own risk of receiving a nice vacation from the site. Yes, I'm talking about you ThePissartist
 
I don't think I understand the idea behind 2 (or more) graphics command processors correctly. Does it mean that with 2 or more graphics command processors, the GPU can work on more than one frame (with different scheduling) concurrently?
 
Your command buffers are lists of commands. AMD's also allow conditionals, for creating small loops or skipping parts of the list if needed.
Imagine you have one queue, and a slow command, one that is not using all the resources of your GPU, is executing.
Instead of waiting for its completion, your GPU could fetch commands from another microengine.

Example: you need to feed constants and commands. Until you have fed all the constants, you cannot go ahead with your commands. But if you fill your constants from another queue, you can submit the constants as a parallel job without blocking the main queue (as far as I understand it).
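
To make the "fetch from another queue instead of waiting" idea concrete, here is a rough toy simulation in Python. Everything in it is an assumption for illustration: the unit names ("rop", "alu"), the tick counts and the scheduling rule (each queue runs its own commands strictly in order, and a hardware unit serves one command at a time). It ignores cross-queue dependencies and is not how any real GPU front end is implemented.

```python
def makespan(queues):
    """Ticks needed to drain all queues in a toy GPU model.

    Each queue is a list of (unit, duration) commands, with duration >= 1.
    A queue runs its commands strictly in order, one at a time, and each
    hardware unit can serve only one command at a time. Hypothetical model.
    """
    pos = [0] * len(queues)    # index of each queue's current command
    left = [0] * len(queues)   # ticks left on the running command (0 = not started)
    owner = {}                 # unit -> index of the queue currently using it
    tick = 0
    while any(p < len(q) for p, q in zip(pos, queues)):
        released = []
        for i, q in enumerate(queues):
            if pos[i] == len(q):
                continue  # this queue is already drained
            unit, duration = q[pos[i]]
            if left[i] == 0 and unit not in owner:
                owner[unit] = i        # unit is free: start the command
                left[i] = duration
            if owner.get(unit) == i:
                left[i] -= 1           # command makes progress this tick
                if left[i] == 0:
                    released.append(unit)
                    pos[i] += 1
        for unit in released:
            del owner[unit]            # units become free only on the next tick
        tick += 1
    return tick


# One queue: the ALU-only lighting pass has to wait behind the ROP-bound shadow pass.
single = [[("rop", 4), ("alu", 5), ("rop", 3)]]
# Two queues: the lighting pass is fetched from a second queue and overlaps.
dual = [[("rop", 4), ("rop", 3)], [("alu", 5)]]

print(makespan(single), makespan(dual))
```

In this made-up workload the second queue only helps because the two streams want different units; if both queues needed "rop" they would simply take turns.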
 

Sounds like a great efficiency enhancer...

Do we have any other example of this 2+2 multiple-command implementation, or is this the first time such a customization has been introduced?
 

Cerny said:

The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands.

http://www.extremetech.com/gaming/154924-secrets-of-the-ps4-heavily-modified-radeon-supercharged-apu-design

And the Hawaii block diagram shows one graphics command processor (and 8 ACEs).

[Image: Hawaii-Block-Diagram.jpg]


So it should be new, right?
 
Couldn't agree more with that list. Many very important features listed there.

3.10... "ballotAMD" :). For the people who don't see it right away (it wasn't mentioned in this article either): this allows you to do efficient prefix sums inside a wave (without atomics / serialization). Super useful for many things.
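
For anyone wondering how a ballot turns into a prefix sum, here is a minimal CPU-side sketch in Python. It is only a stand-in for what a shader would do: the function names are made up, the "wave" is just a Python list, and it covers the common case where each lane contributes one bit (a predicate), which is what stream compaction needs. The idea is that every lane sees the same bitmask and counts the set bits belonging to lower-numbered lanes.

```python
def ballot(predicates):
    """Pack one boolean per lane into a bitmask (bit i is set if lane i is true)."""
    mask = 0
    for lane, flag in enumerate(predicates):
        if flag:
            mask |= 1 << lane
    return mask


def exclusive_prefix_counts(predicates):
    """For each lane, count how many lower-numbered lanes have a true predicate."""
    mask = ballot(predicates)
    counts = []
    for lane in range(len(predicates)):
        lower_lanes = mask & ((1 << lane) - 1)      # keep only bits below this lane
        counts.append(bin(lower_lanes).count("1"))  # popcount
    return counts


# Hypothetical 6-lane wave: which lanes have something to write out?
wave = [True, False, True, True, False, True]
offsets = exclusive_prefix_counts(wave)

# Each active lane can write to its own output slot with no atomics.
compacted = [None] * sum(wave)
for lane, active in enumerate(wave):
    if active:
        compacted[offsets[lane]] = lane

print(offsets, compacted)
```

On a GPU the loop over lanes disappears: each lane does one mask and one popcount on the shared ballot result, so nothing has to be serialized.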

Quote from the OpenGL 5 candidate feature list:
For example, having multiple command processors would allow rendering shadows at the same time as filling G-Buffers or shading the previous frame. Having such drastically different tasks live on the GPU at the same time could make a better usage of the GPU as both tasks will probably have different hardware bottleneck.
A good example of this is shadow map rendering. It is bound by fixed-function hardware (ROPs and primitive engines) and uses a very small amount of ALU (simple vertex shader) and a very small amount of bandwidth (compressed depth buffer output only, and it reads size-optimized vertices that don't have UVs or tangents). This means that all the TMUs and the huge majority of the ALUs and bandwidth are just idling while shadows get rendered. If you, for example, execute your compute-shader-based lighting simultaneously with shadow map rendering, you get it practically for free. The funny thing is that if this gets common, we will see games that throttle more than Furmark, since current GPU cooling designs just haven't been built for constant near-100% GPU usage (all units doing productive work all the time).
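
A back-of-the-envelope way to see the "practically for free" part, sketched in Python. The per-unit costs below are invented numbers (microseconds of busy time per hardware unit), not measurements from any GPU, and the model assumes perfect overlap with no contention overhead, so treat the result as a best-case bound only.

```python
def serial_time(tasks):
    """Run tasks back to back: each one takes as long as its most loaded unit."""
    return sum(max(task.values()) for task in tasks)


def overlapped_lower_bound(tasks):
    """Run tasks together: the busiest unit across all tasks sets the floor."""
    units = set().union(*tasks)
    return max(sum(task.get(unit, 0) for task in tasks) for unit in units)


# Hypothetical busy time per unit, in microseconds.
shadow_maps = {"rop": 2000, "alu": 200, "bandwidth": 300}
compute_lighting = {"alu": 1800, "bandwidth": 600, "tmu": 500}

passes = [shadow_maps, compute_lighting]
print(serial_time(passes), overlapped_lower_bound(passes))
```

With these made-up numbers the two passes cost 3800 µs back to back but could fit in 2000 µs if overlapped, i.e. the lighting hides entirely behind the ROP-bound shadow pass. Real hardware will land somewhere in between.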
 
I'm not quite sure I follow where the dual graphics command processor stuff is coming from. My understanding was that it had one graphics command processor and multiple compute command processors, as is normal.

Edit: It seems that the second graphics command pipe is a HP system pipe for doing things such as overlays, so it is unlikely to be accessible to the developer.

To facilitate this, in addition to asynchronous compute queues, the Xbox One hardware supports two concurrent render pipes. The two render pipes can allow the hardware to render title content at high priority while concurrently rendering system content at low priority. The GPU hardware scheduler is designed to maximise throughput and automatically fills "holes" in the high-priority processing. This can allow the system rendering to make use of the ROPs for fill, for example, while the title is simultaneously doing synchronous compute operations on the Compute Units.

http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview
 
My understanding was that it had one graphics command processor and multiple compute command processors, as is normal.
Yes, there's nothing new. Both AMD GCN and NVIDIA Kepler have sported multiple compute command queues and have supported asynchronous compute since their launch. However PC graphics APIs have not supported this feature yet (CUDA / OpenCL 2.0 do). So obviously people are speculating whether OpenGL 5.0 and/or DirectX 12 will finally bring the support for this important feature.
 

But in the Digital Foundry interview, the two architects seem to imply on at least two occasions that their command processor design is not the standard one:

"We also took the opportunity to go and highly customise the command processor on the GPU."

"We've got the SHAPE, the more efficient command processor [relative to the standard design], we've got the clock boost....".
 

So one of the two graphics command processors on XB1 is only for the system (UI) and the other one is for games?!

[Image: 1200x-1]
 

Yes

To facilitate this, in addition to asynchronous compute queues, the Xbox One hardware supports two concurrent render pipes. The two render pipes can allow the hardware to render title content at high priority while concurrently rendering system content at low priority. The GPU hardware scheduler is designed to maximise throughput and automatically fills "holes" in the high-priority processing. This can allow the system rendering to make use of the ROPs for fill, for example, while the title is simultaneously doing synchronous compute operations on the Compute Units.

http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview

But in the Digital Foundry interview, the two architects seem to imply on at least two occasions that their command processor design is not the standard one:

"We also took the opportunity to go and highly customise the command processor on the GPU."

"We've got the SHAPE, the more efficient command processor [relative to the standard design], we've got the clock boost....".

HP pipe -> customisation
 
So one of the two graphics command processors on XB1 is only for the system (UI) and the other one is for games?!
I wasn't talking about Xbox One. I was talking about PC. I was commenting on the OpenGL 5.0 feature list with regard to Kepler and GCN. OpenGL and Kepler are PC products.
 

You think for X1 things are different from what Betanumerical believes?
 
sebbbi almost certainly isn't allowed to talk about the consoles' functionality/implementation. So quit asking.
 
You think for X1 things are different from what Betanumerical believes?
Betanumerical posted an official slide from the Microsoft Xbox One Hot Chips conference presentation. Should be legit info.
sebbbi almost certainly isn't allowed to talk about the consoles' functionality/implementation. So quit asking.
Yes. I will only ever quote official documents/presentations and known public information about the consoles.
 
From what I can see in this quote (from EG/DF):

To facilitate this, in addition to asynchronous compute queues, the Xbox One hardware supports two concurrent render pipes. The two render pipes can allow the hardware to render title content at high priority while concurrently rendering system content at low priority. The GPU hardware scheduler is designed to maximise throughput and automatically fills "holes" in the high-priority processing. This can allow the system rendering to make use of the ROPs for fill, for example, while the title is simultaneously doing synchronous compute operations on the Compute Units.

XB1 already benefits from what sebbbi described in his post. The difference is that in this example the system is using the ROPs for rendering while the title is using the CUs for synchronous compute operations.

Good example of this is shadow map rendering. It is bound by fixed function hardware (ROPs and primitive engines) and uses very small amount of ALUs (simple vertex shader) and very small amount of bandwidth (compressed depth buffer output only, reads size optimized vertices that don't have UVs or tangents). This means that all TMUs and huge majority of the ALUs and bandwidth is just idling around while shadows get rendered. If you for example execute your compute shader based lighting simultaneously to shadow map rendering, you get it practically for free. Funny thing is that if this gets common, we will see games that are throttling more than Furmark, since the current GPU cooling designs just haven't been designed for constant near 100% GPU usage (all units doing productive work all the time).

But I have no idea about the 2 graphics command processors on XB1. PS4 has 2 graphics command processors like XB1, but one of them is exclusive to the system (VShell) and, unlike the other one, which is exclusive to games, has no compute capabilities. On XB1, though, it seems that both graphics command processors are the same, so it should be possible to use both of them for games (while the system doesn't need them).

Betanumerical posted an official slide from the Microsoft Xbox One Hot Chips conference presentation. Should be legit info.

Actually I posted the official slide, not him. ;)
 