DirectX 12: Its future in the console gaming space (specifically the XB1)

Originally Posted by Christophe Riccio
it could allow things we could do in CrossFire / SLI, like rendering two frames simultaneously ... graphics programmers should think twice when they want to submit multiple command buffers simultaneously, because it doesn't make any sense from a hardware design point of view
One obvious application for rendering two frames simultaneously would be stereoscopic rendering.

I'm not sure simultaneous two-view rendering really needs two completely independent command processors, since you are actually rendering the very same frame with the very same geometry and shaders; a clever API/driver spec and a corresponding graphics hardware design could support simultaneous stereoscopic rendering.

However, it seems like the current brute-force approach - rendering two independent frames in sequence, then presenting them as a stereoscopic framebuffer - makes life much easier for game programmers, driver programmers and API designers alike. Oh well...
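Since the brute-force path is just "same scene, two view transforms", a minimal sketch of it is easy to write down. This is hypothetical D3D12-style C++ (DX12 isn't public yet); RecordScene() is an assumed helper that records the shared draw calls, and root parameter 0 is assumed to hold the view-projection constants.

```cpp
#include <d3d12.h>

// Hypothetical helper: records the scene's draw calls (shared by both eyes).
void RecordScene(ID3D12GraphicsCommandList* cmd);

struct EyeView { float viewProj[16]; }; // per-eye view-projection matrix

// Brute-force stereo: identical geometry and shaders, rendered twice into
// the two halves of one stereoscopic framebuffer.
void RenderStereo(ID3D12GraphicsCommandList* cmd,
                  const EyeView& left, const EyeView& right,
                  const D3D12_VIEWPORT& leftVp, const D3D12_VIEWPORT& rightVp)
{
    cmd->RSSetViewports(1, &leftVp);                              // left half
    cmd->SetGraphicsRoot32BitConstants(0, 16, left.viewProj, 0);  // left view
    RecordScene(cmd);

    cmd->RSSetViewports(1, &rightVp);                             // right half
    cmd->SetGraphicsRoot32BitConstants(0, 16, right.viewProj, 0); // right view
    RecordScene(cmd);  // only the view constants and the viewport changed
}
```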
 
One obvious application for rendering two frames simultaneously would be stereoscopic rendering.

I'm not sure simultaneous two-view rendering really needs two completely independent command processors, since you are actually rendering the very same frame with the very same geometry and shaders; a clever API/driver spec and a corresponding graphics hardware design could support simultaneous stereoscopic rendering.

However, it seems like the current brute-force approach - rendering two independent frames in sequence, then presenting them as a stereoscopic framebuffer - makes life much easier for game programmers, driver programmers and API designers alike. Oh well...

No, Riccio says that with multiple command processors it's possible to render two different parts of two frames simultaneously. For example, while rendering shadows for frame one, it's possible to do some shading for frame two simultaneously. He says doing two (or more) different tasks that have different GPU hardware bottlenecks would give better utilization of the hardware.
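To make the overlap concrete, here is a hedged sketch assuming two independent command queues exposed to the title (hypothetical; no shipping API guarantees this today). The two command lists are assumed to be pre-recorded, and whether the passes actually overlap is up to the hardware scheduler, not the API.

```cpp
#include <d3d12.h>

// Submit frame 1's shadow pass and frame 2's shading pass on separate
// queues. The shadow pass is mostly rasterizer/depth bound while the
// shading pass is mostly ALU bound, so their bottlenecks are complementary
// and the GPU may be able to keep more units busy by running both.
void SubmitOverlapped(ID3D12CommandQueue* queueA,
                      ID3D12CommandQueue* queueB,
                      ID3D12CommandList* shadowPassFrame1,
                      ID3D12CommandList* shadingPassFrame2)
{
    queueA->ExecuteCommandLists(1, &shadowPassFrame1);
    queueB->ExecuteCommandLists(1, &shadingPassFrame2);
}
```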

Also, from the XB1 architects' interview with Eurogamer, it seems they are using both command processors simultaneously, rendering system and game content at different priorities. There is no command processor reserved exclusively for the system or for games. It is the scheduler's task to find out when is the best time for the GPU to do low-priority (system) rendering (low-priority rendering has some restrictions compared to high-priority rendering). Even while low-priority rendering is running, the GPU can continue its high-priority (game) rendering on different parts of the hardware.

The GPU hardware scheduler is designed to maximise throughput and automatically fills "holes" in the high-priority processing. This can allow the system rendering to make use of the ROPs for fill, for example, while the title is simultaneously doing synchronous compute operations on the Compute Units.

But it's not clear to me whether it's possible for developers to use both command processors simultaneously or not. Or do they have the option to use both of them, but using them requires some changes to their engines?
 
But it's not clear to me whether it's possible for developers to use both command processors simultaneously or not. Or do they have the option to use both of them, but using them requires some changes to their engines?

It seems practically unprecedented in the industry.

So I believe all the engines would need to be rewritten for it, and it will take time and effort to optimize the implementation.

Who knows if it is already accessible to developers, or if it needs DX12?
 
But it's not clear to me whether it's possible for developers to use both command processors simultaneously or not. Or do they have the option to use both of them, but using them requires some changes to their engines?

A very good point. Is it normal for developers to have programmatic access to things such as command schedulers on a GPU? Or are those more analogous to devices like cache controllers on a CPU, where it 'just' happens?

The dual command queue sounds eerily similar to Intel's NetBurst Hyper-Threading, where two command queues shared one set of execution hardware to maximise throughput on memory load/store stalls. While there were other limitations to NetBurst, it was a monster at media encoding thanks to this, IIRC.
 
A very good point. Is it normal for developers to have programmatic access to things such as command schedulers on a GPU? Or are those more analogous to devices like cache controllers on a CPU, where it 'just' happens?

The dual command queue sounds eerily similar to Intel's NetBurst Hyper-Threading, where two command queues shared one set of execution hardware to maximise throughput on memory load/store stalls. While there were other limitations to NetBurst, it was a monster at media encoding thanks to this, IIRC.

I think developers don't need access to the scheduler; they only need to set a priority on their rendering passes, so that when the GPU is doing higher-priority rendering, the lower-priority rendering runs with some restrictions and won't interfere with the higher-priority rendering.
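As a sketch of how little the developer might have to express, here is what priority-at-queue-granularity looks like in D3D12-style C++. This is hedged: DX12's final API isn't public, and whether the XB1 exposes its second command processor this way is exactly the open question above.

```cpp
#include <windows.h>
#include <d3d12.h>

// Create a high-priority queue for the title and a normal-priority queue
// for background work. The hardware scheduler, not the developer, decides
// when the lower-priority queue's work actually runs.
HRESULT CreatePrioritizedQueues(ID3D12Device* device,
                                ID3D12CommandQueue** titleQueue,
                                ID3D12CommandQueue** backgroundQueue)
{
    D3D12_COMMAND_QUEUE_DESC high = {};
    high.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    high.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH;
    HRESULT hr = device->CreateCommandQueue(&high, IID_PPV_ARGS(titleQueue));
    if (FAILED(hr)) return hr;

    D3D12_COMMAND_QUEUE_DESC normal = {};
    normal.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    normal.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;
    return device->CreateCommandQueue(&normal, IID_PPV_ARGS(backgroundQueue));
}
```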
 
I think developers don't need access to the scheduler; they only need to set a priority on their rendering passes, so that when the GPU is doing higher-priority rendering, the lower-priority rendering runs with some restrictions and won't interfere with the higher-priority rendering.

Makes sense. Is it a binary high/low thing, or are there more gradations than that? Or would that even be useful?
 
Makes sense. Is it a binary high/low thing, or are there more gradations than that? Or would that even be useful?

I am not sure, but from the patents they filed it seems there should be gradations of priority, and I think it could be useful. If you're interested I can give you links to the patents.
 
Patents aren't any indication of what's actually implemented - a good patent would cover the possibility of gradations even if your current technology can only implement two. To know what the hardware is capable of, you have to refer to developer docs.
 
One obvious application for rendering two frames simultaneously would be stereoscopic rendering.

I'm not sure simultaneous two-view rendering really needs two completely independent command processors, since you are actually rendering the very same frame with the very same geometry and shaders,

Is this something new? Is it just new for consoles?
With 1 PC, 1 graphics card and 2 monitors I was able to set up and run 2 independent copies of UT (1 on each monitor), hoping to set up single-PC multiplayer, but I had the issue that only 1 game would accept input at a time.
 
For example, while rendering shadows for frame one, it's possible to do some shading for frame two simultaneously. He says doing two (or more) different tasks that have different GPU hardware bottlenecks would give better utilization of the hardware.
So it's more like multiple passes on the same frame and not actually two different frames?

Because games typically start rendering a new frame only after the previous frame has finished rendering, and I don't get why or how they would start rendering two separate frames instead...

With 1 PC, 1 graphics card and 2 monitors I was able to set up and run 2 independent copies of UT (1 on each monitor)
You are kidding, right?

(In case you aren't, two copies of UT on two separate monitors is not stereoscopy).
 
Patents aren't any indication of what's actually implemented - a good patent would cover the possibility of gradations even if your current technology can only implement two. To know what the hardware is capable of, you have to refer to developer docs.

You're right, but they are still better than nothing (we don't have any access to developer docs). Also, I said I'm not sure about that. But it's very natural to have gradations of priority on a GPU with two command processors, since they make processing and utilization easier.

I won't continue this discussion any further, since it's based on patents. ;)

So it's more like multiple passes on the same frame and not actually two different frames?

Because games typically start rendering a new frame only after the previous frame has finished rendering, and I don't get why or how they would start rendering two separate frames instead...

You are kidding, right?

(In case you aren't, two copies of UT on two separate monitors is not stereoscopy).

It's about 100% GPU utilization 100% of the time (in the best case). So if it's possible to do this within one frame, then there should be no problem; and if it's not possible to do it within one frame, you have to work on more than one frame (two frames) simultaneously to reach your goal (100% GPU utilization).

I'm not a developer, but I think it shouldn't be that hard: if you know what's going to happen in the next frame (controller input, object movement, AI behavior, ...), it should be possible to start processing the next frame before processing of the current frame ends. I'm not sure, but that's how I look at it.
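For what it's worth, overlapping frames on the CPU side is already standard practice; a hedged sketch of ordinary "frames in flight" with a fence (D3D12-style C++, with RecordFrame() as a hypothetical helper) looks like this. It doesn't need a second command processor - it just keeps submission running ahead of GPU completion.

```cpp
#include <windows.h>
#include <d3d12.h>

// Hypothetical helper: simulates and records all GPU work for one frame.
ID3D12CommandList* RecordFrame(UINT64 frame);

const UINT64 kFramesInFlight = 2;

void RunFrames(ID3D12Device* device, ID3D12CommandQueue* queue)
{
    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    HANDLE done = CreateEvent(nullptr, FALSE, FALSE, nullptr);

    for (UINT64 frame = 1; frame <= 1000; ++frame)
    {
        // Block only if the GPU is more than kFramesInFlight frames behind;
        // otherwise start recording frame N+1 while frame N still renders.
        if (fence->GetCompletedValue() + kFramesInFlight < frame)
        {
            fence->SetEventOnCompletion(frame - kFramesInFlight, done);
            WaitForSingleObject(done, INFINITE);
        }
        ID3D12CommandList* lists[] = { RecordFrame(frame) };
        queue->ExecuteCommandLists(1, lists);
        queue->Signal(fence, frame); // fence reaches 'frame' when GPU is done
    }
}
```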
 
No, Riccio says that with multiple command processors it's possible to render two different parts of two frames simultaneously. For example, while rendering shadows for frame one, it's possible to do some shading for frame two simultaneously.
Two command processors don't magically allow you to render two things simultaneously. To go beyond what most (all?) GPUs have today (pipelined rendering), you need to duplicate parts of the render pipeline or allow newer work to pass older work in the pipeline whenever the older work can't make progress.
 
Two command processors don't magically allow you to render two things simultaneously. To go beyond what most (all?) GPUs have today (pipelined rendering), you need to duplicate parts of the render pipeline or allow newer work to pass older work in the pipeline whenever the older work can't make progress.

Why magically, if each rendering pass has different GPU hardware bottlenecks?

Also, it could allow rendering independent rendering passes simultaneously, which could provide better utilization of the hardware, as typically each rendering pass has different GPU hardware bottlenecks. For example, if we could do the rendering of the shadows and some shading simultaneously.

I asked him about what you're saying, and this is his response:

I am not completely sure about the consequences for the fixed-function hardware. There are already different tasks live in a GPU. Most probably the multiple command processors would have to share the pool of graphics contexts.
 
Why magically, if each rendering pass has different GPU hardware bottlenecks?
The shaders are the place where it's easy to work on different things simultaneously. That's why async compute works well. If you want graphics workloads to benefit, you need to duplicate all of the fixed-function logic that exists prior to launching work into the shaders.

While possible, this is usually more than just the command processors. Graphics hardware often has fixed-function logic for index/vertex buffer fetches, vertex reuse and other tasks prior to launching a VS, which is the first stage in the graphics pipeline.

My only point is that anyone thinking doubling the command processors gives the same workload-overlap benefit as async compute is oversimplifying things.
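A hedged sketch of the contrast: async compute needs nothing duplicated on the graphics front end, because a compute dispatch never enters it. D3D12-style C++; the command lists are assumed to be recorded elsewhere, with the compute list built for a COMPUTE-type queue.

```cpp
#include <windows.h>
#include <d3d12.h>

// Graphics stays on the direct queue; compute gets its own compute-only
// queue. The compute work bypasses the graphics fixed-function front end
// entirely, so the shader cores can absorb it in the gaps the graphics
// workload leaves - no duplicated index fetch / vertex reuse logic needed.
void SubmitAsyncCompute(ID3D12Device* device,
                        ID3D12CommandQueue* graphicsQueue,
                        ID3D12CommandList* graphicsList,
                        ID3D12CommandList* computeList)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ID3D12CommandQueue* computeQueue = nullptr;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    graphicsQueue->ExecuteCommandLists(1, &graphicsList);
    computeQueue->ExecuteCommandLists(1, &computeList); // independent queue
}
```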
 
The shaders are the place where it's easy to work on different things simultaneously. That's why async compute works well. If you want graphics workloads to benefit, you need to duplicate all of the fixed-function logic that exists prior to launching work into the shaders.

While possible, this is usually more than just the command processors. Graphics hardware often has fixed-function logic for index/vertex buffer fetches, vertex reuse and other tasks prior to launching a VS, which is the first stage in the graphics pipeline.

My only point is that anyone thinking doubling the command processors gives the same workload-overlap benefit as async compute is oversimplifying things.

I had the same concern (I thought it needs more than just adding additional command processors), so I asked him that question.

But if you read the XB1 architects' answer to Eurogamer, they talked about having synchronous compute operations (not async compute) running simultaneously while the system rendering uses the ROPs for fill.

This can allow the system rendering to make use of the ROPs for fill, for example, while the title is simultaneously doing synchronous compute operations on the Compute Units.
What's your thought about this? Are they using the ACEs or the second command processor for synchronous compute?
 
I had the same concern (I thought it needs more than just adding additional command processors), so I asked him that question.

But if you read the XB1 architects' answer to Eurogamer, they talked about having synchronous compute operations (not async compute) running simultaneously while the system rendering uses the ROPs for fill.

What's your thought about this? Are they using the ACEs or the second command processor for synchronous compute?
Compute doesn't use the graphics pipeline, so once the draw enters the pipe, synchronous compute can start. Synchronous means it has to wait until it's read out of a queue before it can start, and all prior graphics work has been read out. It doesn't mean the graphics work had to finish before compute can start. Async compute just means compute doesn't even share the same queue, and thus can bypass even more graphics work should it be backed up.
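In queue terms, that distinction might look like this (a hedged D3D12-style sketch; the lists are assumed pre-recorded). Synchronous compute shares the graphics queue's readout order; async compute doesn't even share the queue.

```cpp
#include <d3d12.h>

// Synchronous: draw and dispatch are consumed from the same queue in
// order. The dispatch starts once the draw has entered the pipe - the
// draw does not have to have finished executing.
void SynchronousCompute(ID3D12CommandQueue* directQueue,
                        ID3D12CommandList* drawThenDispatchList)
{
    directQueue->ExecuteCommandLists(1, &drawThenDispatchList);
}

// Async: the dispatch sits in a separate queue, so it is not ordered
// behind the graphics work at all and can bypass it if graphics backs up.
void AsyncCompute(ID3D12CommandQueue* directQueue,
                  ID3D12CommandQueue* computeQueue,
                  ID3D12CommandList* drawList,
                  ID3D12CommandList* dispatchList)
{
    directQueue->ExecuteCommandLists(1, &drawList);
    computeQueue->ExecuteCommandLists(1, &dispatchList);
}
```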
 
Compute doesn't use the graphics pipeline, so once the draw enters the pipe, synchronous compute can start. Synchronous means it has to wait until it's read out of a queue before it can start, and all prior graphics work has been read out. It doesn't mean the graphics work had to finish before compute can start. Async compute just means compute doesn't even share the same queue, and thus can bypass even more graphics work should it be backed up.

But if you want to use async compute for graphics (simultaneously with other graphics jobs), you need to sync its beginning and end with the graphics pipeline, right? (See Graham's post here and the BF4 presentation here, pages 35-43.)
 
But if you want to use async compute for graphics (simultaneously with other graphics jobs), you need to sync its beginning and end with the graphics pipeline, right? (See Graham's post here and the BF4 presentation here, pages 35-43.)
Yes, if tasks need to communicate at some point, there will be synchronization. With async compute, the sync point is dictated by software.
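A hedged sketch of such a software-dictated sync point, using cross-queue fences in D3D12-style C++. The three command lists are hypothetical: a graphics pass producing the compute input, the compute work itself, and a graphics pass consuming its result.

```cpp
#include <windows.h>
#include <d3d12.h>

// Fence both the beginning and the end of the async compute work against
// the graphics queue, matching the "sync its beginning and end" point.
void SyncedAsyncCompute(ID3D12Device* device,
                        ID3D12CommandQueue* graphicsQueue,
                        ID3D12CommandQueue* computeQueue,
                        ID3D12CommandList* producerPass,
                        ID3D12CommandList* computePass,
                        ID3D12CommandList* consumerPass)
{
    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    graphicsQueue->ExecuteCommandLists(1, &producerPass);
    graphicsQueue->Signal(fence, 1);   // beginning: compute inputs are ready

    computeQueue->Wait(fence, 1);      // compute starts after the producer
    computeQueue->ExecuteCommandLists(1, &computePass);
    computeQueue->Signal(fence, 2);    // end: compute result is ready

    graphicsQueue->Wait(fence, 2);     // graphics consumes only afterwards
    graphicsQueue->ExecuteCommandLists(1, &consumerPass);
}
```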
 