GPU work creation

Ronaldo8 · Feb 27, 2021

From the early days of the Xbox One, MS has been trying to offload more and more tasks onto the GPU. When the Series X was announced, Digital Foundry glossed over a new ability of the GPU to dispatch/schedule work between shaders without data leaving the GPU, in its haste to gush over the shiny new thing that is ray-tracing (pun intended). They may have missed the forest for the trees.

GPU-driven rendering is, in my view, a more consequential development than ray-tracing in its possible ramifications, and there are indications that MS/AMD are about to hit the jackpot. The hints are three-fold:

(1) MS openly advertised this new capability of their consoles, though it failed to catch on with tech blogs more concerned with ray-tracing and mesh shaders. Quoting MS PR:
"Xbox Series X and Xbox Series S add hardware, firmware and shader compiler support for GPU work creation that provides powerful capabilities for the GPU to efficiently handle new workloads without any CPU assistance. This provides more flexibility and performance for developers to deliver their graphics visions."

How is MS going to pull this off? Two further hints:

(2) AMD filed a patent application for the use of a heavily modified command processor (described as a co-processor) to allow for the creation and execution of child threads concurrent with parent threads without the need for a round-trip to the CPU or even main memory. (https://www.freepatentsonline.com/y2020/0089528.html)

(3) The goal of not writing back to main memory implies that the management of task queues and cache behaviour have been massively modified somehow. MS tells us how in another filing:
(https://www.freepatentsonline.com/20200090298.pdf).
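To make hint (2) concrete, here is a toy CPU-side model of "child work created without a round-trip to the CPU". It is only a sketch under loose assumptions. `Task`, `OnDeviceScheduler` and `subdivide` are hypothetical names, and nothing here reflects the patent's actual hardware design. The host submits once. After that, each running task pushes its children straight onto the same queue, so the host submission counter stays at 1 however much work gets spawned.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

# Hypothetical names throughout: a CPU-side toy model of the idea,
# not the patent's design and not any real GPU/driver API.


@dataclass
class Task:
    name: str
    depth: int
    # A kernel receives its own task plus a spawn() callback that enqueues
    # child tasks directly on the "device" queue.
    kernel: Callable[[Task, Callable[[Task], None]], None]


class OnDeviceScheduler:
    """Models a GPU-side scheduler: the host submits once, and after that
    running tasks enqueue their own children without going back to the host."""

    def __init__(self) -> None:
        self.queue: Deque[Task] = deque()
        self.host_submissions = 0        # round-trips to the "CPU"
        self.executed: List[str] = []    # order in which tasks ran

    def host_submit(self, task: Task) -> None:
        self.host_submissions += 1
        self.queue.append(task)

    def run(self) -> None:
        # Drain the queue. Children spawned by a task land in the same queue
        # and are picked up by this loop with no host involvement.
        # (Sequential here; the patent describes parent and child threads
        # running concurrently, which this model does not attempt.)
        while self.queue:
            task = self.queue.popleft()
            self.executed.append(task.name)
            task.kernel(task, self.queue.append)


def subdivide(task: Task, spawn: Callable[[Task], None]) -> None:
    """Example kernel: split into two children until depth 2."""
    if task.depth < 2:
        for i in range(2):
            spawn(Task(f"{task.name}.{i}", task.depth + 1, subdivide))


sched = OnDeviceScheduler()
sched.host_submit(Task("root", 0, subdivide))
sched.run()
print(sched.host_submissions, len(sched.executed))  # 1 7
```

The model leaves out the hard parts the filings are actually about: queue management and cache behaviour on real hardware.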

scently · Feb 28, 2021

Modifying/customizing the command processor is something MS has been doing since the Xbox One. The intent was to reduce waiting for the CPU to initiate tasks for the GPU, allowing the GPU to do more on its own. This was further expanded in the X1X. ExecuteIndirect is one of the results of this. @iroboto can speak more about it, but essentially it allows the GPU to initiate and manage draw calls on its own without invocation from the CPU every time. I would assume that this ability and more have been expanded on XSX and DX12U.

OlegSH · Feb 28, 2021

Ronaldo8 said:
AMD filed a patent application for the use of a heavily modified command processor (described as a co-processor) to allow for the creation and execution of child threads concurrent with parent threads without the need for a round-trip to the CPU or even main memory. (https://www.freepatentsonline.com/y2020/0089528.html)

Why do you need a separate command processor for this?
As of now, ExecuteIndirect is done via compute shaders filling in or modifying draw calls' arguments, and you don't need the round-trip to the CPU. How is this stuff any different?
GPU work creation is an evolution of current ExecuteIndirect, and I am pretty sure it can be done without any additional hardware.
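Here is a rough CPU-side model of the pattern Oleg describes, a culling "compute shader" that writes indirect draw arguments the CPU never reads back. The 20-byte record layout mirrors the field order of D3D12's `D3D12_DRAW_INDEXED_ARGUMENTS`. The mesh data, the near/far-only visibility test and the helper names (`cull_pass`, `execute_indirect`) are assumptions for illustration. This is not the real D3D12 API. On a GPU the slot index would come from an atomic add on a count buffer, so record order would not be deterministic as it is here.

```python
import struct
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Byte layout mirroring D3D12_DRAW_INDEXED_ARGUMENTS: five 32-bit fields
# (IndexCountPerInstance, InstanceCount, StartIndexLocation,
#  BaseVertexLocation [signed], StartInstanceLocation) = 20 bytes.
DRAW_INDEXED_ARGS = struct.Struct("<IIIiI")


@dataclass
class Mesh:                # hypothetical per-object data a culling shader reads
    center_z: float        # view-space depth of the bounding sphere centre
    radius: float
    index_count: int
    first_index: int
    base_vertex: int


def cull_pass(meshes: Sequence[Mesh], near: float, far: float,
              arg_buffer: bytearray) -> int:
    """Stand-in for a culling compute shader: appends one draw record per
    visible mesh into arg_buffer and returns the final draw count."""
    count = 0  # on the GPU: a count buffer bumped with an atomic add
    for mesh in meshes:
        visible = (mesh.center_z + mesh.radius >= near and
                   mesh.center_z - mesh.radius <= far)
        if visible:
            DRAW_INDEXED_ARGS.pack_into(
                arg_buffer, count * DRAW_INDEXED_ARGS.size,
                mesh.index_count, 1, mesh.first_index, mesh.base_vertex, 0)
            count += 1
    return count


def execute_indirect(arg_buffer: bytes, max_command_count: int,
                     count: int) -> List[Tuple[int, ...]]:
    """Models the consumer side: read min(count, max) records straight from
    the argument buffer. The CPU only supplied max_command_count."""
    n = min(count, max_command_count)
    return [DRAW_INDEXED_ARGS.unpack_from(arg_buffer, i * DRAW_INDEXED_ARGS.size)
            for i in range(n)]


meshes = [Mesh(5.0, 1.0, 36, 0, 0),        # in range
          Mesh(-10.0, 1.0, 72, 36, 24),    # behind the camera
          Mesh(50.0, 2.0, 120, 108, 48),   # in range
          Mesh(200.0, 1.0, 36, 228, 96)]   # past the far plane
args = bytearray(len(meshes) * DRAW_INDEXED_ARGS.size)
count = cull_pass(meshes, near=0.1, far=100.0, arg_buffer=args)
print(execute_indirect(args, max_command_count=len(meshes), count=count))
# [(36, 1, 0, 0, 0), (120, 1, 108, 48, 0)]
```

In the real API the analogous pieces would be the argument buffer and the optional count buffer passed to `ExecuteIndirect`, with the CPU recording a single call sized by a maximum command count.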

Ronaldo8 · Feb 28, 2021

OlegSH said:
Why do you need a separate command processor for this?
As of now, ExecuteIndirect is done via compute shaders filling in or modifying draw calls' arguments, and you don't need the round-trip to the CPU. How is this stuff any different?
GPU work creation is an evolution of current ExecuteIndirect, and I am pretty sure it can be done without any additional hardware.

The patent explains exactly why a modified command processor is required (duh).

BRiT · Feb 28, 2021

And here I thought GPU workloads descend from heavens...

iroboto · Feb 28, 2021

scently said:
Modifying/customizing the command processor is something MS has been doing since the Xbox One. The intent was to reduce waiting for the CPU to initiate tasks for the GPU, allowing the GPU to do more on its own. This was further expanded in the X1X. ExecuteIndirect is one of the results of this. @iroboto can speak more about it, but essentially it allows the GPU to initiate and manage draw calls on its own without invocation from the CPU every time. I would assume that this ability and more have been expanded on XSX and DX12U.

As for Oleg, he is correct: customizing the command processor is not a requirement for supporting ExecuteIndirect.
However, there are limits on what ExecuteIndirect can do. If you require the GPU to be in a different PSO (pipeline state object) than the current one, you will pay a penalty to switch PSOs during an indirect call (though the Xbox One command processors don't).
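One common way to live with the PSO restriction on PC is to group indirect draws by PSO. You then issue one indirect call per group after a single PSO bind. Here is a small sketch under stated assumptions. PSOs are plain string labels, and `batch_by_pso` / `count_pso_switches` are hypothetical helpers, not part of any API.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical helpers; "PSO ids" are just labels standing in for real
# pipeline state objects.
_UNSET = object()  # sentinel so the first bind always counts


def count_pso_switches(pso_sequence: Sequence[Hashable]) -> int:
    """Number of times the bound PSO changes (the first bind counts as one)."""
    switches, current = 0, _UNSET
    for pso in pso_sequence:
        if pso != current:
            switches += 1
            current = pso
    return switches


def batch_by_pso(draws: Sequence[Tuple[Hashable, object]]
                 ) -> List[Tuple[Hashable, List[object]]]:
    """Group draw records by PSO (first-seen order) so each group can be
    issued as one indirect call after a single PSO bind."""
    buckets: Dict[Hashable, List[object]] = {}
    for pso, args in draws:
        buckets.setdefault(pso, []).append(args)
    return list(buckets.items())


draws = [("static", 1), ("skinned", 2), ("static", 3), ("skinned", 4), ("static", 5)]
naive = count_pso_switches([pso for pso, _ in draws])
batches = batch_by_pso(draws)
batched = count_pso_switches([pso for pso, _ in batches])
print(naive, batched, batches)
# 5 2 [('static', [1, 3, 5]), ('skinned', [2, 4])]
```

This reordering is only valid when draw order across groups doesn't matter (e.g. opaque geometry with depth testing). It also does nothing for the console case mentioned above, where the command processor reportedly avoids the penalty.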

Final answer here:
https://stackoverflow.com/a/38130181

Indeed, everyone always likes to link back to Wihlidal's presentation lol.
Based on interviews, I strongly suspect that in the 1X they customized it a bit further for better performance and support, and I can only assume they went a little further for the Series consoles.

presentation link:
Optimizing the Graphics Pipeline with Compute, GDC 2016 (slideshare.net)
Page 90 or so
