What do you mean by this? Open source driver developers actually DID implement task shaders WITHOUT firmware support ...
Now, it's definitely possible that I've read this wrong, so don't take my writing as something I've completely understood. I've read his post, but my knowledge in this area is extremely limited. My reading of what he wrote is as follows:
In the paragraph just before the one you quoted Timur discusses how Task Shaders actually work on AMD HW.
The task+mesh dispatch packets are different from a regular compute dispatch.
The compute and graphics queue firmwares work together in parallel:
- Compute queue launches up to as many task workgroups as it has space available in the ring buffer.
- Graphics queue waits until a task workgroup is finished and can launch mesh shader workgroups immediately. Execution of mesh dispatches from a finished task workgroup can therefore overlap with other task workgroups.
- When a mesh dispatch from a task workgroup is finished, its slot in the ring buffer can be reused and a new task workgroup can be launched.
- When the ring buffer is full, the compute queue waits until a mesh dispatch is finished, before launching the next task workgroup.
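To check my own reading of that list, here is a toy Python sketch of the slot bookkeeping. It is not the actual firmware and not anything from Timur's post: the function name, the event names and the one-mesh-dispatch-per-task-workgroup simplification are all my own assumptions, and it runs the two queues in turns rather than truly in parallel.

```python
from collections import deque


def simulate(num_task_workgroups, ring_slots):
    """Return an ordered event log for a toy task/mesh ring buffer.

    Hypothetical model: every task workgroup occupies one ring slot
    until the mesh dispatch that consumes its output has finished.
    """
    if ring_slots < 1:
        raise ValueError("ring buffer needs at least one slot")

    log = []
    pending = deque(range(num_task_workgroups))  # task workgroups not yet launched
    ring = deque()                               # occupied ring buffer slots

    while pending or ring:
        # Compute queue: launch task workgroups while slots are free.
        while pending and len(ring) < ring_slots:
            tg = pending.popleft()
            ring.append(tg)
            log.append(("task_launch", tg))

        # Ring buffer full: the compute queue has to wait for a mesh dispatch.
        if pending:
            log.append(("compute_wait", pending[0]))

        # Graphics queue: run the mesh dispatch for the oldest finished
        # task workgroup, then release its slot for reuse.
        tg = ring.popleft()
        log.append(("mesh_dispatch", tg))
        log.append(("slot_free", tg))

    return log


if __name__ == "__main__":
    for event in simulate(num_task_workgroups=3, ring_slots=2):
        print(event)
```

With three task workgroups and two slots, the third task workgroup can only launch after the first mesh dispatch frees its slot, which is the waiting behaviour the last bullet describes.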
Then he continues on to discuss the difficulty of the implementation:
Side note, getting some implementation details wrong can easily cause a deadlock on the GPU. It is great fun to debug these.
The relevant details here are that most of the hard work is implemented in the firmware (good news, because that means I don’t have to implement it), and that task shaders are executed on an async compute queue and that the driver now has to submit compute and graphics work in parallel.
So my understanding of his writing here is that the task shaders get submitted on the async compute queue, the driver (which Timur is developing) submits both the compute and the graphics work in parallel, and it is then left to the firmware to manage the two queues working in parallel as quoted above. That last part is the hard part of the implementation.
He is then explicit in the following:
Keep in mind that the API hides this detail and pretends that the mesh shading pipeline is just another graphics pipeline that the application can submit to a graphics queue. So, once again we have a mismatch between the API programming model and what the HW actually does.
So with respect, my perspective is this: if firmware is actually required to do some of this work, and something on the firmware side is what lets these queues work together without stalling out the GPU, then I think there is something there from a hardware perspective that older generations do not have. Otherwise we could have backported amplification shaders to the 5700 XT, for instance.
Task shaders have only one job and that's to generate an indirect draw buffer for consumption by mesh shaders but you don't need the firmware for this.
Now once again, this could be an incorrect understanding of mine. I'm not here to challenge you on the difference in our understanding of graphics rendering; based on how you write, I clearly know very little compared to yourself. I suspect you work in the mobile space at the very least, or the PC indie scene. But Timur writes:
Squeezing a hidden compute pipeline in your graphics
In order to use this beautiful scheme provided by the firmware, the driver needs to do two things:
- Create a compute pipeline from the task shader.
- Submit the task shader work on the async compute queue while at the same time also submit the mesh and pixel shader work on the graphics queue.
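As far as I can tell, those two steps amount to something like the following. This is purely my own illustration in Python, not driver code: `Queue`, `MeshPipeline` and `submit_draw` are made-up names, and the real driver obviously deals with command buffers and synchronization that I'm ignoring here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Queue:
    """Hypothetical stand-in for a hardware queue; it only records submissions."""
    name: str
    submitted: list = field(default_factory=list)

    def submit(self, work):
        self.submitted.append(work)


@dataclass
class MeshPipeline:
    """What the application sees: a single graphics pipeline."""
    mesh_shader: str
    pixel_shader: str
    task_shader: Optional[str] = None


def submit_draw(pipeline, graphics_queue, compute_queue):
    """Split one API-level draw into the two submissions described above."""
    if pipeline.task_shader is not None:
        # Step 1: the task shader becomes a hidden compute pipeline and
        # goes to the async compute queue.
        compute_queue.submit(("compute_pipeline", pipeline.task_shader))
    # Step 2: mesh and pixel shader work goes to the graphics queue.
    graphics_queue.submit(
        ("graphics_pipeline", pipeline.mesh_shader, pipeline.pixel_shader)
    )


if __name__ == "__main__":
    gfx, ace = Queue("graphics"), Queue("async compute")
    submit_draw(MeshPipeline("mesh.spv", "pixel.spv", "task.spv"), gfx, ace)
    print(ace.submitted)
    print(gfx.submitted)
```

The application only ever calls one function on what looks like one pipeline; the split into two queues happens underneath it.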
At least from my perspective, combined with the highlights above, without the firmware for amplification shader support I don't think there is a way for a developer to emulate a task shader and mesh shader combo without explicitly calling an API whose function is for a task shader. I don't disagree that there are other ways to do this, however. I'm just saying we haven't seen it officially leveraged yet, but IIRC Remedy indicated that it would be in their next release and that they found amplification shaders to be useful.
It will take some time to move the entire geometry pipeline. I'm not expecting much until the end of the generation.