GPUOpen: All the Pipelines - Journey through the GPU

D3D12 Logical Pipeline: VS (Vertex Shader) -> HS (Hull Shader) -> DS (Domain Shader) -> GS (Geometry Shader)

GFX6-8 Hardware Pipeline: LS (Local Shader) -> HS (Hull Shader) -> ES (Export Shader) -> GS (Geometry Shader) -> VS (Vertex Shader)

Logical -> Hardware mapping

(VS only): VS -> VS
(Tess enabled): VS -> LS | HS -> HS | DS -> VS
(GS enabled): VS -> ES | GS -> GS
(Both enabled): VS -> LS | HS -> HS | DS -> ES | GS -> GS
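
The four cases above follow one rule, which is easier to see written out. This is only a sketch of the table in this post, not driver code: the function name `gfx6_hw_stages` and the dict layout are made up for illustration.

```python
def gfx6_hw_stages(tess: bool, gs: bool) -> dict:
    """Map each active D3D12 logical stage to the GFX6-8 hardware stage it runs on.

    Hypothetical helper: it only encodes the mapping table from this post.
    """
    mapping = {}
    if tess:
        # With tessellation on, the API vertex shader runs on the hardware LS stage.
        mapping["VS"] = "LS"
        mapping["HS"] = "HS"
        last_pre_gs = "DS"
    else:
        last_pre_gs = "VS"
    # Whatever runs last before the (optional) GS lands on ES if a GS follows,
    # otherwise on the hardware VS stage.
    mapping[last_pre_gs] = "ES" if gs else "VS"
    if gs:
        mapping["GS"] = "GS"
    return mapping


if __name__ == "__main__":
    for tess in (False, True):
        for gs in (False, True):
            print(f"tess={tess}, gs={gs}: {gfx6_hw_stages(tess, gs)}")
```

Put that way, the API vertex shader only lands on the hardware VS stage when neither tessellation nor a geometry shader is bound.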

GFX9 Hardware Pipeline: HS (merged LS & HS) -> GS (merged ES & GS) -> VS

Logical -> Hardware mapping

(VS only): VS -> VS
(Tess enabled): VS + HS -> HS | DS -> VS
(GS enabled): VS + GS -> GS
(Both enabled): VS + HS -> HS | DS + GS -> GS
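
For GFX9 it is probably clearer to go the other way and list which logical stages get merged into each hardware stage. Again, just a sketch of the table above: `gfx9_hw_stages` is a hypothetical name, not something from PAL or Mesa.

```python
def gfx9_hw_stages(tess: bool, gs: bool) -> dict:
    """Return {hardware stage: [logical stages merged into it]} for GFX9.

    Hypothetical helper: it only encodes the mapping table from this post.
    """
    hw = {}
    if tess:
        hw["HS"] = ["VS", "HS"]      # merged LS + HS
        last_pre_gs = "DS"
    else:
        last_pre_gs = "VS"
    if gs:
        hw["GS"] = [last_pre_gs, "GS"]  # merged ES + GS
    else:
        hw["VS"] = [last_pre_gs]        # nothing to merge with
    return hw


if __name__ == "__main__":
    for tess in (False, True):
        for gs in (False, True):
            print(f"tess={tess}, gs={gs}: {gfx9_hw_stages(tess, gs)}")
```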

GFX10 Hardware Pipeline: HS -> GS (merged ES, GS & VS)

Logical -> Hardware mapping

(VS only): VS -> GS
(Tess enabled): VS + HS -> HS | DS -> GS
(GS enabled): VS + GS -> GS
(Both enabled): VS + HS -> HS | DS + GS -> GS
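
Comparing the GFX9 and GFX10 tables side by side shows what actually changed. The snippet below just transcribes both tables from this post and diffs them. The names `GFX9`, `GFX10` and `changed_configs` are made up for illustration.

```python
# Keys are (tessellation enabled, geometry shader enabled).
# Values map logical stage(s) to the hardware stage, as listed in this post.
GFX9 = {
    (False, False): {"VS": "VS"},
    (True, False): {"VS+HS": "HS", "DS": "VS"},
    (False, True): {"VS+GS": "GS"},
    (True, True): {"VS+HS": "HS", "DS+GS": "GS"},
}
GFX10 = {
    (False, False): {"VS": "GS"},
    (True, False): {"VS+HS": "HS", "DS": "GS"},
    (False, True): {"VS+GS": "GS"},
    (True, True): {"VS+HS": "HS", "DS+GS": "GS"},
}


def changed_configs(old: dict, new: dict) -> list:
    """Return the (tess, gs) configurations whose mapping differs between tables."""
    return sorted(cfg for cfg in old if old[cfg] != new[cfg])


if __name__ == "__main__":
    for cfg in changed_configs(GFX9, GFX10):
        print(cfg, GFX9[cfg], "->", GFX10[cfg])
```

Only the two configurations without a geometry shader differ: work that ended on the hardware VS stage on GFX9 ends on the hardware GS stage on GFX10.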

D3D12 Mesh Shading Pipeline: AS (Amplification Shader) -> MS (Mesh Shader)

Mesh Pipeline -> GFX10 Hardware mapping: AS -> CS (Compute Shader) | MS -> GS

Hopefully this clears things up a bit. Now for some interesting notes:

Despite our mental model of the vertex shader being executed at the very start of the logical pipeline, the vertex program itself can be executed on 3 different HW stages on older AMD HW (LS, ES or VS, depending on whether tessellation and geometry shaders are enabled). I imagine the reason the HW's vertex shader sits at the end of the geometry pipeline is so the HW can avoid the additional overhead of the tessellation and geometry shader stages in certain cases.
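
The "3 different HW stages" count can be read straight off the GFX6-8 table. A tiny sketch, where the dict is transcribed from the mapping in this post and the names are hypothetical:

```python
# Hardware stage that runs the API vertex shader on GFX6-8, per configuration,
# transcribed from the Logical -> Hardware mapping in this post.
GFX6_VERTEX_PROGRAM_STAGE = {
    "VS only": "VS",
    "Tess enabled": "LS",
    "GS enabled": "ES",
    "Both enabled": "LS",
}


def vertex_program_hw_stages(table: dict) -> list:
    """Return the distinct hardware stages the vertex program can land on."""
    return sorted(set(table.values()))


if __name__ == "__main__":
    stages = vertex_program_hw_stages(GFX6_VERTEX_PROGRAM_STAGE)
    print(len(stages), stages)
```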

As the comments in AMD's code show, amplification shaders don't map onto any stage of the graphics pipeline (they run as compute shaders), and mesh shaders are executed on GFX10's NGG geometry shader stage.

Sources and code:
https://cgit.freedesktop.org/mesa/m...68957a82562d13b3f0d21a04ce633ffd236e6036#n165
https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/gfx6/gfx6GraphicsPipeline.cpp#L536
https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/gfx9/gfx9GraphicsPipeline.cpp#L569
https://github.com/GPUOpen-Drivers/.../gfxip/gfx9/gfx9IndirectCmdGenerator.cpp#L174
 
Also, the hardware VS (Primitive Shader in newer hardware) is what feeds the rasterizer. So it goes last, providing consistency in naming from a hardware perspective.
 

Which provides more context into this slide:

arch-20.jpg


GFX9: HS (surface shaders ?) -> GS (geometry shaders extended with per-vertex shading) -> VS (primitive shader)

GFX10: HS -> GS or CS (compute shader) -> GS (primitive shader)

What's unique to the NGG pipeline in comparison to the standard mesh shading pipeline is that it can exploit the fixed-function tessellator hardware, since its output is used as an input for GFX9/10 geometry shaders. Did GFX9 have the capability to do per-meshlet shading, or was that extended with the GFX10 NGG GS? When I compare the GFX9 registers to the GFX10 registers, I see that GFX10 has added new registers with a GE prefix. Are those related to the Geometry Engine by some chance?

It would be much appreciated if you and the guys at AMD started releasing documentation about the hardware registers again!
 
The GE prefix came about because some things were moved around and it was decided to group multiple blocks under the Geometry Engine register naming. They were already developed by the same team.

GFX9 could do per-meshlet shading, but issues led to support being dropped in favor of focusing resources on GFX10. GFX10's implementation is different.
 