GPUOpen: All the Pipelines - Journey through the GPU

CarstenS · Jan 22, 2021

A very informative video from GPUOpen by AMD's Lou I. Kramer.
https://gpuopen.com/videos/graphics-pipeline/

Well worth the 55 minutes, I'd say, if you're a beginner- or intermediate-level B3D forum warrior.

Rodéric · Jan 22, 2021

The mapping of hardware blocks to software blocks is a nice addition I've never seen before, AFAIR.

Lurkmass · Feb 16, 2021

D3D12 Logical Pipeline: VS (Vertex Shader) -> HS (Hull Shader) -> DS (Domain Shader) -> GS (Geometry Shader)

GFX6-8 Hardware Pipeline: LS (Local Shader) -> HS (Hull Shader) -> ES (Export Shader) -> GS (Geometry Shader) -> VS (Vertex Shader)

Logical -> Hardware mapping

(VS only): VS -> VS
(Tess enabled): VS -> LS | HS -> HS | DS -> VS
(GS enabled): VS -> ES | GS -> GS
(Both enabled): VS -> LS | HS -> HS | DS -> ES | GS -> GS

GFX9 Hardware Pipeline: HS (merged LS & HS) -> GS (merged ES & GS) -> VS

Logical -> Hardware mapping

(VS only): VS -> VS
(Tess enabled): VS + HS -> HS | DS -> VS
(GS enabled): VS + GS -> GS
(Both enabled): VS + HS -> HS | DS + GS -> GS

GFX10 Hardware Pipeline: HS -> GS (merged ES, GS & VS)

Logical -> Hardware mapping

(VS only): VS -> GS
(Tess enabled): VS + HS -> HS | DS -> GS
(GS enabled): VS + GS -> GS
(Both enabled): VS + HS -> HS | DS + GS -> GS

D3D12 Mesh Shading Pipeline: AS (Amplification Shader) -> MS (Mesh Shader)

Mesh Pipeline -> GFX10 Hardware mapping: AS -> CS (Compute Shader) | MS -> GS
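
For anyone who prefers to poke at this as data, here is a minimal Python sketch that just transcribes the mapping tables above into a lookup. The names `HW_MAPPING`, `MESH_MAPPING_GFX10` and `hw_stage_for` are made up for illustration and are not taken from Mesa or PAL; the table contains only what is listed in this post.

```python
# Logical -> hardware stage mapping, transcribed from the tables above.
# Key: (generation, tessellation enabled, geometry shader enabled).
# Value: list of (logical stages merged into the HW stage, HW stage),
# in pipeline order. All names here are illustrative, not driver code.
HW_MAPPING = {
    ("GFX6-8", False, False): [(("VS",), "VS")],
    ("GFX6-8", True, False): [(("VS",), "LS"), (("HS",), "HS"), (("DS",), "VS")],
    ("GFX6-8", False, True): [(("VS",), "ES"), (("GS",), "GS")],
    ("GFX6-8", True, True): [(("VS",), "LS"), (("HS",), "HS"),
                             (("DS",), "ES"), (("GS",), "GS")],
    ("GFX9", False, False): [(("VS",), "VS")],
    ("GFX9", True, False): [(("VS", "HS"), "HS"), (("DS",), "VS")],
    ("GFX9", False, True): [(("VS", "GS"), "GS")],
    ("GFX9", True, True): [(("VS", "HS"), "HS"), (("DS", "GS"), "GS")],
    ("GFX10", False, False): [(("VS",), "GS")],
    ("GFX10", True, False): [(("VS", "HS"), "HS"), (("DS",), "GS")],
    ("GFX10", False, True): [(("VS", "GS"), "GS")],
    ("GFX10", True, True): [(("VS", "HS"), "HS"), (("DS", "GS"), "GS")],
}

# D3D12 mesh shading pipeline on GFX10, as listed above.
MESH_MAPPING_GFX10 = {"AS": "CS", "MS": "GS"}


def hw_stage_for(gen, logical, tess=False, gs=False):
    """Return the hardware stage that runs the given logical stage."""
    for logical_stages, hw in HW_MAPPING[(gen, tess, gs)]:
        if logical in logical_stages:
            return hw
    raise KeyError(f"{logical} is not active in this configuration")


print(hw_stage_for("GFX6-8", "VS", tess=True))       # LS
print(hw_stage_for("GFX9", "DS", tess=True, gs=True))  # GS
print(hw_stage_for("GFX10", "VS"))                   # GS
```

Keying on (generation, tess, GS) mirrors the four cases per generation in the tables, so each row can be checked against the post one to one.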

Hopefully this clears things up a bit. Now for some interesting notes:

Despite our mental model of the vertex shader being executed at the very start of our logical pipeline, the vertex program itself can be executed on 3 different HW stages on older AMD HW. I imagine the reason for this HW pipeline design, where the HW's vertex shader is located at the end of the geometry pipeline, is so the HW can avoid the additional overhead from the tessellation and geometry shader stages in certain cases.
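
A quick way to check the "3 different HW stages" count is to collect, per generation, the hardware stage that ends up hosting the logical vertex shader in each of the four configurations from the mapping earlier in this post. This is only a sketch of that bookkeeping; `VS_HOST` and `distinct_hosts` are illustrative names, and the entries are copied from the tables, not from driver code.

```python
# Hardware stage that hosts the logical vertex shader, per configuration,
# copied from the logical -> hardware mapping earlier in this post.
VS_HOST = {
    "GFX6-8": {"VS only": "VS", "Tess": "LS", "GS": "ES", "Both": "LS"},
    "GFX9": {"VS only": "VS", "Tess": "HS", "GS": "GS", "Both": "HS"},
    "GFX10": {"VS only": "GS", "Tess": "HS", "GS": "GS", "Both": "HS"},
}

# Distinct hardware stages the vertex program can land on, per generation.
distinct_hosts = {gen: sorted(set(cfg.values())) for gen, cfg in VS_HOST.items()}

for gen, hosts in distinct_hosts.items():
    print(gen, hosts)
# GFX6-8 ['ES', 'LS', 'VS']
# GFX9 ['GS', 'HS', 'VS']
# GFX10 ['GS', 'HS']
```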

As we can identify in the comments from AMD's code, amplification shaders don't map onto anything in their graphics pipeline and mesh shaders are executed on GFX10's NGG geometry shaders.

Sources and code:
https://cgit.freedesktop.org/mesa/m...68957a82562d13b3f0d21a04ce633ffd236e6036#n165
https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/gfx6/gfx6GraphicsPipeline.cpp#L536
https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/gfx9/gfx9GraphicsPipeline.cpp#L569
https://github.com/GPUOpen-Drivers/.../gfxip/gfx9/gfx9IndirectCmdGenerator.cpp#L174

3dcgi · Feb 16, 2021

Lurkmass said:
Despite our mental model of the vertex shader being executed at the very start of our logical pipeline, the vertex program itself can be executed on 3 different HW stages on older AMD HW. I imagine the reason for this HW pipeline design, where the HW's vertex shader is located at the end of the geometry pipeline, is so the HW can avoid the additional overhead from the tessellation and geometry shader stages in certain cases.

Also, the hardware VS (Primitive Shader in newer hardware) is what feeds the rasterizer. So it goes last, providing consistency in naming from a hardware perspective.

Lurkmass · Feb 17, 2021

3dcgi said:
Also, the hardware VS (Primitive Shader in newer hardware) is what feeds the rasterizer. So it goes last, providing consistency in naming from a hardware perspective.

Which provides more context into this slide:

GFX9: HS (surface shaders?) -> GS (geometry shaders extended with per-vertex shading) -> VS (primitive shader)

GFX10: HS -> GS or CS (compute shader) -> GS (primitive shader)

What's unique to the NGG pipeline in comparison to the standard mesh shading pipeline is that it can exploit the fixed-function tessellator hardware, since its output is used as an input for GFX9/10 geometry shaders. Did GFX9 have the capability to do per-meshlet shading, or was that extended with the GFX10 NGG GS? When I compare the GFX9 registers to the GFX10 registers, I see that GFX10 has added new registers with a GE prefix. Are those related to the Geometry Engine, by some chance?

It would be much appreciated if you and the guys at AMD started releasing documentation about the hardware registers again!

3dcgi · Feb 17, 2021

Lurkmass said:
Which provides more context into this slide:

GFX9: HS (surface shaders?) -> GS (geometry shaders extended with per-vertex shading) -> VS (primitive shader)

GFX10: HS -> GS or CS (compute shader) -> GS (primitive shader)

What's unique to the NGG pipeline in comparison to the standard mesh shading pipeline is that it can exploit the fixed-function tessellator hardware, since its output is used as an input for GFX9/10 geometry shaders. Did GFX9 have the capability to do per-meshlet shading, or was that extended with the GFX10 NGG GS? When I compare the GFX9 registers to the GFX10 registers, I see that GFX10 has added new registers with a GE prefix. Are those related to the Geometry Engine, by some chance?

It would be much appreciated if you and the guys at AMD started releasing documentation about the hardware registers again!

The GE prefix came about because some things were moved around and it was decided to group multiple blocks under the Geometry Engine register naming. They were already developed by the same team.

GFX9 could do per-meshlet shading, but issues led to support being dropped in favor of focusing resources on GFX10. GFX10's implementation is different.

Lurkmass · Feb 18, 2021

Other bits ...

VRS registers for GFX10.3
PRT+? Is this related to D3D sampler feedback?
