sergi.gonzalez
Newcomer
Hi all,
I'm trying to optimize my CSM implementation and I have a question about indexing constant registers in the pixel shader. In the ShaderX5 CSM article, Wolf shows how to do the shadowing pass for all splits (max. 4) at once by indexing the proper light MVP matrix in the pixel shader:
float4x4 lmvp[N]; // N is the number of splits
float4 zGreater = startShadows < CamDistance; // one comparison per split
float mapToUse = dot(zGreater,1.0);
float4 shadowCoord = mul( lmvp[(int)(mapToUse-1)], positionLocalSpace);
The problem is that indexing a constant register in the pixel shader does not compile to a single assembly instruction; it takes 5 instructions per matrix row (4 cmp to copy the proper constant and 1 dp4 for the row-by-vector multiply). This doubles the number of arithmetic instructions in my pixel shader (from 19 to 38).
I've been thinking of an approach to reduce the number of instructions in the pixel shader:
1. Compute the N products lmvp[i]*positionLocalSpace in the vertex shader and write the results to N vertex shader output streams.
2. Then index the proper pixel shader input stream in the pixel shader.
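Roughly, this is what I have in mind. It is only an untested sketch with N = 4: the struct, the function names, the mvp matrix and the way camDistance is obtained are placeholders of mine, not from the article. Since I don't know whether the inputs can be indexed directly, the sketch picks the coordinate with a one-hot mask built from zGreater instead, which assumes startShadows is sorted in ascending order and that its first entry is not beyond the nearest visible pixel:

```hlsl
#define N 4                 // number of splits (assumed to be the maximum, 4)

float4x4 lmvp[N];           // light MVP matrix of each split
float4x4 mvp;               // camera MVP matrix (placeholder name)
float4   startShadows;      // start distance of each split, ascending

struct VSOutput
{
    float4 position     : POSITION;
    float4 shadowCoord0 : TEXCOORD0;   // one interpolator per split
    float4 shadowCoord1 : TEXCOORD1;
    float4 shadowCoord2 : TEXCOORD2;
    float4 shadowCoord3 : TEXCOORD3;
    float  camDistance  : TEXCOORD4;
};

// Step 1: do all N matrix multiplies per vertex instead of per pixel.
VSOutput ShadowVS(float4 positionLocalSpace : POSITION)
{
    VSOutput o;
    o.position     = mul(mvp, positionLocalSpace);
    o.shadowCoord0 = mul(lmvp[0], positionLocalSpace);
    o.shadowCoord1 = mul(lmvp[1], positionLocalSpace);
    o.shadowCoord2 = mul(lmvp[2], positionLocalSpace);
    o.shadowCoord3 = mul(lmvp[3], positionLocalSpace);
    // Assumption: with a standard perspective projection, clip-space w
    // equals the view-space depth used to choose the split.
    o.camDistance  = o.position.w;
    return o;
}

// Step 2: choose the coordinate of the active split without indexing.
float4 SelectShadowCoord(VSOutput i)
{
    // 1.0 for every split whose start lies before this pixel, e.g. (1,1,0,0).
    float4 zGreater = startShadows < i.camDistance;
    // Keep only the last 1.0, e.g. (1,1,0,0) becomes (0,1,0,0).
    float4 weights = zGreater - float4(zGreater.yzw, 0.0);
    return weights.x * i.shadowCoord0
         + weights.y * i.shadowCoord1
         + weights.z * i.shadowCoord2
         + weights.w * i.shadowCoord3;
}
```

The trade-off is that this uses five interpolators and N matrix multiplies per vertex, and I'm not sure the mask-based select is actually cheaper than the cmp sequence the compiler generates now.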
My questions are:
1. Is there any way to reduce the number of arithmetic instructions in the pixel shader?
2. Is it possible to index pixel shader input streams?
Thanks in advance,
Sergi