I remember reading somewhere that bilinear texture filtering is effectively "free" on modern GPUs. Is that basically correct? For example, I'm working with a scalar texture (L16) in HLSL and want to perform some specific weighting of the values along the x-axis but simple linear filtering in the y-axis.
Originally, I went with point sampling to gather 4 texels in row 1 and 4 texels in row 2, weighted them, then linearly interpolated between the two rows. This meant 8 tex2D() calls and a bunch of math. When I tried to expand the filtering to 8 texels in each row, the 16 tex2D() calls plus one tex1D() color lookup made fxc choke with a "texture dependency chain is too complex" error when compiling for SM2.0.
I wasn't happy with all the tex2D() calls, or with the fact that the 8-texel version would only work on SM3+. So I decided to turn on bilinear filtering for the texture and force the x-axis coordinate onto the texel center while leaving the y-axis coordinate unmodified, so that the GPU's bilinear filtering does the y-axis interpolation for me. This cut out quite a few instructions in the pixel shader, along with half the tex2D() calls.
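To make the idea concrete, here is a minimal CPU-side sketch in Python (not my actual HLSL) of why the two approaches should give the same result. It assumes texel centers sit at integer + 0.5 in texel space and clamp addressing; the function names, the tap placement and the weights are made up for illustration. Real hardware may also compute its filter weights at reduced precision, so a GPU result could differ slightly from this.

```python
import math

def point_sample(tex, x, y):
    """Fetch texel (x, y) with clamp-to-edge addressing."""
    h, w = len(tex), len(tex[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    return tex[y][x]

def bilinear_sample(tex, u, v):
    """Bilinear fetch at texel-space (u, v); texel centers at integer + 0.5."""
    fx, fy = u - 0.5, v - 0.5
    x0, y0 = math.floor(fx), math.floor(fy)
    tx, ty = fx - x0, fy - y0
    top = (1 - tx) * point_sample(tex, x0, y0) + tx * point_sample(tex, x0 + 1, y0)
    bot = (1 - tx) * point_sample(tex, x0, y0 + 1) + tx * point_sample(tex, x0 + 1, y0 + 1)
    return (1 - ty) * top + ty * bot

def first_tap(u, n):
    """Hypothetical tap placement: n taps roughly centered on u."""
    return math.floor(u) - n // 2 + 1

def manual_two_row(tex, u, v, weights):
    """Original approach: point-sample two rows, weight in x, lerp in y."""
    xs = first_tap(u, len(weights))
    fy = v - 0.5
    y0 = math.floor(fy)
    ty = fy - y0
    row0 = sum(w * point_sample(tex, xs + i, y0) for i, w in enumerate(weights))
    row1 = sum(w * point_sample(tex, xs + i, y0 + 1) for i, w in enumerate(weights))
    return (1 - ty) * row0 + ty * row1

def bilinear_rows(tex, u, v, weights):
    """New approach: snap x to texel centers, let bilinear handle y."""
    xs = first_tap(u, len(weights))
    # x + 0.5 is a texel center, so the x-axis blend weight is exactly 0.
    return sum(w * bilinear_sample(tex, xs + i + 0.5, v) for i, w in enumerate(weights))

if __name__ == "__main__":
    tex = [[float((3 * x + 7 * y) % 11) for x in range(8)] for y in range(4)]
    weights = [0.1, 0.4, 0.4, 0.1]  # illustrative x-axis weights only
    print(manual_two_row(tex, 3.3, 1.75, weights))
    print(bilinear_rows(tex, 3.3, 1.75, weights))
```

Both functions use the same number of weights, but the second one issues half as many fetches, which is the saving I'm after in the shader.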
Obviously, I'll have to write a short test app to determine which approach actually performs best. Still, I would expect a highly optimized GPU function (bilinear filtering) to beat all those extra arithmetic instructions and point-sampled tex2D() calls.
Any thoughts?
Thanks,
Mike