Tridam said: It doesn't make sense :?
You mean that if a FP32 texture read needs 3-4 cycles, you can't do math during these cycles ???
Yes, and this gets even worse if you need AF with many samples. The reason is that the TMU runs in full sync with the ALU/FPU. I am not sure whether this is an nVidia design or whether they reused the old Rampage design.
The NV3X has the same problem, but to make it more complex, only if you use PS >= 2.0 (maybe 1.4). If you use 1.1 shaders you can hide the read cycles with math.
Tridam said: FP32 texture reads could be cut into 3-4 single component texture reads.
Yes, but only if you don't need the values for the next math instructions.
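
To put rough numbers on the difference, here is a toy cycle count in Python. It is only a sketch of the argument, not a model of the real hardware: the function names and the cycle figures are made up for illustration, and it assumes one math instruction per cycle.

```python
# Toy cycle-count model (hypothetical, for illustration only).
# Assumes one math instruction per cycle and a fixed texture read cost.

def lockstep_cycles(tex_cycles, math_cycles):
    """TMU in full sync with the ALU: math waits while the read runs."""
    return tex_cycles + math_cycles


def overlapped_cycles(tex_cycles, independent_math, dependent_math):
    """Read cycles hidden behind math that does not need the texel.

    Math that depends on the fetched value still has to wait for the read.
    """
    return max(tex_cycles, independent_math) + dependent_math


if __name__ == "__main__":
    # 4-cycle FP32 read plus 3 math instructions, everything serialized.
    print(lockstep_cycles(4, 3))
    # Same work, but none of the math needs the texel: the read hides it.
    print(overlapped_cycles(4, 3, 0))
    # Same work, but all of the math needs the texel: nothing is hidden.
    print(overlapped_cycles(4, 0, 3))
```

In this model the last case costs the same as the lockstep case, which is the point about splitting the read: it only helps when the following math does not need the values.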