Most of today's 3D chips can take 4 texture samples per TMU per clock,
so they reach "full fillrate" with bilinear filtering. For trilinear,
they have to use either 2 TMUs or 2 clock cycles,
and for anisotropic they use many clock cycles, depending on the situation.
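
To put rough numbers on that, here is a minimal sketch of the cycle counts. It assumes bilinear needs 4 samples and trilinear 8 (2x2 texels from each of two mip levels), and it models anisotropic as a number of bilinear or trilinear "taps", which is a simplification; real chips pick the tap count per pixel. The function names and the even split of samples across TMUs are my own assumptions, not any vendor's design.

```python
import math

# Texel samples needed for one filtered texture lookup.
SAMPLES_PER_LOOKUP = {
    "bilinear": 4,   # 2x2 texels from one mip level
    "trilinear": 8,  # 2x2 texels from each of two adjacent mip levels
}


def anisotropic_samples(taps, trilinear=False):
    """Samples for an anisotropic lookup, modeled (assumption) as `taps`
    bilinear or trilinear probes along the axis of anisotropy."""
    per_tap = SAMPLES_PER_LOOKUP["trilinear" if trilinear else "bilinear"]
    return taps * per_tap


def cycles_per_lookup(samples_needed, samples_per_tmu_per_clock=4, tmus=1):
    """Clock cycles for one lookup, assuming the samples can be split
    evenly across the TMUs working on the same pixel."""
    per_clock = samples_per_tmu_per_clock * tmus
    return math.ceil(samples_needed / per_clock)


if __name__ == "__main__":
    print(cycles_per_lookup(SAMPLES_PER_LOOKUP["bilinear"]))           # one 4-sample TMU
    print(cycles_per_lookup(SAMPLES_PER_LOOKUP["trilinear"]))          # same TMU, two passes
    print(cycles_per_lookup(SAMPLES_PER_LOOKUP["trilinear"], tmus=2))  # two TMUs ganged together
    print(cycles_per_lookup(anisotropic_samples(8, trilinear=True)))   # 8 trilinear taps
```

In this model an 8-tap trilinear anisotropic lookup costs 16 clocks on a single 4-sample TMU, which is the "many clock cycles" case.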
So, thinking about the performance effects of the number of TMUs and the number of texture samples:
1. If we add another TMU (even by "halving an 8-sample TMU into two 4-sample TMUs" like the GF2 did), wouldn't our pixel shader also need enough power to reasonably handle the extra input before we could take advantage of it?
Aren't the pixel shader's function units (usually?) optimized/limited for the number of texture inputs we may have?
(At least Parhelia has one "identical" pixel shader pipeline stage for all TMUs.)
2. Just increasing the number of texture samples one "TMU" can read per clock cycle would dramatically increase the traffic requirements between the TMUs and the texture cache, and also increase the size of the TMU quite a lot, as it would have to do a lot more work.
Am I correct on these?
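
For point 2, a back-of-envelope sketch of the peak traffic between the TMUs and the texture cache. The 200 MHz clock and 4 bytes per texel are made-up example figures, not the specs of any real chip, and the model ignores texel reuse between neighbouring samples, so treat it as an upper bound under those assumptions.

```python
def texels_per_clock(tmus, samples_per_tmu_per_clock):
    """Peak texel reads per clock across all TMUs of one pipeline."""
    return tmus * samples_per_tmu_per_clock


def peak_fetch_bandwidth(tmus, samples_per_tmu_per_clock, clock_hz,
                         bytes_per_texel=4):
    """Peak TMU <-> texture cache traffic in bytes per second.
    Assumes every sample is a separate full-size texel read."""
    return (texels_per_clock(tmus, samples_per_tmu_per_clock)
            * clock_hz * bytes_per_texel)


if __name__ == "__main__":
    CLOCK_HZ = 200_000_000  # hypothetical 200 MHz core clock

    print(peak_fetch_bandwidth(1, 4, CLOCK_HZ))  # one 4-sample TMU
    print(peak_fetch_bandwidth(1, 8, CLOCK_HZ))  # one 8-sample TMU
    print(peak_fetch_bandwidth(2, 4, CLOCK_HZ))  # two 4-sample TMUs
```

Under these assumptions, going from 4 to 8 samples per clock doubles the peak traffic whether it is done with one wider TMU or two narrower ones. The sketch says nothing about die size or about what the pixel shader can consume.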