number of texture samples per TMU vs number of TMUs

hkultala

Regular
Most of today's 3D chips can take 4 texture samples per TMU per clock,
reaching "full fillrate" with bilinear filtering, but for trilinear
they have to use either 2 TMUs or 2 clock cycles,
and for anisotropic they use many clock cycles, depending on the situation.
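
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a bilinear lookup needs 4 texel samples and a trilinear lookup needs 8 (two bilinear lookups from adjacent mip levels). The anisotropic model (a number of probes, each a full bilinear or trilinear lookup) is only an assumption for illustration, since real hardware varies the probe count per pixel. The function and constant names are made up for this example.

```python
from math import ceil

# Assumed sample counts per filtered lookup.
SAMPLES_BILINEAR = 4
SAMPLES_TRILINEAR = 8  # two bilinear lookups from adjacent mip levels


def anisotropic_samples(probes, trilinear=True):
    """Samples for an anisotropic lookup, modelled as `probes` full lookups.

    This probe model is an illustrative assumption, not a description
    of any particular chip.
    """
    per_probe = SAMPLES_TRILINEAR if trilinear else SAMPLES_BILINEAR
    return probes * per_probe


def clocks_per_lookup(samples_needed, samples_per_tmu_per_clock=4, tmus_used=1):
    """Clock cycles needed to gather all samples for one filtered lookup."""
    samples_per_clock = samples_per_tmu_per_clock * tmus_used
    return ceil(samples_needed / samples_per_clock)


bilinear_clocks = clocks_per_lookup(SAMPLES_BILINEAR)                   # 1 TMU
trilinear_clocks = clocks_per_lookup(SAMPLES_TRILINEAR)                 # 1 TMU
trilinear_two_tmus = clocks_per_lookup(SAMPLES_TRILINEAR, tmus_used=2)  # 2 TMUs
aniso_clocks = clocks_per_lookup(anisotropic_samples(4))                # 4 probes
```

With these assumptions, bilinear takes 1 clock on one 4-sample TMU, trilinear takes 2 clocks on one TMU or 1 clock on two, and a 4-probe trilinear anisotropic lookup takes 8 clocks on one TMU.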

So, thinking about the performance effects of the number of TMUs and the number of texture samples per TMU...

1. If we add another TMU (even by "halving an 8-sample TMU into two 4-sample TMUs" like GF2 did), wouldn't our pixel shader also need enough power to reasonably handle the extra input in order to take advantage of it?
Aren't pixel shaders' function units (usually?) optimized/limited for the number of texture inputs we may have?

(At least Parhelia has one "identical" pixel shader pipeline stage for all TMUs.)

2. Just increasing the number of texture samples one "TMU" can read per clock cycle would dramatically increase the traffic requirements between the TMUs and the texture cache, and also increase the size of the TMU quite a lot, as it would have to do a lot more work.

Am I correct on these?
 
1. Not necessarily; it depends on the input restrictions of the pixel shader. If the pixel shader can do MUL (final colour), (texture0), (texture1), then it can combine the results from 2 texture reads in one ALU operation, and the second TMU also reduces the cost of anisotropic and trilinear filtering.

Of course, it doesn't mean to say that it would be a particularly well-balanced architecture if it did this...
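
As a rough illustration of the MUL case in point 1, the sketch below models a single ALU operation that accepts two texture results as inputs and multiplies them component-wise. The RGBA values and the `mul` helper are hypothetical and chosen only for the example; this is not the instruction set of any specific chip.

```python
def mul(a, b):
    """Component-wise multiply of two RGBA colours (one modelled ALU op)."""
    return tuple(x * y for x, y in zip(a, b))


# Hypothetical filtered results delivered by two TMUs in the same clock.
texture0 = (1.0, 0.5, 0.25, 1.0)
texture1 = (0.5, 0.5, 0.5, 1.0)

# MUL (final colour), (texture0), (texture1): both reads consumed by one op.
final_colour = mul(texture0, texture1)
```

If the ALU could only take one texture input per operation, the same result would need an extra operation to move the first texture into a register before multiplying by the second, which is where the input restrictions come in.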

2. That's pretty much right.
 