darkblu said:
am i reading you correctly here that, all conditions being equal, a greater bandwidth would increase the performance hit? how so?
Because on most hardware and most workloads, regular trilinear filtering will benefit more from increased bandwidth than anisotropic filtering. Imagine rendering a single textured object on a single ROP, single TMU architecture. Suppose you need 6 bytes per pixel for colour/z (RGBA8, 4:1 z-compression), and texture data is 1 byte/pix with trilinear and 4 bytes/pix w/ 2x aniso (I'm purposely handicapping the aniso).
Then with infinite bandwidth on tap, the hardware takes 1 cycle/pix for the trilinear case and 2 cycles/pix for aniso. You get 1 pix/clk for tri, 0.5 pix/clk for aniso. With 4 bytes of BW/clock, trilinear needs 7 bytes/pix and aniso needs 10 bytes/pix, so you get 4/7 = 0.57 pix/clk for tri, and 4/10 = 0.4 pix/clk for aniso. So the infinite BW scenario has a 50% drop for aniso, and the limited BW scenario has a 30% drop. This is what I was claiming.
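For anyone who wants to play with the numbers, here is a rough sketch of that arithmetic in Python. The function names and the simple "slower of the two limits wins" model are my own assumptions for illustration, not a description of how any real GPU schedules work.

```python
# Hypothetical toy model: throughput is capped by the slower of the
# TMU rate (cycles per pixel) and the bandwidth rate (bytes per pixel).

COLOUR_Z_BYTES = 6   # RGBA8 colour plus 4:1 compressed z, per pixel
TRI_TEX_BYTES = 1    # texture bytes per pixel with trilinear
ANISO_TEX_BYTES = 4  # texture bytes per pixel with 2x aniso (handicapped)


def pixels_per_clock(cycles_per_pixel, bytes_per_pixel, bw_bytes_per_clock=None):
    """Return pixels per clock; bw_bytes_per_clock=None means infinite bandwidth."""
    tmu_rate = 1.0 / cycles_per_pixel
    if bw_bytes_per_clock is None:
        return tmu_rate
    return min(tmu_rate, bw_bytes_per_clock / bytes_per_pixel)


def aniso_drop(bw_bytes_per_clock=None):
    """Return (tri rate, aniso rate, fractional drop from enabling aniso)."""
    tri = pixels_per_clock(1, COLOUR_Z_BYTES + TRI_TEX_BYTES, bw_bytes_per_clock)
    aniso = pixels_per_clock(2, COLOUR_Z_BYTES + ANISO_TEX_BYTES, bw_bytes_per_clock)
    return tri, aniso, 1.0 - aniso / tri


if __name__ == "__main__":
    for bw in (None, 4):
        tri, aniso, drop = aniso_drop(bw)
        label = "infinite" if bw is None else f"{bw} B/clk"
        print(f"BW {label}: tri {tri:.2f}, aniso {aniso:.2f}, drop {drop:.0%}")
```

Raising the bandwidth figure in this model moves the drop from 30% towards 50%, which is the point: more bandwidth helps trilinear more than it helps aniso.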
Now, I suppose there are some math-heavy cases where a decoupled TMU wouldn't take more cycles per pixel, and thus BW per clock would increase. But because you're math heavy, BW shouldn't really matter then. On Xenos there would be a very small chance of a shader saturating BW while aniso is enabled but not with trilinear. On RSX the TMUs aren't decoupled so I think enabling aniso always increases the cycle count (I'm not positive, but this data suggests so).
ok, how about an aniso sampler that produces the sampling coords in a single clock and passes them to multiple isotropic units - how would that not be more bandwidth per fragment?
Since you're using multiple isotropic units, the number of pixels using those isotropic units simultaneously is reduced. In your system, more BW per fragment, but fewer fragments per clock. It's exactly the same as one isotropic unit per pixel being used over multiple cycles.
Either way, it's just one bilinear sample per TMU per clock, regardless of filtering method. The only reason aniso could use more BW per clock is mipmap level (see below).
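To make the equivalence concrete, here is a small counting sketch. Both schedulers are hypothetical simplifications of mine (the ganged one assumes the TMU count is at least the number of samples per fragment), and they only count clocks; they say nothing about caches or mipmap selection.

```python
import math


def clocks_serial(fragments, tmus, samples_per_fragment):
    """Each fragment owns one isotropic unit and loops over its samples."""
    return math.ceil(fragments / tmus) * samples_per_fragment


def clocks_ganged(fragments, tmus, samples_per_fragment):
    """Each fragment fans its samples out to several isotropic units in one clock."""
    fragments_per_clock = tmus // samples_per_fragment
    return math.ceil(fragments / fragments_per_clock)


if __name__ == "__main__":
    fragments, tmus, samples = 8, 4, 2
    serial = clocks_serial(fragments, tmus, samples)
    ganged = clocks_ganged(fragments, tmus, samples)
    # Both designs issue the same number of bilinear samples per clock.
    print(serial, ganged, fragments * samples / serial)
```

With 8 fragments, 4 TMUs and 2 samples per fragment, both layouts take the same number of clocks, so the bilinear samples issued per clock (and hence the BW per clock, for a given mipmap level) come out the same.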
keep in mind that with anisotropy you may easily get much less texel reuse than in a properly-mipmapped isotropic case, where texel reuse between two adjacent fragments is 25-50% at non-magnification; count in the rest of the neighbours and you get ~100% reuse (poly edges notwithstanding). which is far from the case with aniso.
If you read my post, I did keep that in mind, as I said I was comparing to a trilinear surface viewed head on. I think it's stupid to compare anisotropic filtering to blurred trilinear. For trilinear filtering you can reduce bandwidth by applying an LOD bias for a lower mipmap, but you don't see devs doing that.
We know Xenos is fine with looking at a surface head on, and the detail in aniso is the same as the detail of a surface viewed head on. So I prefer to compare aniso to head on trilinear, i.e. same mipmap, since it is more "apples-to-apples" in my book.