Mintmaster
Veteran
Chalnoth said: Actually, I got it from these message boards. I'm reasonably certain it was OpenGLguy.
So, it isn't certain, but I find it rather likely (particularly given the rather small anisotropic performance hit).
I've heard you say this several times now, but I don't know how reliable that info is. It seems pretty bizarre to handle 4 bilinear texture samples per pipe per clock JUST for aniso, especially since many reviews (most notably those in magazines directed at the general public) don't look at aniso performance, and many buyers wouldn't care either. You would still need the blending arithmetic and extra read ports on the texture cache to blend all four samples, so why not add the address arithmetic to make it general 8x4, or even general 8x2 to keep the transistor count similar to 8x1x(4 aniso)? Most multitexturing is simple blending anyway. What I'm saying is that I don't see ATI putting so much effort and silicon into enhancing just anisotropic performance, especially since one sample often suffices.
As far as I know, the technical documents say that it's still only one anisotropic sample per clock, per pipe. I think if aniso is done right, like in the 9700, there doesn't need to be a big performance hit. The best way to find out would be to write a program with one big angled polygon and make sure that the texture resolution is high enough (i.e. less than a 1:1 pixel-to-texel ratio) that every pixel needs aniso. I've seen such numbers from a similar program, and there is no evidence of 4 aniso samples per pipe per clock. They weren't a whole lot different from the 8500's numbers (in terms of performance hit), but they were a lot better than the GF4's.
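To make the test concrete, here is a rough sketch of what each hypothesis would predict for such a fillrate test. The pipe count, clock, and sample count are made-up illustrative figures, not measurements, and `predicted_fillrate` is a hypothetical helper that ignores bandwidth and cache effects.

```python
from math import ceil

def predicted_fillrate(pipes, clock_mhz, samples_needed, samples_per_clock):
    """Predicted Mpixels/s when every pixel needs `samples_needed` bilinear samples.

    Assumes a pipe can only retire a pixel once all of its samples are
    fetched, and that it fetches `samples_per_clock` samples each clock.
    """
    clocks_per_pixel = ceil(samples_needed / samples_per_clock)
    return pipes * clock_mhz / clocks_per_pixel

# Illustrative numbers only: 8 pipes at 300 MHz, 4 bilinear samples per pixel.
one_per_clock = predicted_fillrate(8, 300, 4, 1)   # hypothesis: 1 sample/pipe/clock
four_per_clock = predicted_fillrate(8, 300, 4, 4)  # hypothesis: 4 samples/pipe/clock
print(one_per_clock, four_per_clock)
```

If the hardware really did 4 samples per pipe per clock, a test where every pixel needs aniso should show almost no drop from the plain bilinear fillrate; with 1 sample per clock the drop should scale with the sample count.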
If you're basing your argument on the GF3/GF4 aniso hit, there is definitely something wrong with the efficiency of their algorithm. The 3DMark2001 fillrate test alone shows this. Remember, the 8500 also had a small hit, and the only thing wrong in their implementation was the lack of trilinear, in which case you could compare bilinear scores. The Z-axis issue would hardly affect performance.
Nazgul: the reason the performance hit with trilinear isn't too big is that the base mip-map covers a lot of the screen image. Much of the time there is texture magnification, not minification, especially at higher resolutions. In this case, only one bilinear sample is needed as opposed to the two needed for trilinear. Moreover, the commonly found lightmaps are usually very low res, so they will rarely need two texture samples for trilinear. The same arguments often apply to aniso.
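As a back-of-the-envelope illustration of that argument, the sketch below counts bilinear samples per pixel under a simplified model: a magnified pixel stays on the base mip level and needs one sample, while a minified pixel with trilinear blends two levels. The function names and the texel-per-pixel ratios are hypothetical, and real hardware decides this from the LOD calculation rather than a simple threshold.

```python
def bilinear_samples(texels_per_pixel, trilinear=True):
    """Bilinear samples needed for one pixel under the simplified model.

    texels_per_pixel <= 1 means magnification: only the base mip level
    is used, so one bilinear sample suffices even with trilinear enabled.
    """
    if texels_per_pixel <= 1.0 or not trilinear:
        return 1
    return 2  # minification with trilinear: blend two adjacent mip levels

def average_cost(ratios, trilinear=True):
    """Mean bilinear samples per pixel over a list of texel/pixel ratios."""
    return sum(bilinear_samples(r, trilinear) for r in ratios) / len(ratios)

# Hypothetical frame: three quarters of the pixels magnified, one quarter minified.
print(average_cost([0.5, 0.5, 0.5, 2.0]))  # well below the worst case of 2
```

The more of the screen that is magnified (or covered by low-res lightmaps), the closer the average gets to one sample per pixel, which is why the trilinear hit stays modest.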
This is also one of the arguments in favour of 1 TMU per pipe as opposed to 2 - single-TMU pipes are very versatile. The big question is, would you rather have texture units sitting idle in single texture sample situations (w/ 2+ TMUs per pipe) or have the pixel pipes idle in multiple texture sample situations (1 TMU per pipe)? As the old adage goes, "it's all about balance".
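The trade-off can be put in numbers with a toy model, assuming each TMU delivers one texture sample per clock and a pipe retires a pixel only when all its samples are in. `pipe_stats` is a hypothetical helper for illustration, not a description of any real chip.

```python
from math import ceil

def pipe_stats(samples_per_pixel, tmus_per_pipe):
    """Return (clocks per pixel, fraction of TMU slots doing useful work)."""
    clocks = ceil(samples_per_pixel / tmus_per_pipe)
    tmu_utilization = samples_per_pixel / (clocks * tmus_per_pipe)
    return clocks, tmu_utilization

for samples in (1, 2):
    for tmus in (1, 2):
        clocks, util = pipe_stats(samples, tmus)
        print(f"{samples} sample(s), {tmus} TMU(s)/pipe: "
              f"{clocks} clock(s)/pixel, TMU utilization {util:.0%}")
```

With one sample per pixel, the second TMU in a 2-TMU pipe sits idle half the time; with two samples per pixel, the 1-TMU pipe spends an extra clock on every pixel. Which case dominates depends on the workload.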