Could someone explain AF to me in a simpler fashion? A screenshot comparison would be greatly appreciated.
Kristof could. You can also look back to B3D's GF FX or Radeon 9x00 reviews to see varying levels of AF in Serious Sam screenshots.
Here's how I understand it. (Note that my understanding may differ from the truth, but I bang these replies out once in a while to save others from replying and, more [self-]importantly, to leave myself open to correction.)

AF enhances texture clarity (read: reduces blur) on surfaces that aren't perpendicular (Kristof's ideal "Type 1") to your in-game viewpoint. It's most noticeable on floors and walls that slope away from you, b/c they tend to be regular/flat surfaces that use regular/repeating textures. AFAIK, it means more texture samples taken along one axis (isotropic means the same in all axes, hence anisotropic means not the same).
That image in the Wikipedia article is about as clear as it gets. You can see the ground texture is noticeably sharper with AF than without, and that's because AF takes more texture samples along the axis that needs them. Look at Kristof's pics and notice that along the x-axis the runway texture is close to Type 1, whereas the further you look along the y-axis, the closer the runway texture gets to Type 3. If you sample both axes isotropically--that is, take the same number of texture samples in both x and y--then the further from the camera you get along the y-axis, the lower the ratio of texture pixels (texels) to screen pixels. At some point you drop below the Nyquist limit and the textures begin to look blurry onscreen. AF counters this by raising the texel-to-pixel ratio back up to the Nyquist limit.
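To make that concrete, here's a toy sketch (my own illustrative model, not real GPU math) of a ground plane receding from the camera: a texel's on-screen width shrinks like 1/d with distance, but its on-screen depth shrinks like 1/d², so the pixel footprint in texture space gets more and more elongated--that elongation ratio is the anisotropy AF has to cover:

```python
def footprint(d, focal=1.0, height=1.0):
    """Approximate on-screen size of one ground-plane texel at viewing
    distance d (toy perspective model, hypothetical parameters).
    Width shrinks like 1/d; depth shrinks like ~1/d^2."""
    dx = focal / d                  # texel width on screen
    dy = focal * height / (d * d)   # texel depth on screen
    return dx, dy

for d in [1, 2, 4, 8, 16]:
    dx, dy = footprint(d)
    aniso = dx / dy  # how stretched the footprint is: the AF degree needed
    print(f"d={d:2}  x={dx:.3f}  y={dy:.4f}  anisotropy ~{aniso:.0f}:1")
```

In this toy model the needed anisotropy grows linearly with distance--which is why the far end of the runway needs many more samples along one axis than the near end.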
If there's one thing Mintmaster has banged into my head, it's that AF isn't "extremely bandwidth intensive," as the Wikipedia author states, but clock intensive--at least on consumer GPUs with a finite number of texture samplers. A Radeon X1900, for instance, can only give you 16 bilinearly filtered samples per clock. 16x AF requires 16x as many samples, but you're not getting them in the same clock (which would indeed be bandwidth intensive: 16x more so), but rather over 16x more clocks. So AF doesn't require more bandwidth per clock, just more clocks to gather the desired samples. The time spent waiting for the extra AF samples can be offset by increasing pixel shader complexity, so the rest of the GPU is kept usefully busy in the meantime. Put another way, heavier shaders make math crunching the bottleneck, leaving AF close to "free" on otherwise idle texture units.
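The clocks-vs-bandwidth point above can be sketched as a toy cost model (illustrative numbers loosely based on the X1900 example, not exact hardware behavior):

```python
def af_cost(num_pixels, tmus=16, af_level=1):
    """Toy cost model: a GPU with `tmus` bilinear samplers produces
    `tmus` samples per clock. Raising the AF level multiplies the total
    samples needed, so it costs extra *clocks*; the per-clock sample
    (and thus bandwidth) rate stays the same."""
    samples_needed = num_pixels * af_level
    clocks = samples_needed / tmus   # more clocks...
    samples_per_clock = tmus         # ...same per-clock rate
    return clocks, samples_per_clock

no_af = af_cost(1024, af_level=1)    # (64.0, 16)
full_af = af_cost(1024, af_level=16) # (1024.0, 16)
print(no_af, full_af)
```

Sixteen times the clocks, identical per-clock sample rate--which is the sense in which AF is clock intensive rather than bandwidth intensive.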
And, yeah, realizing that 16x AF requires 16x more clocks helps explain why ATI and NV are so big on "adaptive" AF implementations: they speed things up by not applying full AF to every single texture when it's forced via the drivers (rather than specified per-texture by the game).