Just to add a bit to Hyp-X's excellent posts (I'd tried a couple of times to cover the same ground he did, but couldn't quite come up with something concise, correct, and understandable to someone with no prior knowledge of aliasing or signal theory):
SSAA is a brute-force technique for spatial antialiasing. It works by oversampling color data in screen space, then blending and downsampling the results. The upside is that it addresses all aspects of spatial aliasing, as opposed to MSAA (which just handles edge aliasing) and AF (which just handles aliasing in color textures, and then only to the extent that the texture is anisotropic, i.e. viewed at an angle to the screen). As andypski pointed out, this makes SSAA superior to MSAA + AF because it antialiases the results of pixel shading, whether those come from programmable shaders or from fixed-function techniques like bump-mapping. As some others have pointed out, it's also superior to MSAA + AF because it allows for greater detail in all textures, not just those at high degrees of anisotropy. Finally, SSAA is superior to MSAA + AF in terms of generating proper color values at edges, because each color sample is calculated at its own sub-pixel position rather than taken once from the center of the pixel, which at an edge may lie outside the triangle being shaded. (Centroid sampling is a technique to avoid many of the artifacts that can result; but if not used carefully it can produce artifacts of its own.)
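To make the "oversample, then blend and downsample" part concrete, here's a minimal sketch. The scene (`shade`, a single hard edge), the ordered sample grid, and the plain box-filter average are all my own assumptions for illustration; real hardware can use other sample patterns and filters.

```python
def shade(x, y):
    """Hypothetical scene: white above the line y = 0.37*x + 0.2, black below."""
    return 1.0 if y > 0.37 * x + 0.2 else 0.0


def render(width, height, factor=1):
    """Render with factor x factor ordered-grid supersampling.

    Every sub-pixel sample is fully shaded, then the samples of each
    pixel are averaged (a box-filter downsample).
    """
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            total = 0.0
            for sy in range(factor):
                for sx in range(factor):
                    # Sample at the center of each sub-pixel cell.
                    x = px + (sx + 0.5) / factor
                    y = py + (sy + 0.5) / factor
                    total += shade(x, y)
            row.append(total / (factor * factor))
        image.append(row)
    return image


aliased = render(4, 4, factor=1)   # every pixel is exactly 0.0 or 1.0
smoothed = render(4, 4, factor=4)  # edge pixels get intermediate values
```

Note that the cost is the whole point: `shade` runs `factor * factor` times for every pixel, whether or not that pixel contains anything that would alias.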
The downside of SSAA is that it is completely brute-force, and thus hideously expensive. SSAA oversamples all parts of the image equally, with no regard for whether a given part is likely to alias or, in the case of textures at high degrees of anisotropy, likely to be extremely blurry (because using a more detailed mipmap there would result in aliasing). That's generally a pretty crummy way to go about things, and the result is that, for current practical purposes, you're unlikely to get more than 2x or maybe 4x SSAA and still maintain playable framerates. Meanwhile, as Dio pointed out, it would take 256x SSAA (16 samples along each screen axis) to allow the same LOD improvement that 16x AF allows for textures at extreme degrees of anisotropy.
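The 256x figure is just arithmetic, sketched below under the assumption of an ordered-grid SSAA pattern: to get N samples along the direction of anisotropy, whichever way it happens to point on screen, you need N samples along each axis. The function name is made up for this example.

```python
def ssaa_samples_to_match_af(af_level):
    """Ordered-grid SSAA sample count that puts af_level samples along
    each screen axis (assumption: square N x N grid)."""
    return af_level * af_level


for af in (2, 4, 8, 16):
    print(f"{af}x AF needs roughly {ssaa_samples_to_match_af(af)}x SSAA")
```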
Of course 256x SSAA would allow all sorts of other improvements to the entire image. But in a sense that's exactly the point: moving from 8x to 16x AF doesn't change anything except for textures with more than 8:1 anisotropy, which (taking the anisotropy ratio as 1/cos of the angle) means an angle of more than about 82.82 deg. to the screen, with the full 16:1 only reached at about 86.42 deg. (someone correct me if I did the math wrong; I tried to figure it out myself). But it also doesn't incur any extra workload except for those few textures at such a high degree of anisotropy; that's why the performance impact of moving from 8x to 16x AF is usually only on the order of ~2-3%. Meanwhile, for those particular textures, the improvement allowed by going from 8x to 16x AF is definitely noticeable. Going from 64x to 256x SSAA, on the other hand, will likely not result in a noticeable IQ improvement for any parts of the screen except those same steeply angled surfaces; but because it quadruples the number of samples, it will definitely result in a performance hit of ~75%. SSAA is doing a whole lot of work to the entire image, but only a very small portion of that work will actually result in a visible improvement.
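For anyone who wants to check the angle math, here's a quick sketch. It assumes the simplest possible model, where a flat surface tilted away from the screen by some angle has an anisotropy ratio of 1/cos(angle) and perspective is ignored; the function names are my own.

```python
import math


def angle_for_anisotropy(ratio):
    """Tilt angle (degrees from screen-parallel) at which the anisotropy
    ratio reaches `ratio`, assuming ratio = 1 / cos(angle)."""
    return math.degrees(math.acos(1.0 / ratio))


def framerate_drop(old_samples, new_samples):
    """Fraction of framerate lost if cost scales linearly with sample count."""
    return 1.0 - old_samples / new_samples


print(round(angle_for_anisotropy(8), 2))   # 82.82
print(round(angle_for_anisotropy(16), 2))  # 86.42
print(framerate_drop(64, 256))             # 0.75
```

Under this model 8:1 anisotropy is reached at about 82.82 deg. and 16:1 at about 86.42 deg., and the 75% figure falls out of quadrupling the sample count.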
Techniques like MSAA and AF aim to achieve most of the visible improvements of SSAA at a much lower cost. MSAA is a clever variation on SSAA that eliminates the extra pixel-shading fillrate cost (color is computed once per pixel rather than once per sample), retains the memory footprint cost, allows the bandwidth cost to be efficiently addressed through color compression, and achieves nearly all of the edge antialiasing (although none of the texture/shader antialiasing). Techniques like Matrox's fragment AA, or the Z3 technique from academia, go further and explicitly do antialiasing only on identified edges, thus avoiding the workload costs of SSAA (and the memory footprint costs of MSAA).
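The difference in shading work can be sketched for a single pixel. Everything here is a simplified illustration: the 4-sample offsets are a made-up pattern, `inside` is a hypothetical single triangle edge, and real MSAA hardware stores per-sample colors and resolves them later rather than blending on the spot.

```python
# Hypothetical 4-sample pattern (offsets within the pixel).
SAMPLE_OFFSETS = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def inside(x, y):
    """Hypothetical triangle edge: covered above the line y = 0.37*x + 0.2."""
    return y > 0.37 * x + 0.2


def ssaa_pixel(px, py, shade, background):
    """SSAA: coverage AND color are evaluated at every sample."""
    total, invocations = 0.0, 0
    for ox, oy in SAMPLE_OFFSETS:
        if inside(px + ox, py + oy):
            total += shade(px + ox, py + oy)
            invocations += 1
        else:
            total += background
    return total / len(SAMPLE_OFFSETS), invocations


def msaa_pixel(px, py, shade, background):
    """MSAA: coverage per sample, but color shaded once (at the pixel center)."""
    n = len(SAMPLE_OFFSETS)
    covered = sum(inside(px + ox, py + oy) for ox, oy in SAMPLE_OFFSETS)
    if covered == 0:
        return background, 0
    color = shade(px + 0.5, py + 0.5)  # single shader invocation
    return (covered * color + (n - covered) * background) / n, 1
```

For a flat-colored triangle both give the same edge blend, but MSAA gets there with one shader invocation instead of one per covered sample. If `shade` itself contains high-frequency detail, only the SSAA version averages it out.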
AF is even more focused: it only addresses texture aliasing for textures at a high degree of anisotropy, and it scales its workload so that it does no more than is necessary to avoid aliasing at each particular angle. The goal is to bring the detail level for anisotropic textures up to the same level as for isotropic textures (those parallel to the screen), which is to say very roughly the level of detail the display resolution can show. The most efficient way to do this is to oversample the texture only in the direction of anisotropy, which is exactly what AF does. The oversampling is done in texture space, not screen space, and by limiting it to one direction and to only those portions of the screen where it is actually needed, AF achieves a very important antialiasing function at a very low cost.
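Here's a rough sketch of how the workload scales with the degree of anisotropy. It's modeled loosely on the sample formula given in the EXT_texture_filter_anisotropic OpenGL extension; actual hardware is free to do something different, and the function and parameter names are my own.

```python
import math


def aniso_probes(dudx, dvdx, dudy, dvdy, max_aniso=16):
    """Return (number of probes, mipmap LOD) for one pixel's texture footprint.

    The arguments are the texture-coordinate derivatives (in texels)
    along screen x and screen y.
    """
    px = math.hypot(dudx, dvdx)  # footprint extent along screen x
    py = math.hypot(dudy, dvdy)  # footprint extent along screen y
    p_max, p_min = max(px, py), min(px, py)
    if p_min == 0.0:
        ratio = float(max_aniso)  # degenerate footprint: clamp to the maximum
    else:
        ratio = p_max / p_min
    # Take only as many probes as the anisotropy calls for, up to the limit.
    n = min(math.ceil(ratio), max_aniso)
    # The extra probes let the LOD follow the long axis divided by n,
    # instead of the long axis alone.
    lod = math.log2(p_max / n)
    return n, lod


print(aniso_probes(1, 0, 0, 1))                 # isotropic: one probe
print(aniso_probes(12, 0, 0, 1, max_aniso=16))  # 12:1 footprint, 16x AF
print(aniso_probes(12, 0, 0, 1, max_aniso=1))   # same footprint, no AF
```

An isotropic footprint costs a single probe no matter what the AF setting is, while the 12:1 footprint takes 12 probes and gets to use a mipmap more than three levels sharper than it would without AF.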
So hopefully we've now gotten at all the questions in the original post. AF is indeed a form of antialiasing, but one that applies only to textures, and then only according to their degree of anisotropy. SSAA + AF is better than MSAA + AF because SSAA allows for antialiasing of shader results and improves the LOD for all textures, not just anisotropic ones. SSAA + AF is better than plain SSAA because the levels of SSAA required to achieve good results on the anisotropic blurriness issue are ridiculously high and thus extremely impractical.
And, as Hyp-X hinted at, the solution to the fact that MSAA + AF does not antialias shader results (just as more and more pixels are the result of shaders) is not to switch back to SSAA, but instead to use other, less costly, targeted techniques that antialias only the shader results. In general this means building analytical low-pass filters into the pixel shaders themselves, so that the shader never outputs detail finer than the pixel grid can represent. Ideal results are not always possible, but this is the way it's been done in offline renderers for quite some time, and a number of techniques exist to get decent results at a reasonable cost in workload.
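As one small illustration of what filtering inside the shader can look like, here's a sketch of a hard step function next to a box-filtered version of it. It's written in Python for readability rather than in a shading language, and the function names are my own. In a real shader the filter width would come from the screen-space derivatives of `x` (for example via GLSL's `fwidth`).

```python
def hard_step(edge, x):
    """Unfiltered step: aliases whenever the edge falls inside a pixel."""
    return 1.0 if x >= edge else 0.0


def box_filtered_step(edge, x, width):
    """Average of hard_step over the interval [x - width/2, x + width/2].

    This is the exact (analytic) box-filtered step: it ramps linearly
    from 0 to 1 over one filter width centered on the edge.
    """
    if width <= 0.0:
        return hard_step(edge, x)
    t = (x + 0.5 * width - edge) / width
    return min(1.0, max(0.0, t))


# A pixel whose footprint straddles the edge gets a partial value
# instead of snapping to 0 or 1.
print(box_filtered_step(0.5, 0.5, 1.0))   # 0.5
print(box_filtered_step(0.5, 0.75, 1.0))  # 0.75
```

The filtered version costs about the same as the hard step, which is the appeal: the antialiasing is paid for only in the shaders that need it, not across the whole screen.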