Nyquist theorem
I wouldn't say that the differences are because bilinear is very bad, but rather because a sinc is very good, and Nyquist theorem is very theoretical.
Nyquist theorem talks about the relation between a time-continuous signal and a sampled (time-discrete) version of it, both of them infinite in length. So if you want to reconstruct the sin(x) in Nick's image, you'd need more samples of it. It doesn't have to be sampled at a higher frequency; it can instead be sampled for a longer time. If you do that, you'll see that the samples show "beating": places where the signal is in phase with the sampling grid and the samples show a high amplitude, and other places where it is out of phase and the samples just show the mean value.
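A small sketch of that beating effect, assuming a sinusoid at 0.45 cycles per sample (just below the Nyquist limit of 0.5); the frequency and the helper names sample_sine and local_peak are my own choices for illustration, not from Nick's image:

```python
import math

def sample_sine(freq, count, phase=0.0):
    """Sample sin(2*pi*freq*n + phase) at integer n; freq is in cycles per sample."""
    return [math.sin(2 * math.pi * freq * n + phase) for n in range(count)]

def local_peak(samples, window):
    """Largest |sample| in each non-overlapping window: a crude envelope."""
    return [max(abs(s) for s in samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]

# Hypothetical test signal: 0.45 cycles/sample, sampled "for a longer time".
samples = sample_sine(0.45, 40)
envelope = local_peak(samples, 2)
# samples[5] is +/-1 (in phase with the grid), samples[10] is ~0 (out of phase).
```

The envelope swings between roughly 0.3 and 1.0 with a period of 10 samples, even though every sample comes from a constant-amplitude sine.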
Now it's time to use the mathemagical function "sinc", which is defined as sinc(x)=sin(x)/x (or sometimes sin(pi*x)/(pi*x)). Note that the envelope of the "tails" of that function is abs(1/x). The integral of abs(1/x) from N to inf is infinite, which hints that when reconstructing with a sinc, it can "collect information" from samples very far away from the point you're reconstructing. And in Nick's example (but sampled for a longer time), it will collect information from the areas where the sin(x) is clearly visible, and it will be able to reconstruct the sin(x) perfectly.
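A minimal sketch of sinc reconstruction (Whittaker-Shannon interpolation) using the sin(pi*x)/(pi*x) form, assuming the same hypothetical 0.45 cycles/sample sine. Since a real filter can't be infinite, the sum is cut off at a chosen half_width (my own parameter); with a wide enough window the far-away in-phase samples are included and the value between samples comes out close to the true sine:

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(sample_at, t, half_width):
    """Sinc interpolation at time t, truncated to samples within half_width of t.

    sample_at(n) returns the n-th sample; the ideal formula sums over all n."""
    lo = math.ceil(t - half_width)
    hi = math.floor(t + half_width)
    return sum(sample_at(n) * sinc(t - n) for n in range(lo, hi + 1))

freq = 0.45  # cycles per sample, just below Nyquist (0.5); illustrative choice

def signal(n):
    return math.sin(2 * math.pi * freq * n)

t = 100.25  # a point between samples
exact = math.sin(2 * math.pi * freq * t)
approx = reconstruct(signal, t, half_width=2000)
```

Shrinking half_width shows how much the result depends on samples far from t; because the tails only fall off as 1/x, the truncation error shrinks slowly as the window grows.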
So as long as all frequencies in the signal are below the Nyquist frequency, it can be reconstructed, including the phase.
But that's all very theoretical, since in practice we can't use a filter with infinite extent. And even if we had the computing power, would we want to? I don't think so.
Nyquist theorem requires the input signal to be perfectly band-limited before sampling, and that's not the case with our textures. They most likely have some frequency components above the Nyquist frequency, and those alias onto lower frequencies when sampled, so a sinc can't do a correct reconstruction anyway.
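Why a component above Nyquist is lost for good: once sampled, it produces exactly the same samples as a lower-frequency component, so no reconstruction filter, sinc included, can tell the two apart. A sketch with a cosine at a hypothetical 0.55 cycles per sample; alias_frequency is an illustrative helper of mine:

```python
import math

def alias_frequency(freq):
    """Fold a frequency (cycles per sample) into the representable band [0, 0.5]."""
    return abs(freq - round(freq))

def sampled_cos(freq, count):
    """Sample cos(2*pi*freq*n) at integer n."""
    return [math.cos(2 * math.pi * freq * n) for n in range(count)]

high = sampled_cos(0.55, 32)                  # above Nyquist
low = sampled_cos(alias_frequency(0.55), 32)  # 0.45, below Nyquist
# The two sample sequences are identical (up to rounding).
```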
And what says that reconstructing all frequency components up to a certain frequency perfectly, and cutting out the frequencies above it completely, produces a visually pleasing image? If you've seen how a perfectly low-pass filtered square wave looks (lots of ringing at the edges, the Gibbs phenomenon), you'd probably agree that a simpler filter could give it a much nicer look.
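To put a number on that ringing: an ideal low-pass filter applied to a square wave keeps only the odd harmonics up to the cutoff, i.e. a truncated Fourier series. This sketch (cutoff at the 101st harmonic, an arbitrary choice of mine) scans just past the rising edge for the overshoot:

```python
import math

def lowpassed_square(x, highest_harmonic):
    """Square wave (+1 on (0, pi), -1 on (pi, 2*pi)) with every harmonic
    above highest_harmonic removed, i.e. ideally low-pass filtered."""
    return (4 / math.pi) * sum(math.sin(k * x) / k
                               for k in range(1, highest_harmonic + 1, 2))

# Scan just to the right of the rising edge at x = 0.
xs = [i * 0.1 / 2000 for i in range(1, 2001)]
peak = max(lowpassed_square(x, 101) for x in xs)
```

The peak lands near 1.18, an overshoot of about 9% of the full jump from -1 to +1. Raising highest_harmonic squeezes the ripples closer to the edge but doesn't shrink that overshoot.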
So the Nyquist theorem, and the idea of keeping all frequencies up to a point and nothing above, is theoretically possible and a nice rule of thumb for what you want. But it's not necessarily the optimal solution, even when optimizing purely for quality.
What about the phase alignment problems (that darkblu talks about)?
It's of course very true for filters with a smaller extent than a sinc(x), like bilinear. Then the out-of-phase areas will look much like a low-pass filtered version of the texture. In other words, approximately like the next mipmap level, but without the gamma correction you'd apply when downsampling.
One solution is to convert to linear form before the bilinear filter, keep it in linear form through the PS, and gamma-convert when it's stored in the frame buffer (like Xmas said). Floating point in the PS is plenty to keep the precision until the result is stored in the frame buffer. (I think you were a bit too negative, Nick. It's of course nice to have a high-precision frame buffer for fog/dust/smoke/HDR, but that's not directly related to the problem we're talking about now.)
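A sketch of why the order matters, assuming a plain power-law gamma of 2.2 (not the exact sRGB curve) and the worst case of filtering a black texel against a white one, as in the out-of-phase areas. The function names are illustrative only:

```python
GAMMA = 2.2  # assumption: pure power-law gamma, not the exact sRGB curve

def to_linear(encoded):
    """Gamma-encoded value in [0, 1] -> linear intensity."""
    return encoded ** GAMMA

def to_encoded(linear):
    """Linear intensity in [0, 1] -> gamma-encoded value."""
    return linear ** (1 / GAMMA)

def average_gamma_space(a, b):
    """Filter the stored (gamma-encoded) texel values directly."""
    return (a + b) / 2

def average_linear_space(a, b):
    """Convert to linear, filter, and gamma-encode only when storing."""
    return to_encoded((to_linear(a) + to_linear(b)) / 2)

naive = average_gamma_space(0.0, 1.0)     # 0.5 encoded
correct = average_linear_space(0.0, 1.0)  # about 0.73 encoded
```

The naive result displays at only about 22% of white intensity (0.5 ** 2.2) instead of 50%, so filtered areas come out too dark; filtering in linear space gives the expected half intensity.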
Or you could introduce compact floating-point textures, like an R9G9B9E5 format where E5 refers to a 5-bit block exponent shared by the three channels. That way you could keep textures in linear format, and even get some HDR in there too.
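A sketch of what packing into such a format could look like. The post only specifies 9-bit channels plus a 5-bit shared exponent; the exponent bias of 15, the lack of an implicit leading one, and the rounding steps are my assumptions, borrowed from the shared-exponent scheme in the OpenGL EXT_texture_shared_exponent extension (RGB9_E5) as I understand it:

```python
import math

MANTISSA_BITS = 9
EXP_BIAS = 15   # assumption, following EXT_texture_shared_exponent
MAX_EXP = 31    # largest 5-bit exponent
# Largest representable value: (511/512) * 2^16 = 65408.
MAX_VALUE = (2**MANTISSA_BITS - 1) / 2**MANTISSA_BITS * 2.0 ** (MAX_EXP - EXP_BIAS)

def encode_rgb9e5(r, g, b):
    """Pack three non-negative floats into 9-bit mantissas + one shared exponent."""
    rc, gc, bc = (min(max(c, 0.0), MAX_VALUE) for c in (r, g, b))
    max_c = max(rc, gc, bc)
    if max_c == 0.0:
        return 0, 0, 0, 0
    # floor(log2(max_c)) computed exactly via frexp (max_c = m * 2**e, 0.5 <= m < 1).
    floor_log2 = math.frexp(max_c)[1] - 1
    exp_shared = max(-EXP_BIAS - 1, floor_log2) + 1 + EXP_BIAS
    scale = 2.0 ** (exp_shared - EXP_BIAS - MANTISSA_BITS)
    if math.floor(max_c / scale + 0.5) == 2**MANTISSA_BITS:
        # Rounding overflowed the 9-bit mantissa: use the next exponent.
        exp_shared += 1
        scale *= 2.0
    r_m, g_m, b_m = (math.floor(c / scale + 0.5) for c in (rc, gc, bc))
    return r_m, g_m, b_m, exp_shared

def decode_rgb9e5(r_m, g_m, b_m, exp_shared):
    """Unpack back to floats: value = mantissa * 2^(exp - bias - 9)."""
    scale = 2.0 ** (exp_shared - EXP_BIAS - MANTISSA_BITS)
    return r_m * scale, g_m * scale, b_m * scale

packed = encode_rgb9e5(100.0, 0.001, 3.14159)  # (400, 0, 13, 22)
```

The trade-off of a block exponent shows up directly: every channel uses the brightest channel's exponent, so 0.001 next to 100.0 rounds to zero and 3.14159 comes back as 3.25. For textures where the channels have similar magnitudes that loss is small.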
What mip level is actually used?
I made a test with Xmas' Texture Filter TestApp. (The original of the various AF testers floating around?) My R300 doesn't just go below 2x2 pixels per texel, it even goes slightly below 1x1 pixel per texel in trilinear. In bilinear filtering it of course varies over the mip band, and I've seen it as low as 0.7x0.7 pixels per texel.
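A sketch of why nearest-mip ("bilinear") filtering varies over the mip band, assuming an isotropic footprint (no AF), LOD = log2(texels per pixel) + bias, and round-to-nearest level selection. Real hardware like the R300 computes the LOD with its own approximations, so this is only consistent with the 0.7 figure (2^-0.5 is about 0.707), not a model of the chip. The function names and the max_level default are hypothetical:

```python
import math

def select_mip_nearest(texels_per_pixel, bias=0.0, max_level=12):
    """Pick a mip level by rounding the LOD (log2 of texels per pixel,
    plus an optional LOD bias) and clamping it to the mip chain."""
    lod = math.log2(texels_per_pixel) + bias
    level = math.floor(lod + 0.5)  # round half up
    return min(max(level, 0), max_level)

def pixels_per_texel(texels_per_pixel, level):
    """Screen pixels covered by one texel of the given mip level, per axis."""
    return 2.0 ** level / texels_per_pixel

# Sweep the LOD from 0 to 8 in small steps and record the ratio actually seen.
ratios = []
for k in range(801):
    tpp = 2.0 ** (k / 100)
    ratios.append(pixels_per_texel(tpp, select_mip_nearest(tpp)))
```

With bias 0 the ratio stays between about 0.71 and 1.41 pixels per texel; a positive bias (as in the LOD bias test below) shifts that whole range toward coarser levels, e.g. a bias of 1.3 turns 4 texels per pixel into level 3 at 2 pixels per texel.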
I also tested how much lod bias I needed to get rid of all visible aliasing.
Gamma set to 2.2 (framebuffer in linear format). Texture #0 for maximal torture. 16xAF.
With negative LOD bias there's some massive moiré. Most of those patterns are gone at bias 0.0, but some remain up to around 0.2. After that, there's some moiré left that won't disappear until LOD bias 1.3, though (especially if you rotate the texture).
Doh! Another essay. Sorry 'bout that.