Now whatever you want to call it - effective fillrate, actual fillrate, bandwidth-limited fillrate, real-world fillrate, whatever the **** you want to call it - the rate at which a GF4 Ti4600 outputs pixels drops 70% with MSAA.
I think this is probably where the problems arise. In DX8 and DX9 titles, fillrate is/will be primarily limited by shaders (and therefore, clockspeed), rather than bandwidth. Bandwidth-limited fillrate is primarily an issue for DX7-era engines that the hardware can run at full speed.
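To put that distinction in rough terms, here is a toy bottleneck model. It is only a sketch: the function name, the parameters, and every number in the usage note are made up for illustration and are not measurements of any real card. It simply treats the achievable fillrate as whichever of the two limits is lower.

```python
def effective_fillrate(pipes, clock_hz, cycles_per_pixel,
                       bandwidth_bytes_per_s, bytes_per_pixel):
    """Toy model: pixels/s is capped by the slower of shading and memory."""
    # Shader/clock limit: each pipe needs `cycles_per_pixel` clocks per pixel.
    shader_limit = pipes * clock_hz / cycles_per_pixel
    # Bandwidth limit: every pixel moves `bytes_per_pixel` bytes to/from memory.
    bandwidth_limit = bandwidth_bytes_per_s / bytes_per_pixel
    return min(shader_limit, bandwidth_limit)
```

With hypothetical figures (4 pipes, 300 MHz, 10 GB/s, 12 bytes of colour and Z traffic per pixel), a 1-cycle pixel comes out bandwidth-limited, while an 8-cycle shader drops the shader limit to 150 Mpixels/s, well below what the memory could sustain. Real hardware overlaps and caches far more than this model admits.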
I don't know what your math background is, but there is no way in hell you can calculate a derivative from a single value. You need at least neighbouring points to approximate the derivative. And even if NV30 by some miracle is smart enough to take neighbouring texture samples, who says we're limited to texture samples? A branching condition, if dynamic, will often be based on a mathematical expression, which can in turn involve texture inputs. Unless NV30 is given this expression, it can't calculate DDX/DDY. I think these functions are for the derivatives of the texture coordinates, but I'm not sure. I am 100% sure that you can't get the derivative of an arbitrary function from a single value.
I'm not sure what your background with hardware engineering is, but the hardware already knows neighboring values for expressions inside shaders - what do you think the other 3 pipes in a 4 pipe system (or 7 in an 8 pipe system) are calculating? This is precisely the recommended method for computing the MIP level (and line of anisotropy), extended to allow inputs from temporary registers.
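As a rough illustration of the idea (not a claim about how NV30 actually implements it), suppose the same expression is evaluated for all four pixels of a 2x2 block in lockstep. The derivatives then fall out as simple differences between neighbours. The function name `quad_derivatives` and the callable `f` standing in for "whatever is in the temporary register" are my own assumptions.

```python
def quad_derivatives(f, x, y):
    """Finite-difference ddx/ddy over the 2x2 quad whose top-left pixel is (x, y).

    `f(x, y)` stands in for any per-pixel shader expression.
    Returns ((ddx_top, ddx_bottom), (ddy_left, ddy_right)).
    """
    # One evaluation per pixel of the quad, as four pipes would produce.
    v00 = f(x, y)          # top-left
    v10 = f(x + 1, y)      # top-right
    v01 = f(x, y + 1)      # bottom-left
    v11 = f(x + 1, y + 1)  # bottom-right

    # Horizontal differences approximate d/dx for each row.
    ddx = (v10 - v00, v11 - v01)
    # Vertical differences approximate d/dy for each column.
    ddy = (v01 - v00, v11 - v10)
    return ddx, ddy
```

For example, with `f = lambda x, y: 3*x + 5*y + x*y` at (2, 4), the horizontal differences match the analytic derivative 3 + y on each row and the vertical ones match 5 + x on each column. No pixel ever needed the expression in symbolic form, only its neighbours' results.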
Again, about my fillrate arguments:
All you've shown is that a GeForce 4, playing Quake III at 1600x1200, 32bpp (I assume), suffers a 70% performance hit with 4xMSAA.
However (hypothetically), what if Z compression only works at resolutions below 1600x1200? What if some option you've selected causes Z rejection to be disabled?
Unless you know exactly what the driver and hardware are doing behind the scenes, you can't derive any broad conclusions from your sample. What if 1280x1024 2xAA is significantly better optimized than 1600x1200 4x? The various execution units in the hardware interact in complex ways (which are completely undocumented to the public). Just because one arrangement of game settings causes a 70% drop in performance by enabling AA doesn't mean all will.