If NVIDIA so desperately wanted a different shader, why not lobby FM for a different style of coding? More likely they did, and it was too questionably unbalanced and/or proprietary, so FM rejected it.
I think it's not as simple as that. NVIDIA designed a GPU that simply cannot be used optimally with DX9 shaders. Namely, the FX has separate float and int units, which can be used in parallel. But DX9 does not support that: a shader is either full float (ps2.0+) or full int (ps1.0-1.3). ps1.4 does not count as int here, because the FX has to emulate it on the float units, and in 32 bit mode at that, since 16 bit does not have enough precision. The reason seems to be that the int pipeline is more or less literally copy-paste work from the GF4 design.
Personally I would say that this actually means the card is not a DX9 card. "Real" DX9 cards, such as the R3x0, Volari and DeltaChrome, have only one pipeline, which handles floats only, and int shaders are simply emulated on these float units. That's why they use 24 bit precision. The 16 bit mode of NVIDIA makes no sense, since it's not precise enough for int emulation...
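To make the precision argument a bit more concrete, here is a rough sketch in Python. The numbers in it are my assumptions, not something from a spec sheet: I take a hypothetical signed fixed-point format with 12 fractional bits over the range [-8, 8) as the "int" format to emulate, and I model the 24 bit float as having 16 explicit mantissa bits, which is how it is commonly described. The helper names (`to_fp16`, `round_to_mantissa`, `count_inexact`) are made up for the example. It just counts how many fixed-point values do not survive a round trip through each float format.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (10 explicit mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_mantissa(x: float, bits: int) -> float:
    """Round x to `bits` explicit mantissa bits (exponent range is ignored).

    Used here as a stand-in for a 24 bit float, assuming 16 mantissa bits.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (bits + 1)       # explicit bits plus the implicit leading 1
    return math.ldexp(round(m * scale) / scale, e)


def count_inexact(convert, frac_bits: int = 12, limit: int = 8) -> int:
    """Count fixed-point values in [-limit, limit) that `convert` changes.

    The fixed-point format (12 fractional bits, range [-8, 8)) is an
    assumption chosen for illustration only.
    """
    step = 2 ** frac_bits
    bad = 0
    for k in range(-limit * step, limit * step):
        value = k / step          # exact in double precision
        if convert(value) != value:
            bad += 1
    return bad


if __name__ == "__main__":
    print("inexact with 16 bit float:", count_inexact(to_fp16))
    print("inexact with 16 mantissa bits:",
          count_inexact(lambda v: round_to_mantissa(v, 16)))
```

Under these assumptions the 16 bit float already loses values just above 1.0 (for example 1 + 1/4096 collapses to 1.0), while the wider mantissa represents every value of the fixed-point format exactly. With a narrower fixed-point format the gap would be smaller, so take it as an illustration of the idea rather than a statement about any particular chip.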
Sure, NVIDIA's dedicated int pipeline is really fast, but they have only 4 of them on the card. 8 float pipelines are about as fast anyway, if not faster. Besides, the future is not int, it's float. So NVIDIA really made the wrong decision in the design. They chose to bolt a float unit onto what is basically a GF4 design... It's great for simple int shaders, but it dies when you use too many floats. All other manufacturers did the opposite: a bit less int performance, and full float performance. I suppose this was also the intention of the DX9 standard. And since NVIDIA took so long to release the FX, the future is now... There already ARE full float-shaded games.
Anyway, to get to the point... Optimal shaders for the FX are a mix of int and float operations, and these simply cannot be expressed in DX9. So the shaders they replace the DX9 shaders with are probably impossible to integrate into the original application anyway. They basically HAVE to do shader replacement, because the DX9 shaders don't suit the FX (or the other way around?). You can only write such mixed shaders in OpenGL, with their custom extensions. Doom3 also does this... Sadly, the performance still is not too great, and the quality is lower as well.