Pocketmoon_ has done the best investigation into this so far; see his article and the accompanying forum thread.
His article clearly shows that, when using the proprietary NV_fragment_program extension for OpenGL, each step down in precision, from FP32 to FP16 to INT12, usually brings a very large performance benefit. The standard ARB_fragment_program OpenGL extension does not allow precision hints at all. DX9's PS 2.0
does allow a partial-precision hint (_pp), but both the data you point to and all other data on the subject show that
with current drivers it makes absolutely no performance difference. (Incidentally, pocketmoon_ didn't try compiling for _pp PS 2.0, presumably either because Cg does not currently allow it or because we had already established that it makes no difference.)
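To make concrete what a partial-precision hint trades away numerically, here is a minimal Python sketch (my own illustration, not from pocketmoon_'s data) that round-trips a value through IEEE-754 half and single precision using the struct module's 'e' and 'f' formats:

```python
import struct

def quantize(x, fmt):
    # Round-trip a Python float (FP64) through an IEEE-754 format:
    # 'e' = half precision (FP16), 'f' = single precision (FP32).
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

x = 1.0 / 3.0
# FP16 keeps roughly 3 decimal digits, FP32 roughly 7, so the FP16
# round-trip lands noticeably farther from the true value.
print(quantize(x, 'e'))
print(quantize(x, 'f'))
```

The same rounding happens per instruction on the GPU when a shader runs at FP16 instead of FP32, which is why long dependent calculation chains (not simple color math) are where the precision difference becomes visible.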
So the NV_fragment_program data demonstrates clearly that the NV30 architecture really is capable of executing FP16 faster than FP32. Presumably this capability will eventually be exposed to PS 2.0, but it isn't with current drivers. One conspiracy theory as to why is that the current drivers force FP16 in
all cases; in other words, _pp "works", but standard PS 2.0, which calls for a minimum of FP24 for certain operations, does not.
Pocketmoon_'s results may actually be read as preliminary evidence for this theory: the PS 2.0 path generally performs between the FP32 and FP16 NV_fragment_program results, when one might expect it to do no better than FP32 NV_fragment_program, since NV_fragment_program is "closer to the metal" of the actual NV30 architecture.

But, AFAIK, there hasn't been any actual investigation into what precision is actually being used, e.g. by examining the output of a Mandelbrot-generating shader (which should be highly sensitive to changes in calculation precision). It bears mentioning that any WHQL drivers would presumably have to enable FP32 as the default for PS 2.0.
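A sketch of the kind of test this would be, in Python rather than in a shader (the sample point and iteration cap are my own illustrative choices): run the escape-time iteration once at full double precision and once with every intermediate rounded to FP32, then compare iteration counts. Near the set boundary the counts diverge, which is what makes a Mandelbrot shader a usable precision probe.

```python
import struct

def to_fp32(x):
    # Round a Python float (FP64) to IEEE-754 single precision,
    # mimicking hardware that computes at FP32 internally.
    return struct.unpack('f', struct.pack('f', x))[0]

def mandel_iters(cr, ci, max_iter=200, quantize=None):
    # Escape-time iteration count for c = cr + ci*i, optionally
    # quantizing every intermediate result to a lower precision.
    q = quantize or (lambda v: v)
    zr = zi = 0.0
    for i in range(max_iter):
        zr, zi = q(zr * zr - zi * zi + cr), q(2.0 * zr * zi + ci)
        if zr * zr + zi * zi > 4.0:
            return i
    return max_iter

# Probe a point near the boundary, where iteration counts are most
# sensitive to the working precision of the arithmetic.
c = (-0.7436438870371587, 0.13182590420531197)
print(mandel_iters(*c))                    # full FP64
print(mandel_iters(*c, quantize=to_fp32))  # FP32-quantized
```

Rendering the whole image and diffing the two would reveal driver precision the same way: if _pp and non-_pp PS 2.0 produce identical banding, the driver is using one precision for both.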