It has been confirmed to me by both MS and NVIDIA that except for constant registers, precision must be FP24 (constant register precision can be FP16) unless the _PP flag is set.
While I haven't seen the complete spec, Amar Patel (the guy at MS who does this stuff) posted the relevant portion to the DirectXDev mailing list, which I've cut and pasted below (yes, I'm geeky enough to have the entire DirectXDev mailing list archive).
Basically, the entire trouble was caused by me.
I didn't agree with the presentation NVIDIA gave at Dusk To Dawn, so I brought it up and MS agreed with me. NVIDIA argued but MS stuck to their guns (much to NVIDIA's annoyance); this was confirmed to me by an NVIDIA DevRel a couple of weeks ago (I won't post the email as it's private). The matter is officially closed: PS_2_0 is FP24 minimum without the partial precision flag. The SDK update (coming soon) will have an optional FORCE_PARTIAL_PRECISION flag that can be passed to the HLSL compiler to force all floats into halfs, but that's a developer's choice. If you use HLSL half or _pp you will get the FP16 paths on GFFX (but no difference on ATI R3x0 cards).
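For a rough feel of what FP16 versus FP24 means in practice, here's a small Python sketch (mine, not from the spec). It takes the s10e5 and s16e7 layouts from the spec excerpt quoted below and assumes, IEEE-style, one sign bit plus an implicit leading 1 on the mantissa. The excerpt doesn't spell that out, so treat the numbers as ballpark. The helper names are made up for illustration.

```python
# Rough comparison of the two minimum precisions named in the ps_2_0 spec excerpt.
# Assumption: "sMeE" means 1 sign bit, M stored mantissa bits, E exponent bits,
# with an implicit leading 1 (IEEE-style). The excerpt does not state this.

def total_bits(mantissa_bits, exponent_bits):
    """Storage width: sign bit + mantissa bits + exponent bits."""
    return 1 + mantissa_bits + exponent_bits

def relative_step(mantissa_bits):
    """Spacing between adjacent representable values in [1, 2)."""
    return 2.0 ** -mantissa_bits

# Hypothetical table; only the two layouts from the excerpt are listed.
FORMATS = {
    "s10e5": (10, 5),  # partial precision (_pp / HLSL half)
    "s16e7": (16, 7),  # ps_2_0 minimum for r# and t# registers
}

if __name__ == "__main__":
    for name, (m, e) in FORMATS.items():
        print(name, "bits:", total_bits(m, e), "step near 1.0:", relative_step(m))
```

Under those assumptions s10e5 comes out at 16 bits and s16e7 at 24 bits, and the step near 1.0 is 64 times coarser for s10e5 than for s16e7.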
A quote from Amar Patel from MS:
"Here's a cut&paste from our spec, with the one typo in it corrected
(noted with **).
[from ps_2_0 section]
---Begin Paste---
Internal Precision
- All hardware that support PS2.0 needs to set
D3DPTEXTURECAPS_TEXREPEATNOTSCALEDBYSIZE.
- MaxTextureRepeat is required to be at least (-128, +128).
- Implementations vary precision automatically based on precision of
inputs to a given op for optimal performance.
- For ps_2_0 compliance, the minimum level of internal precision for
temporary registers (r#) is s16e7** (this was incorrectly s10e5 in spec)
- The minimum internal precision level for constants (c#) is s10e5.
- The minimum internal precision level for input texture coordinates
(t#) is s16e7.
- Diffuse and specular (v#) are only required to support [0-1] range,
and high-precision is not required.
---End Paste---
For ps_3_0 the requirements are the same, however interpolated input
registers are now defined by semantic names. Inputs here behave like t#
registers in ps_2_0: they default to s16e7 unless _pp is specified
(s10e5).
Note that specifying _pp on an input register only affects how they are
read into temp registers or what precision ALU math might run on an op
reading an input as a parameter. However texld* instructions that take
in unmodified texture coordinates will not be affected by the _pp
modifier, as the texture coordinate iterators are of fixed precision.
amar
"