The way I think of it is different (you know about this, I think, Mufu):
I think what we see here is the way the vertex processing pipeline works being transferred more completely to the fragment processing pipeline.
Namely, I view both pipelines as being architected as one "uber" scheduling unit attached to a floating point unit (handling all the branching and special "2.0+" functionality), plus one processing unit (or set of units?) with a much narrower range of simple calculation functionality, attached to a simpler processing handler (the register combiner). The difference was that in the vertex processing pipeline the second unit had fp32 precision, while in the fragment processing pipeline it was limited to fx12 (for the NV30).
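To give a feel for the fp32 vs fx12 gap, here is a rough sketch in Python. It assumes fx12 is a 12-bit two's-complement fixed point format with 10 fractional bits (so a range of about [-2, 2) in steps of 1/1024) and that out-of-range values saturate. The helper names `to_fx12` and `to_fp32` are made up for illustration, and this says nothing about how the hardware actually rounds.

```python
import struct

# Assumption: fx12 = 12-bit two's complement with 10 fractional bits.
FX12_FRACTION_BITS = 10
FX12_RAW_MIN = -2048   # lowest raw 12-bit value, represents -2.0
FX12_RAW_MAX = 2047    # highest raw 12-bit value, represents 2047/1024

def to_fx12(x: float) -> float:
    """Quantize x to the assumed fx12 grid and return the value it represents."""
    scale = 1 << FX12_FRACTION_BITS
    raw = round(x * scale)
    raw = max(FX12_RAW_MIN, min(FX12_RAW_MAX, raw))  # saturate (assumed behaviour)
    return raw / scale

def to_fp32(x: float) -> float:
    """Round-trip x through an IEEE 754 single to see its fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

if __name__ == "__main__":
    for value in (0.3, 1.999, 3.0, -3.0):
        print(value, to_fx12(value), to_fp32(value))
```

The point is only that the simpler unit on the fragment side works on a much coarser and much narrower grid of values than the fp32 unit on the vertex side.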
I also thought there were 2 tex op units but that the "uber" unit was tied up when using them for anything other than fixed texture fetching (in NV3x, limited to fragment processing usage).
I then thought that this NV3x design would make it easier for the NV40 to achieve effective symmetry between the vertex and fragment processing pipelines, and then possibly to remove redundancy by being able to use the resources for either one dynamically, lending itself easily to a unified shading model.
What it seems like now is that the NV35 took the first step in this direction. The mystery is not how it did this, but how the NV30 failed to do it with its transistor budget, as it is only the NV30's transistor count and restricted capabilities that hid this possibility, AFAICS. That the NV35 can achieve this opens up the possibilities for NV40 again...
Simply allowing the "tex op" resources to be used by the vertex programming pipelines would move a great deal in this direction, wouldn't it?
What would be needed for a primitive processor... some sort of expanded "tex op"-like unit treating vertices in a texture-like fashion? What else?
Anyway, unless we see functionality or peak ("low" precision) performance dropped in the NV35 relative to the NV30, it seems, IMO, that the NV30 holds the record for the most wasteful chip design released, and that people who bought into the NV3x hype before the NV35 are getting burned in a major way. It has been clear that the worst of nVidia has been on display in full force in the handling of the NV3x, but at least this indicates that engineering competitiveness is no longer absent.