Inside GeForce FX's fragment shader?

Nvidia has stated that the pixel shaders of the NV30 have abilities above and beyond the vertex shaders of the NV2X architectures. If you view the following:

http://developer.nvidia.com/docs/IO/1310/ATT/AUserProgrammableVertexEngine.pdf

and scroll to the section on hardware implementation, you will notice the organization of the NV2X architecture's geometry engine. Does anyone surmise that the fragment shaders of the NV30 are arranged in a similar or different manner? What is the possibility of the NV30 having these 5 units per shader pipe? If true, per-cycle shader efficiency with 32-bit floats would be similar to the R300. Do any of you think the pixel shader could simultaneously execute a scalar op alongside a 4-component vector op, as in the NV2X vertex architecture? Would there be any performance cons to this type of configuration?

Any sort of knowledge/speculation/expertise would be appreciated.

Thank you.
 
I know it seems to be a rather irrelevant thread, but we should not treat AA/Aniso, alongside other user-friendly features, as the only relevant measuring sticks for VPUs. The day will come when we dissect these visual processing creatures and evaluate their processing schemes, units, and implementations, something like the reviews that introduce Intel/AMD cores. In those, aside from admiring branch prediction, caches, memory management, possible applications, etc., the ALU processing architecture is closely examined. I wish reviewers/enthusiasts would get into this a little more (although it is hard to scrape up information without a Beyond 3D interview).
 
That's interesting. nVidia's docs say that they have merged the functionality of the FP and VP units (fragment processor and vertex processor...I think those names are better-suited, so I'll try to use them...but anyway...), so they may indeed have copied large sections of the NV2x vertex processor over to the NV30's fragment processor. At the same time, they may well have dropped pieces that were deemed unnecessary. After all, we're talking about a move from two units in the NV25 to eight units in the NV30.
 
There are similarities: conditionals and other instructions were also present in the vertex shader of old, and it seems to be the same in the NV30 fragment processor. Nvidia has just added support for partial derivatives and some other things. What I question is whether or not the units are organized the same way. I remember Kirk stating there were 32 total units in the pixel processor of the NV30. With 8 arrays of these (8 "virtual" pipelines), we are looking at 4 units per pipeline and not 5. Where did Nvidia add the functionality of the "special function unit" (noted in the above document, in reference to the NV2X vertex shader), which deals directly with complex functions? Maybe they went with units similar to the NV30's vertex processing elements, which seem to be general enough to take care of all these things without special function support. I mean, the fragment units are flexible enough to predicate, pack/unpack, and run more than 1000 instructions per pass. They do seem a little more capable than yesteryear's vertex shader.
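
For what it's worth, here is the unit-count arithmetic spelled out as a tiny Python sketch. The 32 and 8 are just the figures quoted above from memory, and the variable names are made up for illustration; nothing here is confirmed hardware data.

```python
# Figures as recalled in this post; treat them as assumptions, not specs.
total_units = 32   # total units attributed to the NV30 pixel processor
pipelines = 8      # the eight "virtual" pipelines

# Units available to each pipeline if the 32 are split evenly.
units_per_pipe = total_units // pipelines

# Total units that an NV2X-style layout of 5 units per pipe would need.
units_if_five_per_pipe = pipelines * 5

print(units_per_pipe)
print(units_if_five_per_pipe)
```

An even split gives 4 per pipe, while 5 per pipe would need 40 units in total, which is why the 5-unit layout does not fit the 32 figure.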
 
Anyone else care to comment? It seems to me now that Nvidia may have gone with 4 more general-purpose FPUs for its fragment pipes, as in the vertex shader. With this approach, it could carry out lighting calcs that do not require high precision alongside other functions that do. This would seem a more flexible arrangement, although it may be less efficient at executing vector and scalar ops in parallel.
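
To make the vector/scalar trade-off a bit more concrete, here is a toy cycle-count model in Python. Every rule in it is an assumption made up for illustration (one instruction issued per cycle, no dependencies, greedy pairing of neighbouring ops), and the function names are hypothetical; it is not a description of how the NV2X or NV30 actually schedule work.

```python
# Toy model only: all issue rules below are assumptions, not real hardware.

def cycles_vec4_plus_scalar(ops):
    """4-wide vector unit plus a separate scalar unit.

    A vector op ('v') and an adjacent scalar op ('s') may share one
    cycle; data dependencies between them are ignored.
    """
    cycles = 0
    i = 0
    while i < len(ops):
        # Co-issue when the next op is of the other kind.
        if i + 1 < len(ops) and ops[i] != ops[i + 1]:
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


def cycles_four_general(ops):
    """Four general-purpose units, one instruction issued per cycle.

    A scalar op occupies one unit and leaves the other three idle.
    """
    return len(ops)


stream = list("vsvsvvss")  # hypothetical mix of vector and scalar ops
print(cycles_vec4_plus_scalar(stream), cycles_four_general(stream))
```

Under these assumptions the mixed stream takes fewer cycles on the vector-plus-scalar layout, while a stream of only vector ops takes the same number of cycles on both. Whether real hardware behaves anything like this depends on details nobody outside Nvidia has confirmed.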
 