Luminescent
Veteran
In the analysis of NVIDIA's flop number, it was observed that the pixel pipeline of the NV30 can execute 2 FMADs, or 2 ALU instructions, per cycle. I am curious about the precision of each FMAD: are the units working on full-float or half-float data to reach this number (51 GFLOPS)? And whichever precision it uses to achieve the stated figure, can the NV30 issue two distinct instructions to the ALUs, or is it just one ALU working on two groups of data? For example, suppose the NV30 can only execute 2 ALU instructions on half-float data. For the 2 sets of half floats (16 per vector unit), would the pixel unit be able to issue a sin/cos instruction to one ALU while issuing a dp4 instruction to the other?
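To make the flop counting behind my question explicit, here is a rough sketch of how I assume a peak figure like this gets tallied: each FMAD counts as one multiply plus one add per vector component. The function name, its parameters, and the example values are hypothetical placeholders of mine, not NVIDIA's published figures, and I am not claiming this reproduces the 51 GFLOPS number.

```python
def peak_gflops(clock_mhz, pipelines, fmads_per_cycle, components):
    """Theoretical peak in GFLOPS under a simple counting model.

    Assumption: one FMAD on an N-component vector counts as 2 * N flops
    (one multiply and one add per component). Real marketing numbers may
    count differently.
    """
    flops_per_fmad = 2 * components
    flops_per_second = (
        clock_mhz * 1e6 * pipelines * fmads_per_cycle * flops_per_fmad
    )
    return flops_per_second / 1e9


# Placeholder inputs only: a made-up 100 MHz part with a single pipeline
# issuing 2 four-component FMADs per cycle.
example = peak_gflops(clock_mhz=100, pipelines=1, fmads_per_cycle=2, components=4)
print(f"{example:.1f} GFLOPS")
```

Note that this model has no precision term. If half-float mode lets a unit process twice as many components per cycle, that would show up as a larger `components` or `fmads_per_cycle` value, which is exactly the part I am unsure about.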
We know that the R300 can execute both a scalar and a vector operation in one cycle (plus a texture op). Although the vector unit in the pixel program processor handles only 3 components (RGB), there is also a scalar unit (I'm guessing for the A channel) that works in parallel with the vec3 processor. Would the R300 hold any advantage over the NV30 because it can execute a scalar and a vector operation simultaneously? Or would the NV30 be able to use one vector unit for scalar processing (math functions) and the other vector unit for RGBA calculations, assuming it can execute 2 arbitrary half-float instructions per cycle in 64-bit mode?
Thank you for any help you can offer.