DemoCoder said:
I'm just counting the norms (2 norms expanded out; "norm++" means "+1 norm"). There are 4 norms being done. These can potentially execute on NV HW with single-cycle throughput if the compiler recognizes them. I'm not saying anything about how you could rewrite them with macros or not (although using NRM would probably be superior, since it is easier for the compiler to map).
I think it is a relevant observation that roughly 1/3 of the instructions are performing norms.

Ah, I see. But for nVidia hardware, you'd want to add an extra MUL for each TEX, making it:
3:27 for ATI
3:19 for NV
(I think you're off by one on your 3:15 number, because the MAD would become an ADD, not be removed entirely.)
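Spelled out, the NV figure comes from DemoCoder's 3:15, plus the MAD that survives as an ADD, plus one MUL per TEX. This assumes the first number in each ratio is the TEX count (3) and the second is the arithmetic instruction count; that reading is inferred from the numbers, not stated outright:

$$
\underbrace{15}_{\text{DemoCoder's count}} + \underbrace{1}_{\text{MAD}\,\to\,\text{ADD}} + \underbrace{3 \times 1}_{\text{one extra MUL per TEX}} = 19
$$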
Still a big difference, of course.
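DemoCoder's point that the expanded norms only get single-cycle throughput "if the compiler can recognize them" can be sketched as a peephole pass. This is a hypothetical illustration, not how nVidia's driver compiler actually works: the tuple instruction format, the register names, and `fold_norms` are all made up for the example. It assumes an expanded norm is the usual DP3 / RSQ / MUL sequence, i.e. normalize(v) = v * 1/sqrt(dot(v, v)), and it only matches the three instructions when they are adjacent.

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, dst, *srcs), registers as plain strings.
Instr = Tuple[str, ...]


def fold_norms(prog: List[Instr]) -> List[Instr]:
    """Replace an adjacent DP3/RSQ/MUL normalize sequence with a single NRM.

    Simplified sketch: it does not check whether the temporaries written by
    the DP3 and RSQ are read again later in the program.
    """
    out: List[Instr] = []
    i = 0
    while i < len(prog):
        window = prog[i:i + 3]
        if len(window) == 3:
            a, b, c = window
            if (a[0] == "dp3" and a[2] == a[3] and a[1] != a[2]  # t = dot(v, v), v not clobbered
                    and b[0] == "rsq" and b[2] == a[1] and b[1] != a[2]  # s = 1/sqrt(t)
                    and c[0] == "mul" and set(c[2:]) == {a[2], b[1]}):  # d = v * s
                out.append(("nrm", c[1], a[2]))
                i += 3
                continue
        out.append(prog[i])
        i += 1
    return out


# Usage: a made-up fragment with one expanded norm between a TEX and an ADD.
prog = [
    ("texld", "r0", "t0", "s0"),
    ("dp3", "r1", "r0", "r0"),
    ("rsq", "r2", "r1"),
    ("mul", "r3", "r0", "r2"),
    ("add", "r4", "r3", "c0"),
]
print(fold_norms(prog))
```

Matching on a fixed three-instruction pattern is the reason an explicit NRM is easier to map: the compiler is handed the operation directly instead of having to rediscover it, and a pass this naive misses the sequence as soon as other instructions are scheduled in between.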