I really doubt that nVidia would spend time on optimizing something as relatively unknown/unused as ShaderMark and its shaders. Fixed Function - Gouraud seems way too low anyway. I always thought they ran those in FX12, since this was the only part where the FX5800 would beat the R300 initially.
Maybe Thomas Bruckschlegel triggered a performance bug in the already twitchy FX drivers?
Yes, but then it wouldn't run that fast anyway; those speeds without the shuffling are simply insane considering what the NV3x is theoretically supposed to be capable of.
As for your first point - that nVidia wouldn't spend time optimizing something relatively unknown/unused...
That has actually crossed my mind too. The number of places where nVidia cheats is just too big for it to be done by a human workforce, and even more so considering how fast they go from "benchmark starts being used on review sites" to "optimized benchmark".
So either they've got workers in another dimension, or they developed a way to do the cheating automatically. I mean something as insane as this: you'd just run a program with special drivers, and those drivers would automatically output "optimization" information usable by the real drivers.
For shaders, that isn't too hard. With a LOT of performance metrics to go on, and no need to work out the optimizations in real time at all, you could do an awful lot better than the default compiler, which has to run really fast.
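To make the idea concrete, here is a toy sketch in Python of what I mean. Everything in it is made up for illustration: the instruction format, the "stall" cost model, and every function name are my own assumptions and have nothing to do with how any real driver works. The "special driver" step does a slow exhaustive search over instruction orderings offline and stores the best one under a fingerprint of the shader; the "real driver" just looks the fingerprint up and falls back to the fast compiler when it doesn't know the shader.

```python
import hashlib
from itertools import permutations

# Hypothetical toy format: an instruction is (dst, srcs), meaning it writes
# register dst and reads the registers in srcs.
# Assumption: each register is written at most once, so only
# read-after-write dependencies limit reordering.

def fingerprint(shader):
    """Identify a shader by hashing its instruction list."""
    return hashlib.sha1(repr(list(shader)).encode()).hexdigest()

def stalls(shader):
    """Toy cost model: one stall each time an instruction reads the
    register written by the instruction right before it."""
    return sum(1 for prev, cur in zip(shader, shader[1:]) if prev[0] in cur[1])

def is_valid(order):
    """A register produced inside the shader must be written before it is read."""
    produced = {dst for dst, _ in order}
    written = set()
    for dst, srcs in order:
        if any(s in produced and s not in written for s in srcs):
            return False
        written.add(dst)
    return True

def fast_compile(shader):
    """Stand-in for the default compiler: no time to search, keep the order."""
    return list(shader)

def offline_optimize(shader):
    """Slow offline step: try every legal ordering, keep the cheapest."""
    candidates = (list(p) for p in permutations(shader) if is_valid(p))
    return min(candidates, key=stalls)

def build_table(shaders):
    """What the 'special drivers' would output for the real drivers to ship."""
    return {fingerprint(s): offline_optimize(s) for s in shaders}

def driver_compile(shader, table):
    """Real driver: use the precomputed result if the shader is recognized."""
    return table.get(fingerprint(shader)) or fast_compile(shader)

benchmark_shader = [
    ("r0", ("t0",)),
    ("r1", ("r0",)),
    ("r2", ("t1",)),
    ("r3", ("r2",)),
    ("r4", ("r1", "r3")),
]
table = build_table([benchmark_shader])
optimized = driver_compile(benchmark_shader, table)
```

With this made-up cost model the recognized shader goes from 3 stalls to 1, while any shader not in the table gets the ordinary fast path. The search is factorial in shader length, which is exactly why it could only ever be done offline.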
Of course, I'm getting into my conspiracy theories here. But it's either that, or nVidia's driver team is on proactive drugs. Eh...
Uttar