ShootMyMonkey
Veteran
With MS' HLSL compiler, it's gone in the opposite direction, FWIW. A year ago it took around 400-600 msec for a vs+ps of that size; now it's around 650-950 msec. Flow control really throws it into hell, though: loops bring it from milliseconds to seconds. I suppose any case where you feel the need for flow control is also one where you have something pretty large (and I do have to deal with generated shaders that amount to a few thousand instructions). All the same, the end results are better than they were back then. The 2.0 targets were pretty darn good a year ago, but the 3.0 targets produced some really laughable excuses for optimization. Nowadays, they're about even.
from memory ~1-2 years ago with nvidia drivers (dunno, 50.xx-60.xx)
an average vs + fs program of ~100 instructions was taking 600-700 msec
now with 90.xx that's down to ~50 msec on the same machine