Humus said:
Well, be prepared to pay then. 8)
One example is that removing common subexpressions is a good optimization for the R300, since register usage is free. On the NV30, on the other hand, it's not an optimization at all, rather the opposite, since register usage is costly. There's no way a common intermediate version can be optimal for both, so the compiler needs to either unfairly favor one, or come up with a compromise that's decent but not optimal for either card.
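To put rough numbers on that trade-off, here is a minimal Python sketch of a toy cost model. Everything in it is an assumption made for illustration: the tuple-based instruction format, the "t"-prefixed temporaries, and the peak_live_temps helper are hypothetical and say nothing about how either driver really allocates registers. It only shows that sharing a subexpression can save an instruction while keeping one more value live.

```python
# Toy cost model (illustrative only): straight-line code as
# (dest, op, sources) tuples. Destinations are temporaries; names such
# as a, b, c, d stand for input/constant registers and are treated as free.

def peak_live_temps(program, outputs):
    """Return the largest number of temporaries live at the same time."""
    last_use = {}
    for i, (_dest, _op, srcs) in enumerate(program):
        for s in srcs:
            last_use[s] = i
    for out in outputs:
        last_use[out] = len(program)  # results stay live to the end

    live, peak = set(), 0
    for i, (dest, _op, _srcs) in enumerate(program):
        live.add(dest)
        # Drop every value whose last reader is this instruction or earlier.
        live = {t for t in live if last_use.get(t, -1) > i}
        peak = max(peak, len(live))
    return peak

# a*b is computed once and kept around until the final multiply.
with_cse = [
    ("t0", "mul", ("a", "b")),
    ("t1", "add", ("t0", "c")),
    ("t2", "mul", ("t1", "d")),
    ("t3", "add", ("t1", "t2")),
    ("t4", "mul", ("t3", "t0")),
]

# a*b is recomputed right before its second use instead of being held.
without_cse = [
    ("t0", "mul", ("a", "b")),
    ("t1", "add", ("t0", "c")),
    ("t2", "mul", ("t1", "d")),
    ("t3", "add", ("t1", "t2")),
    ("t5", "mul", ("a", "b")),
    ("t4", "mul", ("t3", "t5")),
]

print(len(with_cse), peak_live_temps(with_cse, ["t4"]))        # 5 instructions, 3 temps
print(len(without_cse), peak_live_temps(without_cse, ["t4"]))  # 6 instructions, 2 temps
```

In this toy case the shared version is one instruction shorter but needs three temporaries instead of two, which is a win where registers are free and possibly a loss where each extra register costs throughput.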
Also, LRP and CMP are expensive on the NV3x, but SINCOS is very cheap. FXC has an affinity for choosing LRP and CMP over other constructs. IF_PRED would be a better fit for the NV30, and SINCOS is way cheaper than a power series expansion. The SINCOS expansion is devastatingly inefficient on the NV3x because it eats up multiple extra registers. Another example is the BIAS and SHIFT operations, which are hardware supported on DX8 HW and some DX9 HW. But DX9 can't represent them, so source like "(x - 0.5)*2" generates code like
def c0, -0.5, 2.0, 0, 0
add r0, v0, c0.xxxx
mul r0, r0, c0.yyyy
That requires the driver to do some real heavy lifting to figure out what the hell is going on, since it has to inspect the contents of the constant registers themselves to figure out whether it can generate a HW BIAS/SCALE modifier or not. And if the code is 2*X - 1, a GLSLANG compiler could still figure out how to use HW bias/scale via strength reduction techniques, but FXC will merrily generate raw code for this.
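As a rough illustration of the kind of analysis the driver is stuck doing, here is a minimal Python sketch that tracks an add/mul chain as scale*x + offset and checks whether it matches 2*(x - 0.5), which is what the DX8 ps_1_x _bx2 source modifier computes. The tuple instruction format and the helper names (const_value, affine_form, is_bx2) are hypothetical, and the sketch assumes the bias constant is stored negated so a plain add yields x - 0.5. It is not how any real driver is written.

```python
SWIZZLE = {"x": 0, "y": 1, "z": 2, "w": 3}

def const_value(ref, consts):
    """Look up a scalar such as 'c0.x' in the constant table."""
    reg, comp = ref.split(".")
    return consts[reg][SWIZZLE[comp]]

def affine_form(instrs, consts, src):
    """Follow add/mul-by-constant instructions starting from src.

    Returns (scale, offset) so that the final result equals
    scale * src + offset, or None if the chain is not recognised.
    """
    forms = {src: (1.0, 0.0)}
    result = None
    for op, dst, a, cref in instrs:
        if a not in forms or op not in ("add", "mul"):
            return None
        scale, offset = forms[a]
        k = const_value(cref, consts)
        if op == "add":
            forms[dst] = (scale, offset + k)
        else:
            forms[dst] = (scale * k, offset * k)
        result = forms[dst]
    return result

def is_bx2(form):
    """True when the chain computes 2*x - 1, i.e. (x - 0.5) * 2."""
    return form == (2.0, -1.0)

# (x - 0.5) * 2 as an add followed by a mul.
bias_then_scale = [("add", "r0", "v0", "c0.x"), ("mul", "r0", "r0", "c0.y")]
consts_a = {"c0": (-0.5, 2.0, 0.0, 0.0)}

# 2*x - 1 as a mul followed by an add.
scale_then_bias = [("mul", "r0", "v0", "c0.x"), ("add", "r0", "r0", "c0.y")]
consts_b = {"c0": (2.0, -1.0, 0.0, 0.0)}

print(is_bx2(affine_form(bias_then_scale, consts_a, "v0")))  # True
print(is_bx2(affine_form(scale_then_bias, consts_b, "v0")))  # True
```

Both spellings reduce to the same scale and offset, which is the strength-reduction point: a compiler that sees the source expression can do this trivially, while a driver handed only the assembly has to dig the values back out of the constant registers first.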
Oh, did I mention that FXC doesn't do constant folding correctly? I noticed that it would sometimes actually waste a register adding two constants together when the sum could have been folded at compile time.
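For anyone unfamiliar with the term, this is roughly what a fold looks like, as a minimal Python sketch over a made-up expression tree. The tuple representation and the fold and count_ops helpers are assumptions for illustration only and have nothing to do with FXC's internals.

```python
import operator

# Only these two operations are folded in this sketch.
OPS = {"add": operator.add, "mul": operator.mul}

def fold(expr):
    """Fold constant subexpressions.

    expr is a number, a variable name (str), or a tuple (op, lhs, rhs).
    """
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    both_const = isinstance(lhs, (int, float)) and isinstance(rhs, (int, float))
    if op in OPS and both_const:
        return OPS[op](lhs, rhs)  # computed at compile time, no instruction emitted
    return (op, lhs, rhs)

def count_ops(expr):
    """Number of instructions a naive code generator would emit."""
    if not isinstance(expr, tuple):
        return 0
    _op, lhs, rhs = expr
    return 1 + count_ops(lhs) + count_ops(rhs)

unfolded = ("mul", "x", ("add", 0.25, 0.25))
folded = fold(unfolded)

print(folded)                                # ('mul', 'x', 0.5)
print(count_ops(unfolded), count_ops(folded))  # 2 1
```

Without the fold, the add of the two constants costs an instruction and a temporary to hold the sum; with it, the sum is just another constant.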
FXC is more efficient for a register-combiner-like phased pipeline (e.g. R300) and, unfortunately, doesn't take kindly to NV3x's choice of going with specialized SINCOS and predication HW.
I don't believe the NV3x's pipeline will ever beat an R300, if both are optimized to the max. The issue is not whether the R300 is a killer card that destroys the NV3x. The issue is whether or not DX9 will restrict future pipelines that have more flexibility. It's hurt the NV3x already, and I'm just worried that when they try to introduce real HW branching into the R300 successor, we are going to run into significant problems.