they've used the Stanford Bunny. SIGH. What's the matter with a full screen quad?
Because everyone loves bunnies! It's rather amusing that they didn't use the bunny for the fur shader.
Oops. I thought the 4870 was clocked at 725 MHz, not 750.
Anyway, the 4870 won't be 2.5 times as fast at drawing the background, and there will be some per-frame overhead. Assume the overhead is 100 microseconds, the 4870 took 75 microseconds to draw the background, and the 4670 took twice as long. Then the bunny was shaded 2.60x faster on the 4870. The real overhead may be even larger.
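To make the correction above concrete, here is a rough sketch of the arithmetic in Python. The 100 microsecond overhead, the 75 microsecond background time and the 2x background ratio are the assumptions stated in the post; the two frame times passed in at the bottom are made-up placeholders picked so the numbers are easy to follow, not the benchmark's measurements, and the function name is mine.

```python
def shading_speedup(frame_slow_us, frame_fast_us,
                    overhead_us=100.0, bg_fast_us=75.0, bg_ratio=2.0):
    """Speedup of the shaded object alone, after removing fixed costs.

    frame_slow_us / frame_fast_us: total frame times on the slower and
    faster card. overhead_us, bg_fast_us and bg_ratio are the guesses
    from the post (per-frame overhead, background time on the faster
    card, and how much longer the slower card takes on the background).
    """
    shade_slow = frame_slow_us - overhead_us - bg_fast_us * bg_ratio
    shade_fast = frame_fast_us - overhead_us - bg_fast_us
    return shade_slow / shade_fast


# Hypothetical frame times in microseconds, for illustration only.
slow_frame, fast_frame = 2850.0, 1175.0

raw = slow_frame / fast_frame
corrected = shading_speedup(slow_frame, fast_frame)
print(f"raw frame-time ratio: {raw:.2f}x")
print(f"shading-only ratio:   {corrected:.2f}x")
```

With these placeholder inputs the raw ratio comes out below the shading-only ratio, which is the point: fixed per-frame costs that don't scale with ALU count hide part of the shader speedup, and a larger overhead would hide more.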
But why won't RV730 benefit from this? That raises the question: is it due simply to some interaction with the ALU:TEX ratio, or is it because RV730 has less register file per ALU?
I'm guessing the latter, or maybe it's just a carryover from texture limited shader optimizations.
You may recall that compute-limited shaders let you get away with less register space (I hope you learned that from the program I made for you). However, for RV730 to be 80% as fast as RV770 in all texture-limited shaders (as its TMU count suggests), it needs 80% of RV770's total register space. Even if per-ALU space is the same, it only has 40%, so it would benefit more from low register use.
Again, this only holds true for texture bound shaders, so such an optimization isn't needed for compute bound shaders like the two you're talking about. Nonetheless, it's possible that the shader compiler isn't tuned to this degree.
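The 80% versus 40% comparison can be checked with a few lines of Python. The unit counts are the published ones (RV770: 800 ALUs, 40 TMUs; RV730: 320 ALUs, 32 TMUs). The equal register file per ALU is the assumption from the post, not a confirmed figure, and the function and variable names are mine.

```python
# Published unit counts for the two chips.
RV770 = {"alus": 800, "tmus": 40}
RV730 = {"alus": 320, "tmus": 32}


def register_budget(small, big, regs_per_alu_ratio=1.0):
    """Compare texture throughput and total register space of two chips.

    regs_per_alu_ratio is the small chip's register file per ALU relative
    to the big chip's; 1.0 is the "same per-ALU space" assumption.
    Returns (tmu_ratio, register_ratio, registers_per_tmu_ratio).
    """
    tmu_ratio = small["tmus"] / big["tmus"]
    reg_ratio = small["alus"] / big["alus"] * regs_per_alu_ratio
    return tmu_ratio, reg_ratio, reg_ratio / tmu_ratio


tmu_ratio, reg_ratio, regs_per_tmu = register_budget(RV730, RV770)
print(f"TMU ratio:               {tmu_ratio:.0%}")
print(f"total register ratio:    {reg_ratio:.0%}")
print(f"register space per TMU:  {regs_per_tmu:.0%}")
```

Under that assumption RV730 has about half the register space per TMU, so a texture-bound shader would need roughly half the registers per thread to keep as many threads in flight per TMU as RV770 does.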