DegustatoR said:
I don't think it's just memory bandwidth. It looks like R420 architecture is designed for high clocks while NV40's - for high efficiency. Much like P4 vs Athlon...
I wouldn't agree with that. I mean, I do agree that Athlon is far more efficient than P4 per clock, but I would not use that analogy for nV4x/R4x0. Look at nV3x vs. R3x0, for example: nV3x had the higher clocks and a much higher transistor count, but at the same time had fewer pixel pipes and was less efficient per clock than R3x0 in many other ways besides. Most importantly, the clock-rate difference there was mainly the result of the difference in manufacturing processes (0.13 micron for nV3x and 0.15 micron for R3x0).
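To put rough numbers on the pipes-vs-clock point, here is a back-of-envelope sketch. It assumes the usual simplification that theoretical peak pixel fill rate is just pixel pipes times core clock; the function name and the pipe/clock figures are illustrative assumptions of mine, not measured specs for any particular card.

```python
def peak_fill_rate_mpix(pixel_pipes: int, core_clock_mhz: float) -> float:
    """Theoretical peak pixel fill rate in megapixels per second.

    Simplifying assumption: each pipe outputs one pixel per clock.
    """
    return pixel_pipes * core_clock_mhz


# Illustrative (assumed) figures: a narrow, high-clocked part
# versus a wider part running at a lower clock.
high_clock_part = peak_fill_rate_mpix(pixel_pipes=4, core_clock_mhz=500)
wide_part = peak_fill_rate_mpix(pixel_pipes=8, core_clock_mhz=325)

print(f"narrow/high-clock: {high_clock_part:.0f} Mpix/s")
print(f"wide/low-clock:    {wide_part:.0f} Mpix/s")
```

Under those assumed figures the wider, slower part comes out ahead on paper, which is why a higher clock alone says little about which design does more work.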
I'd color the distinction this way: nV4x is in some respects top-heavy and over-engineered, whereas R4x0 is comparatively lean and mean, with a much more streamlined design. nV4x's clock isn't necessarily lower than R4x0's because it does more work per clock. Rather, the basic design is less efficient for the tasks at hand than what you'll find in R4x0, and a lot of extraneous, less-efficient "clutter circuitry" gums up the works and creates yield problems, lower clocks, higher temps, etc., all at the same time. The top-heaviness of nV40 is reminiscent of that found in nV3x, though presumably not as problematic in terms of general performance. But it's still top-heavy, imo, compared to R4x0.
Also, these new high-end cards are so fast that they spend a lot of time waiting on the CPU in games like Doom 3. Factor that in, along with engine-code and driver optimizations for specific "benchmark showpiece games" and the varying levels of CPU dependency in those games, and generalizations like "nV4x clocks lower than R4x0 because it does more work per clock" are just not as accurate as they may at first appear. Just my opinion, of course...