I think some people would prefer it if it overclocked a bit less well but used 30W or so less (because the default voltage could then be lower). Of course you could have both (just increase the voltage a bit again for OC...).
Anyway, the architecture looks sound, even if it doesn't look like much of an improvement considering transistor count (and shrink-adjusted die size). I bet it would be better with 4 geometry engines and 48 ROPs, but you can't have everything.
computerbase.de has run some interesting frequency scaling benchmarks, including some with memory bandwidth reduced to HD6970 levels. The 50% extra memory bandwidth actually didn't buy that much more performance, though Metro 2033 was apparently quite a bandwidth hog (27% perf improvement for that 50% more memory bandwidth). Unfortunately they didn't run a similar test on HD6970, which might have allowed some educated guesses as to whether Tahiti is indeed limited by the lack of ROPs. (I tend to think HD6970 would show a larger performance difference with the same downscaled memory clock, which would indicate that Tahiti indeed would benefit from more ROPs, but I've got no data to back it up.)
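To put the Metro 2033 number in perspective, here is a quick back-of-the-envelope sketch of how much of the extra bandwidth actually turns into performance. The function name and the "efficiency" ratio are just my own illustration, not anything computerbase.de reported; the only inputs taken from their test are the 27% and 50% figures.

```python
def scaling_efficiency(perf_gain, resource_gain):
    """Fraction of a resource increase that shows up as performance.

    Both arguments are relative gains (0.50 means +50%).
    1.0 would mean perfectly bandwidth-bound; 0.0 means no effect at all.
    Hypothetical helper for illustration only.
    """
    if resource_gain <= 0:
        raise ValueError("resource_gain must be positive")
    return perf_gain / resource_gain


# Metro 2033: +27% performance for +50% memory bandwidth
metro_2033 = scaling_efficiency(perf_gain=0.27, resource_gain=0.50)
print(f"Metro 2033: {metro_2033:.2f}")  # prints "Metro 2033: 0.54"
```

So even in the most bandwidth-hungry title, only about half of the extra bandwidth shows up as frame rate. Running the same ratio on HD6970 numbers would be the interesting comparison, if anyone had them.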
Looking back, it's actually amazing how accurate some of the early speculation was (or maybe not, considering groups of 4 CUs were a given...). The biggest surprise to me was that AMD didn't scale the number of ROPs along with the memory bandwidth (or at least improve the ROP Z rate in some other way).