I still think that when you're given an engineering task, the result should be relatively optimal, especially for ASICs.
R100 was quite good considering that it had dependent texturing. From my work at ATI I know it was just barely short of GF3 functionality and had similar perf/mm2. R200 was decent too, if a bit buggy. R300 was fairly optimal as the first DX9 chip, and thus so was R420 (though ATI really should have put FP blending in it). R520 wasn't so great, but R580 was decent considering the DB granularity, although that was a misguided design criterion for the time. Xenos was quite optimal, too.
R600 was pathetic, though, and RV630 even worse. Look at what NVidia did with G80, which was their first unified and first DX10 GPU, and they got it done much earlier than R600. Two architecture refinements later, they've barely improved perf/transistor/clock, which tells you that they optimized quite well right from the beginning. That's how it should be.
It really said something to me when I saw slides saying perf/mm2 was a design goal with RV770. Duh! WTF were the hw architects smoking when this wasn't a design goal for R600?
Anyway, I can definitely say that if I'd led the R600 team and then watched people in the same company produce such a vastly better design later on, I'd have been ashamed of myself. You expect 10-20% from a design refinement, but RV730 is often 3x the speed of RV635 with only 25% more die space and BW.
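To put rough numbers on that comparison (using the approximate figures above, not exact measurements), the implied perf/mm2 jump works out like this:

```python
# Rough arithmetic behind the RV730 vs RV635 comparison.
# Ratios are the approximate figures claimed above, not measured data.
perf_ratio = 3.0    # RV730 speed relative to RV635 (workload-dependent)
area_ratio = 1.25   # RV730 die area relative to RV635 (25% more)

perf_per_mm2_gain = perf_ratio / area_ratio
print(f"perf/mm2 improvement: {perf_per_mm2_gain:.1f}x")  # 2.4x
```

A 2.4x perf/mm2 gain from one refresh dwarfs the 10-20% you'd normally expect, which is the whole point: R600 left that much on the table.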