Hmm, this isn't very convincing. There's some good stuff, but overall it doesn't look like a very efficient chip compared to the in-house competition (it still does OK against the competition).
Most sites show about a 10% difference between HD 6870 and HD 6950 overall (with HD 5870 being as fast as HD 6950), and another 10% between HD 6950 and HD 6970, so about 20% between full Barts and full Cayman. Cayman cards are definitely priced to reflect performance, though.
That 20% comes despite Cayman having a 50% larger die, 31% more memory bandwidth, way more SIMDs (and a higher peak ALU rate), and a power draw that has definitely gone up more than the performance increase would suggest (probably directly related to the increased die size / transistor count). Granted, it has definitely improved geometry setup / tessellation (though the latter still gives erroneous results in some tests where Barts is actually faster, probably due to drivers).
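To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. It uses only the percentages quoted above (two ~10% performance steps, ~50% die size, ~31% memory bandwidth). The function name is made up for illustration, and the inputs are review-average ballparks, not measurements.

```python
def relative_efficiency(perf_ratio, resource_ratio):
    """Performance gained per unit of extra resource (1.0 = scales perfectly)."""
    return perf_ratio / resource_ratio

# Figures taken from the post: two ~10% steps, HD 6870 -> HD 6950 -> HD 6970.
perf_cayman_vs_barts = 1.10 * 1.10   # compounds to ~1.21, i.e. "about 20%"
die_ratio = 1.50                     # ~50% larger die
bandwidth_ratio = 1.31               # ~31% more memory bandwidth

perf_per_area = relative_efficiency(perf_cayman_vs_barts, die_ratio)
perf_per_bandwidth = relative_efficiency(perf_cayman_vs_barts, bandwidth_ratio)

print(f"perf:               {perf_cayman_vs_barts:.2f}x")   # 1.21x
print(f"perf per die area:  {perf_per_area:.2f}x")          # 0.81x
print(f"perf per bandwidth: {perf_per_bandwidth:.2f}x")     # 0.92x
```

So with these ballpark inputs full Cayman gets roughly 0.8x the performance per mm² of full Barts, which is the inefficiency I'm complaining about.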
Also, the 10% difference between HD 6950 and HD 6970 is very small, corresponding exactly to the clock increase (core and memory). If Cypress had very bad SIMD scaling, Cayman seems to have non-existent SIMD scaling. Has anyone benched the cards at the same clock? I think I'll stick to the theory that once you go past 8 or so SIMDs per graphics engine (or per rasterizer in the case of Evergreen), things don't really improve much. The speculation about not quite sufficient internal bandwidth could also be true; it would certainly only get worse as you add more SIMDs (I haven't seen anything indicating that internal bandwidth has improved for Cayman). So the VLIW-4 SIMDs may well be more efficient than VLIW-5, but since the SIMDs hardly scale at all, having more (but smaller) SIMDs is wasted effort for this chip.
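The clock-vs-SIMD argument can be checked with a few lines of Python. The clocks and SIMD counts below are not from this post; they are the reference specs as I remember them (HD 6950: 800 MHz core, 22 SIMDs; HD 6970: 880 MHz core, 24 SIMDs), so treat them as assumptions. The ~10% measured gap is the review average mentioned above.

```python
# Assumed reference specs (not stated in this post, double-check before relying on them).
HD6950 = {"core_mhz": 800, "simds": 22}
HD6970 = {"core_mhz": 880, "simds": 24}

measured_gain = 1.10  # ~10% overall difference seen in reviews

clock_gain = HD6970["core_mhz"] / HD6950["core_mhz"]
simd_gain = HD6970["simds"] / HD6950["simds"]

# If both clock and SIMD count scaled perfectly, the gains would multiply.
ideal_gain = clock_gain * simd_gain

# Whatever the clock alone does not explain is left over for the extra SIMDs.
gain_left_for_simds = measured_gain / clock_gain

print(f"clock alone:        {clock_gain:.2f}x")           # 1.10x
print(f"ideal (clock+SIMD): {ideal_gain:.2f}x")           # 1.20x
print(f"left for SIMDs:     {gain_left_for_simds:.2f}x")  # 1.00x
```

Under those assumptions perfect scaling would give about 20%, while the measured 10% is fully accounted for by the clock, leaving nothing for the two extra SIMDs. A same-clock bench would be the real test.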
Compared to Cypress it isn't that bad, but die area and transistor count still increased more than performance. Granted, the two graphics engines are definitely warranted for increased tessellation performance (and it pays off in some titles using tessellation), but overall I just don't think it's very efficient.
There's also some good stuff: PowerTune IMHO has tremendous potential in the mobile space, but for desktop it's not nearly as important.
Cayman was initially planned for 32nm, right? If so, I can only wonder what (if anything) was sacrificed for 40nm. I think on 32nm there would have been room for some more things even without exceeding 300mm² (why not 4 GEs with 8 SIMDs each and doubled internal bandwidth?).