The numbers seem a bit slower in general than benchmarks I've seen on other sites, but I don't know German and it looks like the settings are not identical.
You are right, they changed an additional setting (FXAA), which influences the shader load. So here is another set (only changing MSAA).
But the trend is the same. The distance between Pitcairn and Tahiti doesn't decrease when activating MSAA. It is between about 25% and 35%, generally increases with higher resolution (higher pixel shader, ROP and bandwidth load combined, so that can be expected) and slightly increases (0% to 2%) when activating 4xMSAA (maybe an almost isolated addition of ROP and bandwidth load, but frankly I have no idea how the Frostbite 2 engine used by BF3 actually works; there are probably other contributing factors too, like increased shader load for some steps).
I'm focusing on its lack of distance from the 580, and statements concerning BF3 where building the g-buffer takes an inordinately long time to complete on AMD chips versus Nvidia.
And why do you think the dominating reason for that is the ROP count? It could also be some scheduling/work distribution problem where Nvidia has the upper hand, couldn't it?
Pitcairn's theoretical ROP throughput isn't much higher than Tahiti's. Plus, in that benchmark you quoted, HD 7970 gains an amazing 3% on HD 7870 when activating MSAA @ 2560x1600.
The point is that Tahiti does not lose more than Pitcairn when ROP limitations should set in (activating MSAA), while it maintains consistently higher performance than Pitcairn with MSAA as well.
Problem is that HD 7970 has about 72% [!] more memory bandwidth than HD 7870. So it actually shouldn't even be a competition. That's what trinibwoy pointed out: If a 264,000 MB/s card can't shake off a 153,600 MB/s card by a significant margin in those bandwidth-heavy scenarios, what gives?
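For reference, the 72% figure follows directly from the two bandwidth numbers quoted above. A quick check (the helper name `percent_more` is just for illustration):

```python
def percent_more(a, b):
    """Return how many percent larger a is than b."""
    return (a / b - 1.0) * 100.0

# Peak memory bandwidth in MB/s, as quoted in the post above.
hd7970_bandwidth = 264_000
hd7870_bandwidth = 153_600

advantage = percent_more(hd7970_bandwidth, hd7870_bandwidth)
print(f"HD 7970 has {advantage:.1f}% more bandwidth")  # 71.9%
```

Note that this is peak theoretical bandwidth only; it says nothing about how much of it a given BF3 scene actually uses.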
Is it a very bandwidth heavy scenario?
Or look at it from the other side:
The performance relation between Tahiti and Pitcairn stays basically the same. Whether you activate MSAA or not, Tahiti is always faster than Pitcairn by about the same amount (it even gains a percent or two with MSAA). And that with 8% less ROP capacity. Doesn't that tell us that Tahiti shows consistent performance in comparison to Pitcairn and is therefore not completely off-balance? They just use different means to get there. But the performance picture is actually quite consistent between Pitcairn and Tahiti.
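The roughly 8% ROP deficit can be reproduced from reference specs. The sketch below assumes 32 ROPs on both chips, one pixel per ROP per clock, and the reference core clocks of 925 MHz (HD 7970) and 1000 MHz (HD 7870). The clocks are not stated in this thread, so treat them as assumptions.

```python
def pixel_fillrate_mpix(rops, core_clock_mhz):
    """Theoretical peak fillrate in Mpixels/s, assuming 1 pixel per ROP per clock."""
    return rops * core_clock_mhz

# Assumed reference specs (not taken from the thread).
tahiti_fillrate = pixel_fillrate_mpix(rops=32, core_clock_mhz=925)     # HD 7970
pitcairn_fillrate = pixel_fillrate_mpix(rops=32, core_clock_mhz=1000)  # HD 7870

deficit = (1.0 - tahiti_fillrate / pitcairn_fillrate) * 100.0
print(f"Tahiti has {deficit:.1f}% less theoretical ROP throughput")  # 7.5%
```

With equal ROP counts the gap comes purely from the clock difference, which is why it stays in the single digits.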
Of course you can always say that if you had added 50% more ROPs, you would be 10% faster in a certain game (and in a non-bandwidth-limited fillrate test even 50%). But that would also come at a cost (die size, power consumption and ultimately clock speed). You could also say that a triple setup/raster engine plus 48 ROPs on a 1.5 GHz 384-bit interface might have bought them even 25% performance in some games. A quad setup/raster with 8 pixels/clock per rasterizer would also improve performance considerably in setup-limited scenarios while staying at the 32 pixels/clock raster and ROP limit. But with the available evidence I don't think it is justified to say that Tahiti is mainly ROP limited.
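To put the hypothetical "50% more ROPs, 10% faster" case into numbers, here is a rough Amdahl-style sketch. It assumes frame time splits cleanly into a ROP-bound share that scales with ROP count and a remainder that doesn't. That is a simplification I'm introducing for illustration, not a description of how any real GPU or BF3 behaves, and the function names are made up.

```python
def overall_speedup(rop_bound_fraction, rop_scale):
    """Amdahl-style estimate: only the ROP-bound share of frame time scales."""
    return 1.0 / ((1.0 - rop_bound_fraction) + rop_bound_fraction / rop_scale)

def rop_bound_fraction_for(target_speedup, rop_scale):
    """Invert the model: which ROP-bound share yields the target speedup?"""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / rop_scale)

# 50% more ROPs (scale 1.5) giving 10% more performance (speedup 1.10).
fraction = rop_bound_fraction_for(1.10, 1.5)
print(f"ROP-bound share of frame time: {fraction:.2f}")  # 0.27

# Pure fillrate test: everything is ROP-bound, so the full 50% shows up.
print(f"Fillrate test speedup: {overall_speedup(1.0, 1.5):.2f}")  # 1.50
```

Under this toy model, a 10% gain from 50% more ROPs corresponds to only about a quarter of the frame time being ROP-bound, which fits the argument that ROPs are one contributing limit rather than the main one.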