AMD: Southern Islands (7*** series) Speculation/Rumour Thread

I don't know exactly what is holding Tahiti back (I suspected the ROPs until Gipsel made some interesting points in that respect), but it doesn't perform the way it should compared to the smaller, lower-bandwidth Pitcairn.
I put my laziness to sleep, opened TechPowerUp and Excel, and plotted an interesting chart of performance at 2560 (as % of the HD 7870) per unit of pixel fill rate:
[Image: Rop_Chart.png]
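For anyone who wants to reproduce this kind of chart, the metric can be sketched as below. This is a minimal sketch under assumptions: theoretical fill rate is taken as ROPs × reference core clock, relative performance is supplied as a fraction of the HD 7870's, and the card table and function names are illustrative, not taken from the chart itself.

```python
# Minimal sketch of a "performance per pixel fill rate" metric.
# Assumptions: fill rate = ROPs x reference core clock; relative performance
# is supplied by the user as a fraction of the baseline card's performance.

# (ROPs, reference core clock in MHz) for the GCN cards discussed in the thread.
CARDS = {
    "HD 7970": (32, 925),
    "HD 7950": (32, 800),
    "HD 7870": (32, 1000),
    "HD 7850": (32, 860),
    "HD 7770": (16, 1000),
    "HD 7750": (16, 800),
}


def fill_rate_gpix(card: str) -> float:
    """Theoretical pixel fill rate in Gpixel/s (ROPs x clock)."""
    rops, clock_mhz = CARDS[card]
    return rops * clock_mhz / 1000.0


def perf_per_fill_rate(card: str, relative_perf: float,
                       baseline: str = "HD 7870") -> float:
    """Relative performance divided by relative fill rate.

    The baseline card scores 1.0; a value above 1.0 means the card gets
    more performance out of each unit of theoretical fill rate.
    """
    relative_fill = fill_rate_gpix(card) / fill_rate_gpix(baseline)
    return relative_perf / relative_fill


if __name__ == "__main__":
    for name in CARDS:
        print(f"{name}: {fill_rate_gpix(name):.1f} Gpixel/s")
    # Hypothetical input, not a measured result: a card 25% faster than the
    # HD 7870 while having 7.5% less theoretical fill rate.
    print(f"{perf_per_fill_rate('HD 7970', 1.25):.3f}")
```

The relative performance figures have to come from real benchmark averages; the only thing the sketch fixes is the normalisation, so that the HD 7870 always sits at 1.0.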


Maybe ROPs are a bottleneck, but then why does Tahiti perform so well? Are the ROPs and memory controllers tied together differently in the other parts? What associativity do the others have? What else could be the bottleneck? It certainly isn't shader processing :cool:
 
Given the alleged TSMC production stoppage and Tahiti's splendid overclocking headroom, I wouldn't be surprised if AMD just scrapped dual-GPU cards for the HD 7*** series and gave us a "1.2 GHz edition" (1 GHz won't cut it against already existing OC models) single-Tahiti-based HD 7990.

Corresponding binning and testing has probably been under way for some time now.

The branding (GHz Edition) leaves room for a 1 GHz+ 7970, and if GK104 is launching at $550, the GHz Edition could replace the 7970, which would move down to, say, $499.

I don't see how a GHz Edition 7970 precludes the existence of a 7990 'New Zealand'. It could be a bit difficult if it's XT ASICs at Pro clocks, but there's a ton of room between single-SKU pricing and two boards to fit a 375W card in there.

If GK104 is indeed redefining perf/w then AMD needs a New Zealand, as GK104 dual will be the product to beat (especially if NV have licked power monitoring and got working hard TDP cap plus turbo, neatly leapfrogging PowerTune).
 
I put my laziness to sleep, opened TechPowerUp and Excel, and plotted an interesting chart of performance at 2560 (as % of the HD 7870) per unit of pixel fill rate:
Maybe your picture will appear eventually. Until then:

http://hexus.net/tech/reviews/graphics/36269-amd-radeon-hd-7850-vs-6850-vs-5850-clocks/

A question bandied around the office during the run-up to the HD 7870/50 launch centred on what improvements, if any, AMD has made with successive graphics cards based on the same family? Also, with a view to understanding architecture potency and just how the same-model VLIW and GCN architectures perform against one another, what would happen if the last three iterations of a particular family were run at the same clockspeeds, bringing architecture very much to the fore. Food for thought, huh?
 
The postimage.org website just doesn't seem to be working. Anyone else see the picture?

The picture in EduardoS's earlier post has been showing up just fine on my screen ever since it was posted, same with the thumbnail he just posted
 
What I find most interesting about EduardoS's graph is that the perf/fillrate ratio seems strikingly similar for Pitcairn and Cape Verde, two chips that share a very similar balance of ROPs/CUs/TMUs/bandwidth.

As for Tahiti's 35% gain in perf/fillrate: that's actually kind of disappointing, given that everything but the ROP count was upped by at least 50% over Pitcairn. Perf/fillrate should have been at least 50% better than Pitcairn's if ROPs weren't a limiting factor at all.

Reading the graph that way, one could actually assume that Tahiti would have been about 15% faster with 50% more ROPs - as this would have put the perf/fillrate ratio right in line with the very nice scaling of the other two GCN chips. :D

With an 8 RB to 12 channel mapping, there has to be a crossbar somewhere, whether it is from the engine to the RBs or from the RBs to the memory channels. You also have to consider how the engine upstream works and how you tie in RB redundancy as well.

Note that Tahiti does not have a full RB-to-memory-channel crossbar: each RB can access just 3 memory channels, in order to keep the crossbar's complexity and size down while still providing some flexibility.
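To get a feel for how much a partial crossbar saves, here is a rough sketch that counts RB-to-channel links. The grouping is a hypothetical assumption, not AMD's documented layout: the 8 RBs and 12 channels are split into 4 groups of 2 RBs + 3 channels, which is one arrangement that satisfies "each RB can access just 3 memory channels". The function names are illustrative.

```python
# Rough sketch: link count of a full RB-to-channel crossbar vs. a partial one.
# Hypothetical grouping (an assumption, not a documented layout): 4 groups,
# each with 2 RBs and 3 memory channels, so every RB reaches 3 channels.

NUM_RBS = 8
NUM_CHANNELS = 12
GROUPS = 4


def full_crossbar() -> dict[int, list[int]]:
    """Every RB can reach every memory channel."""
    return {rb: list(range(NUM_CHANNELS)) for rb in range(NUM_RBS)}


def partitioned_crossbar(groups: int = GROUPS) -> dict[int, list[int]]:
    """Each RB only reaches the channels of its own (hypothetical) group."""
    rbs_per_group = NUM_RBS // groups
    chans_per_group = NUM_CHANNELS // groups
    mapping = {}
    for rb in range(NUM_RBS):
        g = rb // rbs_per_group
        mapping[rb] = list(range(g * chans_per_group, (g + 1) * chans_per_group))
    return mapping


def link_count(mapping: dict[int, list[int]]) -> int:
    """Total number of RB-to-channel connections the crossbar must provide."""
    return sum(len(channels) for channels in mapping.values())


full = full_crossbar()
part = partitioned_crossbar()
print(link_count(full), link_count(part))  # prints: 96 24
```

Under this assumed grouping, the partial crossbar needs a quarter of the links of a full one, while every channel is still reachable from two RBs.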

Ahh! So to keep things flexible in Tahiti you've actually got a configuration in which 3 physical RBs share 3 Channels over a crossbar - and for the current HD 79** series you decided to deactivate one RB per crossbar for redundancy? That would explain a lot. ;)

[Image: 2zsz9zq.jpg]
 
That would be quite a bit of redundancy. Would it make sense to include that in the re-evaluation of the binning for Tahiti? At least, assuming it doesn't cause aliasing issues with the assignment of render target tiles to the rasterizers and the ROP partitions.

I wonder what it would do performance wise. While it could help in some situations (8x MSAA for instance), I would almost think the hypothetical combination of a triple setup/rasterizer with 32 ROPs could be more effective for "modern" workloads.
 
I'm going to go with the idea that the smiley at the end is how seriously that diagram should be taken.

BF3 showed that there were scenarios where AMD's ROP throughput was hurt significantly more than Nvidia's. I'd expect those settings to be pushed aggressively in reviews going forward.
 
BF3 showed that there were scenarios where AMD's ROP throughput was hurt significantly more than Nvidia's. I'd expect those settings to be pushed aggressively in reviews going forward.
Comparisons to nV are not very useful considering the different ROP capabilities and different limits which apply.

From the numbers I looked at, I got a different impression when comparing HD7870 with HD7970. When activating MSAA in BF3, the performance delta between Pitcairn and Tahiti does not decrease (as it would if it were seriously ROP limited). On the contrary, it tends to increase, despite the higher ROP throughput of Pitcairn. The higher memory bandwidth of Tahiti more than compensates for the (possible) ROP throughput limitation. Therefore, Tahiti is not in a hard ROP limit in BF3.
 
Comparisons to nV are not very useful considering the different ROP capabilities and different limits which apply.
That would be one way to find areas where GCN could use some improvement.

From the numbers I looked at, I got a different impression when comparing HD7870 with HD7970.
The numbers seem a bit lower in general than benchmarks I've seen on other sites, but I don't know German and it looks like the settings are not identical.

When activating MSAA in BF3, the performance delta between Pitcairn and Tahiti does not decrease (as it would if it were seriously ROP limited). On the contrary, it tends to increase, despite the higher ROP throughput of Pitcairn. The higher memory bandwidth of Tahiti more than compensates for the (possible) ROP throughput limitation. Therefore, Tahiti is not in a hard ROP limit in BF3.
I'm focusing on its lack of distance from the 580, and statements concerning BF3 where building the g-buffer takes an inordinately long time to complete on AMD chips versus Nvidia.
 
When activating MSAA in BF3, the performance delta between Pitcairn and Tahiti does not decrease (as it would if it were seriously ROP limited). On the contrary, it tends to increase, despite the higher ROP throughput of Pitcairn.
Pitcairn's theoretical ROP throughput isn't much higher than Tahiti's. Plus, in that benchmark you quoted, HD 7970 gains an amazing 3% on HD 7870 when activating MSAA @ 2560x1600.

Problem is that the HD 7970 has about 72% [!] more memory bandwidth than the HD 7870. So it actually shouldn't even be a competition. That's what trinibwoy pointed out: if a 264,000 MB/s card can't shake off a 153,600 MB/s card by a significant margin in those bandwidth-heavy scenarios, what gives?
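The 72% figure follows directly from bus width and memory data rate. A small sketch, assuming the reference memory configurations (HD 7970: 384-bit at 5.5 Gbps per pin; HD 7870: 256-bit at 4.8 Gbps per pin); the helper name is illustrative.

```python
# Peak memory bandwidth = bus width (bits) / 8 x effective data rate per pin.
# Assumed reference memory specs: HD 7970 384-bit @ 5.5 Gbps,
# HD 7870 256-bit @ 4.8 Gbps.


def bandwidth_mb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in MB/s, using decimal units like the figures quoted above."""
    return bus_bits / 8 * gbps_per_pin * 1000


hd7970 = bandwidth_mb_s(384, 5.5)   # Tahiti
hd7870 = bandwidth_mb_s(256, 4.8)   # Pitcairn
advantage = hd7970 / hd7870 - 1     # fraction of extra bandwidth for Tahiti
print(f"{hd7970:,.0f} MB/s vs {hd7870:,.0f} MB/s -> +{advantage:.0%}")
# prints: 264,000 MB/s vs 153,600 MB/s -> +72%
```

These are theoretical peaks; how much of that headroom a given game can actually use is exactly what the rest of the discussion is about.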
 
The numbers seem a bit lower in general than benchmarks I've seen on other sites, but I don't know German and it looks like the settings are not identical.
You are right, they changed an additional setting (FXAA), which influences the shader load. So here is another set (only changing MSAA).
But the trend is the same. The distance between Pitcairn and Tahiti doesn't decrease when activating MSAA. It is between about 25% and 35%, generally increases with higher resolution (higher pixel shader, ROP and bandwidth load combined, which can be expected) and increases slightly (0% to 2%) when activating 4xMSAA (perhaps an almost isolated addition of ROP and bandwidth load, but frankly I have no idea how the Frostbite 2 engine used by BF3 actually works; there are probably other contributing factors too, such as increased shader load for some steps).
I'm focusing on its lack of distance from the 580, and statements concerning BF3 where building the g-buffer takes an inordinately long time to complete on AMD chips versus Nvidia.
And why do you think the dominating reason for that is the ROP count? Could also be some scheduling/work distribution problem, where nVidia has an upper hand, isn't it?

Pitcairn's theoretical ROP throughput isn't much higher than Tahiti's. Plus, in that benchmark you quoted, HD 7970 gains an amazing 3% on HD 7870 when activating MSAA @ 2560x1600.
The point is, that Tahiti does not lose more than Pitcairn when ROP limitations should set in (activating MSAA) while it maintains a consistently higher performance than Pitcairn also with MSAA.
Problem is that the HD 7970 has about 72% [!] more memory bandwidth than the HD 7870. So it actually shouldn't even be a competition. That's what trinibwoy pointed out: if a 264,000 MB/s card can't shake off a 153,600 MB/s card by a significant margin in those bandwidth-heavy scenarios, what gives?
Is it a very bandwidth heavy scenario?

Or look at it from the other side:
The performance relation between Tahiti and Pitcairn stays basically the same. Whether you activate MSAA or not, Tahiti is always about the same amount faster than Pitcairn (it even gains a percent or two with MSAA). And that with 8% less ROP capacity. Doesn't that tell us that Tahiti shows consistent performance in comparison to Pitcairn and is therefore not completely off-balance? They just use different means to get there. But the performance picture is actually quite consistent between Pitcairn and Tahiti.

Of course you can always say that if you had added 50% more ROPs, a certain game would be 10% faster (and a non-bandwidth-limited fill rate test even 50%). But that would also come at a cost (die size, power consumption and ultimately clock speed). You could also say that a triple setup/raster engine plus 48 ROPs on a 1.5 GHz 384-bit interface might have bought them as much as 25% more performance in some games. A quad setup/raster with 8 pixels/clock per rasterizer would also considerably improve performance in setup-limited scenarios while staying at the 32 pixel/clock raster and ROP limit. But with the available evidence, I don't think it is justified to say that Tahiti is mainly ROP limited.
 
And why do you think the dominating reason for that is the ROP count?
For BF3, it wasn't the ROP count but how the design handles wide MRTs. The cost is apparently higher than the total bandwidth needed.
If they did not want to modify that part of the pipeline to reduce the impact of this corner case, upping peak performance with additional ROPs could provide a higher peak to come down from.

If Nvidia's design remains as consistent in Kepler as it is in Fermi, then I'd expect that Nvidia's reviews will focus on workloads like that.
 
What I find most interesting about EduardoS's graph is that the perf/fillrate ratio seems strikingly similar for Pitcairn and Cape Verde, two chips that share a very similar balance of ROPs/CUs/TMUs/bandwidth.
Actually, since I included the HD 7*50 models, the CU/TMU ratio increases by 25% between models, with no meaningful increase in performance.

As for Tahiti's 35% gain in perf/fillrate: that's actually kind of disappointing, given that everything but the ROP count was upped by at least 50% over Pitcairn. Perf/fillrate should have been at least 50% better than Pitcairn's if ROPs weren't a limiting factor at all.

Reading the graph that way, one could actually assume that Tahiti would have been about 15% faster with 50% more ROPs - as this would have put the perf/fillrate ratio right in line with the very nice scaling of the other two GCN chips. :D
Since Tahiti has 3-way associative ROPs and the other chips don't (direct-mapped?), this may increase the efficiency of Tahiti's ROPs.

Ahh! So to keep things flexible in Tahiti you've actually got a configuration in which 3 physical RBs share 3 Channels over a crossbar - and for the current HD 79** series you decided to deactivate one RB per crossbar for redundancy? That would explain a lot. ;)

[Image: 2zsz9zq.jpg]
Initially I read this as 4 partitions each with 8 ROPs and 3 channels, then I realized that maybe Dave forgot to mention something...

Could it be two partitions, each with 8 ROPs that access the high part of a dual-channel controller and 8 ROPs that access the lower part? A wild guess; in fact I have no idea...
 
From the numbers I looked at, I got a different impression when comparing HD7870 with HD7970. When activating MSAA in BF3, the performance delta between Pitcairn and Tahiti does not decrease (as it would if it were seriously ROP limited). On the contrary, it tends to increase, despite the higher ROP throughput of Pitcairn. The higher memory bandwidth of Tahiti more than compensates for the (possible) ROP throughput limitation. Therefore, Tahiti is not in a hard ROP limit in BF3.
From your link:


"Driver versions
* AMD Catalyst 11.11c performance driver
* AMD 8.921.2-111215a (HD 7970)
* AMD 8.921.2-120119a (HD 7950)
* AMD 8.932.2 (HD 7700)
* AMD 8.95.5-120224a (HD 7900)
* Nvidia GeForce 290.36"
That seems to indicate that no new benchmarks were run for the older cards, so their results don't include the new FXAA path for GCN that came into effect with the recent patch.
 
@Carsten:
The FXAA thing (together with the driver versions) was the reason why I linked a different test one post later. This second test doesn't have these problems (all cards tested with the 12.3 preview driver) but shows the same trend. ;)
 
BF3 is a system seller, or rather an NVidia system seller, since AMD is so bad in this game. As far as I'm aware there is no insight as to the key architectural features of these chips and how they affect performance in this game.

In summary, Tahiti looks like rubbish, particularly in terms of performance per mm²:

Card     Perf/mm²
HD7970   0.136986
HD7870   0.188679
HD6970   0.089974
HD6870   0.098039
GTX580   0.084615

from

http://www.hardware.fr/articles/856-13/benchmark-battlefield-3.html

based upon the 1920 ultra results.
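As a sanity check on the perf/mm² list, the underlying scores can be recovered by multiplying back by die area. This sketch assumes perf/mm² = benchmark score / die area, and uses commonly cited die areas that are not stated in the post (Tahiti 365, Pitcairn 212, Cayman 389, Barts 255, GF110 520 mm²); the dictionaries and function name are illustrative.

```python
# Recover the raw scores implied by the perf/mm² figures above.
# Assumptions: perf/mm² = benchmark score / die area, and the die areas below
# are commonly cited figures for each GPU (they are not given in the post).

DIE_AREA_MM2 = {
    "HD7970": 365,  # Tahiti
    "HD7870": 212,  # Pitcairn
    "HD6970": 389,  # Cayman
    "HD6870": 255,  # Barts
    "GTX580": 520,  # GF110
}

PERF_PER_MM2 = {
    "HD7970": 0.136986,
    "HD7870": 0.188679,
    "HD6970": 0.089974,
    "HD6870": 0.098039,
    "GTX580": 0.084615,
}


def implied_score(card: str) -> float:
    """Score implied by perf/mm² x die area (e.g. fps, if that was the metric)."""
    return PERF_PER_MM2[card] * DIE_AREA_MM2[card]


for card in PERF_PER_MM2:
    print(f"{card}: {implied_score(card):.1f}")
```

With these areas, every implied score lands within a thousandth of a whole number, which suggests the list was computed as whole-number scores divided by these die sizes. That is an inference from the arithmetic, not something the post states.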
 