AMD desperately needs to address some architecture bottlenecks, otherwise an increase from 24 SIMDs to 32 SIMDs (at the same clock) would give a 10% perf increase at best. So IMHO SI can't really be almost the same as NI. For that matter, I doubt the original plan actually had more SIMDs, unless there were other changes (which seems unlikely given that Dave said the design was just back-ported to 40nm).
I pretty much agree, yet vehemently disagree.
I agree Cayman would be pretty well maxed out at 1792sp (<10% more perf from SIMDs), but that is 28 SIMDs, which is more than 24. Anything above that appears pointless as far as I can tell...and I've come at it from A LOT of different angles. I don't doubt the possibility I'm overlooking something, but at this moment it looks like the buck stops there for absolute efficiency. I hardly believe that makes the architecture broken though, or non-scalable.
That said, I think 6970 is a woefully damaged product compared to what might have been (on 32nm or not). I have little doubt that at 900/6000, four more SIMDs would have put 'RV970' eye-to-eye with GTX580. GTX580 seems clocked as if they were still expecting that product, it's that close (according to my math anyway).
Going by TPU's latest graph at 1920x1200:
GTX580 averages 1.11363636_ times the 6970's performance (+-1%).
580 has 9.2% greater BW.
9.2% * 16% = 1.472% performance difference from BW.
Difference left over: 1.11363636_ - 0.01472 = 1.09891636363636_, i.e. ~9.89% to make up with SIMDs.
On Cayman, a 1% theoretical SIMD increase = .5% real-world performance increase on avg (judging by overclocking results & the 6970/6950 difference, adjusting for BW), so ~9.89% real-world needs ~19.78% theoretical.
2703660 * 1.1978327272727_ = 3238532 MFLOPS
3238532 / (1792*2) = 903.6MHz
903.6/900 = .4% short on clock, /2 = .2% short on performance.
SRSLY.
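For anyone who wants to poke at the numbers, here is a rough Python sketch of the arithmetic above. Every constant is lifted straight from this post (nothing here is measured), the 16% is read as "the share of a bandwidth advantage that shows up as performance", and the names are made up purely for illustration.

```python
# All inputs are the figures quoted in the post, taken as given.
GTX580_OVER_6970 = 1.11363636   # GTX580 average lead over the 6970 (TPU, 1920x1200)
BW_ADVANTAGE = 0.092            # GTX580 memory bandwidth advantage
BW_SCALING = 0.16               # assumed share of a BW gain that becomes performance
SIMD_SCALING = 0.5              # real-world gain per 1% theoretical SIMD gain
BASE_MFLOPS = 2703660           # 6970 throughput figure used in the post
TARGET_SPS = 1792               # hypothetical part with four more SIMDs
TARGET_CLOCK_MHZ = 900          # hypothetical core clock


def required_clock_mhz():
    """Clock a 1792sp part would need to match the GTX580 under these assumptions."""
    bw_gain = BW_ADVANTAGE * BW_SCALING                # ~1.472% from bandwidth
    remaining = GTX580_OVER_6970 - 1.0 - bw_gain       # ~9.89% left for the SIMDs
    theoretical_gain = remaining / SIMD_SCALING        # ~19.78% more raw throughput
    needed_mflops = BASE_MFLOPS * (1.0 + theoretical_gain)
    return needed_mflops / (TARGET_SPS * 2)            # 2 flops per SP per clock


def performance_shortfall():
    """Fraction of real-world performance a 900MHz part would still be missing."""
    clock_gap = required_clock_mhz() / TARGET_CLOCK_MHZ - 1.0
    return clock_gap * SIMD_SCALING


if __name__ == "__main__":
    print(f"required clock: {required_clock_mhz():.1f} MHz")
    print(f"shortfall at 900 MHz: {performance_shortfall():.2%}")
```

Change BW_SCALING or SIMD_SCALING and the required clock moves quite a bit, which is the main reason to treat the result as a ballpark rather than a prediction.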
Now, is that PERFECTLY EXACT & REFLECTIVE FOR ALL SCENARIOS? No. Is there possibly a bottleneck between 1536-1792 (and probably 1664-1792)? Sure. I think it's pretty close though, as in within a few percent.
_____________________________________________________________________
I think the next stop is 48 ROPs and 40-42 SIMDs. 40 makes the most sense for lots of reasons, but 42 meshes nicely with 1792/32, plus it's the answer to everything in the universe, so why not a future GPU SIMD count? At this point my opinion (which seems to change randomly) is that I would be surprised if neither is the case. If '32nm Cayman' was going to replace Cypress at a similar size, something has to do it in turn on 28nm. As such, something was always going to replace Barts, and that previous 32nm part was probably going to be shrunk to 28nm. A Cypress-size die, going from 32->28nm, would fill that spot nicely.
I hope the 28nm stack has both those discrete chips TBH, I think they'd be shoo-in replacements for Cypress and Barts. I'd be happy with half-specs as Juniper and Redwood replacements too. You'll notice they would oddly pair up with past parts (896 VLIW4 = 1120 VLIW5, 1280 VLIW4 = 1600 VLIW5). Strange, that...only not really at all.
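The pairing falls out if you count VLIW units rather than SPs. A small sketch of that reading (my interpretation of the numbers above, with a made-up function name): the same number of units holds 4 SPs each on VLIW4 and 5 each on VLIW5.

```python
def vliw5_equivalent(vliw4_sps):
    """SP count of a VLIW5 part with the same number of VLIW units."""
    units, remainder = divmod(vliw4_sps, 4)   # VLIW4 packs 4 SPs per unit
    if remainder:
        raise ValueError("VLIW4 SP count must be a multiple of 4")
    return units * 5                          # VLIW5 packs 5 SPs per unit


if __name__ == "__main__":
    for sps in (896, 1280):
        print(f"{sps} VLIW4 = {vliw5_equivalent(sps)} VLIW5")
```

So 896 VLIW4 and 1120 VLIW5 are both 224 units, and 1280 VLIW4 and 1600 VLIW5 are both 320 units.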