Hmm, so this means ALU:TEX is 5:1 (in terms of cycles) rather than the 4:1 it has been for years now. So perhaps there's something in those patent applications that I've linked several times.
I expect this will be fine for games; the 80 TMUs in Cypress seem to be wasted anyway.
Compute applications which depend on L1->ALU bandwidth might be a bit constrained, though there's always the possibility that TEX->ALUs could be beefed up. If, as one of the patent applications seems to suggest, ALUs can write to the L1s, then that'll be more interesting...
2 polys per clock is definitely what we want to see.
After Barts revealed that 16 ROPs ~ 32 ROPs as far as performance goes, I think it's reasonable to expect Cayman to be significantly more bandwidth efficient, and for 32 Cayman ROPs to be worth significantly more than 32 Cypress ROPs.
I can't see anything here that looks faked, and I'm cautiously optimistic it'll work out well...
One possible arrangement?:
- 30 SIMDs - each with 16 ALUs and 64 ALU lanes
- 12 octo-TMUs - totalling 96 TMUs
- Each set of 10 SIMDs has 4 octo-TMUs
Or?:
- 30 SIMDs - each with 16 ALUs and 64 ALU lanes
- 12 octo-TMUs - totalling 96 TMUs
- Each set of 15 SIMDs has 6 octo-TMUs
I dare say the latter accords with 2 polys per clock.
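For anyone who wants to check the arithmetic behind the 5:1 figure and the two groupings, here's a quick sketch in Python. The `layout` helper and its parameter names are made up for illustration, and the Cypress line assumes the commonly quoted 20 SIMDs of 16 ALUs each (only the 80 TMUs come from the discussion above). It just counts units; it says nothing about how the hardware is actually wired.

```python
from fractions import Fraction

def layout(simds, alus_per_simd, octo_tmus, groups):
    """Summarise a hypothetical SIMD/TMU arrangement (illustrative helper only)."""
    if simds % groups or octo_tmus % groups:
        raise ValueError("SIMDs and octo-TMUs must divide evenly into groups")
    tmus = octo_tmus * 8  # an octo-TMU is taken to be 8 TMUs
    return {
        "tmus": tmus,
        # ALU:TEX per cycle = ALUs issuing per clock / TMUs
        "alu_tex": Fraction(simds * alus_per_simd, tmus),
        "simds_per_group": simds // groups,
        "octo_tmus_per_group": octo_tmus // groups,
    }

# First arrangement: three sets of 10 SIMDs
three_groups = layout(simds=30, alus_per_simd=16, octo_tmus=12, groups=3)
# Second arrangement: two sets of 15 SIMDs
two_groups = layout(simds=30, alus_per_simd=16, octo_tmus=12, groups=2)
# Cypress for comparison (assumed: 20 SIMDs x 16 ALUs, 80 TMUs)
cypress_alu_tex = Fraction(20 * 16, 80)

print(three_groups)
print(two_groups)
print(cypress_alu_tex)
```

Both arrangements give the same totals (96 TMUs, 5:1 against Cypress's 4:1); they differ only in how the SIMDs and octo-TMUs are split into sets.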