jaredpace: I don't believe they would make a 256-bit part with 960 SPs, especially if 6GHz modules are available.
As for Cayman, 6GHz modules would boost bandwidth by 25% while staying on a 256-bit bus. I think that could be enough for a 1920 SP / 32 ROP part.
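A quick sanity check of the 25% figure, as a rough sketch. It assumes the baseline is 4.8GHz effective memory on the same 256-bit bus; that baseline is not stated above, it is simply the value that makes 25% work out. The function name is made up for illustration.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times effective data rate."""
    return bus_width_bits / 8 * data_rate_gbps

BASELINE_GBPS = 4.8  # assumption: effective data rate of the current modules
NEW_GBPS = 6.0       # the 6GHz modules discussed above

old = bandwidth_gb_s(256, BASELINE_GBPS)
new = bandwidth_gb_s(256, NEW_GBPS)
print(f"{old:.1f} GB/s -> {new:.1f} GB/s (+{(new / old - 1) * 100:.0f}%)")
# prints: 153.6 GB/s -> 192.0 GB/s (+25%)
```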
But how many TMUs? There are 3 basic alternatives:
1. SI will change ALUs from 5D to 4D and the ratios will stay the same. That would mean 640 SPs (160 4D ALUs) and 40 TMUs per block. Bart would consist of two blocks (1280 SPs + 80 TMUs), Cayman of three blocks (1920 SPs + 120 TMUs).
2. ALUs will be 4D and ALU:TEX will be boosted from 4:1 to 5:1 to compensate. That would make the GPU smaller and maybe more efficient in terms of performance per area, but branching granularity would be worse (and batch size too, if I am not mistaken). A single block would then consist of 640 SPs (160 4D ALUs) and 32 TMUs. Bart ~ 1280 SPs + 64 TMUs, Cayman ~ 1920 SPs + 96 TMUs.
3. ALUs will stay 5D and the ratio will be unchanged, but the block will be smaller: 640 SPs (128 5D ALUs) + 32 TMUs. Bart ~ 1280 SPs + 64 TMUs, Cayman ~ 1920 SPs + 96 TMUs.
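To make the three alternatives easier to compare, here is a small sketch that recomputes the numbers from the per-block figures. The `Block` class and `chip` helper are hypothetical names for illustration only; the block counts (two for Bart, three for Cayman) are taken from the list above, and ALU:TEX is counted as ALUs per TMU.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    alu_width: int  # lanes per ALU: 4 for 4D, 5 for 5D
    alus: int       # ALUs per block
    tmus: int       # TMUs per block

    @property
    def sps(self):
        # Stream processors = ALUs times lanes per ALU
        return self.alus * self.alu_width

    @property
    def alu_tex_ratio(self):
        # ALU:TEX counted as ALUs per TMU
        return self.alus / self.tmus

OPTIONS = [
    Block("1: 4D, ratio kept", 4, 160, 40),
    Block("2: 4D, ratio boosted", 4, 160, 32),
    Block("3: 5D, smaller block", 5, 128, 32),
]

def chip(block, n_blocks):
    """Return (SPs, TMUs) for a chip built from n_blocks identical blocks."""
    return block.sps * n_blocks, block.tmus * n_blocks

for b in OPTIONS:
    bart_sps, bart_tmus = chip(b, 2)      # Bart: two blocks
    cayman_sps, cayman_tmus = chip(b, 3)  # Cayman: three blocks
    print(f"{b.name}: ALU:TEX {b.alu_tex_ratio:.0f}:1, "
          f"Bart {bart_sps} SPs + {bart_tmus} TMUs, "
          f"Cayman {cayman_sps} SPs + {cayman_tmus} TMUs")
```

All three options land on the same SP counts; only the TMU count (and the ALU width) differs.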