Here are some numbers to think about.
Redwood: 627 MT, 104mm², 400 ALUs, 20 TMUs, 8 ROPs, 128 bit, GDDR5
Juniper: 1040 MT, 170mm², 800 ALUs, 40 TMUs, 16 ROPs, 128 bit, GDDR5
(Juniper - Redwood) = (170 - 104 = 66mm²) and (1040 - 627 = 413 MT) for 400 ALUs, 20 TMUs, and 8 ROPs.
(627 - 413) = 214 MT for Command Processor, Setup Engine, Tessellator, UVD, I/O, ...
So Juniper uses (1040 - 214) = 826 MT for 800 ALUs, 40 TMUs, and 16 ROPs.
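If you want to replay the subtraction, here is a minimal Python sketch. It assumes, as the estimate above does, that Juniper is simply Redwood plus one extra block of 400 ALUs, 20 TMUs, and 8 ROPs with an identical uncore. The variable names are mine, and the 38mm² uncore area is the same subtraction applied to die area.

```python
# Figures quoted above: transistors in millions (MT), die area in mm².
redwood = {"mt": 627, "mm2": 104}
juniper = {"mt": 1040, "mm2": 170}

# Assumption: the only difference between the two chips is one extra
# block of 400 ALUs / 20 TMUs / 8 ROPs; the uncore is the same.
block_mt = juniper["mt"] - redwood["mt"]        # cost of one block in MT
block_mm2 = juniper["mm2"] - redwood["mm2"]     # cost of one block in mm²
uncore_mt = redwood["mt"] - block_mt            # what is left of Redwood
uncore_mm2 = redwood["mm2"] - block_mm2
juniper_shader_mt = juniper["mt"] - uncore_mt   # Juniper minus uncore

print(f"block: {block_mt} MT, {block_mm2} mm²")
print(f"uncore: {uncore_mt} MT, {uncore_mm2} mm²")
print(f"Juniper shader/TMU/ROP budget: {juniper_shader_mt} MT")
```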
24 SIMD = 384 SP 5D = 1920 ALUs, 96 TMUs, "38 ROPs" = (4.8 x 66mm² = 317mm²) + (38mm² x 1.5) for the "dual-engine" uncore = ~374mm².
30 SIMD = 480 SP 5D = 2400 ALUs, 120 TMUs, 48 ROPs = (6 x 66mm² = 396mm²) + (38mm² x 2) for the "tri-engine" uncore = ~472mm².
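The two die-size guesses follow the same recipe, so here is a rough sketch of it in Python. The function name and constants are mine. It assumes area scales linearly with the number of 400-ALU blocks (66mm² each), that one SIMD is 16 SP x 5D = 80 ALUs, and that the uncore is 38mm² (104 - 66) multiplied by a guessed factor of 1.5 or 2.

```python
BLOCK_MM2 = 66        # area of one 400-ALU / 20-TMU / 8-ROP block (170 - 104)
UNCORE_MM2 = 38       # Redwood minus one block (104 - 66)
ALUS_PER_BLOCK = 400
ALUS_PER_SIMD = 80    # 16 SP x 5D, from 24 SIMD = 384 SP = 1920 ALUs

def estimate_area(simds, uncore_factor):
    """Linear die-area guess in mm² for a chip with `simds` SIMDs."""
    blocks = simds * ALUS_PER_SIMD / ALUS_PER_BLOCK
    return blocks * BLOCK_MM2 + UNCORE_MM2 * uncore_factor

for simds, factor in [(24, 1.5), (30, 2.0)]:
    print(f"{simds} SIMD, uncore x{factor}: ~{estimate_area(simds, factor):.0f} mm²")
```

This is only a linear extrapolation; real layouts do not scale this cleanly, so treat the output as a ballpark.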
24 SIMD or 1920 ALUs is 20% more than Cypress, so this chip needs 20% more bandwidth with the same 256 bit bus.
If we look around we can find 6 GHz GDDR5: Elpida EDW1032BABG60F, Hynix H5GQ2H24MFR-R0C, Samsung K4G20325FC-HC03.
6 GHz is 20% more than 5 GHz, so a 1920-ALU chip with 6 GHz GDDR5 on a 256 bit bus seems viable.
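To put numbers on the bandwidth argument, here is a small sketch. It assumes "5 GHz" and "6 GHz" mean effective data rates of 5 and 6 Gbps per pin, and that Cypress has 1600 ALUs (the baseline for the 20% figure). The helper name is mine.

```python
def bandwidth_gb_s(bus_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin rate."""
    return bus_bits / 8 * data_rate_gbps

bw_5 = bandwidth_gb_s(256, 5.0)   # 256 bit bus with 5 Gbps GDDR5
bw_6 = bandwidth_gb_s(256, 6.0)   # same bus with 6 Gbps GDDR5

alu_ratio = 1920 / 1600           # 24 SIMD chip vs Cypress (assumed 1600 ALUs)
bw_ratio = bw_6 / bw_5

print(f"{bw_5:.0f} GB/s -> {bw_6:.0f} GB/s (x{bw_ratio:.2f})")
print(f"ALU count ratio: x{alu_ratio:.2f}")
```

Both ratios come out the same, which is the whole point: bandwidth per ALU stays where it is on Cypress.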
A 30 SIMD chip at 800 MHz will deliver 3840 GFLOPS of peak performance (41% more than Cypress) with a ~260W TDP.
This chip may have a 384 bit bus (48 ROPs) with 4.8 GHz GDDR5, and 472mm² is sufficient for that. (R600: 512 bit, 420 mm²)
And that's without any architecture optimizations vs Evergreen.
(Who said "Fermi killer"?!)
24 SIMD for Bart, and 30 SIMD for Cayman?
These new GPUs could supplement the Evergreen family.
PS: Antilles = 2xBart = 48 SIMD = 3840 ALUs = 5000 GFLOPS @ 651 MHz.
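For anyone checking the GFLOPS figures, a quick sketch: peak single-precision throughput is taken as ALUs x 2 FLOPs per clock (one multiply-add) x clock speed. The Cypress baseline of 1600 ALUs at 850 MHz is my assumption for the "41% more" comparison; it is not stated in the post.

```python
def peak_gflops(alus, clock_mhz):
    """Peak GFLOPS assuming each ALU does one multiply-add (2 FLOPs) per clock."""
    return alus * 2 * clock_mhz / 1000

cypress = peak_gflops(1600, 850)      # assumed baseline (HD 5870 figures)
cayman_guess = peak_gflops(2400, 800) # 30 SIMD at 800 MHz
antilles_guess = peak_gflops(3840, 651)  # 48 SIMD at 651 MHz

uplift_pct = (cayman_guess / cypress - 1) * 100

print(f"Cypress: {cypress:.0f} GFLOPS")
print(f"30 SIMD @ 800 MHz: {cayman_guess:.0f} GFLOPS (+{uplift_pct:.0f}%)")
print(f"48 SIMD @ 651 MHz: {antilles_guess:.0f} GFLOPS")
```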