You should compare it with more recent AMD GPUs. They changed the layout quite a bit (there are now only two groups per full-size SIMD, each containing 40 SPs [VLIW5] or 32 SPs [VLIW4]), as AlStrong already mentioned (but he deleted his post afterwards for whatever reason [edit]Ah, I just see he posted an expanded version[/edit]). Just compare with Llano and Trinity, or with the Brazos die shot (40nm!). The Wii U resembles the newer layouts much more than the old 55nm RV770. Both the visual comparison with newer versions and scaling the RV770 size lead to the same result: full-size SIMDs are very likely (i.e. 320 SPs in total).

In this RV770 die shot a row of 80 shaders (arranged to the right of the TMUs in a 4 x 20 fashion) takes up ~13 mm^2:
http://techreport.com/r.x/radeon-hd-4870/die-shot.jpg
(Got the link from AIStrong's post on GAF.)
In the Wii U die shot a similar row of four physical blocks of shaders takes up ~6 mm^2.
RV770 was on 55nm, and the Wii U GPU is almost certainly on 40nm (going by the eDRAM). This appears to show perfect (or perhaps slightly better than perfect) scaling from 55nm to 40nm. I *think* this means it is safe to say that the Wii U has 2 rows of 4 x 20 shaders.
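For anyone who wants to check the scaling claim, here is a rough sketch of the arithmetic. It assumes an ideal shrink, where area scales with the square of the feature-size ratio (real processes rarely hit this exactly), and it uses the eyeballed die-shot figures from above. The function and variable names are just made up for illustration.

```python
def ideal_scaled_area(area_mm2, from_nm, to_nm):
    """Area after an ideal shrink: scales with the square of the node ratio (assumption)."""
    return area_mm2 * (to_nm / from_nm) ** 2

# Eyeballed figures from the die shots discussed above
rv770_row_mm2 = 13.0  # row of 80 shaders on 55nm RV770
wiiu_row_mm2 = 6.0    # similar row of four shader blocks in the Wii U die shot

expected_mm2 = ideal_scaled_area(rv770_row_mm2, from_nm=55, to_nm=40)
ratio = wiiu_row_mm2 / expected_mm2

print(f"RV770 row ideally shrunk to 40nm: {expected_mm2:.2f} mm^2")  # 6.88 mm^2
print(f"Wii U measured / ideal shrink:    {ratio:.2f}")              # 0.87
```

So the measured ~6 mm^2 comes out a bit under the ideal ~6.9 mm^2, which is what "slightly better than perfect" refers to.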
So, in summary, I think:
- 40 nm
- 32 MB eDRAM
- 16 TMUs
- 160 shaders
- 8 ROPs
If you don't think that fits, you have to realize that the SPs are actually quite small compared to the total size of the chip. As mentioned, the SP portion of a half-size SIMD (40 SPs) of Brazos measures a measly ~1.8mm² or so. In the Wii U GPU it's ~3mm² for a full SIMD (or 1.5mm² for a half). And the SP portion of a full-size SIMD in 55nm measured about 6.4 mm² (RV770 actually supports DP, which the Wii U certainly lacks). A full node shrink (as from 55 to 40 nm) should bring this down to 3.x mm². Combine this with the very low clock target (which enables a slightly denser layout) and it fits perfectly fine.
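The same back-of-the-envelope check for the per-SIMD numbers, as a small sketch. Again it assumes an ideal shrink (area scales with the square of the node ratio) and takes the rough measurements quoted above at face value. All names are hypothetical.

```python
def ideal_shrink(area_mm2, from_nm, to_nm):
    """Ideal area scaling between nodes (assumption: area ~ feature size squared)."""
    return area_mm2 * (to_nm / from_nm) ** 2

# Rough SP-portion areas quoted in the post
rv770_full_simd_55nm = 6.4   # mm^2, full-size SIMD on 55nm
brazos_half_simd_40nm = 1.8  # mm^2, half-size SIMD (40 SPs) on 40nm
wiiu_full_simd = 3.0         # mm^2, claimed full SIMD in the Wii U GPU

rv770_at_40nm = ideal_shrink(rv770_full_simd_55nm, from_nm=55, to_nm=40)
brazos_full_simd = 2 * brazos_half_simd_40nm

print(f"RV770 full SIMD shrunk to 40nm: {rv770_at_40nm:.2f} mm^2")     # 3.39 mm^2
print(f"Brazos, two half SIMDs:         {brazos_full_simd:.2f} mm^2")  # 3.60 mm^2
print(f"Wii U vs shrunk RV770:          {wiiu_full_simd / rv770_at_40nm:.2f}")    # 0.89
print(f"Wii U vs Brazos:                {wiiu_full_simd / brazos_full_simd:.2f}")  # 0.83
```

That puts the ~3mm² figure roughly 10 to 15% below both references, which is the gap the missing DP support and the low clock target would have to cover.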