On that note, Entropy, I think it is time to post some of my observations/speculation on the Wii GPU.
First off, I'm going to assume that the GPU is fab'ed on TSMC's 40nm process, because it should be quite mature by now and not in as high demand. Now if we look at Barts, we can see it has nearly the same area as RV770 (the chip the GPU is rumored to be based on, and the same chip that powered the early dev kits, as RV770 LE, if I'm not mistaken): namely, 255mm^2 (Barts) vs 256mm^2 (RV770). Barts has 77.8% more transistors as well. Let's round that to 78% to account for the 1mm^2 area difference. So, in theory, and assuming my math isn't completely off, AMD should be able to pack 78% more transistors into a chip of the same area.
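For anyone who wants to check the 78% figure, here is a rough sketch of the arithmetic. The transistor counts (956 million for RV770, 1.7 billion for Barts) are the commonly cited public numbers and are my assumption here; they are not stated in the post, and the variable names are just for illustration.

```python
# Commonly cited public figures (assumed, not taken from the post itself).
RV770_TRANSISTORS_M = 956    # millions of transistors
BARTS_TRANSISTORS_M = 1700   # millions of transistors
RV770_AREA_MM2 = 256         # die area from the post
BARTS_AREA_MM2 = 255         # die area from the post


def percent_more(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Raw transistor-count increase, ignoring the 1mm^2 area difference.
transistor_gain = percent_more(BARTS_TRANSISTORS_M, RV770_TRANSISTORS_M)

# Density (millions of transistors per mm^2) accounts for the area difference.
rv770_density = RV770_TRANSISTORS_M / RV770_AREA_MM2
barts_density = BARTS_TRANSISTORS_M / BARTS_AREA_MM2
density_gain = percent_more(barts_density, rv770_density)

print(f"Transistor increase: {transistor_gain:.1f}%")
print(f"Density increase:    {density_gain:.1f}%")
```

Both numbers land close to 78%, so the rounding looks harmless as long as those transistor counts are right.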
Now if we consider that Flipper was 106mm^2, let's assume again that Nintendo is shooting for a chip that is 120mm^2 or under, including eDRAM. (That 106mm^2 figure did include Flipper's 3MB of eDRAM, but not the 24MB of 1T-SRAM, which I would say equates to the 2GB of main RAM in the Wii U.) Using that 78% figure, at 40nm the Wii GPU could be ~752 million transistors and be 120mm^2 on the nose.
eDRAM should be 1 transistor per bit, just like most (all?) DRAM, plus some extra misc. transistors. 32MB works out to roughly 268 million bits, so, say, 280-300 million transistors for 32MB of eDRAM alone. This leaves you with ~462 million transistors for the GPU. (This comes from subtracting the average of 280 and 300 from the total of 752.)
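The eDRAM budget above can be sketched like this. It assumes exactly 1 transistor per bit and treats anything beyond the raw cell count as "misc." overhead; the 752 million total and the 280-300 million range are taken straight from the post, and the variable names are made up for the example.

```python
MIB = 1024 * 1024

# 32MB of eDRAM at an assumed 1 transistor per bit.
edram_bits = 32 * MIB * 8
edram_cells_m = edram_bits / 1e6          # raw cell transistors, in millions

# Range guessed in the post once misc. transistors are included.
edram_low_m, edram_high_m = 280, 300
edram_avg_m = (edram_low_m + edram_high_m) // 2

# Total transistor budget from the post, in millions.
total_budget_m = 752
gpu_logic_m = total_budget_m - edram_avg_m

# Implied overhead on top of the raw cells at the midpoint of the range.
overhead_m = edram_avg_m - edram_cells_m

print(f"eDRAM cells:       {edram_cells_m:.1f}M transistors")
print(f"eDRAM with misc.:  {edram_avg_m}M transistors (midpoint)")
print(f"Implied overhead:  {overhead_m:.1f}M transistors")
print(f"Left for GPU:      {gpu_logic_m}M transistors")
```

So the 280-300 million guess implies roughly 12-32 million transistors of overhead on top of the cells themselves.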
Looking at the transistor count for other AMD GPUs, I would guess the GPU would look like this.
Frontend:
4 SIMDs
40 ALUs, arranged into 8 VLIW5 pipelines per SIMD
4 TU per SIMD
16KB L1 Texture Cache per SIMD
16KB Local Data Share
16KB Global Data Share
Backend:
16 ROPs (Minus hardware MSAA resolve)
Basically, each SIMD would be the same as RV770's, but with half the ALUs. The ROPs would hail more from RV670 than RV7xx, in an attempt to have more ROPs at the expense of reverting to software MSAA resolve. (This is my attempt to account for the fact that there is 32MB of eDRAM onboard, yet no footage of any game using MSAA.)
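To put totals on that layout, here is a small sketch that multiplies out the per-SIMD numbers. The guessed config comes from the list above; the RV770 figures (10 SIMDs with 16 VLIW5 units each) are the commonly cited public ones and are my assumption, and the `SimdConfig` class is purely hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimdConfig:
    """Hypothetical helper describing a VLIW shader array."""
    simds: int
    vliw_units_per_simd: int
    lanes_per_vliw: int = 5   # VLIW5
    tus_per_simd: int = 4

    @property
    def alus_per_simd(self) -> int:
        return self.vliw_units_per_simd * self.lanes_per_vliw

    @property
    def total_alus(self) -> int:
        return self.simds * self.alus_per_simd

    @property
    def total_tus(self) -> int:
        return self.simds * self.tus_per_simd


# Guessed layout from the post: 4 SIMDs, 8 VLIW5 pipelines each.
guess = SimdConfig(simds=4, vliw_units_per_simd=8)

# RV770 for comparison (assumed public figures).
rv770 = SimdConfig(simds=10, vliw_units_per_simd=16)

for name, cfg in (("Guess", guess), ("RV770", rv770)):
    print(f"{name}: {cfg.alus_per_simd} ALUs/SIMD, "
          f"{cfg.total_alus} ALUs total, {cfg.total_tus} TUs total")
```

That works out to 160 ALUs and 16 TUs for the guessed design, with each SIMD carrying half the ALUs of an RV770 SIMD.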
I guessed smaller but more numerous SIMDs because this is supposed to be a more GPGPU-centric design. We saw Cayman perform better than Cypress in compute functions with fewer ALUs but more total SIMDs, so I figure a similar approach may work here as well.
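For reference, the Cayman vs Cypress comparison looks like this when multiplied out. The SIMD counts and VLIW widths below are the commonly cited public figures for the full chips, which I am assuming here; the post does not list them, and the helper name is made up.

```python
# (SIMDs, VLIW units per SIMD, lanes per VLIW unit); assumed public figures.
CHIPS = {
    "Cypress": (20, 16, 5),   # VLIW5
    "Cayman": (24, 16, 4),    # VLIW4
}


def total_alus(simds, vliw_units, lanes):
    """Total ALU count for a VLIW shader array."""
    return simds * vliw_units * lanes


for name, (simds, units, lanes) in CHIPS.items():
    print(f"{name}: {simds} SIMDs, {total_alus(simds, units, lanes)} ALUs")
```

Cayman ends up with fewer ALUs overall but more SIMDs, which is the trade-off the guess above leans on.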
There is a lot of speculation and assumption in this post, so I ask anyone with accurate specifics to point out any errors. To those more knowledgeable than I, does this seem reasonable? Are my math and approximations correct/reasonable?