I've been thinking about what you've said here and I've got a couple of questions about it.
Regarding IO, I know you specifically mention that it isn't likely to be an issue, but I'm not sure why. If we were to assume that Broadway was pad-limited for IO based on its scaling from Gekko, couldn't it also be that the WiiU CPU is IO limited? It could potentially need 6 times the data that Broadway did (3 cores x twice the speed, even assuming no other increases). Couldn't that mean a potentially greater number of pads that overwhelms the benefits of only needing on-package communication?
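The 6x figure in the question is just a linear upper bound, which can be made concrete with a quick back-of-the-envelope sketch (all figures here are illustrative assumptions, not measured values):

```python
# Back-of-the-envelope: how off-chip bandwidth demand might scale
# from Broadway to a hypothetical tri-core successor.

def bandwidth_demand_scale(cores_ratio, clock_ratio):
    """Naive worst case: demand scales linearly with core count and clock."""
    return cores_ratio * clock_ratio

# 3 cores instead of 1, at roughly twice Broadway's clock
scale = bandwidth_demand_scale(3, 2)
print(scale)  # 6, i.e. up to 6x the off-chip data Broadway needed
```

In practice shared caches and on-package memory absorb some of that demand, so the real multiplier would be lower.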
No, not really. My argument was two-fold, the first part being that a chip like Gekko/Broadway needs off-chip connections, but if you make a tri-core version, the number of connections isn't going to triple. As you point out, the off-chip data communication needs would increase, but that part is addressed by keeping signals on-package.
On that subject, what is it about on-package communication that reduces IO area requirements? Is it that fewer pads are needed because you can signal faster over the shorter distance, or that smaller contact points are needed because you use less power per 'pin'? Or something else?
Bearing in mind that I'm no IC designer but a computational scientist, to the best of my knowledge both of your points above are correct. What I don't have is hard numbers, that is, if you really want to push the signaling speed per connection, how does that affect the necessary area for the associated drive circuitry? On the other hand, I can't really see that it would be an issue here, and in the cases where I've heard it described in more detail, they've claimed both benefits - much faster signaling at lower cost in die area.
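The first of those two benefits is easy to sketch: for a fixed bandwidth target, pad count falls in proportion to the per-pin signaling rate. The rates below are hypothetical, purely for illustration:

```python
import math

# Sketch: I/O pads needed to hit a given aggregate bandwidth target,
# as a function of the per-pin signaling rate. Numbers are made up.

def pads_needed(total_bandwidth_gbps, per_pin_rate_gbps):
    """Pads required if each pin sustains per_pin_rate_gbps."""
    return math.ceil(total_bandwidth_gbps / per_pin_rate_gbps)

# Same 80 Gbps target: slow off-package pins vs faster on-package links
off_package = pads_needed(80, 1)  # e.g. 1 Gbps/pin over PCB traces
on_package = pads_needed(80, 4)   # e.g. 4 Gbps/pin over short on-package wires
print(off_package, on_package)    # 80 20
```

What this sketch leaves out is exactly the open question above: whether the faster drivers eat back some of the saved area.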
Finally, what do you think Nintendo have added to the cores, or are they different cores entirely? I think you're probably correct, and I've been wondering what the changes might be. Some kind of beefed-up SIMD/vector support seems desirable, especially given the expected low clocks.
Sorry for all the questions, but this is quite an interesting topic!
Although the thread title says GPU, I'm inclined to agree.
As to your question, I'll be damned if I know. No developer has yet been heard gnashing his teeth about having to rewrite all SIMD code, so Nintendo/IBM adding SIMD blocks to facilitate ports is a possibility. On the other hand, Iwata has publicly made vague noises that could be interpreted as meaning the GPU would be the way to go for parallel FP. Or not. They could also have made a complete rework of the core, à la how different manufacturers produce ARMv7 cores of differing complexity. That would cost a bit, though. Or they could have spent gates to beef up only what they deem to be key areas - after all, they have quite a bit of experience by now with where the bottlenecks have proven to be for their particular application space.
While the lack of information is frustrating for the curious, we do know a few things. We know that the die area is 33 mm² on 45nm SOI, and that the power draw is in the ballpark of 5W. We also know that it is going to be compatible with Wii titles, which makes it an open question (though not impossible) whether IBM has used a completely unrelated PPC core with sufficient performance headroom per core that performance corner cases can be avoided. "Enhanced" Broadway may indeed be the case.
It's not going to be a powerhouse in raw ALU capabilities compared to contemporary processors under any circumstances. It spends roughly a fifth of the process-adjusted die size per core (logic+cache) of the Apple A6, for instance. On the other hand, the Cell PPE and the Xenon cores aren't particularly strong either for anything but vector-parallel FP code that fits into local storage or L1 cache, respectively. (An imperfect example: the iPhone 5 trumps the PS3 in both Geekbench integer and floating-point tests.) The take-home message is that even if the WiiU CPU isn't a powerhouse, it isn't necessarily at much of a disadvantage vs. the current HD twins in general processing tasks, even if we think of it as a tweaked Broadway design. If the more modern GPU architecture of the WiiU indeed makes some of the applications the SIMD units were used for unnecessary, maybe it is a better call to simply skip CPU SIMD. This is a game console, after all.
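The "process-adjusted" comparison works by normalizing die areas to a common node, since area scales roughly with the square of the feature size. A minimal sketch of the normalization, using the 33 mm² / 45nm / 3-core figures from above (the choice of 32nm as the reference node, matching the A6's process, is an assumption for illustration):

```python
# Normalize a die area from one process node to another, assuming
# ideal area scaling with the square of the feature size.

def normalized_area(area_mm2, node_nm, ref_node_nm=32):
    """Scale area_mm2 from node_nm to ref_node_nm (ideal scaling)."""
    return area_mm2 * (ref_node_nm / node_nm) ** 2

wiiu_per_core = 33 / 3  # mm^2 per core+cache at 45 nm, from the post
print(round(normalized_area(wiiu_per_core, 45), 2))  # 5.56 (mm^2 at 32 nm)
```

Real shrinks rarely achieve ideal scaling (SRAM and analog blocks shrink worse than logic), so this is an optimistic bound rather than a prediction.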
I have to say, though, that given what we know today, it seems to punch above its weight even at this point in time. There are a number of multi-platform ports on the system, at launch day with all that implies, that perform roughly on par with the established competitors. And those games were not developed with the greater programmability, nor the memory organization, of the WiiU in mind. So even without having its strengths fully exploited, it does a similar job at less than half the power draw of its competitors on similar lithographic processes!
And it's backwards compatible. To what extent its greater programmability and substantial pool of eDRAM can be exploited to improve visuals further down the line will be interesting to follow.
How what we have seen so far can be construed as demonstrating hardware design incompetence on the part of Nintendo is an enigma to me.