Wii U hardware discussion and investigation

Damn, I can't help myself.
It is impossible to make direct benchmark comparisons unless you have access to both the WiiU and either HD twin. But if we use Geekbench to compare the PS3 and the PPC7447 (similar to the 750, but with AltiVec and an improved bus interface), their integer and floating point scores at 3.2GHz and 1.25GHz respectively are:
INTEGER AGGREGATE:
PS3: 920
PPC7447: 879

FLOATING POINT AGGREGATE:
PS3: 702
PPC7447: 925

So... even IF the Broadway core is largely untouched, at 1.25GHz it should still be roughly at the level of the PPE - and by extension a Xenon core - at 3.2GHz, excluding SIMD FP. And of course, the WiiU CPU has a much more sympathetic memory subsystem than the Mac mini I used for this comparison, and I still believe IBM has done a bit more for its $1 billion than just tack on a new L2 cache interface and rudimentary SMP support.
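For what it's worth, here's the back-of-the-envelope scaling behind that claim as a quick Python sketch. It just divides the aggregate scores above by clock, and assumes performance scales linearly with clock, which is optimistic but fine for a ballpark:

Code:
# Per-clock comparison of the Geekbench aggregates quoted above.
# Assumes scores scale linearly with clock - optimistic, but good
# enough for a ballpark estimate.
scores = {
    "PS3 PPE @ 3.2GHz":  {"int": 920, "fp": 702, "ghz": 3.2},
    "PPC7447 @ 1.25GHz": {"int": 879, "fp": 925, "ghz": 1.25},
}
for name, s in scores.items():
    print(f"{name}: {s['int'] / s['ghz']:.0f} int/GHz, "
          f"{s['fp'] / s['ghz']:.0f} fp/GHz")
# PS3 PPE @ 3.2GHz:  ~288 int/GHz, ~219 fp/GHz
# PPC7447 @ 1.25GHz: ~703 int/GHz, ~740 fp/GHz
# i.e. the low-clocked G4-class core does roughly 2.5-3.4x the work per
# clock, which is why the two end up in the same ballpark overall.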

From what I understand, PPC 7400 meets your description, but the 7450 (and the 7447, which is a derivative of it) instead falls under the "G4e" family, which is a substantially different CPU core. Motorola was employing pretty dumb naming at the time. Ars had an excellent article on this, if you haven't caught it before: http://arstechnica.com/features/2004/10/ppc-2/3/

Since "G4e" has much improved reordering capabilities and much wider dispatch and execution width I wouldn't consider it a suitable proxy for Broadway (which from a core uarch point of view changes close to nothing over 750 outside of adding paired singles to the double precision FPU)
 
It's nearly impossible to make good comparisons between Espresso and Xenon on paper specs alone; the two architectures are radically different, so it's hard to know how they will perform.

I remember years ago, when PS3 vs 360 was all the rage, I naively thought that the PPU from Cell was basically equivalent to one core from Xenon. Both cores were really, really similar from a high level point of view: same units, same frequency, same latencies for instructions... I had a hard time seeing how they would not perform the same (except for the enhanced VMX unit on the Xenon). But at that time someone on this forum with knowledge of the matter (I think it was ERP, but I could be mistaken) told me that in their benchmarks they were seeing really different performance from the two cores. So even with two really similar cores, the reality was not so easy to guess from paper specs alone.

Oh, definitely. And the WiiU and the HD twins are far more different from an architectural point of view. The exercise was mostly to demonstrate that scalar code is not likely to be a big performance issue for ports. There are areas where the performance level is bound to differ a lot more.
Also, remember that even if we believe the rumors, we don't know more than that the cores support the Broadway ISA. But it could still be a superset, and even if it is identical, that still doesn't say much about whether the core is modified. x86 is an obvious example. Regarding OoO execution, I've already ranted a bit about the benefits being dependent on the overall system rather than just the core, but it also bears mentioning that both the ARM11 (in a limited, out-of-order-completion sense) and the Cortex A9 can be classified as OoO, yet the A9 core has roughly twice the performance/clock - never mind the Cortex A15!
Moral of the story is that OoO vs. in order can't be summed up in a convenient performance factor.
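To put some very rough numbers on that moral, here's a quick sketch using the DMIPS/MHz figures ARM themselves quote for these cores. Dhrystone is a lousy benchmark, so treat this purely as an illustration of how poorly per-clock performance tracks the in-order/OoO label:

Code:
# Commonly quoted DMIPS/MHz figures for the ARM cores mentioned above.
# Dhrystone is a poor benchmark, so these are illustrative only.
perf_per_mhz = {
    "ARM11":      1.25,
    "Cortex-A8":  2.0,
    "Cortex-A9":  2.5,
    "Cortex-A15": 3.5,
}
baseline = perf_per_mhz["ARM11"]
for core, dmips in perf_per_mhz.items():
    print(f"{core}: {dmips / baseline:.1f}x the ARM11 per clock")
# The spread is nearly 3x across these cores, and it doesn't line up
# neatly with which of them are in-order and which are OoO.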
 
From what I understand, PPC 7400 meets your description, but the 7450 (and the 7447, which is a derivative of it) instead falls under the "G4e" family, which is a substantially different CPU core. Motorola was employing pretty dumb naming at the time. Ars had an excellent article on this, if you haven't caught it before: http://arstechnica.com/features/2004/10/ppc-2/3/

Since "G4e" has much improved reordering capabilities and much wider dispatch and execution width I wouldn't consider it a suitable proxy for Broadway (which from a core uarch point of view changes close to nothing over 750 outside of adding paired singles to the double precision FPU)

I took a shortcut.
The G4e had a lengthened pipeline and was intended to scale higher in frequency. It compensated for the lengthened pipeline in the ways you describe, and overall it was a win. But it actually had slightly lower IPC than its predecessor, so I think it serves well enough as a proxy. The differences are swamped by other errors - it is a ballpark estimate.
It bears mentioning that if we look at Geekbench, or even a benchmark suite such as SPEC, the differences between the individual tests can easily be a factor of three or more. Cross architecture benchmarking is really, really tricky.
 
Could you back up these IPC claims? I find it dubious that moving to 3+1 over 2+1 decode, 3 simple ALUs + 1 MUL/DIV over 1 simple ALU + 1 combined simple/MUL/DIV unit, and far more reordering capability would result in a lower IPC due to nothing more than gaining 2 cycles of branch misprediction penalty. Especially when it also gained superior branch prediction.
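As a sanity check on how little two extra cycles of mispredict penalty should matter, here's a toy model with made-up (but plausible) numbers - not measurements of either core:

Code:
# Toy model: cost of a longer branch mispredict penalty.
# All numbers are hypothetical - this is not a measurement of 7400 vs 7450.
def effective_ipc(base_ipc, branch_freq, mispredict_rate, penalty):
    cpi = 1.0 / base_ipc + branch_freq * mispredict_rate * penalty
    return 1.0 / cpi

# ~20% branches, ~5% of them mispredicted, hypothetical base IPC of 1.1
print(effective_ipc(1.1, 0.20, 0.05, 6))   # shorter pipe: ~1.03
print(effective_ipc(1.1, 0.20, 0.05, 8))   # two cycles worse: ~1.01
# A couple of percent - nowhere near enough on its own to cancel out a
# wider, more aggressively reordering core.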
 
Oh, definitely. And the WiiU and the HD twins are far more different from an architectural point of view. The exercise was mostly to demonstrate that scalar code is not likely to be a big performance issue for ports.
Excepting where engines are heavily biased towards vector processing over scalar processing. If modern games need strong SIMD capabilities, Espresso is going to struggle regardless of its scalar performance. That's perhaps outside the scope of this thread though. The CPU seems a relatively known quantity - we're not going to be out by a massive factor in our estimates of its capabilities, unlike the GPU which is a bit of an unknown.
 
Will Havok need to be refactored for GPGPU work? In which case how much does that take from the graphics potential of the machine?

FYI, it's my understanding that Havok aren't moving any of the base physics stuff to GPGPU; they are instead looking at additional "physics effects" to host there. Despite the earlier Havok-on-GPU demos, they don't believe it's worth moving the core physics work there.

IMO people should pretty much forget about GPGPU bailing out the WiiU.
 
FYI, it's my understanding that Havok aren't moving any of the base physics stuff to GPGPU; they are instead looking at additional "physics effects" to host there. Despite the earlier Havok-on-GPU demos, they don't believe it's worth moving the core physics work there.

IMO people should pretty much forget about GPGPU bailing out the WiiU.
The whole GPGPU focus was Nintendo's idea. If Havok won't do it, I guess they'll port it themselves and bundle their own, custom version of Havok with the SDK. They have a distribution deal with Havok and a few guys working on optimizing 3rd party middleware for their machine.
 
The whole GPGPU focus was Nintendo's idea. If Havok won't do it, I guess they'll port it themselves and bundle their own, custom version of Havok with the SDK. They have a distribution deal with Havok and a few guys working on 3rd party middleware.

Last I checked, Havok still don't release the core solver source code under any license.
And Havok aren't doing it because they don't think it makes any sense to do it. I don't think Nintendo are going to reimplement Havok.
GPGPU is not a good fit for a wide class of problems - anywhere you have unpredictable memory access, you have an issue - and Havok believe that their general physics solver is not a good fit.
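To illustrate the unpredictable memory access point (purely a sketch, nothing to do with Havok's actual code): the hot loop of a typical rigid body constraint solver looks something like this, and the access pattern is dictated by which bodies happen to be in contact, not by memory layout:

Code:
# Illustrative sketch only - not Havok code.
from dataclasses import dataclass

@dataclass
class Body:
    velocity: float = 0.0
    def apply(self, impulse: float) -> None:
        self.velocity += impulse

@dataclass
class Constraint:
    body_a: int      # indices chosen by collision detection,
    body_b: int      # not by memory layout
    strength: float = 0.1

def solve_iteration(constraints, bodies):
    for c in constraints:
        a, b = bodies[c.body_a], bodies[c.body_b]   # scattered reads
        impulse = c.strength * (b.velocity - a.velocity)
        a.apply(impulse)                            # scattered writes,
        b.apply(-impulse)                           # order-dependent

# On a GPU, neighbouring threads end up touching unrelated cache lines and
# have to serialise the conflicting writes, which is exactly the kind of
# workload Havok are (sensibly) not moving over.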
 
Last I checked, Havok still don't release the core solver source code under any license.
And Havok aren't doing it because they don't think it makes any sense to do it. I don't think Nintendo are going to reimplement Havok.
GPGPU is not a good fit for a wide class of problems - anywhere you have unpredictable memory access, you have an issue - and Havok believe that their general physics solver is not a good fit.
Well, the deal with Nintendo is unprecedented as far as I can tell. We'll see. I don't know what they have in mind either. But like I wrote: Nintendo focussed on GPGPU for a reason, and their engineers aren't nearly as stupid as many people seem to believe. I think they thought that through, and probably made changes to the GPU as required.
 
Could you back up these IPC claims? I find it dubious that moving to 3+1 over 2+1 decode, 3 simple ALUs + 1 MUL/DIV over 1 simple ALU + 1 combined simple/MUL/DIV unit, and far more reordering capability would result in a lower IPC due to nothing more than gaining 2 cycles of branch misprediction penalty. Especially when it also gained superior branch prediction.

No, not really. Mind though, I'm talking about IPC for actual applications. I'm operating from memory on this one, but remember that the first version also supported up to 2MB of L3 cache. Overall it is a similar situation to what we are discussing here today - the lower clocked core enjoying the benefit of lower latency memory access (when access time is counted in processor cycles), and with a larger cache to boot. In actual applications (as opposed to L1-resident benchmarking of the core ALUs), it balanced out as I described. I remember reading about it in application benchmarks, but I also have some first hand experience with computational chemistry codes. The differences, though, were minor.
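Just to make the "access time counted in processor cycles" point concrete, here's a trivial sketch - the 60ns trip to memory is a placeholder, not a measured figure for any of these machines:

Code:
# Same physical memory latency, very different cost in core cycles.
# 60ns is a placeholder, not a measurement of any of these machines.
dram_latency_ns = 60.0
for clock_ghz in (1.25, 3.2):
    cycles = dram_latency_ns * clock_ghz
    print(f"{clock_ghz} GHz core: ~{cycles:.0f} cycles per miss")
# 1.25 GHz: ~75 cycles, 3.2 GHz: ~192 cycles - the slower clock hides a
# lot of memory latency 'for free'.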

It is of limited use to regard the core in isolation from its memory hierarchy. Geekbench and other such tools should be referenced with caution.

And as I mentioned, if you go to the individual subtests for the comparisons, you will find huge variations - factors of three or even five. Which means that the differences in scores are very dependent on how the few individual subtests happen to fit the underlying architecture. (And then we haven't even touched on the compiler can of worms.) Cross architecture benchmarking can't do better than ballpark estimates under the best of circumstances - it's the nature of the beast. But everything I have seen (initial comparisons of the HD twins vs. low end PC processors of the day, the Geekbench data, back of the envelope core size comparisons and doodling :)) - it all comes out as "roughly comparable". Which, given the differences in approach between the cores, is pretty much as high precision as you're likely to get, unless you're interested in a specific code and can run it on both systems.
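A quick made-up example of how an aggregate hides the subtest mix (assuming a geometric-mean style aggregate, which is roughly how these suites roll scores up):

Code:
# Two hypothetical machines with identical aggregates but a 3x spread
# either way on individual subtests. The numbers are invented.
from statistics import geometric_mean

machine_a = [300, 300, 300, 300]
machine_b = [100, 900, 100, 900]
print(geometric_mean(machine_a))   # 300.0
print(geometric_mean(machine_b))   # 300.0
# Identical 'scores', yet whichever machine your particular code resembles
# matters by up to a factor of three either way.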
 
Excepting where engines are heavily biased towards vector processing over scalar processing. If modern games need strong SIMD capabilities, Espresso is going to struggle regardless of its scalar performance.
Indeed, which is why it is remarkable that the launch games are as similar as they are, because the differences in peak vector/parallel FP throughput could be an order of magnitude.
This could be either because the average utilization of the SIMD units in game codes is low, or that some of the SIMD tasks are offloaded to the audio DSP and the GPU respectively. Or a combination of the two.
The outlier option being that they've added SIMD units, god knows IBM has designs on the shelf, but it seems like a power-inefficient choice.
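For a feel of the magnitude, here's a peak FP sketch. It uses the rumored 1.25GHz figure from earlier in the thread, assumes paired singles is all Espresso has, and takes public figures for the HD twins - theoretical peaks only, every number is an assumption or a published spec rather than inside information:

Code:
# Theoretical peak single-precision FP, assuming the rumored 1.25GHz clock
# and only paired singles (2-wide FMA) on Espresso. Public figures for the
# HD twins. Peaks only - sustained throughput is far lower everywhere.
def peak_gflops(cores, ghz, flops_per_cycle):
    return cores * ghz * flops_per_cycle

espresso = peak_gflops(3, 1.25, 4)   # paired-single FMA: 2 mul + 2 add
xenon    = peak_gflops(3, 3.20, 8)   # VMX128 FMA: 4 mul + 4 add
cell_spe = peak_gflops(6, 3.20, 8)   # 6 usable SPEs, 4-wide FMA each
print(espresso, xenon, cell_spe)     # ~15 vs ~77 vs ~154 GFLOPS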

That's perhaps outside the scope of this thread though. The CPU seems a relatively known quantity - we're not going to be out by a massive factor in our estimates of its capabilities, unlike the GPU which is a bit of an unknown.

The GPU is unknown both in terms of the number of ALUs and their capabilities. Some eDRAM access details would be nice too. For games, the vagaries of the GPU indeed seem more significant than the implementation details of the CPU.
 
Nintendo focussed on GPGPU for a reason, and their engineers aren't nearly as stupid as many people seem to believe. I think they thought that through, and probably made changes to the GPU as required.

Stop. Please stop. Just stop.
 
Well, the deal with Nintendo is unprecedented as far as I can tell. We'll see. I don't know what they have in mind either. But like I wrote: Nintendo focussed on GPGPU for a reason, and their engineers aren't nearly as stupid as many people seem to believe. I think they thought that through, and probably made changes to the GPU as required.
I think you are vastly overrating Nintendo's engineering abilities. GPGPU has been a WIP for an age. It existed long before PS360 as a concept. We are seeing GPUs evolve to support GPGPU functions, but the concept is broad and has presented significant issues to the best in the business (AMD and nVidia engineers) and to software engineers trying to get GPUs to do non-graphics tasks. I doubt there's anything Nintendo can bring to the hardware design table that's going to be a huge enabler here. Whatever GPGPU capabilities are in their processor are from AMD. I also doubt Nintendo have the smarts to execute GPGPU code better than everyone else; they have zero experience other than whatever their R&D teams may have come up with.

I've just spent a moment Googling "Nintendo GPGPU" and all I've found is there was a "Nintendo Direct" episode where we're told the GPU can do GPGPU. I'm not finding anything to suggest Nintendo are writing core GPGPU code, recoding middleware, or even helping developers code their games. Indeed, as mentioned before, I even find Iwata saying Nintendo aren't helping developers code because developers have just as much ability as Nintendo have.

I can't even find anything in support of your view that the Havok relationship is unprecedented. Sony licensed Havok for inclusion in the PS3 SDK in 2005.
 
Stop. Please stop. Just stop.
Why should I? The whole thing isn't based on rumors or made up by delusional fanboys for god's sake. See the September 13th video presentation by Iwata or check the leaked specifications. GPGPU was the only GPU feature Iwata highlighted in the presentation. It's mentioned in the developer specifications, and is the only feature added to the AMD R700 feature list. Everything else was copied and pasted, except for this one bullet point.
 
Stop. Please stop. Just stop.

You don't understand the new GPU centric paradigigimn that Nintendo have created. When you factor in the CGPU, the edram, the low latency, the edram and the CGPU you can see that the WiiU is actually *too* forward looking, and will come into its own against the PS4720, where it should handle ports more easily.
 
I can't even find anything in support of your view that the Havok relationship is unprecedented. Sony licensed Havok for inclusion in the PS3 SDK in 2005.

Not to mention that Havok was heavily optimised for the SPEs just before Motorstorm 2's release (late 2007?), to the point where Havok had a press release saying that after the optimisation the Cell processor ran Havok stuff something like 5x faster than the 'best' Intel desktop CPU at the time (probably not really, but still). If I remember correctly, that is - but anyway, Havok's everywhere.

And PhysX is still also widely licenced, for that matter. It's just that occasionally, for the PC market, Nvidia will volunteer their own coders to a high-profile game and build some special effects into it, which are then limited to PC Nvidia cards artificially (again, as far as I understand ...) as part of the deal.

Anyway, currently it seems that problems with getting the CPU to do what needs to be done are the primary cause of multi-platform titles at times performing worse than on both the 360 and PS3, despite a more modern GPU and more RAM. I'm sure things will smooth out over time, but given the low power consumption of the Wii U for what it is, I think parity with the 360 would already be quite a feat - the same performance at less than half the power.

You don't understand the new GPU centric paradigigimn that Nintendo have created. When you factor in the CGPU, the edram, the low latency, the edram and the CGPU you can see that the WiiU is actually *too* forward looking, and will come into its own against the PS4720, where it should handle ports more easily.

Such a shame then that it will most likely only ever get ports of PS3/360 titles (or game engines). ;)
 
I think you are vastly overrating Nintendo's engineering abilities. GPGPU has been a WIP for an age. It existed long before PS360 as a concept. We are seeing GPUs evolve to support GPGPU functions, but the concept is broad and has presented significant issues to the best in the business (AMD and nVidia engineers) and to software engineers trying to get GPUs to do non-graphics tasks. I doubt there's anything Nintendo can bring to the hardware design table that's going to be a huge enabler here. Whatever GPGPU capabilities are in their processor are from AMD. I also doubt Nintendo have the smarts to execute GPGPU code better than everyone else; they have zero experience other than whatever their R&D teams may have come up with.

I've just spent a moment Googling "Nintendo GPGPU" and all I've found is there was a "Nintendo Direct" episode where we're told the GPU can do GPGPU. I'm not finding anything to suggest Nintendo are writing core GPGPU code, recoding middleware, or even helping developers code their games. Indeed, as mentioned before, I even find Iwata saying Nintendo aren't helping developers code because developers have just as much ability as Nintendo have.

I can't even find anything in support of your view that the Havok relationship is unprecedented. Sony licensed Havok for inclusion in the PS3 SDK in 2005.
Did Sony also give Havok away for free? Well, I didn't know that. My mistake.

Anyway, the problem I see here is that, as stated several times, Nintendo themselves highlight the feature. Why should they? They usually simply don't talk about tech; this was a very rare exception. Do you think they fell for some AMD snake oil? That's very hard to believe.

I tend to believe that hardware manufacturers like AMD or Nvidia usually have to follow PC paradigms, even more so considering neither of the two is the market leader. It makes no sense putting too much time and money in a feature nobody will use, especially not if it requires costly and rarely used additions or compatibility breaking changes to the hardware. In the embedded space, there's no reason to hold back.

And I'm afraid a few LinkedIn profiles got changed/erased after one or two got too much attention, so you probably won't find the sources for my claims anymore. But Nintendo has (or had) people working on that stuff - 3rd party middleware optimizations, I mean; I never found a concrete mention of GPGPU either.
 
It's just that occasionally, for the PC market, Nvidia will volunteer their own coders to a high-profile game and build some special effects into it, which are then limited to PC Nvidia cards artificially (again, as far as I understand ...) as part of the deal.

There was a quote from one of the top people at nvidia where he said "we will port your physics engine to physx for you" (or something along those lines)
 
You don't understand the new GPU centric paradigigimn that Nintendo have created. When you factor in the CGPU, the edram, the low latency, the edram and the CGPU you can see that the WiiU is actually *too* forward looking, and will come into its own against the PS4720, where it should handle ports more easily.

No, you don't understand the reality of the situation. I suggest you reread the posts of other developers and well-informed posters such as Sebbi, ERP, and Shifty Geezer. GPGPU is no magic bullet. Please stop with the silly dreams - they will just get crushed, because GPGPU is not going to amount to anything that vastly improves the performance of the WiiU.

[EDIT: Function, I truly hope you were being tongue-in-cheek sarcastic. If so, it was missed the first time I read your post.]
 