Wii U hardware discussion and investigation *rename

Lump, I'm not sure what your point in this hair-splitting exercise is, but wuu being a poor piece of kit most probably results from bad decisions made by management. It's probably Iwata and his gang that decided to recycle that ancient gamecube CPU yet another time (with a bit more cache tacked on for show this round), but it could also have been on the advice of senior hardware engineering people. Who can say for sure! We would have needed to be a fly on the wall at the time to get the true answer to this.

Wuu is incredibly well built - like every Nintendo gadget I've ever held, I think (ok, not the N64 analog stick - it was fragile and not a great design), but the architecture is just terrible. Half a decade ago it would have been serviceable, but half the main RAM bandwidth of competing consoles, terribly aged CPU cores that lack modern SIMD extensions, half the RAM held back in reserve (with absolutely no benefit to show for it - slow menu load times are infamous on wuu even after the current update) - it all makes wuu a real blunder of a games console.


Like I said in the above post (the one above yours in reply to Shifty) - I probably misunderstood your point. I've read far too much of this baseless "Nintendo don't know how to make hardware, lol" nonsense in other forums and was assuming that's what you were saying. My bad. Imo it's not that they don't know how to compete with the high-end hardware - it's that they don't think they need to even slightly. Which is where the incompetency of the management comes in.

I agree Nintendo has miscalculated here and chased up the wrong path. I like my WiiU and have been enjoying it a lot, but I think they should have sacrificed hardware BC in favour of an architecture more closely resembling the other two next-gen consoles, in order to future-proof it and attract 3rd parties. I'm not saying they should have been on par with Xbone, just a lot closer in terms of power and architecture. There's still time for them to turn it around (sales wise) somewhat - but I'm struggling to see how they can at the moment. At best I think this will be just like the GC... which is fine by me, but probably not fine for their shareholders coming off the Wii's highs.

Regarding the CPU: I don't know how they arrived at that conclusion. Backwards compatibility and power efficiency obviously took precedence in their plans. I mean, they could have stuck with what they've got but added another core/doubled the cores and been fairly close to Xbone's CPU, couldn't they?

How does Xbone's memory architecture compare?
 
Is there any tangible benefit to having the CPU and GPU on an MCM? We don't know what the bus is between those two components, but in the Iwata Asks on the hardware it was mentioned that the connection was "sped up." If the FSB is running at 1/3 the CPU clock rate (as in Gekko), then there must be a 128-bit connection there to get bandwidth at anything near usable, I would imagine.

I'm just wondering if this setup is a better idea than something like the original Xbox 360 design. They could have put the eDRAM on a separate chip and had it on an MCM with the GPU - thus freeing up some room on the GPU for more ALUs, TMUs, etc. Perhaps the added latency between CPU and GPU would have exacerbated the CPU's slow clock even more, however.
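For what it's worth, here's the back-of-envelope behind that guess. The ~1.24 GHz Espresso clock and the Gekko-style 1/3 divider are assumptions on my part, not confirmed Wii U bus specs:

```c
/* Rough FSB bandwidth estimate: assumes the commonly reported ~1.24 GHz
   Espresso clock and a Gekko-style 1/3 bus divider; neither is confirmed. */
#include <stdio.h>

int main(void)
{
    double cpu_clk_mhz = 1243.0;             /* commonly reported Espresso clock */
    double fsb_mhz     = cpu_clk_mhz / 3.0;  /* 1/3 divider, as in Gekko */

    for (int width_bits = 64; width_bits <= 128; width_bits *= 2) {
        double gb_per_s = fsb_mhz * 1e6 * (width_bits / 8) / 1e9;
        printf("%3d-bit FSB @ %.0f MHz -> %.1f GB/s\n", width_bits, fsb_mhz, gb_per_s);
    }
    /* ~3.3 GB/s at 64 bits vs ~6.6 GB/s at 128 bits; with main RAM around
       12.8 GB/s, a 64-bit link looks like a bottleneck, hence the 128-bit guess. */
    return 0;
}
```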
 
Is there any tangible benefit to having the CPU and GPU on an MCM? We don't know what the bus is between those two components, but in the Iwata Asks on the hardware it was mentioned that the connection was "sped up." If the FSB is running at 1/3 the CPU clock rate (as in Gekko), then there must be a 128-bit connection there to get bandwidth at anything near usable, I would imagine.



I'm just wondering if this setup is a better idea than something like the original Xbox 360 design. They could have put the eDRAM on a separate chip and had it on an MCM with the GPU - thus freeing up some room on the GPU for more ALUs, TMUs, etc. Perhaps the added latency between CPU and GPU would have exacerbated the CPU's slow clock even more, however.

It's not like the Jaguars are clocked 3x or even 2x higher than Espresso. Perhaps we should stop referring to the general clock speed of the CPU as slow when all three CPUs are pretty much just as 'slow'. It's not like we are looking at 3.2 GHz systems.

Doesn't embedding the RAM have a pretty huge impact on bandwidth, latency and power consumption? So we'd have more texture and shader units, but could we feed them? There is also the manufacturing cost of another die.

Well, one way or another, next week we will have some manner of answers. Iwata himself has stated his intention of 'proving the Wii U is not underpowered'. Regardless of the power the system (or any system) actually has or lacks, the solution is the same: high production value visuals. If these aren't shown, any discussion I feel I'd possibly want to take part in would be put on hold indefinitely. I love this stuff. Even if it's low powered. ESPECIALLY if it's low powered. Impressive feats are far more impressive and intriguing to me in that context. But if nothing is ever shown, then does it even really matter?

Sorry. Idle thoughts. Six days, right?

Back on track. This 'slow' biz actually reminds me: the actual DMIPS per MHz of the 750 series @ 90nm is still pretty dang good - 2.32 DMIPS/MHz (cross-referenced through multiple documents). I have no documentation on anything below 90nm; I don't think any exists. Would shrinking to 45nm have a marked impact on this performance?
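For a sense of scale, here's what 2.32 DMIPS/MHz works out to at the relevant clocks. The ~1.24 GHz Espresso figure and the naive linear scaling with clock are my assumptions:

```c
/* What 2.32 DMIPS/MHz means at Broadway/Espresso clocks. The Espresso clock
   is the commonly reported one, and linear scaling with clock is assumed. */
#include <stdio.h>

int main(void)
{
    const double dmips_per_mhz = 2.32;    /* figure quoted above for the 750 @ 90nm */
    const double broadway_mhz  = 729.0;   /* Wii */
    const double espresso_mhz  = 1243.0;  /* reported Wii U clock */

    printf("Broadway: ~%.0f DMIPS per core\n", dmips_per_mhz * broadway_mhz);
    printf("Espresso: ~%.0f DMIPS per core, ~%.0f across 3 cores\n",
           dmips_per_mhz * espresso_mhz, 3.0 * dmips_per_mhz * espresso_mhz);
    /* A shrink alone mostly buys clock and power headroom; DMIPS/MHz should stay
       roughly the same unless the microarchitecture itself changes. */
    return 0;
}
```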
 
Actually a member whose name I can't remember (sorry :( ) ran tests (or made a guesstimate based on Broadway performance) and the results were far from "sucky".

The real issue is that the system only has 3 of those cores. Another issue I have with the system is the choice of 40nm lithography: it would have been fine for a launch in 2011, but I think it was a bad decision to launch a system in 2012 not using a 28nm process. For reference, the AMD Mars GPU line is pretty impressive.
 
Actually a member whose name I can't remember (sorry :( ) ran tests (or made a guesstimate based on Broadway performance) and the results were far from "sucky".

That would have been darkblu.

The real issue is that the system only has 3 of those cores. Another issue I have with the system is the choice of 40nm lithography: it would have been fine for a launch in 2011, but I think it was a bad decision to launch a system in 2012 not using a 28nm process. For reference, the AMD Mars GPU line is pretty impressive.

I agree. There must have been a reason, but having it smaller, even with no performance gain, would have lightened the load on the cooling at least. I do find the fan to be much noisier than I expected it to be. In fact I don't think I've heard my 360 slim anywhere near as much as my Wii U.
 
People are asking devs that only ever worked with very different "HD" architectures (third parties) to adapt existing software, or devs who worked with a radically different "SD" architecture (Nintendo) to create software, in 6 months, when it took others years to learn.

According to the leaks and the article at the first post of this thread:

1- Wii U GPU should be better than PS360 in raw numbers (1.5-2x);
2- Wii U CPU should be roughly on par with PS360, some stuff should be much better;
3- Wii U can offload stuff from the CPU to the audio DSP (up to 1 thread, or 5-15% of CPU in the PS360, according to known numbers from PS360 devs);
4- Wii U GPU being "DX"10 should offload some of the CPU work without penalty;
5- It has 2x the RAM just for games;
6- It has much more than the needed eDRAM for 720p;
7- There is a big part of the GPU that we don't know what it is;
8- It is quite probably driven much more by real-world performance than by big numbers.

It should be a nice upgrade in visuals and framerate IMO, nothing major but noticeable enough. Still, I suspect that engines which have been optimized over the last years for in-order CPUs, small L2, little or no eDRAM, one screen, high latency from RAM, audio on the CPU and so on (or Nintendo engines, optimized for nothing of the sort)... will take some time to show it.

That said, games like Bayonetta 2 with ~2x the poly count on the main character, or X with massive landscapes, will probably start to show the HW.

Being very different but not a major upgrade means that they will need to learn this machine too, so they can show what it is better at.
 
2. Only when properly optimised for, and in nearly all tasks.
3. The PS3 actually has an audio chip embedded in RSX as far as I know. Some devs still choose to use a fraction of a single SPE for this though, which is extremely efficient for such work afaik.
4. Yes, but we don't really know how efficiently it can do that, as the DDR3 for instance is far slower than the memory in the PS360. Not to mention that for these kinds of things the PS3 has plenty of SPE cycles to spare vs the Wii U CPU, and the 360's GPU can do GPGPU reasonably well too.
5. Yes, but again, slower... But there are some advantages when set up properly; texture handling in particular should be more efficient in combination with the more modern GPU. Criterion have shown as much.
6. But in some aspects, also much slower eDRAM.

Also, the devtools have been very bad, holding developers back.

In general, I think a few games have shown that the Wii U should be able to match PS360 in most areas, and outperform it in several graphics related tasks. Games that are not CPU bound should do well. But not all games that are CPU bound can be easily 'fixed' - transferring some work to GPGPU is not always trivial, and/or may need use of EDRAM to prevent latency being an issue, which however is also needed to help alleviate bandwidth issues caused by the slower DDR3 in the Wii U (13GB/s wasn't it?)
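Quick sanity check on that last figure, assuming the commonly reported DDR3-1600 on a 64-bit bus for the Wii U; the PS360 numbers are the usual quoted main-memory peaks, for comparison:

```c
/* Peak main-memory bandwidth comparison. Wii U assumes DDR3-1600 on a 64-bit
   bus (the commonly reported configuration); PS3/360 use their well-known
   XDR/GDDR3 figures. Peak = transfer rate x bus width. */
#include <stdio.h>

static double peak_gb_per_s(double mega_transfers, int bus_bits)
{
    return mega_transfers * 1e6 * (bus_bits / 8) / 1e9;
}

int main(void)
{
    printf("Wii U DDR3-1600,  64-bit : %.1f GB/s\n", peak_gb_per_s(1600, 64));  /* 12.8 */
    printf("360   GDDR3-1400, 128-bit: %.1f GB/s\n", peak_gb_per_s(1400, 128)); /* 22.4 */
    printf("PS3   XDR-3200,   64-bit : %.1f GB/s\n", peak_gb_per_s(3200, 64));  /* 25.6 */
    printf("PS3   GDDR3-1400, 128-bit: %.1f GB/s\n", peak_gb_per_s(1400, 128)); /* 22.4 */
    return 0;
}
```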
 
1- Wii U GPU should be better than PS360 in raw numbers (1.5-2x);
2- Wii U CPU should be roughly on par with PS360, some stuff should be much better;
3- Wii U can offload stuff from the CPU to the audio DSP (up to 1 thread, or 5-15% of CPU in the PS360, according to known numbers from PS360 devs);
4- Wii U GPU being "DX"10 should offload some of the CPU work without penalty;
5- It has 2x the RAM just for games;
6- It has much more than the needed eDRAM for 720p;

1) Probably not. Possibly less in some raw numbers actually.
2) Possibly not. A number of devs have slated the cpu and Blu's own tests have shown that a PPU would annihilate a wuucore in terms of peak achievable flops (something like 3+ times faster).
3) We don't know what the audio DSP can do
4) XGPU isn't limited to "DX9" and PS360 won't suffer from the DX draw call overheads. There may not be a CPU win for Wii U in this regard, and who knows, there may even be a loss depending on "thick API" overheads.
5) True. And that is a definite win for the Wii U in terms of textures.
6) Wii U may have to use the edram for more than just colour and depth buffers, indeed that may have been the intention when it was designed, so as to allow for a meagre main memory bus without crippling the system.
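On point 2, some rough paper peaks for context. The per-cycle throughputs here (VMX: 4-wide single-precision FMA; Espresso paired singles: 2-wide FMA) and the clocks are my assumptions; blu's numbers were measured achievable rates, not these theoretical ones:

```c
/* Back-of-envelope theoretical SP peaks per core. Assumes VMX doing a 4-wide
   FMA per cycle on the PPU and paired singles doing a 2-wide FMA per cycle on
   Espresso; measured (achievable) rates like blu's will be lower on both. */
#include <stdio.h>

int main(void)
{
    double ppu_gflops      = 3.2   * 8.0;  /* 3.2 GHz x 8 flops/cycle = 25.6  */
    double espresso_gflops = 1.243 * 4.0;  /* ~1.24 GHz x 4 flops/cycle ~ 5.0 */

    printf("PPU core peak:      %.1f GFLOPS\n", ppu_gflops);
    printf("Espresso core peak: %.1f GFLOPS (ratio ~%.1fx)\n",
           espresso_gflops, ppu_gflops / espresso_gflops);
    /* ~5x on paper; a measured gap of "3+ times" is consistent with neither
       core sustaining its theoretical peak on real kernels. */
    return 0;
}
```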
 
With regards to DX10, isn't he talking about being able to clone meshes in hardware for little performance penalty, or something like that?
 
2. Only when properly optimised for, and in nearly all tasks.
3. The PS3 actually has an audio chip embedded in RSX as far as I know. Some devs still choose to use a fraction of a single SPE for this though, which is extremely efficient for such work afaik.
4. Yes, but we don't really know how efficiently it can do that, as the DDR3 for instance is far slower than the memory in the PS360. Not to mention that for these kinds of things the PS3 has plenty of SPE cycles to spare vs the Wii U CPU, and the 360's GPU can do GPGPU reasonably well too.


3- Is it a DSP or just a DAC? We had quite a few devs giving audio one full thread; even in the audio spin-off thread you have ERP confirming it.

4- I didn't mean GPGPU stuff like physics (although we have seen quite a few implementations) or AI, but there is gfx stuff that is done on SPUs that should be done on a DX10 GPU; geometry shaders, better AA or just higher-precision shaders come to mind. And why shouldn't the Wii U GPU do GPGPU better than the XGPU? AFAIK the XGPU should be the ancestor of the 4xxx GPU architecture, so the Wii U GPU should do anything the other does, but more efficiently.

Also IIRC one of the advantages of a DX10 GPU (besides higher quality) is that it should be able to do what a DX9 one does but a little faster, at least according to what they said at the time.

1) Probably not. Possibly less in some raw numbers actually.
2) Possibly not. A number of devs have slated the cpu and Blu's own tests have shown that a PPU would annihilate a wuucore in terms of peak achievable flops (something like 3+ times faster).
3) We don't know what the audio DSP can do
4) XGPU isn't limited to "DX9" and PS360 won't suffer from the DX draw call overheads. There may not be a CPU win for Wii U in this regard, and who knows, there may even be a loss depending on "thick API" overheads.
5) True. And that is a definite win for the Wii U in terms of textures.
6) Wii U may have to use the edram for more than just colour and depth buffers, indeed that may have been the intention when it was designed, so as to allow for a meagre main memory bus without crippling the system.

1) It is very hard to accept that in light of the leaked specs and (e.g.) the article on the first page of this thread.
2) Very different architectures and probably even different roles in the console too; it's wait and see.
3) Indeed, but probably better than not having one.
4) Hard to say too, much is in play, but what is valid for one could be valid for the other too, especially on the optimization side, and there is still a big chunk of the GPU that we don't know what it is. Plus, like I said before, AFAIK the XGPU should be the ancestor of the 4xxx GPU architecture.


What I'm saying is that there is no big reason to think it is under- or even equally-powered compared to the PS360; it should overpower them at least a little bit, more so the 360 than the PS3.
 
Actually a member whose name I can't remember (sorry :( ) ran tests (or made a guesstimate based on Broadway performance) and the results were far from "sucky".

darkblu's test was only multiplying L1 resident matrices, it shouldn't be taken as broadly representative of CPU performance, even specifically for vector float code. Matrix multiplication is influenced by subtle things like ability to broadcast or do scalar/vector multiplications. That aside, the tests were against Bobcat, it'd run a lot better on Jaguar if the code is good.
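For anyone wondering what an "L1-resident matrix multiply" test roughly looks like, here's a minimal sketch in that spirit (SSE, column-major 4x4). This is not darkblu's actual code; the point is that the broadcasts (_mm_set1_ps) are exactly the kind of operation whose cost and availability differ between ISAs and can skew such microbenchmarks:

```c
/* Minimal 4x4 column-major matrix multiply with SSE intrinsics - illustrative
   only, not darkblu's test. Note the scalar broadcasts (_mm_set1_ps): how
   cheaply an ISA can splat/broadcast is one of the "subtle things" mentioned. */
#include <stdio.h>
#include <xmmintrin.h>

/* c = a * b, all 4x4 column-major float matrices */
static void mat4_mul(const float *a, const float *b, float *c)
{
    for (int j = 0; j < 4; ++j) {                 /* each column of c */
        __m128 col = _mm_mul_ps(_mm_loadu_ps(&a[0]), _mm_set1_ps(b[j * 4]));
        for (int k = 1; k < 4; ++k)               /* accumulate a[:,k] * b[k][j] */
            col = _mm_add_ps(col, _mm_mul_ps(_mm_loadu_ps(&a[k * 4]),
                                             _mm_set1_ps(b[j * 4 + k])));
        _mm_storeu_ps(&c[j * 4], col);
    }
}

int main(void)
{
    float a[16], b[16], c[16];
    for (int i = 0; i < 16; ++i) { a[i] = (float)i; b[i] = (i % 5 == 0); } /* b = identity */
    mat4_mul(a, b, c);
    printf("c[5] = %.1f (should equal a[5] = %.1f)\n", c[5], a[5]);
    return 0;
}
```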
 
darkblu's test was only multiplying L1 resident matrices, it shouldn't be taken as broadly representative of CPU performance, even specifically for vector float code. Matrix multiplication is influenced by subtle things like ability to broadcast or do scalar/vector multiplications. That aside, the tests were against Bobcat, it'd run a lot better on Jaguar if the code is good.
Indeed, Jaguar is better, though the Broadway/Espresso "core" is not that bad by itself.
On top of that, Jaguar was not an option for Nintendo. Though I wonder what your PoV is, or that of other members with significant knowledge, on the overall cache hierarchy: I would think that a shared cache could have been more useful than the split, uneven set-up Nintendo chose.

Honestly Nintendo did not have a much better choice if they wanted to stick to the POWER ISA.
Actually, to give them some credit, I wonder if the choice of the PPC 750CL was a pretty good one, better than Sony's and MSFT's. We can discuss the cache hierarchy, the number of cores, etc., but that CPU core powered the GC and the Wii and could still have done well in the WiiU (it would need more cores to sit in between this gen and the upcoming one). We have to acknowledge that Nintendo's decision was more insightful than both MSFT's and Sony's: they chose a CPU with "sane" single-thread performance and honest throughput (for its time, but also for its silicon and power footprint), whereas MSFT bet on an in-order speed demon and Sony on a pretty complicated set-up. Ultimately Sony and MSFT are back to using CPUs that, though up to date, belong in the same category as Gekko/Broadway/Espresso.
 
Also IIRC one of the advantages of a DX10 GPU (besides higher quality) is that it should be able to do what a DX9 one does but a little faster, at least according to what they said at the time.

The 360 isn't DX9 though. It's DX10.1 plus some things, minus some things.

1) It is very hard to accept that in light of the leaked specs and (e.g.) the article on the first page of this thread.

A lot has changed in the last two years! Take a look at the GPU die shot and play "hunt the shaders". Then take a look at the games and play "hunt the performance."

It's a PS360 level machine, probably with some advantages and disadvantages, that manages PS360 level performance using old nodes and older tech while maintaining HW backwards compatibility and using less than half the power. That's what it does and that's what it was designed to do.

But you should clear from your head any idea that it's "2X 360" because it isn't.

2) Very different architectures and probably even different roles in the console too; it's wait and see.

I think we've seen more than enough to say that the Wii U has some CPU issues compared to PS360. Clearly there are multiplatform developers who've seen everything and run code on the Wii U who are absolutely clear about this.

It's unfortunate that Nintendo didn't use six cores to match the six threads of the 360. It would still have been a tiny CPU and would still have used a lot less power than the XCPU.
 
Indeed, Jaguar is better, though the Broadway/Espresso "core" is not that bad by itself.
On top of that, Jaguar was not an option for Nintendo. Though I wonder what your PoV is, or that of other members with significant knowledge, on the overall cache hierarchy: I would think that a shared cache could have been more useful than the split, uneven set-up Nintendo chose.

Surely a uarch first commercially released 12 years ago with almost no modification that we know of (not counting shrinks) couldn't be all that competitive today. Even if they were restricted to this power budget, which they shouldn't have been to this degree, there's so much they could have done with more transistors and newer ideas/better fitting to modern coding practices.

Honestly Nintendo did not have a much better choice if they wanted to stick to the POWER ISA.

The only especially good (rational) reason for sticking with the ISA is BC. They could have very possibly designed good BC with completely separate dedicated logic, without taking on too much of a die hit and not compromising the rest of the system. But that's kind of just a guess, I don't know what the real cost of an embedded Wii die would have been.

Actually, to give them some credit, I wonder if the choice of the PPC 750CL was a pretty good one, better than Sony's and MSFT's. We can discuss the cache hierarchy, the number of cores, etc., but that CPU core powered the GC and the Wii and could still have done well in the WiiU (it would need more cores to sit in between this gen and the upcoming one). We have to acknowledge that Nintendo's decision was more insightful than both MSFT's and Sony's: they chose a CPU with "sane" single-thread performance and honest throughput (for its time, but also for its silicon and power footprint), whereas MSFT bet on an in-order speed demon and Sony on a pretty complicated set-up. Ultimately Sony and MSFT are back to using CPUs that, though up to date, belong in the same category as Gekko/Broadway/Espresso.

I agree, in a lot of ways you could call Broadway a more sensible CPU design than what PS3 or XBox 360 got. A lot like how in a lot of ways other processors contemporary to and even predating Pentium 4 were more sensible. They made some similar decisions that look less than ideal in hindsight, with the consoles going further in pushing strong vector performance for very cache/stream friendly code at the expense of everything else. On the other hand, I give IBM credit for trying new things (and Sony/MS adopting them) even if they ended up being bad ideas. Maybe it's Nintendo's longer term play it safe attitude that became more damaging.
 
I agree, in a lot of ways you could call Broadway a more sensible CPU design than what PS3 or XBox 360 got.

While I also agree, I think if you take into account clockspeed and the possibility that 1.25 GHz wasn't realistic on 90nm, Xenon might not be as bad as Wii U CPU makes it look.

Would a quad core Broadway with VMX 128 and, say, a ~1 GHz clockspeed have been realistic on 90nm though? It seems like it would have been smaller than Xenon, cooler, and probably easier to extract performance from.
 
Would a quad core Broadway with VMX 128 and, say, a ~1 GHz clockspeed have been realistic on 90nm though? It seems like it would have been smaller than Xenon, cooler, and probably easier to extract performance from.

1GHz is attainable, IBM actually rates the processor for up to 1GHz in their datasheets and probably sells it this way.

Broadway was already quite old even when Wii was released so there's no way I could argue it was a great design choice. In terms of CPU/GPU Wii's level of laziness was totally unprecedented. No one before would have dreamed of releasing a 5 year newer update with nothing but a shrunk processor with 1.5x higher clockspeeds. Wii U is substantially less lazy.

If you're going to redesign the processor to add more SIMD you'd may as well change it in other ways. Maybe something G4e derived (but shrunk to 90nm) would have made more sense. They could have done a dual core variant that clocked to 1.5-1.7GHz or so. That would have been a more respectable alternative to Xenon. But it'd have been meaningless without some better GPU to balance it.
 
What WiiU needs right now is more titles to put the question of power into perspective. Let's see whether Deus Ex: HR can run with solid framerates.
 
While I also agree, I think if you take into account clockspeed and the possibility that 1.25 GHz wasn't realistic on 90nm, Xenon might not be as bad as Wii U CPU makes it look.

Would a quad core Broadway with VMX 128 and, say, a ~1 GHz clockspeed have been realistic on 90nm though? It seems like it would have been smaller than Xenon, cooler, and probably easier to extract performance from.
Indeed, that's the whole issue with the Wii/WiiU: to be competitive with the PS360 it would have needed more cores, and 4 sounds like a minimum. I'm not sure that the Wii/WiiU needed to match Xenon's FP throughput, let alone Cell's, to enable a competitive platform.
The terrible part about this is that Broadway with its 256KB L2 has been measured at 19mm^2; wiki has the PPC 750CL at 16mm^2, and power consumption is way lower than that of the 3.2GHz Xenon cores.
A quad-core might have been doable @ 90nm and it would have been a cheaper alternative, in both power and silicon footprint, to both Xenon and Cell. Leaving Nintendo and the decisions they made aside, it would have freed up quite some silicon (and power) to be invested elsewhere in a system one (Sony or MSFT) could have built around such a CPU set-up. Putting 4 Broadways together and adding a beefy 25% overhead, we are still only speaking of ~100mm^2 vs 176mm^2 for Xenon and 235mm^2 for Cell.

A posteriori thinking for the win: I think that a quad-core Broadway @ ~800MHz (or, looking at Sony's history, a matching MIPS CPU), along with a 4-SIMD Xenos + 24 texture units and a sane number of up-to-date (for the time, à la RSX) ROPs running @ 400MHz, linked by a 256-bit bus offering ~45GB/s of bandwidth to 512 MB of GDDR3, all on a single chip (TSMC 90nm bulk process), would not only have been competitive but might have bested Sony's and MSFT's solutions with significantly better power characteristics. I don't think that fitting the 256-bit bus would have been troublesome at least until the 55nm node, but the thing is that at this point (I guesstimate the starting size somewhere around 350mm^2) the chip would have been ~200mm^2 with really good power characteristics, and further shrinking would no longer be a priority (Valhalla is 170mm^2 + the eDRAM).
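For what it's worth, a quick check of the two figures above. The 19mm^2 Broadway measurement comes from this discussion, the ~700MHz GDDR3 speed is the 360's known memory, and the 25% integration overhead is just the guess made above:

```c
/* Rough check of the quad-Broadway die estimate and the 256-bit GDDR3 figure.
   19 mm^2 per core and +25% overhead are the numbers from the post above;
   1400 MT/s corresponds to the 700 MHz GDDR3 the 360 shipped with. */
#include <stdio.h>

int main(void)
{
    double quad_mm2 = 4.0 * 19.0 * 1.25;   /* four Broadways + 25% glue */
    printf("Quad-Broadway estimate: ~%.0f mm^2 (vs 176 for Xenon, 235 for Cell)\n",
           quad_mm2);

    double bw_gb_per_s = 1400.0 * 1e6 * (256 / 8) / 1e9;  /* 256-bit @ 1400 MT/s */
    printf("256-bit GDDR3 @ 1400 MT/s: %.1f GB/s\n", bw_gb_per_s);
    return 0;
}
```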

That is OT, but console manufacturers used to be way too conservative with memory set-ups overall, from external bandwidth to the amount of RAM; they definitely left that habit behind this time around.
I believe the costs associated with a wider bus are overestimated versus the cost of working around bandwidth constraints; I remember how opposed everyone was to the idea of the next-gen systems using a 256-bit bus. I think another bias of the console manufacturers was to try to reinvent the wheel (custom CPU designs, workarounds for bandwidth constraints, etc.) instead of doing more forward-looking integration work, but that is a benefit of a posteriori thinking (it is easy, so it doesn't have that much merit...). It is obvious in this SoC era but wasn't at the time those systems were designed.
Xenos-level capability and a fast link connecting the GPU and CPU (on a SoC) could have enabled interesting approaches... back in 2005.

Anyway, there is no merit to it; the people working at IBM, MSFT, Sony and Nintendo are among the brightest. There is a world between doing something under the pressure of timelines, enormous budgets and the paradigm of the time, and coming along 7 years later (more if you look at design time...) to say it could have been done in a better way. Actually the latter requires next to no knowledge; again, there is no merit to it.

Broadway was already quite old even when Wii was released so there's no way I could argue it was a great design choice. In terms of CPU/GPU Wii's level of laziness was totally unprecedented. No one before would have dreamed of releasing a 5 year newer update with nothing but a shrunk processor with 1.5x higher clockspeeds. Wii U is substantially less lazy.
Indeed the Wii was lazy; on top of the speed bump, going for a dual core sounds like a minimum, though Nintendo did not plan to be competitive with that product. The WiiU is a far more bothersome case, as they clearly tried to get some core gamers and games on the system.
 
1GHz is attainable, IBM actually rates the processor for up to 1GHz in their datasheets and probably sells it this way.



Broadway was already quite old even when Wii was released so there's no way I could argue it was a great design choice. In terms of CPU/GPU Wii's level of laziness was totally unprecedented. No one before would have dreamed of releasing a 5 year newer update with nothing but a shrunk processor with 1.5x higher clockspeeds. Wii U is substantially less lazy.



If you're going to redesign the processor to add more SIMD you'd may as well change it in other ways. Maybe something G4e derived (but shrunk to 90nm) would have made more sense. They could have done a dual core variant that clocked to 1.5-1.7GHz or so. That would have been a more respectable alternative to Xenon. But it'd have been meaningless without some better GPU to balance it.


Broadway was a brand new entry in the 750 line.... As a lower powered version of an older processor. There were far more powerful processors to choose from at the time even in the 750 series.

It's important to remember the 750 isn't a single processor, particularly since you are pointing out the difference between Jaguar and Bobcat, which have a ~15% performance difference at the core, mainly from shrinking and clock increases, with a little cache improvement, more registers...

[Image: AMD-Jaguar-vs-Bobcat.jpg]


It's a modest difference.

The 750 GX and FX curb-stomp the CLE (Broadway). It is a much, MUCH larger improvement than Bobcat to Jaguar, even on the same process size and at the same clock as the 750 CX/CLE. I want to attach the IBM documentation on the improvements of the various 750 processors, but I don't see any such feature here.

Espresso does not have 3 CLE (Broadway) cores. Those are 750 GXs with Nintendo's custom extensions. Big difference. Huge difference, and no, it's not just cache size.

Big enough to beat an 8-core Jaguar? Well no, that's silly. But it's a big enough difference that it's just as silly to think of it as just a tri-core Arthur 750 (the twelve-year-old processor you were talking about; you might as well call Jaguar, an iCore, or any x86 platform a 1978 processor since it's based off the 8086 - it's a silly practice), or even Broadway.
 
The 750 GX and FX curb-stomp the CLE (Broadway). It is a much, MUCH larger improvement than Bobcat to Jaguar, even on the same process size and at the same clock as the 750 CX/CLE. I want to attach the IBM documentation on the improvements of the various 750 processors, but I don't see any such feature here.

Drop box? Or something similar?
 