Wii U hardware discussion and investigation *rename

Status
Not open for further replies.
Even with this low clock, some high-profile PS360 games like Assassin's Creed 3 run as well on Wii U as on PS360. There must really be some magic in Wii U :)
Well, if the CPU in wuu is broadway-like, then it has out-of-order execution, which helps a lot with raising efficiency when running integer code as is found in game engines, AI routines and similar. The CPU cores in PS360 are in-order execution and are quite slow at such code even though they are clocked much faster. Larger caches also help some; 360 only has 1MB cache for six hardware threads, and that cache runs at half CPU clockrate, causing extra latency when accessed. Wuu CPU cache should run at core clock, theoretically allowing lower latency on access.

Wuu CPU's major weakness appears to be floating-point/vector performance, which is really bad compared to the Xenon CPU in the 360, and the PS3's Cell in particular (which has monster float performance for a chip this old).
 
Devs will have to put some time into the WiiU to ensure parity with current gen ports.

However, for next gen ports, I believe the WiiU will be left behind.

Once again, the Nintendo console will be for those looking at Nintendo's first and 2nd party efforts primarily.
 


On parity with current gen? better? worse? What will happen with next gen software?
 
More tweets from @marcan42:

It's worth noting that Espresso is *not* comparable clock per clock to a Xenon or a Cell. Think P4 vs. P3-derived Core series.

The Espresso is an out of order design with a much shorter pipeline. It should win big on IPC on most code, but it has weak SIMD.

No hardware threads. One per core. No new SIMD, just paired singles. But it's a saner core than the P4esque stuff in 360/PS3.

I don't know how it compares at the actual clock speeds, but at the same clock the 750 wins hands down except on pure SIMD.

And I'm sure it's not an "idle" clock speed. 1.24G is exactly in line with what we expected for a 750-based design.

So yes, the Wii U CPU is nothing to write home about, but don't compare it clock per clock with a 360 and claim it's much worse. It isn't.

--

Was I living under a rock, or is the "no hardware threads. One per core. No new SIMD, just paired singles" part new info?
 
A colleague of mine installed something on his PC that can basically stream anything to almost any Android or iOS device. So I'm not sure why being able to do it to the GamePad would be such a big win ...

Wouldn't you like to be able to play some low-priced PC games through a tablet?
That way, you'd be able to play Assassin's Creed 3, The Witcher 2, XCOM, GTA 4/5, Saints Row, etc. with all the bells and whistles without having to be inside the man-cave or occupying the TV.



I've seen that app I think, it can transcode movies on the fly and he can view them wherever there's a fast enough internet connection. But watching a movie on a portable device is one thing - when you also want to use it as a controller, latency will suddenly become very important.

The THD (Tegra-exclusive) version of Splashtop does have a game-controller mode where you create a Windows gamepad with buttons/sticks that you lay out on the tablet/phone's own touchscreen. From the demos I've seen on YouTube, it seems to do well on Tegra 3 devices if the WiFi signal is good enough.

Furthermore, maybe this Archos GamePad could connect to the PC through Bluetooth as a gamepad: WiFi for receiving the video stream from the PC, Bluetooth for sending the button/stick signals to the PC. That could greatly decrease latencies, right?
Unfortunately, one would have to be close to the PC since Bluetooth is being used.


My job may require some Android programming eventually. More specifically, a body-sensor-network that connects to an Android smartphone.
If I ever get the knowledge, I'll be sure to try and make something for the Archos GamePad (the Bluetooth controller output thing; no way I'll be sticking my head into low-latency video coding/decoding).

Turns out that the wireless technology is awfully depressing :rolleyes:
I totally thought that they should and would steer clear of Wi-Fi channels, but instead... Oh well.

I eat my words.

Well, at least they're using 5GHz. That may be the one and only up-to-date technology being used in the whole console, though.
 

As someone pointed out (maybe you), it could be a PowerPC 476.
http://en.wikipedia.org/wiki/PowerPC_400#PowerPC_470

The 470 embedded and customizable core, adhering to the Power ISA v2.05 Book III-E, was designed by IBM together with LSI and implemented in the PowerPC 476FP in 2009. The 476FP core has 32/32 kB L1 cache, dual integer units and a SIMD capable double precision FPU that handles DSP instructions. Emitting 1.6 W at 1.6 GHz on a 45 nm fabrication process. The 9 stage out of order, 5-issue pipeline handles speeds up to 2 GHz, supports the PLB6 bus, up to 1 MB L2 cache and up to 16 cores in SMP configurations.

Maybe it's that core, slightly tweaked; it could run an eDRAM L2 similar to the PowerPC A2. I had suggested it was a kind of A2 CPU, but I was wrong: the A2 is in-order with four-way multithreading, so it relies on a lot of parallelism. That would probably be bad for games.

I don't know if the PPC 476FP has a full-speed L2, but the A2 has half speed. The Wii U CPU might well have a half-speed L2 as well. Clock speed is not all that makes an L2 good; there are implementation details too (associativity and effective latency, but I don't have a very good understanding of that).
Yeah, probably full speed L2, at least if the CPU only runs at 1.24GHz asking for L2 at the same speed isn't too much.
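
To put the half-speed vs full-speed question in absolute terms, here is a rough sketch using only clock figures quoted in this thread (3.2 GHz Xenon with a half-speed L2, 1.24 GHz Wii U CPU). The full-speed Wii U L2 is an assumption, not a confirmed spec, and the function names are made up for illustration. It only converts clocks; real access latency also depends on how many cycles an access takes, which nobody here has numbers for.

```python
def l2_clock_ghz(core_clock_ghz, divider):
    """Absolute L2 clock for a cache running at core clock / divider."""
    return core_clock_ghz / divider


def l2_cycle_ns(core_clock_ghz, divider):
    """Duration of a single L2 clock cycle in nanoseconds."""
    return 1.0 / l2_clock_ghz(core_clock_ghz, divider)


# Xenon: 3.2 GHz core, L2 at half the core clock (as stated in the thread).
xenon_l2 = l2_clock_ghz(3.2, 2)
# Wii U: 1.24 GHz core, L2 ASSUMED to run at full core clock.
wiiu_l2 = l2_clock_ghz(1.24, 1)

print(f"Xenon L2: {xenon_l2:.2f} GHz, {l2_cycle_ns(3.2, 2):.3f} ns per L2 cycle")
print(f"Wii U L2: {wiiu_l2:.2f} GHz, {l2_cycle_ns(1.24, 1):.3f} ns per L2 cycle")
```

Under those assumptions, a full-speed L2 at 1.24 GHz still ticks more slowly in wall-clock terms than a half-speed L2 on a 3.2 GHz core; the benefit of a full-speed cache shows up in how many core cycles the CPU stalls per access, not in nanoseconds per cache cycle.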
 
Broadway has some OoO, but it's still a pretty modest core that's mostly two-wide + folded branching and with a single load/store unit. It's closer to Cortex-A9 level (probably a little better) than something more aggressive like even a Pentium Pro. So while the IPC should be better than what's on PS3 or Xbox 360, it shouldn't be a huge difference for code that's well scheduled and carefully avoids those CPUs' big glass jaws like load-hit-store penalties, poorly prepared branches, and poor cache locality. It's certainly not enough to make up for a > 2.5x difference in clock speed and lack of SMT, never mind lack of FP SIMD.
 
Hey folks!

The 476FP's L2 runs at half speed. I'm not sure about Broadway's. There is a user's manual floating around the internet somewhere, though. Of course, since they are using eDRAM, the L2 controller is certainly different.
 

Is he saying that Wii U CPU isn't worse than Xenon?
 
http://raidenii.net/files/datasheets/cpu/ppc_broadway.pdf < Broadway user manual if you want it. An L2 cache line (32 bytes) takes 4 cycles to transfer over a 64-bit interface so we can take it to be full speed.

But if the L2 cache is eDRAM now then there's no guarantee that anything about the original cache design still applies. Broadway's was only 2-way set associative in addition to only being 256KB so it'd be pretty susceptible to way conflicts.
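
To make the numbers above concrete, here's a small sketch. It takes the figures in this post at face value (256KB, 2-way, 32-byte lines, 64-bit interface) and assumes a plain modulo set index with no sectoring or hashing, which may not match the real hardware. The function names are just for illustration.

```python
def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


def set_index(addr, size_bytes, ways, line_bytes):
    """Set an address maps to, assuming simple modulo indexing."""
    return (addr // line_bytes) % cache_sets(size_bytes, ways, line_bytes)


def beats_per_line(line_bytes, bus_bits):
    """Bus transfers needed to move one cache line."""
    return line_bytes // (bus_bits // 8)


SIZE, WAYS, LINE = 256 * 1024, 2, 32

sets = cache_sets(SIZE, WAYS, LINE)   # 4096 sets
stride = sets * LINE                  # addresses this far apart share a set

# Three buffers spaced exactly one stride apart all land in the same set.
addrs = [0x10000 + i * stride for i in range(3)]
indices = [set_index(a, SIZE, WAYS, LINE) for a in addrs]

print(f"sets: {sets}, conflict stride: {stride // 1024} KB")
print(f"set indices: {indices}")
print(f"beats per 32-byte line on a 64-bit bus: {beats_per_line(LINE, 64)}")
```

With only two ways per set, touching three addresses that are 128 KB apart in a loop would keep evicting one of them, which is the kind of way conflict meant above. The last line is the transfer arithmetic: 32 bytes over an 8-byte-wide interface is 4 beats, so 4 cycles per line implies one beat per core cycle.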


No, he's saying "since you can't compare the two by their clock speeds alone you can't use clock speed to claim that it's much worse." He says outright that he doesn't know how it compares given the clock discrepancy. I however am saying that if it's a 1.24GHz Broadway with only relatively minor changes there's no way it can beat a 3.2GHz Xenon core for core unless the code is a particularly bad fit for the latter.
 
How does bobcat compare to xenon?

Bobcat feels similar to the Wii U CPU, but it at least runs at 1.6+GHz.
For running a web browser or a game engine, a Bobcat may be better, but Xenon is better with a lot of vectorized and multi-threaded code. After all, this is the only reason Xenon made some sense. Had the Xbox 360 been released a year later, they could have come up with a CPU design that sucks less.
 

Let's say about 50% more efficiency per thread. That would still be just over half the speed of a Xenon core. So it would be a matter of hoping that you don't have to bother the CPU with as much stuff as you do on the other platforms because you have a decent GPU and more memory as well as optimising your stuff to run on three cores, and even then I can see situations where the CPU just can't keep up with what the game needs. I think they may find ways around it - it's still early days, and people are getting more comfortable with stuff like GPGPU, but it does mean that we'll see a lot of variation at best, where there will be as many games that are worse on Wii U as there are games that are better.
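
The arithmetic behind "just over half" can be sketched like this. The 1.24 GHz and 3.2 GHz clocks are the ones quoted in this thread, and the 1.5x per-clock advantage is the guess above, not a measurement. It is a single-thread, scalar-code comparison that ignores SMT and SIMD entirely, and the function names are only for illustration.

```python
def relative_throughput(clock_a_ghz, clock_b_ghz, ipc_advantage_a):
    """Throughput of core A relative to core B, given A's per-clock advantage."""
    return (clock_a_ghz * ipc_advantage_a) / clock_b_ghz


def break_even_ipc_advantage(clock_a_ghz, clock_b_ghz):
    """Per-clock advantage core A needs just to match core B."""
    return clock_b_ghz / clock_a_ghz


WIIU_GHZ, XENON_GHZ = 1.24, 3.2
ASSUMED_IPC_ADVANTAGE = 1.5  # the "50% more efficiency" guess, not a benchmark

rel = relative_throughput(WIIU_GHZ, XENON_GHZ, ASSUMED_IPC_ADVANTAGE)
need = break_even_ipc_advantage(WIIU_GHZ, XENON_GHZ)

print(f"relative speed at 1.5x IPC: {rel:.2f} of a Xenon core")
print(f"IPC advantage needed to break even: {need:.2f}x")
```

So a 50% per-clock edge works out to roughly 0.58 of a Xenon core, and the per-clock advantage would have to be about 2.6x before the two cores were even on this kind of code.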

Fortunately for Nintendo they have a very strong first party and good relations with Japanese developers at the moment.
 

Then, could it be harder to see "better" multiplat versions on Wii U?
 
One assumes that PS360 CPU code is reasonably well optimised for in-order execution and makes extensive use of the formidable SIMD capabilities. In which case Wuu is going to struggle with ports; the only way it won't is if devs have been really inefficient in using Xenon and Cell.
 

Well, many games are not CPU-bound but GPU-bound. Those could still be better. And as said before, we don't really know, but it is certainly not unlikely that the GPU in the Wii U would be better for GPGPU applications, which can take care of some stuff currently done on the CPU in the PS360s. Look at the Far Cry 3 discussion of how SPURS jobs are triggered from the GPU on the PS3, for instance, and then run on SPEs. The Wii U GPU could perhaps just keep doing all that on the GPU side, leaving the CPU with a lot less work.

Launch games are relatively meaningless, but they do show that currently there are bottlenecks that developers are struggling with.
 
Yeah but port teams are not going to give it the PS3 SPU care. I really doubt it.

Nintendo's tools for that are probably not even up to par with those made by Sony's ICE team.
 

GPU-bound games? For example?
 