Devs were somewhat happy because they compared the Wii U CPU to the other last-gen consoles, which had no automatic cache prefetch (you had to manually prefetch even linear arrays, otherwise 600+ cycle stalls), no store forwarding (40+ cycle stalls when writing to a memory location and then reading it back, e.g. on every function call), no direct path between the int<->float<->vector register files (making it even harder to avoid store forwarding stalls), and extremely long SIMD pipelines (you had to unroll loops heavily in order to fill the pipelines). I am not even going to talk about branches... In-order execution is unable to hide any of these stalls. It's all hard manual work.

The Wii U CPU is slow without good SIMD: basically an overclocked GameCube CPU x 3. Devs were somewhat happy only because it is out-of-order.
But the Wii U didn't fare well in SIMD-heavy code, as it only had 2-wide paired math vs 4-wide SIMD (with single-cycle multiply-add) on Xbox 360 and the SPUs on Cell. Modern ARM chips have 4-wide SIMD (NEON). NEON is actually a pretty good instruction set. Better than SSE3 for sure (most PC games are still limited to SSE3).
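
To give a rough idea of the "hard manual work" on the in-order last-gen CPUs, here is a minimal sketch in plain C of a dot product with manual prefetching and a 4x unrolled loop. It is not actual console code. The function name dot_unrolled, the PREFETCH macro and the PREFETCH_AHEAD distance are made up for illustration, and __builtin_prefetch is the GCC/Clang builtin standing in for whatever prefetch instruction the real platform offered.

```c
#include <stddef.h>

/* Assumption: GCC/Clang builtin. Other compilers get a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch((addr), 0, 0)
#else
#define PREFETCH(addr) ((void)0)
#endif

/* Hypothetical tuning value: how many floats ahead to prefetch. */
#define PREFETCH_AHEAD 64

float dot_unrolled(const float *a, const float *b, size_t n)
{
    /* Four independent accumulators: successive multiply-adds do not
       depend on each other, so a long pipeline can stay filled. */
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Manual prefetch of data needed a few iterations from now.
           Guarded so we never form an out-of-range pointer. Real code
           would issue one prefetch per cache line, not per iteration. */
        if (i + PREFETCH_AHEAD < n) {
            PREFETCH(&a[i + PREFETCH_AHEAD]);
            PREFETCH(&b[i + PREFETCH_AHEAD]);
        }
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; ++i) {
        s0 += a[i] * b[i];
    }

    return (s0 + s1) + (s2 + s3);
}
```

On an out-of-order CPU with a hardware prefetcher, the naive one-accumulator loop already gets much of this for free, which is the point of the comparison above. Note that splitting the sum across accumulators changes the floating point summation order, so results can differ slightly from the naive loop.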