AI agents wouldn't need to be aware of other AIs' decision-making (in fact, that would go beyond AI into precognition!). Each agent only needs to evaluate its own situation. That is serial processing. And I'm sure clever algorithms can build multi-dimensional datasets that capture the current state of play for the key objects, so each agent can be evaluated individually and quickly, with an overarching representation assembled afterwards.
But even if not, the Wii U's raw maths throughput is weak, and maths matters in AI: calculating distances, intercepts, collisions, mathematical weightings, and so on. Cell may not be the best at churning through finite state machines, but the Wii U has its own fair share of AI shortcomings, so it shouldn't be assumed that Espresso can handle every AI workload that the PS3 and 360 can.
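To put the "serial, per-agent" point in concrete terms, here's a minimal sketch (every struct and function name is invented for illustration, not taken from any actual engine): each agent reads the same read-only snapshot of the world and does its own distance/intercept arithmetic, which is exactly the kind of floating-point work being described.

```cpp
// Illustrative sketch only -- every struct and function name here is invented.
// Each agent reads the same read-only world snapshot and decides for itself;
// no agent needs to know what another agent is *about to* do.
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

struct WorldSnapshot {        // the "state of play" captured once per frame
    Vec2 playerPos;
    Vec2 playerVel;
};

struct Agent {
    Vec2  pos;
    float speed;
    bool  chasing;
};

// Typical per-agent maths: a distance check plus a crude intercept estimate.
void evaluateAgent(Agent& a, const WorldSnapshot& w) {
    float dx   = w.playerPos.x - a.pos.x;
    float dy   = w.playerPos.y - a.pos.y;
    float dist = std::sqrt(dx * dx + dy * dy);

    // Lead the target: aim where the player will be in roughly dist/speed seconds.
    float t = (a.speed > 0.0f) ? dist / a.speed : 0.0f;
    Vec2 aim{ w.playerPos.x + w.playerVel.x * t,
              w.playerPos.y + w.playerVel.y * t };
    (void)aim;                    // a real agent would feed this into its steering

    a.chasing = (dist < 50.0f);   // simple weighted decision standing in for
}                                 // a finite-state-machine transition

void updateAI(std::vector<Agent>& agents, const WorldSnapshot& w) {
    for (Agent& a : agents)       // plain serial loop, one agent at a time
        evaluateAgent(a, w);
}
```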
You assume that the Wii U's CPU can't do it, and you probably ignore the advantages it has...
Espresso is an out-of-order execution design; by comparison, Xenon and Cell are in-order.
"The key concept of OoOE processing is to allow the processor to avoid a class of stalls that occur when the data needed to perform an operation are unavailable. In the outline above, the OoOE processor avoids the stall that occurs in step (2) of the in-order processor when the instruction is not completely ready to be processed due to missing data.
OoOE processors fill these "slots" in time with other instructions that are ready, then re-order the results at the end to make it appear that the instructions were processed as normal. The way the instructions are ordered in the original computer code is known as program order, in the processor they are handled in data order, the order in which the data, operands, become available in the processor's registers. Fairly complex circuitry is needed to convert from one ordering to the other and maintain a logical ordering of the output; the processor itself runs the instructions in seemingly random order.
The benefit of OoOE processing grows as the instruction pipeline deepens and the speed difference between main memory(or cache memory) and the processor widens. On modern machines, the processor runs many times faster than the memory, so during the time an in-order processor spends waiting for data to arrive, it could have processed a large number of instructions."
http://en.wikipedia.org/wiki/Out-of-order_execution#Basic_concept
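To make the quoted point concrete, here's a toy example (names invented): the table lookup depends on a load that may miss in cache, so an in-order core like Xenon or the Cell PPE stalls on that dependency, while an out-of-order core like Espresso can keep the independent arithmetic moving in the meantime.

```cpp
// Toy example, names invented. 'lookupSum' depends on a load that can miss in
// cache; an in-order core stalls on that dependency, while an out-of-order core
// keeps executing the independent 'runningTotal' arithmetic in the meantime.
int process(const int* table, const int* indices, int n) {
    int runningTotal = 0;
    int lookupSum    = 0;
    for (int i = 0; i < n; ++i) {
        int idx = indices[i];        // load; may miss in cache
        lookupSum += table[idx];     // depends on the load above -> potential stall
        runningTotal += i * 3;       // independent work that OoOE can slot into
    }                                // the otherwise-wasted cycles
    return runningTotal + lookupSum;
}
```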
Espresso has 3MB of L2 cache compared to Xenon's 1MB and Cell's 512KB
All three CPUs have 64KB of L1 cache per core (32KB instruction / 32KB data)
Each Espresso core has its own L2 cache
Core 0: 512KB - Core 1: 2MB - Core 2: 512KB
Each L2 cache is 4-way set-associative, compared to 2-way on GameCube/Wii
Each L2 cache is 2-sectored
The L1 caches are 8-way set-associative
6 execution units per core, 18 execution units in total
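A rough back-of-envelope illustration of why those per-core L2 sizes matter (the sizes and the loop are invented for the example): a table of about 1.5MB fits entirely inside core 1's 2MB L2 on Espresso, so repeated passes over it stay cache-resident, whereas the same table is several times bigger than a ~341KB share of Xenon's shared 1MB L2 and would keep spilling back to main memory.

```cpp
// Back-of-envelope sketch with invented sizes: a ~1.5MB table fits entirely in
// Espresso core 1's 2MB L2, so repeated passes over it stay cache-resident;
// the same table is several times larger than a ~341KB share of Xenon's shared
// 1MB L2, so each pass would keep going back to main memory.
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kTableBytes = 1536 * 1024;   // ~1.5MB working set

std::vector<std::uint32_t> makeTable() {
    return std::vector<std::uint32_t>(kTableBytes / sizeof(std::uint32_t), 1u);
}

std::uint64_t sumTableRepeatedly(const std::vector<std::uint32_t>& table, int passes) {
    std::uint64_t total = 0;
    for (int p = 0; p < passes; ++p)
        for (std::uint32_t v : table)   // after the first pass this is L2-resident
            total += v;                 // on a 2MB cache, but not on a ~341KB share
    return total;
}
```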
Sectored Cache
"As Raf suggests, the SPG manuals volumes 1 and 2 are great resources for gleaning details such as these, and at the risk of repeating some of the useful information he passed along, here's an overview of memorycaching (as opposed to some of the other caching done in the processor). Cache types fall in a spectrum from Direct Mapped (where every line in memory has its own cache line) to Fully Associative (where every cache line can hold locally the contents of every memory line). N-Way Set-Associative caches lie somewhere in the middle: each cache line can hold the contents of some set of memory lines, which are evenly dispersed and interleaved through memory, and as noted in the text Adrien quoted, reduces the number of cache lines the processor must examine in order to determine whether there was a hit.. Cache line size may vary per architecture as can the number of ways in the caches (and whether the caches are Write Back or Write Through and other esoteric features). Pentium 4 and Intel Xeon processors have a sectored L2 (and L3 if present) cache, which for all practical purposes means that if adjacent sector prefetch is enabled, a request for one cache line of an associated pair of cache lines (Intel implementations all use 128-byte/ 2 cache-line sectors) will also generate a prefetch for the other cache line in the pair on the speculation that it will be needed eventually anyway. This is one of at least four kinds of hardware prefetch supported in current processors. There are a few specialized cases where application of software prefetch (in the form of an actual instruction in the stream) can hide some memory latency by starting the fetch early, but generally it is better to let the machine figure out when to prefetch, since optimal conditions vary from architecture to architecture."
https://software.intel.com/en-us/forums/topic/302355
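For reference, the "software prefetch (in the form of an actual instruction in the stream)" the quote mentions looks roughly like this on GCC/Clang via the __builtin_prefetch builtin. This is purely illustrative; as the quote says, the hardware prefetchers usually do a better job on their own for simple access patterns like this one.

```cpp
// GCC/Clang expose software prefetch as the __builtin_prefetch builtin.
void scaleArray(float* data, int n, float k) {
    for (int i = 0; i < n; ++i) {
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16]);  // hint: start fetching a line early
        data[i] *= k;
    }
}
```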
Xenon has a dynamic L2 cache shared between its cores
Evenly split, that works out to 341.3KB per core or 170.6KB per thread
The L2 cache is 8-way set-associative
The L1 instruction cache is 2-way set-associative
The L1 data cache is 4-way set-associative
5(?) execution units per core, 15(?) in total
Cell likely uses the same dynamic L2 cache arrangement as Xenon
Evenly split, that works out to 170.6KB per core or 85.3KB per thread
The L2 cache is ?-way set-associative
The L1 instruction cache is ?-way set-associative
The L1 data cache is ?-way set-associative
...continuing...
Espresso has a 4-6 stage pipeline, compared to the 32-40 stage pipelines of Xenon and Cell.
"It's hard to reduce power consumption in a deeply pipelined processor. Xbox 360 CPU had the longest pipeline in history, approaching 40 stages. Microsoft had to cut this down to 13~15 pipes to reduce power consumption in order for the SOC to fit into a Roku like box, meaning it is basically a new CPU sharing instruction set, not a die shrunk version of old CPU"
http://www.psu.com/forums/showthrea...i-and-Durango(MisterXmedia-Being-Vindicated-)
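Rough back-of-envelope on why the pipeline depth matters, using the stage counts claimed above (so treat the exact figures loosely): a branch misprediction costs roughly a pipeline refill, i.e. around 4-6 cycles on Espresso versus around 30-40 on Xenon/Cell. Branchy game code (AI state machines, scripting) that mispredicts, say, once every 50 instructions would lose about one cycle per ten instructions on the short pipeline, but around seven cycles per ten instructions on the long one, which eats a sizeable chunk of the clock-speed advantage.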
"I believe if you program only against one main CPU (like we do for pretty much most emus), you would find that the PS3/Xenon CPUs in practice are only about 20% faster than the Wii CPU.
I've ported the same code over to enough platforms by now to state this with confidence - the PS3 and 360 at 3.2GHz are only (at best - I would stress) 20% faster than the 729Mhz out-of-order Wii CPU without multithreading (and multithreading isn't a be-all end-all solution and isn't a 'one size fits all' magic wand either). That's pretty pathetic considering the vast differences in clock speed, the increase in L2/L1 cache and other things considered - even for in-order CPUs, they shouldn't be this abysmally slow and should be totally leaving the Wii in the dust by at least 50/70% difference - but they don't."
http://gbatemp.net/threads/retroarch-a-new-multi-system-emulator.333126/page-7#post-4365165
http://www.avsforum.com/forum/141-xbox-area/758390-xbox-360-vs-ps3-processor-comparison.html
http://forums.macrumors.com/showpost.php?p=1633076&postcount=3
The Wii U's CPU is an SMP (symmetric multiprocessing) design, which has its own minor advantages over other multiprocessing arrangements.
https://software.intel.com/en-us/bl...rence-between-multi-core-and-multi-processing
http://en.wikipedia.org/wiki/Symmetric_multiprocessing
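A minimal sketch of what SMP buys you in practice (standard C++ threads, the job names are invented): because the cores are symmetric and share one memory space, any job can be handed to any core and the scheduler balances them, instead of the code being written around one general-purpose core plus specialized satellite cores.

```cpp
// Minimal SMP sketch using standard C++ threads; the job names are invented.
// Symmetric cores sharing one memory space means any job can run on any core.
#include <thread>
#include <vector>

void aiWork()      { /* ... per-frame AI update ... */ }
void audioWork()   { /* ... audio mixing ... */ }
void physicsWork() { /* ... physics step ... */ }

void runFrameJobs() {
    std::vector<std::thread> workers;
    workers.emplace_back(aiWork);
    workers.emplace_back(audioWork);
    workers.emplace_back(physicsWork);
    for (std::thread& t : workers)
        t.join();                      // wait for every job before the frame ends
}
```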
It can handle it, and do it better; that is painfully obvious, despite you trying to convince everyone that it "apparently can't do it"... It's like saying a Core 2 Duo can't beat a Pentium D, which is two Pentium 4s duct-taped together.
You can play the SIMD and SPE card as a last resort if you want... that kind of work can be done on the Wii U's GPU.
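For what it's worth, the "SIMD/SPE card" mostly comes down to data-parallel maths of this shape (a plain axpy-style loop, names invented): the same multiply-add applied independently across a big array, which is the kind of work that can be vectorized or handed to a GPU compute job, whichever chip it ends up running on.

```cpp
// A plain axpy-style loop (names invented): the same multiply-add applied
// independently to every element. This is the shape of work that SIMD units,
// SPEs, or a GPU compute job all chew through well.
void scaleAndAccumulate(const float* x, float* y, int n, float a) {
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];   // each element is independent: easy to vectorize
}                           // or to hand off to GPU compute
```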
"Next you would think that the PS3 (just like the 360) would be able to segment the game control plus AI code into one core and the graphics rendering code into another core. However that is not possible! Since the total application code may be about 100 MB and the SPE only has 256KB of memory, only about 1/400 of the total code can fit in one SPE memory. Also since there isn't any branch prediction capabilities in an SPE, branching should be done as little as possible (although I believe that the complier can insert code to cause pre-fetches so there may not be a big issue with branching).
Therefore the developer has to find code that is less than 256KB (including needed data space) that will execute in parallel.
Even if code can be found that can be segmented, data between the PPE and the SPE has to be passed back and forth via DMA, which is very slow compared to passing a pointer to the data like on the 360.
If we assume that enough segmented code was found that could use all the 6 SPE cores assigned to the game application, now the developer would try to balance the power among the cores. Like the 360, some or all the cores may have a very low utilization. Adding more hardware threads is not possible since each core has only one hardware thread. Adding software threads probably will not work due to the memory constraint. So the only option is an overlay scheme where the PPE will transfer new code using DMA to the SPE when the last overlay finishes processing. This is very time consuming and code has to be found that does not overlap in the same time frame."
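To illustrate the streaming/overlay pattern the quote describes, here's a hypothetical double-buffering sketch. dmaStart/dmaWait/process and the chunk size are invented stand-ins, NOT the real Cell SDK API; on real hardware the DMA would be asynchronous, which is what lets the transfer of the next chunk overlap with work on the current one.

```cpp
#include <cstddef>
#include <cstring>

// Stand-ins, NOT the real Cell SDK: on hardware these would start and fence an
// asynchronous DMA; here they just copy synchronously so the sketch compiles.
void dmaStart(void* dst, const void* src, std::size_t bytes) { std::memcpy(dst, src, bytes); }
void dmaWait() {}                                      // wait for the pending DMA
void process(const float*, std::size_t) { /* ... SIMD work on the chunk ... */ }

constexpr std::size_t kChunkBytes  = 16 * 1024;        // small enough for a 256KB local store
constexpr std::size_t kChunkFloats = kChunkBytes / sizeof(float);

// Assumes totalFloats is a multiple of kChunkFloats, purely to keep the sketch short.
void streamProcess(const float* mainMemSrc, std::size_t totalFloats) {
    static float bufA[kChunkFloats], bufB[kChunkFloats];
    float* current = bufA;
    float* next    = bufB;

    dmaStart(current, mainMemSrc, kChunkBytes);        // prime the first chunk
    for (std::size_t off = 0; off < totalFloats; off += kChunkFloats) {
        dmaWait();                                     // current chunk has arrived
        if (off + kChunkFloats < totalFloats)          // start pulling the next chunk...
            dmaStart(next, mainMemSrc + off + kChunkFloats, kChunkBytes);
        process(current, kChunkFloats);                // ...while working on this one
        float* tmp = current; current = next; next = tmp;
    }
}
```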
The Wii U's GPU contains the Wii's GPU within it, and thus inherits its 1MB SRAM texture cache and 2.25MB framebuffer, which can likely serve a different role in Wii U mode; otherwise it would be a waste of silicon for Nintendo.