With all the peeping at die shots (which has been tremendous fun) I think we might have gotten tunnel vision and lost sight of the "big picture". The question of "320 vs 160" shaders is still unanswered, and stepping back should help us answer it.
The current popular hypothesis is that Latte is a 16:320:8 part @ 550 MHz. Fortunately, we can see how such a part runs games on the PC. You know, the PC, that inefficient beast that's held back by Windows, thick APIs, DirectX draw-call bottlenecks that break the back of even fast CPUs, and all that stuff. Here is an HD 5550, a VLIW5 GPU with a 16:320:8 configuration running @ 550 MHz:
http://www.techpowerup.com/reviews/HIS/Radeon_HD_5550/7.html
And it blows past the 360 without any problems. It's not even close. And that's despite being on the PC!
Now let's scale things back a bit. This is the Llano A8-3500M w/ Radeon 6620G - a 20:400:8 configuration GPU, but it runs @ 444 MHz, meaning it has essentially the same GFLOPS and texel rate as the HD 5550, only it's got about 20% lower triangle setup and fillrate *and* it's crippled by a 128-bit DDR3-1333 memory pool *and* it's linked to a slower CPU than the one in the above benchmark (so more likely to suffer from Windows/DX bottlenecks). No super fast pool of eDRAM for this poor boy! (Quick sanity-check maths below the links.)
http://www.anandtech.com/show/4444/amd-llano-notebook-review-a-series-fusion-apu-a8-3500m/11
http://www.anandtech.com/show/4444/amd-llano-notebook-review-a-series-fusion-apu-a8-3500m/12
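For anyone who wants to check my maths, here's a quick back-of-envelope sketch (Python; the unit counts and clocks are the public spec-sheet values, and "2 FLOPs per shader per clock" is the standard MADD assumption behind these marketing numbers):

```python
# Back-of-envelope theoretical throughput for the two PC parts above.
# Config notation is TMU:SP:ROP.

def throughput(name, tmus, shaders, rops, mhz):
    gflops  = shaders * 2 * mhz / 1000.0  # shader throughput (MADD = 2 FLOPs/clock)
    gtexels = tmus * mhz / 1000.0         # texture fill
    gpixels = rops * mhz / 1000.0         # pixel fill
    print(f"{name}: {gflops:.1f} GFLOPS, {gtexels:.2f} GTexel/s, {gpixels:.2f} GPixel/s")

throughput("HD 5550  (16:320:8 @ 550 MHz)", 16, 320, 8, 550)
throughput("HD 6620G (20:400:8 @ 444 MHz)", 20, 400, 8, 444)
```

That gives 352.0 vs 355.2 GFLOPS and 8.80 vs 8.88 GTexel/s - identical to within a percent - while the 6620G gives up roughly 20% of the fillrate (and triangle setup, at 1 tri/clock) plus a heap of bandwidth.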
And it *still* comfortably exceeds the 360 in terms of the performance it delivers. Now let's look again at the Wii U. Does it blow past the 360? Does it even comfortably exceed the 360? No, it:
keeps
losing
marginally
to
the
Xbox
360
... and that's despite it *not* being born into the performance wheelchair that is the Windows PC ecosystem. Even if the Wii U can crawl past the 360 - marginally - in a game like Trine 2, it's still far below what we'd expect from an HD 5550 or even the slower and bandwidth-crippled 6620G. So why is this?
It appears there are two options. Either Latte is horrendously crippled by something (API? memory? documentation? "drivers"?) to the point that even an equivalent or lesser PC part can bounce its ass around the field, or ... it's not actually a 16:320:8 part.
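To put rough numbers on that second option (Python again; Xenos's 240 GFLOPS is the commonly quoted figure, and the 160-shader config is of course the hypothesis here, not a confirmed spec):

```python
# What each shader-count hypothesis implies for raw shader throughput,
# next to Xenos's commonly quoted 240 GFLOPS
# (48 units x 5 ALUs x 2 FLOPs x 500 MHz).
XENOS_GFLOPS = 240.0

for shaders in (320, 160):
    gflops = shaders * 2 * 550 / 1000.0  # same 2 FLOPs/shader/clock assumption
    print(f"{shaders}-shader Latte @ 550 MHz: {gflops:.0f} GFLOPS "
          f"({gflops / XENOS_GFLOPS:.2f}x Xenos)")
```

That's 352 GFLOPS (1.47x Xenos) for the 320-shader config versus 176 GFLOPS (0.73x Xenos) for the 160-shader one. A ~0.73x part that trades blows with the 360 on the back of a newer, more efficient architecture fits what we're seeing on screen a lot better than a ~1.47x part that keeps losing.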
TL;DR version:
Latte seems to be either:
1) a horrendously crippled part compared to equivalent (or lower) PC GPUs, or
2) actually a rather efficient 160 shader part
Aaaaaaand I'll go with the latte(r) as the most likely option. Face it dawgs, the word on the street just don't jibe with the scenes on the screens.