Could there be a connection between the performance degradation in CoD:BO2 vs Xbox 360 and the fact that it uses 2xMSAA where no other Wii U game does (or is that part even true)?
This is one of the things I was thinking. BO2 might be "sub HD", but at 60fps it's pushing significantly more pixels per second than a comparable 1280 x 720 30fps version would, plus there's the MSAA thing as you say. Along with explosions and other effects generating transparencies, it would seem like a good place to look for bandwidth issues.
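As a rough sanity check (the BO2 render resolution here is just an assumed figure for illustration, not a confirmed pixel count):

```python
# Rough pixels-per-second comparison. The ~880x720 figure for BO2 is an
# assumption on my part, just to illustrate the sub-HD @ 60fps point.
bo2_60fps = 880 * 720 * 60    # ~38.0M pixels/s
hd_30fps  = 1280 * 720 * 30   # ~27.6M pixels/s

print(bo2_60fps / 1e6, hd_30fps / 1e6, bo2_60fps / hd_30fps)  # ~38.0, ~27.6, ~1.4x
```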
Xenos' 8 ROPs, built into the eDRAM, can do 4x MSAA in one cycle. With 32-bit color and 32-bit depth/stencil over 32 samples (or 64 depth-only), we're looking at a really, really wide internal datapath of 2048 bits after the ROPs, or 128GB/s at 500MHz. With only 2xMSAA you would need 64GB/s. But that still leaves a huge range above the main RAM bandwidth of 17GB/s where you could be bandwidth limited on fill.
The figure MS give for edram bandwidth is 256GB/s, as it can do a full-speed alpha blend. The PS3 does a surprisingly good job of keeping up, under the circumstances!
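Putting those figures together, here's how I read the arithmetic (the doubling for blend is my interpretation of where MS's 256GB/s comes from, i.e. a read plus a write of every sample):

```python
# Reconstructing the Xenos eDRAM figures from the posts above.
clock_hz    = 500e6
path_bits   = 8 * 4 * (32 + 32)               # 8 ROPs x 4 samples x (colour + Z) = 2048 bits
write_gbps  = path_bits / 8 * clock_hz / 1e9  # 128 GB/s, write only
blend_gbps  = 2 * write_gbps                  # 256 GB/s with read-modify-write for alpha blend
msaa2x_gbps = write_gbps / 2                  # 64 GB/s if only 2 samples per pixel

print(write_gbps, blend_gbps, msaa2x_gbps)    # 128.0, 256.0, 64.0
```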
So given a scenario where the external interface on Wii U's eDRAM is just as fast as Xenos's, that gives you 32GB/s (it could be that, like on Xenos, Nintendo went with a synchronous core and RAM clock, which would give you a slightly higher 35.2GB/s). Now let's say that this time around you can texture from eDRAM. With Wii U's limited main RAM bandwidth this is desirable. In fact, it could be entirely possible that you must texture from eDRAM, which AFAIK was the case in Flipper/Hollywood, which would make a big bandwidth hit on it unavoidable. In this case it's easy to see how Wii U's GPU eDRAM bandwidth could end up a limiter long before Xenos's, while still providing tremendously more than main RAM alone does.
I was originally thinking of the 35.2 GB/s figure based on a 550 MHz synchronous clock (it was the starting point for my speculation), but then along with main memory bandwidth that'd give you ~48 GB/s aggregate bandwidth - higher than the PS3's total and double Trinity. The video memory bandwidth alone would be 50% higher than the PS3's VRAM and Trinity's unified RAM. This doesn't fit with what we saw in BO2 (or the missing trees in Darksiders 2, which I suspect may also be bandwidth related).
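Spelling the arithmetic out (the 12.8 GB/s main memory figure is the usual DDR3-1600 x 64-bit assumption; with the 17 GB/s figure mentioned earlier the aggregate would land a bit higher):

```python
# Speculative Wii U eDRAM interface, scaled from Xenos's 32 GB/s external
# interface (64 bytes/clock at 500 MHz) up to a synchronous 550 MHz clock.
edram_bw_500 = 64 * 500e6 / 1e9         # 32.0 GB/s, Xenos-style
edram_bw_550 = 64 * 550e6 / 1e9         # 35.2 GB/s at 550 MHz
main_bw      = 12.8                     # GB/s, assumed DDR3-1600 x 64-bit
aggregate    = edram_bw_550 + main_bw   # ~48 GB/s

# PS3's two pools, for reference: 25.6 GB/s XDR + 22.4 GB/s GDDR3.
print(edram_bw_550, aggregate)          # 35.2, 48.0
```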
This 35.2 GB/s bandwidth would also require Nintendo paying to engineer a high bandwidth internal bus, more advanced than the 360's unidirectional bus, and to do so for a system with rather unimpressive performance. That's what got me thinking about simple and cheap alternatives based on existing technology that would seem to fit better with what appears to be the WiiU's performance profile.
I hadn't actually considered the possibility of the WiiU only being able to texture out of edram, though. That's very interesting, and would seem to demand more bandwidth from the edram than a 1600 MHz 64-bit bus could provide. In such a scenario, where the GPU couldn't see outside its pool of edram, perhaps the "master core" with its much bigger cache might be designed to feed the GPU with any data it requires from main memory? Such a feature might actually also make sense with the two "unganged" memory channels I was speculating about earlier, if the GPU were normally busy on the edram channel ...
The FXAA in games like Arkham City is probably done GPGPU-style on the GPU; I don't think they'd be able to work that into the already overburdened CPU and main RAM load. This suggests at least the possibility of being able to texture from eDRAM. I don't think they'd add it if it hurt framerates further on a game that is already compromised; in this case the GPU probably had enough spare fillrate and ALU power, but if it can't texture from eDRAM there'd be additional overhead from the extra resolve plus the texturing bandwidth pulled from main RAM.
I think you're right and that it can texture from edram. In my speculation on the previous page I was assuming the edram could be read from and written to by the GPU (and perhaps the entire system) like main RAM or video RAM on a PC GPU. You'll know more about this than me, but I'd have thought that the FXAA would be done after rasterising and texturing are complete, and so you wouldn't be contending with other processes for bandwidth (so you'd be shader limited). If so, it wouldn't hurt performance in the way MSAA could, and if it's true that the WiiU has 320+ shaders at 550 MHz then it should be faster than the 360 at that form of AA.
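Just to put a rough scale on it, treating FXAA as little more than one read and one write of the colour buffer per frame (an assumption, not a measured cost):

```python
# Why FXAA should be shader-limited rather than bandwidth-limited: treat it as
# roughly one read plus one write of a 32-bit 1280x720 colour buffer per frame
# (the texture cache should absorb most of the extra neighbourhood taps).
frame_bytes = 1280 * 720 * 4         # ~3.7 MB
per_frame   = 2 * frame_bytes        # ~7.4 MB read + write
bw_30fps    = per_frame * 30 / 1e9   # ~0.22 GB/s
bw_60fps    = per_frame * 60 / 1e9   # ~0.44 GB/s

print(bw_30fps, bw_60fps)   # tiny next to the tens of GB/s of fill traffic above
```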
Due to its lower resolution, BO2 really needs some kind of sub-pixel AA though, I think, so they probably didn't have that option.
If Wii U's GPU eDRAM were as capable as Xenos's then we should be seeing at least 2xMSAA on ports that didn't have it on XBox 360; it'd be free given that Wii U's GPU has more than 2x the eDRAM.
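For reference, the footprint a 720p 2xMSAA target would need, assuming no colour/Z compression (worst case):

```python
# 720p framebuffer with 2xMSAA: 2 samples per pixel,
# 4 bytes colour + 4 bytes depth/stencil per sample.
pixels    = 1280 * 720
footprint = pixels * 2 * (4 + 4)
print(footprint / 2**20)   # ~14.1 MiB: needs tiling in Xenos's 10MB, fits whole in 32MB
```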
Not all engines support MSAA (Unreal, for example), but for many games, and on Nintendo's first-party titles, it should be a really nice way to improve IQ. Maybe the pixel counters of the IQ thread could draw up a list of MSAA WiiU games?