Not sure about the bandwidth that might be required to feed the CPU and audio.
CPU would vary. Instruction caching is quite effective in modern CPU designs (cue joke about the ancient 1990s-era nature of wuucpu...), but data consumption can go from zero to the sky's the limit, depending on workload. Audio won't be much, granted, but it would add a lot of scattered accesses, causing additional DRAM page-miss penalties.
Video scanout at 1080p / 60 Hz should take ~0.5 GB/s
Don't forget the framebuffer resolve to main memory... If running at 60fps (yeah, right!) we've now blown roughly a gig per second of precious, precious wuu RAM bandwidth just to put an image up on the TV.
That's assuming you can't keep the front buffer in eDRAM the whole time of course (and that you have room in eDRAM to keep it there), in which case scanout would be deducted from the eDRAM bandwidth budget instead of main RAM. It also assumes wuugpu doesn't use some kind of YUV compression trick for the front buffer like gamecube/wii did. It's hard to estimate these kinds of things when you don't have any friggin' clue how the hardware actually works, or what its specific capabilities are! Nintendo secrecy is damn frustrating, wouldn't you say?
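For what it's worth, here's a rough back-of-envelope sketch of where the ~0.5 GB/s scanout and "roughly a gig per second" figures come from. It assumes a 32-bit (4 bytes per pixel) front buffer held in main RAM with no compression, which is purely my guess and not a known property of the hardware. The function name is made up for illustration.

```python
def stream_bandwidth_gb_s(width, height, bytes_per_pixel, fps):
    """Bandwidth in GB/s (decimal, 1e9 bytes) to move one full frame per refresh."""
    return width * height * bytes_per_pixel * fps / 1e9

# ASSUMPTION: 4 bytes per pixel, uncompressed, front buffer lives in main RAM.
scanout_gb_s = stream_bandwidth_gb_s(1920, 1080, 4, 60)  # reading the image out to the TV
resolve_gb_s = stream_bandwidth_gb_s(1920, 1080, 4, 60)  # writing the finished frame to main RAM
total_gb_s = scanout_gb_s + resolve_gb_s

print(f"scanout: {scanout_gb_s:.3f} GB/s")
print(f"resolve: {resolve_gb_s:.3f} GB/s")
print(f"total:   {total_gb_s:.3f} GB/s")
```

If the front buffer stays in eDRAM, or is stored in some packed YUV format, the main-RAM share shrinks accordingly; swap in a different bytes-per-pixel value to see by how much.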
Considering the limited speed of the mass storage device and of the disc drive I expect the bandwidth required for I/O to be quite negligible.
Yeah, except for the tablet of course... 800*640 or whatever the hell weirdo rez it uses, times however many FPS it updates at, times bits per pixel (probably 24, packed to save space/BW...unless the software has use for Z and/or destination alpha in which case it would be more but I dunno if that'd ever be the case; most games will probably avoid rendering heavy stuff for the tablet as much as possible.)
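Plugging the tablet guess into the same kind of arithmetic, as a sketch only: the 800x640 resolution is the placeholder from the paragraph above, and the 30 and 60 FPS update rates are assumptions of mine since the real rate isn't known. The helper name is hypothetical.

```python
def tablet_bandwidth_mb_s(width, height, bits_per_pixel, fps):
    """Bandwidth in MB/s (decimal, 1e6 bytes) for one tablet image per update."""
    return width * height * (bits_per_pixel / 8) * fps / 1e6

# ASSUMPTION: resolution and update rates are guesses, not known hardware specs.
WIDTH, HEIGHT = 800, 640

packed_30 = tablet_bandwidth_mb_s(WIDTH, HEIGHT, 24, 30)  # 24-bit packed, 30 FPS
packed_60 = tablet_bandwidth_mb_s(WIDTH, HEIGHT, 24, 60)  # 24-bit packed, 60 FPS
alpha_60 = tablet_bandwidth_mb_s(WIDTH, HEIGHT, 32, 60)   # 32-bit (with destination alpha), 60 FPS

for label, value in [("24bpp @ 30", packed_30), ("24bpp @ 60", packed_60), ("32bpp @ 60", alpha_60)]:
    print(f"{label}: {value:.2f} MB/s per pass over the image")
```

That's per pass over the image, so a write plus a later read for transmission would double it. Under these guesses it's small next to the TV output, but it's more scattered traffic on the same bus.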
According to the calculation above (assuming I'm not missing something obvious), even with only half the bandwidth available for texturing (6.4 GB/s), a 60 fps game at 720p can do ~31 texture reads per pixel shaded
Well we haven't accounted for memory efficiency, which for DRAM in PCs is quite low. Even straight linear access benchmarks typically only crack about 50% of theoretical performance, if that much, and in a unified memory architecture you'll have tons of accesses going on all the time for every device in the system. Your 50% estimate up there probably isn't even worth the paper it's written on.
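Here's a quick sketch of how the ~31 reads per pixel figure falls out, and what happens to it once you apply an efficiency factor. The 4 bytes per texture read and the binary reading of "6.4 GB/s" (6.4 * 2^30 bytes) are my assumptions; they happen to reproduce the quoted number, but the original calculation may have been done differently. The 50% efficiency value is just the rough linear-access figure mentioned above, not a measurement of this hardware.

```python
def texture_reads_per_pixel(bandwidth_bytes_s, width, height, fps,
                            bytes_per_read=4, efficiency=1.0):
    """Texture reads available per shaded pixel, ignoring caches and overdraw."""
    pixels_per_second = width * height * fps
    usable_bytes_s = bandwidth_bytes_s * efficiency
    return usable_bytes_s / pixels_per_second / bytes_per_read

# ASSUMPTION: "6.4 GB/s" interpreted as 6.4 * 2**30 bytes/s, 4 bytes per read.
texturing_bw = 6.4 * 2**30

ideal = texture_reads_per_pixel(texturing_bw, 1280, 720, 60)
at_half = texture_reads_per_pixel(texturing_bw, 1280, 720, 60, efficiency=0.5)
decimal_gb = texture_reads_per_pixel(6.4e9, 1280, 720, 60)

print(f"ideal:          {ideal:.1f} reads/pixel")
print(f"50% efficiency: {at_half:.1f} reads/pixel")
print(f"decimal GB:     {decimal_gb:.1f} reads/pixel")
```

Texture caches and compressed formats push the real number up, contention from everything else on the bus pushes it down, so treat this as an order-of-magnitude check and nothing more.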
Hard to say how well just a GPU and nothing else would utilize its framebuffer; I ran an OpenCL memory test program on my old-ish Radeon 6970s and efficiency was LOW as hell. But grud knows how effective OpenCL really is running on my boards or how well optimized that program was so it might not mean much.