4x the resolution requires 4x the processing power, unless you're using something like an O(n^2) algorithm.
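A toy sketch of that scaling claim, counting abstract "work units" rather than timing anything. The two cost functions are hypothetical stand-ins for a per-pixel algorithm and an all-pairs algorithm, not models of any real vision code.

```python
def linear_cost(pixels: int) -> int:
    # Hypothetical per-pixel algorithm: one unit of work per pixel (e.g. a threshold).
    return pixels


def quadratic_cost(pixels: int) -> int:
    # Hypothetical all-pairs algorithm: one unit of work per ordered pixel pair.
    return pixels * pixels


FHD = 1920 * 1080   # 1080p
UHD = 3840 * 2160   # 2160p, exactly 4x the pixels of 1080p

linear_ratio = linear_cost(UHD) / linear_cost(FHD)
quadratic_ratio = quadratic_cost(UHD) / quadratic_cost(FHD)

print(f"pixels: {UHD / FHD:.0f}x")
print(f"O(n) work: {linear_ratio:.0f}x, O(n^2) work: {quadratic_ratio:.0f}x")
```

Quadrupling the pixel count quadruples the linear cost but multiplies the quadratic cost by 16.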
You're kinda talking twaddle now. A 1080p frame is ~2 million pixels. At 24-bit RGB, that's 6 megabytes a frame. At 60 FPS that would be 360 MB/s of bandwidth consumed just sifting through every pixel. A comparison between two 1080p60 framebuffers would be 720 MB/s. PS3 has >45 GB/s. Next gen will have more. A couple of GB/s from the total RAM pool is no great loss, and certainly nothing needing a whole extra memory system as you are suggesting.

You are oversimplifying the process.
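The bandwidth arithmetic above can be checked with a few lines. This is a minimal sketch that assumes uncompressed 24-bit frames and treats "comparing two framebuffers" as reading both in full every frame; the 45 GB/s constant is just the PS3 lower bound quoted above, used for illustration.

```python
def frame_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Size of one uncompressed frame in bytes (24 bits = 8-bit R, G and B)."""
    return width * height * bits_per_pixel // 8


def stream_bandwidth(width: int, height: int, fps: int,
                     bits_per_pixel: int = 24, buffers: int = 1) -> int:
    """Bytes per second needed to read every pixel of `buffers` frames, `fps` times a second."""
    return frame_bytes(width, height, bits_per_pixel) * fps * buffers


# Assumption for illustration: the ">45 GB/s" PS3 figure quoted above, in bytes/s.
PS3_BANDWIDTH = 45 * 10**9

one_pass = stream_bandwidth(1920, 1080, 60)             # one pass over a 1080p60 stream
compare = stream_bandwidth(1920, 1080, 60, buffers=2)   # reading two framebuffers to compare them
share = compare / PS3_BANDWIDTH

print(f"frame:   {frame_bytes(1920, 1080) / 1e6:.1f} MB")
print(f"1 pass:  {one_pass / 1e6:.0f} MB/s")
print(f"compare: {compare / 1e6:.0f} MB/s")
print(f"share of 45 GB/s: {share:.1%}")
```

With the exact 1920×1080 pixel count the figures come out at roughly 373 MB/s and 746 MB/s rather than the rounded 360 and 720, about 1.7% of 45 GB/s, which doesn't change the argument.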
Multitasking OSes have for decades been able to run concurrent, independent tasks on the same RAM and processors by switching between tasks on the fly. Video processing is no different. The current Kinect PC demos are doing exactly that: using the same processor to evaluate the Kinect data and then run whatever other tasks are happening concurrently.

Gesture recognition, face recognition and the camera are eventually going to be modularized functions that will spin off to CE equipment. I believe a console will be easier to program if these processes are separated, and keeping them out of the general CPU pool will result in cost savings on the main CPU hardware.
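The "switching between tasks on the fly" idea can be sketched with a cooperative round-robin scheduler built on Python generators. This is a simplification: real OS schedulers are usually preemptive and driven by timer interrupts, and `camera_task` and `game_task` are hypothetical placeholders, not anything Kinect actually runs.

```python
from collections import deque
from typing import Iterable, Iterator, List


def camera_task(frames: int) -> Iterator[str]:
    # Hypothetical stand-in for per-frame camera/depth analysis.
    for i in range(frames):
        yield f"camera: analysed frame {i}"


def game_task(ticks: int) -> Iterator[str]:
    # Hypothetical stand-in for game logic sharing the same CPU.
    for i in range(ticks):
        yield f"game: simulated tick {i}"


def round_robin(tasks: Iterable[Iterator[str]]) -> List[str]:
    """Run each task one step at a time, cycling until every task has finished."""
    queue = deque(tasks)
    log: List[str] = []
    while queue:
        task = queue.popleft()
        try:
            log.append(next(task))
        except StopIteration:
            continue  # task finished; drop it from the queue
        queue.append(task)  # not finished; send it to the back of the queue
    return log


for line in round_robin([camera_task(2), game_task(3)]):
    print(line)
```

Both tasks make progress on one processor; the camera work simply takes its turn alongside the game work instead of needing dedicated hardware.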
Again, this isn't at all accurate. You'd either have a posterized image that loses all the information denoting objects, or you'd have to dither it, making optical processing nigh impossible. And it'd still look like crap. If you're using the video feed in game, you'll need the full colour image. Good optical recognition wants as little noise and as much information as possible. JPEG-compressing a video stream is bad enough, let alone throwing away most of the image information! And it's unnecessary. Future ports will be able to cope with higher camera resolutions. Maybe they'll be limited to 720p. Regardless, that's all covered by the console's general IO choices and doesn't need any special attention, unless you feel a brand new port needs to be designed specifically for high-speed cameras because the likes of USB3 aren't up to it.

If you look at the Kinect outputs, they provide depth (the Z axis) as a number, and using mostly the Z input the Xbox creates a posterized image that is converted to a wireframe model and compared against templates.
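The information loss from posterizing can be illustrated by quantizing 8-bit values down to a handful of levels. A minimal sketch: the sample row of pixel values and the choice of 4 levels are made up for illustration, and this is plain uniform quantization, not the processing any particular camera pipeline uses.

```python
def posterize(value: int, levels: int) -> int:
    """Quantize an 8-bit channel value (0-255) to `levels` evenly spaced levels."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit value")
    if levels < 2:
        raise ValueError("need at least two levels")
    step = 255 / (levels - 1)
    return round(round(value / step) * step)


# Hypothetical smooth gradient, e.g. the shading across an object's edge.
row = [10, 40, 70, 100, 130, 160, 190, 220]

full = len(set(row))
poster = [posterize(v, 4) for v in row]

print("original: ", row, f"({full} distinct values)")
print("posterized:", poster, f"({len(set(poster))} distinct values)")
```

Eight distinct values collapse to four, so neighbouring pixels that differed are now indistinguishable, which is exactly the detail an optical recognizer would want to keep.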
No different to every other system out there. We don't break PCs up into a processing component with CPU and RAM for audio, another for video, another for physics, another for the browser, etc. We take one pool of resources and use it dynamically.

Yes, they do. Video in PCs has its own memory for a very good reason. You are using as your model a PC, which is designed as a general-purpose machine and is hardware and software upgradeable. A game console is not hardware upgradeable.

Again, you're looking at a few GB/s maximum. In systems with likely well in excess of 50 GB/s, that's not a problem that needs special attention.
An 8x increase in needs will match the natural 8x increase in performance that comes with the next generation of console hardware. The impact will be no more than the current requirements are on this gen. XB360 wasn't designed with a memory and processing subsystem for a future 3D camera. Instead, Kinect works by using a fraction of the system's available resource pool, with no song-and-dance complications about it crippling other applications by getting in the way of their memory accesses.

See above. The process used has issues, in both overhead and accuracy, that will limit its uses.
Only if the features are specialist. Everything you say can fit into the possible processing choices we've outlined earlier in this discussion. Only if you are doing something extraordinary that conventional processors can't cope with (in the same way a 2005 tri-core PPC and GPU can't cope with 2010 cutting-edge 3D vision tracking) would you need to consider extraordinary CPU solutions, but that would be cost prohibitive, meaning you'd drop that feature and go for a lesser one that works within budget-constrained hardware choices. Wii didn't include a gyro at launch, even though that would have provided the full features Nintendo wanted, because the cost was too high. They reduced features to match a price target. Next-gen consoles will have a CPU and GPU made to a price, and the interface options will be built around them, knowing that they'll consume a small fraction of resources.