Because each eye sees completely different images (no shared pixels), the X + Y information is also added. That is why all 1280 columns are unique.
Let's be clear on what resolution is, and hopefully you'll see this is wrong. Our displays present an image as an organised array of light values. The scene is mathematically modelled in 3D and then transformed, projected and resolved, sampling discrete points of the scene to get the light values. At a resolution of 300x200, there will be 60,000 samples. If you render the same scene at 600x200, the new samples lie in between the others: you are harvesting more information from the same scene, the same projection. You could also render 300x200 pixels, then move the camera a half pixel to the right and render another 300x200 pixels (that information corresponding to the even columns of the 600x200 image), and then interweave the two sample sets to generate an identical 600x200 image; again, both odd and even columns come from the same scene, same projection.
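To make that concrete, here is a minimal sketch in NumPy. The scene function and the sampling convention (corner-of-pixel samples over a normalised projection plane) are stand-ins of my own, and the "half pixel to the right" is implemented as shifting the sample grid within the same projection rather than literally moving a camera, but the interleaving point is the same:

```python
import numpy as np

def scene(u, v):
    # Stand-in for "resolve the light value at this point of the projection".
    return np.sin(10 * u) * np.cos(7 * v)

def render(width, height, x_shift=0.0):
    # Sample a grid over a normalised projection plane; x_shift is measured in
    # units of this render's own pixel width.
    u = (np.arange(width) + x_shift) / width
    v = np.arange(height) / height
    return scene(u[None, :], v[:, None])

full = render(600, 200)                      # one 600x200 render: 120,000 samples
first_pass  = render(300, 200)               # 300x200 at the original sample positions
offset_pass = render(300, 200, x_shift=0.5)  # 300x200 shifted half a pixel to the right

interleaved = np.empty_like(full)
interleaved[:, 0::2] = first_pass            # one set of columns from the first pass
interleaved[:, 1::2] = offset_pass           # the in-between columns from the offset pass

print(np.allclose(full, interleaved))        # True: same scene, same projection
```

Two half-width sample sets of the same projection slot together into exactly the full-width image, because they are one ordered grid of samples split across two passes.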
However, if you sample half the pixels from one camera and the other pixels from a different camera looking at a different scene, you have broken the ordered grid deconstruction of the image. As an extreme case to prove the point, consider rendering 300x200 pixels of a camera looking down a street in GTA4, then 300x200 pixels of a camera inside a bar in GTA4, and interleaving the columns of those two to make one 600x200 image. It's not going to be an actual 600x200 image!

Now hypothetically move the cameras nearer, so the one down the street moves towards the bar entrance while the one in the bar moves nearer the door. Combine two 300x200 images from these cameras and you still don't get a single, uniform sampling of the same scene. Now move the cameras to within one virtual metre...still not a single image. Now place them next to each other, 3" apart, looking in slightly different directions. The result remains two discrete scenes sampled from two different points, creating two separate sample sets that cannot be combined into a single higher resolution image.
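The same sketch shows this: feed the second pass from a different camera position instead of a half-pixel offset (the toy camera model here, where moving the camera just translates where the sample grid lands on the scene, is my own stand-in, not how a real renderer projects a stereo pair):

```python
import numpy as np

def scene(u, v):
    # Same stand-in scene function as the sketch above.
    return np.sin(10 * u) * np.cos(7 * v)

def render_from(camera_x, width, height):
    # Hypothetical camera model: moving the camera translates the sample grid.
    # Real stereo projection is more involved, but the sampling argument holds.
    u = np.arange(width) / width + camera_x
    v = np.arange(height) / height
    return scene(u[None, :], v[:, None])

left  = render_from(0.00, 300, 200)   # one eye/camera, 300x200
right = render_from(0.01, 300, 200)   # a second camera, offset in space

combined = np.empty((200, 600))
combined[:, 0::2] = left
combined[:, 1::2] = right

# The interleaved result matches no single 600x200 render: it is two separate
# sample sets of two different viewpoints, not one uniform sampling.
print(np.allclose(combined, render_from(0.00, 600, 200)))  # False
print(np.allclose(combined, render_from(0.01, 600, 200)))  # False
```

Only in the degenerate case where the camera offset happens to equal exactly half a pixel of the wider grid does this collapse back into the single-scene case above; for any real stereo separation it never does.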
Thus 3D cannot be considered the same resolution as 2x the horizontal res of each eye image. 3D will be perceived differently, meaning comparisons of resolution between 2D and 3D won't be like-for-like, and 1280x720 may look sharper or less sharp than 2x 1280x720 in 3D. (It would appear from what Arwin posted that my original guess was back-to-front, and 3D will increase sensitivity to resolution, which actually makes more sense: the whole is worth more than the sum of the parts and all that.) But in actual metrics, 3D resolutions cannot be considered the same as the doubled-up 2D resolution of the same aggregate dimensions.