I think it's because most people are looking at 40-46" TVs from 6-8'. If you can't make out individual pixels, will squeezing more into the same space make a noticeable difference?
In terms of output resolution, you quickly start running into extreme diminishing returns. Viewing a heavily-supersampled 720p image on my 37" TV from 10' seems to be well within my "I don't particularly care anymore" range.
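To put a rough number on those diminishing returns, here's a minimal sketch that estimates how many pixels fit into one degree of your field of view. The function name and the 16:9 square-pixel geometry are my own assumptions, and the usual rule of thumb that 20/20 vision resolves about 60 pixels per degree (1 arcminute per pixel) is only an approximation.

```python
import math

def pixels_per_degree(diagonal_in, distance_ft, vertical_px, aspect=(16, 9)):
    """Approximate pixels per degree of visual angle at the screen center.

    Assumes a flat panel with square pixels, viewed head-on.
    """
    w, h = aspect
    height_in = diagonal_in * h / math.hypot(w, h)   # physical screen height
    pitch_in = height_in / vertical_px               # size of one pixel
    distance_in = distance_ft * 12
    pixel_deg = math.degrees(2 * math.atan(pitch_in / (2 * distance_in)))
    return 1 / pixel_deg

# 37" TV viewed from 10', at 720 and 1080 output lines
for lines in (720, 1080):
    print(lines, round(pixels_per_degree(37, 10, lines), 1))
```

Under those assumptions, 720p on a 37" set at 10' already lands somewhere above 80 pixels per degree, which is past the 60 figure, so extra output resolution buys very little sharpness there.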
However, I said "heavily supersampled" for a reason. Output resolution will restrict how sharp your image can be... but by itself it says nothing about whether or not you're doing a good job dealing with aliasing. And high-frequency high-amplitude visual components can need a ton of sampling in order to look stable.
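Here's a toy 1D illustration of that point, not anything taken from a real renderer. A narrow bright spike stands in for a specular highlight (the Gaussian shape, its width, and all the names are assumptions for the sketch), and it drifts across one pixel while we compare one sample per pixel against sixteen.

```python
import math

def highlight(x, center, width=0.2):
    """Narrow bright spike, in pixel units; a stand-in for a specular highlight."""
    return math.exp(-((x - center) / width) ** 2)

def render(center, num_pixels=8, samples_per_pixel=1):
    """Average evenly spaced samples inside each pixel (a box-filter supersample)."""
    frame = []
    for p in range(num_pixels):
        total = 0.0
        for s in range(samples_per_pixel):
            x = p + (s + 0.5) / samples_per_pixel
            total += highlight(x, center)
        frame.append(total / samples_per_pixel)
    return frame

def flicker(samples_per_pixel, steps=50):
    """Swing in total frame brightness as the highlight drifts across one pixel."""
    totals = []
    for i in range(steps):
        center = 3.0 + i / steps
        totals.append(sum(render(center, samples_per_pixel=samples_per_pixel)))
    return max(totals) - min(totals)

print("1 sample/pixel:  ", flicker(1))
print("16 samples/pixel:", flicker(16))
```

With one sample per pixel the highlight pops in and out depending on whether a sample happens to land on it, so the total brightness swings wildly from frame to frame. With sixteen samples per pixel the total stays essentially constant, even though the output resolution is the same in both cases.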
Before I made this post, I decided to run a little experiment. What's a good example of a high-frequency high-amplitude visual component from this gen? Halo 3's normal-mapped specular highlights, obviously. Now, Halo 3 samples at 640p. If what some people say is true, and all that matters is that you can't easily distinguish side-by-side pixels, it's almost certainly the case that Halo 3 should look totally stable and fine on my 37" TV if I use a massive viewing distance like 15', right? I went to a place in game with normal-mapped Forerunner surfaces and started looking left and right with the flashlight on to make them shimmer. I was never able to measure how far back I'd have to be to make the shimmering stop, because the aliasing was still blatantly obvious when I ran into the farthest part of my building from which the TV is still visible. I measured that viewing distance to be ~53'.
Let me reiterate that: the aliasing from Halo 3's specular reflections is easily visible when viewed on a 37" TV from a distance of
AT LEAST FIFTY-THREE FEET.
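For comparison, here's a back-of-the-envelope sketch of the distance at which individual rendered pixels should stop being distinguishable. It assumes the common 1-arcminute acuity rule of thumb, a 16:9 panel, and that the 640 rendered lines are scaled to fill the full screen height; the function name is hypothetical.

```python
import math

def pixel_blend_distance_ft(diagonal_in, vertical_px, acuity_arcmin=1.0, aspect=(16, 9)):
    """Distance (feet) at which one pixel subtends `acuity_arcmin` arcminutes."""
    w, h = aspect
    height_in = diagonal_in * h / math.hypot(w, h)   # physical screen height
    pitch_in = height_in / vertical_px               # size of one rendered pixel
    distance_in = pitch_in / math.tan(math.radians(acuity_arcmin / 60))
    return distance_in / 12

d = pixel_blend_distance_ft(37, 640)
print("pixels blend together at about", round(d, 1), "ft")
print("53 ft is", round(53 / d, 1), "times farther than that")
```

Under those assumptions the pixels themselves should blur together at somewhere around 8', so 53' is several times past the point where pixel size alone would explain anything.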
So yeah. 720p versus 1080p probably doesn't matter all that much in terms of visual clarity for console gamers who sit a substantial distance from their TVs (although with how often people sit less than two yards from 50" screens...). But in terms of sampling sufficiently to produce a stable image? A 720p backbuffer versus a 1080p backbuffer can make a very visible difference, even from a large viewing distance. Your eyes can pick out inaccurate garbage crawling and shimmering, even if they can't easily distinguish individual pixels.