Oh come on. Just compare the perf/mm² of G92 and G94. Compare the ASPs/mm² after taking the relative volumes of the different SKUs into consideration. Look at the quite low performance penalties of reducing the number of clusters (I can't remember which site did that; if you really want me to, I can find the link).
I fail to see the relevance in any of that. The chip performs well and is sold at reasonable prices.
On top of that, the chip has relatively low power consumption (so basically that nullifies all the reasons why one would want a smaller die and a smaller manufacturing process):
http://www.anandtech.com/video/showdoc.aspx?i=3209&p=12
As you can see, the 8800GT actually uses less power than the 3870, while delivering considerably better performance. And although the 8800GTS does use a bit more power, its performance is in line with the power consumption.
Really, these chips beat AMD on everything, except perhaps the die size, but that's because AMD was pushed to move to 55 nm early. It is quite obvious that AMD has to stretch the silicon to the max in order to get competitive performance (and fails at that). nVidia outperforms AMD in absolute performance and performance-per-watt with just 65 nm. So yes, the die is larger... and? They don't have to push the silicon too hard, so they probably get better yields than AMD anyway, which would explain why their profit margins are still healthier than AMD's.
I agree they could have done even better... but then again, that is usually the case... At some point you just have to get a product out the door as well, in the real world.
But I think any kind of comparison to AMD is just preposterous. AMD loses everywhere... They don't make healthy profits, their chips aren't energy-efficient despite the 55 nm advantage, their cards aren't that attractive in price... and worst of all, they still can't outperform my aging 8800GTS with any single GPU they have. 55 nm just made their chips 'bearable' instead of the complete power hog that the 2900 was. Other than that, the chips are still just as unimpressive, and since most people can easily afford an 8800GT or better, they aren't an attractive option anyway.
Again, who cares about die size? I care about the advantages of a smaller die size, if they actually exist... In this case they don't.
I think your argument is similar to arguing that Intel shouldn't have put out 4 MB L2 cache Core 2 Duos since the 2 MB models are nearly as fast and would be cheaper to produce, because the die could be considerably smaller. Apparently the difference is not interesting when you have a solid design and good yields on your production line. You get diminishing returns on faster models anyway.
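To put the "good yields" point in rough numbers, here is a minimal sketch of how die area feeds into cost per good die. It uses the textbook Poisson yield model and a common dies-per-wafer approximation. The 300 mm wafer, the die areas and the defect densities are all placeholder assumptions of mine, not figures for any real chip or fab.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation for gross dies on a round wafer.

    The second term subtracts partial dies lost along the wafer edge.
    The 300 mm default is an assumption, not a claim about any product.
    """
    radius = wafer_diameter_mm / 2.0
    gross = (math.pi * radius * radius / die_area_mm2
             - math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2))
    return int(gross)

def poisson_yield(die_area_mm2, defects_per_mm2):
    """Poisson yield model: fraction of dies with zero defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

def cost_per_good_die(die_area_mm2, defects_per_mm2, wafer_cost=1.0):
    """Wafer cost divided by the expected number of good dies."""
    good_dies = dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_mm2)
    return wafer_cost / good_dies

def cost_ratio(big_area_mm2, small_area_mm2, defects_per_mm2):
    """How many times more the big die costs than the small one."""
    return (cost_per_good_die(big_area_mm2, defects_per_mm2)
            / cost_per_good_die(small_area_mm2, defects_per_mm2))

if __name__ == "__main__":
    # Hypothetical die areas (mm^2) and defect densities (defects per mm^2).
    for d0 in (0.0005, 0.005):
        print(f"defect density {d0}: big/small cost ratio = {cost_ratio(300.0, 200.0, d0):.2f}")
```

In this model the cost penalty of the bigger die shrinks as the defect density drops, which is the whole point: on a mature process with good yields, the extra area matters a lot less than the raw mm² comparison suggests.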