I find their "diminishing returns" argument full of holes; it's BS marketing speak. There's a reason CPUs hit a diminishing-returns bottleneck, and it's that their workloads are primarily single-threaded. Increasing ILP and clock speed for single-threaded workloads does show diminishing returns.
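Rough back-of-envelope to show why the comparison doesn't transfer (my numbers, assuming Pollack's rule of thumb that single-thread performance grows only about as the square root of the transistor budget, while throughput work scales nearly linearly):

```python
# Back-of-envelope sketch, assuming Pollack's rule for single-thread perf
# (~sqrt of transistors spent on a core) versus near-linear scaling for
# embarrassingly parallel graphics work. All numbers are illustrative.
def single_thread_speedup(transistor_ratio):
    return transistor_ratio ** 0.5

def parallel_speedup(transistor_ratio, efficiency=0.9):
    return transistor_ratio * efficiency

for r in (2, 4, 8):
    print(f"{r}x transistors: ~{single_thread_speedup(r):.1f}x single-thread, "
          f"~{parallel_speedup(r):.1f}x parallel")
```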
There are some diminishing returns, particularly with no chip yet from any manufacturer doing more than one tri/clock (BTW, do you have any insight as to the reason?). In polygon throughput a 3870x2 slaughters a GTX 280, for example:
http://www.ixbt.com/video3/gt200-part2.shtml
Setup is definitely an issue, as I discussed here:
http://forum.beyond3d.com/showpost.php?p=1177998&postcount=112
Eventually, monolithic chips will have problems feeding their shader units, too, because even if you're pixel-shader limited, there is a minimum number of fragments that must be submitted between pipeline flushes to maintain ALU saturation. To a certain degree you can handle multiple renderstates simultaneously, but at some point it's just more efficient to have multiple GPU entities. As an example, GT200 can hold 30,720 fragments in flight, and running with much less than that reduces efficiency. We know it can handle VS, GS, compute, and PS batches simultaneously, but how many different renderstates can it juggle?
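To put numbers on that figure (my arithmetic, using the commonly cited GT200 layout; the 5,000-fragment draw is a made-up example):

```python
# Sanity-checking the 30,720 figure with the commonly cited GT200
# layout: 10 TPCs x 3 SMs, 1024 threads in flight per SM.
tpcs = 10
sms_per_tpc = 3
threads_per_sm = 1024

fragments_in_flight = tpcs * sms_per_tpc * threads_per_sm
print(fragments_in_flight)  # 30720

# If a renderstate change forces a flush, a small batch leaves ALUs idle.
# A hypothetical 5,000-fragment draw fills only a fraction of the machine:
print(f"{5000 / fragments_in_flight:.0%}")  # ~16%
```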
There are issues with the way crossbars scale, too. If a load can be run efficiently on two chips, that's much better. Looking at the die shots, I think this is one of the reasons for GT200's non-ideal scaling compared to G92.
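Quick sketch of why splitting helps: a full crossbar needs a path from every client to every memory channel, so crosspoints grow with the product of the two (the port counts below are hypothetical, just to show the shape, not actual G92/GT200 figures):

```python
# Crosspoint count for a full crossbar is clients x channels, so it grows
# with the product of the two. Port counts are made up for illustration.
def crosspoints(clients, channels):
    return clients * channels

one_big_chip = crosspoints(16, 8)        # one chip: 16 clients, 8 channels
two_small_chips = 2 * crosspoints(8, 4)  # same total units split across two
print(one_big_chip, two_small_chips)     # 128 vs 64: half the wiring
```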
I agree that the analogy to CPUs is terrible, but it's not entirely BS.
The problem for ATI is that, because they don't have the uber-big chips, their marketing message will only be successful in the short term, while the 4870x2 goes up against the GT200. As soon as NV shrinks it and tweaks it, and rolls out the inevitable smaller cut-down versions, NV will have a story at the high end and the middle of the market.
Obviously ATI already took this into account when determining their sweet spot. GT200 won't shrink that much during its lifetime. ATI can always scale up if it makes sense, like R300->R420.
The question is whether NVidia will ever be able to cool and power a pair of GT200s in a dual-slot case, and whether people will care.
As for cut down versions, we already have a pretty good idea of what to expect with G92 and G94. In fact, I think they've already stated that they're going to continue G92. There aren't any features or perf/mm2 improvements in GT200 that warrant a cut-down derivative in addition to G92.
Scarier for NVidia is how fast a 128-bit RV730 could be...