Don't forget that while you may save some bucks on the smaller GPUs in an mGPU card, you're still burning much more on doubled memory size and a more complex PCB/power/cooling solution.
So what? X2 cards of any sort sell for huge amounts of money, with ludicrous margins to spread all around. Unless your single-chip performance just sucks, in which case you have bigger problems. Nobody needs to optimize high-end cards for cost. Why do you think NV fully specifies the high-end, including the cooling? Because if folks in Taiwan try to cut costs, they will create problems further down the road.
If you look at the strategy for high-end cards, it doesn't involve optimizing for cost. Cooling >130W is quite expensive, and routing that much power involves a PCB with many layers, tons of caps, VRMs, etc.
Generally "one big GPU" cards are simply more effective than any mGPU solution we've seen to date. And that's true not only in the cost of producing cards, but in performance and features too.
RV770x2 vs GT200 is essentially the opposite of RV670x2 vs G80 -- the second scenario was a bit unfair to mGPU cards because RV670 was kinda bad, and now we have a scenario which is, in my opinion, unfair to "big single GPU" cards because GT200 is kinda bad.
You totally missed the big picture. The super high-end of the market that buys a GTX 280 or RV770x2 is minuscule by volume and has almost zero impact on overall profits; it's the halo effect that's useful. GPU vendors make most of their money on pro GPUs and GPUs in the $100-250 range.
So that high-end card doesn't actually ship in volume until it's been shrunk to the next process and becomes mid-range. And in semiconductors, volume is king, because volume determines how far your NRE gets amortized.
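The amortization arithmetic is simple enough to sketch. Everything below is hypothetical for illustration -- the NRE, unit cost, and volume figures are made up, not real NV/AMD numbers:

```python
# Hypothetical sketch: why volume is king in semiconductors.
# NRE (design, validation, mask sets) is a fixed cost, so the
# effective cost per die only collapses at mid-range volumes.

def cost_per_die(nre_dollars, unit_cost, volume):
    """Total cost per die once fixed NRE is spread over volume."""
    return nre_dollars / volume + unit_cost

NRE = 50e6   # hypothetical design + mask-set cost
UNIT = 80.0  # hypothetical marginal cost of one big die

for volume in (100_000, 5_000_000):  # halo volume vs mid-range volume
    total = cost_per_die(NRE, UNIT, volume)
    print(f"{volume:>9,} units -> ${total:,.0f} per die")
# At halo volumes the NRE dominates; at mid-range volumes it vanishes.
```

Same chip, same NRE -- but the halo part carries hundreds of dollars of fixed cost per die while the mid-range shrink carries almost none.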
Yes a single GPU may be more efficient, but only in a narrow and uninteresting sense...and I'm not even convinced that single GPUs are necessarily more efficient (more on that later).
I'm quite sure that nothing will ever bring mGPU solutions to the level of single-GPU solutions in terms of flexibility and efficiency.
Flexibility is tricky, since SLI/XF are software-visible hacks that require changing your app.
How do you define efficiency?
Frankly, if you look at good CPU architectures, it's quite easy to see that DP servers are pretty much exactly as efficient as a single-socket server for many workloads, and hence are the sweet spot for efficiency (e.g. ~95% scaling).
GPU workloads are by definition trivially parallel, so it's quite easy to see how a dual chip approach would be just as efficient. Both from a performance and power/cost standpoint.
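That ~95% scaling claim is just Amdahl's law with a small serial fraction. A minimal sketch -- the 5% unscalable fraction below is an illustrative assumption, not a measured figure:

```python
# Hedged sketch: parallel efficiency of a dual-chip solution on an
# (almost) embarrassingly parallel workload, via Amdahl's law.

def speedup(serial_fraction, n_chips):
    """Amdahl's law: speedup from n chips given an unscalable fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_chips)

def efficiency(serial_fraction, n_chips):
    """Speedup per chip, i.e. how close to perfect scaling we get."""
    return speedup(serial_fraction, n_chips) / n_chips

# A workload with ~5% unscalable work, split across 2 chips:
print(f"{efficiency(0.05, 2):.0%}")
```

With a 5% serial fraction the two-chip efficiency lands at ~95%, matching the DP-server analogy above; drive the serial fraction toward zero (as graphics workloads allow) and the dual-chip card approaches perfect scaling.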
An interesting area for mGPU cards lies a bit higher than where AMD is putting its RV670/770 GPUs -- say you have a GPU with performance between the mid-range and high-end class. There may be a window where you can make an mGPU card with two such GPUs which cannot be challenged by any single big GPU, simply because such a GPU cannot be built (due to technical limitations). Anything higher will be too complex to use in mGPU cards; anything lower will be beaten by single big-GPU boards.
This window is where AMD is with RV770x2 essentially, but only WRT performance levels, not die complexity.
Yes, that's interesting. But what else is interesting is having a much more highly optimized card to serve the $100-250 market, where you can kick ass AND make mad money because your die size is way smaller.
The biggest single advantage of a monolithic GPU is that using multiple GPUs for general-purpose workloads is hopeless, because the programming model (i.e. no coherency) is awful. NV has to produce large monolithic GPUs to make GPGPU interesting and get sufficient performance gains over a standard dual-socket server.
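A rough sketch of why no coherency hurts: any intermediate result produced on one GPU but consumed by the other must be explicitly copied over the interconnect and stored twice, a cost a coherent shared-memory system never pays. The bus rate and transfer size below are hypothetical placeholders, not real hardware figures:

```python
# Hypothetical sketch: the tax of a non-coherent dual-GPU model.
# Each GPU has private memory; sharing data means an explicit copy
# over the interconnect, unlike a coherent dual-socket CPU server.

BUS_GBPS = 4.0  # hypothetical interconnect bandwidth, GB/s

def transfer_seconds(bytes_moved, gbps=BUS_GBPS):
    """Time spent just moving data between the two GPUs' memories."""
    return bytes_moved / (gbps * 1e9)

# A 256 MB intermediate result produced on GPU 0 but needed on GPU 1
# must be copied (costing time) and held in both memories (costing space):
copy_ms = transfer_seconds(256 * 2**20) * 1e3
print(f"explicit copy cost: {copy_ms:.1f} ms")
```

The GPGPU programmer has to schedule and hide every one of these copies by hand; on one big GPU the result is simply already there.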
Well, people tend to expect too much. But why wouldn't a GT30x mid-range part with GT200+ performance and DX10.1/11 support at a $250 price point be a performance bomb anyway? After all, that's exactly what's expected from AMD's RV870.
What's the problem with doing everything that AMD does PLUS doing a single big GPU AFTER you've done what AMD did? I don't see how any kind of power usage would be a problem here.
It's a huge waste of money and engineers time. Next question?
DK