I'm sorry, I'm not following at all here. Imagine a hypothetical architecture that has 2x the ALUs/TMUs/ROPs of 5870, while maintaining the exact same memory bandwidth. Would you consider it pointless to claim that the architecture is inefficient based on the (likely many!) games that scale very badly on it?
Assuming you have games that are capable of scaling decently, the hypothetical card will be faster, because bandwidth isn't a bottleneck 100% of the time. Some games will scale by quite a lot.
You can already see this effect with HD5770 in Metro 2033, where it is faster than HD4890 despite having ~60% of the bandwidth.
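To put rough numbers on "bandwidth isn't a bottleneck 100% of the time", here's a toy model. It's only a sketch: the frame is split into made-up phases, each limited by whichever of compute or bandwidth is slower, and every figure in it is hypothetical (arbitrary units, not measurements from any real card or game).

```python
def frame_time(phases, compute_rate, bandwidth):
    """Total frame time when each phase is limited by its slower resource."""
    total = 0.0
    for work, traffic in phases:
        # A phase takes as long as the slower of its compute and memory parts.
        total += max(work / compute_rate, traffic / bandwidth)
    return total


# Hypothetical frame: (compute work, memory traffic) per phase, arbitrary units.
phases = [
    (8.0, 2.0),  # shader-heavy pass: compute-bound
    (2.0, 6.0),  # fill/blend-heavy pass: bandwidth-bound
    (6.0, 3.0),  # mixed pass
]

base = frame_time(phases, compute_rate=1.0, bandwidth=1.0)
doubled = frame_time(phases, compute_rate=2.0, bandwidth=1.0)  # 2x units, same bandwidth
speedup = base / doubled

print(f"baseline: {base}, doubled units: {doubled}, speedup: {speedup:.2f}x")
```

With this made-up mix the doubled chip lands at roughly 1.5x: nowhere near 2x, but not 1x either. Shift the mix towards the bandwidth-bound pass and the gain shrinks; shift it towards the shader-heavy pass and it grows.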
This hypothetical SKU (let's say it's the end of 2011, and we're talking about a $200 card) might be considered unbalanced on today's games. But that SKU will be considered good value, because newer games will probably be fine. It might have even less bandwidth than HD5870. Of course, by then, the architecture may have enjoyed other tweaks that make it more bandwidth-efficient. (Cypress appears to have some such tweaks.)
There are some suggestions that Cypress was meant to be 384-bit, with presumably 48 ROPs. Instead this 256-bit chip is stuck in no-man's-land with considerably less bandwidth, despite having a die area that could accommodate a 384-bit bus. Some rumours suggest the ALU/TMU counts in that chip would have been the same; other rumours suggest it would have had 1920 ALUs.
Taking the best case: 1920 ALUs, 48 ROPs, 384-bit, would seem to imply a better-balanced chip, i.e. ROPs/bandwidth scaled by 50% over Cypress with 20% more math/texturing. One might argue that existing games are most sensitive to fillrate, and so that chip's bias towards fillrate/bandwidth makes for a more efficient SKU.
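For anyone who wants to check the arithmetic on those ratios, a quick sketch. The Cypress figures are the shipping HD5870 unit counts; the second set is the rumoured configuration from above, not a real part. It also assumes bandwidth scales with bus width, i.e. the same memory clock on both.

```python
# Shipping Cypress (HD5870) versus the rumoured 384-bit configuration.
cypress = {"alus": 1600, "rops": 32, "bus_bits": 256}
rumoured = {"alus": 1920, "rops": 48, "bus_bits": 384}  # rumour only

# Scaling factor of each resource relative to Cypress.
scale = {name: rumoured[name] / cypress[name] for name in cypress}

# Math throughput per unit of fillrate, relative to Cypress.
alu_per_rop_ratio = (rumoured["alus"] / rumoured["rops"]) / (
    cypress["alus"] / cypress["rops"]
)

for name, factor in scale.items():
    print(f"{name}: {factor:.2f}x")
print(f"ALU:ROP ratio relative to Cypress: {alu_per_rop_ratio:.2f}x")
```

So ROPs and bus width go up by 1.5x while math goes up by 1.2x, which leaves the ALU:ROP ratio at 0.8x of Cypress. That's the "bias towards fillrate/bandwidth".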
But, using games that are scaling poorly anyway doesn't make a good basis for saying the architecture is or isn't efficient. D3D10's more finely-grained state management and the multi-threaded CPU side of state construction in D3D11 both demonstrate that graphics performance scaling is a prisoner of more than just the graphics card.
I'm not saying that Cypress is efficient - tessellation appears questionable for example. I am saying that scaling comparisons with HD4890 that exclude bandwidth are ill-judged and the poor scaling of lots of games makes that comparison even worse. Using them to generalise about efficiency going forwards is pointless.
See HD5830 for a great example of junk. HD4890 has vast amounts of bandwidth that it generally uses fairly poorly. R600 is a disaster zone.
So, to answer your question: well, such a chip isn't technically possible right now on the current ATI architecture, it would need 28nm (maybe it would squeeze in below 600mm²?). A chip of that kind of specification is very likely to turn up eventually (compare HD4850 and HD2900XT). Regardless, such a chip would be wasteful for "today's games" assuming no bandwidth efficiency gains (since bandwidth is increasing so slowly - and even then that's assuming games that scale well). It is possible to scale up an architecture too far and fall victim to architecture-intrinsic limitations, e.g. tessellation in Cypress may be showing up the rasterisers as unfit. Or maybe it's the ALU thread scheduling that's hit the end stops?
Chips of a given architecture can't scale linearly with unit count: that's Amdahl's law, as D3D isn't perfectly parallel. If people want to assert that Cypress is beyond the pale and is an inefficient configuration, then they have to show it failing to scale when another architecture (or at least a better configuration of the architecture at a similar die size) continues to scale, on the same games.
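As a rough illustration of the Amdahl's law point, here's the textbook formula applied to doubling the unit count. The parallel fractions are hypothetical: nobody has measured what fraction of a D3D frame actually scales with units, so treat these as placeholders.

```python
def amdahl_speedup(parallel_fraction, n):
    """Amdahl's law: speedup when only parallel_fraction of the work scales by n."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


# Hypothetical fractions of frame time that scale with unit count.
for p in (0.5, 0.8, 0.9, 1.0):
    print(f"parallel fraction {p:.1f}: 2x units -> {amdahl_speedup(p, 2):.2f}x")
```

Only a perfectly parallel workload gets the full 2x. Anything in the frame that doesn't scale with units (CPU-side state setup, fixed-function limits, and so on) drags the result down, whichever architecture is being tested.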
Your hypothetical chip seems likely to be crap if it were to appear now - too unbalanced (the opposite of HD2900XT). R580 had a mildly similar problem, appearing "over specified" for its time. Eventually games catch up with these increased ratios of ALU:GB/s and TEX:GB/s and fillrate:GB/s, but by then the chip turns out to have too little bandwidth for anything but budget gaming. The architecture will have been revised.
GTX480 could be a chance to see where Cypress fails, if it shows good scaling in places that Cypress is rubbish, on existing games. Not long to go...
Jawed