I get what you are saying, and I don't disagree at all. But given Maxwell's efficiency, and assuming that efficiency is maintained as performance scales up with bigger chips, there is an awful lot of headroom left to work with even within the die size / transistor density constraints of 28nm. Case in point: GM107 is obviously capped on purpose by Nvidia. They could easily have paired it with 7GHz VRAM and shipped chips with a 1275MHz boost clock, matching or surpassing GTX 660 performance. (And I believe they may still do that in a future 800 series rebadge of GM107.)
Roughly +50% for the top dog isn't exactly an awful lot of headroom in my book, but that's just me.
[strike]On another note, considering that table above, it's about as close as it can be for GM200 and complete nonsense for GK110. A GTX780Ti gives 5.04 TFLOPs FP32, which at 250W TDP is exactly 20.16 GFLOPs/W (and no, not with the turbo frequency). If you go up to the hypothetical 25 GFLOPs/W for GM200, again at a 250W TDP, you get 6.25 TFLOPs, i.e. 24% more TFLOPs.[/strike] ***edit: those numbers are for correlation. Add another 35% higher efficiency for Maxwell and you're at +64% best case scenario. Now think of an average in existing applications, and of how far custom vendor SKUs could really push the power envelope beyond that, also considering that there's a good chance transistor density has increased in GM200 vs. GK110.
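For anyone who wants to check the back-of-the-envelope math, here is a rough Python sketch. The inputs are only the figures quoted above (5.04 TFLOPs, 250W TDP, the hypothetical 25 GFLOPs/W, the extra 35%); the variable and function names are made up for illustration. How the +64% was derived isn't spelled out, so the last step assumes plain compounding of the two gains, which lands at roughly +67%, i.e. the same ballpark.

```python
# Back-of-the-envelope check; every input is a figure quoted in the post.
GTX780TI_GFLOPS = 5040.0    # 5.04 TFLOPs FP32 at base clock (no turbo)
TDP_W = 250.0               # same TDP assumed for both chips
GM200_GFLOPS_PER_W = 25.0   # hypothetical efficiency from the table
MAXWELL_EXTRA_EFF = 0.35    # "another 35% higher efficiency"


def gflops_per_watt(gflops, watts):
    """Efficiency in GFLOPs per watt."""
    return gflops / watts


def uplift(new, old):
    """Relative gain of new over old, e.g. 0.24 for +24%."""
    return new / old - 1.0


gk110_eff = gflops_per_watt(GTX780TI_GFLOPS, TDP_W)   # GFLOPs/W
gm200_gflops = GM200_GFLOPS_PER_W * TDP_W             # GFLOPs at 250W
base_uplift = uplift(gm200_gflops, GTX780TI_GFLOPS)

# Assumption: the extra 35% compounds on top of the base uplift.
compounded = (1.0 + base_uplift) * (1.0 + MAXWELL_EXTRA_EFF) - 1.0

print(f"GK110: {gk110_eff:.2f} GFLOPs/W")
print(f"GM200: {gm200_gflops / 1000:.2f} TFLOPs at {TDP_W:.0f}W")
print(f"Base uplift: {base_uplift:+.0%}")
print(f"With extra 35% (compounded): {compounded:+.0%}")
```

None of this says anything about real-world averages; it only reproduces the peak-throughput arithmetic.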