What's the mystery? It's performing exactly as it should.
The GTX 260 has 22% lower texturing and setup speed and only 2% higher math speed than G92b, which we know is similar in speed to the 4850. Its only advantages are bandwidth and efficiency improvements, so, considering that, it's actually doing quite well. Theoretically, the GTX 280 is 5-39% faster than the GTX 260, and the benchmarks show that. The 39% only happens when shader limited (including texturing and math).
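For anyone who wants to check the GTX 260 vs. G92b arithmetic, here's a rough sketch. The unit counts and clocks are assumed reference specs (they aren't quoted above), and the 9800 GTX+ is assumed to be the G92b part in question. Peak rates are just units times clock, so this ignores bandwidth and efficiency entirely.

```python
# Assumed reference specs, not taken from the post: unit counts and clocks in MHz.
SPECS = {
    "GTX 260": {"tmus": 64, "core_mhz": 576, "sps": 192, "shader_mhz": 1242},
    # Assumption: the 9800 GTX+ stands in for "G92b".
    "9800 GTX+": {"tmus": 64, "core_mhz": 738, "sps": 128, "shader_mhz": 1836},
}


def texel_rate(card):
    """Peak texturing rate: texture units times core clock."""
    return card["tmus"] * card["core_mhz"]


def math_rate(card):
    """Peak math rate: stream processors times shader clock."""
    return card["sps"] * card["shader_mhz"]


def percent_diff(a, b):
    """How much larger (positive) or smaller (negative) a is than b, in percent."""
    return (a / b - 1.0) * 100.0


gtx260 = SPECS["GTX 260"]
g92b = SPECS["9800 GTX+"]

tex_delta = percent_diff(texel_rate(gtx260), texel_rate(g92b))
# Setup rate is assumed to scale with core clock alone.
setup_delta = percent_diff(gtx260["core_mhz"], g92b["core_mhz"])
math_delta = percent_diff(math_rate(gtx260), math_rate(g92b))

print(f"texturing: {tex_delta:+.1f}%")
print(f"setup:     {setup_delta:+.1f}%")
print(f"math:      {math_delta:+.1f}%")
```

With those assumed specs, texturing and setup come out about 22% lower and math comes out one to two percent higher, which is in line with the figures above.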
The problem is that GT200 has lower perf/mm2/clock than G92 by design, and it didn't come close to G92's clock speeds either. The design choice was probably made because NV didn't expect ATI to improve perf/mm2 (after all, ATI had already picked the low-hanging fruit with RV670), so they thought they could gamble a bit on making their architecture better for GPGPU with DP, more registers, more math, etc.
I expect GT200b to be 20% smaller and 20% faster (by correcting mistakes, not just the process), but that's going to be some time from now -- well after R700 completes ATI's leap to the front at all price points above $150.