At least in SGEMM, Larrabee, at 600+ mm² and clocked above stock, barely edged out RV770...
I'm not sure what other numbers are available for comparison.
How can LRB have a 'stock' clock, when there are no SKUs?
The SGEMM demo had Larrabee overclocked to break 1 TFLOPS, which I assume meant exceeding standard TDP. Perhaps they weren't able to tweak voltage, so that overclock was the best that could be done without that knob.
At any rate, whatever issues Larrabee had limited it to less than half of its desired peak. That already assumes more modest clocks than the early 2.5 GHz top end for 24 cores; with 32 cores on die, the gap would have been even wider.
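To make the "fraction of peak" argument concrete, here's a rough back-of-the-envelope sketch. It assumes each core has a 16-lane single-precision vector unit with fused multiply-add (counted as 2 flops), i.e. 32 flops per core per cycle; that per-core figure and the helper names are my assumptions for illustration, not official SKU numbers. The 1 TFLOPS demo result is used only as an input, since we don't know exactly how many cores or what clock the demo ran.

```python
# Assumption: 16 SP lanes per core, FMA counted as 2 flops -> 32 flops/core/cycle.
FLOPS_PER_CORE_PER_CYCLE = 16 * 2


def peak_gflops(cores: int, ghz: float) -> float:
    """Theoretical single-precision peak in GFLOPS (cores x clock x flops/cycle)."""
    return cores * ghz * FLOPS_PER_CORE_PER_CYCLE


def fraction_of_peak(achieved_gflops: float, cores: int, ghz: float) -> float:
    """Achieved throughput as a fraction of the theoretical peak."""
    return achieved_gflops / peak_gflops(cores, ghz)


if __name__ == "__main__":
    achieved = 1000.0  # the ~1 TFLOPS SGEMM demo figure, used as an input only
    for cores, ghz in [(24, 2.5), (32, 2.5)]:
        peak = peak_gflops(cores, ghz)
        frac = fraction_of_peak(achieved, cores, ghz)
        print(f"{cores} cores @ {ghz} GHz: peak {peak:.0f} GFLOPS, 1 TFLOPS = {frac:.0%} of peak")
```

Under these assumptions, 24 cores at 2.5 GHz works out to 1920 GFLOPS and 32 cores to 2560 GFLOPS, which is why adding cores widens the gap between demonstrated and intended throughput.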
You're missing my point. I think they probably would have traded off density for better power consumption.
I also don't think it necessarily means breaking TDP. It depends on your cooling. If you cool well, you lower Tj, so your transistors run cooler, leak less, consume less power, and so on.
How much earlier?
RV770 was denser from a transistor standpoint, and it was at 55nm. Nvidia's GPUs had a density deficit relative to AMD's that generation, even when normalized for process.
Penryn hit the market in late 2007. If LRB had hit the market in late '08 (which would have taken a miracle), it probably would have been much better. It would have had a density advantage, a performance advantage, and a big power advantage (HKMG plus normal process shrink gains), and perhaps, just perhaps, those advantages could have made up for other issues.
However, that's clearly impossible schedule-wise. A more interesting question is: what if they had skipped 45nm and gone straight to 32nm at the end of 2010, while their competition was still on 40nm? They'd have had a half-node density advantage, mature yields, a substantial transistor performance advantage, etc.
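For a sense of scale on the "half-node density advantage", here's a sketch of the idealized arithmetic. It assumes density scales with the inverse square of the node name, which real processes never quite achieve (node names are partly marketing labels, and SRAM, logic, and I/O scale differently), so treat the outputs as upper bounds rather than measured figures. The function name is hypothetical.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized transistor-density ratio, assuming area scales with (feature size)^2."""
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    # Intel's 32nm vs. the competition's 40nm (the half-node gap discussed above)
    print(f"32nm vs 40nm: {ideal_density_gain(40, 32):.2f}x")
    # A full node, for comparison: 45nm -> 32nm
    print(f"32nm vs 45nm: {ideal_density_gain(45, 32):.2f}x")
```

On paper that's roughly 1.56x over 40nm and roughly 1.98x over 45nm, before accounting for how well each foundry's process actually delivers on its name.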
The point is that Intel should have figured out how to leverage their process technology advantages more heavily (to make up for software disadvantages), rather than falling victim to an overly aggressive schedule and coming to market at parity.
David