Yes. There is interesting data coming out of the PRACE project in Europe on percentage of peak and time to solution for various supercomputer configurations and languages:
http://www.prace-project.eu/documents/02_wp8prototypes_hh.pdf
The efficiency differentials between x86 stacks and GPU/Cell stacks are fairly striking: x86 reaches upwards of 6x the percentage of peak of the GPU stacks and 80x that of Cell. It takes a LOT of flops and a LOT of power efficiency to overcome numbers like that. According to their data, a two-socket (2S) Nehalem node outperformed the C1060 in most cases, with a much quicker time to solution.
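A rough way to see why is a simplified model. The notation is mine, not PRACE's, and it assumes both systems execute the same total flop count W and ignores host-device transfers and other overheads. Sustained rate is the fraction of peak η times the peak rate R_peak, and P is average power draw. Under those assumptions, matching either time to solution or energy to solution requires a peak-flops (or peak-flops-per-watt) advantage equal to the efficiency ratio. On the figures above that is about 6x for the GPU stacks and about 80x for Cell.

```latex
% Simplified model (assumption): same total work W on both systems, overheads ignored
\begin{align}
R_{\text{sust}} &= \eta\, R_{\text{peak}} \\
t &= \frac{W}{\eta\, R_{\text{peak}}}, \qquad
E = P\,t = \frac{W}{\eta\,\left(R_{\text{peak}}/P\right)} \\
t_{\text{GPU}} = t_{\text{x86}}
  &\iff \frac{R_{\text{peak,GPU}}}{R_{\text{peak,x86}}} = \frac{\eta_{\text{x86}}}{\eta_{\text{GPU}}} \\
E_{\text{GPU}} = E_{\text{x86}}
  &\iff \frac{\left(R_{\text{peak}}/P\right)_{\text{GPU}}}{\left(R_{\text{peak}}/P\right)_{\text{x86}}} = \frac{\eta_{\text{x86}}}{\eta_{\text{GPU}}}
\end{align}
```

Plugging in η_x86/η_GPU ≈ 6 means the GPU needs roughly 6x the peak flops (and 6x the peak flops per watt) just to break even. The same substitution with Cell's ratio of about 80 gives an 80x requirement.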