I worked at Bing for a year at one point; you'd be surprised how low CPU utilization was, even on the compute clusters. You were almost always constrained by how efficiently you could move the data, rather than by any processing you might be doing.
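To make the point concrete, here's a toy roofline-style estimate of whether a job is compute-bound or data-movement-bound. All the numbers (data size, per-byte work, per-node throughput, link speed) are illustrative assumptions, not actual Bing figures:

```python
# Back-of-envelope: is a job compute-bound or data-movement-bound?
# Compare the time to process the data against the time to move it.

def bottleneck(data_bytes, flops_per_byte, peak_flops, network_bw_bytes_s):
    compute_s = data_bytes * flops_per_byte / peak_flops
    transfer_s = data_bytes / network_bw_bytes_s
    kind = "compute" if compute_s > transfer_s else "data movement"
    return kind, compute_s, transfer_s

# 1 TB scanned with light per-byte work (e.g. filtering/aggregation):
kind, c, t = bottleneck(
    data_bytes=1e12,
    flops_per_byte=10,         # light processing per byte
    peak_flops=1e11,           # ~100 GFLOP/s sustained per node (assumed)
    network_bw_bytes_s=1.25e9, # 10 Gb/s link (assumed)
)
# Transfer takes ~800 s vs ~100 s of compute, so the network dominates
# and the cores sit mostly idle.
```

With numbers in this regime, the CPUs spend most of their time waiting on the wire, which matches the low utilization described above.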
Wouldn't going with a smaller number of fast cores to reach a given level of compute performance result in less data having to go over the network than using a larger number of slower cores? Or are you referring to moving data between disks and cores?