I still think they are beating a dead horse with their cluster/CMT approach. The premise was wrong; big cores are getting nowhere.
What do you mean by the bold part? I think the problem with Piledriver is mostly single-thread performance*. Intel seems to be doing just fine with cores bigger than the ones in Piledriver. In my opinion, the problem with CMT is that, for a fixed die area, fat cores with SMT are better for most workloads than smaller cores with CMT.
- Single-thread high-IPC workloads can use all the core resources with the SMT approach, but they can't with the CMT approach.
- In multi-threaded workloads, it's going to be a toss-up: CMT requires more die space but provides a higher increase in throughput than SMT. Some workloads currently suffer on Piledriver because only 4 decoders are available per module, but this is going to change with Steamroller. The instruction cache is also quite small, but it is going to be larger in Steamroller.
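To put the multi-threaded toss-up in concrete terms, here is a back-of-the-envelope sketch in Python. The function name and every number in it are placeholders I made up for illustration, not measurements of any real chip. It only compares throughput per unit of die area when a second thread is added to a baseline core.

```python
def throughput_per_area(extra_area_frac, extra_throughput_frac,
                        base_area=1.0, base_throughput=1.0):
    """Throughput per unit of die area after adding a second thread.

    extra_area_frac:       extra die area relative to the baseline core
    extra_throughput_frac: extra multi-thread throughput relative to
                           the baseline core running a single thread
    """
    area = base_area * (1.0 + extra_area_frac)
    throughput = base_throughput * (1.0 + extra_throughput_frac)
    return throughput / area


# Hypothetical inputs: SMT is cheap in area but adds little throughput,
# CMT costs more area but adds more throughput. Plug in your own numbers.
smt = throughput_per_area(extra_area_frac=0.05, extra_throughput_frac=0.25)
cmt = throughput_per_area(extra_area_frac=0.50, extra_throughput_frac=0.80)

print(f"SMT throughput per area: {smt:.3f}")
print(f"CMT throughput per area: {cmt:.3f}")
```

With placeholder inputs like these the two come out close to each other, which is all I mean by "toss-up". The model says nothing about the single-thread case, where the SMT core can use all of its resources.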
Once shared resources are split (instruction decoders) or enlarged (instruction cache), one has to wonder whether the die area savings of CMT versus separate cores are worth the trouble, even for multi-threaded workloads. This is especially true since SMT is much cheaper in terms of die area and still provides decent improvements for many workloads. Maybe AMD will eventually pull off a "Pentium M" and go back to a K10-derived architecture in the future.
*There are other issues with Piledriver, especially the power consumption, but I don't think they are in any way related to CMT.