I see this comment thrown around a lot and superficially it seems to hold water... but in reality, I wonder whether people have the facts/numbers to actually back this up (particularly from the hardware point of view), or whether people are just making a lot of assumptions. I tend to give hardware designers the benefit of the doubt with respect to making good decisions, but hey, I'm no hardware expert, and maybe the people making these comments are.
Still, if that's the case, I'd be interested in seeing the facts/logic backing up the assertion rather than more vacuous statements.
Charlie of Semiaccurate fame claims he spoke with an Atom engineer who stated that the core was 15-20% larger because it was x86.
The relatively contemporaneous Intel P5 core and the Alpha EV4 showed a 3.1M to 1.68M transistor count disparity.
There are some notable design differences, but also some overlap in overall specification.
There will probably never be a completely apples to apples comparison because manufacturers have different design targets and different circumstances.
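For a sense of proportion, here is the arithmetic on the P5 and EV4 figures quoted above. The variable names are mine, and the counts are just the ones from this post. Given the design differences already mentioned, the gap should not be read as pure x86 overhead.

```python
# Transistor counts as quoted in this post (approximate, whole-chip figures).
p5_transistors = 3.1e6    # Intel P5
ev4_transistors = 1.68e6  # Alpha EV4

# How many times larger the P5 count is, and the absolute difference.
ratio = p5_transistors / ev4_transistors
extra = p5_transistors - ev4_transistors

print(f"P5 / EV4 ratio: {ratio:.2f}x")
print(f"Extra transistors in P5: {extra / 1e6:.2f}M")
```

That works out to roughly 1.85x, in the same neighborhood as the near 2x Athlon versus Alpha disparity from the article linked further down.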
It should be noted that Core2 was a notable x86 milestone in that it went 4-wide.
RISC chips that wide had been around since the mid-1990s.
The decoder block in Nehalem is one of the largest partitions in the core.
AMD's K8 has a predecode block of 16 parallel predecoders, and that is before the instructions get to the actual decoders.
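A rough sketch of why that kind of predecode hardware exists, in Python. This is a toy model, not real x86 encoding and not AMD's actual design: the length rule in `toy_length` is made up, and I am assuming a 16-byte fetch window with one length guess per byte. The point is only that a fixed-width ISA knows its instruction boundaries for free, while a variable-length ISA has to guess a length at every byte offset and then chain through the guesses.

```python
# Toy model only: this is NOT real x86 encoding.
WINDOW = 16  # bytes per fetch window (assumed: one predecoder per byte)

def toy_length(window, offset):
    """Hypothetical rule: length is 1 to 4 bytes, from the low two bits of the first byte."""
    return (window[offset] & 0b11) + 1

def fixed_width_starts(width=4):
    """Fixed-width ISA: instruction starts are known without inspecting any bytes."""
    return list(range(0, WINDOW, width))

def predecode_starts(window):
    """Variable-length ISA: guess a length at every offset, then chain from offset 0."""
    # Step 1: one length guess per byte offset (the part hardware can do in parallel).
    lengths = [toy_length(window, i) for i in range(len(window))]
    # Step 2: serial walk that keeps only the guesses that land on real boundaries.
    starts, pos = [], 0
    while pos < len(window):
        starts.append(pos)
        pos += lengths[pos]
    return starts

# Example window; only the bytes at real instruction starts affect the result.
window = bytes([0x03, 0, 0, 0, 0x00, 0x01, 0, 0x02, 0, 0, 0x03, 0, 0, 0, 0x01, 0])
print(fixed_width_starts())      # [0, 4, 8, 12]
print(predecode_starts(window))  # [0, 4, 5, 7, 10, 14]
```

In this toy, most of the 16 length guesses are thrown away, since only 6 offsets turn out to be real instruction starts. That wasted work is the sort of cost a fixed-width design does not pay.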
So long as there have been performance RISC designs with a concerted development effort, Intel desktop x86s have only ever approached parity with a process lead.
There are a range of flags and processor status registers that are arbitrarily set by various instructions, so many of those are renamed.
The load/store pipeline is more complex, the number of addressing modes is greater, and pipelines tend to have a few extra stages because of decoding.
For a given capacity, the instruction caches on performant x86 chips are larger because they carry predecode information. Some designers trade off size by reducing the error-correcting capability of the L1 Icache relative to the data side.
The cost is non-zero, and the multicore era is preventing the usual "bloat a core until x86 doesn't matter" process.
This is somewhat mitigated by the increasing dominance of L3 and non-core logic, as the proportions for this are relatively ISA-agnostic.
This is a piece from 2000, but there are numbers and reasons stated.
http://www.realworldtech.com/page.cfm?ArticleID=RWT021300000000&p=1
There is a comparison of an Athlon and Alpha core with a near 2x transistor disparity.
We may have another data point once the Cortex A9 chips come back in silicon form.