Itanium's code density is unusually bad.
The extension-fest x86 has gone through over the years has significantly eaten into the code density advantage it once had over RISC ISAs, so other factors must be at play.
As noted, Intel has a vastly superior cache subsystem outside of the Icache, while AMD's is in some ways cringeworthy, and ARM implementations are, almost as a rule, as cut-rate as they come.
The biggest thing the leading ARM architectural licensees do, besides customizing the cores, is put in a competent memory subsystem.
Being able to compensate for one architectural weak point by improving everything else around it has its benefits, and it's not like an 8-way non-shared L1 can't do better at times than whatever was attached to Bulldozer.