Fully consistent emulation heavily depends on a massive advantage in straight-line speed over the architecture being emulated.
Matching or exceeding throughput is also a requirement, but that has proven to be much more easily done with the increase of transistor budgets.
There's no avoiding that Xenon and Cell were physically clocked at 3.2 GHz, and the serial component of the workload can't be consistently emulated at speed. If the parallelism isn't there, or the older architecture didn't fall down on its face, there's no getting past the physical reality that the new chip takes longer to take each step.
That's without taking into account that at least some emulated steps will have to be mapped to multiple clocks on the running architecture.
To get closer, the hardware design has to shift to match the architecture it is emulating. The best example of emulating a different architecture is the hardware cracking of x86 instructions into internal instructions, which pushes the emulation steps into the pipeline and forces most of the activity to happen at a clock cycle granularity again.
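To put rough numbers on the serial-step argument, here is a back-of-the-envelope sketch. The 3.2 GHz figure is from above; the 1.6 GHz host clock and the cycles-per-step values are purely illustrative assumptions, and `serial_slowdown` is a made-up helper name, not anything from a real emulator.

```python
def serial_slowdown(emulated_ghz, host_ghz, host_cycles_per_step=1.0):
    """How many times longer the host takes for one strictly serial step.

    Assumes the original chip retires the step in one of its own cycles
    and the host needs `host_cycles_per_step` of its cycles for the same
    step. Real pipelines are messier; this only captures the clock gap.
    """
    original_ns = 1.0 / emulated_ghz           # time per step on the old chip
    host_ns = host_cycles_per_step / host_ghz  # time per step on the new chip
    return host_ns / original_ns


XENON_CELL_GHZ = 3.2  # clock stated in the post
HOST_GHZ = 1.6        # ASSUMPTION: hypothetical slower host clock

# Best case: every emulated step maps to exactly one host cycle.
print(serial_slowdown(XENON_CELL_GHZ, HOST_GHZ))

# A step that needs two host cycles doubles the gap again.
print(serial_slowdown(XENON_CELL_GHZ, HOST_GHZ, host_cycles_per_step=2.0))
```

Under those assumptions, the best case is already 2x behind on anything that can't be spread across more hardware, and it gets worse for every step that takes more than one host clock.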
It still doesn't cover all scenarios if the clock is slower, and in this case it is much slower.