The parallelism in EPIC is explicitly pointed out by the code through the template and stop bits, and by the ISA defining valid instruction packets as being free of dependences.
Implicit parallelism is what x86 chips derive on their own: they analyse the instructions they load and check for the dependences that IA-64 would have spelled out.
To be precise, IA-64 doesn't spell dependencies out; rather, instruction bundles contain instructions that explicitly have no dependencies between them (and thus can be issued in one cycle without any checking). However, IPF still uses a scoreboard to track register dependencies (and you cannot do a VLIW-style a=b, b=a swap inside a bundle).
Most of what seemed like good ideas when IPF was conceived are millstones around its neck today:
1. The rotating integer register file adds an adder to the critical register access path. Low-latency register access is imperative for IPF, so the result is a low operating frequency (and lots of power used).
2. The instruction templates, while allowing for simple issue logic, dictate a plethora of execution units, with a resulting *massive* result-forwarding mux. Again the result is a low operating frequency (and lots of power used).
3. The ALAT allows for speculative loads, but the explicit clean-up needed means it is rarely used. Meanwhile, Intel x86 from Core 2 onwards has had a reordering load/store unit where speculative loads under outstanding stores are supported, which in effect means that every load is the equivalent of an ALAT load.
4. Predication of instructions to avoid branches. Later findings showed that conditional moves cover 90% of the cases where predication makes sense, and the remaining 10% is being eroded by ever-improving branch predictors. Branch predictors are fucking awesome because they break data dependencies (ie, the apparent data-dependent latency is obliterated because branches are resolved at the front of the pipeline, unlike write-back, which sits at the very end). And in this day and age, where almost all CPUs are power-bound, eagerly executing both arms of nested if-then-else constructs, spending exponential amounts of energy relative to useful work, is plainly just a bad idea.
5. The current implementations are in-order, with the high sensitivity to cache access latency that implies. The cache system of current IPFs is fantastic (multiple superfast accesses), but the consequence is lots of power spent and a lower operating frequency.
The only really useful thing in IPF is that it packs non-power-of-two-sized instructions into its instruction bundles, allowing for variable-sized instructions (ie, 64-bit immediates).
I'd like to see a high-speed OOO IPF implementation. The rotating-register-file adder would be renamed away; with a speculating load/store unit, the ALAT could be ignored entirely; and the higher latency tolerance of an OOO execution engine would allow more slack in cache access, lowering the power spent there as well as allowing for higher operating frequencies.
IPF/EPIC is an architecture that shook the world because it killed MIPS, Alpha and PA-RISC purely by politics and speculation long before any implementation existed. My bet is that Intel will dump it (on HP) within 2-3 years.
Cheers