Panajev2001a said:
Gubbi, Intel and HP will deliver... Itanium II is no performance joke and it is only Intel's 2nd generation IA-64 implementation while the upcoming Prescott is 7th generation ( and like the third upgrade to the 7th generation IA-32 implementation )...
IPF has the same OS as IA-32 has... Windows XP ( yes the 64 bits version ) and that means uniform programming model ( WIN32 API, Direct X, etc... ) and tools ( MS VS.NET, Intel reference compiler, etc... )...
They haven't delivered yet. Merced was supposed to be faster than contemporary RISCs; it delivered about half their performance. The second-generation IPF was supposed to greatly exceed the competition; it is (very) competitive on floating point and just barely keeping up on integer performance.
And Microsoft still has to prove that they can produce an enterprise-class OS. Furthermore, what's the point of a 64-bit chip when the API is 32-bit? I am fairly confident that the serious applications for Win64 will use the NT native API and not WIN32.
Panajev2001a said:
IPF in 130 nm will come and soon after that we will ( well relatively, 2005 ) receive the dual core revision with 9 MB of cache per core ( 90 nm, 18 MB total L3, 1 Billion of Transistors... it depends if they save the 65 nm process for the new IPF made by the ex Alpha guys or if they use it for the x86 platform.. internal rivalry of the IA-64 and IA-32 teams will play a role ) and the 130 nm Itanium 2 should ship at the end of this year with up to 6 MB of L3 cache and 1.5 GHz and then will be upgraded to 9 MB of L3 cache in 2004...
But these are just shrinks of the existing core. All their competitors will shrink their designs as well; Power 4 is already dual core, and SPARC will be.
Panajev2001a said:
Process wise if you compare, as some have done, the .18 um Itanium 2 with the .18 um Pentium 4 ( Willamette ) it is quite clear Itanium 2 is not sucking THAT hard...
Apples and oranges. Granted, the I-2 is faster, but it also has a vastly larger die (and produces more heat) and a more aggressive memory subsystem.
Panajev2001a said:
People stopped laughing after Merced... Itanium 2 showed IA-64 is no joke... come on, they are the same guys who kept making miracles and miracles for the x86 platform, and the IA-64 ISA is a more recent and better developed ISA...
Actually they are not the same. I-2 was mostly developed by a former HP design team (now Intel). Of course the process people are all Intel.
As for IA-64 being a better ISA, time will tell. It was conceived at a time when people thought out-of-order schedulers wouldn't scale. This was proven wrong over time (the Pentium 4 being the prime example).
It was based on the premise that scheduling is BAD. So HP came up with EPIC, which is essentially compressed VLIW. Instructions are bundled together three at a time, and template bits describe where the instructions are supposed to be scheduled (what type of exec unit each instruction goes to). Now, this only works if the processor has a full complement of exec units to match the different bundle types. This is why Merced sucks and McKinley doesn't: Merced lacks execution units, the instructions in the bundles have to be scheduled to the available exec units (extending the length of the pipeline), and stalls are frequent.
McKinley has the exec units to match two whole bundles, hence bundles are fetched, the template bits decoded and the instructions handed straight down to the execution units; much faster.
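For anyone who hasn't looked at the bundle format those template bits live in: an IA-64 bundle is 128 bits, a 5-bit template plus three 41-bit instruction slots, and the template tells the core which execution-unit types the slots are routed to. A minimal sketch of the field layout; the decode routine itself is only illustrative, not from any real decoder:

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t  template_bits;  /* 5 bits: selects the slot-type combination */
    uint64_t slot[3];        /* three 41-bit instruction slots            */
} Bundle;

/* Pull the fields out of a raw 128-bit bundle, passed here as two 64-bit
 * halves (low half first). */
static Bundle decode_bundle(uint64_t lo, uint64_t hi)
{
    Bundle b;
    b.template_bits = (uint8_t)(lo & 0x1F);                      /* bits  0..4   */
    b.slot[0] = (lo >> 5) & ((1ULL << 41) - 1);                  /* bits  5..45  */
    b.slot[1] = ((lo >> 46) | (hi << 18)) & ((1ULL << 41) - 1);  /* bits 46..86  */
    b.slot[2] = (hi >> 23) & ((1ULL << 41) - 1);                 /* bits 87..127 */
    return b;
}

int main(void)
{
    /* Hypothetical raw bundle words, just to exercise the field extraction. */
    Bundle b = decode_bundle(0x123456789ABCDEF0ULL, 0x0FEDCBA987654321ULL);

    /* The 5-bit template picks one of the legal slot-type combinations
     * (MII, MMI, MFI, MIB, MBB, BBB, MMB, MFB, MLX, ...) plus stop bits;
     * the processor routes each slot to the matching execution-unit type. */
    printf("template=%#x slots=%#llx %#llx %#llx\n",
           b.template_bits,
           (unsigned long long)b.slot[0],
           (unsigned long long)b.slot[1],
           (unsigned long long)b.slot[2]);
    return 0;
}
```

The point is that the slot types are fixed by the template, which is exactly why a narrow implementation like Merced has to shuffle instructions to whatever units it has, while McKinley can just hand two bundles straight down.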
I can see two problems in the future for IPF.
1.) Imagine you have profiled different applications and found that your IPF processor lacks integer performance. The problem is that to increase performance you have to make the processor crack another bundle per cycle, and to do that fast you need a full set of execution units. Hence you get an extra floating point unit, extra branch units and an extra load/store unit (demanding another port in the cache, impairing cycle time) which you don't really need or want.
2.) Future implementations are likely to be multithreaded. The apparatus needed to schedule and track which instructions belong to which context is very similar to the register renaming and OOO scheduling you find in OOO CPUs, which is why Hyperthreading in the P4 only took an extra 5-10% of die area (similar numbers from the Alpha people regarding EV8). IPF, however, has none of this. You can of course build an OOO multithreaded implementation of IPF, but then the template bits in the bundles are basically discarded (i.e. baggage). And your compiler is stuck with generating code for the bundle instruction format, filling each bundle with NOPs for every slot that it can't use, which in turn wastes I-cache and fetch/decode bandwidth (a rough sketch of that cost below).
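To make the NOP-padding cost concrete, here is a back-of-the-envelope sketch. The per-bundle fill counts are made up, not measured compiler output; the arithmetic just shows how every unfilled slot still occupies I-cache and fetch/decode bandwidth:

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical per-bundle counts of *useful* instructions (0..3) the
     * compiler managed to place, e.g. from a low-ILP integer routine. */
    int useful_per_bundle[] = { 3, 2, 1, 2, 3, 1, 2, 2 };
    int bundles = sizeof useful_per_bundle / sizeof useful_per_bundle[0];

    int useful = 0;
    for (int i = 0; i < bundles; i++)
        useful += useful_per_bundle[i];

    int slots = bundles * 3;     /* every bundle always carries 3 slots        */
    int nops  = slots - useful;  /* padding the bundle format forces you to emit */

    printf("%d bundles, %d slots, %d useful instructions, %d NOPs\n",
           bundles, slots, useful, nops);
    printf("%.0f%% of I-cache and fetch bandwidth spent on padding\n",
           100.0 * nops / slots);
    return 0;
}
```

With those made-up numbers a third of the fetched bits are NOPs, and the compiler has no way around it because the bundle format is fixed.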
Full predication is also an artifact from the late 80s/early 90s, when eager execution of fully predicated instructions looked like the way to go. But the last ten years' improvements in branch prediction have made it evident that it is not so hot.
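To illustrate the trade-off: with full predication the compiler can if-convert a branch so both sides execute and a predicate selects the result, while a good predictor lets the branchy form execute only the predicted path. A toy sketch in plain C, standing in for the two code-generation strategies rather than actual IA-64 output:

```c
#include <stdio.h>

/* Branchy form: only one side executes, but a misprediction costs a
 * pipeline flush. */
static int clamp_branch(int x, int limit)
{
    if (x > limit)
        return limit;
    return x;
}

/* If-converted form: both "sides" are computed and a predicate-like mask
 * selects the result, no branch at all; this is what eager execution of
 * fully predicated instructions amounts to. */
static int clamp_predicated(int x, int limit)
{
    int take_limit = -(x > limit);   /* all-ones if x > limit, else 0 */
    return (limit & take_limit) | (x & ~take_limit);
}

int main(void)
{
    printf("%d %d\n", clamp_branch(7, 5), clamp_predicated(7, 5));  /* 5 5 */
    printf("%d %d\n", clamp_branch(3, 5), clamp_predicated(3, 5));  /* 3 3 */
    return 0;
}
```

The predicated version always pays for both paths; once the predictor gets the branch right nearly every time, the branchy version is simply cheaper.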
Panajev2001a said:
Alpha is dead... I do not expect even the "would have been wonderful" EV7 to make a real dent in anyone's business...
Sadly, yes. Even though it appears to be sandbagged (low operating frequency), with a smaller die size and lower power consumption it still beats IPF on everything but some floating point applications (see the recent SAP benchmarks?).
Cheers
Gubbi