Intel and AMD, nVidia and ATI

8ender

Newcomer
Hmm, I was just reading over the excellent posts from the community about the GeForce FX being canned, and it got me thinking...

While I only have a layman's knowledge of chip architectures (this forum's helping a lot though, thanks!), I've noticed that nVidia's release of the GeForce FX is strangely similar to Intel's release of the P4.

Correct me if I'm wrong, as most of this information comes from memory and not research, but here are a few similarities I've noticed:

- AMD's Athlon was kicking Intel's ass in performance
- Intel released the P4 to compete
- The P4 was released rather late (I think?)
- The P4 was terribly inefficient and poorly engineered
- AMD's chip often beat the P4 even when the P4 was clocked significantly higher
- Intel was caught with its pants down

I'm not sure if that was entirely accurate, but doesn't Intel's P4 seem similar to the FX in terms of "less-than-spectacularness"?

Maybe companies need to learn not to get lazy once they're on top. Both the P4 and the FX seem to me like products designed simply to beat the company's previous product.

Another thing to consider: could nVidia get the lead back, like Intel did when it re-engineered the P4?
 
Hmm, Intel never re-engineered the P4. It was designed from the start to scale very well in clock speed. The only changes I have seen are the increase in L2 cache and, later on, the implementation of HyperThreading. These were not things that were tacked on as an afterthought; they were worked on from the beginning.

AMD has also increased the L2 cache recently, and enlarged the TLB in the move from the Thunderbird to the Palomino. They also battled power dissipation and added another metal layer, which has allowed the Thoroughbred core to clock higher as well.

Intel was caught with its pants down in the P3 era, when the Athlon scaled well beyond the PIII's 1.13 GHz maximum (Intel's recalled CPU), but they quickly got their act together.

So personally I do not see the similarities at all.
 
And look at things NOW, and in the coming months... The Pentium 4 is simply dominating the Athlon. An 800 MHz FSB in two months for the Pentium 4s. AMD simply cannot compete with that, even with a boost to a 200 MHz FSB on the Bartons.
 
surfhurleydude said:
And look at things NOW, and in the coming months... The Pentium 4 is simply dominating the Athlon. An 800 MHz FSB in two months for the Pentium 4s. AMD simply cannot compete with that, even with a boost to a 200 MHz FSB on the Bartons.

:rolleyes: :rolleyes: :rolleyes: :rolleyes: :rolleyes:
 
surfhurleydude said:
And look at things NOW, and in the coming months... The Pentium 4 is simply dominating the Athlon. An 800 MHz FSB in two months for the Pentium 4s. AMD simply cannot compete with that, even with a boost to a 200 MHz FSB on the Bartons.
Ok, this kind of crap is irritating.

If you are going to talk about "effective" speed, then use it everywhere:
i.e., the P4 bus is 533 and will be 800; AMD's bus is gonna be 400.
If you are gonna talk about actual clock speed, then use it everywhere:
the P4 bus is 133 (quad-pumped) and is gonna be 200 (quad-pumped); AMD's is gonna be 200 (double data rate).

Mixing and matching ain't fair.
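
Or, to spell the arithmetic out, here's a quick Python sketch (the 64-bit/8-byte bus width is the standard FSB width on both platforms; everything else is just the numbers from the posts above):

Code:
# Both FSBs are 64 bits (8 bytes) wide; "effective MHz" is just the
# physical clock times the number of transfers per cycle.
def effective_mhz(base_mhz, transfers_per_cycle):
    return base_mhz * transfers_per_cycle

def peak_bandwidth_gb_s(base_mhz, transfers_per_cycle, bus_bytes=8):
    return effective_mhz(base_mhz, transfers_per_cycle) * bus_bytes / 1000.0

print(effective_mhz(133.33, 4))     # P4 today: ~533 (quad-pumped)
print(effective_mhz(200, 4))        # P4 soon: 800 (quad-pumped)
print(effective_mhz(200, 2))        # Barton: 400 (double data rate)
print(peak_bandwidth_gb_s(200, 4))  # 6.4 GB/s peak for the "800 MHz" P4 bus
print(peak_bandwidth_gb_s(200, 2))  # 3.2 GB/s peak for the "400 MHz" Athlon bus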
 
The P4 was released more or less on time. And the design was very pointedly not a response to Athlon. We can deduce this because 1) the major decisions behind the (for lack of a better term) Netburst core would have been made in the 1996/97 timeframe, well before Intel had any inkling how competitive K7 would be; 2) the P4 was a singularly bad design for the purposes of rescuing the PIII from the superior Athlon circa 2001.

The Netburst core is a brilliant piece of engineering, IMO, but its design is optimized for the .13 and even more so the .09 and .07 process nodes, and it sacrifices performance on poorly compiled code in order to achieve speedups when code is compiled with a modern compiler. Plus it was designed to be fed by a very high-bandwidth DRAM subsystem, which at launch time meant hideously expensive dual-channel PC800 RDRAM.

In other words, the Netburst core was almost certainly originally intended to have a life cycle reminiscent of the P6 core. The first incarnation of the P6 core was the Pentium Pro, launched in late 1995 on the .50 process. The PPro was big, hot, quite expensive to manufacture (512k of core-speed off-die SRAM for L2), and slower than a plain old Pentium when running the prevailing apps of the day. If you recompiled for it with a modern compiler, OTOH, it screamed: the PPro gave the SPECint title (briefly) to an x86 chip for the first time in history. The PPro was intended only for workstations and perhaps enthusiast machines, and it stayed in that niche nicely.

A year and a half later, in mid-1997, the P6 core had its mainstream incarnation, the PII, which launched on .35 and quickly moved to .25. On the smaller process, the P6 core was cheaper and cooler, and with the move to .18 the L2 cache could be put on-die for lower cost and better performance. By the time the PII launched, optimized 32-bit code was mainstream, so it had a large performance lead over the Pentium, which only increased with newer software. Because the P6 core was optimized for the .25 and .18 process nodes, Intel got very good clock scaling out of the PII and PIII, and the P6 core is generally regarded as among the most successful CPU core designs in history.

So why didn't Intel just repeat the plan with the Netburst core? This is where the Athlon comes in. Obviously Intel knew the Netburst core would make a lousy consumer CPU in its .18 implementation. But the MHz race of 1999 and 2000 forced Intel to burn all the headroom they had in the .18 PIII, which essentially only got them to 866 or 933 MHz in volume. With nothing to compete with Athlon's ever climbing clock rates, Intel decided to thrust the P4 into the consumer mainstream in its Willamette implementation, before it was really ready. That made a lot of people think the P4 is a bad or inelegant design, just as people would have said the same about the PPro if Intel had tried to pass that (or some cut-down .5 or .35 P6 chip) off as a mainstream CPU. In fact the Netburst core is a very elegant design, but it's only now beginning to be able to stretch its legs.
 
Dave H said:
The P4 was released more or less on time. And the design was very pointedly not a response to Athlon. [...]

My sentiments exactly. The P4 seems to me to have been designed not for the immediate speed crown but as a platform for an easy buck for Intel. I mean, look at how much Intel made on the P6 core; they rebadged that architecture under three separate names.

Perhaps I was just grasping at straws but I think the similarities are there.
 
How could the P4 be a bad design then, but a good one now? Especially since nothing has changed (except more cache and higher clock speeds)?

It was designed to clock faster, and in order to do that, they had it do "less" with each clock and deepened the pipeline. Sure, the Athlon series can do more work per clock, but it doesn't matter much because the Athlon can't reach the same clock speeds.

Each is a design tradeoff that has ended up with me paying less for my processors. :)
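
To put that tradeoff in numbers: rough performance is IPC times clock, so doing "less" per clock is a win as long as the clock gain outruns the IPC loss. A toy Python sketch (the IPC figures are invented purely for illustration, not measurements of either chip):

Code:
# Toy model of the pipeline-depth tradeoff. IPC values are made up for
# illustration; "perf" is just instructions retired per nanosecond.
def perf(ipc, clock_ghz):
    return ipc * clock_ghz

deep  = perf(ipc=0.8, clock_ghz=2.8)  # P4-style: less work per clock, higher clock
short = perf(ipc=1.0, clock_ghz=2.0)  # Athlon-style: more per clock, lower clock
print(deep, short)  # 2.24 vs 2.0: the deeper pipeline wins once clocks pull far enough ahead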
 
RussSchultz said:
How could the P4 be a bad design then, but a good one now? Especially since nothing has changed (except more cache and higher clock speeds)?

I think things have changed, especially since the 130 nm process arrived. I bet that if Intel manufactured the Athlon XP on their process, they'd get much higher clock speeds.

Even with the P4's clock advantage, the Athlon and Athlon XP were faster for quite a while after its launch. Intel only took off on clock speed once they got their new process working. This will be amplified even more when Intel gets their 90 nm process going.

If all else were equal, I think the Athlon XP architecture would deliver higher performance, even accounting for differences in clock speed. However, Intel's manufacturing process is quite superior, so they get an even bigger clock speed advantage, and that edge will only grow since Intel has the funds to spend billions on researching better fab processes.

AMD is in trouble for this reason. I think the only way out is if they get some monopoly charges to stick against Intel and force Intel to decouple its manufacturing from its chip design. Then everyone (including ATI/nVidia) could purchase manufacturing capacity from Intel's fabs.
 
RussSchultz said:
How could the P4 be a bad design then, but a good one now? Especially since nothing has changed (except more cache and higher clock speeds)?
The main reason is that interconnect delay and transistor switching speed benefit very differently from process shrinks. Specifically, interconnect delays don't improve much as fab geometries get smaller: even though the wire lengths are shorter, there's more interference, so signals take longer to settle to a stable value, especially when you've got high fan-in/fan-out. Transistor switching speed, meanwhile, reaps very nice benefits from process shrinks.

The result of this is that, broadly speaking, any core design will be transistor-limited on a coarse enough fab process, and interconnect-limited on a fine enough one. It's all a matter of targeting the right process (and then of clever circuit design, as more traditional techniques tend to be interconnect-limited). A subtle point is that in order to get optimal scalability you also want your pipeline to be best balanced at your target process node, not necessarily the one the core is originally designed on! One very obvious example of the P4's don't-be-interconnect-limited design is the fact that it actually has two pipeline stages devoted entirely to moving signals around the chip! These are probably a complete waste on .18, but will be very useful once the Netburst core hits .09, .065, and beyond.
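
A toy model of that crossover (all the constants and exponents below are invented to show the trend, not real process data):

Code:
# Gate delay shrinks roughly with feature size, while wire delay barely
# improves, so any fixed design eventually becomes interconnect-limited.
for node_um in (0.18, 0.13, 0.09, 0.065):
    scale = node_um / 0.18
    gate_delay = 1.00 * scale          # improves ~linearly with the shrink
    wire_delay = 0.50 * scale ** 0.2   # improves only slightly
    limiter = "interconnect" if wire_delay > gate_delay else "transistor"
    print(f"{node_um} um: gate={gate_delay:.2f} wire={wire_delay:.2f} -> {limiter}-limited")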

Second, as I mentioned, the P4 is very picky about how code is compiled. Most of the conventions that optimize for the P4 also optimize for the PIII/Athlon; it's just that the performance increases--or the penalties for doing it the wrong way, if you'd rather--are much greater on the P4 than on those architectures. Also there's the deprecation of x87 in favor of SSE/SSE2, which is a very good idea when applicable (Hammer goes for this in a big way, incidentally). The result is that when the P4 launched (just as when the PPro launched), about the only real application benchmark it did well on was the SPEC suite, because SPEC allows recompilation; nowadays, as more and more apps are compiled with modern compilers, the P4 does better and better relative to the Athlon.
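
For a concrete taste of what "compiled the right way" means, here's roughly how it looks with GCC 3.x (these are flags I know exist; other compilers spell them differently):

Code:
# Generic build: x87 floating point, no P4-specific instruction scheduling.
gcc -O2 -march=i686 -o app app.c

# P4-targeted build: schedules for the P4 and does scalar floating point in
# SSE2 registers instead of the x87 stack.
gcc -O2 -march=pentium4 -mfpmath=sse -o app app.c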

[The above, BTW, is not an indication of "good" or "bad" design on the part of either the Netburst or the K7 engineers, but rather a recognition of their different market realities. Intel's market share is big enough that when they come out with a new core that requires recompilation to achieve decent performance, the market listens and ISVs optimize for it. AMD doesn't have anywhere near that market power, so when they come out with a new core they have to design it to run existing software as fast as possible. That's one reason why, for K8, AMD went with SSE2 instead of inventing their own RISC-like scalar FPU extension to replace x87.]

Then there are physical characteristics: the Netburst design uses a lot of transistors to achieve its speed gains, and it was frankly way too big to be a mainstream consumer chip at .18 (217mm^2 for a consumer chip!). 130mm^2 is a much nicer size, especially now that Intel is transitioning to 12-inch wafers. And the new packaging that came along with Socket478 was a help too; those Socket423 P4s were ungainly suckers.

You mention the added cache size as if it were a separate variable from the core design, but it's not: look at how much bigger a boost the P4 got from moving from 256k to 512k of L2 than the Athlon did. That's another Netburst design characteristic that favors later implementations over .18.
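
One back-of-the-envelope way to see why (the DRAM latency and miss rates below are invented, purely to show the shape of the effect): a fixed memory latency in nanoseconds costs more cycles on a higher-clocked core, so every miss a bigger cache removes is worth more.

Code:
# Toy stall model. All numbers are invented; the point is only that the
# same miss-rate reduction saves more cycles at a higher clock speed.
def stall_cycles_per_access(clock_ghz, miss_rate, dram_ns=150.0):
    miss_penalty_cycles = dram_ns * clock_ghz  # nanoseconds -> core cycles
    return miss_rate * miss_penalty_cycles

for clock in (1.67, 2.66):  # Athlon-ish vs P4-ish clocks
    before = stall_cycles_per_access(clock, miss_rate=0.04)  # 256k L2 (invented)
    after  = stall_cycles_per_access(clock, miss_rate=0.03)  # 512k L2 (invented)
    print(f"{clock} GHz: saves {before - after:.1f} stall cycles per access")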

It's also not technically correct that extra cache is the only improvement the core has undergone since introduction. There were some minor circuit-level and microcode optimizations made when Northwood underwent a relayout, around the introduction of the 2.66 GHz chip or so, which netted a few percent performance improvement. There may have been further optimizations made during the .13 shrink to Northwood; I forget. Also, HyperThreading support was fixed and switched on (it's been present in silicon all along).

And, finally, the Netburst core will surely undergo several more improvements throughout its lifetime, which should stretch for a while longer yet. I haven't heard much of anything about the next x86 core from Intel, so presumably Netburst will be their sole horse for the consumer desktop all the way through the .065 node, which should launch in 2005. (That is, maybe the next x86 core will launch on .065, perhaps in 2006 or so...but if history is a guide it probably won't be a good consumer chip until at least the next process shrink! :D )
 