Bob Colwell (chief Intel x86 architect) talk.

Diminishing returns only really set in about a decade ago. Until recently there was no existing market with quick enough uptake to justify investing real money in parallel architectures; GPUs and consoles are changing that ... once both of those, and the PCs developed using the console chips, make a big enough dent in people's willingness to shell out for new "C"PUs, even the desktop processor manufacturers will have to buckle.

The only applications the majority of us need lots of cycles for have plenty of parallelism.
 
aaronspink said:
nutball said:
Would the idea of going very, very wide (say 4 x 16-pipeline chips on a card) and dropping the core frequency address the power issue?

In general you get cubic reductions in power with a reduction in frequency (and a matching reduction in voltage)... This assumes that you are operating within linear regions wrt Vt.
Doing some back-of-the-hand calculations, a 4-die R420 at 330 MHz vs. a 525 MHz X800 XT would have ~the same power and 2.5x the performance.
All for 4x the cost.
And we haven't dealt with memory power. Or inefficiencies.
So while doable, it is probably not economically viable for the vast majority of the enthusiast market.
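
A minimal sketch of that back-of-the-envelope calculation, assuming dynamic power goes roughly as C·V²·f with voltage scaled in step with frequency (so power scales roughly with f³), and that throughput scales linearly with dies × clock:

```python
# Back-of-the-envelope: dynamic power ~ C * V^2 * f, and with voltage scaled
# in step with frequency (V ~ f) that becomes roughly power ~ f^3.
def relative_power(freq_mhz, dies=1, ref_mhz=525.0):
    """Power of `dies` chips at freq_mhz, relative to one chip at ref_mhz."""
    return dies * (freq_mhz / ref_mhz) ** 3

def relative_throughput(freq_mhz, dies=1, ref_mhz=525.0):
    """Assume performance scales linearly with dies * clock."""
    return dies * freq_mhz / ref_mhz

# Four R420-class dies at 330 MHz vs. a single die at 525 MHz:
print(relative_power(330, dies=4))       # ~0.99 -> about the same total power
print(relative_throughput(330, dies=4))  # ~2.51 -> about 2.5x the throughput
```

As noted above, this ignores memory power and any inefficiency in splitting the workload across four dies.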

Generally speaking, gfx chips already trade off clock speed for parallelism. That's why we have these huge die-area, 16-pipe chips running at relatively modest clocks. This works very well in graphics, while being quite a lot trickier for general-purpose code.

Since it parallelizes nicely, graphics has been limited mostly by how many transistors you could cram onto a die with sufficient yield per wafer that it could be sold at a profit. The problem for that model now is that power draw isn't dropping with feature size the way it used to, thus the payoff for moving to finer lithography will be more limited than it used to be. Graphics will still stand to gain a lot by going to finer lithography though, more than CPUs for sure.
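
As a rough sketch of those die-size/yield economics, here's the common gross-dies-per-wafer approximation combined with a Poisson defect-yield model; the wafer size, die areas and defect density below are made-up illustrative numbers, not figures for any real process:

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Standard approximation for usable die sites on a round wafer."""
    r = wafer_diameter_mm / 2.0
    return int(math.pi * r * r / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2))

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero killer defects under a Poisson model."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

# Illustrative numbers only: 300 mm wafers, 0.5 defects per cm^2.
for area in (150, 250, 350):  # candidate die sizes in mm^2
    good = gross_dies_per_wafer(300, area) * poisson_yield(area, 0.5)
    print(f"{area} mm^2 die -> ~{good:.0f} good dies per wafer")
```

The point being that sellable dies per wafer fall off much faster than linearly with die area, which is what caps how many transistors it pays to cram onto one chip.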

CPUs are likely to move to on-chip parallelism as well, but the performance payoff will be more limited in general, particularly as we move from two cores to 16, 64, et cetera. That is, there is some low-hanging fruit to be picked early on in the game, but as long as we are talking vanilla SMP and single-user machines, the payoff isn't likely to be impressive. Then again, the advances from improving lithographic technique don't look likely to stagger us with their pace either. We've had a factor of two in three years now - almost twice as long as it used to take.

Windows and x86 aren't exactly a dream team to base a massively parallel architecture on. For single-user machines, you'd like your parallel processors to work efficiently towards solving a single problem fast - more of a supercomputer scenario, as opposed to the server scenario where you can run relatively independent processes on different processors.
 
Entropy said:
CPUs are likely to move to on-chip parallelism as well, but the performance payoff will be more limited in general, particularly as we move from two cores to 16, 64, et cetera. That is, there is some low-hanging fruit to be picked early on in the game, but as long as we are talking vanilla SMP and single-user machines, the payoff isn't likely to be impressive. Then again, the advances from improving lithographic technique don't look likely to stagger us with their pace either. We've had a factor of two in three years now - almost twice as long as it used to take.
I don't think so. I think we'll see little gain at first, but as parallelism in the range of 2-4 cores becomes the norm, software developers will get used to writing parallel code, and we'll start to see massive improvements going forward.

Remember that for any processing-intensive code today, there are many separate pieces that need to be processed. Of course you won't be able to make as good use of the parallelism as a GPU does, but it'll definitely be a huge improvement.
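
A minimal sketch of the kind of coarse-grained parallelism being described - independent, processing-intensive pieces of work farmed out to a small pool of workers. The work function and the four-way chunking are placeholders, not anything specific:

```python
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    # Placeholder for one independent, processing-intensive piece of work.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]  # four independent slices

    # With 2-4 cores available, the independent chunks simply run side by side.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(process_chunk, chunks))

    print(sum(results))
```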
 
IIRC the P4 had three generations, and from the beginning heat and performance were a concern. It has some valid points, like improved bandwidth (compared to the P3), but something went wrong with the hyper-pipeline.

IMHO the P4 design was a failure from the beginning. I don't know if the problem is at the conceptual level or the implementation level, but my guess is it is the latter.

Now Intel has only the redesigned P6 core to start from again. They can probably do something quickly, redirecting it for better performance, like higher memory bandwidth next year (800 MHz or 1066 MHz).
CPU parallelism just happened because the cores are small, so it doesn't cost much to put two cores on the same die. It will definitely be much better than Hyper-Threading :)

Software will slowly start to use it, but I don't expect much. PC software in general is low-quality, hype-driven software. I really dream about a good, clean, light RTOS with a high-quality human interface and tools.

But the redesigned P6 core is not the ultimate core. Maybe in the future we will see something (a new core) much better: extremely superscalar, improved FPUs, maybe something like a vector unit.

Also, memory latency and bandwidth have to be improved. The multipliers are too high. IMHO this is what is holding CPUs back today. Cray said that you cannot fake bandwidth.
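
To put rough numbers on the multiplier problem: a memory access with a fixed wall-clock latency costs more and more core cycles as the core clock pulls away from the memory clock. The 100 ns figure below is illustrative, not a measurement:

```python
def miss_cost_in_core_cycles(core_mhz, mem_latency_ns):
    """Core cycles spent waiting on one memory access of fixed latency."""
    return core_mhz * 1e6 * mem_latency_ns * 1e-9

# Illustrative: ~100 ns to DRAM, with core clocks climbing away from the bus.
for core_mhz in (500, 1000, 2000, 4000):
    cycles = miss_cost_in_core_cycles(core_mhz, 100)
    print(f"{core_mhz:>5} MHz core: ~{cycles:.0f} cycles lost per miss")
```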

I don't know if the hyper-pipeline will be used again with a new design and better project management, but CPU designs cost too much and the business stakes are too high.

Another future possibility is having the CPU, some generalized GPU and an on-chip memory controller on the same chip, sharing a large, fast, low-latency UMA 8) As for power consumption, heat generation and dissipation, consoles like the PS2 and GameCube should be the model.

IMHO we need some kind of PC-console hybrid.
Cool, quiet, low-heat, efficient, low cost, visualization driven, flexible and with a lot of power.
 
You think that it could have been running at an even higher clock with better implementation and that would have saved it?

I think we need such a hybrid too; maybe if enough people buy IBM's workstation we will get it. Because x86 won't get out of the chicken-and-egg problem anytime soon, and graphics card developers can only do so much with a large part of the cost of a PC still going to supporting some archaic wide superscalar monstrosity, even if they start supporting more generic processing.
 
MfA said:
You think that it could have been running at an even higher clock with better implementation and that would have saved it?

A higher clock, no, but some kind of improved Northwood core with lower heat and higher instructions per cycle maybe could have saved it.

MfA said:
I think we need such a hybrid too; maybe if enough people buy IBM's workstation we will get it. Because x86 won't get out of the chicken-and-egg problem anytime soon, and graphics card developers can only do so much with a large part of the cost of a PC still going to supporting some archaic wide superscalar monstrosity, even if they start supporting more generic processing.
It is a shame we are still using x86 in the 21st century. I hope the best for IBM too.

Who else could do this hybrid for us?
 
I was talking about the Cell workstation.

The problem isn't x86; PPC is no better ... which is not to say I think they are bad, I just think that wide superscalar implementations make no sense. A simple in-order, dual-issue architecture with a scalar and a SIMD pipeline is what is needed for parallel-oriented processors (and both x86 and PPC could do that decently enough - not ideal, but as x86 has proven, you can do well enough with a non-ideal ISA).
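
A toy sketch of the issue rule being described - each cycle at most one scalar and one SIMD instruction issue, strictly in program order. This only illustrates the pairing rule; it is not a model of any real core:

```python
# Toy in-order, dual-issue model: one scalar pipe and one SIMD pipe, at most
# one instruction issued to each per cycle, strictly in program order.
def cycles_to_issue(instrs):
    """instrs: list of 'scalar' / 'simd' tags in program order."""
    cycles, i = 0, 0
    while i < len(instrs):
        used = set()
        # Issue in order until a pipe would be needed twice or both are used.
        while i < len(instrs) and instrs[i] not in used and len(used) < 2:
            used.add(instrs[i])
            i += 1
        cycles += 1
    return cycles

print(cycles_to_issue(["scalar", "simd"] * 4))  # 4 cycles for 8 instructions
print(cycles_to_issue(["scalar"] * 8))          # 8 cycles: nothing to pair with
```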

If NVIDIA or ATI merged with someone with an x86 license (and preferably a fab) they might be able to pull it off.
 
pascal said:
Another future possibility is having the CPU, some generalized GPU and an on-chip memory controller on the same chip, sharing a large, fast, low-latency UMA 8) As for power consumption, heat generation and dissipation, consoles like the PS2 and GameCube should be the model.
From a computer architecture point of view, I find it extremely disturbing to see 256 MB of very fast RAM that only the GPU can use (and in practice typically doesn't need!). Imagine the positive overall effect if the CPU designers knew they would have access to such resources.

IMHO we need some kind of PC-console hybrid.
Cool, quiet, low-heat, efficient, low cost, visualization driven, flexible and with a lot of power.
I think you put the finger on Microsoft's Great Fear. Look at the PS2. Look at a PC box. Imagine a near future where someone could buy a new gaming console, hook it up, and play games, watch films, and, if they like, access the net and do small-time utility computing tasks. Just how interested will consumers be in a bulky, noisy PC that, adding insult to injury, is more expensive? Of course, consumers already have that option today, but this generation it turned out that the consoles simply removed PC upgrade impetus. Which is bad enough, in Microsoft's view. (The first vision has a better chance of becoming reality the next time around, though, since in five years or so quite a few people will have TV screens that can do a decent job as monitors as well, unlike the situation for the PS2.)

The console doesn't have to replace the PC completely - simply turn it into a utilitarian commodity, and the PC has lost much of the battle for consumer dollars. I would contend that this has already happened to a large degree. The PC can fight back by going to extremes - $500 gfx cards drawing over 100 W aren't an option for consoles; on the other hand, the PC market for such beasts isn't exactly huge by either console or total PC volume standards.

IMO, the best path for the PC to take is to take a leaf from the console book, exactly as pascal describes it above. It would mean taking an initial step backwards in power, but two steps forward in ergonomics and fitting into people's lives. It would also be perceived as something new and a positive change - something the PC sorely needs. And it would retain all the traditional advantages of PCs, such as upgradeability, flexibility in peripherals and software, et cetera.

Intel scrapping the P4 roadmap shows that even they aren't hell-bent on following the beaten track. The question remains - when will the gfx IHVs follow suit? The trick, as I see it, is daring to take that initial step backwards in performance, and finding ways of effectively selling the advantages. I think it would be successful. It could be argued that consumers are already favouring such a path, as demonstrated by the continuing growth of portable market share.
 
MfA said:
I was talking about the Cell workstation.
I know.

MfA said:
The problem isn't x86; PPC is no better ... which is not to say I think they are bad, I just think that wide superscalar implementations make no sense. A simple in-order, dual-issue architecture with a scalar and a SIMD pipeline is what is needed for parallel-oriented processors (and both x86 and PPC could do that decently enough - not ideal, but as x86 has proven, you can do well enough with a non-ideal ISA).
I agree. I was just pointing out how locked to the past we are. We have to do away with most of the legacy, for both practical and spiritual reasons, clearly signalling a new direction.

MfA said:
If NVIDIA or ATI merged with someone with an x86 license (and preferably a fab) they might be able to pull it off.
Agree.

[high speculation mode on] Some more possibilities.
NVIDIA buys SGI.
A recognized brand worldwide. Lots of IP. Some engineering. Some corporate recognition. MIPS IP.

Then SGI starts to sell some kind of open hybrid with improved MIPS CPUs, fast UMA, a generalized GPU, DVD, etc... for the desktop (gaming, SOHO and corporate) and the living room. Maybe two internal slots for expansion/flexibility.

Also deliver HW & SW tools for integrating/scaling it in some way.

IIRC Jim Clark wanted to go to the consumer level.
 
Entropy said:
pascal said:
Another future possibility is having the CPU, some generalized GPU and an on-chip memory controller on the same chip, sharing a large, fast, low-latency UMA 8) As for power consumption, heat generation and dissipation, consoles like the PS2 and GameCube should be the model.
From a computer architecture point of view, I find it extremely disturbing to see 256 MB of very fast RAM that only the GPU can use (and in practice typically doesn't need!). Imagine the positive overall effect if the CPU designers knew they would have access to such resources.
Me too. Also, the communication between the CPU and GPU could be improved a lot. And the idea of using generalized shader units could be realized more easily.
 
Entropy said:
Heh. Thought you guys would enjoy it. I was amazed at how straightforward and clear he was - very geek to geek, and with a strong sense of technology integrity shining through.

That's fine, but never lose sight of the fact that he's an Intel geek....;) (Not an independent geek. Heh...;))

In the context of the B3D forum, for people involved in gaming (which includes all of 3D gaming) his comment that Intel is really attentive to that group must be gratifying. However, he immediately followed up with "but you can't base a 30 billion dollar company on them" which should be a warning.

I'm not really sure what to make of that comment, as it's never really been obvious to me that Intel has ever at any time in its history been a company "based on" developing products primarily for people who play computer games...;)

It's an interesting comment, though, and kind of an odd one, if we assume, which I do, that Colwell has not spent his time with Intel believing that the company was "based on" creating products for computer gamers...;) I refuse to believe that he's just now found out that Intel was "based on" some fundamentally different concerns, and made these remarks out of his shock at this discovery...;)

Perhaps it's a back-door apology offered in advance for Intel being unable to push x86 performance much further, which he assumes will be of concern to computer gamers, and he wants them to know that while Intel appreciates their business it's important that gamers remember that Intel has other fish to fry...?

Still, I'm not convinced he meant to say that, either, exactly. Just an odd remark in this context, I think, as I really don't know a soul who's ever thought that Intel was based on creating products for computer gaming.

And the general gist of the presentation really told the story - the age of pushing performance forward at the cost of other parameters is coming to a close for general-purpose computing. It's not over yet, and may never fully be, but other factors will get progressively more attention as soon as the marketeers figure out how to sell them, and they will find ways to sell those features, because apart from gradually losing attraction value, performance has already ceased to improve at the accustomed brisk pace.

Interesting that you'd use the word "age" in this context. It's actually more like "five years" from Intel's perspective, as opposed to an "age," don't you think? Wasn't it 1999 in which the primary x86 workhorse for Intel was the PIII, which it struggled mightily to bump to 1GHz in response to the cpu performance of AMD's K7, which was nowhere near as dependent on ramping MHz clocks for its overall processing performance? Prior to AMD's introduction of the Athlon, it's absolutely certain that Intel was in no MHz-ramp rush whatever, as the company routinely released new models of older cpus clocked 50-75MHz higher than the last one, with somewhat large gaps of time in between, in a lazy, unconcerned fashion befitting a confident monopolist. Intel found it couldn't ramp the PIII much in MHz in response to K7, though, and then the P4 made its debut.

Remember in the beginning how Intel talked often in glowing terms about ramping the P4 to "10 GHz," eventually? And Intel was saying things like this without the slightest clue that it could take the P4 to 10GHz in the first place, and I found it amazing that people gave it any credence at all at the time. Interestingly enough, you never heard similar talk out of AMD at the time about the future MHz performance of Athlon, because AMD was too busy looking for ways to increase processing performance other than in ramping an architecture to 10GHz. It strikes me that Intel is only now apprehending the "core" notions AMD was working with when it was designing the original K7 prior to introducing it.

The thing about performance in cpus is that there are other ways to describe it apart from MHz...;) It's heartening to see that Intel is finally acknowledging this publicly.

(It's also a bit silly and utterly facetious, too, since Intel has always known this basic fact--Itanium proves the premise conclusively. But then, so does Athlon, G5, etc., and of course Intel doesn't like to talk much about that.)

The Q&A session had a notable passage from 1.11.30 onward that made it very apparent that in Colwell's opinion x86 really carries a lot of baggage, and that it may not be able to compete quite as impressively going forward with clean-sheet designs. That's probably not much of an issue in PC space for compatibility reasons; for consoles, however, other rules apply. What will stagnating CPU speeds mean for the development of future PC graphics engines? And does this have any short-term or long-term implications for PC vs. console gaming?

Is it really "notable" that he'd say this, considering that Intel opposed x86-64 from the start, and was busy publicly telling the world that "Hey! If you run x86 software, then relax! You don't need 64-bit computing. But the good news is that when you get to Itanium you're going to love it!"...? Also, there's no doubt in my mind that by far the biggest piece of "baggage" relative to x86 that Intel would like to chuck is AMD...;)

Also, the last remarks you make as to general "graphics engines" and "consoles" and what you term "stagnating cpu speeds" by which I assume you mean "stagnating MHz clocks"--which as I pointed out does not have to mean "stagnating performance" at all--sound very much like you assume that he's speaking for AMD and everybody else. I don't think it would be wise to view any of his remarks outside of an Intel-specific context.

"Clean sheet" just sounds so "clean," doesn't it? It surely sounds better than saying "An architecture wholly incompatible with the entire world- wide x86 software market," no doubt. That's the other kind of "baggage" Intel should have considered--the hundreds of millions of dollars, if not billions of dollars, that companies and individuals have invested in x86 software in the last few years. That's definitely not the kind of "excess baggage" companies and individuals consider disposable, is it? Obviously not...;)

He also remarked on how graphics processors are getting more programmable, and how "this hadn't gone unnoticed at Intel", a remark that's quite intriguing, and a bit disturbing if you happen to be a graphics IHV. (And of course he let slip the amount of i-cache on a gfx processor, unfortunately without saying which one.) What could he have meant by that remark?

I have no idea what he meant by it--just as I had no idea what he was talking about in saying that it should be understood that Intel couldn't be a company "based on" making products for computer gamers...:) (Since Intel never has been that--and he might be the only person alive confused about that, should he actually ever have thought that himself.)

Why should IHVs be concerned at all about an off-the-cuff remark which says, essentially, absolutely nothing?....;) Intel talked about 10GHz for the P4, too, which was a lot more specific than this remark, and nothing came out of that, either. Intel similarly did a lot of talking about Rdram, etc. that proved wholly inaccurate. I think what will "concern IHVs" coming from Intel is when Intel introduces and markets retail 3d chips & reference designs competitive with theirs--that's when I think they'll be concerned. Intel got into the retail 3d-chip business a few years ago, briefly, in trying to prove the validity of the AGP bus's practical application for 3d gaming, got their socks knocked off by local-bus products from 3dfx and nVidia, and they took their marbles and went home as I recall.

First, I think that Intel needs to say something specific here, and then Intel needs to do what it said, and then it will be the proper time for any parties in the competitive landscape to become "concerned." I think you are stretching his remark here way out of context.

Anyway, I felt that if someone of Bob Colwell's caliber speaks about the state of computing, in a way that was just recently backed up by Intel scrapping their entire P4 roadmap (!!!), then maybe even the graphics nerds will take notice, as his words weigh infinitely heavier than those of an anonymous "Entropy". Times they are a-changing, although to what degree remains to be seen. Maybe it would be smart to ask ourselves how this is likely to affect the graphics business?

Just goes to show how different people interpret things differently--as I didn't see it as "backing up" Intel's ditching of the P4 roadmap at all--I saw it as an apology for Intel being unable to bring its original proclamations as to the inevitable MHz ramp for the P4 to fruition, and an acknowledgement that the expectation had never been sound from the beginning. The problem with it, of course, as I point out in response to your remarks above, is that a whole lot of relevant information as to the innovations and directions of companies other than Intel has been of signal importance in the scheme of things, and Intel spokespersons almost always talk about what Intel is doing as if nobody else existed. It's certainly a convenient security blanket to wrap up in, but I doubt it has much substance in the way of effective insulation properties...;) There's a whole world of technology out there aside from Intel, but sometimes I wonder if Intel itself won't be the very last party to realize it.
 
pascal said:
IMHO the P4 design was a failure from the beginning. I don't know if the problem is at the conceptual level or the implementation level, but my guess is it is the latter.
Oh, I definitely think it was at the conceptual level. Higher MHz at the expense of IPC is a dead end. Intel should have known this.
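
To spell out that trade-off: delivered performance is roughly IPC times clock, so a frequency ramp that costs proportionally more IPC buys nothing. The IPC figures below are purely illustrative:

```python
def instructions_per_second(ipc, clock_ghz):
    """Delivered throughput is roughly instructions-per-cycle times clock."""
    return ipc * clock_ghz * 1e9

# Purely illustrative IPC numbers: a lower-clocked, higher-IPC design can
# match or beat a higher-clocked, lower-IPC one.
print(instructions_per_second(ipc=0.9, clock_ghz=1.5))  # 1.35e9
print(instructions_per_second(ipc=1.5, clock_ghz=1.0))  # 1.50e9
```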
 
The P4 wasn't a bust from the beginning. Prescott is the problem: they should have attacked IPC; instead they went after clock rate TOO aggressively. Sustaining the clock rate, or getting small bumps, while pursuing things like TLP more aggressively would have been smarter.

But to say it's a bust is pretty stupid considering the performance results achieved. You can go on about the K8, but it is more reliant on exotic materials for clock bumps, considering its solution is SOLELY to attack serial performance.
 
Saem said:
The P4 wasn't a bust from the beginning.
Sure it was. The first P4s often couldn't outperform the 1GHz P3, let alone the Athlons at the time.

There were some benefits to be had on the bandwidth side, but I don't see how that had anything to do with the Pentium 4 architecture. That had more to do with the bus architecture: a bus which could have been strapped to any CPU architecture.
 
Chalnoth said:
Saem said:
The P4 wasn't a bust from the beginning.
Sure it was. The first P4s often couldn't outperform the 1GHz P3, let alone the Athlons at the time.

Yes, but those were 1.5 GHz P4s running code most often compiled for 486 and Pentium 1 processors. With today's codebase I'd much rather have a 2.0 GHz Willy P4 than a 1.0 GHz P3. What I don't understand is how Intel went from Northwood (IMO the overall best 130 nm processor) to Prescott. With all the signs of accelerating leakage on the 130 nm process and beyond, why go with a 31-stage design and double the number of core transistors?
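
A rough sketch of why the deeper pipe hurts: the branch-misprediction penalty grows with pipeline depth, so the 31-stage design needs a correspondingly bigger clock advantage just to break even. The stage counts are the commonly quoted ones for Northwood (~20) and Prescott (~31); the branch frequency, mispredict rate and base CPI are illustrative guesses:

```python
def rough_throughput(pipe_stages, clock_ghz, base_cpi=1.0,
                     branch_freq=0.2, mispredict_rate=0.05):
    """Very rough: each mispredicted branch stalls ~pipe_stages cycles."""
    cpi = base_cpi + branch_freq * mispredict_rate * pipe_stages
    return clock_ghz / cpi  # billions of instructions per second

# At equal clocks the deeper pipeline loses to its own stall penalty and
# needs roughly a 10% clock advantage here just to pull even.
print(rough_throughput(pipe_stages=20, clock_ghz=3.0))  # ~2.50
print(rough_throughput(pipe_stages=31, clock_ghz=3.0))  # ~2.29
print(rough_throughput(pipe_stages=31, clock_ghz=3.3))  # ~2.52
```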
 
glappkaeft said:
Yes, but those were 1.5 GHz P4s running code most often compiled for 486 and Pentium 1 processors. With today's codebase I'd much rather have a 2.0 GHz Willy P4 than a 1.0 GHz P3.
Sure. But if you remember, the 2.0 GHz P4 wasn't released until quite a bit later. If Intel had continued with the P3 line, they'd have been quite a bit faster than 1GHz by the time the 2GHz Willy was released.

Now, what I would like to say is that not everything Intel did with the P4 was bad. There are definitely many things about the architecture that are very good for performance. But I think that every single one of its benefits could have been realized better on a chip designed for lower clocks and higher IPC.

But the Intel brass have for a very long time believed that MHz is king - that people equate high frequency with high performance.
 
Chalnoth said:
pascal said:
IMHO the P4 design was a failure from the beginning. I don't know if the problem is at the conceptual level or the implementation level, but my guess is it is the latter.
Oh, I definitely think it was at the conceptual level. Higher MHz at the expense of IPC is a dead end. Intel should have known this.

Implementation level... I wish I could post Boggs' actual comments from Micro-33, but my laptop is.. er.. broken; so this shall have to do:

    EETimes.com (http://www.eet.com/story/OEG20001213S0045) said:

    MONTEREY, Calif. — If Intel Corp.'s microprocessor architects had had their way when they were designing the Pentium 4, it would have been a very different beast than it is today, said the company's principal processor engineer.

    A third-level cache strapped to the die, two full-fledged floating-point units and a bigger execution trace cache and level-one cache were all part of the original blueprint for Intel's most recently introduced processor. As it turned out, these features had to be either modified, stripped down or dumped altogether to keep costs in line.

    Intel engineers were forced to rethink their lofty intentions when it became clear that chip size had gotten too unwieldy as they tried packing in more hardware units to maximize performance. Power consumption, architecture complexity and testing also posed serious problems, said Darrell Boggs, Intel's principal engineer for the desktop platform group (Hillsboro, Ore.), addressing a room full of researchers and engineers at the Micro-33 conference here.

    "The general trend has been to make [the CPU] larger in physical area," he said. "But anytime you have a large die size, that means you have to have many fabs. You can become capacity-constrained unless you build a new fab."

The rest of the article talks more about the cuts, if you have time it's an interesting read. Sorry I can't give you the actual presentation, but it's around the net somewhere and if I could find it... ;)
 
Those designs are made at the conceptual level. They decided to make the cuts, and then implement what they had left ... the implementation was an effort to make good on a poor design, at which they succeeded as well as could be expected.
 