Bob Colwell (chief Intel x86 architect) talk.

Entropy

I want to start off with a message to the Powers That Be, that they should feel free to move this post wherever they feel it belongs. I couldn't decide, as the proper forum might seem to be "Hardware Talk" but these topics never show up there. CPU architecture is hotly debated in the "Console" forum, but I finally decided to put it here, because it seems technologically interested/broader picture people cruise by here.

This is the link (url tags don't seem to work on .asx?):
http://stanford-online.stanford.edu/courses/ee380/040218-ee380-100.asx

Bob Colwell, senior chip designer at Intel, gives a very candid talk on where we're at, and where we might be going. Basically, it is an inside view of x86 development, historically and going forward.

The full presentation is 1 hour 24 minutes long and I don't regret sitting through it one bit. Indeed, I've found myself going back to it time and again. He does a very good job; all he says is accessible to just about anyone on these boards, and, I believe, of interest to just about everyone who cares at all about where computing and the PC platform are heading.

I could go on about the content, and all kinds of interesting reflections and speculations that might arise from it, but I suggest you simply listen to it.

For a long time now, I've felt that I should put together a good post about industry trends, and why I feel that the Gfx IHVs, and these boards in particular, are out of sync with them, but I've come to realize that I just won't be able to devote the necessary time for the foreseeable future.

Bob covers some aspects of this, and also offers horse's-mouth insight into where x86 computing is heading. My perception is that a lot of people on these boards may hear his words with mixed feelings.

I must say that listening to him speak is great, and that it's gratifying to know that this guy has influence. Enjoy.
 
I don't suppose you could pull out the key points and summarise them here (for those of us who can't listen to this) ? Just very briefly...
 
Interesting that the Power/Heat thing keeps cropping up - IBM, ATI, Intel, all in a few days. You're right about mixed feelings, because you kinda wonder whether they are going to be able to keep progressing at this rate of development - can they find alternative ways of alleviating these issues whilst still creating more complex designs?
 
Complexity I'm not so much worried about. I mean, it shouldn't be too much of a problem to start moving in the direction of multi-core CPUs. A large number of the more challenging problems in CPU performance were solved some time ago. Now it's about packing more processing power into one chip, and the easiest way to do that would be multi-core CPUs. It would require effective resource distribution, but that's more a software issue, and one that's also an old problem that's largely been solved (not in all software, obviously, but solved enough that most software should be able to readily take advantage of multi-core CPUs).

However, the ability to pack more logic into less space really is going to screech to a halt all too soon. We'll need radically new technology to get around this barrier.
 
Chalnoth said:
It would require effective resource distribution, but that's more a software issue, and one that's also an old problem that's largely been solved (not in all software, obviously, but solved enough that most software should be able to readily take advantage of multi-core CPUs).

For some workloads that is true, but for others just adding processors is not a trivial solution. Let's face it, multi-processing has been the easy option on the hardware side for decades.

Q: Why build a multi-billion-dollar fab to scale your single-processor performance by a factor of two when slapping a second processor on the motherboard gets you the same performance gain?

A: Because it costs you more money to write the software to get the same performance gain out of dual processors than it does to build a single processor of twice the performance.

If the software solution was so easy, it would have been done already. Why aren't we running Microsoft Word on Beowulfs-Of-486s-In-A-Box?
 
I find the power & heat (twin) issues interesting, too. Intel seems to have halted P4 development and switched to the P-M (P3++) as their dominant CPU, seemingly primarily because of its power+heat/performance ratio. The P-M seems to perform far better than the P4 using the same amount of power, and I think they achieved that with smarter, not (primarily) smaller, engineering. ATi and nV are probably learning the same tricks with their mobile parts, and no doubt they'll be transferring their mobile know-how to their desktop chips when they begin to hit power/cooling barriers.

Sure, power-saving tech probably costs more to implement at first, but powerful CPUs are already so relatively inexpensive that the added cost (or the trade-off for less speed with less power) can't be that great of a burden.
 
There are also completely different, radical and maybe "exotic" approaches to be evaluated.
For example, this one: The WIZ Processor

Only 1 opcode (COPY), "intelligent" registers (where the work is done), inherently parallel, clockless.
Sound strange? Take a look! :)

Bye!
 
I watched the presentation 'Things CPU Architects Need To Think About'.

http://www.stanford.edu/class/ee380/winter-schedule-20032004.html

Many of the points are relevant to the VPU/GPU world as well.

Quick summary of main points:

- The number of bugs in a chip scales with its number of logic transistors (this point doesn't apply to cache transistors).

- As complexity increases, so do the bugs: in chips and in software.

- More complex = more fragile.

- As complexity increases, the pool of people knowledgeable enough to fix any problems narrows.

P6 core - only 15 people could understand / keep the whole design in their heads.

Pentium 4 - only about 3 or 4 people could understand / keep the whole design in their heads.

- Exponential trends are not sustainable (transistors/power/clock rate/bugs).

- Don't "bend" (cheat) the benchmarks (i.e. showing executives only a perfect "hand-tuned" code test that isn't representative of the majority of code is a bad risk).

- Don't compare "desktop" and "server" CPUs, as their development economics are different.

- When migrating architectures, do you:
1) swap straight over, or
2) create a middle solution that talks to both old and new (is this even possible)?

- User-centered design (user requirements are most important).
 
nutball said:
For some workloads that is true, but for others just adding processors is not a trivial solution. Let's face it, multi-processing has been the easy option on the hardware side for decades.

Q: Why build a multi-billion-dollar fab to scale your single-processor performance by a factor of two when slapping a second processor on the motherboard gets you the same performance gain?

A: Because it costs you more money to write the software to get the same performance gain out of dual processors than it does to build a single processor of twice the performance.
No. It's because of the cost of the hardware. It's simply cheaper to build a single CPU system.
 
Mark0 said:
Only 1 opcode (COPY), "intelligent" registers (where the work is done), inherently parallel, clockless.
Sound strange? Take a look! :)
I'm not sure replacing instructions with registers really makes things simpler in the end.
 
PeterAce said:
Many of the points are relevant to the VPU/GPU world as well.
Not as much. A GPU is made up of many identical units, and so it is far less complex than a typical CPU for the number of transistors.
 
Here

It was linked over at Ace's 3 or 4 months ago.

Another of the interesting points in there is that the design of the P4 (NetBurst architecture) was partly driven by what the market perceives as a fast processor, which in part led to the P4 being hyper-pipelined so it could reach those high clock speeds.

Cheers
Gubbi
 
Chalnoth said:
nutball said:
For some workloads that is true, but for others just adding processors is not a trivial solution. Let's face it, multi-processing has been the easy option on the hardware side for decades.

Q: Why build a multi-billion-dollar fab to scale your single-processor performance by a factor of two when slapping a second processor on the motherboard gets you the same performance gain?

A: Because it costs you more money to write the software to get the same performance gain out of dual processors than it does to build a single processor of twice the performance.
No. It's because of the cost of the hardware. It's simply cheaper to build a single CPU system.

/me checks price of 3.4GHz Pentium 4
/me checks price of 1.7GHz Pentium 4

I'm gonna have to disagree with that!

And you didn't address this bit:

If the software solution was so easy, it would have been done already. Why aren't we running Microsoft Word on Beowulfs-Of-486s-In-A-Box?
 
There'll always be effort put into making single-thread performance faster.

You can speed a task up by adding processors until you reach a point where the serial component of the task is the limiting factor.

Formalized in Amdahl's law
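
For reference, a minimal statement of it, with p the fraction of the work that can be parallelized and N the number of processors:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

So even with 90% of the work parallelizable, the speedup can never exceed 10x, no matter how many processors you throw at it.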

The second a general-purpose processor vendor shifts focus away from serial performance, it is doomed (look at how Sun's SPARC is teh suck).

Cheers
Gubbi
 
Gubbi said:
There'll always be effort put into making single-thread performance faster.

Yes, I'm sure there will be. The interesting question is how much more scope there is to increase single-thread performance. I guess the equation goes something like this:

instructions per second = instructions per clock * clocks per second

It seems that the rate at which the latter term increases might begin to fall off over the medium term. At least, that's the message I got from Colwell's talk and from all the other stuff that's happened recently with Tejas, etc.

The achievable IPC is also bounded; there's only so much instruction-level parallelism that you can extract from a typical x86 instruction stream, and only so many additional execution units you can add before you hit the point of diminishing returns (I read somewhere that that threshold is around 3-4 for x86).
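
To put rough, purely illustrative numbers on that (borrowing the 3.4 GHz clock and the ~3 IPC ceiling mentioned in this thread):

```latex
\text{IPS} = \text{IPC} \times f_{\text{clock}}
\approx 3 \times 3.4 \times 10^{9}\,\text{Hz}
\approx 1.0 \times 10^{10}\ \text{instructions per second}
```

If neither factor has much headroom left, that's roughly where single-thread throughput plateaus.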

So what else is there?

You can speed a task up by adding processors until you reach a limit where the serial component of the task is the limiting factor.

Formalized in Amdahl's law

Hehe, yeah, I'm familiar with Amdahl's Law (I work in HPC! ;)). Point is that only some workloads are amenable to parallelisation, and it's not immediately clear to me that the typical things desktop PCs spend their time doing fall into this category. If your serial fraction is 99% then extra processors don't help you any.

Parallelisation is also costly (in developer man-hours); there's a whole load of stuff you need to pay attention to that you don't with single-threaded applications. Automatically parallelising compilers do exist, but they tend to be very limited in their scope.

That was what I was getting at in responding to Chalnoth -- building multi-processor computers is pretty easy. If it were as simple as building a two-processor PC and adding the '--parallelise_my_code_please' flag to the compiler to get a factor-of-2 speed-up in MS Word, it would already have been done; we'd all be using parallel desktop applications on multi-processor desktops right here, right now.

This is a software problem, not a hardware problem. Folks have been working on making parallelisation simple for 30-40 years, and have found that it's really, really hard to get it to work in the general case.
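
As a toy illustration of why the general case is so hard (a hypothetical sketch, not taken from Colwell's talk, and in Go purely for brevity): the first loop below is embarrassingly parallel and splits cleanly across processors, while the second has a loop-carried dependency that no '--parallelise_my_code_please' flag can magic away.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Hypothetical workload, just for illustration.
	data := make([]float64, 1_000_000)
	for i := range data {
		data[i] = float64(i)
	}

	// Case 1: embarrassingly parallel. Each output element depends
	// only on the matching input element, so the loop splits cleanly
	// across a few worker goroutines.
	out := make([]float64, len(data))
	const workers = 4
	chunk := len(data) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(start int) {
			defer wg.Done()
			for i := start; i < start+chunk; i++ {
				out[i] = data[i] * data[i]
			}
		}(w * chunk)
	}
	wg.Wait()

	// Case 2: a loop-carried dependency. Every iteration needs the
	// previous iteration's result, so this part stays serial - it is
	// the (1 - p) term in Amdahl's law.
	acc := 0.0
	for i := range data {
		acc = acc*0.5 + data[i]
	}

	fmt.Println(out[10], acc)
}
```

Even in this trivial case the parallel version needed explicit restructuring and synchronisation; real desktop code adds shared state, pointers and I/O on top of that.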

Multiple cores are good for system throughput as a whole, but don't necessarily translate to improved performance for all applications in all cases.
 
nutball said:
Multiple cores are good for system throughput as a whole, but don't necessarily translate to improved performance for all applications in all cases.

Volari, anyone? =D
 
Pete said:
Done and done, for you impatient lot.
Thanks :)

Found this quote from the thread fascinating:
With 500-member design teams, barely anyone can keep their head around the whole design, and thus barely anyone really understands the whole system. The P4, apparently, can exhibit chaos-theory-like complexity in some of its behaviors, suddenly slowing for 1000's of cycles for no real reason.

Can anyone expand on the 'chaos-theory-like' behaviour? In the future are we going to see embedded chips in, say, aircraft that suddenly start behaving unpredictably because the complexity is beyond our individual comprehension? I'm almost seeing some kind of Terminator-style future where machines build machines and no human can understand what they are doing anymore....
 
Nutball, sorry, wasn't trying to teach you anything, just felt that a clarification was in order. Some people have a tendency to look at the peak performance of a multi-CPU system and assume that it equals its performance on any given problem.

nutball said:
That was what I was getting at in responding to Chalnoth -- building multi-processor computers is pretty easy. If it were as simple as building a two-processor PC and adding the '--parallelise_my_code_please' flag to the compiler to get a factor-of-2 speed-up in MS Word, it would already have been done; we'd all be using parallel desktop applications on multi-processor desktops right here, right now.

This is a software problem, not a hardware problem. Folks have been working on making parallelisation simple for 30-40 years, and have found that it's really, really hard to get it to work in the general case.

I agree. I blame software too. Auto-parallelizing Microsoft's C/C++ spaghetti is never going to work. Most software has a much higher degree of serialization than the original problem it is a solution for warrants.

I think it will take a shift away from strict algorithmic thinking in the design and programming phases of projects, as well as more sophisticated tools, to get good auto-parallelizing code. Right now it can only be done on highly regular structures like vectors or matrices. We need a similar degree of structure in our basic programming constructs. Something like nodes in CSP.
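
As a rough sketch of what "nodes" in the CSP sense might look like (hypothetical code, written in Go only because its channels are modelled directly on Hoare's CSP): each node owns its own state and talks to the others only through message channels, which is exactly the kind of structure a tool could exploit mechanically.

```go
package main

import "fmt"

// generate is a source node: it emits work items and knows nothing
// about the stages downstream of it.
func generate(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 0; i < n; i++ {
			out <- i
		}
	}()
	return out
}

// square is a pure transformation node; because it keeps no shared
// state, several copies could consume the same input channel.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- v * v
		}
	}()
	return out
}

func main() {
	// Wire the nodes into a pipeline: generate -> square -> sum.
	total := 0
	for v := range square(generate(10)) {
		total += v
	}
	fmt.Println(total) // 0^2 + 1^2 + ... + 9^2 = 285
}
```

Because the only coupling between stages is the message flow, adding more copies of the square node would not touch the rest of the program - the sort of mechanical transformation that is far harder on pointer-heavy spaghetti code.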

Cheers
Gubbi
 