Interesting Anandtech article on future of CPUs/multi-core

Diplo

Veteran
The Quest For More Processing Power, Part 1

This is a fairly long feature looking at what the future could hold for CPUs and why current CPUs have hit the scalability problems they have (in other words, why Intel's prediction of 10GHz P4s was so far off). It also explores the caveats of single- and multi-core CPUs in some depth. Basically, you could sum up the problem of single-core CPUs as heat/leakage, and that of multi-core as the lack of multi-threaded applications.

Well worth a read.
 
Pretty technically advanced for an Anandtech article; it's up there with Ars Technica...

However, this was almost entirely predicted years ago. I read a paper dated 2000 where an Alpha 21264 was theoretically scaled down to a 35nm process, factoring in wire delay, cache latencies, projected memory technology (an increasing memory gap, hence more pipelining) and extended ILP. Every which way they sliced it (10GHz with deep pipelining or 5GHz with massive superscalar width, large caches with large latency or smaller ones with less latency), they couldn't get more than a 7-fold increase in performance, starting from a 0.25u process.
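To put that scaling in perspective (the 0.25u starting point, the 35nm target and the ~7x speedup are from the paper; the geometry is just arithmetic), a quick Python sketch:

start_nm, end_nm = 250, 35
shrink = start_nm / end_nm    # ~7.1x linear shrink
density = shrink ** 2         # ~51x more transistors per unit area
print(f"linear shrink {shrink:.1f}x, transistor density {density:.0f}x,")
print("yet the projected performance gain was only ~7x")

In other words, a transistor budget roughly 50 times larger bought only a 7-fold speedup in their projections.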

The only thing they didn't quite get right was the process projections: they guessed 70nm for 2008 and 35nm for 2014! 65nm prototypes were shown last year... :)


http://www1.cs.columbia.edu/~cs4824/handouts/agarwal00.pdf
 
Did they use esoteric materials in that Alpha analysis, though? Like carbon nanotube wiring, diamond transistors, etc.? :) Probably not, right?

So there might still be some headroom to expand into by turning up the clock speed dial a bit more.
 
The Anand article is written by Johan De Gelas, who used to write for Ace's Hardware, which was always one of the better hardware sites.

Cheers
Gubbi
 
Guden Oden said:
Did they use esoteric materials in that Alpha analysis, though? Like carbon nanotube wiring, diamond transistors, etc.? :) Probably not, right?

Of course not. Nano-tube grids aren't even semiconductors! :LOL:

So there might still be some headroom to expand into by turning up the clock speed dial a bit more.

Well, no, not really. This is an _architectural_ problem first and foremost, not a clock-frequency problem in the sense of how fast the transistors can switch (or heat, or power consumption), and not something you solve with some wonder material. If you check the paper, they project that wire delay and simply the speed of light leave only about 1% of the chip reachable within one clock cycle! If you find a material that conducts signals considerably faster than the speed of light, OK, then you're home free... ;)
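A rough back-of-the-envelope in Python makes the point (the die size, signal speed and clock rates below are my illustrative assumptions, not the paper's figures):

c = 3e8                    # speed of light, m/s
signal_speed = 0.3 * c     # on-chip wires are much slower than c; assume ~0.3c
die_side = 0.020           # assume a 20 mm die edge, in metres

for freq_ghz in (3, 10, 20):
    cycle_time = 1e-9 / freq_ghz              # seconds per clock cycle
    reach = signal_speed * cycle_time         # distance a signal covers per cycle
    frac = min(1.0, reach / die_side) ** 2    # fraction of die area reachable
    print(f"{freq_ghz:>2} GHz: reach {reach*1e3:.1f} mm, "
          f"~{frac*100:.0f}% of the die reachable in one cycle")

And real RC wire delay is far worse than this idealised straight-line propagation, which is how you end up at the paper's ~1% figure.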

And still there is the problem of the memory gap. If we, say, crank the clock frequency up to 20GHz, DRAM latency won't be anywhere near 1/6th of today's, and the latency to main memory is already on the order of 100 clock cycles today, so the same access would cost several times as many cycles. We have alleviated this with cache memories (and deeper pipelines) so far, but even now on the AMD64 and P4 the L1 cache has about 3 cycles of latency because of wire delay, and the L2 cache about 15-30 cycles depending on its size. And if you want to keep the latency down, you have to have less cache. Less cache -> less efficiency. There's just no good compromise, really.
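Just to illustrate how the gap scales with clock speed (the 50 ns DRAM latency below is an assumed round number, not a measured figure):

dram_latency_ns = 50  # assumed round-trip latency to main memory

for freq_ghz in (1, 3, 10, 20):
    cycles = dram_latency_ns * freq_ghz   # ns x cycles-per-ns = cycles
    print(f"{freq_ghz:>2} GHz core: a {dram_latency_ns} ns DRAM access "
          f"costs ~{cycles:.0f} clock cycles")

The nanoseconds barely move from one CPU generation to the next; only the number of cycles you burn waiting goes up.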

The moral of the story is that we need new microarchitectures; the monolithic core just can't hack it anymore. We need to somehow segment the chip so we can have different clock domains. One way of doing that is dual-core: two cores give you two clock domains, and so on. Another is maybe to have parts of the core (ALUs, pipeline stages, etc.) clocked separately, but then you must have asynchronous communication between the parts, with handshaking and the lot, which increases the complexity a fair bit.

Another option is perhaps to make the whole core clockless, i.e. asynchronous. Philips and ARM have researched that subject for 10+ years (the Amulet project), so it doesn't look all that simple, unfortunately. But simple 16-bit asynchronous processors apparently are on their way to market after all. We'll see how that pans out.
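For a feel of what that handshaking means, here's a toy Python sketch of two independently-timed units passing data over a 1-deep channel (threads and a queue standing in for request/acknowledge logic; real asynchronous hardware does this in gates, obviously):

import threading, queue, random, time

channel = queue.Queue(maxsize=1)   # 1-deep channel between the two units

def producer():
    for value in range(5):
        time.sleep(random.uniform(0, 0.01))   # runs at its own, unclocked pace
        channel.put(value)                    # 'request': waits while the channel is full

def consumer():
    for _ in range(5):
        time.sleep(random.uniform(0, 0.02))   # runs at a different pace
        print("received", channel.get())      # 'acknowledge' by consuming

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()

No shared clock anywhere: the two sides only agree on when a value has been handed over, which is exactly the property the handshake buys you.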

Anyway, the future is in high-level parallelism; the monolithic core as we have it today is doomed. Now the question is how to exploit that parallelism, and that's a whole other ball of wax.
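The software side of that question, in the smallest possible modern-Python sketch (the function name and the two-way split are mine, just to show the shape of it): the work only gets faster on a dual-core if somebody explicitly splits it into independent chunks.

from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n = 10_000_000
    chunks = [(0, n // 2), (n // 2, n)]       # split the work across two cores
    with ProcessPoolExecutor(max_workers=2) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total == sum(range(n)))             # same answer, computed in parallel

That splitting step is exactly what most applications don't do today, which is the multi-core half of the problem from the article.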
 