One final addendum re: the definition of Moore's Law.
You guys have got it quite right that Moore's Law refers to a doubling of transistor counts every period (~12 months when originally formulated, ~18 months for the last couple decades). But "a doubling of transistor counts" doesn't mean anything except in reference to some independent variable that is kept constant. (Certainly it is not a doubling of the maximum possible transistor count, which can pretty much be as large as you want if you're willing to waste the money on it.)
One might assume, then, that Moore was referring to a doubled transistor count with cost held constant, or yield held constant, or double the number of transistors per silicon area. Instead, the point of reference is a bit more complex, and a bit more interesting.
Thing is, for any given semiconductor process, there is a transistor count that is the most efficient for that process, i.e. has the lowest cost/transistor. (Note that we're talking cost per packaged and tested good IC.) Cost/transistor starts going up at high transistor counts for obvious reasons--yield declines as die area increases. But cost/transistor also goes up at low transistor counts, because testing costs are roughly constant per chip, and because packaging costs likewise don't shrink as transistor counts decrease. (At quite low transistor counts this is because the die is "pad-limited": there is a certain minimum die size required to accommodate the pads for however many pins a chip needs, so lowering transistor counts beyond what would fill that die is pointless--you have to spend the silicon anyway.)
These two effects combine to give cost/transistor vs. transistor count a sort of U-shaped curve. The minimum of this curve is the most efficient transistor count for that particular process node. And that value is what Moore's Law predicts will double every period.
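To make that U-shape concrete, here's a minimal sketch in Python. Every number in it (wafer cost, defect density, area per transistor, test/packaging overhead) is a made-up illustrative figure, not real fab data, and the exponential yield curve is just the classic Poisson yield approximation--a sketch of the shape of the argument, not a costing model:

```python
import math

# All constants below are illustrative assumptions, not real fab data.
WAFER_COST = 3000.0         # $ per processed wafer (assumed)
WAFER_AREA = 70000.0        # usable mm^2 per wafer (assumed)
DEFECT_DENSITY = 0.002      # defects per mm^2 (assumed)
AREA_PER_TRANSISTOR = 1e-5  # mm^2 per transistor for this process (assumed)
TEST_PACKAGE_COST = 1.50    # roughly fixed $ per packaged & tested chip (assumed)

def cost_per_transistor(transistors):
    """Cost per transistor for a good, packaged, tested chip (toy model)."""
    die_area = transistors * AREA_PER_TRANSISTOR
    dies_per_wafer = WAFER_AREA / die_area
    yield_frac = math.exp(-DEFECT_DENSITY * die_area)  # Poisson yield model
    silicon_cost = WAFER_COST / dies_per_wafer / yield_frac
    # Fixed per-chip test/packaging overhead is what punishes small dies;
    # collapsing yield is what punishes big ones.
    return (silicon_cost + TEST_PACKAGE_COST) / transistors

counts = [1_000_000 * k for k in range(1, 201)]  # 1M .. 200M transistors
best = min(counts, key=cost_per_transistor)      # bottom of the U
```

With these toy numbers the minimum lands at an interior transistor count: push the count in either direction and cost per transistor rises, which is exactly the U-shaped curve described above.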
A very important thing to note is that if, to enable the functionality you seek, you require much above this most efficient transistor count, eventually it becomes more efficient to use two (or more) ICs instead of one. That is, Moore's Law is primarily a measure of levels of integration. Much more directly than it explains increasing CPU performance (much less clock speed), Moore's Law explains things like integrated FPUs, integrated geometry/T&L units, integrated video chipsets and SoCs. It's also key to explaining why digital cameras will have gone from essentially none of the consumer camera market to essentially all of it in the space of just five or six years.
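A toy calculation shows that crossover. Again, every figure below is an assumption picked for illustration, using the same Poisson-style yield approximation; the point is only that yield loss grows exponentially with die area, so past some size two packaged dies beat one even though you pay test and packaging twice:

```python
import math

# Illustrative assumptions only--not real fab numbers.
WAFER_COST = 3000.0   # $ per processed wafer (assumed)
WAFER_AREA = 70000.0  # usable mm^2 per wafer (assumed)
D0 = 0.002            # defects per mm^2 (assumed)
OVERHEAD = 1.50       # fixed per-chip test + packaging cost (assumed)

def chip_cost(die_area_mm2):
    """Cost of one good, packaged, tested chip of the given die area."""
    dies_per_wafer = WAFER_AREA / die_area_mm2
    yield_frac = math.exp(-D0 * die_area_mm2)  # yield falls fast with area
    return WAFER_COST / dies_per_wafer / yield_frac + OVERHEAD

one_big = chip_cost(1000.0)       # all the transistors on one huge die
two_small = 2 * chip_cost(500.0)  # same transistors split across two dies
```

With these numbers the single 1000 mm^2 die costs more than twice as much as the pair of 500 mm^2 dies, despite the doubled test and packaging overhead--the integration-level tradeoff the paragraph above describes.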
The amazing thing is that these are exactly the sorts of predictions Moore makes in his paper (and, again, not anything to do with increasing clock speed or even performance): in the future, levels of integration will rise to the point where more and more functionality will be integrated onto fewer ICs, and more devices (particularly mobile devices) will become commercially feasible as a result.
Ars Technica did a great article on Moore's Law some months back that basically covers the above, with pictures and so on. Or you could just read the darn thing yourself. I highly recommend it. To me it's perhaps the most prescient document in computer science, with the possible exception of Turing's paper on AI. Of course, I'm a sucker for that sort of thing, so YMMV.