The problem is that once you have a desktop PC chip that uses tens of Watts, if you double the transistor count, you also double the power use. For a chip line to not melt a hole in the earth after a few shrinks, you need to halve the power used per transistor with every shrink too. Until fairly recently, that was the case, and the semiconductor industry was famous for delivering huge performance increases year after year in a smaller package that used roughly the same power.
Over the last decade, that power use has crept up, and the rate of increase is increasing too. Not a good sign. Mr Muller pointed this out with a nice chart showing that scaling is still working for physical size and current, but not as well for capacitance, and voltage has effectively stopped dropping. What this means is that the power consumed per circuit used to go down by a factor of 1/(a^2) with each shrink, where 'a' is the shrink factor; it now decreases by only 1/a.
For those who don't do graphs and intersection points well in their head, that means the power is going down linearly now when it used to be an exponential drop. If you are power limited, and just about every chip out there is, your transistor growth is now linear too, not exponential. This means that, for the most part, chip performance is now on a linear curve as well. Pity the GPU makers...
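To put rough numbers on that argument, here is a quick sketch, not Muller's actual chart data, assuming an idealized linear shrink factor of a = 1.4 per node so that area halves with each shrink:

    # Sketch of the scaling argument above, with an assumed shrink factor.
    # Old regime: power per circuit falls by 1/a^2 each shrink.
    # New regime: it falls by only 1/a, so under a fixed power budget the
    # number of circuits you can afford to switch grows far more slowly.
    a = 1.4  # assumed linear shrink factor per node (area halves)
    p_classic = p_now = 1.0
    for node in range(1, 6):
        p_classic /= a ** 2
        p_now /= a
        print(f"after {node} shrink(s): old regime {1 / p_classic:5.1f}x circuits "
              f"per budget, new regime {1 / p_now:4.1f}x")

After five shrinks the old regime buys you roughly 29 times as many circuits under the same power budget, while the new one buys only about 5 times as many, which is the divergence the chart was illustrating.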
Mr Muller summed this up by saying the first wave of computing, the mainframes and minis up to the 1980s, was defined by adding performance with each new model. The 1990s saw the advent of the PC, and the overriding metric for that era was performance/$. In the 2000s, notebooks were the hot commodity, and those added another metric, power use. The dominant factor here was performance/(power * $): if you wanted to sell a chip into that market, power use was a very real concern. Power use could go up, but performance had to go up commensurately or more.
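As a toy illustration of how that metric reorders things, with made-up chips and numbers rather than anything from the talk:

    # Two hypothetical chips: (relative performance, power in Watts, price in $)
    chips = {
        "fast_hot": (100, 45, 300),
        "mid_cool": (70, 15, 250),
    }
    for name, (perf, watts, dollars) in chips.items():
        print(f"{name}: perf/$ = {perf / dollars:.3f}, "
              f"perf/(W*$) = {perf / (watts * dollars):.5f}")

The PC-era metric picks the fast, hot chip (0.333 vs 0.280 perf/$), while the notebook-era metric picks the cooler one (0.01867 vs 0.00741 perf/(W*$)).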
The future as he sees it will be dominated by 'mobiles', be they phones, MIDs or whatever form factor dominates. These have very different requirements from a laptop: all-day battery life, days of standby, always-on operation, and many other things that a PC was never engineered to do. The overriding concern for this era adds yet another metric to the equation: energy cost.
With laptops, the problem was mostly one of finding an engineering solution to power draw. In the future, the question will not be, "Can we do that?", it will be, "Is it worth it to do that?". The cost of energy is increasing, so the problem becomes one of getting a specific performance level with the minimum Watts used. Performance going up does not matter much any more; the unwritten message is that we are on the verge of 'fast enough'.
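A back-of-the-envelope example of why the math changes, using an assumed electricity price since the talk gave no specific figures:

    # Yearly energy cost of an always-on chip at an assumed $0.12/kWh.
    price_per_kwh = 0.12
    hours_per_year = 24 * 365
    for watts in (0.5, 5, 50):
        kwh = watts / 1000 * hours_per_year
        print(f"{watts:4.1f} W always on: ~${kwh * price_per_kwh:.2f} per year")

Half a Watt of standby costs about fifty cents a year; fifty Watts costs over fifty dollars. Multiply that by billions of always-on devices and "Is it worth it?" becomes a very real question.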
Strangely enough, ARM is very well positioned here; it has been designing chips that use power at levels that round to zero for years. Every other CPU company out there is doing the same to one degree or another, and the peripheral chip makers are also following suit. In very short order, every chip out there is going to have as standard power management features that would have been considered bleeding edge a few years ago.
Semiconductor foundries are going to play their part in keeping scaling going as much as possible while using as little power as possible, but the days of easy and assured power scaling are over. New materials and tweaks at the atomic level will help, but they will not get us back to where we were.
On that down note, things turned to the more technical side, and the role ARM was playing there. ARM cores were fabricated on the 32nm process as early as 2008. IBM made a chip called Explorer in July of that year, followed by a full Cortex-M3 in October, and GlobalFoundries did the same in May of 2009.
This was followed up by Alpha PDK (Process Design Kit) IP validation chips from IBM in June 2009 and Samsung a month later. Most interesting is that the chips are listed as being on the 32LP TC1a process from IBM and the 32LP TC1b process from Samsung. Full silicon validation of the ARM IP was first done by Samsung in February 2010 on a process labeled 32LP TC2.
That said, 32nm is almost old news by now, and 28nm is far enough along that there won't be any major changes. 20nm is the next big thing, and ARM did not disappoint there. The company talked about its CP (Common Platform) 20nm SoC test chip based on a Cortex-M0. This core is 0.2mm x 0.2mm and contains 8K gates, 20K if you count the entire processor subsystem. It was overlaid to scale on an ARM2 chip, built on a 2µm process with 6K gates in total. The Cortex-M0 was a speck on the older chip, and probably used a commensurately tiny amount of power.
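Simple arithmetic on those figures shows why it looks like a speck:

    # Density implied by the numbers above: 8K gates in a 0.2mm x 0.2mm core.
    core_area_mm2 = 0.2 * 0.2   # 0.04 mm^2, i.e. 1/25 of a square millimeter
    gates = 8000
    print(f"core area: {core_area_mm2:.2f} mm^2")
    print(f"density: {gates / core_area_mm2:,.0f} gates/mm^2")

That works out to roughly 200,000 gates per square millimeter, against 6K gates for the entire ARM2 die.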
Now that a company can make a CPU that is 1/25 of a square mm but has more performance than a cutting edge RISC machine from a few decades ago, what do you do with it? NXP is currently selling its Cortex-M0 variant for $0.65 each; how much cheaper do you need? How do you communicate with these chips, or do you at all?
There isn't a specific answer to these questions, but the goal is what many call the 'internet of things': basically, everything will be sensor-enabled and aware of what it needs to be aware of. If you can make a chip that small and power it, all sorts of opportunities become available.