And with respect to realtime graphics ASICs:
First off, the notion that we are anywhere close to "good enough" is silly. Can you still tell the difference between best-of-breed game graphics and high-end offline CG (e.g. Finding Nemo or, gag-worthy as it was, The Hulk) displayed on your computer monitor? And can you still tell the difference between The Hulk and a film of a theoretical real-life Hulk? And, for that matter, can you still tell the difference between a film of actors (projected in a theater) and people in real life?
We have a long, long way to go.
With that said, one of the most notable features of consumer 3d ASICs is their extraordinary rate of performance gains ever since they first hit in 1995--arguably faster, for longer, than any other category of IC. Will progress continue to be that rapid? To figure that out, we need to look at the sources of the steady performance gains:
1) Increasing size of the market. As 3d graphics have become more and more compelling, and 3d-capable consumer GPUs have become ever cheaper, the size of the market has grown immensely, from nil in 1995 to near saturation in the desktop space now. There are still significant market-size gains to be made as 3d becomes more common in laptops--and there is near-total engineering overlap between laptop and desktop parts. Everyone is also betting on large gains from diverse markets such as set-top boxes and PDAs, although GPUs targeted at those markets will necessarily require significantly different designs from desktop parts. And there might still be a decent bump from Longhorn.
But in general, this trick is played out: the huge money on the demand side is now quite well matched by huge resources on the supply side at ATI and Nvidia.
2) An embarrassingly parallel problem. General-purpose CPUs have a very difficult time converting the increased transistor budget given them by Moore's Law into corresponding performance gains. That's because the limiting factor in general-purpose performance is extracting parallelism from inherently serial instruction streams, rather than putting up the resources to actually execute the instructions. The problem of putting an image on the screen--at least the way 3d rendering is done today--is, on the other hand, inherently parallel: each pixel's color value is calculated independently of any other's.
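To make the contrast concrete, here's a toy C++ sketch of the shape of the problem (shadePixel, renderFrame, and the frame dimensions are all invented for illustration; this isn't any real driver or hardware interface). The point is just that no pixel's result depends on any other's, so the loop can be split across as many pixel pipes as the transistor budget allows:

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for a pixel shader: real hardware would sample textures and
// evaluate lighting, but the key property is the same: the output depends
// only on this pixel's own inputs, never on another pixel's result.
static uint32_t shadePixel(int x, int y, int width, int height)
{
    uint32_t r = static_cast<uint32_t>(255 * x / (width - 1));
    uint32_t g = static_cast<uint32_t>(255 * y / (height - 1));
    return (r << 16) | (g << 8) | 0x40;   // 0x00RRGGBB
}

static void renderFrame(std::vector<uint32_t>& fb, int width, int height)
{
    // No iteration reads what another iteration wrote, so this loop can be
    // split across 1, 4, 8, or N pixel pipes without changing the output.
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            fb[static_cast<size_t>(y) * width + x] = shadePixel(x, y, width, height);
}

int main()
{
    const int w = 640, h = 480;
    std::vector<uint32_t> framebuffer(static_cast<size_t>(w) * h);
    renderFrame(framebuffer, w, h);
    return 0;
}
```

A general-purpose instruction stream offers no such guarantee: the hardware has to discover, at considerable expense, which operations are safe to run concurrently.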
This isn't going to change. However, the workload involved in calculating each individual pixel's value is becoming more serialized and taking on some of the characteristics of general-purpose computing. That seems inevitable if we are to keep reaping increases in realism. And further increases in raw pixel throughput are, by themselves, nearly worthless, as we can already output relatively simple pixels faster than monitors can display them.
Future hardware will have to make the tradeoff between (just to take the pixel shader pipeline as an example) a smaller number of more capable pixel pipes and a larger number with fewer resources per pipe. At the moment, this tradeoff is rather meaningless, since the pixel pipeline is so simple it can pretty well be described simply by counting its functional units. Once shader workloads become more like general-purpose computing--dominated by control flow rather than execution resources--shader pipelines will start to look more like the datapath of a general-purpose CPU, and then this tradeoff will become much more important.
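One way to see why control flow shifts this tradeoff: a bank of simple pipes running in lockstep has to walk through both sides of a branch whenever the pixels in a group disagree, while fewer, more capable pipes with real branching pay only for the path each pixel actually takes. The little cost model below uses invented numbers purely for illustration; no real pipeline is this simple.

```cpp
#include <cstdio>

// Toy throughput model; every number here is invented for illustration.
// costTaken / costNotTaken: cycles for the two sides of a branch in a shader.
// divergence: fraction of lockstep pixel groups whose pixels disagree.
int main()
{
    const double costTaken     = 20.0;
    const double costNotTaken  = 4.0;
    const double fractionTaken = 0.25;   // 25% of pixels take the expensive path
    const double divergence    = 0.60;   // 60% of lockstep groups are mixed

    // Fewer, more capable pipes: each pixel pays only for its own path.
    const double perPixelCapable = fractionTaken * costTaken
                                 + (1.0 - fractionTaken) * costNotTaken;

    // Many simple pipes in lockstep: a mixed group executes BOTH paths.
    const double perPixelLockstep = divergence * (costTaken + costNotTaken)
                                  + (1.0 - divergence) * perPixelCapable;

    std::printf("cycles per pixel, capable pipes:  %.1f\n", perPixelCapable);
    std::printf("cycles per pixel, lockstep pipes: %.1f\n", perPixelLockstep);
    // With these made-up inputs the lockstep organization needs about 2.2x
    // the cycles per pixel; that is the gap it must close by being cheap
    // enough to replicate in larger numbers.
    return 0;
}
```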
3) One-time algorithmic increases. Perhaps the most significant reason GPUs have reaped performance gains at a rate even faster than Moore's Law is that increasing transistor budgets have allowed not just for more parallelism in the hardware design, but for new algorithmic techniques that substantially improve efficiency. A perfect example is multisampling plus anisotropic filtering vs. supersampling. Each achieves the same ends (more or less), but MS + AF is an extraordinary efficiency gain over SS. But MS + AF also takes more logic to implement, particularly when you include the color compression that makes it significantly more bandwidth-efficient than SS. Other examples include hierarchical Z, early Z reject, and Z compression; tile-based deferred rendering; and texture compression.
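To put rough numbers on the MS + AF vs. SS comparison, here's a back-of-the-envelope sketch (the resolution and sample count are arbitrary, and real chips complicate this in plenty of ways): 4x supersampling runs the full shading and texturing pipeline for every sample, while 4x multisampling shades each pixel once and replicates only the cheap depth/coverage samples. And since a fully covered pixel stores the same color four times, those samples compress trivially, which is where the bandwidth win from color compression comes from.

```cpp
#include <cstdio>

int main()
{
    // Back-of-the-envelope comparison of 4x supersampling (SS) vs. 4x
    // multisampling (MS) for one 1024x768 frame.  These counts are a
    // deliberate simplification, for illustration only.
    const long long pixels  = 1024LL * 768LL;
    const long long samples = 4;

    const long long shaderRunsSS = pixels * samples;  // SS shades every sample
    const long long shaderRunsMS = pixels;            // MS shades once per pixel
    const long long zTestsEither = pixels * samples;  // both keep 4 depth/coverage samples

    std::printf("shader+texture invocations, 4x SS: %lld\n", shaderRunsSS);
    std::printf("shader+texture invocations, 4x MS: %lld\n", shaderRunsMS);
    std::printf("depth/coverage tests (either way): %lld\n", zTestsEither);

    // Geometry edges still get resolved from 4 coverage samples, so edge
    // quality is comparable, while interior texture detail comes from
    // anisotropic filtering rather than brute-force oversampling.  Color
    // compression then exploits the fact that a fully covered pixel stores
    // one color repeated 4x, cutting framebuffer bandwidth relative to SS.
    return 0;
}
```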
Greater transistor budgets allow designers to dedicate the resources necessary to implement more efficient algorithms in hardware. The resulting performance is often much greater than if the transistors had been used instead to provide more of the same naive functional units. The question is how many such leaps await us in the future, and how much of the low-hanging fruit has already been picked. One class of algorithmic improvements we can probably expect in the next couple of years is analytical AA techniques like Z3. (Matrox's FAA is similar, but it gains a simpler implementation at the cost of probably unacceptable artifacts in the form of missed edges.)
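To make the "smarter algorithm instead of more naive units" point concrete, here's a toy software model of one of the examples above, early Z reject: the depth test is moved ahead of the expensive per-pixel shading, so occluded fragments cost only a compare. The Fragment structure and the expensiveShade stand-in are invented for illustration; this is a sketch of the general idea, not a description of any shipping GPU.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// A fragment is one pixel-sized piece of a triangle heading for the screen.
struct Fragment { int x, y; float z; };

// Stand-in for the costly part of the pipeline: shading and texturing.
static uint32_t expensiveShade(const Fragment&) { return 0x00FF8040; }

// Early Z reject: test depth *before* shading, so hidden fragments never
// consume shader or texture resources at all.
static void rasterizeWithEarlyZ(const std::vector<Fragment>& frags,
                                std::vector<float>& zbuf,
                                std::vector<uint32_t>& color,
                                int width)
{
    for (const Fragment& f : frags) {
        const size_t i = static_cast<size_t>(f.y) * width + f.x;
        if (f.z >= zbuf[i])
            continue;                  // occluded: rejected before any shading work
        zbuf[i]  = f.z;                // visible so far: update depth, then shade
        color[i] = expensiveShade(f);
    }
}

int main()
{
    const int w = 4, h = 1;
    std::vector<float>    zbuf(static_cast<size_t>(w) * h,
                               std::numeric_limits<float>::max());
    std::vector<uint32_t> color(static_cast<size_t>(w) * h, 0);

    // Two fragments land on the same pixel; the nearer one arrives first,
    // so the farther one is thrown away without ever touching the shader.
    const std::vector<Fragment> frags = { {0, 0, 0.2f}, {0, 0, 0.7f} };
    rasterizeWithEarlyZ(frags, zbuf, color, w);
    return 0;
}
```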
But how many others are there on the horizon? And will they enable us to sustain the benefits reaped in the last few years from MS + AF, compression, and early Z?
Beats me.
In the medium term, I'd say this is the biggest factor determining whether GPUs will continue to scale in performance at their remarkable historic rate.