I imagine we'll keep the rasterizer...
Not likely. Tessellation makes some triangles really tiny while others remain pretty large. To sustain the maximum triangle rate you'd need to spend a large portion of the die area on a dedicated rasterizer, but it would sit idle much of the time. So eventually it's more efficient to just replace the rasterizer with more shader cores and get high utilization all the time.
Other tasks also benefit more from having extra programmable cores versus a bulky rasterizer.
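To put rough numbers on the utilization argument, here is a toy model in Python. Everything in it is an assumption made for illustration: the unit counts, the work figures, and the `sw_penalty` factor (how much more expensive rasterization is on a generic core than on dedicated hardware) are made up, not measurements of any real chip.

```python
def frame_time(raster_work, shade_work, raster_units, shader_units,
               unified, sw_penalty=1.0):
    """Time to finish one frame, in arbitrary unit-cycles (toy model).

    Dedicated case: rasterization and shading run on separate units, so
    the slower side sets the pace and the other side idles.
    Unified case: all units are generic cores that can take either kind
    of work, but rasterization costs sw_penalty times more on them.
    """
    if unified:
        total_units = raster_units + shader_units
        return (raster_work * sw_penalty + shade_work) / total_units
    return max(raster_work / raster_units, shade_work / shader_units)


# Hypothetical chip: 10 units of die area, 2 of them spent on rasterization.
# Hypothetical frames: one dominated by tiny triangles, one by large ones.
tiny_dedicated = frame_time(60, 40, 2, 8, unified=False)
tiny_unified = frame_time(60, 40, 2, 8, unified=True, sw_penalty=2.0)
large_dedicated = frame_time(5, 95, 2, 8, unified=False)
large_unified = frame_time(5, 95, 2, 8, unified=True, sw_penalty=2.0)
```

With these made-up numbers the generic cores finish both frames sooner even though they pay double for rasterization, because nothing sits idle. A large enough penalty reverses the result, which is really what the disagreement is about.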
It's very similar to the vertex and pixel pipeline unification that took place several years ago. Applications were held back by the ratio of vertex and pixel pipelines. Unification fixed this and also enabled new uses. Programmable rasterization is one of the next steps to ensure you can throw almost anything at the GPU and have it processed efficiently.
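For anyone wondering what rasterization looks like as ordinary code, below is a minimal sketch using the well-known edge-function test over a bounding box. It is not how any particular GPU or driver does it; the function names are mine, it samples pixel centres only, and it skips the top-left fill rule, so pixels lying exactly on a shared edge would be drawn twice.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Signed area test: positive when P lies to the left of edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, width, height):
    """Return the set of (x, y) pixels whose centres the triangle covers."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return set()  # degenerate triangle covers nothing
    sign = 1 if area > 0 else -1  # accept either winding order

    # Clamp the triangle's bounding box to the render target.
    min_x = max(math.floor(min(x0, x1, x2)), 0)
    max_x = min(math.ceil(max(x0, x1, x2)), width)
    min_y = max(math.floor(min(y0, y1, y2)), 0)
    max_y = min(math.ceil(max(y0, y1, y2)), height)

    covered = set()
    for y in range(min_y, max_y):
        for x in range(min_x, max_x):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            if sign * w0 >= 0 and sign * w1 >= 0 and sign * w2 >= 0:
                covered.add((x, y))
    return covered


large = rasterize(((0, 0), (4, 0), (0, 4)), 4, 4)
tiny = rasterize(((0.1, 0.1), (0.3, 0.1), (0.1, 0.3)), 4, 4)
```

The tiny triangle goes through the whole setup and still covers no pixel centre, which is exactly the kind of work that tessellation multiplies.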
...ROP/compression logic (so you can basically rasterize a compressed MSAA buffer of arbitrary data) but it's not important to have fixed-function resolve. My guess is it will just be generalized to allow the programmable stuff to deal with the compressed data slightly more efficiently.
Framebuffer compression can also be handled perfectly well by programmable hardware. More local storage is needed though, and to make that available GPUs should reduce the number of threads they need to keep in flight by reducing execution latencies (Fermi needs to hide at least 24 cycles, Larrabee only 4 cycles, and that can be reduced further with out-of-order execution, which isn't all that expensive when you have very wide vectors).
Rasterizers, ROPs and even texture samplers can in time all be replaced by more generic cores.
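A back-of-the-envelope sketch of why latency and local storage are linked. It assumes an in-order core where each thread context issues one instruction and then waits out the full latency, so the number of contexts needed is roughly latency times issue rate. The 24 and 4 cycle figures are the ones quoted above; the 128 KiB storage budget is a hypothetical number picked only to make the division concrete.

```python
def contexts_needed(latency_cycles, issues_per_cycle=1):
    """Independent thread contexts required to keep the pipeline busy,
    assuming each context has one instruction in flight at a time."""
    return latency_cycles * issues_per_cycle


def storage_per_context(total_bytes, contexts):
    """Local storage each context gets if the budget is split evenly."""
    return total_bytes // contexts


LOCAL_STORAGE = 128 * 1024  # hypothetical per-core budget in bytes

long_latency = contexts_needed(24)   # 24-cycle latency quoted for Fermi
short_latency = contexts_needed(4)   # 4-cycle latency quoted for Larrabee

bytes_long = storage_per_context(LOCAL_STORAGE, long_latency)
bytes_short = storage_per_context(LOCAL_STORAGE, short_latency)
```

Under these assumptions the short-latency design leaves each context six times as much local storage from the same budget, and that headroom is what software framebuffer compression would need.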
nAo said:
With power consumption being the number one constraint these days it's likely fixed function HW will keep us company for a long long time..
While power consumption is definitely a big constraint, I don't think it's the number one constraint. You can't have too much dedicated hardware even when taking power consumption out of the equation, because the chip would just get too big (read: expensive). Performance/dollar still dominates performance/Watt.
Looking at the future evolution, peak performance/Watt will steadily improve with process technology, but effective performance/dollar improves more slowly. Cost determines the die size and power consumption determines the peak performance, but getting high effective performance requires high utilization.