The act of making something "generic", i.e. just a more fully programmable resource, will result in it consuming more power for the same performance. Basically you may see what looks like a good area trade-off in replacing a fixed-function unit with programmable units, but when you look at power it rarely proves to be a win.
It all depends on the utilization. When a dedicated component sits unused, the rest of the chip has to be clocked higher to achieve the same performance, which costs more power than if the entire chip could be put to work on the task. Note that reaching higher clock frequencies also requires raising the voltage, so the power consumption increase can be substantial. So dedicated hardware has to reach a certain utilization level before it becomes worthwhile, and growing software diversity keeps pushing the average utilization of non-generic components down.
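To put a rough number on that voltage note, here's a back-of-the-envelope sketch using the textbook dynamic power model; the assumption that voltage scales roughly linearly with frequency inside a normal DVFS range, and the 30% figure, are purely illustrative:

```python
# Back-of-the-envelope only: textbook dynamic power model P = a * C * V^2 * f,
# under the (rough) assumption that voltage must rise about linearly with
# frequency within a normal DVFS range. Numbers are illustrative, not from
# any real chip.

def dynamic_power(freq_scale, activity=1.0, capacitance=1.0):
    voltage = freq_scale              # crude assumption: V scales ~linearly with f
    return activity * capacitance * voltage ** 2 * freq_scale

baseline = dynamic_power(1.0)
boosted = dynamic_power(1.3)          # clock 30% higher to make up for idle units
print(boosted / baseline)             # ~2.2x the dynamic power for 1.3x performance
```

That roughly cubic relation is why utilization, and not just die area, decides whether a dedicated block actually pays off.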
Also note that making things programmable enables clever software optimizations. Today's applications take a lot of detours just to fit the graphics pipeline. It's utterly ridiculous that on modern GPUs you often get no speedup from using LOD techniques. I'd rather use that headroom for something more useful, and I'm not alone.
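(For anyone who hasn't run into the term: by LOD techniques I mean schemes roughly like the sketch below, where a coarser mesh is substituted as an object recedes from the camera, purely to cut geometry work. The mesh names and distance thresholds are made up for illustration.)

```python
# Minimal sketch of discrete level-of-detail (LOD) selection: draw a coarser
# mesh for objects that are further from the camera. Thresholds and mesh names
# are invented purely for illustration.

def select_lod(distance, lod_meshes, thresholds=(10.0, 30.0, 80.0)):
    """Return the index of the mesh to draw for a given camera distance."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return min(level, len(lod_meshes) - 1)
    return len(lod_meshes) - 1        # furthest objects get the coarsest mesh

lods = ["mesh_10000_tris", "mesh_2500_tris", "mesh_600_tris", "mesh_150_tris"]
for d in (5.0, 20.0, 50.0, 200.0):
    print(d, "->", lods[select_lod(d, lods)])
```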
Even simply using dynamic code generation with specialization can avoid a lot of useless work. "The fastest instruction is the one that never gets executed."
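As a trivial illustration of the kind of specialization I mean (CPU-side Python standing in for runtime shader or machine code generation; the blend routine is hypothetical): when a parameter is known at code-generation time, whole operations simply vanish from the specialized routine.

```python
# Hypothetical illustration of runtime specialization: when 'alpha' is known at
# generation time, the multiply and even the read of one input disappear from
# the generated routine -- the instruction that never gets executed.

def make_blend(alpha):
    """Generate a blend function specialized for a fixed alpha value."""
    if alpha == 1.0:
        return lambda src, dst: src          # fully opaque: no blending work at all
    if alpha == 0.0:
        return lambda src, dst: dst          # fully transparent: source never read
    return lambda src, dst: dst + (src - dst) * alpha   # general lerp, constant folded in

blend_opaque = make_blend(1.0)   # specialized: just returns src
blend_half = make_blend(0.5)     # general path with the constant baked in

print(blend_opaque(200, 50))     # 200 -- no arithmetic performed
print(blend_half(200, 50))       # 125.0
```

A real renderer would of course emit specialized shader or machine code rather than Python lambdas, but the principle is the same: constants known up front make entire chunks of work disappear.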
The unification of vertex and pixel shading is a bad example, as they were already both programmable units with very similar functionality, so it was logical to merge them.
Despite both being programmable, vertex and pixel processing were quite different (at the time of the unification). Vertices need high precision, they need streaming memory access, and they use lots of matrix transforms. Pixels can generally use lower precision, need lots of texture accesses, use exp/log for gamma correction, use interpolator data, etc.
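A caricature of the two workloads as they looked back then, with plain Python standing in for shader code just to show the shape of the work (this is not any real API, and the numbers are placeholders):

```python
# Vertex-style work: full-precision fp32 matrix math over a stream of attributes.
# Pixel-style work: texture fetches, interpolated per-pixel inputs, and cheap
# transcendentals such as the pow() (i.e. exp/log) used for gamma correction.
import math

def transform_vertex(position, mvp):
    """4x4 matrix transform of a homogeneous vertex position."""
    return [sum(mvp[row][col] * position[col] for col in range(4))
            for row in range(4)]

def shade_pixel(texture, u, v, interpolated_light):
    """Nearest-neighbour texture fetch, modulate by interpolated lighting,
    then apply a gamma curve."""
    texel = texture[int(v * (len(texture) - 1))][int(u * (len(texture[0]) - 1))]
    lit = min(max(texel * interpolated_light, 0.0), 1.0)
    return math.pow(lit, 1.0 / 2.2)          # gamma correction via pow (exp/log)

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(transform_vertex([1.0, 2.0, 3.0, 1.0], identity))
checker = [[0.2, 0.8], [0.8, 0.2]]
print(shade_pixel(checker, 0.9, 0.1, 0.7))
```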
Also from a power consumption point of view, unifying them certainly wasn't logical. The GeForce 7900 and GeForce 8800 had competitive performance for contemporary games, but the latter had a significantly higher power consumption.
Despite these things, they did unify vertex and pixel processing, and they never looked back. So the real reason for the unification was to enable new capabilities and new workloads. And nowadays developers are still pushing the hardware to its limits by using it for things it was hardly designed for. So while it might be hard to imagine exactly what they'll do with it, more programmability is always welcome. It's an endless cycle: hardware designers say that dedicated hardware is more power efficient, and software developers say they care more about the flexibility that allows more creativity.
Performance scales at a dazzling rate anyhow, so you might as well make sure it can be used for something more interesting than what was done with the last generation of hardware. Obviously fixed-function hardware can't be replaced with programmable hardware overnight, but surely by the end of this decade hundreds of TFLOPS will be cheap and power-efficient, and there will be no point in making such a chip more expensive with fixed-function hardware, or in crippling certain uses just to make it a bit cheaper.
Conversely, things like rasterisation are very clearly and well defined, are a perfect fit for fixed function, and will always be best performed in dedicated HW.
I disagree. Rasterization is evolving, and even the very idea of polygon rasterization is crumbling. There's a lot of research on micropolygons, ray-tracing, volumetric rendering, etc., but IHVs had better think twice before dedicating considerable die space to any one of them. Some games may love the ability to have fine-grained displacement mapping and advanced blur effects, others prefer accurate interreflections and advanced refraction effects, and others don't care about these things at all (and I'm not necessarily talking about GPGPU / HPC applications).
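To be concrete about what's being cast into silicon today: the classic half-space inner loop really is tiny and regular, which is exactly why it has mapped so well onto dedicated hardware so far. Here's a minimal scalar sketch (my own simplification, ignoring clipping, fill rules and sub-pixel precision); the open question is whether this particular loop stays at the centre of rendering.

```python
# Minimal scalar sketch of classic half-space (edge function) rasterization.
# Clipping, fill rules and sub-pixel precision are deliberately ignored.

def edge(ax, ay, bx, by, px, py):
    """Signed test: which side of the directed edge A->B the point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(v0, v1, v2, width, height):
    """Return the pixels whose centres fall inside a counter-clockwise triangle."""
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5                  # sample at pixel centres
            inside = (edge(*v1, *v2, px, py) >= 0 and
                      edge(*v2, *v0, px, py) >= 0 and
                      edge(*v0, *v1, px, py) >= 0)
            if inside:
                covered.append((x, y))
    return covered

print(len(rasterize((1, 1), (14, 2), (3, 13), 16, 16)))   # number of covered pixels
```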
You see, every application has some task that would be executed more efficiently with dedicated hardware, but you simply can't cater for all of them. It's more interesting overall to have a fully programmable chip than something which favors certain usage patterns.
When John Carmack first proposed floating-point pixel processing, certain people (including some from this very forum) practically called him insane. But those people (understandably) never envisioned that by now we'd have Shader Model 5.0 and OpenCL, still evolving toward longer and more complex code. So with all due respect, I think it would be really shortsighted to believe that ten years from now dedicated hardware will be as important as it is today.
Ultimately you only have to look at the already huge power consumption of modern desktop cards to see where excessive and completely unnecessary programmability is going to lead.
The existence of cards with very high power consumption doesn't mean those are the norm. It also doesn't mean the programmability is excessive. Nobody cares if Unreal Tournament 2004 could run faster and more efficiently with fixed-function hardware. People care about Crysis 2 and every other contemporary application, which depend heavily on the shading performance. The next generation of consoles will also spur the creation of more diverse games which demand generic computing cores.
The simple reality is that GPUs won't increase in theoretical performance as fast as they did before. They were only able to exceed Moore's Law because they played catch-up in process technology, aggressively increased die size, and made the GPU the hottest part of your system. The only way they're now able to significantly increase performance is by waiting for another process shrink to allow more transistors within a given power envelope. But do you want that to benefit particular applications, or do you want it to benefit all applications? So far programmability has always increased, and I don't see this ending any time soon...