But once you start implementing a large set of different algorithms in hardware, it eventually becomes more efficient to have a smaller number of programmable processors perform the same tasks. Granted, if you know exactly what your hardware will be doing when you design it, it makes the most sense to design it specifically for that task. But since graphics cards can be given many different tasks, it makes much more sense to build them more generalized.
After all, if we imagine that nVidia had done nothing more than take the TNT architecture and scale up its performance, we would currently have parts with around 32 pipelines. They would be beasts as far as performance was concerned, at least for basic 3D operations. But they wouldn't hold a candle to modern processors when it comes to rendering a believable scene, even if they had implemented a number of hardware-accelerated routines for shadows, lighting, bump mapping, etc.
Edit:
Put another way, even if you could make a clever, fully hard-wired solution for accelerating 3D graphics, there will always be a game developer who says, "I want to do this!" and the hard-wired solution will have no way to do it, or at least no way to do it remotely efficiently.
So, this is clearly where programmability comes in: developers have a much wider range of algorithms that they can realistically apply. What's more, 3D software development is advancing at a rapid pace, and it makes much more sense to build hardware that can adapt to changing software than to build hardware that must be redesigned every time the software changes.