I like the "one shader to rule them all" approach.
That doesn't mean you wouldn't have dedicated units for things like rasterization, texture fetch/filter, interpolation, hierarchical Z, FSAA, or what have you. It just means that data could be fed to and received from each of these units from a "central" processing location. Basically, the focus of the chip would be a large, flexible, and fast array of programmable elements, with dedicated hardware for tasks common and/or beneficial to most rendering pipelines.
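To make the idea a bit more concrete, here is a toy sketch in Python of how I picture the data flow. It is not how any real chip works. Every name in it (`UnifiedArray`, `rasterize`, `texture_fetch`, the "vertex"/"fragment" work kinds) is made up for illustration, and the "fixed-function units" are just plain functions standing in for dedicated hardware. The only point is that one central pool runs all the programmable work and hands data to, and gets data back from, the dedicated units.

```python
from collections import deque

# Hypothetical stand-ins for dedicated hardware units (assumptions, not real APIs).
def rasterize(span):
    """Toy rasterizer: emit one fragment per integer x in [x0, x1)."""
    x0, x1 = span
    return [{"x": x} for x in range(x0, x1)]

def texture_fetch(frag):
    """Toy texture unit: sample a 1D checker pattern at the fragment's x."""
    return {**frag, "texel": frag["x"] % 2}

FIXED_UNITS = {"raster": rasterize, "texture": texture_fetch}

class UnifiedArray:
    """Central pool of programmable elements; all work items pass through it."""

    def __init__(self):
        self.queue = deque()   # pending work of any kind (vertex or fragment)
        self.output = []       # shaded results as (x, color) pairs

    def submit(self, kind, data):
        self.queue.append((kind, data))

    def run(self):
        while self.queue:
            kind, data = self.queue.popleft()
            if kind == "vertex":
                # Programmable step: transform the span (here, shift it by 1),
                # then feed it to the dedicated rasterizer.
                span = (data[0] + 1, data[1] + 1)
                for frag in FIXED_UNITS["raster"](span):
                    # Rasterizer output comes back to the same central pool.
                    self.submit("fragment", frag)
            elif kind == "fragment":
                # Programmable step: ask the texture unit for a texel, then shade.
                frag = FIXED_UNITS["texture"](data)
                self.output.append((frag["x"], frag["texel"] * 255))
        return self.output

arr = UnifiedArray()
arr.submit("vertex", (0, 3))
print(arr.run())  # [(1, 255), (2, 0), (3, 255)]
```

The same queue services both kinds of work, so whichever stage happens to be the bottleneck gets the programmable elements, while rasterization and texture fetch stay in their own dedicated blocks.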