A quick response to the opening post before I read the rest (which will probably happen tomorrow or so):
It's extremely expensive to develop a new processor, while it's relatively easy to put more existing ones on a single die and interconnect them. And you would really have to develop a number of different processors to fill all the needed performance areas.
x86 processors have evolved by cramming many specialized sub-processors into that single, monolithic core as execution units. From an engineering point of view, it's much easier and cheaper (even though it wastes transistors and die area) to add more monolithic cores, whose specialized parts can be used as required. It's also a lot easier to develop when you have only one instruction set architecture to take into account.
From that perspective, it makes more sense to offer hardware support for running virtual machines on the otherwise unused cores, and have them all run the same programs, than to build specialized units.
So, expect a hybrid in the medium term: something like four x86 cores, plus some vector (stream) and other specialized units, an I/O, DMA and scheduling processor, and a GPU.