I'm going to hedge on the reverse coming to pass: the x86 manufacturers are trying to build lightweight hardware monitoring and virtualization into their chips, while OS vendors are looking to virtualize.
All parties want to make all those cores useful without constraining hardware evolution or fragmenting the software base.
A lightweight control layer and VM that allocates computation might be the final result.
(Minor quibble: there is sort of a driver for the SpeedStep functionality.)
I'll buy that. Having some sort of lower-level VMM certainly makes sense. Yet I wouldn't call that a "driver", either in the sense of something you add on to an operating system or in the sense of the pretty sophisticated software that translates high-level DX/OpenGL into hardware commands. Certainly virtualizing and managing parallelism is a real issue for future VMMs and operating systems. I personally am really interested to know more about this "Grand Central" technology that Apple is building into the next version of its OS.
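For what it's worth, what has been said publicly suggests Grand Central is exactly this kind of layer: the OS owns a pool of worker threads sized to the hardware, and applications submit blocks of work to queues instead of managing threads themselves. Here's a minimal sketch in C assuming a libdispatch-style queue API (dispatch_get_global_queue and dispatch_apply are real libdispatch entry points; the chunk count and the work per chunk are purely illustrative):

    #include <dispatch/dispatch.h>
    #include <stdio.h>

    int main(void) {
        /* A global concurrent queue; the runtime, not the app,
           decides how many cores actually service it. */
        dispatch_queue_t q =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

        /* Fan 16 chunks of a data-parallel job out over the queue;
           dispatch_apply returns once every chunk has finished. */
        dispatch_apply(16, q, ^(size_t i) {
            printf("chunk %zu done\n", i);
        });
        return 0;
    }

The interesting part is what the application doesn't do: it never asks how many cores exist and never spawns threads itself, so the same binary scales as core counts grow.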
I hope someone tries an ARM Larrabee, just for the comparison.
That is an interesting idea. Adding a big Larrabee-like vector unit to ARM makes lots of sense.
Atom's design was rumored to carry an estimated 15% transistor overhead just for x86 compatibility.
I'd believe that. When I asked some Intel designers about Atom, they told me they saw x86 as costing them extra transistors but a negligible amount in terms of power or performance. Basically, it came down to a fabrication cost issue, and Intel has the edge there.
Yet, Atom is clearly a disappointment. It is a chip aimed at a market that doesn't yet exist: something between a mobile PDA/phone/iPhone device and a full-blown laptop. Unless this new market segment takes off, I can't see Atom doing very well.
My question to Intel is: where is the really low-power x86 to compete with ARM? (But I guess that isn't really on-topic...)
Larrabee's design might reduce that percentage, but we wouldn't know without the comparison.
I'm sure x86 does cost Larrabee something. Without such a big vector unit, the overhead would likely have killed them. At least with big vectors, they are able to amortize the cost of x86 support over a pretty big vector and texture unit. Yet, I see diminishing returns beyond 16-element vectors, so if Larrabee is going to scale, it needs to scale in the number of cores. If I were designing Larrabee, I would have been tempted to rip out a lot of the legacy x86 stuff, but for some reason they really wanted all the Larrabee cores to be fully compatible x86 cores.
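To make the amortization argument concrete, here's a back-of-envelope sketch in C. The transistor budgets are invented purely for illustration (a fixed per-core cost for the legacy x86 front end, plus a cost per vector lane); the point is only the shape of the curve:

    #include <stdio.h>

    int main(void) {
        /* Hypothetical numbers, in millions of transistors:
           a fixed x86 legacy tax per core, plus a per-lane
           cost for the vector unit. */
        const double x86_fixed = 1.0;
        const double per_lane  = 0.5;

        for (int width = 1; width <= 32; width *= 2) {
            double core = x86_fixed + per_lane * width;
            printf("width %2d: x86 tax = %4.1f%% of the core\n",
                   width, 100.0 * x86_fixed / core);
        }
        return 0;
    }

With these made-up numbers, the fixed tax falls from about 67% of the core at one lane to about 6% at 32 lanes, which is why a wide vector unit makes x86 compatibility tolerable within a core. But scaling by adding cores pays the full tax again on every core, which is exactly the tension noted above.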