Let's start by dissecting an x86/64 CPU.
A byte stream of instruction code goes in, gets taken apart, and is scheduled onto multiple, different execution units. The execution units themselves don't run x86/64 instructions directly. More than that: the actual architecture of those execution units is quite different from what the semantics of the instruction code would make you believe. It's a virtual CPU.
So what does this virtual CPU consist of?
It starts with the JIT compiler (also known as the instruction decoder). Next comes the dispatcher, which forwards the instructions to the right execution unit. Functions (also known as complex instructions) are unrolled, or handed to the API (also known as the complex instruction unit, which uses microcode and can issue new instructions of its own). After that comes the scheduler, which assigns the instructions to the actual execution units. And those are basically the REAL processor cores.
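To make that pipeline concrete, here's a toy model in C: one complex "instruction" (add a register to a memory operand) is cracked into three micro-ops, which a trivial in-order scheduler then hands to separate "execution units". All names and structures here are invented for illustration; a real decoder and scheduler are vastly more involved.

[CODE]
#include <stdio.h>

typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } uop_kind;

static long memory_cell = 40; /* the memory operand */
static long reg = 2;          /* the register operand */
static long tmp;              /* internal scratch, invisible at the ISA level */

/* "JIT compiler": crack one complex instruction into simple micro-ops */
static int decode_add_mem_reg(uop_kind *out) {
    out[0] = UOP_LOAD;
    out[1] = UOP_ADD;
    out[2] = UOP_STORE;
    return 3;
}

/* "execution units": each case plays the role of one unit */
static void execute(uop_kind u) {
    switch (u) {
    case UOP_LOAD:  tmp = memory_cell; break; /* load unit  */
    case UOP_ADD:   tmp = tmp + reg;   break; /* ALU        */
    case UOP_STORE: memory_cell = tmp; break; /* store unit */
    }
}

int main(void) {
    uop_kind uops[3];
    int n = decode_add_mem_reg(uops);
    for (int i = 0; i < n; i++) /* "scheduler": here, strictly in order */
        execute(uops[i]);
    printf("memory_cell = %ld\n", memory_cell); /* prints 42 */
    return 0;
}
[/CODE]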
And the same goes for GPUs. While most of their instruction processing happens in a driver that runs on the CPU, they follow the same model. Both are virtual processors: the actual execution units sit inside a black box and are completely vendor-specific in their implementation.
If you want to run multiple tasks at the same time, there are basically two models to choose from: processes and threads. The main difference is that each process runs in its own virtual machine, while threads run in the same virtual machine as the parent process that spawned them. The interesting part is that virtual machine: different processes are completely separated from each other, and it doesn't matter one bit which processor or execution unit they run on.
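Here's a minimal POSIX sketch of that difference (compile with -pthread): a write made by a thread is visible to the code that spawned it, while the exact same write made by a forked process is not.

[CODE]
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;

static void *bump(void *arg) {
    (void)arg;
    counter++; /* thread: same address space as its parent */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, bump, NULL);
    pthread_join(t, NULL);
    printf("after thread:  counter = %d\n", counter); /* 1 */

    pid_t pid = fork();
    if (pid == 0) { /* child process: its own copy of the address space */
        counter++;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("after process: counter = %d\n", counter); /* still 1 */
    return 0;
}
[/CODE]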
The whole multitasking model of modern computers is to give each process the illusion that it runs on its own computer. Not only are the processors virtually cloned, but the memory and subsystems as well. Although there are, of course, I/O devices that everyone can access, sequentially.
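You can watch that illusion in action. In the sketch below (POSIX again), parent and child print the same virtual address for the same variable, yet see different values behind it, because each process gets its own (copy-on-write) page at that address.

[CODE]
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int value = 1;

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        value = 2; /* writes to the child's private copy of the page */
        printf("child:  &value = %p, value = %d\n", (void *)&value, value);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent: &value = %p, value = %d\n", (void *)&value, value);
    return 0;
}
[/CODE]

Both lines show the same address, but the child prints 2 and the parent still prints 1.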
If you put multiple processors on the same die, it's definitely much easier to just copy the cores. But would the processes notice if you simply spread out the execution units (the real processor cores) and had the dispatcher and scheduler handle the distribution? Or if you switched to a different kind of core and updated the JIT compiler? Of course not.
And adding a GPU is easy when it's running in a VM. Use its pipelines to run FP, MMX, 3DNow!, SSEx or whatever instructions make sense there. They're just more execution units. And the unified, scalar GPGPUs are ready for it.
The only potential problem I see here is that you basically have to embed an OS on the chip to make it happen. Like a VMware ESX server: it's a black box that runs its own invisible (Linux-derived) OS, on top of which you can run other OSes and/or applications. Do Intel and AMD want that hassle?
Then again, what else is there to do than add more of the same (and less useful) cores every process step? Much cheaper in R&D, yes, but with fast diminishing gains. And the whole industry is heading that way in either case.
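One way to put a number on those diminishing gains is Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the fraction of the work that can actually run in parallel. The sketch below assumes a generous p = 0.90; even then, doubling the core count soon buys almost nothing.

[CODE]
#include <stdio.h>

int main(void) {
    const double p = 0.90; /* assumed parallelizable fraction of the work */
    for (int n = 1; n <= 64; n *= 2)
        printf("%2d cores: speedup %.2fx\n", n, 1.0 / ((1.0 - p) + p / n));
    return 0;
}
[/CODE]

Even with an infinite number of cores, this tops out at 1 / (1 - p) = 10x.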