Lysander said:But then the X2 CPU is IBM's "PPE only" idea of a chip.
scificube said:I'm thinking of Deano Cleaver's comments about thinking of the PPE not as a single 3.2GHz core but as two 1.6GHz cores. If you only used one thread on the PPE, you would effectively be using a 1.6GHz CPU.
blakjedi said:I always wondered about that quote.
Wouldn't the XeCPU be 3.2/3 ≈ 1.07GHz per core then?
Wouldn't the Cell then be 3.2/8 = 400MHz per core?
randycat99 said:The whole 2x1.6GHz idea is predicated on 2 threads getting "equal time" on the actual execution hardware. A goes, then B, then A, and so on... Conceptually, it could operate anywhere between 2x1.6GHz and 1x3.2GHz at a given moment, depending on just how "hungry" a particular thread ends up being. You can design hardware for 2 threads or 2 "gazillion" threads, but it has always got to shoehorn into the same one core. There is no extraction of magical clock cycles of execution from thin air. You are simply maximizing usage of the finite handful of execution cycles that the hardware can provide.
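To put numbers on the "anywhere between 2x1.6GHz and 1x3.2GHz" point, here is a minimal Python sketch. It is only a back-of-envelope model: the function name `effective_clocks` and the idea of a single `share_a` fraction are assumptions made for illustration, not how the PPE actually schedules its threads. The 3.2GHz figure is the one quoted in the thread.

```python
CORE_CLOCK_GHZ = 3.2  # PPE clock quoted in the thread


def effective_clocks(share_a, core_clock_ghz=CORE_CLOCK_GHZ):
    """Split one core's cycles between two threads, A and B.

    share_a is the fraction of execution cycles thread A ends up using
    (0.0 to 1.0). This is a hypothetical knob for illustration only.
    """
    if not 0.0 <= share_a <= 1.0:
        raise ValueError("share_a must be between 0 and 1")
    a = core_clock_ghz * share_a
    b = core_clock_ghz - a  # whatever A does not use is all B can get
    return a, b


if __name__ == "__main__":
    for share in (0.5, 0.75, 1.0):
        a, b = effective_clocks(share)
        print(f"A share {share:.0%}: A ~ {a:.2f}GHz, B ~ {b:.2f}GHz, "
              f"total {a + b:.2f}GHz")
```

However the split falls, the two figures always add up to the one core's clock, which is the "no magical clock cycles from thin air" part.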
"What's left from the CELL deal" garbage.
Titanio said:I'm slightly confused by this, but it's an issue I've wondered about for a while so..
..what's the distinction between a core/cpu with support for 2 "hardware" threads vs a core/cpu that supports only 1 "hardware" thread but just switches between software threads? Does it just provide for faster switching between two threads or..?
randycat99 said:Functionally, they accomplish a very similar effect; it's just that the thread granularity is taken one step closer to the hardware when it comes to "hardware threads" (I guess "in the hardware" would be a better description). We are digging deeper into the hardware itself to recover "unused" execution cycles.
aaaaa00 said:A thread context switch is really expensive in software, but cheap in hardware.
You could (on a PC) burn thousands of cycles pretty easily on a software managed context switch between two threads in any of the modern OSes, but in hardware like a hyperthreaded P4, you can hardware context switch on the next cycle between two threads, because all the register state is duplicated on chip between them.
Basically, software context switching is a way to make the CPU appear to run more than one thing from the user's point of view, since humans operate on the 100+ millisecond time scale, and blowing thousands of cycles doesn't mean a whole lot on that time scale.
But hardware context switching is a technique to recover wasted execution cycles from the CPU's point of view by having something else immediately ready to run and use the idle execution units, because in the CPU's timescale it does care about wasting a few hundred cycles waiting for memory.
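A toy Python simulation of the argument above, for anyone who wants to play with the numbers. Everything here is an assumption for illustration: the function name `simulate`, the fixed pattern of 100 busy cycles followed by a 300-cycle memory stall, and the 2000-cycle cost for a software switch (the post only says "thousands of cycles" and "a few hundred cycles waiting for memory"). It does not model any real CPU or OS scheduler.

```python
def simulate(n_threads, compute, stall, switch_cost, total_cycles=100_000):
    """Return the fraction of cycles spent doing useful work.

    Every thread repeats: `compute` busy cycles, then a memory stall of
    `stall` cycles. When the running thread stalls, the core switches to
    another ready thread, paying `switch_cost` dead cycles per switch.
    """
    ready_at = [0] * n_threads  # cycle at which each thread can run again
    cycle = 0
    useful = 0
    current = 0
    while cycle < total_cycles:
        ready = [t for t in range(n_threads) if ready_at[t] <= cycle]
        if not ready:
            # Every thread is waiting on memory, so the core just idles.
            cycle = min(ready_at)
            continue
        nxt = current if current in ready else ready[0]
        if nxt != current:
            cycle += switch_cost  # dead cycles spent swapping thread state
            current = nxt
        cycle += compute
        useful += compute
        ready_at[current] = cycle + stall
    return useful / cycle


if __name__ == "__main__":
    # Assumed workload: 100 busy cycles, then a 300-cycle memory stall.
    single = simulate(1, compute=100, stall=300, switch_cost=0)
    hardware = simulate(2, compute=100, stall=300, switch_cost=0)
    software = simulate(2, compute=100, stall=300, switch_cost=2000)
    print(f"one thread, no switching:    {single:.1%} busy")
    print(f"two hardware threads:        {hardware:.1%} busy")
    print(f"two software threads (2000): {software:.1%} busy")
```

With these made-up numbers, the free switch fills part of the stall with the other thread's work, while paying thousands of cycles per switch to hide a stall of a few hundred makes things worse than not switching at all. That is why software switching only makes sense on human timescales.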
Titanio said:Which leads to the question, would you be ill-advised to use more threads in your application than you can keep "in hardware"?
When one thread is waiting because of a data/memory dependency, execution can switch to the other thread, thus keeping the execution units busy more of the time. But how exactly is this optimisation of idle units achieved, and what is the benefit of all this?
Nemo80 said:Almost right, though the latest CELL revision contains two full hardware VMX units, as can be seen from the die shots circulating on the internet. This means the CELL PPE can run 2 independent threads at full speed. The only limitation that counts is the shared L2 cache.
But the situation is even worse on Xenon. Its cores contain only one VMX unit each, enabling the Xbox 360 to do 3 "real" threads at once, only one more than the CELL PPE can handle (of course the Xenon VMX unit can do hyperthreading, which gives a little performance boost, but not much). What is much worse is that all these threads on Xenon block each other, because they all share one 1MB L2 cache, which is quite small for so many threads.
therealskywolf said:Yes, but it's VMX128. The cores in Xenon are considerably more robust than the single core in the PPE.