Nemo80 said:
Why do context switching when you have two independent hardware VMX units, each capable of executing one thread independently of the other (in contrast to Xenos)?
The problem is the CELL PPE has 2 hardware VMX units on the die whereas Xenos only has one per core, with the difference that the single Xenos VMX unit has 128 registers, while each of the CELL PPE's VMX units has its own 32 registers (=2x32). So there is no context switching. The only drawback of the CELL PPE is the shared L2 cache, but even that is much worse on 360 Xenos.
Was it ever confirmed that the Cell PPE had 2 VMX units? Maybe two sets of registers, but 2 VMX units would imply nearly double the FP performance that STI has quoted for the PPE to date. If they could quote more FP performance I'm sure they would.
(Its performance would be (8+8+4) * 3.2GHz, vs the presumed (8+4) * 3.2GHz, no?)
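For what it's worth, here is that arithmetic as a quick Python sketch. The per-unit figures (8 flops/cycle for a VMX unit, 4 for the scalar FPU) are just the assumptions behind the 8+8+4 and 8+4 sums above, not confirmed specs, and `peak_gflops` is a made-up helper name.

```python
CLOCK_MHZ = 3200  # 3.2 GHz, expressed in MHz so the multiplication stays in integers


def peak_gflops(flops_per_cycle_by_unit, clock_mhz=CLOCK_MHZ):
    """Theoretical peak in GFLOPS: sum of per-unit flops/cycle times the clock."""
    return sum(flops_per_cycle_by_unit) * clock_mhz / 1000


# Assumed per-cycle figures: 8 per VMX unit, 4 for the scalar FPU.
two_vmx = peak_gflops([8, 8, 4])  # hypothetical PPE with two VMX units
one_vmx = peak_gflops([8, 4])     # the presumed single-VMX configuration

print(f"two VMX units: {two_vmx} GFLOPS")
print(f"one VMX unit:  {one_vmx} GFLOPS")
print(f"ratio:         {two_vmx / one_vmx:.2f}x")
```

Under those assumed figures that comes out to 64 vs 38.4 GFLOPS, so about 1.67x rather than a full doubling.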
scificube said:
H
2. It's more related to having finite execution resources. There really aren't enough there for two threads to fire off at 3.2GHz. There is no restriction on what resources a thread can use beyond whether they are available or not... so in essence it may be impossible to ever have enough execution resources if one makes their code greedy... is this correct? Makes sense to me, because if you did elect to use one thread... why shouldn't you have access to all available resources?
...
If it's just faster context switching I still see the value in the optimization, but I feel I've been duped into thinking a lot more work could be done than really could be with these CPUs.
I've similar thoughts, although I don't quite feel "duped" because I just never bothered to look into it myself.
My lingering question is whether, if one thread isn't using an execution unit, the other can use it simultaneously. Of course, having one thread need it when the other doesn't could be tricky.
This is all quite elementary, and I probably learned it in a class at one point. I feel quite embarrassed to be unsure about it.
As far as I can tell, and this is wholly academic of course, if you switched between 2 threads evenly on a 3.2GHz CPU, they'd each get 1.6 billion cycles per second to work with. They are sharing resources (sans any clarification on whether a second thread can use an idle execution unit). Of course, that's not necessarily just like a 1.6GHz CPU: if each thread spent the same proportion of time blocked on the 1.6GHz CPU as on the 3.2GHz CPU, say half, that'd be half the cycles again. Then again, threads may not spend half their time being blocked; a thread could be waiting for a while on an SMT processor even if it's ready to go. In which case, you could indeed be better off running your code on two separate 1.6GHz CPUs.
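To put rough numbers on the even-split case, a minimal Python sketch. It only models a strict 50/50 split of cycles plus a fixed blocked fraction; it says nothing about how the PPE or the 360 cores actually schedule threads, and `useful_cycles_per_second` is a hypothetical helper.

```python
CLOCK_HZ = 3_200_000_000  # 3.2 GHz


def useful_cycles_per_second(clock_hz, threads, blocked_fraction=0.0):
    """Cycles per second a thread can actually use, assuming the core's cycles
    are split evenly between threads and each thread spends `blocked_fraction`
    of its own share stalled."""
    share = clock_hz // threads           # even split of the core's cycles
    return share * (1 - blocked_fraction)


# Two threads alternating evenly, never blocked: 1.6 billion cycles each.
print(useful_cycles_per_second(CLOCK_HZ, threads=2))

# Same split, but each thread is blocked half the time: half the cycles again.
print(useful_cycles_per_second(CLOCK_HZ, threads=2, blocked_fraction=0.5))
```

That's 1.6 billion cycles per thread in the first case and 0.8 billion in the second. The toy model can't capture the part about a ready thread still having to wait its turn, which is exactly where two separate cores could come out ahead.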