VMX Units on the X360 CPU

Well that clears that up quite a bit. :D

I swear there's been so much noise over this that it's crazy. So, the EUs in the cores are not set up like pipes and it really is simply SMT for the sake of avoiding pipeline bubbles? Thus, what people and press have bandied about as the 6 (X360) versus 9 (PS3) simultaneous threads is, in actuality, 3 versus 8 (assuming a Cell operating at full efficiency).

Thanks! :D
 
Acert93 has a good description of what XeCPU really is. It's 3 cores with hyperthreading (or as IBM calls it, SMT - Simultaneous Multi-threading). You will never get 6 full threads of performance from the XeCPU. For every core you add, performance scaling can be quite high, as SMP configurations can often yield up to 1.8-1.9x scaling (for very good multi-threaded code). If you study SMT/hyperthreaded cores, you will see that you typically get a 1.1-1.5x performance speedup from the 2nd thread with very good multi-threaded code. So net, individual core threads offer more performance scaling than SMT/hyperthreaded threads.

So think of XeCPU as 3 core threads + 3 SMT threads. CELL on the other hand is 8 core threads (1 PPE + 7 SPEs) + 1 SMT thread (from the PPE).
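The scaling argument above can be put in rough numbers. This is a back-of-the-envelope sketch, not a benchmark: the scaling factors (~0.9x per additional full core, ~0.3x for a core's second SMT thread) are illustrative assumptions taken from the ranges quoted above, the function name is made up, and whether the SPEs count as full cores is exactly what's being debated.

```python
# Toy model of "core threads vs SMT threads".
# Assumed factors (not measurements): each extra full core adds ~0.9x
# (good SMP scaling), each second SMT thread on a busy core adds ~0.3x
# (a typical hyperthreading gain for well-threaded code).

def effective_throughput(full_cores, smt_threads, core_scale=0.9, smt_gain=0.3):
    """Estimate throughput in 'single-core units' for an SMP+SMT CPU."""
    return 1.0 + (full_cores - 1) * core_scale + smt_threads * smt_gain

xecpu = effective_throughput(3, 3)  # 3 cores, each with a 2nd SMT thread
cell = effective_throughput(8, 1)   # 1 PPE + 7 SPEs, SMT on the PPE only
print(xecpu, cell)                  # ~3.7 vs ~7.6 under these assumptions
```

Under these made-up factors the gap is roughly 3.7 vs 7.6 "single-core units", which is why counting 6 vs 9 raw hardware threads overstates XeCPU's side of the comparison.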
 
dcforest said:
Acert93 has a good description of what XeCPU really is. It's 3 cores with hyperthreading (or as IBM calls it, SMT - Simultaneous Multi-threading). You will never get 6 full threads of performance from the XeCPU. For every core you add, performance scaling can be quite high, as SMP configurations can often yield up to 1.8-1.9x scaling (for very good multi-threaded code). If you study SMT/hyperthreaded cores, you will see that you typically get a 1.1-1.5x performance speedup from the 2nd thread with very good multi-threaded code. So net, individual core threads offer more performance scaling than SMT/hyperthreaded threads.

So think of XeCPU as 3 core threads + 3 SMT threads. CELL on the other hand is 8 core threads (1 PPE + 7 SPEs) + 1 SMT thread (from the PPE).
So you're saying that the cell really has 8 cores?
 
As far as I get it, I'd rather say that X360 has 3 cores and 3 co-processors, whereas Cell has one core and 8 co-processors. Assuming that the SPUs can't execute FP and integer calculations simultaneously :)

But it's still a bit messed up... for example, the components described as "FPU" and VMX are the same in both X360's and PS3's PowerPC cores, right?
 
archie4oz said:
It's 3 cores with hyperthreading (or as IBM calls it SMT - Simultaneous Multi-threading).

SMT is the 'clinical' name. Hyperthreading is Intel's marketing name...

Yes, I know that. But more people know of the feature as Hyperthreading due to Intel's marketing....
 
From what can be accomplished on an SPE, I'd lean toward calling it a core. Granted the PPE is a bit of a master core, but the SPEs do have a degree of independence.

Where are you getting the 3 coprocessors for the X360, L-Y?
 
But, so does each SPE - which, since the SPE has EUs other than the VMX, would make the SPEs 1 core and 1 coprocessor each.

I think based on how stuff is processed - the VMX should be included in the core definition (from my understanding it's not like the EE's VUs).
 
Last time I read about it, the SPU was an evolution of the PS2's VUs... where's some hard info on this?
 
In a lot of ways, it is an evolution of the VU, primarily in its relationship to the PPE. That said, its functionality has really been expanded beyond that and it has a much greater purpose. Debate rages on about how general-purpose they are, but I lean toward the more GP side, which is why I call them cores (that, and Sony has referred to them as such).

(anyhow - gotta split - have a nice weekend)
 
twotonfld said:
In a lot of ways, it is an evolution of the VU, primarily in its relationship to the PPE. That said, its functionality has really been expanded beyond that and it has a much greater purpose. Debate rages on about how general-purpose they are, but I lean toward the more GP side, which is why I call them cores (that, and Sony has referred to them as such).

(anyhow - gotta split - have a nice weekend)

So you're saying that an SPE has as much functionality as a PPE. What do you need PPEs for in that case?
 
That's what I don't get either. Ars Technica's article is very certain that the SPEs aren't compatible with the VMX in any way; otherwise their DP processing ability wouldn't be so limited, for example.
Actually, the author also suggests that the PPE's VMX unit isn't as good as the one in the G5 CPU, either. The in-order execution certainly makes it inferior...
 
A VMX unit is like SSE, or 3D Now. Just another execution unit, next to the integer, floating point and load/store ones. It's only a different ALU that can execute a specific kind of instructions in the CPU core.

The CPU can execute, say, two instructions at the same time, which will use any two of the available execution units. So, an integer and a VMX, or an integer and a floating point, etc.

That's all.

Edit: if you look at it that way, an SPE is just another processor, that can execute two instructions at the same time, just like the PPC cores.
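The "just another execution unit" view above can be sketched with a toy scheduler. Everything in it is a simplifying assumption for illustration, not a model of the actual PowerPC pipelines: the unit names, the dual-issue width, and the greedy in-order packing rule are all invented here.

```python
# Toy dual-issue model: VMX is just one more execution unit next to the
# integer, FP and load/store units. The core can issue any two
# instructions per cycle, as long as they need different (free) units.

def schedule(instructions, issue_width=2):
    """Greedy in-order issue: pack instructions (tagged by the unit they
    need) into cycles, at most issue_width per cycle and at most one
    instruction per unit per cycle."""
    cycles = []
    current, used = [], set()
    for unit in instructions:
        if unit in used or len(current) == issue_width:
            cycles.append(current)          # start a new cycle
            current, used = [], set()
        current.append(unit)
        used.add(unit)
    if current:
        cycles.append(current)
    return cycles

# An integer op and a VMX op can pair in one cycle; two back-to-back
# VMX ops cannot, since there is only one VMX unit.
print(schedule(["int", "vmx", "vmx", "fp", "load", "int"]))
# [['int', 'vmx'], ['vmx', 'fp'], ['load', 'int']]
```

Six instructions retire in three cycles here precisely because consecutive instructions happen to want different units, which is the sense in which VMX is "only a different ALU" rather than a separate processor.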
 
dcforest said:
Acert93 has a good description of what XeCPU really is. It's 3 cores with hyperthreading (or as IBM calls it SMT - Simultaneous Multi-threading).
Cell's PPE (and possibly the XeCPU cores) is fine-grained SMT, while Hyper-Threading is out-of-order and has 2 sets of architectural state, so they are different and HT is richer (it's almost dual-core when decoding).
 
Acert93 said:
twotonfld said:
According to MS's specs there are 2 HW threads per core which the majority of the net is taking to mean 6 threads processed per cycle. I think it's even been mentioned at B3D.

That is where the public has misinterpreted what 6 HW threads is.

XeCPU does have 6 HW threads, but they are not executed on concurrent cycles. Basically think of it as 3 P4's with hyperthreading: 3 real cores, each core capable of 2 threads in hardware. As Guden said, if it could execute both threads at the same time on the same cycle it would be a 6-core processor! Basically, threads != cores. A lot of confusion right now, so don't worry. After fall it won't matter anymore ;)

SMT on something like a Pentium 4 does allow simultaneous execution of more than one thread at any given time: it allows the processor to issue instructions from two threads on the same cycle, and this is why Intel touts it as improving efficiency and IPC. It is not just about reducing time spent alternating execution from thread to thread: why do you think the EV8 engineers made such a big deal about SMT to feed an 8-way monster like Aranha/EV8?

Say your thread is not stalling because of mispredicts or cache misses; it has lots of work to do, but lots of work without much parallelism (lots of dependent instructions, so we execute only one instruction per cycle, leaving tons of units idle). Say that the CPU running that thread is capable of executing 3 or 4 instructions per cycle and is currently executing an average of 1.5 instructions per cycle... say that we could pick an instruction from another thread and issue it to a free/idle execution unit. The actual issue phase might be delayed by a cycle or two, but execution definitely overlaps, which is why we tend to replicate some CPU logic to support SMT.
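The slot-filling argument can be illustrated with a contrived counter. Assume a 3-issue core and threads whose dependency chains limit each of them to 2 issuable instructions per cycle; the issue width and per-thread IPC are invented numbers, chosen only so the result lands in the 1.1-1.5x SMT range quoted earlier in the thread.

```python
# Sketch of SMT slot-filling: one dependency-limited thread leaves issue
# slots idle every cycle; a second thread's independent instructions can
# be issued into those idle slots on the SAME cycle.

ISSUE_WIDTH = 3  # assumed issue width, for illustration only

def run(threads, cycles):
    """Each entry in `threads` is that thread's dependency-limited ILP
    (max issuable instructions per cycle). Count instructions issued."""
    issued = 0
    for _ in range(cycles):
        slots = ISSUE_WIDTH
        for ipc in threads:
            take = min(ipc, slots)  # fill whatever slots remain
            issued += take
            slots -= take
    return issued

print(run([2], 100))     # one thread:  200 instructions in 100 cycles
print(run([2, 2], 100))  # two threads: 300 instructions in 100 cycles
```

With these made-up numbers the second thread lifts throughput from 200 to 300 instructions per 100 cycles, a 1.5x gain: both threads really do execute on the same cycle, which is the point being argued against the "fast task switching only" reading of HT.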
 
Panajev2001a said:
SMT on something like a Pentium 4 does allow simultaneous execution of more than one thread at any given time: it allows the processor to issue instructions from two threads on the same cycle, and this is why Intel touts it as improving efficiency and IPC. It is not just about reducing time spent alternating execution from thread to thread: why do you think the EV8 engineers made such a big deal about SMT to feed an 8-way monster like Aranha/EV8?

Say your thread is not stalling because of mispredicts or cache misses; it has lots of work to do, but lots of work without much parallelism (lots of dependent instructions, so we execute only one instruction per cycle, leaving tons of units idle). Say that the CPU running that thread is capable of executing 3 or 4 instructions per cycle and is currently executing an average of 1.5 instructions per cycle... say that we could pick an instruction from another thread and issue it to a free/idle execution unit. The actual issue phase might be delayed by a cycle or two, but execution definitely overlaps, which is why we tend to replicate some CPU logic to support SMT.

AFAIK, a CPU with HyperThreading can have instructions of two separate threads in the pipeline at the same time, when it has just switched tasks. Which is the only thing it does better than a CPU without HT: switching tasks with minimal overhead. It cannot run two tasks at the same time.
 
Hyper-Threading Technology provides thread-level-parallelism (TLP) on each processor resulting in increased utilization of processor execution resources. As a result, resource utilization yields higher processing throughput. Hyper-Threading Technology is a form of simultaneous multi-threading technology (SMT) where multiple threads of software applications can be run simultaneously on one processor. This is achieved by duplicating the architectural state on each processor, while sharing one set of processor execution resources. Hyper-Threading Technology also delivers faster response times for multi-tasking workload environments. By allowing the processor to use on-die resources that would otherwise have been idle, Hyper-Threading Technology provides a performance boost on multi-threading and multi-tasking operations for the Intel NetBurst® microarchitecture.

http://www.intel.com/technology/hyperthread/

Intel disagrees, DiGuru:

http://or1cedar.intel.com/media/training/intro_ht_dt_v1/tutorial/index.htm

This is one of the many times Intel emphasizes how functional units' utilization per cycle is improved: if reduced thread-switching time were the only thing HT provided, then they might as well call it SoEMT. How would the kind of SMT that only saves time by reducing thread-switching time do ANYTHING to improve IPC?

High processor utilization rates. One processor with two architectural states enables the processor to more efficiently utilize execution resources. Because the two threads share one set of execution resources, the second thread can use resources that would be otherwise idle if only one thread was executing. The result is an increased utilization of the execution resources within each physical processor package.

http://www.intel.com/business/bss/products/hyperthreading/server/index.htm
 
DiGuru, what Pana says is basically it.
Intel's HT is essentially an orthogonal mechanism layered over the concept of super-scalarity (which nowadays is not even 'scalar' anymore : ) HT allows concurrent ops (in super-scalar terms) to operate in different contexts. Of course, it's not even ops but rather uops that exhibit this concurrency. In comparison to super-scalarity, though, HT has a kind of diminishing-returns aspect, higher non-determinism: you can exploit super-scalar behaviour much more easily than hyper-threading; the latter is more a matter of statistics than of planned effect ...but you can't beat the odds, they say : )
 