Crytek on PS3/X360 (+ more - great read)

Shifty Geezer said:
For twice the VMX performance! XeCPU can have one VMX instruction running at a time, switching between threads. PPE can have 2 instructions running, presumably either one per thread or two dual-issued on the one thread.

But the information about the USE of dual VMXs never states that. It states that the second VMX reduces latency by essentially always having the next vector operation completed without waiting for the register to clear. XeCPU seems to accomplish the same thing but in a single VMX unit.
 
blakjedi said:
But the information about the USE of dual VMXs never states that. It states that the second VMX reduces latency by essentially always having the next vector operation completed without waiting for the register to clear. XeCPU seems to accomplish the same thing but in a single VMX unit.

Still, something has to be there making the Cell implementation the faster one.
 
blakjedi said:
But the information about the USE of dual VMXs never states that. It states that the second VMX reduces latency by essentially always having the next vector operation completed without waiting for the register to clear. XeCPU seems to accomplish the same thing but in a single VMX unit.
That'd suggest alternating instructions between VMX units, in which case, if that has a benefit, how can a single VMX unit match that? If IBM can keep the same speed increase with one VMX as two, surely they'd do so? Unless there's some magic and the IP is MS's, that doesn't seem overly plausible to me.
 
Let's not forget his comment that the PS3 implementation is "slightly better than hyper threading".

This isn't a big deal: if the X360 is 1.5x, maybe the PS3 is 1.6x. Does it really matter? The difference is not much.

More interesting than that, IMO, is the reference to a 50% performance increase from the extra thread. That's more than I would have thought.
 
scooby_dooby said:
Let's not forget his comment that the PS3 implementation is "slightly better than hyper threading".

This isn't a big deal: if the X360 is 1.5x, maybe the PS3 is 1.6x. Does it really matter? The difference is not much.

More interesting than that, IMO, is the reference to a 50% performance increase from the extra thread. That's more than I would have thought.

It may not be that much, or it might be more. It really depends on the implementation and what kind of code is being run. From the context of the sentence, my guess is that 1.5x is just a number he pulled out of his ass to imply that you wouldn't be seeing 2x performance from 2 hardware threads. If the implementation is SOEMT, I really tend to wonder if 50% is perhaps giving them the benefit of the doubt.

With Cell, it really depends on what they are doing. If Shifty is correct about 2 VMX execution units it could provide a nice improvement in VMX performance, but I don't think we've established this is the case yet?

Nite_Hawk
 
Shifty Geezer said:
That'd suggest alternating instructions between VMX units, in which case, if that has a benefit, how can a single VMX unit match that? If IBM can keep the same speed increase with one VMX as two, surely they'd do so? Unless there's some magic and the IP is MS's, that doesn't seem overly plausible to me.

Wouldn't you need 4 VMX units or 4 sets of registers to accomplish this with two independent threads?
 
blakjedi said:
SMT on the POWER5, where the hardware support comes mainly in the form of duplicated registers."

Ars is incorrect. There is a lot of support throughout the pipeline in Power5 to support the SMT execution as well as a general upgrading of functionality and resources throughout the pipeline to further optimize it for SMT operations.

Now what I'm trying to understand is: if a single VMX-128 unit on Xenon can have two sets of registers (one for each context), then why does Cell PPE have two full VMX units?

Does it? Some people have asserted this, but there really isn't anything to corroborate it from anyone who really knows. In addition, 2 VMX units would be a waste unless the issue width of the processor was increased. Given that the PPE is AFAWK 2-issue, dual VMX would have marginal real-world performance improvements at best over a 1 VMX design.

Aaron Spink
speaking for myself inc.
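
The issue-width argument above can be sketched with a toy model. This is only an illustration under loud assumptions: a strictly in-order core that issues up to two consecutive instructions per cycle, one load unit, one "other" unit, and a configurable number of vector units. The function name `cycles` and the instruction classes "V", "L" and "I" are made up for this sketch and do not describe the real PPE or XeCPU pipelines.

```python
def cycles(stream, vmx_units, issue_width=2):
    """Count cycles for a toy in-order core to issue every instruction in `stream`.

    `stream` is a string of instruction classes: "V" (vector math),
    "L" (load/store), "I" (anything else). Each cycle issues up to
    `issue_width` consecutive instructions and stops early when the next
    instruction needs a unit class that is already fully used this cycle.
    """
    capacity = {"V": vmx_units, "L": 1, "I": 1}  # units per class (assumed)
    i, n_cycles = 0, 0
    while i < len(stream):
        used = {"V": 0, "L": 0, "I": 0}
        issued = 0
        while i < len(stream) and issued < issue_width:
            kind = stream[i]
            if used[kind] == capacity[kind]:
                break  # structural hazard: this instruction waits a cycle
            used[kind] += 1
            issued += 1
            i += 1
        n_cycles += 1
    return n_cycles


mixed = "LV" * 4   # a load paired with a vector op, repeated
pure = "V" * 8     # back-to-back vector math only

for name, stream in (("mixed", mixed), ("pure", pure)):
    print(name, cycles(stream, vmx_units=1), cycles(stream, vmx_units=2))
```

In this model the second vector unit only helps when two vector ops sit next to each other in the stream. For load-plus-math code the two issue slots are already full with one vector unit, so the cycle count does not change.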
 
aaronspink said:
Um, why? AKA why do you think the PPE will be faster?

I think he's referring to Crytek's implication that the PPE was "somewhat better" in terms of threading (whether that means faster or not is another issue).

I'm sceptical of the 2 VMX unit claims too though - surely that would boost theoretical floating point peaks over the announced figures, and I'm sure Sony wouldn't have shied away from including that in their spec.
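
As a rough check on the peak-figure argument, here is a back-of-the-envelope sketch. It assumes a 3.2 GHz clock (not stated in this thread), 4 single-precision lanes per 128-bit vector, and one fused multiply-add per lane per cycle counted as 2 flops. The helper `peak_gflops` is a made-up name, and the doubling only holds if both units could really issue a math op every cycle.

```python
def peak_gflops(clock_ghz, vector_units, lanes=4, flops_per_lane=2):
    """Theoretical peak, assuming every unit retires one multiply-add per lane per cycle."""
    return clock_ghz * vector_units * lanes * flops_per_lane


CLOCK_GHZ = 3.2  # assumed clock, not taken from this thread

one_unit = peak_gflops(CLOCK_GHZ, vector_units=1)
two_units = peak_gflops(CLOCK_GHZ, vector_units=2)
print(f"1 VMX unit: {one_unit:.1f} GFLOPS, 2 VMX units: {two_units:.1f} GFLOPS")
```

Under those assumptions a second fully usable unit would double the PPE's vector contribution to any quoted peak, which is why its absence from the announced numbers looks telling.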
 
scooby_dooby said:
Let's not forget his comment that the PS3 implementation is "slightly better than hyper threading".

This isn't a big deal: if the X360 is 1.5x, maybe the PS3 is 1.6x. Does it really matter? The difference is not much.

More interesting than that, IMO, is the reference to a 50% performance increase from the extra thread. That's more than I would have thought.


There are two different translations in this thread, one saying 'somewhat better', the other stating 'slightly better'.

It seems there is enough of a difference for him to state so.
 
CrazyAce's post, above, provides all the evidence needed for VMX operation. The PPE cannot dual-issue math operations within the vector scalar unit. Dual-issues involving, e.g., a load and a math op in the VSU are possible, though.

etc.

Jawed
 
Wasn't the theory behind 2 VMX units based on the die pics, comparing V1 to V2 of Cell? Crazyace's post suggests it's more akin to the dual-issue of the SPUs (where you can run an int and a float concurrently), but then again, is that diagram from version 2? :???:

 
aaronspink said:
Does it? Some people have asserted this, but there really isn't anything to corroborate it from anyone who really knows. In addition, 2 VMX units would be a waste unless the issue width of the processor was increased. Given that the PPE is AFAWK 2-issue, dual VMX would have marginal real-world performance improvements at best over a 1 VMX design.

The 'evidence' for 2 VMX units seems to be the die shots of DD1.0 and DD2.0.

However, this is just wishful thinking; most people don't know WHAT DD1.0 was or, more to the point, what its implementation of VMX was like.
 