Crytek on PS3/X360 (+ more - great read)

scooby_dooby said:
I thought hyperthreading shared one set of execution values, while the X360 cores have two sets?

"This is achieved by duplicating the architectural state on each processor, while sharing one set of processor execution resources. "
http://www.intel.com/technology/hyperthread/

I don't think anyone is claiming that the 2 threads double the performance.

The state is what you need to load into registers etc. to run the thread. A dual threaded CPU keeps these available at all times so switching is fast - the reason switching is slow on non-threaded CPUs is because you have to save out one thread's state and load the other's. It's not a duplication of execution hardware or the like.
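
To make the "state" point concrete, here is a minimal Python sketch. It is a toy model, not a description of any real CPU: the names `ThreadState`, `SingleContextCore`, `DualContextCore` and `alternate`, and the 64-cycle save/restore cost, are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadState:
    """Architectural state of one thread: program counter plus registers."""
    name: str
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)


class SingleContextCore:
    """One set of architectural state: a switch means save one, load the other."""
    SAVE_RESTORE_COST = 64  # hypothetical cycles, not a measured figure

    def __init__(self):
        self.resident = None
        self.switch_cycles = 0

    def run(self, thread):
        if self.resident is not thread:
            if self.resident is not None:
                self.switch_cycles += self.SAVE_RESTORE_COST
            self.resident = thread
        thread.pc += 1  # stand-in for executing one instruction


class DualContextCore:
    """Two resident sets of state, still only one set of execution units."""

    def __init__(self, t0, t1):
        self.contexts = (t0, t1)
        self.switch_cycles = 0  # nothing is ever spilled or reloaded

    def run(self, thread):
        if not any(thread is ctx for ctx in self.contexts):
            raise ValueError("thread has no hardware context on this core")
        thread.pc += 1


def alternate(core, a, b, steps):
    """Run a and b in strict alternation; return cycles lost to switching."""
    for i in range(steps):
        core.run(a if i % 2 == 0 else b)
    return core.switch_cycles
```

In this model both cores execute the same number of instructions; the only difference is the cycles spent moving state in and out, which is the point being made above.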
 
scooby_dooby said:
I thought hyperthreading shared one set of execution values, while the X360 cores have two sets?

"This is achieved by duplicating the architectural state on each processor, while sharing one set of processor execution resources. "
http://www.intel.com/technology/hyperthread/

I don't think anyone is claiming that the 2 threads double the performance.

"....while sharing one set of processor execution resources. "

Unless the resources double (and that would mean dual cores with one thread each), 2 threads sharing one core never deliver 2x the performance.
But of course we've known that for ages...
 
So what's the benefit of having two mirrored execution sets, as opposed to the hyperthreading approach of a single set?

Also, Intel has stated they saw a 15%-30% performance increase using hyperthreading, but that was on an OOO CPU. Would an IO CPU benefit much more from hyperthreading than an OOO one, since OOO already does a much better job of keeping the CPU busy?

Seems to me like hyperthreading should be the solution to getting decent performance out of an IO core. When it stalls out... switch threads. Isn't this what the extra set of execution units is supposed to help with?

Otherwise, why did they even put them in there?
 
Jawed said:
They're saying that it "performs like 1.5 threads", i.e. is losing efficiency.
Completely the wrong way to look at "HyperThreading" type processing, as it should be gaining efficiency. No matter how you cut it, this has the capabilities of three processor cores. If each of those cores were running a single thread, inevitably the processors would not be able to run at peak utilisation 100% of the time. Having two threads in context, in theory, means that where thread one isn't making good utilisation of the hardware in a single core, the second thread is there to make use of the available processing "slots". Effectively it's increasing the processing capability by making more efficient use of the available processing time/execution units.

http://www.beyond3d.com/reviews/intel/p4306/index.php?p=2
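
A rough Python sketch of the "slots" idea, under loudly stated assumptions: a hypothetical core with 2 issue slots per cycle, made-up per-cycle traces of how many instructions each thread could issue, and no carry-over of instructions that miss their slot (a real core would hold them for later). The function name `slots_used` is invented for this example.

```python
def slots_used(width, *ready_per_cycle):
    """Count issue slots filled over a run.

    width: issue slots offered per cycle (hypothetical value).
    ready_per_cycle: one list per hardware thread, giving how many
    instructions that thread could issue in each cycle.
    """
    used = 0
    for cycle in zip(*ready_per_cycle):
        free = width
        for ready in cycle:  # earlier threads get first pick of the slots
            free -= min(free, ready)
        used += width - free
    return used


thread_a = [2, 0, 1, 0, 2, 0]  # made-up trace: stalls on odd cycles
thread_b = [1, 2, 1, 2, 0, 1]  # made-up trace for the second thread

total = 2 * len(thread_a)
alone_a = slots_used(2, thread_a)
alone_b = slots_used(2, thread_b)
shared = slots_used(2, thread_a, thread_b)
print(f"A alone: {alone_a}/{total}, B alone: {alone_b}/{total}, "
      f"both in context: {shared}/{total}")
```

With these particular traces neither thread fills the core on its own, while the two together leave only one slot empty. Different traces would give different numbers, including no gain at all when both threads want the same slots.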
 
If Cell's PPE uses 2 full HW threads, that kinda means that the PPE is divided into 2 cores... 1.6GHz x 2 in itself.

That doesn't mean a Xenon core is slower... it just means that the core is working at the full 3.2GHz, at double the speed, not divided, meaning it can't do 2 full HW threads; it does more of a Hyperthreading thing.

Same as comparing 2 Pentium 4s at 1.6GHz vs. 1 Pentium 4 at 3.2GHz with HT. Which one is faster?
 
Same as comparing 2 Pentium 4s at 1.6GHz vs. 1 Pentium 4 at 3.2GHz with HT. Which one is faster?
The P4 3.2GHz, since each of the 1.6GHz P4s will be less efficient at running each thread (of course, assuming the application is threaded; if not, the two P4s are screwed).
 
Dave Baumann said:
The P4 3.2GHz, since each of the 1.6GHz P4s will be less efficient at running each thread (of course, assuming the application is threaded; if not, the two P4s are screwed).

It was kinda rhetorical... but yeah. That's what I was thinking.
 
dskneo said:
"....while sharing one set of processor execution resources. "

Unless the resources double (and that would mean dual cores with one thread each), 2 threads sharing one core never deliver 2x the performance.
But of course we've known that for ages...

You are actually incorrect. There are several workloads that do achieve ~2x performance with a 2-thread design. These workloads are generally extremely memory-latency limited and, as such, both threads are stalled most of the time. An example of this is walking either a linked list or a tree data structure.

Aaron Spink
Speaking for myself inc.
 
Dave Baumann said:
The P4 3.2GHz, since each of the 1.6GHz P4s will be less efficient at running each thread (of course, assuming the application is threaded; if not, the two P4s are screwed).

Depends how much the threads block, I would think..(?) Or their blocking behaviour.

I'm not sure how valid those comparisons are however to the actual situation. Any talk too of thinking of PPEs like 2x 1.6Ghz is conceptual rather than physically what's happening.

It'll be interesting to find out what the differences are though, that's for sure, and why he thinks the PPE model is "better".
 
aaronspink said:
You are actually incorrect. There are several workloads that do achieve ~2x performance with a 2-thread design. These workloads are generally extremely memory-latency limited and, as such, both threads are stalled most of the time. An example of this is walking either a linked list or a tree data structure.

Aaron Spink
Speaking for myself inc.

I don't buy it... for a CPU to gain 2x the performance, the main thread process must be incredibly slow and erratic. Better to throw the CPU or program in the garbage...

I have not once met the 2x figure on my HT CPU... hell, with HT, games become SLOWER (but this is a P4 peculiarity, I think).
At best 25% with most programs...
 
dskneo said:
I don't buy it... for a CPU to gain 2x the performance, the main thread process must be incredibly slow and erratic. Better to throw the CPU or program in the garbage...

I have not once met the 2x figure on my HT CPU... hell, with HT, games become SLOWER (but this is a P4 peculiarity, I think).
At best 25% with most programs...

An academic example would be two threads that block half of the time - on a dual-threaded CPU you'd increase utilisation (as opposed to performance, if they're different) 2x.

Obviously that's not a general case though, I'm sure alongside any workloads that benefit to that extreme there are others that benefit more toward the opposite end. Your overall improvement would have to take everything into account.
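
The arithmetic behind that academic example, as a rough Python sketch. Both functions are illustrative assumptions rather than measurements: the first assumes the threads' stalls never overlap (the best case), the second crudely assumes each thread is ready in any given cycle with the same independent probability. The function names are invented here.

```python
def utilisation_best_case(busy_fraction, threads):
    """Stalls never overlap: one thread always runs while the other blocks."""
    return min(1.0, busy_fraction * threads)


def utilisation_independent(busy_fraction, threads):
    """Each thread is ready in a given cycle with independent probability."""
    return 1.0 - (1.0 - busy_fraction) ** threads


one_thread = utilisation_best_case(0.5, 1)
best = utilisation_best_case(0.5, 2)
independent = utilisation_independent(0.5, 2)
print(one_thread, best, independent)
```

For two threads that each block half of the time, the best-case model doubles utilisation from 0.5 to 1.0, while the independent model only reaches 0.75. Real behaviour depends on when the stalls actually fall, which neither formula captures.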
 
From what I could infer from my Babelfishing, it seems like CryTek is saying the PS3 will be their base platform for CryEngine 2 and will port accordingly from there. Or am I misinterpreting? Interesting if true.
 
And aren't these CPUs, since they're IO, going to be blocked much more than usual?

Therefore, doesn't HT give a much greater performance boost on these CPUs than it would on your typical OOO P4?
 
dskneo said:
I don't buy it... for a CPU to gain 2x the performance, the main thread process must be incredibly slow and erratic. Better to throw the CPU or program in the garbage...

I have not once met the 2x figure on my HT CPU... hell, with HT, games become SLOWER (but this is a P4 peculiarity, I think).
At best 25% with most programs...

You can buy it or not, I really don't care, since it really doesn't matter what you buy. There are workloads composed of things such as linked-list traversals or tree traversals that do in fact get a ~2x speed-up on a processor with 2 hardware contexts. It has nothing to do with a garbage CPU or program. Sometimes you have to use linked lists or tree data structures, and generally when you do, you have to search and traverse the structures.

Traversing a linked list is pretty much the operation that something like the LMbench memory latency test does, and it does happen in real workloads such as databases.
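
A small cycle-counting sketch in Python of why pointer chasing can approach 2x. This is a toy model and not how LMbench works: it assumes an in-order core with one issue slot per cycle, one issue cycle per node, and a fixed miss latency before the next pointer arrives. The function name `cycles_to_finish` and the 200-cycle latency are hypothetical.

```python
def cycles_to_finish(num_threads, nodes_per_list, miss_latency):
    """Cycles for one core to walk num_threads linked lists concurrently.

    Visiting a node costs 1 issue cycle, then that thread stalls for
    miss_latency cycles waiting on the next pointer. At most one ready
    thread can issue per cycle.
    """
    remaining = [nodes_per_list] * num_threads
    ready_at = [0] * num_threads
    cycle = 0
    while any(remaining):
        for t in range(num_threads):
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + 1 + miss_latency
                break  # only one issue slot per cycle
        cycle += 1
    return max(ready_at)


nodes, latency = 1000, 200  # hypothetical list length and miss cost
one_after_other = 2 * cycles_to_finish(1, nodes, latency)
two_contexts = cycles_to_finish(2, nodes, latency)
latency_bound_speedup = one_after_other / two_contexts

# Same lists with no stalls at all, i.e. purely compute-bound.
compute_bound_speedup = (2 * cycles_to_finish(1, nodes, 0)
                         / cycles_to_finish(2, nodes, 0))
print(latency_bound_speedup, compute_bound_speedup)
```

In this model the second context hides almost all of its work inside the first thread's stalls when the miss latency dominates, and gains nothing once the stalls are removed.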

Aaron Spink
speaking for myself inc.
 
aaronspink said:
You can buy it or not, I really don't care, since it really doesn't matter what you buy. There are workloads composed of things such as linked-list traversals or tree traversals that do in fact get a ~2x speed-up on a processor with 2 hardware contexts. It has nothing to do with a garbage CPU or program. Sometimes you have to use linked lists or tree data structures, and generally when you do, you have to search and traverse the structures.

Traversing a linked list is pretty much the operation that something like the LMbench memory latency test does, and it does happen in real workloads such as databases.

Aaron Spink
speaking for myself inc.

lol, I don't "buy" the type of program/CPU that stalls half the time :)
It has to be really bad lol...
 
Dave Baumann said:
Completely the wrong way to look at "HyperThreading" type processing, as it should be gaining efficiency. No matter how you cut it, this has the capabilities of three processor cores. If each of those cores were running a single thread, inevitably the processors would not be able to run at peak utilisation 100% of the time. Having two threads in context, in theory, means that where thread one isn't making good utilisation of the hardware in a single core, the second thread is there to make use of the available processing "slots". Effectively it's increasing the processing capability by making more efficient use of the available processing time/execution units.

http://www.beyond3d.com/reviews/intel/p4306/index.php?p=2
My comparison is with the "ideal" speed-up of running 2 independent threads independently, i.e. 2x.

The ultimate issue is whether a thread is latency-bound or compute-bound. If each thread is compute-bound (and Xenon is designed with streaming as a primary goal, so compute-bound algorithms are the target) then running them on two independent cores will deliver 2x the performance of threading them through a single core, while an "HT" core will achieve some other speed-up, typically 15% in the case of the P4.

Jawed
 
therealskywolf said:
If Cell's PPE uses 2 full HW threads, that kinda means that the PPE is divided into 2 cores... 1.6GHz x 2 in itself.

That doesn't mean a Xenon core is slower... it just means that the core is working at the full 3.2GHz, at double the speed, not divided, meaning it can't do 2 full HW threads; it does more of a Hyperthreading thing.

Same as comparing 2 Pentium 4s at 1.6GHz vs. 1 Pentium 4 at 3.2GHz with HT. Which one is faster?

If the two 1.6GHz P4s can only handle one thread each while the 3.2GHz P4 can dual issue, the 3.2 should win because it should be more efficient at keeping its execution elements busy, whereas the two 1.6GHz CPUs will lose out for not being dual issue when OoOE is not enough to get the job done.

However, in the case of Cell's PPE it was categorized as 2 "dual-issue" 1.6GHz CPUs, so in this sense efficiency is greater than that of just two regular 1.6GHz CPUs. A core in MS's chip is also dual issue, but with only one VMX unit there are fewer execution resources available.

Someone said that MS's VMX unit is dual issue so this doesn't matter, but that doesn't make sense to me. If a thread requests the VMX unit to do some work for it... it does the work... if a thread doesn't ask it for anything, is it going to query the other one for work? ...and how will it assign itself to doing the other thread's work when the original thread is still executing elsewhere? ...this makes no sense to me, but perhaps I need to be educated.

It would make sense that the extra register space of MS's VMX unit is there to make switching faster and/or to simply allow for more precision in calculations, or more data to be worked on, or something.

I could be wrong...I'm just trying to get some understanding for things.
 
liverkick said:
From what I could infer from my Babelfishing, it seems like CryTek is saying the PS3 will be their base platform for CryEngine 2 and will port accordingly from there. Or am I misinterpreting? Interesting if true.

Is there a specific part that makes you think that? Not sure, but it sounds like they're going to have a somewhat separate engine for PS3 to take full advantage of it.
 
liverkick said:
From what I could infer from my Babelfishing, it seems like CryTek is saying the PS3 will be their base platform for CryEngine 2 and will port accordingly from there. Or am I misinterpreting? Interesting if true.


Sounds to me like they are saying the PS3 is so much more complex to dev for that they need to build its own engine from scratch in order to take advantage of Cell's strengths. This is no surprise to me; it's widely accepted that developing for Cell is going to be easily more difficult than PC or 360, and it goes along with what the Valve/Carmack guys said the other day on 1up.
 