So let's say the PS3 were to ship with 8 SPEs enabled

Isn't the PPE dual-threaded?
If he's being particularly clever here he might count 2+7=9, with the first two coming from the PPE.
 
Dual-threaded doesn't mean dual-core... :smile: Personally I'm not even sure which "flavour" the PPE actually is... I seem to remember it's just dual-threaded and not a dual-core architecture.
 
Yeah sure, but would that stop you if you were trying to market something? :D
It's not even a lie, just a slight imprecision :devilish:
 
IBM's own applications happen to use all 8 SPEs, and he did say that they will manufacture jointly, so of course the mask they produce has 8 SPEs. Whether all 8 must be working is up to the individual application. IBM is pushing it in the workstation/server/HPC space, so ASP isn't much of an issue compared to a console. For PS3, that's Sony's decision to make. If by some miracle yields skyrocket before the end of the month, then we may see all 8 active in PS3. I don't believe in miracles, though.
 
With 1 additional SPE, it may mean that Sony has more room for its OS and supporting concurrent tasks. The end result may be more responsive interaction (in very specific areas only).

It may also mean slightly easier programming later since the developers have 1 more SPE to assign their tasks to. Or for embarrassingly parallel tasks on SPEs, close to 100% better performance.

For general applications and computation throughput, I doubt it would improve much since Cell is likely to be under-utilized (or we have hit other bottlenecks like what London Boy mentioned).
 
It would be a pretty strange program that gets 100% improvement from the addition of 1/6 more hardware resources.

It would probably mean someone was hobbling the chip on purpose.

if (#of.SPEs == 6) wait.half.the.time()
else continue.with.what.you.were.doing.free.of.fake.penalty()
 
Sorry for not being clear. 100% w.r.t. sequential execution by 1 SPU core. So assuming the algo scales linearly, 1 more SPU should fetch another 100% speedup.

I lay it out that way because I don't know how many SPUs a dev will use for a particular task. And yes, the dev may have to code for it (i.e. know that he/she can take advantage of 1 more SPU).
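
To make the two baselines concrete, here is a minimal sketch. It assumes ideal linear scaling (as the post does), and the function names are made up for illustration. It shows why the same extra SPU can be read as "+100%" (of one SPU's throughput) or as a much smaller percentage (of the previous total).

```python
def speedup_vs_one_spu(n_spus: int) -> float:
    """Ideal linear speedup of n SPUs relative to a single SPU (assumption)."""
    return float(n_spus)


def relative_gain(before: int, after: int) -> float:
    """Fractional throughput gain when going from `before` to `after` SPUs."""
    return (after - before) / before


if __name__ == "__main__":
    for before in (1, 2, 6, 7):
        after = before + 1
        # Gain expressed in units of "one SPU's worth" of throughput.
        extra_vs_one = speedup_vs_one_spu(after) - speedup_vs_one_spu(before)
        print(f"{before} -> {after} SPUs: +{extra_vs_one:.0%} of one SPU, "
              f"+{relative_gain(before, after):.1%} of the previous total")
```

Normalized to one SPU the step is always +100%; normalized to what the task already had, it shrinks as the SPU count grows.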
 
It's a contrived case ;)
If that piece of the program were critical, if the speedup was useful, it wouldn't be limited to a single SPE to begin with.

I'd like to believe that programmers are able to use an SPE for more than one thing.
Flushing and reloading the entire local store of an SPE to/from XDR takes something on the order of 20µs. You can easily do that once or twice per frame. You can use "half" an SPE or a third etc. There's no hard granularity that would force a light-weight task to consume an entire SPE. Thus IMO you should always be able to balance things out properly, according to the requirements of the application.
An extra SPE is just 14% more of the same. It doesn't open up any new possibilities.
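
A rough back-of-the-envelope check of the 20µs figure, as a sketch only. It assumes a 256 KB local store, 25.6 GB/s peak XDR bandwidth, and a 60 fps target; none of these numbers are stated in the post, and real transfers will not hit peak bandwidth.

```python
LOCAL_STORE_BYTES = 256 * 1024   # assumed: 256 KB SPE local store
XDR_BYTES_PER_SEC = 25.6e9       # assumed: 25.6 GB/s peak XDR bandwidth
FRAME_RATE_HZ = 60               # assumed: 60 frames per second


def swap_time_us(local_store_bytes: int = LOCAL_STORE_BYTES,
                 bandwidth: float = XDR_BYTES_PER_SEC) -> float:
    """Time to write out and then read back one full local store, in microseconds."""
    one_way_seconds = local_store_bytes / bandwidth
    return 2 * one_way_seconds * 1e6


def frame_fraction(swaps_per_frame: int,
                   frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Share of one frame's time spent on full local-store swaps."""
    frame_us = 1e6 / frame_rate_hz
    return swaps_per_frame * swap_time_us() / frame_us


if __name__ == "__main__":
    print(f"full flush + reload: {swap_time_us():.2f} us")
    print(f"two swaps per frame: {frame_fraction(2):.2%} of a frame")
```

Under these assumptions a full swap lands right around 20µs, which is a small fraction of a 60 fps frame.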
 
That would be the sensible and most efficient way of using the SPEs. Rather than multitasking as you would on a conventional processor, which requires frequent context switches, it is better to process in batches, i.e. process one set of objects related to one task (say, geometry and animation) and store the results, then reload the local store and process another set of objects related to another task (say, AI).

If you have to do multi-tasking on an SPE, you would have to write the code so the two sets of data and code could co-exist with minimal requirements for swapping out from the local store.
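
A toy sketch of why batching helps. It assumes the local store has to be reloaded every time the SPE switches from one task's code and data to another's. The task names and the work list are invented for illustration.

```python
from itertools import groupby


def count_reloads(work_items):
    """Count local-store reloads: one per run of consecutive items of the same task."""
    return sum(1 for _ in groupby(work_items))


# Hypothetical work list where two tasks alternate item by item.
interleaved = ["anim", "ai", "anim", "ai", "anim", "ai"]

# Batched order: all items of one task first, then all items of the other.
batched = sorted(interleaved)

if __name__ == "__main__":
    print("interleaved reloads:", count_reloads(interleaved))
    print("batched reloads:    ", count_reloads(batched))
```

Same work in both cases; the batched order just pays the reload cost once per task instead of once per item.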
 
It's a contrived case ;)
If that piece of the program were critical, if the speedup was useful, it wouldn't be limited to a single SPE to begin with.

True, but I am not saying that only 1 SPU is in use in these situations. Depending on the problem (e.g., Blu-ray 40Mbps viewing), having 1 more SPU to help in a particular bottlenecked stage (CABAC decoding) can make a real difference in meeting a performance level.

I used 100% more power (i.e., normalized to 1 SPU) because I don't know how many SPUs are assigned to, say, CABAC, while the rest may be used for BD-J and other AVC HP decoding stages.

In such real-time scenarios, it may not be about the average/overall increase in computation. I just felt that the "general 14% increase in SPE computation power" statement does not convey the significance of added "pairs of hands" in needy situations adequately.
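
One way to picture the "pairs of hands" argument is a pipeline whose throughput is set by its slowest stage. The sketch below uses invented stage names, invented costs, and an invented SPU assignment, and it assumes each stage scales linearly across the SPUs given to it (which may not hold for a stage like CABAC decoding).

```python
def pipeline_throughput(stage_cost, spus_per_stage):
    """Units per time step for a pipeline, limited by its slowest stage.

    stage_cost: time one SPU needs per unit of work in each stage.
    spus_per_stage: number of SPUs assigned to each stage.
    """
    return min(spus_per_stage[s] / stage_cost[s] for s in stage_cost)


# Hypothetical numbers, for illustration only.
stage_cost = {"bottleneck_stage": 2.0, "other_stages": 1.0}
before = {"bottleneck_stage": 1, "other_stages": 3}
after = {"bottleneck_stage": 2, "other_stages": 3}   # one extra SPU on the bottleneck

if __name__ == "__main__":
    t0 = pipeline_throughput(stage_cost, before)
    t1 = pipeline_throughput(stage_cost, after)
    print(f"throughput before: {t0}, after: {t1}, gain: {(t1 - t0) / t0:.0%}")
```

In this made-up case the SPU count goes from 4 to 5, but the pipeline's throughput doubles because the extra SPU lands exactly on the limiting stage.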

I'd like to believe that programmers are able to use an SPE for more than one thing.
Flushing and reloading the entire local store of an SPE to/from XDR takes something on the order of 20µs. You can easily do that once or twice per frame. You can use "half" an SPE or a third etc. There's no hard granularity that would force a light-weight task to consume an entire SPE. Thus IMO you should always be able to balance things out properly, according to the requirements of the application.
An extra SPE is just 14% more of the same. It doesn't open up any new possibilities.

Sure... Perhaps Nariko may not have constantly flowing hair and dress in Heavenly Sword in a 1,000-enemy fight without 1 "logical" SPE guaranteed to simulate hair and cloth.
That technically correct "14% power increase" statement may not do justice to what goes on behind the scenes.


EDIT: The other way to express my view is:

* If Cell had 1024 cores, then I'd agree with your statement that it's just an X% increase in computation. But with just 1 + 7 cores (and almost 1 reserved for the OS), 1 additional SPE core is still valuable and can make a difference.
 