HotChips 17 - More info upcoming on CELL & PS3

Status
Not open for further replies.
PC-Engine said:
This got me thinking a 4 core CPU at 65nm would be pretty sweet and cheap too.
uh-oh. Here comes some more 4-core Revolution talk... :p

Oh and as far as yields are concerned how many transistors does CELL have compared to a 4-core XCPU?
A four core XCPU would need all those transistors working. A PS3 Cell can cope with several defects as long as 7 out of the 8 SPEs remain fully functional. Yields on a four core XCPU would thus be much lower which is presumably in part why it wasn't used.
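A rough sketch of the redundancy argument above, treating each core as independently good or bad. The per-core probabilities are made-up placeholders, not fab data, and everything outside the cores (PPE, L2, interconnect) is ignored.

```python
from math import comb

def yield_at_least(k, n, p_good):
    """Probability that at least k of n independent cores are defect-free."""
    return sum(comb(n, i) * p_good**i * (1 - p_good)**(n - i)
               for i in range(k, n + 1))

# Hypothetical per-core survival probabilities (assumptions, not measurements).
P_SPE_GOOD = 0.90   # one small SPE comes out defect-free
P_BIG_GOOD = 0.85   # one larger PPE-style core comes out defect-free

cell_8_of_8 = yield_at_least(8, 8, P_SPE_GOOD)   # no spare allowed
cell_7_of_8 = yield_at_least(7, 8, P_SPE_GOOD)   # one SPE may be disabled
xcpu_4_of_4 = yield_at_least(4, 4, P_BIG_GOOD)   # every core must work

print(f"8 of 8 SPEs good:      {cell_8_of_8:.3f}")
print(f"7 of 8 SPEs good:      {cell_7_of_8:.3f}")
print(f"4 of 4 big cores good: {xcpu_4_of_4:.3f}")
```

With these placeholder numbers, allowing one dead SPE roughly doubles the fraction of usable dies compared with demanding all eight. How a real four core part would compare depends entirely on the true per-core defect rates.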
 
Fafalada said:
Aren't the thermal characteristics already quite a bit worse than a 3.2GHz Cell (especially 7:1 Cell)? I'd be more worried about that than transistor counts (which are rather meaningless until we know the die size).

Yes, but that could easily be managed just like a desktop Prescott. A 4 core would be less than 200 million transistors and comparable to the CELL's die size, according to my back-of-napkin calculations.

uh-oh. Here comes some more 4-core Revolution talk...

:LOL:

Yields on a four core XCPU would thus be much lower which is presumably in part why it wasn't used.

That's true, which increases the cost.
 
Eight threads all competing over 1MB of L2 (of which some may well be locked off for use by xenos), surely not the most brilliant of ideas... I think six will be quite enough to play around with. It's not as if game performance will scale linearly with the number of cores anyway, as has already been pointed out, you have to have something to run on all of those cores too for them to actually be of any use.

And before someone responds in knee-jerk fashion about cell, because NO, cell is different. You can pipeline SPUs internally to do stream processing between several processors. This isn't feasible with standard CPUs, or at least not without doing lots of main RAM accesses and synchronization headache between threads, which'll most likely lead to performance penalties, not gains...
 
Guden Oden said:
Eight threads all competing over 1MB of L2 (of which some may well be locked off for use by xenos), surely not the most brilliant of ideas... I think six will be quite enough to play around with. It's not as if game performance will scale linearly with the number of cores anyway, as has already been pointed out, you have to have something to run on all of those cores too for them to actually be of any use.

And before someone responds in knee-jerk fashion about cell, because NO, cell is different. You can pipeline SPUs internally to do stream processing between several processors. This isn't feasible with standard CPUs, or at least not without doing lots of main RAM accesses and synchronization headache between threads, which'll most likely lead to performance penalties, not gains...

I would say going from 330KB to 256KB while getting a nice boost in fp performance is a nice tradeoff.
 
Guden Oden said:
And I would say it's not that simple.

Besides, 1MB / 8 != 256KB...

Of course it's that simple. The ARM MPCore has 4 cores. Besides, I'm talking about cores; you're talking about threads.
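For reference, the arithmetic both posters are arguing over, as a small sketch. It assumes 1 MB = 1024 KB and a perfectly even split, which is only a bookkeeping figure since nothing here models how a shared cache is really used.

```python
L2_KB = 1024  # 1 MB shared L2, as stated in the thread

def even_share_kb(sharers):
    """Naive even split of the L2 among `sharers` (cores or hardware threads)."""
    return L2_KB / sharers

for label, n in [("3 cores", 3), ("4 cores", 4), ("6 threads", 6), ("8 threads", 8)]:
    print(f"{label}: {even_share_kb(n):.1f} KB each")
```

Per core, going from 3 to 4 cores takes the share from about 341 KB to 256 KB. Per thread (two threads per core), it goes from about 171 KB to 128 KB.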
 
version said:
cx.JPG


Will the PPE of the PS3's Cell have 2 VMX units (16 Flops) as in version 2?
 
PC-Engine said:
Of course it's that simple.
Why would it be? There's no hardware partition that splits cache evenly amongst the various threads. How cache performance behaves will vary wildly depending on how aggressively various threads access memory. There will be A LOT of jostling of elbows in that cache, that's for sure, because one thread accessing memory will flush out lines belonging to some other thread. It's unavoidable due to the nature of cache memories, and the only way to alleviate it is to make cache lines smaller and cache size larger. I believe the former screws with frequency scaling, and the latter is obviously very costly (and also screws with scaling, since more lines to search on access is difficult or even impossible without increasing latency).
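A toy simulation of the "jostling" described above, as a sketch only. It assumes a fully associative LRU cache and threads that sweep private working sets in strict round-robin order; the real L2 is set-associative and real access patterns are far messier. All sizes are made-up illustration values.

```python
from collections import OrderedDict

def shared_lru_hit_rate(num_threads, lines_per_thread, cache_lines, passes=20):
    """Hit rate when threads take turns sweeping private working sets
    through one shared LRU cache (no per-thread partitioning)."""
    cache = OrderedDict()            # key: (thread, line); order tracks recency
    hits = accesses = 0
    for _ in range(passes):
        for line in range(lines_per_thread):
            for t in range(num_threads):        # round-robin interleaving
                key = (t, line)
                accesses += 1
                if key in cache:
                    hits += 1
                    cache.move_to_end(key)      # mark as most recently used
                else:
                    cache[key] = True
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)   # evict least recently used
    return hits / accesses

CACHE_LINES = 1024        # hypothetical cache capacity in lines
LINES_PER_THREAD = 160    # hypothetical private working set per thread

for threads in (6, 8):
    rate = shared_lru_hit_rate(threads, LINES_PER_THREAD, CACHE_LINES)
    print(f"{threads} threads: hit rate {rate:.2f}")
```

In this model six threads fit (960 lines) and hit on every pass after the first, while eight threads (1280 lines) overflow the cache and evict each other's lines before they are reused. The sharp cliff is an artifact of cyclic sweeps under LRU; the point is only that combined working-set size, not a fixed per-thread slice, decides what survives.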
 
The PPEs are designed for SMP scaling from 1 core to 4 or more cores. There is no rule that three cores need 1MB of L2. They chose 1MB simply because it's a nice round number for a unified/shared cache, not because it's some holy grail for 3 cores. Adding a 4th core isn't going to significantly disturb that balance.
 
Dude, it's a simple fact of physics that the more threads you got competing for the same resources, the less resources there are to go around. Then you can come and offer whatever fanb0y BS you want that you read on the internet, but it won't CHANGE THAT.

More threads = more jostling in the cache. They'll also jostle more on the FSB, and hence in main memory. It's a simple fact of life. It's like a subway car really. You can only squeeze in so many guys before it's full, even if there are lots of guys waiting on the platform trying to get on to go to work in the morning.
 
Guden Oden said:
Dude, it's a simple fact of physics that the more threads you got competing for the same resources, the less resources there are to go around. Then you can come and offer whatever fanb0y BS you want that you read on the internet, but it won't CHANGE THAT.

More threads = more jostling in the cache. They'll also jostle more on the FSB, and hence in main memory. It's a simple fact of life. It's like a subway car really. You can only squeeze in so many guys before it's full, even if there are lots of guys waiting on the platform trying to get on to go to work in the morning.

I wonder what would happen if all 8 threads needed to read from RAM location 0x<PanaMadeThisHexValueUp> (or from locations that end up in the same cache line in the shared L2 block) in their computation ;) ?

I do not see the L2 thrashed that much ;).

I would rather have had 2 MB of L2 in Xenon's CPU if I were developing on it... there is never enough cache, but 1 MB is not a dramatically low amount if you know what you are doing... so yes, things are not that simple after all :).
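A small sketch of the shared-data case raised above: how addresses map onto cache lines and sets. The 128-byte line size and 8-way associativity are assumptions for illustration, and the base address is arbitrary.

```python
LINE_BYTES = 128             # assumed cache line size
WAYS = 8                     # assumed associativity
CACHE_BYTES = 1024 * 1024    # 1 MB shared L2, from the thread
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)   # 1024 sets under these assumptions

def line_address(addr):
    """Which cache line a byte address belongs to."""
    return addr // LINE_BYTES

def set_index(addr):
    """Which set that line maps to."""
    return line_address(addr) % NUM_SETS

def distinct_lines(addresses):
    return len({line_address(a) for a in addresses})

BASE = 0x10000  # arbitrary made-up address

same_word = [BASE] * 8                            # 8 threads read one location
same_line = [BASE + 16 * t for t in range(8)]     # 8 threads, 16 bytes apart
# 9 different lines that all land in one 8-way set: one more than fits.
conflicting = [BASE + t * NUM_SETS * LINE_BYTES for t in range(9)]

print(distinct_lines(same_word), distinct_lines(same_line))
print(distinct_lines(conflicting), len({set_index(a) for a in conflicting}))
```

Eight threads reading the same location, or neighbours within one line, occupy a single line between them, so sharing data costs the cache almost nothing. The trouble only starts when threads touch many different lines, especially ones that collide in the same set.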


Edit: Argh another post... hopefully intelligent discussions will disappear again so I can be calm and quiet lol.
 
Guden Oden said:
Dude, it's a simple fact of physics that the more threads you got competing for the same resources, the less resources there are to go around. Then you can come and offer whatever fanb0y BS you want that you read on the internet, but it won't CHANGE THAT.

More threads = more jostling in the cache. They'll also jostle more on the FSB, and hence in main memory. It's a simple fact of life. It's like a subway car really. You can only squeeze in so many guys before it's full, even if there are lots of guys waiting on the platform trying to get on to go to work in the morning.

Stop pointing out the obvious as if it actually means something with respect to a 4 core CPU. Keep on thinking 1MB L2 is the Holy Grail for 3 cores while magically becoming a deathbed for 4. You don't have a clue what you're talking about dude.
 
PC-Engine said:
Looks like MS could've added a 4th PPE and still kept the transistor count below a single PS3 CELL cpu.;)

They didn't add a redundant core for better yield, which probably means the configuration for four cores would be quite a lot larger in terms of die size. Transistor counts don't matter much in terms of what you can add; you need to look at size and thermals.
 
V3 said:
They didn't add a redundant core for better yield, which probably means the configuration for four cores would be quite a lot larger in terms of die size. Transistor counts don't matter much in terms of what you can add; you need to look at size and thermals.

With big cores you don't need to add redundant cores. If you did, that would mean your fab process sucks or your chip design layout sucks. I'm also aware that transistor counts aren't completely relevant, but they're not completely irrelevant either. When you know the transistor count and when you know how many of those transistors are for cache, then you can get a good idea how many are for logic. We know the amount of cache/LS on XCPU and CELL and we know the total transistor counts.
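A back-of-napkin sketch of the "subtract the cache" estimate described above. It assumes plain 6-transistor SRAM cells and counts only the data arrays of the L2 and local stores, ignoring tags, ECC, decoders and L1 caches. The total transistor counts are commonly quoted figures used here as assumptions, not numbers taken from this thread.

```python
TRANSISTORS_PER_BIT = 6  # assumption: classic 6T SRAM cell

def sram_transistors(kilobytes):
    """Transistors in the data array of `kilobytes` KB of 6T SRAM."""
    return kilobytes * 1024 * 8 * TRANSISTORS_PER_BIT

def logic_estimate(total_transistors, sram_kb):
    """Whatever is left after subtracting the SRAM data arrays."""
    return total_transistors - sram_transistors(sram_kb)

XCPU_SRAM_KB = 1024             # 1 MB shared L2
CELL_SRAM_KB = 512 + 8 * 256    # 512 KB L2 plus eight 256 KB local stores

# Assumed totals (commonly quoted, treat as placeholders).
XCPU_TOTAL = 165_000_000
CELL_TOTAL = 234_000_000

for name, total, kb in [("XCPU", XCPU_TOTAL, XCPU_SRAM_KB),
                        ("CELL", CELL_TOTAL, CELL_SRAM_KB)]:
    sram_m = sram_transistors(kb) / 1e6
    rest_m = logic_estimate(total, kb) / 1e6
    print(f"{name}: ~{sram_m:.1f}M in SRAM arrays, ~{rest_m:.1f}M for everything else")
```

The split is only as good as the 6T assumption and the totals fed in, and it says nothing about die area, since SRAM and logic pack at very different densities.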

When you have a huge chip that takes up over 220 sq mm of space that has 8 small cores, then it's obvious you might have problems with one of those cores. With 4 big cores that take up less than 200 sq mm, there's no need for redundant cores. It's simple chip engineering design philosophy. XCPU has only three cores because it's a compromise between power consumption, cost and performance. It doesn't mean a 4 core implementation isn't feasible.
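A sketch of how die area alone feeds into yield, using the simple Poisson model Y = exp(-A * D0). This is a swapped-in textbook model, not anything claimed in the thread, and the defect density is a made-up placeholder. The 220 and 200 sq mm figures are the ones quoted above.

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)

D0 = 0.5  # hypothetical defect density in defects per square cm

for label, area in [("220 sq mm die", 220), ("200 sq mm die", 200)]:
    print(f"{label}: {poisson_yield(area, D0):.3f} defect-free")
```

Under this model a 10% area difference moves the defect-free fraction only modestly, so area by itself does not settle whether a spare core is worth having. That depends on how much of the die can tolerate a defect.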
 
PC-Engine said:
With big cores you don't need to add redundant cores. If you did, that would mean your fab process sucks. I'm also aware that transistor counts aren't completely relevant, but they're not completely irrelevant either. When you know the transistor count and when you know how many of those transistors are for cache, then you can get a good idea how many are for logic. We know the amount of cache on XCPU and CELL and we know the total transistor counts. When you have a huge chip that takes up over 220 sq mm of space that has 8 small cores, then it's obvious you might have problems with one of those cores. With 4 big cores that take up less than 200 sq mm, there's no need for redundant cores. It's simple chip engineering design philosophy. XCPU has only three cores because it's a compromise between power consumption, cost and performance. It doesn't mean a 4 core implementation is unfeasible.

Hmm, aren't those cores small though? That's the reason why they can fit 3, and your reason to fit another one. Cell's SPUs are smaller, but the PPU in Cell is a small core too.
 
V3 said:
Hmm, aren't those cores small though? That's the reason why they can fit 3, and your reason to fit another one. Cell's SPUs are smaller, but the PPU in Cell is a small core too.

Well, compared to SPUs the PPE cores are pretty big. Regardless, 4 is half of 8, so you wouldn't need a 5th, redundant core.
 
PC-Engine said:
Well, compared to SPUs the PPE cores are pretty big. Regardless, 4 is half of 8, so you wouldn't need a 5th, redundant core.

But you can't know that from speculation; even the manufacturers themselves wouldn't know without a trial.
 