Does Cell Have Any Other Advantages Over XCPU Other Than FLOPS?

Crazyace · Nov 29, 2005

Section 2.4 SPU Events.
....interrupt handler located at local storage address '0'

Figure 9-1. Logical Representation of SPU Event Support

Section 9.12.8 SPU Signal Notification 2 Available Event
Section 9.12.9 SPU Signal Notification 1 Available Event

These show interrupts driven by external writes to SNR registers.

ihamoitc2005 · Nov 29, 2005

Fair comparison.

aaronspink said:
You aren't comparing apples to apples. The Xenos die has significant additional functionality beyond just the GPU portion.

Aaron Spink
speaking for myself inc.

You are confusing yourself my friend. It was you who said Xenos is better companion processor to PPE than SPE (you said you prefer 2 PPE each with its own Xenos) but now you seem to prefer altered Xenos with different architecture without additional functions beyond ALUs. Without additional functions beyond ALUs, Xenos is not Xenos.

In similar manner, for SPE, calculation portion of SPE is only 1/3 of SPE transistors, yet I included entire SPE because SPE is not SPE without all components.

On the other hand, I left out edram unit of Xenos because that is not fundamental to what is Xenos, so in careful comparison without changing fundamental architecture of Xenos (which you now seem to prefer), Xenos only has 54% of programmable vector processing capability of much more flexible and controllable SPE.

Therefore this comparison is precisely apples to apples my friend and suggests that your proposal of Xenos as superior coprocessor for PPE than SPE is incorrect.

arjan de lumens · Nov 29, 2005

Crazyace said:
Section 2.4 SPU Events.
....interrupt handler located at local storage address '0'

Figure 9-1. Logical Representation of SPU Event Support

Section 9.12.8 SPU Signal Notification 2 Available Event
Section 9.12.9 SPU Signal Notification 1 Available Event

These show interrupts driven by external writes to SNR registers.

Thank you. Seems I didn't read the signal notification sections (page ~94 + those figures) carefully enough.

Gubbi · Nov 29, 2005

Crazyace said:
Section 2.4 SPU Events.
....interrupt handler located at local storage address '0'

Figure 9-1. Logical Representation of SPU Event Support

Section 9.12.8 SPU Signal Notification 2 Available Event
Section 9.12.9 SPU Signal Notification 1 Available Event

These show interrupts driven by external writes to SNR registers.

That's just signal handling. It still relies on SPU code to read the signalling channel ("through the general event handler") and hence relies on the SPU code co-operating.

Consider a situation where code running on the SPU ignores the channels (or simply has crashed or entered a dumb infinite loop)

I can't find where it says that an SPU can be preempted, other than where it says the PPE can kill (restart) an SPU.

Cheers
Gubbi

darkblu · Nov 29, 2005

Gubbi said:
That's just signal handling. It still relies on SPU code to read the signalling channel ("through the general event handler") and hence relies on the SPU code co-operating.

Consider a situation where code running on the SPU ignores the channels (or simply has crashed or entered a dumb infinite loop)

by definition an event handler should be invoked regardless of what the pu is doing. why do you assume the SPU would ignore an event from a channel?

arjan de lumens · Nov 29, 2005

Page 135: sending an event to an SPU causes an interrupt to happen within the SPU. Of course, unsecured/renegade code running on it can just turn off the interrupt flag, but that would presumably be detectable with some sort of timeout mechanism, at which time you would have to invoke the PPE to clean up the situation.
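
To make the timeout idea concrete, here is a toy simulation of it. This is only a sketch: `SimulatedSPU`, `supervise`, and the tick-based timeout are hypothetical names invented for illustration, not part of any Cell programming interface. It models an event that is serviced only while the interrupt flag is on, and a supervisor (standing in for the PPE) that restarts the unit when no acknowledgement arrives in time.

```python
class SimulatedSPU:
    """Toy model of a coprocessor; NOT the real Cell/SPU interface."""

    def __init__(self, name, interrupts_enabled=True):
        self.name = name
        self.interrupts_enabled = interrupts_enabled
        self.pending_event = False
        self.acked = False
        self.restarts = 0

    def post_event(self):
        # An external write marks an event as pending.
        self.pending_event = True
        self.acked = False

    def step(self):
        # One tick of execution: the handler runs only if interrupts are on.
        if self.pending_event and self.interrupts_enabled:
            self.pending_event = False
            self.acked = True

    def restart(self):
        # The supervisor's last resort: reset the unit to a clean state.
        self.interrupts_enabled = True
        self.pending_event = False
        self.acked = False
        self.restarts += 1


def supervise(spu, timeout_ticks=3):
    """Post an event; restart the SPU if it is not acknowledged in time."""
    spu.post_event()
    for _ in range(timeout_ticks):
        spu.step()
        if spu.acked:
            return "acknowledged"
    spu.restart()
    return "restarted"


print(supervise(SimulatedSPU("well-behaved")))
print(supervise(SimulatedSPU("renegade", interrupts_enabled=False)))
```

The well-behaved unit acknowledges on the first tick; the one that turned its interrupt flag off never does, so the supervisor has to step in and restart it.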

nAo · Nov 29, 2005

OMG, we need more NMIs!

arjan de lumens · Nov 29, 2005

nAo said:
OMG, we need more NMIs!

Naah - while having an SPU jammed by a misbehaving program can be annoying, an SPU that DOSes everyone else is even worse.

aaronspink · Nov 30, 2005

ihamoitc2005 said:
You are confusing yourself my friend. It was you who said Xenos is better companion processor to PPE than SPE (you said you prefer 2 PPE each with its own Xenos) but now you seem to prefer altered Xenos with different architecture without additional functions beyond ALUs. Without additional functions beyond ALUs, Xenos is not Xenos.

Oh quit being a tard. You're comparing the die area for a single SPU to the whole of Xenos. If you want to, that's fine, but a single SPU has 0, NONE, NADA, ZERO, NOTHING in the performance department. So if you really really want to do that comparison, sure we can do it.

In similar manner, for SPE, calculation portion of SPE is only 1/3 of SPE transistors, yet I included entire SPE because SPE is not SPE without all components.

So you're including the ring, the memory controllers, the PPE, the cache and cache controllers right?

Get a bloody clue or stay out of the conversation.

Aaron Spink
speaking for myself inc.

fireshot · Nov 30, 2005

Fafalada said:
From where I'm standing the so called Cell GPU (or Cell derived GPU or whatever) has been a proliferation of FUD mostly coming from somewhat annoying posters online (including some with high degrees of education and industry experience).
But maybe I should talk to Elvis like nAo and I'll see the light.

Ken-chan one of them FUDs?

This interview starts off with Kutaragi offering a reason that the PS3 made use of a specialized graphics unit (GPU) from NVIDIA (the RSX), rather than a GPU based on the Cell processor. "The seven SPEs (Synergistic Processor Element) of Cell can be used for graphics," reveals Kutaragi. "In fact, many of the E3 demos were made without a graphics chip, with only Cell used for all graphics. However, this means of use is wasteful."

Kutaragi reveals that there was once the idea of using two Cell chips in the PS3, with one used as the CPU and the other used for graphics. However, this idea was killed when it was realized that Cell isn't appropriate for the functionality required for shaders, software tools that are used to draw images to the screen. The decision to go with a separate GPU was made in order to create the most versatile architecture possible

http://ps3.ign.com/articles/624/624605p1.html

PC-Engine · Nov 30, 2005

Ok to kinda get back on topic, XeCPU has 3 PPEs yet its die is smaller than PS3CELL which has 1 PPE and 7+1 SPEs. I can definitely see an advantage for XeCPU for uses beyond a gaming console. Imagine a CPU with 3 PPEs and 6 VMX units at 3.2GHz at 90nm being used as a cheap building block for supercomputers. CELL OTOH doesn't seem to be very useful outside of consoles and CT/MRI radar functions.

Qroach · Nov 30, 2005

"Kutaragi reveals that there was once the idea of using two Cell chips in the PS3, with one used as the CPU and the other used for graphics. However, this idea was killed when it was realized that Cell isn't appropriate for the functionality required for shaders, software tools that are used to draw images to the screen. The decision to go with a separate GPU was made in order to create the most versatile architecture possible"

I said this was the case months and months back.

nAo · Nov 30, 2005

Poll! How long does it take a CELL architecture designer to realize that CELL can't compete with a GPU?

1) less than a minute
2) less than an hour
3) less than a day
4) less than a picosecond

Les jeux sont faits!

Brimstone · Nov 30, 2005

aaronspink said:
IPv6 and pervasive computing have nothing to do with cell.

Grid computing is called clusters, it's been around since at least the early '90s.

CELL is designed for a gaming console, not toasters, refrigerators, or cell phones. Cell uses way too much power to work in a cell phone. ARM owns the cell phone market and this isn't likely to change.

Aaron Spink
speaking for myself inc.

As far as the cell phone market goes, appliances, and maybe a PSP 2, what about a CELL design based on asynchronous logic? A clockless CELL CPU approach with a single PPU and two SPUs?

Fox5 · Nov 30, 2005

Guilty Bystander said:
Considering Next Gen games will be made with Floating Points throughout the entire pipe it does matter and matters a whole ****ing lot.
With the PC Flop/s might not have mattered before and still doesn't but that's because in Flop/s terms the PC just sucks.
An Amd 64 X2 4800+ for example can only do 12GFlop/s at best while the Xenon in the 360 can do 115GFlop/s and the Cell can even do 218GFlop/s which are huge differences obviously.
That's the reason why the Next Gen consoles don't need PPUs, because their extreme CPU power means they can do loads of character animations, complex physics, loads of players, advanced A.I. etc.

And for anyone out there saying it doesn't matter console will do more Flop/s.
It will SOON, real SOON!!!

What are the chances Xenon and Cell will get anywhere near peak performance? G5 cpus have much higher theoretical flops than Opterons, but their measured performance (in synthetics) is within range of what the Opterons do.
Besides, isn't the primary use for FLOPs graphics, which video cards already handle?

Shifty Geezer · Nov 30, 2005

Fox5 said:
What are the chances Xenon and Cell will get anywhere near peak performance? G5 cpus have much higher theoretical flops than Opterons, but their measured performance (in synthetics) is within range of what the Opterons do.
Besides, isn't the primary use for FLOPs graphics, which video cards already handle?

I'm not sure where this thread is meandering to, but before it gets locked for rambling, I'll say that prior to these next-gen consoles, there hasn't been an abundance of Flops to use elsewhere. Just because to date they're mostly used for graphics (and physics in more modern games) doesn't mean that's all they're good for.

mckmas8808 · Nov 30, 2005

PC-Engine said:
Ok to kinda get back on topic, XeCPU has 3 PPEs yet its die is smaller than PS3CELL which has 1 PPE and 7+1 SPEs. I can definitely see an advantage for XeCPU for uses beyond a gaming console. Imagine a CPU with 3 PPEs and 6 VMX units at 3.2GHz at 90nm being used as a cheap building block for supercomputers. CELL OTOH doesn't seem to be very useful outside of consoles and CT/MRI radar functions.

Okay so now the CELL is only good for consoles and CT/MRI radar functions?

At first you didn't even recognize that the CELL was good for MRI and radar functions. Well as long as you understand it now that is what really counts.

So do you believe that the CELL chip is good for HDTVs or other possible equipment like a Blu-ray player?

ihamoitc2005 · Nov 30, 2005

Do not be confused my friend.

aaronspink said:
Oh quit being a tard. You're comparing the die area for a single SPU to the whole of Xenos. If you want to, that's fine, but a single SPU has 0, NONE, NADA, ZERO, NOTHING in the performance department. So if you really really want to do that comparison, sure we can do it.

Once again you confuse yourself my friend and you forgot origin of the comparison. It is you who said Xenos is better coprocessor for assistance of PPE in tasks PPE is not good for because you said Xenos is more specialized and therefore superior than SPE for such tasks.

If you meant certain parts of Xenos only, you should be more specific. Also, picking only parts of Xenos for your design means you do not really like Xenos for such use but rather prefer new kind of chip using some parts of Xenos but leaving out other parts no?

Also, you are incorrect that single SPE has no performance. Single SPE has 25.6 Gflops performance when used as companion of PPE. It is an individual processor but many can be added to PPE as with STI CELL which has 8 SPE added to 1 PPE.
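
For reference, the 25.6 Gflops figure is the usual peak-rate arithmetic: 4 single-precision SIMD lanes, 2 flops per lane per cycle (one fused multiply-add), at 3.2 GHz. A minimal sketch of that calculation follows; the function name and its parameters are made up for illustration, and these are theoretical peaks, not measured throughput.

```python
def peak_gflops(simd_lanes, flops_per_lane_per_cycle, clock_ghz, units=1):
    """Theoretical peak rate in Gflops; says nothing about sustained speed."""
    return simd_lanes * flops_per_lane_per_cycle * clock_ghz * units


# One SPE: 4 lanes x 2 flops (multiply-add) x 3.2 GHz.
print(f"1 SPE : {peak_gflops(4, 2, 3.2):.1f} Gflops")
# Seven and eight SPEs, as in the whole-chip figures quoted in this thread.
print(f"7 SPEs: {peak_gflops(4, 2, 3.2, units=7):.1f} Gflops")
print(f"8 SPEs: {peak_gflops(4, 2, 3.2, units=8):.1f} Gflops")
```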

So you're including the ring, the memory controllers, the PPE, the cache and cache controllers right?

PPE is not included in comparison because the statement of yours is that Xenos is better companion processor to PPE, therefore question is not size or performance of PPE, but rather companion processor, such as SPE or as you proposed Xenos. Therefore only characteristics of companion processor(s) is relevant.

EIB, which is bus connecting SPE and PPE, also not included because if we include that then we must also include any data bus connecting PPE to Xenos in your proposed architecture. Maybe you would like to propose architecture of such bus for your proposed design to connect PPE and (very altered) Xenos if for some purpose you want to include bus characteristics in comparison.

SPE has no cache as such my friend or any cache controller, but it has LS which is not really cache and own DMA unit so all SPE components are included in the comparison. In case you are not aware 2/3 of SPE is just SRAM yet I included it in comparison.

So now you understand that what is relevant to comparison is not characteristics of PPE, since your proposal also has PPE, not bus, since your proposal also has bus, but only the companion processor(s) and all components without which it is not what it is.

But since you like to make a comparison of entire CELL to just Xenos chip then you must either add to Xenos or subtract from CELL all transistors and die area of PPE (including PPE cache). I do not know specific transistor count of PPE, but it is approximately 16-17% of die area.

CELL (not including PPE but including eib, memory controller, flexi/o interface, 7 live SPEs + 1 "dead" SPE, etc)
179.2Gflops (from 7 active SPEs)
from 185 Sq. mm = .97 Gflops/sq. mm

As you can see, even after adding all other components of entire CELL including bus and extra "dead" SPE, Xenos is not superior. Activate 8th SPE and Xenos is suddenly 15% inferior.

CELL with 8 active SPEs:
204.8Gflops
from 185 Sq. mm = 1.1 Gflops/sq. mm

So you can see the Xenos, what you call "specialized", under no circumstances is better than SPE, what you call not specialized. In fact it is inferior so SPE is superior choice as companion to PPE.

Also unlike, as you say, specialized nature of Xenos, SPE is more capable of other tasks as well.

But to return to focus of comparison you originally proposed of Xenos as superior to SPE for companion purpose, here is comparison of STI choice (SPE) of companion processor for PPE with your choice (Xenos):

SPE (single vector processor):
25.6 Gflops ...
from 14.5 Sq. mm = 1.77 Gflops/Sq. mm
from 21m transistors = 1.22 Gflops/million transistors

Xenos (as vector processor, no edram):
192 Gflops ...
from ~200 (?) Sq. mm = .96 Gflops/Sq. mm
from 232m transistors = .83 Gflops/million transistors

So you see my friend Xenos is not effective as companion to PPE.
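
The density figures in this post can be re-derived with a few lines of arithmetic. The sketch below uses only the numbers quoted above (including the guessed ~200 sq. mm for Xenos); the `Part` class and its method names are hypothetical helpers, and every value is a theoretical peak rather than a measurement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    name: str
    peak_gflops: float
    area_mm2: float
    transistors_millions: Optional[float] = None  # not given for every part

    def gflops_per_mm2(self) -> float:
        return self.peak_gflops / self.area_mm2

    def gflops_per_mtransistor(self) -> Optional[float]:
        if self.transistors_millions is None:
            return None
        return self.peak_gflops / self.transistors_millions


# Figures as quoted in the post; the Xenos area is the poster's own guess.
spe = Part("SPE", 25.6, 14.5, 21)
xenos = Part("Xenos (no eDRAM)", 192, 200, 232)
cell7 = Part("CELL minus PPE, 7 SPEs", 179.2, 185)
cell8 = Part("CELL minus PPE, 8 SPEs", 204.8, 185)

for part in (spe, xenos, cell7, cell8):
    per_mt = part.gflops_per_mtransistor()
    per_mt_text = "n/a" if per_mt is None else f"{per_mt:.2f}"
    print(f"{part.name}: {part.gflops_per_mm2():.2f} Gflops/mm^2, "
          f"{per_mt_text} Gflops/Mtransistor")

# Per-area ratio behind the "54%" claim made earlier in the thread.
ratio = xenos.gflops_per_mm2() / spe.gflops_per_mm2()
print(f"Xenos / SPE per area: {ratio:.0%}")
```

Changing the assumed Xenos area moves the ratio directly, so the conclusion is only as solid as that guess.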

Get a bloody clue or stay out of the conversation.

There is nothing to be gained from rudeness my friend. Let us maintain civilized discourse no?

ihamoitc2005 · Nov 30, 2005

Ai

Shifty Geezer said:
I'm not sure where this thread is meandering to, but before it gets locked for rambling, I'll say that prior to these next-gen consoles, there hasn't been an abundance of Flops to use elsewhere. Just because to date they're mostly used for graphics (and physics in more modern games) doesn't mean that's all they're good for.

Neural networks is good application of floating point capacity.

[maven] · Nov 30, 2005

ihamoitc2005 said:
Neural networks is good application of floating point capacity.

I'd say they're mostly bound by memory bandwidth as you're essentially only computing weighted sums...
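
A rough way to see the bandwidth argument is to count flops per byte for a weighted sum. The sketch below assumes every weight is a 4-byte float that is streamed from memory once and never reused, and it plugs in a purely illustrative 25.6 GB/s bandwidth figure; both are assumptions, not numbers from this thread. The function names are hypothetical.

```python
def weighted_sum_intensity(bytes_per_weight=4):
    """Flops per byte when each weight is read once: one multiply, one add."""
    flops_per_weight = 2
    return flops_per_weight / bytes_per_weight


def attainable_gflops(intensity, peak_gflops, bandwidth_gb_s):
    """Simple roofline bound: limited by compute or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gb_s)


intensity = weighted_sum_intensity()  # 0.5 flops per byte
# Peak is the single-SPE figure from the thread; bandwidth is an assumed value.
bound = attainable_gflops(intensity, peak_gflops=25.6, bandwidth_gb_s=25.6)
print(f"intensity: {intensity} flops/byte, attainable: {bound} Gflops")
```

Under these assumptions the weighted sums hit the memory limit at half of the single-SPE peak, unless the weights fit in local storage and are reused.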
