Does Cell Have Any Other Advantages Over XCPU Other Than FLOPS?

nelg · Nov 28, 2005

I think Aaron's points reconcile with what Deano said...

I'm close to screaming (partly because I can't say a lot about next-gen GPUs) but hopefully this will explain enough to show GPUs are good at maths as well.

GPUs (not surprisingly) are good at what they do, so why do it on a CPU that isn't designed for the job? On their home turf (doing maths on lots of separate bits of data) they are very good.

GPUs are seriously SIMD. They have a number of units (quads are an old name for them but that distinction will go away) each working on lots of data.

So let's compare an SPU SIMD unit to an imaginary near-future GPU SIMD unit (this is meant as an order-of-magnitude thought experiment, so read nothing into the numbers).

1 FMAC instruction in an SPU will operate on 4 floats per cycle
1 FMAC instruction in a GPU unit will operate on 48 floats per cycle

If we take 4GHz for an SPU and 500MHz for a GPU, then an SPU can perform 8 times as many instructions in the same time.
So in 1 GPU cycle:
8 FMAC instructions in an SPU will operate on 32 floats
1 FMAC instruction in a GPU unit will operate on 48 floats

So even this crude back-of-the-envelope calculation shows that a GPU isn't exactly outclassed...

Just in case there's any doubt, GPUs will ship with multiple 'units' as illustrated here, just as Cell ships with multiple SPUs...

And we are not even thinking about all the 'free' data conversion, low-latency memory reads and lerps that are part of the fixed hardware...
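For anyone who wants to play with the numbers, the thought experiment above can be sketched in a few lines of Python. Every figure here is the post's own illustrative value (4GHz, 500MHz, 4 and 48 floats per FMAC), not a real hardware spec, and the function name is made up for this sketch.

```python
# Back-of-the-envelope throughput comparison from the post above.
# All figures are the post's illustrative numbers, not real hardware specs.

SPU_CLOCK_HZ = 4_000_000_000   # 4 GHz, as assumed in the post
GPU_CLOCK_HZ = 500_000_000     # 500 MHz, as assumed in the post
SPU_FLOATS_PER_FMAC = 4        # floats covered by one SPU FMAC instruction
GPU_FLOATS_PER_FMAC = 48       # floats covered by one GPU-unit FMAC instruction


def floats_per_gpu_cycle(clock_hz, floats_per_fmac, gpu_clock_hz=GPU_CLOCK_HZ):
    """Floats processed during one GPU clock period, at one FMAC issued per cycle."""
    instructions = clock_hz // gpu_clock_hz  # instructions issued in one GPU cycle
    return instructions * floats_per_fmac


spu_floats = floats_per_gpu_cycle(SPU_CLOCK_HZ, SPU_FLOATS_PER_FMAC)
gpu_floats = floats_per_gpu_cycle(GPU_CLOCK_HZ, GPU_FLOATS_PER_FMAC)
print(spu_floats, gpu_floats)  # 32 48
```

Changing the clock or width constants shows how sensitive the comparison is to the assumed numbers, which is the point of calling it an order-of-magnitude exercise.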

Shifty Geezer · Nov 28, 2005

Edge said:
I notice in the other thread you are equally vocal against the RSX architecture. You are certainly exhibiting a certain level of bias here, in favor of the Xbox 360 over the PS3 components.

I think it's safe to say, no matter your experience, we can take your opinion with a certain level of scepticism as to your motives.

I think you're jumping the gun here. There's no reason to think Aaron's opinions are based on brand bias. If he thinks XB360 has a better CPU and GPU, that doesn't mean those thoughts are only the result of an XB badge on an MS piece of hardware.

nAo · Nov 28, 2005

nelg said:
I think Aaron's points reconcile with what Deano said...

Not at all, Dean was addressing a completely different issue (and he was right on point).
Nonetheless, GPUs' power comes at some cost, just like SPEs' power.

Edge · Nov 28, 2005

Crazyace said:
At the end of the day the best programmers will tune their applications to the platform.

I could not agree more. With the PS3 being a fixed standard platform, the developer knows exactly what he is working with, and will spend more time optimizing accordingly, as current console development requires.

Aaron points at the failure of TriMedia and blames it all on difficulty of programming, but it also failed simply because nobody wants to commit lots of development resources to something that sold so poorly. The PS3, which will most likely sell over 100 million in 5 years, will have huge investments in development resources.

The SPEs, by their nature, will invite lots of middleware developers who will take care of the difficulty of programming CELL, as seen so far in the middleware packages being offered at this point.

Look at the many PS2 games that are superior to the same version on the GCN. The GCN is easier to program and more powerful than the PS2, but because developers have invested so much in PS2 development and very little in GCN development, the PS2 versions of some games look better.

ihamoitc2005 · Nov 28, 2005

What is the connection?

nelg said:
I think Arron points reconcile with what Deano said......

What is the connection my friend? Deano seems to be saying GPU is good at math and could be good synergistic processor just like SPE, despite dependence on separate primary control unit. Aaron seems to be saying synergistic processor concept is not good and that all processors must have complete functionality (maybe his ideal is, for example, multiple Pentium 4).

So Aaron, who does not like synergistic processor model and wants multiple stand-alone processors, and Deano, who says even GPU can be good synergistic processor even though it is dependent on separate main control unit, seem to be saying precisely opposite things, no?

aaronspink · Nov 28, 2005

darkblu said:
/offtopic/

i'm still curious exactly what made IBM go rigid big endian with the ppc64..

IBM IS BIG ENDIAN. Seriously, the two main endian camps pretty much came out of IBM and DEC. There are advantages and disadvantages to both, but at the end of the day it really doesn't matter with the current transistor budgets.

Aaron Spink
speaking for myself inc.
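For readers unfamiliar with the term, the difference between the two byte orders can be shown in a few lines. This is a minimal sketch using Python's standard `struct` module; the value `0x01020304` is just an illustrative example and has nothing to do with any particular PowerPC part.

```python
import struct

value = 0x01020304  # illustrative 32-bit value, chosen so each byte is distinct

# ">" requests big-endian layout: most significant byte stored first.
big = struct.pack(">I", value)
# "<" requests little-endian layout: least significant byte stored first.
little = struct.pack("<I", value)

print(big.hex())     # 01020304
print(little.hex())  # 04030201

# Reading bytes back with the wrong byte order silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x4030201
```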

aaronspink · Nov 28, 2005

-tkf- said:
So the Cell chip was designed by the Sony marketing department and every white/grey and black paper written on it is just PR people covering up this "scandal".

Marketing set the performance target. They didn't actually make it though. They are about 768 GFLOPS short of the original target.

Aaron Spink
speaking for myself inc.

weaksauce · Nov 28, 2005

Can somebody explain the "local storage" the SPEs have? Is it better/worse, slower/faster than regular L2 cache?

Carl B · Nov 28, 2005

aaronspink said:
Marketing set the performance target. They didn't actually make it though. They are about 768 GFLOPS short of the original target.

Aaron Spink
speaking for myself inc.

The original patent for the 1 TFLOP of performance was for a 4 PPE, 32 SPE design. And it was referred to as a 'Broadband Engine' comprising four 'Cell' processors.

So, they made the target. I don't understand how this can still come up every couple of months.

aaronspink · Nov 28, 2005

Edge said:
But why replicate the PPE at all? aaronspink, you should be advocating the replication of the PPC 970 (aka G5) as many times as possible till you hit your FLOP target.

The 970 isn't the right architecture for the market or to get a large number of flops. The PPE is a fairly reasonable compromise between the various needs and functionality.

You seem quite determined to downplay the SPEs as much as possible and advocate the PPEs, but you would be far better off advocating the PPC 970.

I have already said that the CELL architecture is reasonable as a point product for the PS3 given the silicon limitations. It is not, however, a future direction I see the industry or Sony really going in.

I notice in the other thread you are equally vocal against the RSX architecture. You are certainly exhibiting a certain level of bias here, in favor of the Xbox 360 over the PS3 components.

My only issue is that the RSX is a bit of a hack job and doesn't bring anything new or interesting to the market.

I think it's safe to say, no matter your experience, we can take your opinion with a certain level of scepticism as to your motives.

My only motive is to bring a reasonable counter point to a lot of the CELL rah rah rah going on.

Like others have shared here, the SPEs are specialized processors to deal with large sets of data that require massive computation, and are not designed to deal with the main loop of a game, which the PPE handles.

They are specialized to the point of being limited.

You say they are difficult to program, but I don't really hear very many complaints from developers on this. Sure, there have been a few, but nothing major.

If you haven't heard complaints, you haven't been listening.

Aaron Spink
speaking for myself inc.

ADEX · Nov 28, 2005

Edge said:
But why replicate the PPE at all? aaronspink, you should be advocating the replication of the PPC 970 (aka G5) as many times as possible till you hit your FLOP target.

You seem quite determined to downplay the SPEs as much as possible and advocate the PPEs, but you would be far better off advocating the PPC 970.

To get the same peak FLOPS level you'd need 6 or 7 PPEs, the die would be huge and run at around 200 Watts. That's somewhat unrealistic for a consumer device.

It'd be even worse with a 970. Assuming 2GHz, to get the same FLOPS level you'd need 8 or 9 of them, and it'd require over 400 Watts.

The SPEs require around 2-3 Watts each, the entire Cell uses less than 50W at 3.2GHz (could even be a good bit under 40W).

The cost: the SPEs are harder to program and will be weak on some common control algorithms (which the PPE handles).

--

BTW Apple has managed to get 38 GFLOPS out of a dual core 2.5GHz G5 on a big convolution using just AltiVec. That's very close to the peak of 40 GFLOPS.

Getting 50 GFLOPS from a Cell shouldn't be too difficult. They should be asking for 100 or, for a real challenge, 150.
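The 40 GFLOPS peak quoted above can be reconstructed with simple arithmetic. The sketch below assumes (this is an assumption, not something stated in the post) that each core issues one 4-wide single-precision AltiVec fused multiply-add per cycle, counted as 2 flops per lane. The 38 GFLOPS achieved figure is taken from the post as given.

```python
# Rough peak-FLOPS arithmetic for the dual 2.5GHz G5 figure quoted above.
# Assumption (not from the post): one 4-wide single-precision fused
# multiply-add per core per cycle, counted as 2 flops per lane.

CORES = 2
CLOCK_GHZ = 2.5
SIMD_LANES = 4       # 128-bit vector / 32-bit floats
FLOPS_PER_LANE = 2   # multiply + add in one fused instruction

peak_gflops = CORES * CLOCK_GHZ * SIMD_LANES * FLOPS_PER_LANE
achieved_gflops = 38.0  # figure quoted in the post
efficiency = achieved_gflops / peak_gflops

print(f"peak = {peak_gflops:.0f} GFLOPS, efficiency = {efficiency:.0%}")
```

Under those assumptions the quoted result is about 95% of peak, which is why 50 GFLOPS on a Cell looks like a modest goal by comparison.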

aaronspink · Nov 28, 2005

ihamoitc2005 said:
What is the connection my friend? Deano seems to be saying GPU is good at math and could be good synergistic processor just like SPE, despite dependence on separate primary control unit. Aaron seems to be saying synergistic processor concept is not good and that all processors must have complete functionality (maybe his ideal is, for example, multiple Pentium 4).

That is not what I've said.

So Aaron, who does not like synergistic processor model and wants multiple stand-alone processors, and Deano, who says even GPU can be good synergistic processor even though it is dependent on separate main control unit, seem to be saying precisely opposite things, no?

There is the theory of specialization. It pretty much says: either do it or don't, but don't straddle the fence. One of my issues with the SPUs is they pay a lot of overhead to straddle that fence. They are not as good as GPUs for running real streaming code, and not as good as a real processor for non-streaming code.

Ideally you would have two chips, each with a PPE and a Xenos. This would give you all the flop performance in an architecture actually designed for streaming applications, and control/general purpose processors.

Aaron Spink
speaking for myself inc.

ihamoitc2005 · Nov 28, 2005

Limited?

aaronspink said:
My only issue is that the RSX is a bit of a hack job and doesn't bring anything new or interesting to the market.

Please show what is RSX GPU and also please show how it is hack job.

Also is it not very high bandwidth to CPU something new and providing new and interesting possibilities for CPU-GPU interaction?

They are specialized to the point of being limited.

Limited for what type of use and compared to what?

ihamoitc2005 · Nov 28, 2005

My friend I think you're confused.

aaronspink said:
That is not what I've said.

You criticized SPE for having some dependence on main control unit many times. GPU has even more dependence on main control unit than SPEs! So this is what you said. I feel you may have confused yourself without realizing it.

There is the theory of specialization. It pretty much says: either do it or don't, but don't straddle the fence. One of my issues with the SPUs is they pay a lot of overhead to straddle that fence. They are not as good as GPUs for running real streaming code, and not as good as a real processor for non-streaming code.

What overhead is paid by SPEs that is not paid by other processors in context of performance relative to size/cost/heat/power?

Why is SPE not as good for streaming code as a GPU? Be specific.

Why is SPE not a "real" processor? What is not "real" about it?

And why do you say it is not good for non-streaming code and compared to which processor? It is important in this comparison for you to mention performance in relation to size/cost/heat/power.

Ideally you would have two chips, each with a PPE and a Xenos. This would give you all the flop performance in an architecture actually designed for streaming applications, and control/general purpose processors.

So you now propose two control units for non-streaming data and two synergistic processor arrays for streaming data? How would you project total performance and also performance in relation to size/cost/power/heat?

darkblu · Nov 28, 2005

aaronspink said:
IBM IS BIG ENDIAN. Seriously, the two main endian camps pretty much came out of IBM and DEC. There are advantages and disadvantages to both, but at the end of the day it really doesn't matter with the current transistor budgets.

Aaron Spink
speaking for myself inc.

yes, knowing the transistor budgets was exactly the reason for my curiosity, as ppc32 was already dual endian. so are you saying they did that out of stubbornness? - i personally could take it for a perfectly valid reason.

Edge · Nov 28, 2005

aaronspink said:
My only issue is that the RSX is a bit of a hack job and doesn't bring anything new or interesting to the market.

And yet, the SPE does exactly that, and you are against it. You sing a completely different tune in your comments about the RSX compared to your comments on CELL. You should try to be consistent. That is why I am suspicious of your motives here.

aaronspink said:
My only motive is to bring a reasonable counter point to a lot of the CELL rah rah rah going on.

Why begrudge people their excitement over CELL? Why care that someone actually likes the potential of this powerful hardware?

aaronspink said:
The 970 isn't the right architecture for the market or to get a large number of flops. The PPE is a fairly reasonable compromise between the various needs and functionality.

I'm only bringing that up as it matches your opinion on the matter. You say that SPEs are too stripped down, but anyone can use that to argue about the PPE in respect to G5/Athlon 64/Pentium 4. More is always better, right?

I can see a dual core 2 GHz G5 processor outpowering the X360 CPU in a lot of cases, and yet provide a reasonably sized chip.

Getting back to CELL, a few benchmarks/programs have already been provided that show it has incredible potential as a console processor. Do you deny that? Cloth simulation (supposedly 5X faster than a high-end Pentium 4), IBM's ground flyby demo (what was it, 100 times faster than a high-end Pentium 4?).

Anyway I don't really see this argument going anywhere. Sony is committed to using CELL, developers will show some amazing things with it, and fans of the architecture like me will be happy with the choices made.

Bobbler · Nov 28, 2005

aaronspink said:
Ideally you would have two chips, each with a PPE and a Xenos. This would give you all the flop performance in an architecture actually designed for streaming applications, and control/general purpose processors.

Two chips like that would likely be over 550m transistors (~255m each, not including the PPE's cache that you'll likely want to have or the eDRAM which Xenos was made to use), and if you have a transistor budget that high it's likely you could do better than just stapling a PPE and Xenos together. I'm not sure how having two monster chips like that is in any way ideal. Hell, with 250m+ transistors for pure logic you could come up with quite a monster of a CPU, but that's not really feasible for today's fabrication processes (especially considering you'd likely need quite a bit of cache to feed a monster of that sort).

Also: I don't think Aaron is out to spit on the Cell every chance he gets. He's at the very least offering up some good conversation and reading -- while his views may not agree with everyone's, he does make a valid point that outside of PS3, Cell IS going to have a tough time convincing people that it will be worth it (especially in the consumer/desktop space, which will likely stay far away from Cell). This thread has been a rather interesting read and I'm thankful because work has been boring today. =o

Shifty Geezer · Nov 28, 2005

Edge said:
And yet, the SPE does exactly that, and you are against it.

No, the SPEs don't fit as much streaming float performance per square millimetre as a GPU. Aaron's right in that view. The SPEs are neither true ultra-efficient float processors nor proper cores. But I think Aaron's missed the point in how SPEs are to be used. They're ultimately much more versatile than a SIMD GPU and will allow different routines to be executed very quickly. E.g. a fractal program will run very well on SPEs, but not on a GPU that doesn't support iterative programs, or on a standard CPU that doesn't have the same FP capability. SPEs are occupying a niche position that isn't satisfied by either existing CPU or GPU type architectures.
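To make the "iterative program" point concrete, here is a minimal escape-time fractal kernel. It is plain Python for illustration only, not SPE or shader code, and the function name and parameters are made up for this sketch. The thing to notice is that the loop count depends on the data: neighbouring points can finish after very different numbers of iterations, which is the kind of per-element branching the post has in mind.

```python
def mandelbrot_iterations(c: complex, max_iter: int = 100) -> int:
    """Count iterations of z = z*z + c before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i          # point escaped: data-dependent early exit
        z = z * z + c
    return max_iter           # treated as inside the set


# Two points, very different amounts of work.
print(mandelbrot_iterations(0j))      # never escapes, runs all 100 iterations
print(mandelbrot_iterations(2 + 2j))  # escapes almost immediately
```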

Edge · Nov 28, 2005

Shifty Geezer said:
No, the SPEs don't fit as much streaming float performance per square millimetre as a GPU. Aaron's right in that view. The SPEs are neither true ultra-efficient float processors nor proper cores. But I think Aaron's missed the point in how SPEs are to be used. They're ultimately much more versatile than a SIMD GPU and will allow different routines to be executed very quickly. E.g. a fractal program will run very well on SPEs, but not on a GPU that doesn't support iterative programs, or on a standard CPU that doesn't have the same FP capability. SPEs are occupying a niche position that isn't satisfied by either existing CPU or GPU type architectures.

I'm commenting on his "doesn't bring anything new or interesting to the market". The SPEs very much bring something new and interesting to the market.

mckmas8808 · Nov 28, 2005

aaronspink said:
I have already said that the CELL architecture is reasonable as a point product for the PS3 given the silicon limitations. It is not, however, a future direction I see the industry or Sony really going in.

Aaron Spink
speaking for myself inc.

What industry are you talking about? Do you mean nobody else who uses processors to any extent will use CELL processors? Are you talking about computer companies like Apple, Gateway, Dell, etc.? And do you actually think Sony are only going to use the CELL chip in the PS3 and nothing else?

If so then you are horribly wrong. If I've misunderstood you, could you please explain your quote above a little bit more?

Thanks.
