Does Cell Have Any Other Advantages Over XCPU Other Than FLOPS?

Shifty Geezer said:
Looking to the future, if processing requirements (media- and maths-intensive tasks) see a greater need for lots of calculation power over generic processing, that'd be better served with several FPUs than several large, branch-strong generic cores, no? I can't find the picture or info, but what I saw of Intel showed something like an x86 core with a dozen FPUs or VPUs around the outside, and that makes sense if the workloads it'll be given are float/vector intensive.

I really wish I could post that AMD slide; it had a great breakdown of what die space is used for in a CPU. x86 overhead was 4%, and branch handling was something similar. By far the greatest chunk was the parts dealing with the memory system: IO was 25% and cache 42%, for a total of 67% of the die (numbers are AFAICR). I'd say it would be better to put the execution units inside the CPU core and let them benefit from the rest of the CPU apparatus. I have the conference slides bookmarked back home, will post them when I get back.

The Intel slide shows multiple simple cores on a die and no special-purpose VPUs or other silliness.

Shifty Geezer said:
This depends really on where code is heading, but both STI and MS felt streamed vector processing was important for performance

No, the FLOPS numbers are important not for performance, but for marketing reasons. I'll buy a bottle of champagne for the first dev team that gets over 50 GFLOPS for one second in a game.

Shifty Geezer said:
, and this roadmap I saw suggested Intel think that too. In the same way that graphics processing is given over to a streamed vector model, and a streamed vector unit is much faster than a generic processor at those tasks, a processor with streaming vector units will rattle through those tasks more quickly than a generic processor.

Reality check: GPUs are moving towards more/better general-purpose capabilities (random memory access, looping, branching, etc.), not the other way around (CPUs going streaming).

Few apps (games) today fully utilize the SIMD capabilities of current CPUs. Both AMD and Intel anticipate that changing and have plans to increase SIMD performance in future CPUs, so there'll be little room for dedicated vector engines.

Cheers
Gubbi
 
Fafalada said:
Ok, if you let me choose the champagne, I would be willing to take that bet :LOL:

Not a bet, just a prize. :)

I'll need documentation though, something like a PA readout or performance counter logs for a 1 second trace.

Cheers
 
Gubbi, I think you just announced a very poor bounty - expect 50 GFLOPS in the entry menu of Faf's next title, you know - fractal backgrounds, FFT-ed menu entries and stuff ; )
 
In the early days, many people misconceive

pixelbox said:
I have to give it to ya. I was starting to believe him. All the stuff he was saying about cell being a average cpu. Where can i find stuff on cell like how it works?

Misconceptions about a new style of hardware are very common when actual performance is not yet known, no? For example, Anandtech thought the PS2 Graphics Synthesizer's pixel engines would be nearly impossible to get good performance from due to the extremely parallel setup of 16 pixel pipelines, and hence thought PS2 would have inferior real-world fill-rate. Of course, in actual use it was easy to get full performance from the Graphics Synthesizer, and PS2 had very, very fast real-world fill-rate, but nevertheless many had such misconceptions in the early days.
 
Gubbi said:
I'll need documentation though, something like a PA readout or performance counter logs for a 1 second trace
Oh, that goes without saying, I wouldn't expect anyone to take my word for it. Hell, I wouldn't just take my word for it. :p

darkblu said:
Gubbi, I think you just announced a very poor bounty - expect 50 GFLOPS in the entry menu of Faf's next title, you know - fractal backgrounds, FFT-ed menu entries and stuff ; )
Now now, let's be fair, I do have some other things in mind to try. Of course, should those fail, FFTing the GUI could be an interesting and groundbreaking innovation in the world of games :p
 
Actually, Cell ought to mince through fractals like nobody's business. Anyone play Stardust on the Amiga? They had some really nice animated plasma effects in the background, straight out of the demo scene. I'm sure Cell could do some similar yet more amazing effects, seeing as the code is small enough to fit into LS and you could run it at full pelt. Really demanding iterative stuff that isn't dependent on memory access.
 
Fafalada said:
Now now, let's be fair, I do have some other things in mind to try. Of course, should those fail, FFTing the GUI could be an interesting and groundbreaking innovation in the world of games :p

Expect to see 2 SPUs running MADDs flat out without load or saving anything...

:p
 
Candidates:

Driving massive particle systems with a good number of forces + collision hulls could probably get the number up there (like a waterfall simulation with gravity, wind, updraft and whatnot).

Or run umpteen iterations of the shallow water equations per frame, for super-high-quality wave dynamics.

Cheers
Gubbi
 
Using one of the heavier equation solvers with say... high detail cloth, would drive a very high number too.

But frankly there's one thing I had in mind for quite some time - and I think it could also make an interesting candidate for this challenge. I'll keep you posted ;)

Mind you, I've been thinking - I can get fairly accurate performance analysis on the hw, but how do we actually measure "FLOPs"? :p If it's not a trivially short loop, counting instructions would be a bit far-fetched.
I might be able to pull issue rates or something of the sort, but that won't tell exactly what kind of arithmetic was being done (I could be computing only scalars, and the instructions issued would still be the same as for vectors).
 
Fafalada said:
Mind you, I've been thinking - I can get fairly accurate performance analysis on the hw, but how do we actually measure "FLOPs"? :p If it's not a trivially short loop, counting instructions would be a bit far-fetched.
I might be able to pull issue rates or something of the sort, but that won't tell exactly what kind of arithmetic was being done (I could be computing only scalars, and the instructions issued would still be the same as for vectors).

Since I'm looking for a real game workload to exceed 50GFLOPS for one second, that's what I'd like to see. We're just going to rely on the honesty of the developers to not do dummy FP loops. :)

A general breakdown of what the flops are used for would be nice.

As for measuring FLOPS, you're right about the scalar vs. vector problem, though. To keep it simple it should just be:

FLOPS = FP ops issued * 8

Unless you can get more detailed data from the performance counters (i.e. x86 can filter on x87 or SSE ops).

Cheers
Gubbi
 
nAo said:
Gubbi: nice challenge indeed!
But I think you're not going to win against Faf...:)

I don't think of it as a bet. More like an X-Prize (but a cheaper one :D)

Either way I don't lose.

Cheers
Gubbi
 
Most expensive 50 Gflops ever

Gubbi said:
Either way I don't lose.
What if the winning developer asks for a Dom Pérignon cuvée 1954?
 
one said:
Isn't it typically only once, if you put a manager kernel in an SPE that fetches/switches all subsequent task streams at runtime by itself after the initial kick and memory aliasing by the PPE?
http://www.research.scea.com/research/html/CellGDC05/38.html

This is more pipe dream than reality. The architecture of the SPEs lacks many of the features that one would want for a true kernel, leaving you with something akin to non-preemptive multithreading with a rather limited code and data space. Very unlikely to happen in reality.

The SPU can't really re-task itself on its own; at best you can have it switch between sections of a program.

Aaron Spink
speaking for myself inc.
 
nAo said:
One: Yes, like any other processor out there you need some kind of kick to start (Amiga, anyone? :) ), after which the SPEs can mostly run without any PPE help.

Um, most other processors don't need a kick to start. They can bootstrap themselves. The SPE cannot bootstrap itself.

Aaron Spink
speaking for myself inc.
 
version said:
intel and amd will use cell architecture in about 2010, why cell, why not xenon???
ahhh waste my time.

Yes version, you certainly are a waste of our time.

CELL is an architectural dead end. The general faults with the architecture are such that no one else will likely go down the path that was chosen by CELL.

CELL is a very different beast than a multi- or many-core processor, with its asymmetrical aspects, its specialized programming model, and its lack of general coherence.

CELL will most likely fall into the same category as the transputer in history. An interesting idea but with a lot of compromises.

None of this however has anything to do with its current use at this time in PS3 or vis-a-vis X360.


Aaron Spink
speaking for myself inc.
 
What is to gain?

aaronspink said:
Yes version, you certainly are a waste of our time.

CELL is an architectural dead end. The general faults with the architecture are such that no one else will likely go down the path that was chosen by CELL.

CELL is a very different beast than a multi- or many-core processor, with its asymmetrical aspects, its specialized programming model, and its lack of general coherence.

CELL will most likely fall into the same category as the transputer in history. An interesting idea but with a lot of compromises.

None of this however has anything to do with its current use at this time in PS3 or vis-a-vis X360.


Aaron Spink
speaking for myself inc.


1) There is no point in insulting others, no? Do you feel there is much to gain in knowledge or understanding from calling someone a "waste of time"?

2) CELL is not x86, so stop trying to fit the x86 programming model onto CELL. This is perhaps your 19,842nd post saying CELL is no good because it has a different programming model than the one you like. It is different from what you are used to; that does not in itself make it better or worse.

3) Many have already taken very good advantage of CELL and published openly (available with a simple Google search, my friend), so we have indications from real-world examples that, despite the new programming model that programmers had to learn, CELL is extremely effective and powerful.
 