Jawed said:
I'm sorry, but a 5x speed-up is not impressive.
How many times more peak GFLOPs do 8 SPEs at 2.4GHz have over the P4? An awful lot of that power has gone missing and the paper seems to avoid even touching on that subject.
Jawed
STI initially claimed up to a 10x speedup over "conventional processors". With this software, a 3.2GHz Cell should be pushing close to a 7x speedup over a 3.6GHz P4 (scale the P4 down to 3.2GHz and the speedup increases further). If they got the PPE to work on the task itself too, there'd be further gains. I think if you have a speedup that roughly matches the number of SPUs, a roughly linear speedup, that's pretty damn good - i.e. one SPU can be nearly as good as one P4 for this task.
Speedups beyond that are likely dependent on how well the task maps to, or has been mapped to, the memory architecture in Cell. We've seen massive speedups in other examples, likely due to that, but this task, or the current implementation, may not benefit as much? Remember also that this is a port, unlike other examples which were built from scratch for Cell. As it is, though, expressing disappointment at this kind of improvement - almost a linear speedup - makes you sound spoilt.
On a side note, the only commentary I found on this from E3 was from IGN:
The next demo was based on a new cloth simulation algorithm being worked into Maya. Again using two Cell processors, the demo was able to run 16 separate simulations simultaneously. Each piece of cloth was defined by 300 vertices, but the real kicker with this demo is that the algorithm incorporated self-intersecting physics, keeping the cloth from flowing through itself. This sort of simulation is much more computationally-intensive than simulating a cloth against another object.
edit - I should look closer at things, the chart in fact shows that a single SPU @ 2.4GHz is better than a P4 @ 3.6GHz for this, looks to be maybe 1.2x. With 8 SPUs, it's a 5x speedup - but if you scaled them to 3.6GHz, it'd be 7.5x (assuming performance scales linearly with clock speed), and thus across the SPUs the speedup would be pretty much linear.
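For anyone who wants to check the clock scaling, it's just the arithmetic below. This is a rough Python sketch, not anything from the paper: it assumes performance scales linearly with clock speed, the 5x figure is my reading of the chart, and the function name is made up for illustration.

```python
def scale_speedup(measured_speedup, measured_clock_ghz, target_clock_ghz):
    """Rescale a measured speedup to another clock, assuming linear scaling."""
    return measured_speedup * target_clock_ghz / measured_clock_ghz

# Read off the chart (approximate): 8 SPUs @ 2.4 GHz vs. one P4 @ 3.6 GHz.
measured = 5.0
num_spus = 8

at_3_2 = scale_speedup(measured, 2.4, 3.2)  # Cell at its 3.2 GHz clock
at_3_6 = scale_speedup(measured, 2.4, 3.6)  # Cell clocked the same as the P4
per_spu = at_3_6 / num_spus                 # one SPU relative to one P4, clock for clock

print(f"3.2 GHz Cell vs 3.6 GHz P4: {at_3_2:.2f}x")
print(f"3.6 GHz Cell vs 3.6 GHz P4: {at_3_6:.2f}x")
print(f"per SPU at equal clocks:    {per_spu:.2f}x")
```

Under those assumptions the 3.2GHz figure comes out a bit under 7x and the equal-clock figure at 7.5x, i.e. each SPU doing slightly less than one P4's worth of work at the same clock.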