IBM on Cell as an online game server (doing physics simulation).

At the Havok FX demo on NVIDIA SLI GPUs at GDC 2006, they said some parts of physics can be accelerated by parallel processing. Although it's technically possible for one GPU to do graphics and physics at the same time, NVIDIA says that is inefficient, so in this demo one GPU does graphics while the other does only physics. It may be interesting to guess what the Havok implementation on Cell looks like.

http://www.watch.impress.co.jp/game/docs/20060322/3dinis.htm

Physics, with its very high data parallelism, is an ideal match for GPUs
http://www.watch.impress.co.jp/game/docs/20060322/3dinis20.htm
http://www.watch.impress.co.jp/game/docs/20060322/3dinis21.htm
http://www.watch.impress.co.jp/game/docs/20060322/3dinis22.htm

Fluids, particles, and cloth map naturally to GPUs because they consist of highly parallel, independent data. For rigid-body physics, which is mainly what games use, integration and collision resolution suit the GPU, while the CPU is still better at collision detection, since that is a scene traversal.
http://www.watch.impress.co.jp/game/docs/20060322/3dinis23.htm
http://www.watch.impress.co.jp/game/docs/20060322/3dinis24.htm
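The rigid-body split described above can be sketched roughly as follows. This is a minimal 1-D illustration under stated assumptions, not Havok's implementation: `Body`, `integrate`, `detect_pairs` and `step` are hypothetical names, integration uses semi-implicit Euler, broadphase detection uses sweep and prune along one axis, and collision resolution is left out.

```python
from dataclasses import dataclass

@dataclass
class Body:
    x: float       # position (1-D for brevity; hypothetical layout)
    v: float       # velocity
    radius: float  # half-extent of the bounding interval

def integrate(body: Body, dt: float, gravity: float = -9.8) -> Body:
    # Semi-implicit Euler: new velocity first, then position from it.
    # Reads only this body's state, so every body could run in its own thread.
    v = body.v + gravity * dt
    return Body(body.x + v * dt, v, body.radius)

def detect_pairs(bodies: list[Body]) -> list[tuple[int, int]]:
    # Sweep and prune on one axis: sort by interval start, then scan while
    # keeping a list of intervals that are still "open". The sort and the
    # running list make this an ordered, sequential traversal.
    order = sorted(range(len(bodies)), key=lambda i: bodies[i].x - bodies[i].radius)
    active: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i in order:
        start = bodies[i].x - bodies[i].radius
        # Drop intervals that end before this one starts.
        active = [j for j in active if bodies[j].x + bodies[j].radius >= start]
        pairs.extend((min(i, j), max(i, j)) for j in active)
        active.append(i)
    return pairs

def step(bodies: list[Body], dt: float):
    pairs = detect_pairs(bodies)                # sequential, CPU-friendly
    moved = [integrate(b, dt) for b in bodies]  # independent per body, GPU-friendly
    # Collision resolution on `pairs` is omitted in this sketch.
    return moved, pairs
```

For example, bodies at x = 0, 1.5 and 10, each with radius 1, yield the single candidate pair (0, 1). The list comprehension in `step` only stands in for a parallel dispatch; the point is that `integrate` has no cross-body dependencies while `detect_pairs` does.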

CPU-GPU communication over PCI Express in a hybrid CPU-GPU physics solution
http://www.watch.impress.co.jp/game/docs/20060322/3dinis25.htm

Is physics a data-parallel task? Processing every object sequentially is inefficient.
http://www.watch.impress.co.jp/game/docs/20060322/3dinis26.htm
http://www.watch.impress.co.jp/game/docs/20060322/3dinis27.htm
http://www.watch.impress.co.jp/game/docs/20060322/3dinis28.htm

So group the objects that have affected one another and feed the groups into the GPU's parallel pipelines. Like multi-pass rendering, this uses the pipelines multiple times, but it frees the CPU, so overall it still accelerates the physics process.
http://www.watch.impress.co.jp/game/docs/20060322/3dinis29.htm
http://www.watch.impress.co.jp/game/docs/20060322/3dinis30.htm
http://www.watch.impress.co.jp/game/docs/20060322/3dinis31.htm
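The grouping step above is often called building "simulation islands". A minimal sketch, assuming the contact pairs come from the collision-detection step, uses union-find to merge bodies that touch directly or transitively; `build_islands`, `solve_island` and `solve_all` are hypothetical names, and `solve_island` is only a placeholder for a real constraint solver. This is not the NVIDIA/Havok code.

```python
from concurrent.futures import ThreadPoolExecutor

def find(parent: list[int], i: int) -> int:
    # Follow parent links to the root, halving the path as we go.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def build_islands(num_bodies: int, contacts: list[tuple[int, int]]) -> list[list[int]]:
    """Group bodies connected (directly or transitively) by contacts."""
    parent = list(range(num_bodies))
    for a, b in contacts:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups
    islands: dict[int, list[int]] = {}
    for i in range(num_bodies):
        islands.setdefault(find(parent, i), []).append(i)
    return list(islands.values())

def solve_island(island: list[int]) -> int:
    # Hypothetical placeholder for a constraint solver; here it just
    # reports how many bodies the island contains.
    return len(island)

def solve_all(num_bodies: int, contacts: list[tuple[int, int]]) -> list[int]:
    islands = build_islands(num_bodies, contacts)
    # Islands share no bodies, so each can be handed to its own worker.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(solve_island, islands))
```

With six bodies and contacts (0, 1), (1, 2), (4, 5), `build_islands` returns `[[0, 1, 2], [3], [4, 5]]`. The thread pool only illustrates the independent dispatch; in CPython it would not give a real speedup for CPU-bound work, which is where separate pipelines, GPU threads or SPEs come in.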
 
If the PPE is bad, Xenon would be worse

People have been commenting on how comparatively poor the PPE's performance is. If that's the case, it should be even more worrying for the Xbox 360: each of Xenon's three cores, otherwise almost identical to the PPE, should perform worse than Cell's PPE because it has less cache per core, and Xenon has no SPEs to make up for its shortcomings.

Cell PPE -
64 KB L1 cache (32 KB instruction + 32 KB data) + 512 KB L2 cache

Xenon -
64 KB L1 cache per core (32 KB instruction + 32 KB data) + ~341 KB L2 cache per core (1 MB shared by three cores)

Compare this with the single-core Revolution -
256 KB L1 cache + 1 MB L2 cache
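The per-core L2 figure for Xenon is simply the shared 1 MB divided evenly among three cores. It is a notional average, since the L2 is shared and a core can use more or less of it at any moment:

```latex
\frac{1\ \text{MB}}{3} = \frac{1024\ \text{KB}}{3} \approx 341\ \text{KB},
\qquad
\frac{(1024/3)\ \text{KB}}{512\ \text{KB}} = \frac{2}{3}
```

So on an even split each Xenon core gets about two thirds of the L2 that Cell's PPE has to itself.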
 
SPM said:
People have been commenting on how comparatively poor the PPE's performance is. If that's the case, it should be even more worrying for the Xbox 360: each of Xenon's three cores, otherwise almost identical to the PPE, should perform worse than Cell's PPE because it has less cache per core, and Xenon has no SPEs to make up for its shortcomings.

They're apparently not identical twins otherwise. The VMX setup is a bit different: Xenon cores have more vector registers (though I've read conflicting things on how useful that is). There seem to be a few other differences as well, perhaps in threading behaviour, although few people ever elaborate on them.

Also, when looking at PPE performance in Cell, you really need to check which revision is being discussed, as the PPE seems to have changed quite significantly across iterations.
 