shaderguy said:
Isn't game physics (polygon soup collision detection and reaction) more of a data base search and update problem? While there's some spatial coherency you can take advantage of, isn't most of the time in these algorithms spent searching data structures?
A physics engine consists primary of three steps: Collision detection, Constraint Solving, and Integration. The final step is very parallel since once you've calculated any interactions, applying time integration to every object in the game database is inherently parallel.
Constraint solving is as parallelizable as any matrix solving solution you're used to, but since multiple mutually non-contacting objects are in need of updating from the collision detection phase, these object's constraints can be solved in parallel. The collision detection step can inherently partition the constrain solving problem set by the very fact multiple mutually non-interacting object collisions will be found, each with their own disjoint set of constraints.
That leaves collision detection, which is the "search" problem. Sure, this problem is *less* easily paralleizable than the other two, but it is not impossible to parallelize. A smart physics engine is going to have some kind of scene database partitioning scheme anyway, and well as multiple collision detect phases (gross, midrange, and fine-detail)
One you can determine that a particular gross volume has no collisions with the rest of the database, you can procede to run deeper level collisions within that volume in a separate thread, while the rest of the database search proceeds.
But even in cases where you don't partition like this, database searches and updates can be parallelized, you just need efficient concurrency primitives.
Let's consider a simplified 1 dimensional example: I have an array of 1000 integers A[], and I wish to determine "collisions" between integers in this array. A collision is defined as abs(A
- A[j]) < epsilon. The output is C, a list of pairs of array positions which are in collision.
I can partition this problem into solving the collision problem on A[0..499], A[500..1000] and collisions between numbers from A where i=0..499 to numbers in the range A[j] where j=500..1000. Thus, I can run all three searches concurrently. You can either surround access to the output list C via a critical section, use a concurrent data structure like a concurrent queue (no blocks, no spin locks) implementation, or you can output three sublists, and then merge them at the end.
I'm not saying it isn't a difficult implementation, but to claim that the problem can in no way be parallelized I think is extremist.
There is a reason these physics SDKs haven't been parallelized before, and that's the fact that the target markets did not have SMP capable CPUs, so there was little gain using true concurrency. But now we have dual core CPUs, tri core XBox360, and octo-core PS3, which means alot more work and research will be put into efficient parallel implementations.