Actually, SDK 2.x supports asynchronous stepping.
It was "not designed for native parallelism", as the docs honestly say, but it supports minor threading inside a single physics scene, and it can run a set of sub-scenes (called compartments) in parallel.
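To illustrate what asynchronous stepping means in practice, here is a minimal sketch of the kick-off-then-collect pattern. It is not PhysX code: `ToyScene`, `runFrame`, and the "sinking bodies" update are hypothetical stand-ins, and the method names are only borrowed from the simulate/fetchResults pair as I recall it from the 2.x scene interface.

```cpp
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a physics scene; this is NOT the PhysX API.
class ToyScene {
public:
    explicit ToyScene(std::vector<float> heights)
        : heights_(std::move(heights)) {}

    // Start one step on a worker thread and return immediately.
    void simulate(float dt) {
        pending_ = std::async(std::launch::async, [this, dt] {
            for (float& h : heights_) {
                h -= 1.0f * dt;  // toy "physics": bodies sink at 1 unit/s
            }
        });
    }

    // Block until the step started by simulate() has finished.
    void fetchResults() {
        if (pending_.valid()) {
            pending_.get();
        }
    }

    // Only safe to call between fetchResults() and the next simulate().
    const std::vector<float>& heights() const { return heights_; }

private:
    std::vector<float> heights_;
    std::future<void> pending_;
};

// One frame: kick off physics, do unrelated work meanwhile, then sync.
int runFrame(ToyScene& scene, float dt) {
    scene.simulate(dt);
    int otherWork = 0;
    for (int i = 0; i < 1000; ++i) {
        otherWork += i;  // placeholder for rendering, AI, audio, etc.
    }
    scene.fetchResults();
    return otherWork;
}
```

The point is only that the game thread does not sit idle while the step runs; scene state should not be read until the results have been fetched.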
As for GPU acceleration of CPU PhysX games -...