As we all know, given (close to) infinite time, a Turing machine, one of the simplest computational devices ever devised (a finite set of states plus a tape), can solve any solvable problem. We classify a problem as "hard" when no solution exists that runs within a specified time bound.
Processors are about maximizing throughput, and so we require everything to finish as fast as possible. GPUs in particular like their problems small and fast. And while CPUs don't mind larger problems that run longer, both require execution to be atomic, or at least predictable in time.
Amdahl is famous for putting this into writing: the speedup you can get from parallelization is capped by the part of the work that has to run serially. And, as it turns out, the main bottleneck isn't so much the amount of calculation required (you can always fall back to a faster approximation), but the amount of interaction between tasks.
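Concretely, Amdahl's law bounds the speedup $S$ on $n$ processors when a fraction $p$ of the work can run in parallel:

$$S(n) = \frac{1}{(1 - p) + \frac{p}{n}}$$

Even with infinitely many processors, $S$ never exceeds $1/(1-p)$: the serial, interacting part sets the ceiling.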
Essentially, that is because processors are still state machines, executing a multitude of actions that may depend on each other's results before they can continue.
Most methods for running multiple tasks in parallel therefore use locks and synchronization mechanisms. They layer the workload in time, executing sequences of tasks in lockstep.
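A minimal sketch of that lockstep model, in Go (all names here are illustrative, not from the text): every frame, each worker advances its own slice of the state, and a barrier forces everyone to wait for the slowest before the next frame may begin.

```go
// Conventional lockstep parallelism: all workers must finish the current
// frame before anyone may start the next one. The barrier (a WaitGroup
// here) is the synchronization point described above.
package main

import (
	"fmt"
	"sync"
)

func main() {
	const workers = 4
	states := make([]float64, workers)

	for frame := 0; frame < 3; frame++ {
		var barrier sync.WaitGroup
		barrier.Add(workers)
		for w := 0; w < workers; w++ {
			go func(w int) {
				defer barrier.Done()
				states[w] += 1.0 // each worker advances its own state one step
			}(w)
		}
		// Nobody proceeds until every worker has finished:
		// the whole system advances in lockstep.
		barrier.Wait()
		fmt.Println("frame", frame, "done:", states)
	}
}
```

The barrier is exactly the layering-in-time described above: the whole system can only advance as fast as its slowest participant.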
But why would you do so? Just because a processor is essentially a state machine doesn't mean you have to use a similar model to describe your tasks. After all, that Turing machine can solve any task in a predictable time frame, as long as the problem isn't hard.
So, instead of trying to come up with better locking mechanisms, better task division, or any other refinement that stays inside the state machine paradigm, we need to come up with something that is free-running.
Or, in other words: we need a paradigm that doesn't switch between predefined states representing fixed lengths of time, but instead averages new states from multiple asynchronous occurrences.
That way, you need no synchronization, and you don't require atomic operations either.
Simply gather what data is available, make a best guess for the current time, and update.
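Here is a minimal sketch of what that could look like, under assumptions not in the text: each object publishes timestamped snapshots of its state, and readers extrapolate whatever snapshot happens to be available to "now". (Go's memory model still wants one atomic pointer swap for safe publication, but note that nothing ever locks, blocks, or waits.)

```go
// Free-running updates: writers publish timestamped snapshots whenever
// they happen to run, and readers extrapolate the latest snapshot to the
// current time -- a best guess, never a wait. No locks, no barriers,
// no lockstep; all names are illustrative.
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// snapshot is the last known state of an object, stamped with when it was taken.
type snapshot struct {
	when     time.Time
	value    float64 // e.g. a position
	velocity float64 // rate of change, used to extrapolate
}

type object struct {
	latest atomic.Pointer[snapshot]
}

// publish installs a new snapshot; it never blocks other goroutines.
func (o *object) publish(s snapshot) { o.latest.Store(&s) }

// estimate extrapolates the latest snapshot to time t -- the "best guess".
func (o *object) estimate(t time.Time) float64 {
	s := o.latest.Load()
	return s.value + s.velocity*t.Sub(s.when).Seconds()
}

func main() {
	var obj object
	obj.publish(snapshot{when: time.Now(), value: 0, velocity: 2}) // 2 units/s

	// A free-running updater: republishes whenever it gets around to it.
	go func() {
		for {
			time.Sleep(30 * time.Millisecond)
			now := time.Now()
			obj.publish(snapshot{when: now, value: obj.estimate(now), velocity: 2})
		}
	}()

	// A reader never waits for the updater; it just estimates for "now".
	for i := 0; i < 5; i++ {
		time.Sleep(20 * time.Millisecond)
		fmt.Printf("best guess: %.2f\n", obj.estimate(time.Now()))
	}
}
```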
An additional bonus of such a model is that you can steer it purely by priority, and aren't limited by having to process all objects each time frame.
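To illustrate that steering, here is a hypothetical priority pass, assuming each object carries an importance weight (my addition, not the text's): each round we refresh only the stalest, most important objects, and everything skipped stays readable through extrapolation.

```go
// A hypothetical priority pass: refresh only the highest-priority objects
// this round. Skipped objects remain readable via extrapolation, so
// skipping them is safe. The importance weight is an assumption.
package main

import (
	"fmt"
	"sort"
	"time"
)

type entity struct {
	name        string
	importance  float64   // assumed per-object weight
	lastUpdated time.Time // when this entity's snapshot was last refreshed
}

// priority grows with both staleness and importance: stale, important
// objects get refreshed first; fresh or unimportant ones can wait.
func priority(e *entity, now time.Time) float64 {
	return e.importance * now.Sub(e.lastUpdated).Seconds()
}

func main() {
	now := time.Now()
	pool := []*entity{
		{"player", 10, now.Add(-10 * time.Millisecond)},
		{"npc", 2, now.Add(-200 * time.Millisecond)},
		{"debris", 0.1, now.Add(-2 * time.Second)},
	}

	// Sort by descending priority and refresh only what the budget allows.
	sort.Slice(pool, func(i, j int) bool {
		return priority(pool[i], now) > priority(pool[j], now)
	})
	budget := 1 // how many updates we can afford this pass
	for _, e := range pool[:budget] {
		e.lastUpdated = now // stand-in for a real snapshot refresh
		fmt.Println("refreshed:", e.name)
	}
}
```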