TimothyFarrar
It's not possible with current-generation hardware. Try traversing a linked list on GPU and CPU. There's a huge performance difference.
It sure is possible on current GPU hardware, and you would not use a linked list to do it (a linked list is a painfully bad data structure for coherency and parallel processing on the CPU as well).
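To make the contrast concrete, here is a minimal C++ sketch (the names `Node`, `sum_list` and `tree_reduce` are illustrative, not from any library). Summing a linked list is one long chain of dependent loads. A pairwise (tree) reduction over a contiguous array does the same sum in about log2(n) dependent passes, and every addition within a pass is independent of the others, which is the shape a GPU reduction takes. The sketch runs the passes serially; on a GPU each inner-loop iteration would be its own thread.

```cpp
#include <cstddef>
#include <vector>

// Linked list: each iteration must wait for the previous load of n->next,
// so the work is one chain of n dependent steps.
struct Node {
    int value;
    Node* next;
};

long long sum_list(const Node* head) {
    long long total = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        total += n->value;
    }
    return total;
}

// Pairwise (tree) reduction over a contiguous array. Each pass folds the
// upper part onto the lower part; the additions inside one pass write and
// read disjoint elements, so they could all run at once (one GPU thread each).
// Only about log2(n) passes depend on each other.
// `data` is taken by value because it is overwritten with partial sums.
long long tree_reduce(std::vector<long long> data) {
    if (data.empty()) return 0;
    std::size_t n = data.size();
    while (n > 1) {
        std::size_t half = (n + 1) / 2;             // round up for odd sizes
        for (std::size_t i = 0; i + half < n; ++i) {
            data[i] += data[i + half];              // independent of other i
        }
        n = half;
    }
    return data[0];
}
```

Both functions compute the same sum; what differs is the length of the dependency chain, which is what decides how well the work maps onto wide parallel hardware.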
Programming for a GPU is the same as programming for a cluster. It's all mature, but the hardware is slightly different.
No, it is quite different. Programming a cluster is typically all about message passing and limiting latency, because the network is the bottleneck. GPU programming today (at least on a single GPU) doesn't involve any exposed network message passing.
Synchronization is anathema to these architectures, period.
The problem of synchronization is larger in the CPU space due to how operating systems schedule threads. If you need a real example of this, think about how late games typically use the results of threaded jobs: the latency is made large to avoid stalling at sync points where dependent results are needed from asynchronous job invocations. For some developers (id Software, for example) this latency is an entire game frame. Others have more dependent operations happening within one frame, but nothing even close to the number of dependent draw calls you can issue on the GPU in a given frame.
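A rough sketch of the one-frame-latency pattern described above, using `std::async`. The job function `run_job` and the toy loop `run_frames_with_latency` are hypothetical; a real engine would typically use its own job system rather than `std::async`, but the dependency structure is the same: each frame collects the previous frame's result instead of waiting on the job it just started.

```cpp
#include <future>
#include <vector>

// Hypothetical stand-in for work kicked off every frame (physics, animation, ...).
int run_job(int frame) { return frame * frame; }

// Toy game loop with one frame of latency: the job started in frame f is only
// consumed in frame f + 1. If it has already finished by then, get() does not
// block; the cost is that each frame acts on data that is one frame old.
// Returns what each frame consumed (-1 on the first frame, when no job exists yet).
std::vector<int> run_frames_with_latency(int frames) {
    std::vector<int> consumed;
    std::future<int> in_flight;                      // job kicked in the previous frame
    for (int f = 0; f < frames; ++f) {
        // Sync point: collect last frame's result (nothing to collect on frame 0).
        int previous = in_flight.valid() ? in_flight.get() : -1;
        // Kick this frame's job on another thread while this frame carries on.
        in_flight = std::async(std::launch::async, run_job, f);
        consumed.push_back(previous);                // "use" the one-frame-old result
    }
    if (in_flight.valid()) in_flight.get();          // drain the last job before returning
    return consumed;
}
```

The sync point does not go away; it is moved one frame later so the job has had time to finish, which is exactly the latency cost being described.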
The only difference is that said systems were expensive and rare, hence fewer programmers.
And being so expensive and rare, they were used to solve only a tiny subset of easy-to-parallelize problems.
My point here is simple: when given two different tools for a similar problem, for example a lock pick and a sledgehammer for getting through a door, you wouldn't try to use the pick like a sledgehammer, nor the sledgehammer like a lock pick. Both, however, can be used successfully to solve the problem.
People often think only of simple serial problem-solving techniques (linked lists, for example), when in fact there is a large expanse of solutions to most problems. It isn't hard to see the connection here when you consider that your "general purpose" C/C++ program is in fact running on a statically defined, rigid, highly parallel interconnection of transistors known as a CPU...
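Even the linked list itself has a parallel treatment. A classic example is list ranking by pointer jumping (Wyllie's algorithm): every node repeatedly replaces its successor with its successor's successor, so every node's distance to the tail is known after about log2(n) rounds, and within a round every node updates independently. A minimal C++ sketch, assuming the list is stored as an index array `next` with -1 marking the tail (`list_rank` is a hypothetical name):

```cpp
#include <cstddef>
#include <vector>

// List ranking by pointer jumping (Wyllie's algorithm).
// next_in[i] is the index of the node after i, or -1 for the tail.
// Returns, for every node, its distance (number of links) to the tail.
std::vector<int> list_rank(const std::vector<int>& next_in) {
    const std::size_t n = next_in.size();
    std::vector<int> next = next_in;
    std::vector<int> rank(n);
    for (std::size_t i = 0; i < n; ++i) rank[i] = (next[i] == -1) ? 0 : 1;

    bool active = true;
    while (active) {
        active = false;
        // Double-buffer so every node reads values from the previous round,
        // as it would if all nodes were updated simultaneously.
        std::vector<int> new_next = next;
        std::vector<int> new_rank = rank;
        for (std::size_t i = 0; i < n; ++i) {        // iterations are independent
            const int j = next[i];
            if (j != -1) {
                new_rank[i] = rank[i] + rank[j];     // add the successor's distance
                new_next[i] = next[j];               // jump over the successor
                active = true;
            }
        }
        next.swap(new_next);
        rank.swap(new_rank);
    }
    return rank;
}
```

Pointer jumping does O(n log n) total work versus O(n) for a serial walk; it trades extra work for a much shorter dependency chain, which is the kind of trade that wide parallel hardware rewards.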