http://www.elbeno.com/blog/?p=351
Efficient concurrent multithreaded programming is still the principal challenge when writing for multi-core CPUs. I thought this blog entry presented an interesting case of that issue, so it deserved its own thread.

We (both internally and, I think, in the wider industry) are making some progress on multithreading systems code. However, it’s not great progress. One problem is that PCs are still lagging behind next-gen consoles: the min spec target, even for games aiming to ship this time next year, is still a single-core P4-based system, hyperthreaded if we’re lucky. Multicore PCs are out there and getting more numerous, but they don’t yet have the market share to justify dropping the lower end. And of course both the Xbox 360 and the PS3 still have more cores than a current top-end PC (6 hardware threads, and 2 PPU hardware threads plus 6 SPUs, respectively).
...
But the major problem area for multithreading is where the actual gameplay happens: the intersection of animation, “high-level” AI, player control and, to some extent, physics. When I talk about “high-level” AI I mean the decision-making and glue code that controls what entities in the game do. The main game loop of object updates has multiple ordering dependencies among these systems, and updating one system for a single object often causes cascading requests to other systems. For example: animation must be posed before physics is resolved (because posing animation may cause an arm to stick through a wall that a character is standing close to, and the physics system will then apply constraints to push the arm back outside the wall). Another example is that AI will decide to change what a character is doing and therefore will drive animation blending. A third example is that AI code will often want to know whether two characters can see each other in order to decide what to do. This involves asking the physics system to do a line-of-sight trace between the objects, and the result is needed to make the AI decision, so the call is typically synchronous.
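To make the shape of the problem concrete, here is a purely illustrative sketch (the system and function names are invented, not taken from any real engine) of that kind of per-object update: AI calls synchronously into physics for a line-of-sight query and drives animation blending, animation is posed, and only then does physics resolve constraints. It’s the cross-system calls inside each step that make naive parallelisation of the loop unsafe.

```cpp
// Hypothetical sketch of the per-object update loop described above.
#include <cstdio>
#include <vector>

struct GameObject;  // forward declaration

struct Physics {
    // Stub: pretend the trace succeeded.
    bool line_of_sight(const GameObject&, const GameObject&) const { return true; }
    // e.g. push a posed arm back outside a wall.
    void resolve_constraints(GameObject&) {}
};

struct Animation {
    void blend_to(const char* anim) { std::printf("blend to %s\n", anim); }
    void pose(GameObject&) { /* compute this frame's skeleton pose */ }
};

struct GameObject {
    Animation   anim;
    GameObject* target = nullptr;
};

// "High-level" AI: decision-making that pulls on other systems while it runs.
void update_ai(GameObject& obj, const Physics& physics) {
    if (obj.target && physics.line_of_sight(obj, *obj.target))
        obj.anim.blend_to("attack");   // the AI decision drives animation blending
    else
        obj.anim.blend_to("search");
}

// The ordering dependencies: AI, then pose animation, then resolve physics.
void update_world(std::vector<GameObject>& objects, Physics& physics) {
    for (auto& obj : objects) {
        update_ai(obj, physics);        // synchronous call back into physics
        obj.anim.pose(obj);             // pose before physics is resolved
        physics.resolve_constraints(obj);
    }
}

int main() {
    std::vector<GameObject> objects(2);
    objects[0].target = &objects[1];
    Physics physics;
    update_world(objects, physics);
}
```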
...
So what’s to be done? Well, I have a couple of ideas designed to make it easier for “ordinary engineers” to take advantage of multiprocessing in at least a semi-safe way. The first idea is Software Transactional Memory, or STM. You’ll find the literature if you care to look around the web. I think this is the only practical solution to achieving some concurrency in updating the thousands of objects that we need to, each of which touches half a dozen others. And I think we should apply it at the object level. This will require a bit of extra memory, but if done right, the performance gains could be significant. And importantly, it will scale. For the first time, I think we have the potential to update thousands of objects across as many cores as we have. Of course, I don’t yet have any practical experience, and we may find that a new bottleneck (e.g. memory access) springs up.
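To give a flavour of what object-level STM could look like, here is a minimal sketch of optimistic copy-and-commit updates on a single object (all names invented; a real STM would also have to let one transaction touch several objects atomically, which is exactly where the cascading requests above come in). Each update works on a private copy of the object’s state and commits only if no other thread committed first, retrying otherwise.

```cpp
// Minimal optimistic-concurrency sketch in the spirit of object-level STM.
#include <cstdio>
#include <memory>
#include <thread>
#include <vector>

struct EntityState {
    float x = 0.0f, y = 0.0f;
    int   health = 100;
};

class TransactionalEntity {
public:
    TransactionalEntity() : state_(std::make_shared<EntityState>()) {}

    // Apply `fn` to a private copy of the state and try to commit it.
    // If another thread committed in the meantime, re-read and retry.
    template <typename Fn>
    void update(Fn fn) {
        auto current = std::atomic_load(&state_);
        for (;;) {
            auto copy = std::make_shared<EntityState>(*current);
            fn(*copy);
            if (std::atomic_compare_exchange_weak(&state_, &current, copy))
                return;  // committed
            // `current` now holds the latest snapshot; loop and retry.
        }
    }

    EntityState read() const { return *std::atomic_load(&state_); }

private:
    std::shared_ptr<EntityState> state_;
};

int main() {
    TransactionalEntity e;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&e] {
            for (int i = 0; i < 1000; ++i)
                e.update([](EntityState& s) { s.health -= 1; });
        });
    for (auto& th : threads) th.join();
    // 4 threads x 1000 decrements: expect 100 - 4000 = -3900.
    std::printf("final health = %d\n", e.read().health);
}
```

The extra memory here is the per-update copy of the state; the payoff is that no thread ever blocks another while it computes, and conflicts cost only a retry.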
The second idea I have is to enable more multiprocessing of mid- and systems-level code (and perhaps some higher-level code) through functional programming. I bet that more than 90% of the loops we write can be implemented in terms of map and fold. The challenge is finding a way to abstract this in C++ (much as I would like to write games in Haskell, I don’t see it being adopted any time soon, though I may well experiment with a scripting language embedded in Haskell). Imagine if AI and player control engineers could, instead of writing a synchronous for-loop to find the closest enemy, write that same functionality in terms of map and fold (which is easy) and automatically have it distributed across cores. The beauty of this idea is that the concurrency work is done once to make map and fold themselves multicore, and then (with appropriate forethought and data constraints) everyone can take advantage of that.
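As a sketch of what that could look like in C++, here is the closest-enemy example written as a map (distance to each enemy) and a fold (keep the minimum), using std::transform_reduce with a parallel execution policy as a stand-in for the kind of wrapper I have in mind. The Enemy type and distance function are invented for illustration, and the parallel policy needs a standard library that actually implements the parallel algorithms.

```cpp
// Closest-enemy query expressed as map + fold, parallelised by the library.
#include <cmath>
#include <cstdio>
#include <execution>
#include <limits>
#include <numeric>
#include <utility>
#include <vector>

struct Vec2 { float x, y; };

struct Enemy {
    int  id;
    Vec2 pos;
};

float dist_sq(Vec2 a, Vec2 b) {
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// "map": each enemy -> (distance^2, id); "fold": keep the pair with the
// smaller distance. Both steps are pure, so the work can be split across cores.
std::pair<float, int> closest_enemy(Vec2 player, const std::vector<Enemy>& enemies) {
    const auto worst = std::pair{std::numeric_limits<float>::max(), -1};
    return std::transform_reduce(
        std::execution::par, enemies.begin(), enemies.end(),
        worst,
        [](auto a, auto b) { return a.first < b.first ? a : b; },                       // fold
        [player](const Enemy& e) { return std::pair{dist_sq(player, e.pos), e.id}; });  // map
}

int main() {
    std::vector<Enemy> enemies{{1, {10, 0}}, {2, {3, 4}}, {3, {-20, 5}}};
    auto [d2, id] = closest_enemy({0, 0}, enemies);
    std::printf("closest enemy %d at distance %.1f\n", id, std::sqrt(d2));
}
```

The gameplay code stays a pure map and fold; the decision about how to split the work across cores lives in one place, inside the library.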
So the only thing I need to do now is actually write this code, get some preliminary use cases together, try it out, and then introduce it to my team. Wish me luck…