Tree search algorithim on SPE's?

I didnt want to disrupt the technical discuession at the Cell forum with trivial question but one of the IBM guys mentioned this:

IBM tech guy said:
- With respect to AI, I don't have the answers, but I have good hope. I think AI is usually not bound by computation but by memory access penalties (on high-frequency processors). I can imagine tree search algorithms for the SPEs that absolutely rock, by getting a lot of memory accesses in flight concurrently. There may be a patent out there by M. Necker and myself that describes some of this for the case of routing table accesses ( also a kind of tree search ).

Any idea which tree searching he's referring to? I would imagine that the SPE could be used to start traversal at many different nodes in parallel. This seems pretty obvious yet people have been saying that SPE's wont be strong at handling branching AI code. What's the disadvantage to running something like this on the SPE's?
 
If the data structure fits in local memory it should be very fast.

I suspect what they are suggesting is making speculative DMA requests, I'm really not convinced this will outweigh the cost of putting the search on the SPE. And the potential impact on the memory bus of many outstanding small DMA requests. Couple this with coherent memory layout of the tree and it might run pretty fast.

Having said that since you have lots of SPE's and one PPE, it might still be "faster" to do it "slower" on one or more SPE's. Parallelism about the absolute fastest solution to individual problems, it's about balanced workload.

I think that the real issue with AI and SPE's is the releative interconnectivity of data with complex AI agents. Without very careful design it's difficult to isolate what data a given agent is dependant on. I have a design built to solve this problem for a large scale simulation in a game I'm no longer working on, but it makes building the agents considerably harder and since it was expecting to use a scripting language for the majority of the base logic it doesn't really solve the limited SPE memory issue well. You simply can't hide the parallelism, and there are simply too many ways to build something that superficially appears to work, but doesn't.

It's an interesting problem, I just wish I could work on it without having to ship a game on a stupidly short timeline.
 
The trees mentioned could be similar to Alpha-beta pruned decision trees that can be used for games like othello, checkers, and tic-tac-toe. A tree like that is very helpful in descrete game worlds with simple interactions.

One potential way to handle pruning of bad moves would be to pick a series of boundary nodes at a desired search depth, then assign the children nodes to their own SPEs--which pass the result on to the PPE, which assigns the proper value at the node. Repeat this for each node, then propogate up the tree.

It has a number of advantages, since Alpha-beta pruning doesn't require a signficant amount of computation beyond the given node and its children, a significant amount of data can be streamed--a Cell strong suit. With the number of SPEs, the depth of search in the tree can be significantly further (maybe) than it would be with a conventional processor. In addition, the evaluation of each node is computation-heavy, which goes well with the SPE architecture, so long as the local data can be local-stored.

Trees are a very good brute force way for computers to make decisions, and Cell can power through a lot more nodes with its many cores. Pruning is absolutely essential, since even small game trees can explode without a good way to remove bad decisions.

Unfortunately, not all situations fit tree structures very well. If interactions are complex, the game world continuous and changing, or even the number of agents being relatively large, the tree will simply be too huge and require a different algorithm be used.
Alpha-beta pruning, for example, fails for situations with 3 or more competing agents.

Different algorithms may not be as stream-friendly or play well with the limited local store space of the SPEs. If they are particularly branchy and unpredictable at compile time, then SPEs will founder even more with their long pipelines, almost non-existant branch prediction, and a lack of multithreading to hide some of the gaps.
 
Back
Top