SPM said:
The problem with procedural AI is that the programmer has to explicitly define every logic state.
Not necessarily; it depends heavily on how the simulation is set up. A minor agent within the larger game world can work from a much smaller set of data.
There's a difference between procedural AI and procedural code. One is a higher-level type of decision making for determining the future state of a simulation.
The other is a programming paradigm where the program is composed of various procedures or methods that can be called at any point in a program. It is often preferable to sequential code because it allows for much better maintainability and structure.
Just try programming the herd/crowd behaviour simulation as single-threaded procedural AI code, and you will find it requires an immense amount of programming effort.
Not much harder in the long run, since the problem being modeled is a very simplistic stimulus-response behavior. Any procedural rules for the behavior of each individual within the herd would reduce to a handful of simple ones.
Whether it's single-threaded is beside the point. Single-threaded versus multi-threaded makes no difference to the simulation itself, and procedural code is not inherently single-threaded. Procedural or not, the AI will take a bit more work if it is multithreaded.
Let me give a simple practical example of the difference between the procedural AI approach, where the logic state is stored in the program counter, and the bottom-up object AI, where the logic state is stored as data, and the consequences for performance. Consider a decision which is part of an AI routine:
Procedural AI with branching:
If A > B then goto xxxx
If A < B then goto yyyy
C = 0
store C
goto done
xxxx C = true
store C
goto done
yyyy C = false
store C
done ...
How do we know what the result of our test is? Answer: where the program branches to, together with the content of the stack, tells you - i.e. the logic state is stored in the program counter.
This example is simplistic and out of context, and depending on what state you are talking about, it may or may not be relevant to the program counter.
From the point of view of the hardware context during the execution of the program, yes, there's some information in the counter. If the AI were free to interrupt the core while it was running itself, then the counter would matter. Since it can't, or at the very least shouldn't if meaningful output is desired, it doesn't matter.
In theory, an improperly threaded AI could do something like that, and nothing but garbage would result.
From the logical state of the AI simulation, it looks like this:
Simulation time slice 1: C = whatever it was
*simulation update: everything happens here*
Simulation time slice 2: C = whatever it turns out to be
More complex logic states, in which more than one test result is encoded in the program counter, are possible. For example, if you call a function or subroutine at the branch, the current logic status (which represents the results of all the compounded branch decisions that led to the current execution point) is stored as the return address on the stack, and the latest test result as the address of the new branch in the program counter. If you want to keep the results in a logic table, you add code in each branch to store them.
If you want to do anything with C, there's going to be a branch or a select statement set up anyway.
In this easy 2-way situation, a select statement would work well.
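To make that concrete, here is a minimal C sketch (the names a, b and c simply mirror the A, B and C of the example above; this is only an illustration, not anything from the original post) of the two-way test kept as data via a select expression rather than a goto structure:

/* Hypothetical sketch: the two-way test written as a select expression.
   The comparison result ends up stored as data in c rather than in the
   flow of control, and compilers frequently emit this without a branch. */
int decide(int a, int b)
{
    int c = (a > b) ? 1 : 0;   /* select: 1 if a > b, otherwise 0 */
    return c;
}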
Branchless AI treating results as data:
C = A - B
sign extend C
store C
This isn't an AI concept. It's a code optimization within a procedure for a very specific (and simplistic) relationship between only two data elements.
If the relationship between the data values is that simplistic, then the use of that really bad branch example would be a sign of a coding oversight, not a limitation of the procedural approach. Then again, a good optimizing compiler might be able to do that transformation anyway.
It has nothing to do with the conceptual layout of the AI.
How do we know what the results are? The value of C tells us. The program counter doesn't.
What about later when something is actually done with the value C? Odds are, program flow will not be identical, in which case the counter will reflect the difference.
What is the difference between the two?
1) There are no branches in the branchless AI method.
That's because you selected the sliver of code that had no branches in the program. No decision had been made; the program was going to do its own thing regardless of what C was.
It's just a meaninglessly twitching set of bits. In that case, yes, a sign extension is better than a branch, when the result isn't used for anything.
In real life, that code would have a branch immediately after that sign extension, because it would then have to do something with the value of C.
2) The procedural AI method hogs an entire execution thread, meaning that the code has to be repeatedly executed in a different object context for every object that has to be processed.
This has more to do with the data layout of the simulation. A procedural program can do the same thing with a properly made procedure. Actually, your branchless AI code snippet would probably be found in the middle of a procedure.
Of course, this is only from the standpoint of a procedural program, which, as has already been covered, is entirely unrelated to procedural AI.
On the other hand, the branchless AI code can be executed on the SPE on 4 objects simultaneously using SIMD, because it treats the results as data, even if the results for each of the 4 objects are different.
This is a low-level hardware feature the SPE offers that the assembly code can utilize. A procedural program could do this as well, and it probably would if the simulated behavior is simple enough.
The procedural AI, however, can only handle one object at a time per thread, because it only has one program counter, and the results for different objects may branch in different ways.
Why?
Watch this:
single version:
load A
load B
UpdateOneObject(A, B)
quad version:
load packed A
load packed B
UpdateFourObjectsUsingSIMD (A, B)
The data layout within the simulation is the limiting factor, not the programming paradigm or AI philosophy.
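As a rough sketch of that point (the structure-of-arrays layout and the update rule here are hypothetical, chosen only for illustration), an ordinary C procedure can process many objects per call once the data is packed appropriately:

#include <stddef.h>

/* Hypothetical structure-of-arrays layout: the A and B values of all
   objects sit in contiguous arrays instead of one struct per object. */
void update_objects(const float *a, const float *b, float *c, size_t n)
{
    /* A plain procedure; with this data layout a vectorising compiler or
       hand-written SIMD can evaluate 4 (or more) objects per iteration. */
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] - b[i];   /* the result-as-data idea, one lane per object */
}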
The branchless AI code can go even further - if the test being performed is a set of boolean bitwise operations on flags, and a 64-bit word is used to store the flag values of 64 objects, then 64 x 4 = 256 objects can be processed in parallel.
I don't follow your math. The 64 boolean flags are being applied to the simple data states in the 4 packed objects. At the end of the computation, only the 4 objects would be altered.
I also don't get how it is that an SPE can handle the simultaneous computation of (64 + 2) initial data states four times in parallel. It can't, and its masking abilities are not that insanely flexible.
What set those boolean flags?
Is grouping 64 boolean flags even meaningful?
In what situation would they make any sense?
Why would a procedural program be unable to handle it?
What if the simulated environment has relationships that dynamically underfill or overflow the static data structure?
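For reference, the flag-packing idea itself (independent of the disputed 64 x 4 arithmetic) would look something like the following hypothetical C sketch, with one bit per object so that a single 64-bit operation touches the flags of 64 objects:

#include <stdint.h>

/* Hypothetical flag packing: bit i of each word is a boolean belonging
   to object i, so 64 objects share one machine word. */
uint64_t update_eating_flags(uint64_t alive, uint64_t hungry, uint64_t near_food)
{
    /* Example rule: an object starts eating if it is alive, hungry and
       near food; the whole group is evaluated in two bitwise ANDs. */
    return alive & hungry & near_food;
}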
Mathematically speaking, anything that can be achieved by testing and branching can be converted into a set of inline boolean and arithmetic operations applied to data stored in tables.
The complexity and code expansion involved are reason enough to be very careful about where it is used.
Running AI computations in parallel is possible with branchless operations in inline code and short loops. The inline code may be slightly longer, but it will run a lot faster.
On something beyond a simple herd simulation, the inline is going to get a lot longer. A lot of the code being run in parallel will wind up being discarded if you rely only on bit shifting and masking.
Cell doesn't offer a hyper exponential increase in throughput.
Branches in different threads don't affect one another, so branches don't do much to hinder multithreading. Memory aliasing, synchronization, and contention do, and these are limits of silicon and physics.
In practice some limited branching between batches of inline code will probably be required - more because the procedural mindset of microprocessor designers means test-and-branch instructions get far more attention than test-and-store instructions than for any other reason.
There will be branching because that will be the only way to choose between various sets of inline code.
If code is inlined, each stream must reflect a single logical outcome, or if it uses some kind of predication or bit select, it must burn limited throughput.
This is not a trivial increase in code size and resource pressure.
Branch inefficiency and the single-threaded nature of the procedural AI used in games so far haven't been a problem for the limited level of AI we have had so far. However, they do limit what is possible in terms of making the AI more complex and believable, i.e. raising the level of intelligence.
The examples you have given for non-procedural AI are incredibly simplistic.
They are meant for behaviors that, while important, are not meant to be intelligent at all.
For number crunching, SIMD works. But the nucleus of branching can't be eliminated. You can't make a branchless strategic AI for more than a tiny problem because the amount of parallel execution needed to reflect all the outcomes of a series of branches will explode.
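A small hypothetical C sketch of why that explosion happens: with predication every outcome of every decision is computed and the unwanted ones are masked away, so each extra level of nesting doubles the work that gets thrown out.

#include <stdint.h>

/* Branchless select: keep x if cond is non-zero, otherwise keep y. */
static int32_t select_by_mask(int32_t cond, int32_t x, int32_t y)
{
    int32_t mask = -(cond != 0);            /* all ones or all zeros */
    return (x & mask) | (y & ~mask);
}

int32_t two_level_decision(int32_t a, int32_t b, int32_t c)
{
    /* All four leaf results of a two-level decision tree are computed... */
    int32_t r00 = a + b, r01 = a - b, r10 = b + c, r11 = b - c;
    /* ...and three of them are then discarded by the masked selects. */
    int32_t left  = select_by_mask(b > c, r00, r01);
    int32_t right = select_by_mask(b > c, r10, r11);
    return select_by_mask(a > b, left, right);
}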
As for being deterministic, the same can be said for the weather system of the world or the behaviour of human beings. They are determined by physical laws and so they should be utterly predictable.
There's an entire realm of physics where all things are based on probabilities and are required to be uncertain.
They aren't because they are immensely complex and so can't be predicted. The same is true for the herd/crowd simulation - remember that each object can occupy millions of possible positions and each can interact with every other object.
Also, if the object interacts with the user, a millisecond difference in user response may mean that objects end up in completely different locations. Take the herding simulation, have the herd interact with the user in the same way, start at the same starting position, and see if you can end up twice with all of the herd in the same positions - it is pretty well impossible.
Easy: do nothing at all. Zero is zero, and a deterministic system will reflect that.
The user's input is a great way to sneak in stochastic behavior. Fluctuations in reaction time, or even the responses of the input device, can introduce unexpected results. If you were to take a snapshot after every poll of the input buffer, however, every subsequent action would be deterministic.
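A tiny hypothetical C sketch of that last point: if the update step is a pure function of state plus input, and the inputs are logged each time the buffer is polled, then replaying the same log from the same snapshot reproduces the run exactly.

#include <stdint.h>
#include <stddef.h>

#define HERD_SIZE 64

typedef struct { float x[HERD_SIZE]; float y[HERD_SIZE]; } SimState;

/* Pure fixed-step update: the next state depends only on the current
   state and the polled input, nothing else. */
static void step(SimState *s, uint8_t input)
{
    for (size_t i = 0; i < HERD_SIZE; ++i) {
        s->x[i] += (input & 1) ? 0.1f : -0.1f;
        s->y[i] += (input & 2) ? 0.1f : 0.0f;
    }
}

/* Replaying a recorded input log from the same starting snapshot
   yields the same final state every time. */
void replay(SimState *s, const uint8_t *input_log, size_t steps)
{
    for (size_t t = 0; t < steps; ++t)
        step(s, input_log[t]);
}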