SN Systems Andy Thomason next gen coding

Eleazar

Newcomer
Gamasutra is running an article on proper coding practices for the next gen. It covers cache misses, branch avoidance, inlining and a whole lotta other fun stuff, so head on over. I like this article because it pretty much sums up how this gen is going to be different from the past gen. A lot of it we have already said on these forums one way or another, but it is nice to see it all put in the perspective of someone in the field who has done such a good job of organizing that information.

http://www.gamasutra.com/features/20051220/thomason_01.shtml
 
I don't see how

We can eliminate branches entirely by changing code of this kind:

if( x == 0 ) y = z;

to

y = x == 0 ? z : y;

and

if( ptr != 0 ) ptr->next = prev;

to

*( ptr != 0 ? &ptr->next : &dummy ) = prev;

With a good compiler, this will execute far faster than the code with branches that the “if” form will generate. Next-gen consoles have deep pipelines that require large uninterrupted function bodies to be able to schedule efficiently.

is true. Unless the instruction set you are compiling to contains a conditional assignment or predicated assignment instruction, a branch will still be required.
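For reference, the two rewrites quoted from the article can be written as compilable C++. This is only a sketch: `Node`, `link_if` and `link_select` are illustrative names that do not appear in the article, and `dummy` is assumed to be a scratch variable of the same type as `ptr->next`. Whether the second form really compiles without a branch depends on the compiler and on the target having a select or conditional-move style instruction.

```cpp
struct Node {
    Node* next = nullptr;
};

// Scratch target that absorbs the store when ptr is null (the article's `dummy`).
static Node* dummy = nullptr;

// Branching form: if( ptr != 0 ) ptr->next = prev;
inline void link_if(Node* ptr, Node* prev)
{
    if (ptr != nullptr) ptr->next = prev;
}

// Select form: the store always happens; only its destination address is chosen.
inline void link_select(Node* ptr, Node* prev)
{
    *(ptr != nullptr ? &ptr->next : &dummy) = prev;
}
```

Note that the select form trades the branch for an unconditional store, so it writes to `dummy` on every null case.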
 
You can do it without them.
For example, the C expression "a = (!b ? c : d)" is equivalent to the x86 asm
Code:
mov    eax,[b]
or     eax,eax
setz   bl
dec    bl
movsx  ebx,bl
mov    eax,[d]
and    eax,ebx
not    ebx
and    ebx,[c]
or     eax,ebx
mov    [a],eax
EDIT: MfA was quicker. :)
EDIT2: Maybe you can still think of setz as a "conditional assignment", but I'm pretty sure it could be done without it, too. :D
 
With x86 it's probably easier to use the cmov instructions introduced with the P6. Microsoft's x86_64 compiler uses them.

The instructions for 'a = (!b ? c : d)' end up as follows, using only one register:

Code:
mov eax, [b]
test eax, eax
cmovnz eax, [c]
cmovz eax, [d]
mov [a], eax
 
Yep, but cmov definitely is what DemoCoder referred to as "a conditional assignment or predicated assignment instruction", IMO. :)

EDIT: BTW, you mixed up [c] and [d], so it's equivalent to "a = (b ? c : d)" now. :)
 
A good compiler should recognize such cases and use conditional moves automatically. I don't see why I should change my coding practices...
At least the Pascal compiler, where there is no ?: operator, does it that way.
 
Yep, it all depends on the compiler. Even if you change your practices, a stupid compiler will still give you inefficient code. For example, since the C99 standard treats the inline keyword as only a 'hint', it's still up to the compiler. :)
 
Well, I suppose one could store the result of a condition in a boolean, say A, and then use the following boolean equation:

R = (A ? X : Y)

becomes

R = A * X + not(A) * Y

(* = AND, + = OR)

Of course, this is just poor man's predication, with A as the predicate. :)
 
Yep, you got the basic idea. (BTW, simply storing the condition is not enough, either in C or at the x86 asm level, because it's either 0 or 1. You have to convert it so that it's either 0 or all 1 bits.)
And yes, it's just "poor man's predication", but you don't need "a conditional assignment or predicated assignment instruction", which was our point. :)
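A minimal C++ sketch of the mask trick described above, assuming 32-bit unsigned operands; the name `select_mask` is made up for illustration. The condition is first widened from 0/1 to 0/all-ones and then used to blend the two values with AND/OR. Whether this beats a plain `?:` depends on the compiler and target, since many compilers emit a conditional move for either form.

```cpp
#include <cstdint>

// Hypothetical helper: returns x when cond is true and y otherwise,
// with no branch in the source.
inline std::uint32_t select_mask(bool cond, std::uint32_t x, std::uint32_t y)
{
    // Widen the 0/1 condition to 0x00000000 / 0xFFFFFFFF.
    // Unsigned subtraction is well defined: 0u - 1u wraps to all one bits.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);

    // R = (A AND X) OR (NOT A AND Y)
    return (x & mask) | (y & ~mask);
}
```

For example, `select_mask(b == 0, c, d)` corresponds to `a = (!b ? c : d)` from the asm listing earlier in the thread.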
 
This whole micro-optimization stuff seems a bit silly at times.

I personally prefer the "Get it right then get it tight" approach - write some good clean code (free of such ugly micro-optimizations) and then profile it to work out where those micro optimizations will really make a difference.

There's a lot to be said for writing good quality clean code - maintenance, (lack of) bugs, adaptability, portability...

Jack
 
Mate, do you know if MSVC/GCC __forceinline/ __attribute__ ((always_inline)) actually force inlining, or if they simply force the compiler to consider inlining even when optimizations are turned off?
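For context, here is a sketch of how the two spellings are commonly wrapped in one macro; `FORCE_INLINE` and `square` are made-up names. Going by the vendor documentation, GCC's `always_inline` requests inlining even when optimization is off and diagnoses a failure to inline as an error, while MSVC's `__forceinline` overrides the cost/benefit analysis but can still be declined (warning C4714) in cases such as recursion. Neither should be read as an unconditional guarantee.

```cpp
// Hypothetical portability macro; falls back to plain `inline` elsewhere.
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
  #define FORCE_INLINE inline __attribute__((always_inline))
#else
  #define FORCE_INLINE inline
#endif

// Small leaf function of the kind usually marked this way.
FORCE_INLINE int square(int v)
{
    return v * v;
}
```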
 
JHoxley said:
I personally prefer the "Get it right then get it tight" approach - write some good clean code (free of such ugly micro-optimizations) and then profile it to work out where those micro optimizations will really make a difference.
Yep.
"Premature optimization is the root of all Evil." (? Knuth ?)

@psurge: I don't know. Honestly. I'll try to poke around. :)
 
JHoxley said:
This whole micro-optimization stuff seems a bit silly at times.

I personally prefer the "Get it right then get it tight" approach - write some good clean code (free of such ugly micro-optimizations) and then profile it to work out where those micro optimizations will really make a difference.

There's a lot to be said for writing good quality clean code - maintenance, (lack of) bugs, adaptability, portability...

Jack

The issue with this is that these types of micro-optimisations are hard to measure. Any one of them might not have a significant impact, but thousands per frame can be significant.

Virtual function overhead is probably the most obvious one (though virtual calls are hard to eliminate after the fact). One virtual function call doesn't kill you (not even on PS2), but tens or even hundreds of thousands per frame can really hurt.

Anecdote: a friend of mine was just relaying his experience of removing a lot of virtual function calls from the inner workings of a fairly major system on a cross-platform product. The net result was almost no performance difference on PC and a doubling of performance on one particular console. There is no way to estimate the impact of those virtual function calls without actually removing them.
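The post does not say how the virtual calls were removed. One common approach, sketched here under that assumption, is to stop dispatching per object and instead keep objects in per-type arrays that are updated by plain loops. `Entity`, `Particle` and `World` are hypothetical names.

```cpp
#include <vector>

// Before: one virtual call per object per frame through a base pointer.
struct Entity {
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;
};

// After: plain data kept in a per-type array, no vtable involved.
struct Particle {
    float pos;
    float vel;
};

struct World {
    std::vector<Particle> particles;

    // The loop body is visible to the compiler, so it can be inlined
    // and scheduled as one uninterrupted block.
    void update(float dt)
    {
        for (Particle& p : particles) p.pos += p.vel * dt;
    }
};
```

The only way to know whether a change like this pays off on a given platform is to measure before and after, which is the point of the anecdote.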
 
DemoCoder said:
unless the instruction set you are compiling to contains a conditional assignment or predicated assignment instruction, otherwise a branch will still be required.
And whether there is such a conditional assignment instruction or not, the lines with if are just as good or better even in the second case.
 
Yes, many compilers will have an almost identical internal representation for the two, except for the fact that ?: is an expression and 'if' is a statement. They are otherwise identical, much like for/while/do-while.
 
I can't help feeling like I'm stepping back 5 years reading that article, when in fact it's aimed as a prediction of the next 5 years of development.

IMO, the choice of algorithms, and overall design structure will have a greater effect on performance than things such as choice of branch style.

He talks about going to extreme lengths to reduce memory overhead, then effectively says 'inline everything'?! I've done that before... and I got an 8 MB executable instead of 700 KB. Fantastic advice. Yes, selective inlining is very important, but this is usually done by a smart compiler, and it will be obvious when it's needed with proper profiling. He also suggests templating as much as possible. Same deal: code bloat.



Takes me back to the 'C is faster than C++' wars of days gone by.


"Calling malloc or the default new in a game loop is considered irresponsible". Urgh.
 
balancing texture LOD by adjusting mip-map bias is an important tool.
I strongly disagree. Get the texture content right and leave LOD bias alone, please.

Use anisotropic filtering to sharpen textures instead of positive LOD bias.
That's certainly supposed to read negative.
 
Graham said:
I can't help feeling like I'm stepping back 5 years reading that article, when in fact it's aimed as a prediction of the next 5 years of development.

IMO, the choice of algorithms, and overall design structure will have a greater effect on performance than things such as choice of branch style.

He talks about going to extreme lengths to reduce memory overhead, then effectively says 'inline everything'?! I've done that before... and I got an 8 MB executable instead of 700 KB. Fantastic advice. Yes, selective inlining is very important, but this is usually done by a smart compiler, and it will be obvious when it's needed with proper profiling. He also suggests templating as much as possible. Same deal: code bloat.



Takes me back to the 'C is faster than C++' wars of days gone by.


"Calling malloc or the default new in a game loop is considered irresponsible". Urgh.


To give you some idea of how far games are from general application development, many companies have a zero runtime memory allocation policy (although it's less prevalent than it used to be). Not so long ago my games had no free; the only way to free memory was to revert the heap (actually just a stack) to a previously saved state.
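A rough sketch of what such a "heap that is really a stack" might look like, assuming a caller-supplied buffer; `StackHeap`, `mark` and `rewind` are illustrative names, not taken from any particular engine. Allocation just bumps an offset, and the only way to free is to roll back to a saved offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical linear ("stack") allocator over a caller-supplied buffer.
class StackHeap {
public:
    StackHeap(void* buffer, std::size_t size)
        : base_(static_cast<std::uint8_t*>(buffer)), size_(size), top_(0) {}

    // Returns nullptr when the buffer is exhausted. align must be a power of two.
    void* alloc(std::size_t bytes, std::size_t align = alignof(std::max_align_t))
    {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base_) + top_;
        const std::size_t padding =
            static_cast<std::size_t>((align - start % align) % align);
        if (padding > size_ - top_ || bytes > size_ - top_ - padding) return nullptr;
        void* result = base_ + top_ + padding;
        top_ += padding + bytes;
        return result;
    }

    // Save the current top of the stack.
    std::size_t mark() const { return top_; }

    // Release everything allocated since the matching mark().
    void rewind(std::size_t saved)
    {
        assert(saved <= top_);
        top_ = saved;
    }

private:
    std::uint8_t* base_;
    std::size_t size_;
    std::size_t top_;
};
```

A typical use would be to call `mark()` at the start of a level or frame, allocate freely, and `rewind()` at the end. No destructors are run, so this sketch only suits data that does not need cleanup.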

Most of what's in the article can make a significant performance difference. Obviously these types of optimisation go hand in hand with good algorithm choices.

It's harder to do this type of optimisation as teams get bigger and development practices move more towards generally accepted large-scale development. But as I mentioned above, if you can enforce these types of optimisations they can be a significant performance win on today's console processors. IME on PC they make sod all difference.
 
I don't buy it, ERP. Readable code is, these days, vastly more important than slightly faster code. Better to enforce programming practices that lead to stable, readable code than much less readable but a tiny bit faster code. As JHoxley said, better to write readable code first, then go back and examine where your code is spending all of its time and optimize there.

And, more importantly, most of these optimizations are things that should be handled by the compiler in the first place.
 