Old 20-Dec-2005, 18:06   #1
Eleazar
Junior Member
 
Join Date: Nov 2005
Location: USA
Posts: 95
Default SN Systems Andy Thomason next gen coding

Gamasutra is running an article on proper coding practices for the next gen. It covers cache misses, branch avoidance, inlining and a whole lotta other fun stuff. So head on over. I like this article because it pretty much sums up how this gen is going to be different from the past gen. A lot of it we have already said on these forums one way or another, but it is nice to see it all put in perspective by someone in the field who has done such a good job of organizing that information.

http://www.gamasutra.com/features/20...mason_01.shtml
Old 20-Dec-2005, 23:43   #2
DemoCoder
Regular
 
Join Date: Feb 2002
Location: California
Posts: 4,732
Default

I don't see how

Quote:
We can eliminate branches entirely by changing code of this kind:

if( x == 0 ) y = z;

to

y = x == 0 ? z : y;

and

if( ptr != 0 ) ptr->next = prev;

to

*( ptr != 0 ? &ptr->next : &dummy ) = prev;

With a good compiler, this will execute far faster than the code with branches that the “if” form will generate. Next-gen consoles have deep pipelines that require large uninterrupted function bodies to be able to schedule efficiently.
is true unless the instruction set you are compiling to contains a conditional assignment or predicated assignment instruction; without one, a branch will still be required.
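For anyone who wants to poke at this, here is a minimal sketch of the two forms from the quoted article side by side. The function names, the Node struct and the local dummy slot are made up for illustration (the quote does not show where its dummy lives). Whether the select forms really compile without a branch depends on the compiler and the target instruction set, which is exactly the question here.

```cpp
#include <cassert>

struct Node {
    Node* next;
};

// "if" form: y is overwritten only when x == 0.
inline int assign_if(int x, int y, int z) {
    if (x == 0) y = z;
    return y;
}

// Select form: y is always written, with either z or its old value.
inline int assign_select(int x, int y, int z) {
    y = x == 0 ? z : y;
    return y;
}

// "if" form of the pointer update: the store is skipped for a null pointer.
inline void link_if(Node* ptr, Node* prev) {
    if (ptr != nullptr) ptr->next = prev;
}

// Select form: the store always happens, but a null pointer redirects it
// to a scratch slot. Only the chosen operand of ?: is evaluated, so
// ptr->next is never touched when ptr is null.
inline void link_select(Node* ptr, Node* prev) {
    Node* dummy = nullptr;  // hypothetical scratch target
    *(ptr != nullptr ? &ptr->next : &dummy) = prev;
}

// Both spellings must agree on every input.
inline void check_select_forms() {
    assert(assign_if(0, 1, 2) == assign_select(0, 1, 2));
    assert(assign_if(7, 1, 2) == assign_select(7, 1, 2));
    Node a{nullptr}, b{nullptr}, prev{nullptr};
    link_if(&a, &prev);
    link_select(&b, &prev);
    assert(a.next == b.next);
    link_select(nullptr, &prev);  // must not crash
}
```

Comparing the generated assembly for the two spellings on a given compiler is the only way to tell whether the rewrite buys anything there.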
Old 21-Dec-2005, 01:00   #3
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,221
Default

You can do it with simple boolean logic too.
Old 21-Dec-2005, 01:11   #4
Mate Kovacs
Member
 
Join Date: Dec 2004
Location: Debrecen, Hungary
Posts: 163
Default

You can do it without them.
For example, the C expression "a = (!b ? c : d)" is equivalent to the x86 asm
Code:
mov    eax,[b]
or     eax,eax
setz   bl
dec    bl
movsx  ebx,bl
mov    eax,[d]
and    eax,ebx
not    ebx
and    ebx,[c]
or     eax,ebx
mov    [a],eax
EDIT: MfA was quicker.
EDIT2: Maybe you can still think of setz as a "conditional assignment", but I'm pretty sure it could be done without it, too.

Last edited by Mate Kovacs; 21-Dec-2005 at 01:33.
Old 21-Dec-2005, 02:20   #5
Colourless
Monochrome wench
 
Join Date: Feb 2002
Location: Somewhere in outback South Australia
Posts: 1,255
Default

With x86 it's probably easier to use the cmov instructions introduced with the P6. Microsoft's x86_64 compiler uses them.

The instructions for 'a = (!b ? c : d)' will end up being this, using only one register:

Code:
mov eax, [b]
test eax, eax
cmovnz eax, [c]
cmovz eax, [d]
mov [a], eax
__________________
-Colourless

D3D FSAA Viewer 5.4
Words by Cat - Truely Intelligent Viewing

Last edited by Colourless; 21-Dec-2005 at 02:36. Reason: Added code
Old 21-Dec-2005, 02:52   #6
Mate Kovacs
Member
 
Join Date: Dec 2004
Location: Debrecen, Hungary
Posts: 163
Default

Yep, but cmov definitely is what DemoCoder referred to as "a conditional assignment or predicated assignment instruction", IMO.

EDIT: BTW, you mixed up [c] and [d], so it's equivalent to "a = (b ? c : d)" now.

Last edited by Mate Kovacs; 21-Dec-2005 at 03:21.
Old 21-Dec-2005, 04:09   #7
Zengar
Member
 
Join Date: Dec 2003
Posts: 288
Default

A good compiler should recognize such cases and use conditional moves automatically. I don't see why I should change my coding practices...
At least the Pascal compiler, where you have no ?: operator, does it that way.
Old 21-Dec-2005, 04:35   #8
Mate Kovacs
Member
 
Join Date: Dec 2004
Location: Debrecen, Hungary
Posts: 163
Default

Yep, it all depends on the compiler. Even if you change your practices, you'll still get inefficient code if the compiler is stupid. For example, since the C99 standard treats the inline keyword as only a 'hint', it's still up to the compiler.
Old 21-Dec-2005, 10:29   #9
DemoCoder
Regular
 
Join Date: Feb 2002
Location: California
Posts: 4,732
Default

Well, I suppose one could store the result of a condition in a boolean, say A, and then use the following boolean equation:

R = (A ? X : Y)

becomes

R = A * X + not(A) * Y

(* = AND, + = OR)

Of course, this is just poor man's predication, with A as the predicate.
Old 21-Dec-2005, 13:51   #10
Mate Kovacs
Member
 
Join Date: Dec 2004
Location: Debrecen, Hungary
Posts: 163
Default

Yep, you got the basic idea. (BTW, simply storing a condition is not enough, either in the C language or at the x86 asm level, because it's either 0 or 1. You have to convert it so that it's either 0 or all 1 bits.)
And yes, it's just "poor man's predication", but you don't need "a conditional assignment or predicated assignment instruction", which was our point.
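A rough sketch of that widening step and the resulting select, in C++ rather than asm. The names to_mask, mask_select and select_unwidened are illustrative only; the last one shows what goes wrong when the 0/1 condition is used directly as the predicate.

```cpp
#include <cassert>
#include <cstdint>

// Widen a condition to 0x00000000 (false) or 0xFFFFFFFF (true).
// Unsigned 0 - 1 wraps around to all one bits.
inline std::uint32_t to_mask(bool cond) {
    return std::uint32_t{0} - static_cast<std::uint32_t>(cond);
}

// R = (A AND X) OR (NOT A AND Y), with A widened to a full-width mask.
inline std::uint32_t mask_select(bool a, std::uint32_t x, std::uint32_t y) {
    const std::uint32_t mask = to_mask(a);
    return (x & mask) | (y & ~mask);
}

// Same formula with the raw 0/1 value: only bit 0 of x survives when a is
// true, and bit 0 of y is cleared, so the result is generally wrong.
inline std::uint32_t select_unwidened(bool a, std::uint32_t x, std::uint32_t y) {
    const std::uint32_t bit = a ? 1u : 0u;
    return (x & bit) | (y & ~bit);
}

inline void check_mask_select() {
    assert(mask_select(true, 0x1234u, 0x000Fu) == 0x1234u);
    assert(mask_select(false, 0x1234u, 0x000Fu) == 0x000Fu);
    assert(select_unwidened(true, 0x1234u, 0x000Fu) != 0x1234u);
}
```

The subtraction in to_mask plays the same role as the setz / dec / movsx sequence in the asm earlier in the thread.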
Old 21-Dec-2005, 19:28   #11
JHoxley
Member
 
Join Date: Oct 2004
Location: South Coast, England
Posts: 391
DirectX

This whole micro-optimization stuff seems a bit silly at times.

I personally prefer the "Get it right then get it tight" approach - write some good clean code (free of such ugly micro-optimizations) and then profile it to work out where those micro optimizations will really make a difference.

There's a lot to be said for writing good quality clean code - maintenance, (lack of) bugs, adaptability, portability...

Jack
Old 21-Dec-2005, 20:43   #12
psurge
Member
 
Join Date: Feb 2002
Location: LA, California
Posts: 825
Default

Mate, do you know if MSVC/GCC __forceinline/ __attribute__ ((always_inline)) actually force inlining, or if they simply force the compiler to consider inlining even when optimizations are turned off?
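For reference, a sketch of how the two spellings are commonly wrapped in one macro. FORCE_INLINE and clamp_to_byte are hypothetical names. Going by the vendor documentation, GCC's always_inline requests inlining even when optimization is off and reports an error if it cannot comply, while MSVC documents __forceinline as still subject to cases where the compiler declines to inline, so neither should be read as an unconditional guarantee.

```cpp
#include <cassert>

// Pick the strongest inlining request the compiler offers, falling back
// to plain `inline` (a hint only) everywhere else.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// A small leaf function of the kind usually marked this way.
FORCE_INLINE int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

inline void check_clamp() {
    assert(clamp_to_byte(-5) == 0);
    assert(clamp_to_byte(300) == 255);
    assert(clamp_to_byte(42) == 42);
}
```

Checking the disassembly of a debug build is the direct way to see what a particular compiler version actually did with the request.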
Old 21-Dec-2005, 21:22   #13
Mate Kovacs
Member
 
Join Date: Dec 2004
Location: Debrecen, Hungary
Posts: 163
Default

Quote:
Originally Posted by JHoxley
I personally prefer the "Get it right then get it tight" approach - write some good clean code (free of such ugly micro-optimizations) and then profile it to work out where those micro optimizations will really make a difference.
Yep.
"Premature optimization is the root of all Evil." (? Knuth ?)

@psurge: I don't know. Honestly. I'll try to poke around.
Old 22-Dec-2005, 03:21   #14
ERP
Moderator
 
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,158
Default

Quote:
Originally Posted by JHoxley
This whole micro-optimization stuff seems a bit silly at times.

I personally prefer the "Get it right then get it tight" approach - write some good clean code (free of such ugly micro-optimizations) and then profile it to work out where those micro optimizations will really make a difference.

There's a lot to be said for writing good quality clean code - maintenance, (lack of) bugs, adaptability, portability...

Jack
The issue with this is that these types of micro-optimisations are hard to measure: any one might not have a significant impact, but thousands per frame can be significant.

Virtual function overhead is probably the most obvious one (apart from being hard to eliminate after the fact). One virtual function call doesn't kill you (not even on PS2), but tens or even hundreds of thousands per frame can really hurt.

Anecdote: a friend of mine was just relaying his experience of removing a lot of virtual function calls from the inner workings of a fairly major system in a cross-platform product. The net result was almost no performance difference on PC and a doubling of performance on one particular console. There is no way to estimate the impact of those virtual function calls without actually removing them.
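A sketch of the kind of change being described, not the actual system from the anecdote: the same per-frame update written once with a virtual call per object and once as a loop over a contiguous array of one concrete type. Entity, Particle, ParticleData and both update functions are invented for illustration, and any speed difference would have to be measured on the target hardware.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Conventional OO layout: one indirect call per object per frame.
struct Entity {
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;
};

struct Particle final : Entity {
    float pos = 0.0f;
    float vel = 1.0f;
    void update(float dt) override { pos += vel * dt; }
};

inline void update_virtual(std::vector<std::unique_ptr<Entity>>& entities, float dt) {
    for (auto& e : entities) e->update(dt);  // dispatched through the vtable
}

// Alternative layout: each concrete type lives in its own contiguous array,
// so the loop body is a direct, inlinable operation with no dispatch.
struct ParticleData {
    float pos = 0.0f;
    float vel = 1.0f;
};

inline void update_batched(std::vector<ParticleData>& particles, float dt) {
    for (auto& p : particles) p.pos += p.vel * dt;
}

// Both layouts must produce the same simulation state.
inline void check_same_result() {
    std::vector<std::unique_ptr<Entity>> entities;
    entities.emplace_back(new Particle());
    std::vector<ParticleData> batch(1);
    update_virtual(entities, 0.5f);
    update_batched(batch, 0.5f);
    assert(static_cast<Particle&>(*entities[0]).pos == batch[0].pos);
}
```

The batched form gives up runtime polymorphism inside the loop, which is why this sort of restructuring is much easier to plan up front than to retrofit.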
Old 23-Dec-2005, 17:48   #15
Xmas
Off-season
 
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
Default

Quote:
Originally Posted by DemoCoder
unless the instruction set you are compiling to contains a conditional assignment or predicated assignment instruction; without one, a branch will still be required.
And whether there is such a conditional assignment instruction or not, the lines with if are just as good or better even in the second case.
Old 23-Dec-2005, 18:41   #16
DemoCoder
Regular
 
Join Date: Feb 2002
Location: California
Posts: 4,732
Default

Yes, many compilers will have an almost identical internal representation, except for the fact that ?: is an expression, and 'if' is a statement. But they are otherwise identical. Much like for/while/dowhile.
Old 25-Dec-2005, 13:35   #17
Graham
Hello :-)
 
Join Date: Sep 2005
Location: Cambridge, UK
Posts: 1,307
Default

I can't help feeling like I'm stepping back 5 years reading that article, when in fact it's aimed as a prediction of the next 5 years of development.

IMO, the choice of algorithms and overall design structure will have a greater effect on performance than things such as choice of branch style.

He talks about going to extreme lengths to reduce memory overhead, then effectively says 'inline everything'. ?! I've done that before... and I got an 8 MB executable instead of 700 KB. Fantastic advice. Yes, selective inlining is very important, but this is usually done by a smart compiler, and it will be obvious when it's needed with proper profiling. He also suggests templating as much as possible. Same deal: code bloat.



Takes me back to the 'C is faster than C++' wars of days gone by.


"Calling malloc or the default new in a game loop is considered irresponsible". Urgh.
Old 25-Dec-2005, 23:26   #18
Xmas
Off-season
 
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
Default

Quote:
balancing texture LOD by adjusting mip-map bias is an important tool.
I strongly disagree. Get the texture content right and leave LOD bias alone, please.

Quote:
Use anisotropic filtering to sharpen textures instead of positive LOD bias.
That's certainly supposed to read negative.
Old 27-Dec-2005, 03:20   #19
ERP
Moderator
 
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,158
Default

Quote:
Originally Posted by Graham
I can't help feeling like I'm stepping back 5 years reading that article, when in fact it's aimed as a prediction of the next 5 years of development.

IMO, the choice of algorithms and overall design structure will have a greater effect on performance than things such as choice of branch style.

He talks about going to extreme lengths to reduce memory overhead, then effectively says 'inline everything'. ?! I've done that before... and I got an 8 MB executable instead of 700 KB. Fantastic advice. Yes, selective inlining is very important, but this is usually done by a smart compiler, and it will be obvious when it's needed with proper profiling. He also suggests templating as much as possible. Same deal: code bloat.



Takes me back to the 'C is faster than C++' wars of days gone by.


"Calling malloc or the default new in a game loop is considered irresponsible". Urgh.

To give you some idea of how far games are from general application development, many companies have a zero runtime memory allocation policy (although it's less prevalent than it used to be). Not so long ago my games had no free; the only way to free memory was to revert the heap (actually just a stack) to a previously saved state.

Most of what's in the article can make a significant performance difference. Obviously these types of optimisation go hand in hand with good algorithm choices.

It's harder to do this type of optimisation as teams get bigger and development practices move towards generally accepted large-scale development. But as I mentioned above, if you can enforce these types of optimisations they can be a significant performance win on today's console processors. IME on PC they make sod all difference.
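The heap-as-a-stack scheme mentioned above might look something like this. It's a minimal sketch with hypothetical names (StackHeap, mark, revert), not code from any shipped game: allocation just bumps an offset, and freeing means rolling the offset back to a saved mark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stack-style heap over a caller-supplied buffer.
class StackHeap {
public:
    StackHeap(void* buffer, std::size_t size)
        : base_(static_cast<std::uint8_t*>(buffer)), size_(size), top_(0) {}

    // Bump-allocate `bytes` at a power-of-two alignment.
    // Returns nullptr when the buffer is exhausted.
    void* alloc(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(base_) + top_;
        const std::size_t pad = static_cast<std::size_t>((align - addr % align) % align);
        if (pad > size_ - top_ || bytes > size_ - top_ - pad) return nullptr;
        void* p = base_ + top_ + pad;
        top_ += pad + bytes;
        return p;
    }

    // Save the current top of the stack.
    std::size_t mark() const { return top_; }

    // Release everything allocated since `saved` in one step.
    void revert(std::size_t saved) {
        assert(saved <= top_);
        top_ = saved;
    }

private:
    std::uint8_t* base_;
    std::size_t size_;
    std::size_t top_;
};
```

A typical pattern would be to take a mark before loading a level and revert to it on unload. There is no per-allocation free and nothing to fragment, at the cost of only being able to release memory in reverse order.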
Old 31-Dec-2005, 08:19   #20
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,678
Default

I don't buy it, ERP. Readable code is, these days, vastly more important than slightly faster code. Better to enforce programming practices that lead to stable, readable code than much less readable but a tiny bit faster code. As JHoxley said, better to write readable code first, then go back and examine where your code is spending all of its time and optimize there.

And, more importantly, most of these optimizations are things that should be handled by the compiler in the first place.
Old 31-Dec-2005, 12:30   #21
[maven]
Member
 
Join Date: Apr 2003
Location: DE
Posts: 645
Default

Quote:
Originally Posted by Chalnoth
I don't buy it, ERP. Readable code is, these days, vastly more important than slightly faster code. Better to enforce programming practices that lead to stable, readable code than much less readable but a tiny bit faster code. As JHoxley said, better to write readable code first, then go back and examine where your code is spending all of its time and optimize there.

And, more importantly, most of these optimizations are things that should be handled by the compiler in the first place.
But you're talking about different things. I agree that the compiler should worry about stuff like eliminating branches and such (if it's actually a win on the architecture you're on), but especially with the prevalence of streamed worlds, memory management is a very big issue (especially fragmentation), and no compiler can help you with that.
Old 31-Dec-2005, 18:30   #22
Zengar
Member
 
Join Date: Dec 2003
Posts: 288
Default

use garbage collectors
Old 31-Dec-2005, 20:22   #23
ERP
Moderator
 
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,158
Default

Quote:
Originally Posted by Chalnoth
I don't buy it, ERP. Readable code is, these days, vastly more important than slightly faster code. Better to enforce programming practices that lead to stable, readable code than much less readable but a tiny bit faster code. As JHoxley said, better to write readable code first, then go back and examine where your code is spending all of its time and optimize there.

And, more importantly, most of these optimizations are things that should be handled by the compiler in the first place.
This isn't about readable code vs faster code.
It's about picking solutions that won't cripple you in the long run. malloc/new is an inherently expensive operation; even a trivial allocator will do a linear walk of a linked list. And free is worse.

With some thought you can usually (though this isn't true of very dynamic content) eliminate the allocations altogether; it is not easy to do this after the fact. Virtual functions are even harder to remove if you decide you "need to optimise".

I consider spending days to save 1/10th of a millisecond worthwhile. But I'd rather not have to refactor large portions of a codebase to do it.

While I agree in principle with the "premature optimisation is the root of all evil" stuff, there is a certain class of optimisations that have to be done during your initial implementation for them to be practical. And while one case of malloc or a virtual function call may not be measurable, the tens of thousands of them that tend to get made per frame can cost much more than the 1/10th of a millisecond I was willing to spend several days saving.

C++ gives you a lot of rope (tools) to hang yourself with in terms of performance, and if the "bad" patterns are prevalent in the codebase they are extremely difficult (read: impossible in any reasonable timeframe) to address after the fact.

What is generally taught as "good OO" design is not necessarily good design for a game.

One of the things I lament is that a lot of programmers coming out of college now have no real concept of how the code they are writing will be compiled and what the impact of that on cache and application performance will be.
Old 31-Dec-2005, 20:25   #24
ERP
Moderator
 
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,158
Default

Quote:
Originally Posted by Zengar
use garbage collectors
I've been thinking about this for a long time. I think it makes a lot of sense for a lot of game code, where performance isn't necessarily that important. The problem is that the compiler ideally needs to support it.

I do think using handle-based memory allocation for unpredictably sized allocations in very dynamic environments (say, streaming player-modified worlds) could be a win: at least you have some recourse when the allocator fails, and you can effectively defrag the heap.
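A minimal sketch of the handle idea, under heavy assumptions: HandlePool and its members are hypothetical names, alignment is ignored, and handle slots are never reused. Callers keep an index rather than a raw pointer, so the pool is free to slide live blocks together when an allocation does not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

class HandlePool {
public:
    using Handle = std::size_t;
    static constexpr Handle kInvalid = static_cast<Handle>(-1);

    explicit HandlePool(std::size_t capacity) : storage_(capacity), top_(0) {}

    // Allocate at the top of the pool; on failure, defragment once and retry.
    Handle alloc(std::size_t bytes) {
        if (bytes > storage_.size() - top_) {
            compact();
            if (bytes > storage_.size() - top_) return kInvalid;
        }
        blocks_.push_back(Block{top_, bytes, true});
        top_ += bytes;
        return blocks_.size() - 1;
    }

    // Mark a block dead; its bytes are reclaimed by the next compact().
    void release(Handle h) {
        assert(h < blocks_.size());
        blocks_[h].live = false;
    }

    // Resolve a handle to its current address. The pointer is only valid
    // until the next alloc() or compact(), because blocks may move.
    void* get(Handle h) {
        assert(h < blocks_.size() && blocks_[h].live);
        return storage_.data() + blocks_[h].offset;
    }

    // Slide live blocks down over dead ones and fix up the offset table.
    // Live blocks appear in blocks_ in increasing offset order, so each
    // move only ever goes towards the start of the buffer.
    void compact() {
        std::size_t write = 0;
        for (Block& b : blocks_) {
            if (!b.live) continue;
            std::memmove(storage_.data() + write, storage_.data() + b.offset, b.size);
            b.offset = write;
            write += b.size;
        }
        top_ = write;
    }

private:
    struct Block {
        std::size_t offset;
        std::size_t size;
        bool live;
    };
    std::vector<unsigned char> storage_;
    std::vector<Block> blocks_;
    std::size_t top_;
};
```

The price is an extra indirection on every access and the rule that raw pointers must not be cached across allocations.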
Old 01-Jan-2006, 09:36   #25
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,221
Default

With a 64-bit address space, if you allocate 4 GB a second (i.e. a whole 32-bit memory space), it would take about 1 million hours for fragmentation to hit you, even if you never tried to reclaim freed memory for allocation.
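The back-of-the-envelope arithmetic, written out as compile-time constants (the constant names are just for illustration). It treats 4 GB as 2^32 bytes and assumes addresses are handed out once and never reused.

```cpp
#include <cstdint>

// Address space: 2^64 bytes. Allocation rate: 2^32 bytes per second.
// Time to run out of fresh addresses: 2^64 / 2^32 = 2^32 seconds.
constexpr std::uint64_t kSecondsToExhaust = std::uint64_t{1} << 32;
constexpr std::uint64_t kHoursToExhaust = kSecondsToExhaust / 3600;      // whole hours
constexpr std::uint64_t kYearsToExhaust = kHoursToExhaust / (24 * 365);  // whole years

static_assert(kSecondsToExhaust == 4294967296ULL, "2^32 seconds");
static_assert(kHoursToExhaust == 1193046ULL, "about 1.19 million hours");
static_assert(kYearsToExhaust == 136ULL, "roughly 136 years");
```

This only covers virtual address space; physical memory and page table overhead are a separate matter.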