SN Systems' Andy Thomason on next-gen coding

Chalnoth said:
I don't buy it, ERP. Readable code is, these days, vastly more important than slightly faster code. Better to enforce programming practices that lead to stable, readable code than much less readable but a tiny bit faster code. As JHoxley said, better to write readable code first, then go back and examine where your code is spending all of its time and optimize there.

And, more importantly, most of these optimizations are things that should be handled by the compiler in the first place.

But you're talking about different things. I agree that the compiler should worry about stuff like eliminating branches and such (if it's actually a win on the architecture you're on), but especially with the prevalence of streamed worlds, memory management is a very big issue (especially fragmentation), and no compiler can help you with that.
 
Chalnoth said:
I don't buy it, ERP. Readable code is, these days, vastly more important than slightly faster code. Better to enforce programming practices that lead to stable, readable code than much less readable but a tiny bit faster code. As JHoxley said, better to write readable code first, then go back and examine where your code is spending all of its time and optimize there.

And, more importantly, most of these optimizations are things that should be handled by the compiler in the first place.

This isn't about readable code vs faster code.
It's about picking solutions that won't cripple you in the long run. malloc/new is an inherently expensive operation: even a trivial allocator will do a linear walk of a linked list, and free is worse.

With some thought you can usually (though this isn't true of very dynamic content) eliminate the allocations altogether; it is not easy to do this after the fact. Virtual functions are even harder to remove if you decide you "need to optimise".

I consider spending days to save 1/10th of a millisecond worthwhile. But I'd rather not have to refactor large portions of a codebase to do it.

While I agree in principle with the "premature optimisation is the root of all evil" stuff, there is a certain class of optimisations that have to be done during your initial implementation for them to be practical. And while one case of malloc or a virtual function call may not be measurable, the tens of thousands of them that tend to get made each frame can cost much more than the 1/10 of a millisecond I was willing to spend several days saving.

C++ gives you a lot of rope (tools) to hang yourself with in terms of performance, and if the "bad" patterns are prevalent in the codebase they are extremely difficult (read: impossible in any reasonable timeframe) to address after the fact.

What is generally taught as "good OO" design is not necessarily the same as good design for a game.

One of the things I lament is that a lot of programmers coming out of college now have no real concept of how the code they write will be compiled, or of the impact that has on cache and application performance.
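ERP's point about designing the allocations out up front can be made concrete with a fixed-size pool: all the memory is reserved once at startup, so per-frame "allocations" never touch malloc and can't fragment the heap. A minimal sketch; the `Pool` name and interface are illustrative, not from any particular engine:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-size object pool: memory is reserved up front, so alloc/free are
// O(1) pushes and pops on a free list. A sketch only -- real pool code
// would handle construction, alignment and thread safety.
template <typename T, std::size_t N>
class Pool {
public:
    Pool() {
        // Chain every slot onto the free list once, at startup.
        for (std::size_t i = 0; i < N; ++i)
            free_.push_back(&slots_[i]);
    }
    T* alloc() {
        if (free_.empty()) return nullptr;  // caller decides how to fail
        T* p = free_.back();
        free_.pop_back();
        return p;
    }
    void free(T* p) { free_.push_back(p); }
    std::size_t available() const { return free_.size(); }

private:
    T slots_[N];            // raw storage, default-constructed
    std::vector<T*> free_;  // slots currently unused
};
```

The trade-off, as the post says, is that this has to be designed in from the start: callers must know which pool owns an object, which is hard to retrofit.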
 
Zengar said:
use garbage collectors :)

I've been thinking about this for a long time. I think it makes a lot of sense for a lot of game code, where performance isn't necessarily that important. Problem is, the compiler ideally needs to support it.

I do think using handle based memory allocation for unpredictably sized allocations in very dynamic environments (say streaming player modified worlds) could be a win, at least you have some recourse when the allocator fails, and you can effectively defrag the heap.
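A sketch of the handle-based scheme ERP describes: because callers hold a small integer handle rather than a raw pointer, the allocator is free to slide live blocks together and effectively defrag the heap, fixing up only its own table. The names and the deliberately naive layout here are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Handle-based heap: clients resolve a handle to a pointer just before
// use and never cache the result, because compact() may move blocks.
class HandleHeap {
public:
    using Handle = std::size_t;

    explicit HandleHeap(std::size_t bytes) : heap_(bytes), top_(0) {}

    Handle alloc(std::size_t size) {
        blocks_.push_back(Block{top_, size, true});
        top_ += size;  // bump allocate; no free-list walk
        return blocks_.size() - 1;
    }
    void free(Handle h) { blocks_[h].live = false; }

    // Resolve immediately before use; a compact() may have moved the block.
    char* resolve(Handle h) { return heap_.data() + blocks_[h].offset; }

    // Slide every live block down over the holes left by freed ones.
    void compact() {
        std::size_t dst = 0;
        for (Block& b : blocks_) {
            if (!b.live) continue;
            std::memmove(heap_.data() + dst, heap_.data() + b.offset, b.size);
            b.offset = dst;
            dst += b.size;
        }
        top_ = dst;
    }
    std::size_t used() const { return top_; }

private:
    struct Block { std::size_t offset, size; bool live; };
    std::vector<char> heap_;
    std::vector<Block> blocks_;
    std::size_t top_;
};
```

This is the "recourse when the allocator fails" in the post: on an out-of-memory condition you can run compact() and retry, which a raw-pointer heap cannot do.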
 
With a 64 bit address space, if you allocate 4 GB a second (ie. a 32 bit memory space) it would take 1 million hours for fragmentation to hit you, even if you didn't try to reclaim freed memory for allocation :)
 
MfA said:
With a 64 bit address space, if you allocate 4 GB a second (ie. a 32 bit memory space) it would take 1 million hours for fragmentation to hit you, even if you didn't try to reclaim freed memory for allocation :)
Er, doesn't that assume that you have the full amount of memory in the computer that the 64-bit processor can address?
 
Sorry, my mind jumped ahead a bit there. When I read dynamically sized I assumed the allocations were going to be relatively large ... and for large allocations you simply allocate complete pages.
 
MfA said:
Sorry, my mind jumped ahead a bit there. When I read dynamically sized I assumed the allocations were going to be relatively large ... and for large allocations you simply allocate complete pages.

Except, of course, that graphics data and audio data generally have to be in physically contiguous memory and aren't mapped through the page table.
 
I've done some analysis of these things every few years.

Long ago (in the eighties), it was definitely worthwhile. The only compiler that did a good job of optimizing at that time was Watcom. The compilers from Borland and Microsoft did stupid things like:

Code:
MOV AX, [var1]   ; load var1
MOV BX, AX       ; copy it into BX...
MOV BX, [var2]   ; ...and immediately overwrite it (dead store)
MOV AX, [var2]   ; reload var2, clobbering the var1 just loaded
MOV CX, [var1]   ; reload var1 instead of reusing it
ADD AX, CX       ; all of this is just var1 + var2
That was due to not having much memory and limiting the number of passes as much as possible. But that changed drastically.

With Borland Pascal 7/8, and later with Delphi 4, the compiler did a *MUCH* better job at optimizing anything I could come up with than I could do myself. Just rewriting the code to use different flow structures didn't matter at all; the inner part of the resulting code was mostly the same anyway. And it was pretty hard to understand how it compared to the code I wrote.

Unrolling things, inlining functions, changing whole sequences, using very strange (and sometimes even undocumented) opcodes, predicated fast-outs, ignoring variables that weren't used or only resulted in a condition, even generating small tables of non-dependent conditions: it did it all.

Objects and data structures did make a difference, though. And although the memory manager did a much better job at it than the default Windows one or malloc, you could see the individual objects being created. I think it was with Delphi 5, that they started only allocating whatever parts that were actually used, for local one-shot ones. Although you will get it all if you use classes.

I think the main difference in optimizing is where you do it. Are you going to optimize during the parsing step, or during the code generation step? What abstraction level is used? It might be that most C/C++ compilers that target different architectures do it during code generation, while compilers that only support a single target architecture can do it during the parsing stage.

When you do it at the parsing stage, you still have all the rich syntactical information that tells you what the programmer tried to accomplish. You lose that if you do it during the code generation stage. In that case, it might make a (slight) difference how you write your code. But when done during parsing, it doesn't really matter.

So, the way you define and handle your data structures makes a difference; almost all other things don't. Except for interfaces with the API and, in general, all other libraries. While the libraries themselves are optimized, calling functions from different libraries is about the worst case for the optimizer, as only the local scope is taken into consideration. Then again, it won't matter very much, the most visible case probably being virtual functions.

As for garbage collection: while it protects against memory leaks, there is quite some overhead associated with it, and it runs in batches, often triggered by low-memory conditions or allocations of large blocks of memory. So it tends to generate much more work during the actual alloc/free process, while increasing the total amount of used memory, IMO.
 
A couple of points. To a certain extent I agree with most of the comments: modern compilers do some amazing optimisations. I positively loathe having to debug optimised builds on MS compilers; with LTCG on, I've seen the compiler jump into the center of an unrelated function in a completely different module because the same code pattern happens to exist there.

But it's important to know what compilers are good at and what they aren't. The optimisations done at the intermediate representation level, like inlining and constant folding, are generally done well, although there are limits to compiler inlining. The optimisations done at or close to the final machine representation are where it starts to get shaky. I think part of this is that on x86 there has been no real benefit to optimising beyond issue order; the core is so good at reordering code to hide cache and instruction latency that there is really nothing for the compiler to do. The problem with these types of optimisations in general is that the data structures are harder to work with and they compete with each other: loop unrolling to hide a stall messes with register spilling, etc. Since compilers are generally CPU specific but platform agnostic, they also don't optimise well for cache usage.

On the current crop of console processors, IME, none of the compilers are doing even a half-decent job of instruction reordering to hide latency.

Going back to manual inlining, the issue here is that a compiler inlines based on a heuristic, generally some code-size threshold. Some compilers will not inline if they haven't seen the function body before its invocation (GCC springs to mind). I can inline based on usage patterns. Having said that, I see manual inlining misused more often than I see it done in the right places. There is a fine line between removing call overhead and thrashing the I-cache.
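For illustration, the compiler's inlining heuristic can also be overridden per function with compiler-specific attributes (GCC/Clang syntax shown here; the function names are made up). This is a sketch of the trade-off, not a recommendation for every case:

```cpp
#include <cassert>

// Hot, tiny accessor: forcing it inline removes call overhead at each
// call site, at the cost of a little code size.
__attribute__((always_inline)) static inline int scaled(int x) {
    return x * 2;
}

// Large or rarely called function: keeping it out of line protects the
// I-cache, which is exactly the misuse risk described above.
__attribute__((noinline)) static int cold_path(int x) {
    return x * 2 + 1;
}
```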

As for garbage collection, you're right, it's not an optimisation (at least not in most cases); it's a way to avoid one of the more prevalent classes of bugs in most codebases. I spend more time at the end of a project finding writes to freed memory, memory leaks and NULL pointer dereferences than I care to count. I'm willing to pay some overhead in the 70% of the code where performance doesn't matter to eliminate these.

There are some very good incremental garbage collection algorithms, and in some cases a good garbage-collected allocator will be MUCH faster than a generic new/delete implementation in C++. The collector in OCaml is widely regarded as very good, and the language will outperform C/C++ where there are a lot of allocations and frees.
 
Yes, I agree.

GCC is a good example of a broad-scoped compiler that optimizes in the code generation stage. But then again, I agree as well that inlining things isn't so important unless done right. I would have to check this, but I don't think the code generated by Delphi 5 (lots of asm in the System unit) and Delphi 7 (everything in Pascal) is much different. The latter might even be better.

I think it's probably better to add a function in one of your own modules instead of using the pre-compiled library one, if you want the best speed. Let the compiler figure it out.

As for garbage collectors, I would like one for a large C/C++ project, just to have that run as intended, but for other languages it would be a toss-up. And looking at the memory usage of my C# projects, I really would want to be able to use free instead.

For an interesting example, try the MS SQL Reporting Services. Written in C#, by Microsoft. It will explode your memory usage until some time after it has finished doing its stuff. I got a lot of errors saying that the .NET runtime was killed by the OS because of excessive memory usage...

But a recurring thing that I do tend to optimize at some stage is the data structures: from simple things, like using a dynamic array instead of a class, to adjusting the "record" sizes and indices of all the objects in the tree to allow accessing them in batches.
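One reading of the batch-access point above is a structure-of-arrays layout: each field lives in its own contiguous array, so a pass over one field walks memory densely and stays in cache. A hypothetical sketch (the particle names are illustrative, not from the thread):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structure-of-arrays: one array per field instead of an array of
// structs, so batch passes over a single field are cache-friendly.
struct ParticlesSoA {
    std::vector<float> x, y;   // positions, one array per component
    std::vector<float> life;   // remaining lifetime per particle

    void spawn(float px, float py, float l) {
        x.push_back(px);
        y.push_back(py);
        life.push_back(l);
    }

    // Batch query touches only the contiguous 'life' array.
    std::size_t count_alive() const {
        std::size_t n = 0;
        for (float l : life)
            if (l > 0.0f) ++n;
        return n;
    }
};
```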
 
DiGuru said:
Yes, I agree.
As for garbage collectors, I would like one for a large C/C++ project, just to have that run as intended, but for other languages it would be a toss-up. And looking at the memory usage of my C# projects, I really would want to be able to use free instead.

For an interesting example, try the MS SQL Reporting Services. Written in C#, by Microsoft. It will explode your memory usage until some time after it has finished doing its stuff. I got a lot of errors saying that the .NET runtime was killed by the OS because of excessive memory usage...

Off topic, but C/C++/C#/Java are really hard to write a good garbage collector for. Most really good garbage-collected languages actually determine the scope of a lot of allocations at compile time; as you exit the scope, they free the memory. So a lot of the memory allocation becomes nothing more than adding to and subtracting from a stack. In C/C++/C#, all variables are mutable by default, and that makes it a lot harder to determine scope.
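The scope-determined allocation ERP describes, where leaving the scope frees everything in one subtraction, maps naturally onto a stack arena in C++. A minimal hypothetical sketch, with the reset handled by an RAII marker:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stack arena: allocation bumps a top-of-stack offset, and a Scope
// object rewinds it on destruction, freeing everything allocated
// inside that scope at once.
class Arena {
public:
    explicit Arena(std::size_t bytes) : buf_(bytes), top_(0) {}

    void* alloc(std::size_t size) {
        void* p = buf_.data() + top_;
        top_ += size;  // "allocation" is one addition
        return p;
    }
    std::size_t top() const { return top_; }

    // RAII marker: records the stack top on entry, rewinds on exit.
    class Scope {
    public:
        explicit Scope(Arena& a) : a_(a), mark_(a.top_) {}
        ~Scope() { a_.top_ = mark_; }
    private:
        Arena& a_;
        std::size_t mark_;
    };

private:
    std::vector<char> buf_;
    std::size_t top_;
};
```

Alignment and overflow checks are omitted to keep the shape of the idea visible.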

Having said that, it should still be possible to use one of the more complex collectors efficiently. They generally work by having multiple groupings, marking things as "could be free", then migrating them incrementally to the free list as they establish that they really are orphaned.
 
Agreed. Strict scope (and indirectly: strict typing, as that will limit the scope further) will help a lot.

But then again:
ERP said:
Having said that, it should still be possible to use one of the more complex collectors efficiently. They generally work by having multiple groupings, marking things as "could be free", then migrating them incrementally to the free list as they establish that they really are orphaned.
That is about how .NET does it. It will migrate them to the list and put their destructors in the "pending" queue. But, as you said, it's hard to make sure. And, as .NET ships as intermediate code that is only compiled at runtime, most of those scopes are undefined until runtime.

To see this in action, write a fairly sized .NET program, create a breakpoint in any function that isn't directly accessed during startup but will surely be hit, and run it from VS. It will show a question mark in the breakpoint dot to signify that no code has yet been loaded for that location, because it's still out of scope. You get the rest. ;)
 
DiGuru said:
I think the main difference in optimizing is, where you do it. Are you going to optimize during the parsing step, or during the code generation step? What abstraction level is used? It might be, that most C/C++ compilers that target different architectures do it during the code generation, while compilers that only support a single target arcitecture can do it during the parsing stage.

When you do it at the parsing stage, you still have all the rich syntactical information that tells you what the programmer tried to accompilsh. You lose that if you do it during the code generation stage. In that case, it might make a (slight) difference how you write your code. But when done during the parsing, it doesn't really matter.
I don't think that has to be the case. It certainly seems possible to me to have a compiler store information important for optimization during the parsing stage. Of course, the amount and type of information that should be stored would vary depending upon the architecture, so it wouldn't be a trivial thing to do, but it certainly should be possible.
 
ERP said:
I've been thinking about this for a long time, I think it makes a lot of sense for a lot of game code, where performance isn't necessarilly that important. Problem is the compiler ideally needs to support it.

I do think using handle based memory allocation for unpredictably sized allocations in very dynamic environments (say streaming player modified worlds) could be a win, at least you have some recourse when the allocator fails, and you can effectively defrag the heap.

Take a look at www.digitalmars.com

This language is a very nice alternative to C++
Usually I can't stand C and its clones (it's a thing of taste, I guess), but for C# and D I make an exception ;-)

As for garbage collection: if you use a copying collector, your allocation is basically free (only one addition). Freeing is a rather complex operation, but pretty fast in modern collectors. It is clearly better than the standard approach if you allocate small pieces of memory fairly often.
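The "only one addition" claim refers to bump allocation in the active semispace. A deliberately stripped-down sketch; a real copying collector would, on overflow, trace the roots, copy live objects into the other semispace and retry, all of which is omitted here:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bump allocator, the fast path of a copying collector: new objects go
// at the top of the current space, so allocation is one capacity check
// and one addition -- no free-list walk at all.
class CopyingHeap {
public:
    explicit CopyingHeap(std::size_t bytes) : space_(bytes), top_(0) {}

    char* alloc(std::size_t size) {
        if (top_ + size > space_.size())
            return nullptr;  // a real collector would collect here and retry
        char* p = space_.data() + top_;
        top_ += size;
        return p;
    }

private:
    std::vector<char> space_;
    std::size_t top_;
};
```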
 
Zengar said:
Take a look at www.digitalmars.com

This language is a very nice alternative to C++
Usually I can't stand C and its clones (it's a thing of taste, I guess), but for C# and D I make an exception ;-)

As for garbage collection: if you use a copying collector, your allocation is basically free (only one addition). Freeing is a rather complex operation, but pretty fast in modern collectors. It is clearly better than the standard approach if you allocate small pieces of memory fairly often.

I'm familiar with the Hans Boehm collector. It's a conservative collector, meaning that it basically leaks, albeit slowly and probably with some upper bound on the leak. But that in and of itself rules it out for console use.
 
Zengar said:
Take a look at www.digitalmars.com

This language is a very nice alternative to C++
Usually I can't stand C and its clones (it's a thing of taste, I guess), but for C# and D I make an exception ;-)

As for garbage collection: if you use a copying collector, your allocation is basically free (only one addition). Freeing is a rather complex operation, but pretty fast in modern collectors. It is clearly better than the standard approach if you allocate small pieces of memory fairly often.

Just realised you were referring to D.
The problem is the lack of high-quality dev tools; that's why languages like this remain niche. They're not just competing with C# as a language, but with C# plus Visual Studio 8.0.

I have a soft spot for the various ML variants, notably OCaml and F#. I really like the language, but I couldn't see it in widespread use because of the paradigm shift and the relative lack of support.

I think it would be an interesting exercise to design a language specifically to run well within the constraints of an SPE, and it is something I've been messing with, but I have no delusions about what it would take to get from a working compiler (even a fairly optimal one) to a usable toolset.
 