Garbage collectors suck for games

Fight slight slowdown with much bigger slowdown?

I had assumed that since this was going down at object creation it wasn't that time sensitive. I think in C++ you're kind of obliged to build a hierarchy from a common base class if you want to keep your sanity anyway, in which case you can write a very fast dynamic cast, among other things. So in short, yeah, I probably missed the point. :oops:
 
About garbage collection and performance:

1) Reference counting has big performance overhead
Every time you are taking new refs to object or removing ref, you have overhead. When doing function calls with the obejcts you practically always do it. And as the overhead of removing a ref includes if (refcount == 0) brach, it's much worse for performance than simple update of value.

And if you try to use ref counting with objects that were not designed for it(not derived from some recountable base class/template), you have to add an additional intermediate layer which means you have 2-level memory references which makes those objects slower to use.

2) generational garbage collection makes memory allocation much faster.
If you have a generational garbage collection, malloc() is just one addition and comparison (actually the whole malloc overhead is same as the overhead of removing reference when ref counting)
That is the best case at best. Garbage collection doesn't do anything magical.

3) There are real-time garbage collectors.
If you are using real-time garbage collector, there are NONE of those "GC pauses" people seem to be so afraid of.
Oh really, how could this be possible?! No system wide garbage collector will ever know for sure what the non-trivial application is intending. Heuristic strategies my a$$! Memory fragmentation can only truly be solved by code that is designed with all the specific data transformations in mind. PERIOD!
 
Last edited by a moderator:
That is the best case at best. Garbage collection doesn't do anything magical.

Oh really, how could this be possible?! No system wide garbage collector will ever know for sure what the non-trivial application is intending. Heuristic strategies my a$$! Memory fragmentation can only truly be solved by code that is designed with all the specific data transformations in mind. PERIOD!

There is nothing magical about real-time-garbage collectors. And the garbage-collector does not have to know what program is intending to do.

The idea is that there will never be a long pause, instead the garbace collection activates often and does it's job in so small parts that it does not ruin the real-time requirements. (just like ref counting does with every ref update)

When no objects are created, there is no need to run gc.
And when objects are created, we can run small part of gc for every new operation. The amount of gc work required is propotional to number of new operations executed.

Do some googling and you will find many papers about real-time GC algorithms.
 
Last edited by a moderator:
There is nothing magical about real-time-garbage collectors. And the garbage-collector does not have to know what program is intending to do.

The idea is that there will never be a long pause, instead the garbace collection activates often and does it's job in so small parts that it does not ruin the real-time requirements. (just like ref counting does with every ref update)

When no objects are created, there is no need to run gc.
And when objects are created, we can run small part of gc for every new operation. The amount of gc work required is propotional to number of new operations executed.

Do some googling and you will find many papers about real-time GC algorithms.
??? Application X needs an extremely heterogenous mass of memory objects allocated within four miliseconds, using partially an even more complicated mass of outdated memory allocations. How could a garbage collector possibly accomplish this? It can't because it doesn't comprehend what objects should be packed in it's context and time frame. It doesn't know that an allocation of 42087 objects of type x,y,z can in fact be seen as a 14 blocks of memory and have to be available at time t.

It is IMPOSSIBLE for generalistic garbage collectors to solve things like the memory fragmentation problems in real time because it will never have the brain of the programmer who understands the flow of data.

What you are saying makes clear that you (like many other people) do not understand what "real time" means.
 
??? Application X needs an extremely heterogenous mass of memory objects allocated within four miliseconds, using partially an even more complicated mass of outdated memory allocations. How could a garbage collector possibly accomplish this? It can't because it doesn't comprehend what objects should be packed in it's context and time frame. It doesn't know that an allocation of 42087 objects of type x,y,z can in fact be seen as a 14 blocks of memory and have to be available at time t.

What do you mean by "memory block" here?
(it's quite general term, can mean quite many different things)

So you will write a custom allocator method to allocate these 42087 objects from you own memory pool?
Maybe you should have allocated/created them before the inner loop then?

It is IMPOSSIBLE for generalistic garbage collectors to solve things like the memory fragmentation problems in real time because it will never have the brain of the programmer who understands the flow of data.

Memory fragmentation is equally problem with manual memory management than GC.
And it can be solved quite effectively by always allocating power of 2 chunks of memory, and requiring full alignment for these chunks. This will however will increase the memory usage by a lot.

What you are saying makes clear that you (like many other people) do not understand what "real time" means.

Or is it you who does not understand that?

Real-time does not mean "high performance", it means "guaranteed performance".

Real-time garbage collection systems can guarantee, for example that "every new call will be executed in less than 10 milliseconds"(but most will happen much faster). If you have an inner loop which tries to do too many things and it's too heavy, and does not fill it's real-time requirements, you have a general performance problem which would occur also without GC.
 
Memory fragmentation is equally problem with manual memory management than GC.
And it can be solved quite effectively by always allocating power of 2 chunks of memory, and requiring full alignment for these chunks. This will however will increase the memory usage by a lot.
Wich remembers me of...
http://msdn.microsoft.com/en-us/library/aa366750.aspx

I couldn't find a better description of how it works, altought it is used for "explicit" memory allocations and a similiar trick could be used for a real time GC it have it's own problems of data locality, generally not a good idea for GC.

On the other hand it doesn't mean GC can't be real time, the real problem is real-time memory allocation not GC itself.
 
What do you mean by "memory block" here?
(it's quite general term, can mean quite many different things)

So you will write a custom allocator method to allocate these 42087 objects from you own memory pool?
Maybe you should have allocated/created them before the inner loop then?

Memory fragmentation is equally problem with manual memory management than GC.
The point is that custom memory management is ALWAYS needed for real time applications with significant workloads, so a garbage collector is superfluous for everything that is performance critical.

And it can be solved quite effectively by always allocating power of 2 chunks of memory, and requiring full alignment for these chunks. This will however will increase the memory usage by a lot.
No, this does not solve memory fragmentation. Everything on the heap is aligned anyway.

Or is it you who does not understand that?

Real-time does not mean "high performance", it means "guaranteed performance".

Real-time garbage collection systems can guarantee, for example that "every new call will be executed in less than 10 milliseconds"(but most will happen much faster). If you have an inner loop which tries to do too many things and it's too heavy, and does not fill it's real-time requirements, you have a general performance problem which would occur also without GC.
Argh! I won't waste my time arguing about such trivial things any further when it is obvious that the topic goes well over your head.
 
I'm clearly missing some nuance of the discussion here, but why not dynamic_cast in this situation? You are trying to ref the parent, correct?
I don't know if I understand what you want to do with that (or how dynamic_cast works :) ), but doesn't that still require you to have at least a pointer to the parent? I expect that you need to know the base type of the parent as well, which can be complicated when there is no single ancestor class.
 
@hkultala: the time a GC needs to spend isn't deterministic. That means: when it starts to work, there is no way to determine beforehand how long it's going to take to complete.

Like, most of the time noting has to happen (but all the pointers have to be checked anyway unless you use ref counting), other times a lot has to happen (for example, a manager object has been disposed, so all it's children have to be disposed as well).

You could schedule the large updates, like: when it takes more than 1ms, stop until the next run and then continue where you left off. But that leaves you with a problem when you run out of memory. You can prevent that by scheduling larger chunks of time, say 2ms, but you still run into a wall if there is a lot of dynamic allocation/deallocation of memory is going on.

And when the time spend isn't deterministic, it can never be real-time.

And it is often impossible to prevent many dynamic allocations/deallocations on an object-oriented platform, unless you can allocate everything you'll ever need right at the start and re-use it: you can only allocate instances of classes (like, a simple string) by calling their constructor. And if you're not happy with how those constructors behave, you're out of luck, because in most GC languages it's simply impossible to specify where in memory that new class is to be created.

If you would want that, you would have to start with a platform that doesn't use a GC in the first place, and write your own.
 
@hkultala: the time a GC needs to spend isn't deterministic. That means: when it starts to work, there is no way to determine beforehand how long it's going to take to complete.

malloc()/free() are non-deterministic as well. The only way to guarantee deterministic is control memory policy yourself. In fact, many parts of game engines are non-deterministic, it's why we get frame-rate sputter, which clearly shows that the problem is not limited to GC, but in general, related to unpredictable workloads.

And when the time spend isn't deterministic, it can never be real-time.

And it is often impossible to prevent many dynamic allocations/deallocations on an object-oriented platform, unless you can allocate everything you'll ever need right at the start and re-use it: you can only allocate instances of classes (like, a simple string) by calling their constructor. And if you're not happy with how those constructors behave, you're out of luck, because in most GC languages it's simply impossible to specify where in memory that new class is to be created.

False on several fronts. First, in systems with real time components, determinism isn't a global requirement, critical tasks just need to be guaranteed to run. So for example, you could write 90% of the system with default memory behavior under lower priority, and 10% using an immortal heap. Determinism isn't an "either-or" requirement and probably 90% of games shipped aren't deterministic, because interactions between artistic workloads are almost never 100% within budget especially in games with physics engines and lots of shit being created and moved around.

Real time garbage collectors do exist and they work. RTSJ includes the Henriksson GC for example.

However, in most cases this isn't needed. Azul has been shipping GCs for years with radically better behavior than regular GCs due to explicit hardware support, their worse case behavior on HUGE heaps is 25ms. They're now starting to bring those over to the x86 vs Zing, see http://www.google.com/url?sa=t&sour...1OTJDQ&usg=AFQjCNGw3hYPO9XZ8HTz4NrKRVHimn-50g

But beyond that, other options are scoped heaps, immortal heaps, and off-heaps. Scoped heaps are like allocation pools or stack allocoation, everything is freed when you're done. Immortal Heaps are heaps where the GC is turned off. It's for permanent datastructures where you know they will never need to be collected.

In any circumstance where you think you'd have a custom memory pool, you could equally well have a per-heap compacting GC. All GC is, is the ability to compute reachability. As long as you can bound the size of the search space, and the work that needs to be done, you have predictability. Moreover, just because you have a language with GC, doesn't mean everything you allocate has to be GC'ed.
 
With garbage collectors there is no real delineation between strong ownership and just references to the object, because ref loops don't generally cause leaks (though it's quite possible to write leaks). I do agree that it's still important to understand ownership at a semantic level even in a GC language.

Many modern GC systems do permit more fined grained reference semantics however. Java for example has Soft references, Weak references, and Phantom references. You can build manager objects or caches using soft which permit them to be collected even though there's a reachable register/field that still points to them.
 
I don't know if I understand what you want to do with that (or how dynamic_cast works :) ), but doesn't that still require you to have at least a pointer to the parent?

Actually I was just presenting it as an option, but basically if you have a scene graph you have to have a common node interface (so, basically anything 3d). dynamic_cast returns the pointer to the parent.

I expect that you need to know the base type of the parent as well, which can be complicated when there is no single ancestor class.

Well, technically you don't, as the op will return null or throw on a mismatch. dynamic_cast is just a type-safe cast. Honestly though I don't know why I suggested it, as personally I just use C-style implicit casts, because you should know the class hierarchy, and in practice you can implement a relatively efficient IsA method.

I try to avoid MI whenever possible, but I think that's where dynamic_cast (or similar in other languages) become worth the (quite hefty) overhead, as it allows proper RTTI, and I think generally allows the compiler to bust you for ambiguous upcasts.
 
Moreover, just because you have a language with GC, doesn't mean everything you allocate has to be GC'ed.

Doesn't that create a situation wherein non-GC'ed object A refs GC'ed object B, and is left dangling when B gets swept-up? Or are you talking about memory allocated as pointer-free?
 
Doesn't that create a situation wherein non-GC'ed object A refs GC'ed object B, and is left dangling when B gets swept-up? Or are you talking about memory allocated as pointer-free?

In general, you seek to avoid that situation. Some implementations perform runtime checks or static escape analysis to verify this doesn't occur. Some provide a type of inter-heap handle (IndirectReference), this is a form of weak reference, which can be nulled out if an object in the main heap is collected.

As with custom manual memory allocator, this doesn't absolute you of all responsibility, but it still provides benefit. In Real Time Java, there is even a special kind of thread, the NoHeapRealTimeThread, which is disallowed from touching any GC'ed heap, which means the code can be run with the assumption that memory operations can't disrupt it.
 
In Real Time Java, there is even a special kind of thread, the NoHeapRealTimeThread, which is disallowed from touching any GC'ed heap, which means the code can be run with the assumption that memory operations can't disrupt it.

I'm very partial to these types of hybrid (or conservative GC, if you prefer) arrangements, because--provided they can be threaded properly--this deals with the only significant long term performance concern, which is paging locality.
 
Back
Top