How does the Garbage Collector work?

Currently the Garbage Collector is not very efficient, and I expect someone to improve it over time (I'd love to do it but I'm kind of busy on other pieces ;)
Are you interested to contribute?

In short, the Garbage Collector is a simple mark and sweep.
  • First we mark all the objects from the various roots (static objects, objects on the stack, registers).
    • Each class implements a Trace() method that calls recursively Trace() on all the referenced objects.
  • Then we parse the memory, skipping object by object (so big objects like arrays don't cost much, but a lot of little objects kill the GC).
  • If we find an object that has not been traced, we call the collect function that usually calls the destructor (read finalizer), then we free the memory.
  • The allocator is a bit optimized so we mark the free block one time for a group of freed objects, but nevertheless, the process is not very efficient.
  • Also because the collector is not moving, we have many holes that we have to maintain and reallocate.
  • In the allocator there is a kind of best fit when looking for blocks. Also we try to allocate the pointer by pushing an address pointer.
  • Note that sometime, we can use a block of 10 Kb to allocate thousands objects of few bytes (that's how we reuse free blocks).

Anyway this page is certainly not very clear... I expect somebody to change it to make the best GC on Earth ;)

I really would like to see a generational / moving GC.
I was thinking to have a special way to detect objects that have not been traced and that don't contain finalizer code.
In short I'd like to be able to collect thousands of objects without parsing each of them.
By using a bit map, it seems we could achieve some of this.

We could have a different list for the objects with finalizers.
Thinking about generation, I would like to skip process for some objects that I know won't be collected.
An example is the interned strings and types. Those are always in memory and will not be collected (at least until I completely implement assembly unloading ;).
So instead of tracing them / looking at collecting them, we might be able to put them in a special place (like end of memory, allocated in opposite manner).
It would be a special generation that no dynamic object could be able to be promoted to.

Another thing if somebody is looking at implementing the GC: currently only a single buffer can be allocated in order for the GC to work correctly.
Although for a limited platform, that's fine, we would need still to expand this a bit so it could manage efficiently multiple memory buffers.
And I guess if the GC becomes moving, the user could free some buffers after a collection.

Anyway, just brainstorming...

Last edited Sep 2, 2007 at 6:25 AM by OlivierNallet, version 2


No comments yet.