A toy generational garbage collector

Had a little downtime today and figured I’d make a toy generational garbage collector, for funsies. A friend of mine was once asked this as an interview question so I thought it might make for some good weekend practice.

For those not familiar, a common way of doing garbage collection in managed languages is to have the concept of multiple generations. All newly created objects go in gen0. New objects are also the most probably to be destroyed, as there is a lot of transient stuff that goes in an application. If an element survives a gc round it gets promoted to gen1. Gen1 doesn’t get GC’d as often. Same with gen2.

A GC cycle usually consists of iterating through application root nodes (so starting at main and traversing down) and checking to see where a reference lays in which generation. If we’re doing a gen1 collection, we’ll also do gen0 and gen1. However, if you’re doing gen0 only and a node is already laying in gen1, you can just bail early and say “meh, this node and all its references are probably ok for now, we’ll try this later“.

For a really great visualization checkout this msdn article on generation garabage collection.

And now on to the code! First lets start with what is an object

@Data
@EqualsAndHashCode(of = "id")
public class Node {
    private final String id;

    public Node(String id) {
        this.id = id;
    }

    private final List<Node> references = new ArrayList<>();

    public void addReference(Node node) {
        references.add(node);
    }

    public void removeReference(Node node) {
        references.removeIf(i -> i.getId().equals(node.getId()));
    }
}

For the purposes of the toy, its just some node with some unique id.

Lets also define an enum of the different generations we’ll support and their ordinal values

public enum Mode {
    Gen0,
    Gen1
}

Next, lets make an allocator who can allocate new nodes. This would be like a new syntax behind the scenes

public class Allocator {
    @Getter
    private Set<Node> gen0 = new HashSet<>();

    @Getter
    private Set<Node> gen1 = new HashSet<>();

    public Node newNode() {
        return newNode("");
    }

    public Node newNode(String tag) {
        final Node node = new Node(tag + UUID.randomUUID());

        getGen0().add(node);

        return node;
    }

    public Mode locateNode(Node tag) {
        if (gen1.contains(tag)) {
            return Mode.Gen1;
        }

        return Mode.Gen0;
    }
    ....

At this point we can allocate a new node, and assign nodes references.

final Allocator allocator = new Allocator();

final Node root = allocator.newNode();

root.addReference(allocator.newNode());
root.addReference(allocator.newNode());
root.addReference(allocator.newNode());

Still haven’t actually collected anything though yet. So lets write a garbage collector

public class Gc {
    private final Allocator allocator;

    public Gc(Allocator allocator) {
        this.allocator = allocator;
    }

    public void collect(Node root, Mode mode) {
        final Allocator.Marker marker = allocator.markBuilder(mode);

        mark(root, marker, mode);

        marker.sweep();
    }

    private void mark(Node root, Allocator.Marker marker, Mode mode) {
        final Mode found = allocator.locateNode(root);

        if (found.ordinal() > mode.ordinal()) {
            return;
        }

        marker.mark(root);

        root.getReferences().forEach(ref -> mark(ref, marker, mode));
    }
}

The GC dos a DFS on the root object reference and marks all visible nodes with some marker builder (yet to be shown). If the generational heap that the node lives in is less than or equal to the mode we are on, we’ll mark it, otherwise just skip it. This works because later we’ll only prune from generation heaps according to the mode

Now comes the fun part, and its the marker

public static class Marker {

    private final Set<String> marks;
    private final Allocator allocator;

    private final Mode mode;

    public Marker(Allocator allocator, final Mode mode) {
        this.allocator = allocator;
        this.mode = mode;
        marks = new HashSet<>();
    }

    public void mark(Node node) {
        marks.add(node.getId());
    }

    public void sweep() {
        Predicate<Node> remove = node -> !marks.contains(node.getId());

        allocator.getGen0().removeIf(remove);

        allocator.promote(Mode.Gen0);

        switch (mode) {
            case Gen0:
                break;
            case Gen1:
                allocator.getGen1().removeIf(remove);
                allocator.promote(Mode.Gen1);
                break;
        }
    }
}

All we do here is when we mark, tag the node in a set. When we go to sweep, go through the generations less than or equal to the current and remove unmarked nodes, as well as promote the surviving nodes to the next heap!

Still missing two last functions in the allocator which are promote and the marker builder

public Marker markBuilder(final Mode mode) {
    return new Marker(this, mode);
}

private void promote(final Mode mode) {
    switch (mode) {
        case Gen0:
            gen1.addAll(gen0);
            gen0.clear();
            break;
        case Gen1:
            break;
    }
}

Now we can put it all together and write some tests:

Below you can see the promotion in action.

final Allocator allocator = new Allocator();

final Gc gc = new Gc(allocator);

final Node root = allocator.newNode();

root.addReference(allocator.newNode());
root.addReference(allocator.newNode());
root.addReference(allocator.newNode());

final Node removable = allocator.newNode("remove");

removable.addReference(allocator.newNode("dangle1"));
removable.addReference(allocator.newNode("dangle2"));

root.addReference(removable);

assertThat(allocator.getGen0().size()).isEqualTo(7);

gc.collect(root, Mode.Gen0);

assertThat(allocator.getGen0().size()).isEqualTo(0);
assertThat(allocator.getGen1().size()).isEqualTo(7);

Nothing can be collected since all nodes have references, but we’ve cleared the gen0 and moved all nodes to gen1

root.removeReference(removable);

gc.collect(root, Mode.Gen1);

assertThat(allocator.getGen0().size()).isEqualTo(0);
assertThat(allocator.getGen1().size()).isEqualTo(4);

Now we can actually remove the reference and do a gen1 collection. You can see now that the gen1 heap size went down by 3 (so the removable node, plus its two children) since those nodes are no longer reachable

And just for fun, lets show that gen1 collection works as well

final Node gen1Remove = allocator.newNode();

root.addReference(gen1Remove);

gc.collect(root, Mode.Gen1);

assertThat(allocator.getGen0().size()).isEqualTo(0);
assertThat(allocator.getGen1().size()).isEqualTo(5);

root.removeReference(gen1Remove);

gc.collect(root, Mode.Gen1);

assertThat(allocator.getGen0().size()).isEqualTo(0);
assertThat(allocator.getGen1().size()).isEqualTo(4);

And there you have it, a toy generational garbage collector :)

For the full code, check out this gist

4 comments

  1. Shai Almog

    That’s a very interesting way to explain GC’s.

    We built a concurrent GC for Codename One but it isn’t generational as that would be quite challenging. In the past we integrated ARC like behavior into the GC but that didn’t perform as nicely as we’d hoped.

    We went thru a lot of challenges to get concurrency to work correctly. The devil is deeply within the details and synchronization is probably the biggest real world challenge when building a GC.

    The ParparVM code is written in Java as generated C source code, you can actually see the generated “mark” methods in the C code that ParparVM generates: https://github.com/codenameone/CodenameOne/tree/master/vm

    • Anton Kropp

      Yeah it’s funny you mention it. I was certainly thinking about the concurrency problem when playing with this as doing an entire traversal of all nodes requires a full stop the world. I can only imagine the difficulties doing a concurrent GC where you need to be able to guarantee throughput of allocation as well as consistent cleanup. Thanks for sharing!

      • Shai Almog

        The problem with representing this in Java is that you don’t have the stacks.

        E.g. when allocating every thread does its own allocation and stack work so individually every thread is “single threaded”. So you just need the one thread to stop so you can mark its stack.

        There are some complexities though e.g. object being passed from one thread to another and static context which is accessed by multiple threads.

        Getting into this really helped me understand the benefits/complexities of GC a lot, we avoided a lot of the locking logic and were able to make a very efficient implementation.

Post a comment

You may use the following HTML:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>