Ramblings on Garbage Collection

An RTSJ Perspective

Peter Dibble

24 April 2005

Neither Java nor RTSJ actually requires a garbage collector, but they don't supply any other way to free unused objects in the heap. Without something like garbage collection the JVM would eventually run out of available heap memory, but if a JVM implementor could depend on an unbounded supply of heap memory garbage collection would not be necessary.

This paper looks at garbage collection from the viewpoint of an RTSJ user and implementor.

Garbage Collection Algorithms

The basic garbage collector is a simple three-step process.

  1. It identifies a root set of objects. These are the objects that it knows are accessible. In the case of Java, these are objects reachable from references on the thread stacks (all types), and class objects (ignoring the complexity added by multiple class loaders).
  2. Then it runs a graph algorithm that finds all objects that can be reached from the root set.
  3. All the objects that can't be reached from the root set are garbage and can be freed.

The simple garbage collection model is "stop and collect." With this model, when the system decides to collect garbage it freezes all activity other than garbage collection. It runs the garbage collector and when collection is done it lets all the threads resume. This is a nice simple model. It is generally made even simpler by running it when there is not enough free memory to satisfy an object allocation request, or when the application requests garbage collection.

A stop-and-collect garbage collector provides a general resource for the runtime. Every algorithm that can benefit from periodic maintenance when the system is quiet can arrange to run that maintenance during the garbage collection pauses.

The maximum amount of time for simple garbage collection is at most some constant times the size of all memory areas plus the size of the thread stacks. This means that while a large heap will defer garbage collection, it may also make it worse when garbage collection finally happens.

If we're willing to introduce some complexity, garbage collection can have less impact on applications. For instance, there is no reason for garbage collection to block threads until they want to use a heap reference. Typical Java code won't go very far before using a heap reference, but with a collector that deferred blocking threads until the last moment a real-time thread with access to immortal and scoped memory might be coded to avoid garbage collection delays entirely.[1]

Garbage collectors that try to reduce their impact on response time fall into the incremental and concurrent families.

An incremental collector does its work in small bundles, perhaps running periodically or at each object allocation. Maintaining the reference graph is a problem for incremental collectors. Each processing bundle can't run the full reachability algorithm, so the garbage collector uses write blocks, which attach a little overhead to operations that store references. That overhead updates the reference graph if necessary. The garbage collector could put heavy overhead into the store operations and let them continuously maintain a current reference graph, but the cost would be high. Instead, the write blocks just pass data to the next bundle of garbage collection; for instance, the write block might queue the value it saw stored in the list of unprocessed references for the garbage collector.

The disadvantages of incremental collection are that it is less efficient than stop-and-collect collection, and the length or frequency of garbage collection has to be matched to the rate of garbage creation. If garbage collection falls behind garbage creation, the system may run out of free memory and need to wait until enough memory is freed.

When it is working right, an incremental collector reduces garbage collection delays to around a millisecond at a cost of a minor decrease in overall performance. Decreasing the garbage collection delay increases the performance penalty, so someone chooses the right balance between those two measures.

It is difficult to execute garbage collection while the heap is in use. At first thought, it doesn't seem that hard; the GC is identifying and freeing unreachable objects, and no amount of execution can make a dead object come back to life. Perhaps the system could let execution continue during GC with the worst case being that an object that becomes unreachable during GC might not be freed until the next pass of the GC.

Unfortunately, straightforward concurrent collection can be tricked into freeing objects that are still reachable, and that will crash the JVM. The following is an example of application code shuffling pointers to make a naive garbage collector mark a live object as unreachable:

obj1.a contains a reference to object x.

obj2.b contains null.

Garbage collection starts. It hasn't seen obj1 yet, but it has inspected obj2.

Now executing code copies the reference to x from obj1.a to obj2.b and stores null into obj1.a.

Garbage collection continues and looks at obj1. It sees that obj1.a contains null.

At the end the garbage collector has not seen any references to x and it frees x even though there is a reference to it in obj2.b

A concurrent collector runs garbage collection in a separate thread, probably on a separate processor. One way to build a working concurrent collector would use virtual memory to give the garbage collection thread a snapshot of the heap and root set. The garbage collector can operate on the copy of the heap and when it finds a free object it can tell the real heap to free it. When it is finished with a pass, the garbage collector gets a new copy of the heap.

A concurrent collector tries to move most of the garbage collection load off the processor(s) running application code. Ideally, concurrent collection would let the application run without garbage collection delays (unless it creates garbage so fast that the collector cannot keep up), but it is hard to reach that ideal. For instance, the kind of concurrent collector I described has to block the application while it is doing virtual-memory magic to make a copy of the heap and each time it does a "copy on write." Also, if the concurrent collector does not find a way to compact free space, the object allocator will have to search for free space. That is slow compared with the algorithm used when free space is always contiguous.

Garbage Collection and Real-Time

A simple garbage collection environment might run a mark and sweep collector each time the heap gets within some margin of full. The advantages of this approach are that it imposes no overhead between collections, and the location and length of the garbage collection delays are understandable. The disadvantage of that simple garbage collection system is that the garbage collection delays can be hundreds of milliseconds and if you defer them by using a bigger heap the duration of the delay lengthens in proportion.

Compaction of free space could be seen as distinct from garbage collection, but it is convenient to combine them since once the garbage collector has collected all the references to each object the compaction stage can easily adjust those references.[2]

Compaction is important for two reasons:

  1. Fragmentation can cause an allocation to be unable to find enough contiguous storage for a new object even when there is adequate free memory.
  2. If free memory is contiguous, allocation is a constant-time operation. If free memory can be fragmented, worst-case allocation time is linear in the size of the heap.

Complicated garbage collection systems can give much better typical performance than mark and sweep. This is a good thing…mostly. As the collector gets more complicated it gets more difficult to understand its operation and the system architect may be forced to simply trust that it will "do the right thing." This is especially unsettling when application correctness depends heavily on the performance details of the garbage collector.

If the system architect understands the garbage collector, garbage collection can be coded into the application. With a simple collector this is done by scheduling garbage collection at convenient points in time—starting it with the System.gc() method. Provided that:

garbage collection will not impact real-time performance.

One minor addition to garbage collection complexity may be clearly beneficial to the above strategy. A generational garbage collector with a copying collector for the young generation will have relatively frequent fast collections with much longer delays between slow collections. If the fast collections are brief enough that the induced jitter is tolerable, the forced garbage collection schedule can consider only objects that are not collected in the young generation. In most applications, this will allow much less frequent scheduled garbage collection.

Some Special Garbage Collection Considerations for RTSJ

In theory, letting no-heap threads preempt garbage collection is trivially easy. No-heap threads are not permitted to touch heap references, so there is no reason for them to interact with the garbage collector.

It is not really that easy; for example:

  1. The JVM may have piled other stuff into the time the JVM is stopped for garbage collection. The RTSJ does not specifically say that no-heap threads won't pause to (for instance) reorganize the type space, but if such delays are longer than the timing "noise" already found in the system (caused by things like cache line loads), it would make the implementation much less useful. A good RTSJ implementation has to find alternate ways to accomplish all the things that a regular JVM does during GC intervals.
  2. The RTSJ allows any memory area to contain references to heap memory, and heap references stored in scoped memory can pose a challenge. Say a no-heap thread exits a scope reducing its reference count to zero, then it re-enters the scope and fills it with data. Suppose that the scope contained some heap references before it was emptied. Suppose further that the no-heap thread clears the scope while a garbage collector that includes compaction of free space is running. From the point of view of the garbage collector the action of the no-heap thread causes the heap references in the scope to change into other data. Unless the implementation handles this situation, the compaction function is likely to write new values where the heap references used to be.

Problem (2) looks bad, but it is really a disguised version of a non-blocking concurrency problem. One approach is to include code in scope exit that lets the garbage collector know when a no-heap thread triggers this problem. The garbage collector can then take measures to handle the problem—such as restarting collection or identifying the heap references that were stored in the scope and weeding them out of the garbage collector structures. The key is that only O(1) time (such as setting a flag) can be spent by the real-time thread.

[1] This is probably implementable, but the performance impact might be more than it is worth.

[2] Compaction of free space does not necessarily need to know all the references to each object. With one extra level of indirection the implementation can have only one pointer to the storage area of each object, and it can be made easy to find that pointer from the object storage.