Ramblings on Garbage Collection
An RTSJ Perspective
24 April 2005
Neither Java nor RTSJ actually requires a garbage collector, but they don't supply any other way to free unused objects in the heap. Without something like garbage collection the JVM would eventually run out of available heap memory, but if a JVM implementor could depend on an unbounded supply of heap memory garbage collection would not be necessary.
This paper looks at garbage collection from the viewpoint of an RTSJ user and implementor.
The basic garbage collector is a simple three-step process.
The simple garbage collection model is "stop and collect." With this model, when the system decides to collect garbage it freezes all activity other than garbage collection. It runs the garbage collector and when collection is done it lets all the threads resume. This is a nice simple model. It is generally made even simpler by running it when there is not enough free memory to satisfy an object allocation request, or when the application requests garbage collection.
A stop and collect garbage collector provides a general resource for the runtime. Every algorithm than can benefit from periodic maintenance when the system is quiet can arrange to run that maintenance during the garbage collection pauses.
The maximum amount of time for simple garbage collection is at most some constant times the size of all memory areas plus the size of the thread stacks. This means that while a large heap will defer garbage collection, it may also make it worse when garbage colleciton finally happens.
If we're willing to introduce some complexity, garbage collection can have less impact on applications. For instance, there is no reason garbage collection to block threads until they want to use a heap reference. Typical Java code won't go very far before using a heap reference, but with a collector that deferred blocking threads until the last moment a real-time thread with access to immortal and scoped memory might be coded to avoid garbage collection delays entirely.
Garbage collectors that try to reduce their impact on response time fall into the incremental and concurrent families.
An incremental collector does its work in small bundles, perhaps running periodically or at each object allocation. Maintaining the reference graph is a problem for incremental collectors. Each processing bundle can't run the full reachability algorithm, so the garbage collector uses write blocks, which attach a little overhead to operations that store references. That overhead updates the reference graph if necessary. The garbage collector could put heavy overhead into the store operations and let them continuously maintain a current reference graph, but the cost would be high. Instead, the write blocks just pass data to the next bundle of garbage collection; for instance, the write block might queue the value it saw stored in the list of unprocessed references for the garbage collector.
The disadvantages of incremental collection are that it is less efficient than stop and collect collection, and the length or frequency of garbage collection has to be matched to the rate of garbage creation. If garbage collection falls behind garbage creation, the system may run out of free memory and need to wait until enough memory is freed.
When it is working right, an incremental collector reduces garbage collection delays to around a millisecond at a cost of a minor decrease in overall performance. Decreasing the garbage collection delay increases the performance penalty, so someone chooses the right balance between those two measures.
It is difficult to execute garbage collection while the heap is in use. At first thought, it doesn't seem that hard; the GC is identifying and freeing unreachable objects, and no amount of execution can make a dead object come back to life. Perhaps the system could let execution continue during GC with the worst case being that an object that becomes unreachable during GC might not be freed until the next pass of the GC.
Unfortunately, straightforward concurrent collection can be tricked into freeing objects that are still reachable, and that will crash the JVM. The following is an example of application code shuffling pointers to make a naive garbage collector mark a live object as unreachable:
contains a reference to object x.
collection starts. It hasn't seen obj1 yet, but it has inspected obj2.
code copies the reference to x from obj1.a to obj2.b and stores null into
collection continues and looks at obj1. It sees that obj1.a contains null.
At the end the
garbage collector has not seen any references to x and it frees x even though
there is a reference to it in obj2.b
A concurrent collector runs garbage collection in a separate thread, probably on a separate processor. One way to build a working concurrent collector would use virtual memory to give the garbage collection thread a snapshot of the heap and root set. The garbage collector can operate on the copy of the heap and when it finds a free object it can tell the real heap to free it. When it is finished with a pass, the garbage collector gets a new copy of the heap.
A concurrent collector tries to move most of the garbage collection load off the processor(s) running application code. Ideally concurrent collection would let the application run without garbage collection delays (unless it creates garbage so fast that the collector cannot keep up), but it is hard to reach that ideal. For instance, the kind of concurrent collector I described has to block the application while it is doing virtual-memory magic to make a copy of the heap and each time it does a "copy on write." Also, if the concurrent collector does not find a way to compact free space, the object allocator will have to search for free space. That is slow compared with the algorithm used when free space is always contiguous.
A simple garbage collection environment might run a mark and sweep collector each time the heap gets within some margin of full. The advantages of this approach are that it imposes no overhead between collections, and the location and length of the garbage collection delays are understandable. The disadvantage of that simple garbage collection system is that the garbage collection delays can be hundreds of milliseconds and if you defer them by using a bigger heap the duration of the delay lengthens in proportion.
Compaction of free space could be seen as distinct from garbage collection, but it is convenient to combine them since once the garbage collector has collected all the references to each object the compaction stage can easily adjust those references.
Compaction is important for two reasons:
Complicated garbage collection systems can give much better typical performance than mark and sweep. This is a good thing…mostly. As the collector gets more complicated it gets more difficult to understand its operation and the system architect may be forced to simply trust that it will "do the right thing." This is especially unsettling when application correctness depends heavily on the performance details of the garbage collector.
If the system architect understands the garbage collector, garbage collection can be coded into the application. With a simple collector this is done by scheduling garbage collection at convenient points in time—starting it with the System.gc() method. Provided that:
garbage collection will not impact real-time performance.
One minor addition to garbage collection complexity may be clearly beneficial to the above strategy. A generational garbage collector with a copying collector for the young generation will have relatively frequent fast collections with much longer delays between slow collections. If the fast collections are brief enough that the induced jitter is tolerable, the forced garbage collection schedule can consider only objects that are not collected in the young generation. In most applications this will allow much less frequent scheduled garbage collection.
In theory, letting no-heap threads preempt garbage collection is trivially easy. No-heap threads are not permitted to touch heap references, so there is no reason for them to interact with the garbage collector.
It is not really that easy; for example:
Problem (2) looks bad, but it is really a disguised version of a non-blocking concurrency problem. One approach is to include code in scope exit that lets the garbage collector know when a no-heap thread triggers this problem. The garbage collector can then take measures to handle the problem—such as restarting collection or identifying the heap references that were stored in the scope and weeding them out of the garbage collector structures. The key is that only O(1) time (such as setting a flag) can be spent by the real-time thread.
 This is probably implementable, but the performance impact might be more than it is worth.
 Compaction of free space does not necessarily need to know all the references to each object. With one extra level of indirection the implementation can have only one pointer to the storage area of each object, and it can be made easy to find that pointer from the object storage.