No-Heap Safe Classes

(Draft for discussion)
14 Aug 2004

Peter Dibble

Note: 6 October 2004

The no-heap class safety analysis included with this paper is flawed. I'm part way through a new one.

The original analysis missed two potential causes for no-heap class-unsafety:

1.     It contents itself with proving that all static references from a class must always be to immortal instances, but does not go on to trace all paths of multiple-indirection rooted at those static reference fields to show that heap references can never be reached through them.

2.     It looks at static data but ignores static methods. No static method in a no-heap safe class should be able to reach a heap value when it is called from no-heap context.

I'll update the results as soon as I have them. So far I've got fairly firm results for chasing multiple indirections. That analysis finds about 100 more no-heap unsafe classes in classes.jar.
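The multiple-indirection tracing described in item 1 can be approximated at the type level. The following is only a minimal sketch, not the analyzer used for this paper: it starts at the static reference fields of a class and follows the declared types of instance fields transitively, using standard reflection. The class and method names (StaticReachability, reachableTypes, Root, Middle, Leaf) are hypothetical. A declared-type walk is conservative in some ways and blind in others (a field of type Object could hold anything at run time), so it illustrates the idea rather than proving safety.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

public class StaticReachability {

    /** Strips array dimensions; returns null when the element type is primitive. */
    static Class<?> referenceType(Class<?> type) {
        while (type.isArray()) {
            type = type.getComponentType();
        }
        return type.isPrimitive() ? null : type;
    }

    /** Declared types reachable through reference fields, rooted at the static reference fields of root. */
    static Set<Class<?>> reachableTypes(Class<?> root) {
        Set<Class<?>> seen = new LinkedHashSet<>();
        Deque<Class<?>> work = new ArrayDeque<>();

        // Roots: the static reference fields declared by the class itself.
        for (Field f : root.getDeclaredFields()) {
            if (f.isSynthetic() || !Modifier.isStatic(f.getModifiers())) continue;
            Class<?> t = referenceType(f.getType());
            if (t != null && seen.add(t)) work.push(t);
        }

        // Indirections: instance reference fields of every type reached so far,
        // including fields inherited from superclasses.
        while (!work.isEmpty()) {
            Class<?> c = work.pop();
            for (Class<?> k = c; k != null; k = k.getSuperclass()) {
                for (Field f : k.getDeclaredFields()) {
                    if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) continue;
                    Class<?> t = referenceType(f.getType());
                    if (t != null && seen.add(t)) work.push(t);
                }
            }
        }
        return seen;
    }

    // Hypothetical example classes: Root reaches Leaf only through Middle.
    static class Leaf { int value; }
    static class Middle { Leaf leaf; }
    static class Root { static Middle middle; static int counter; }

    public static void main(String[] args) {
        System.out.println(reachableTypes(Root.class));
    }
}
```

An analysis that stopped at the static field of Root would see only Middle; following one more indirection also finds Leaf, which is the kind of path the original analysis missed.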

Introduction

RTSJ was designed for compatibility with all standard Java classes except those with pathological dependencies on such things as the behavior of the standard RTSJ priority scheduler. Compatibility with no-heap threads (and no-heap async event handlers) is more difficult. Even a well-written class may use design patterns that will throw memory reference errors if those classes are used from no-heap context.
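As a hypothetical illustration of such a pattern, consider a lazily initialized static cache. The sketch below is plain Java with invented names (LazyRegistry, EagerRegistry); it does not use the RTSJ API. Under the RTSJ, the lazy variant allocates its map in whatever allocation context the first caller happens to be running in, so if a heap thread gets there first the static field holds a heap reference and a later no-heap caller would fail. The eager variant allocates the map during static initialization.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyCacheExample {

    /** Lazy variant: the map is allocated on first use, in the first caller's context. */
    static class LazyRegistry {
        private static Map<String, Integer> cache;   // non-final static reference

        // Not thread safe; kept minimal to show only the allocation pattern.
        static Map<String, Integer> cache() {
            if (cache == null) {
                cache = new HashMap<>();
            }
            return cache;
        }
    }

    /** Eager variant: the map is allocated during static initialization and never replaced. */
    static class EagerRegistry {
        private static final Map<String, Integer> CACHE = new HashMap<>();

        static Map<String, Integer> cache() {
            return CACHE;
        }
    }

    public static void main(String[] args) {
        LazyRegistry.cache().put("a", 1);
        EagerRegistry.cache().put("a", 1);
        System.out.println(LazyRegistry.cache().equals(EagerRegistry.cache()));
    }
}
```

Even the eager variant is not automatically safe: entries added to the map later from heap context would still be heap objects reachable from the static field.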

It is difficult to avoid invoking the standard Java class libraries. To start with, every method is running in an instance of an object derived, ultimately, from the standard Object class. Every object initialization includes invocation of the constructor for the base Object. Use of the String class also pervades many programs, and the collections classes are so useful that they are hard to avoid.

This paper treats investigation of existing class libraries somewhat like natural science. It defines a taxonomy, and starts to classify the standard classes based on the taxonomy.

First, a definition to make it easier to refer to code that cannot reference heap objects. The RTSJ supports two classes of schedulable object that can preempt garbage collection: no-heap threads, and async event handlers with the no-heap property. I will use the term no-heap context to refer to these.

The taxonomy

A class's degree of no-heap safety might be categorized:

Instance Safe: The only restrictions on sharing instances of the class between heap and no-heap contexts are the normal concurrency considerations. (The instance must be in non-heap memory for this category to be useful.)

Instance reusable: An instance of the class may be safely used by heap and no-heap context provided that they do not use the instance concurrently.

Instance conditionally reusable: An instance of the class may be safely used by heap and no-heap context but the application must be aware that the instance is moving between contexts and follow some protocol defined by the class.

Class Safe: The class may be used concurrently by heap and no-heap contexts.

Class reusable: The class may be used by heap and no-heap contexts provided that they do not use the class concurrently.

Class conditionally reusable: The class may be used by heap and no-heap contexts provided that they follow some protocol defined by the class.

Instance unsafe: Instances of this class cannot be safely shared by heap and no-heap contexts under any circumstances.

Class unsafe: Once this class has been used in heap context it cannot under any circumstances be safely used in a no-heap context.

General observations

Some useful generalizations:

o      A class with no static reference fields is always class safe. If the class has static reference fields its classification requires more study.

o      An instance that includes reference fields is likely to be instance unsafe unless it was specifically coded to be instance reusable or instance safe.

And, note that the above categories are static properties of a class. Dynamic analysis would have similar categories, but the classification of a class might depend on the methods it invokes in other classes. In general, useful dynamic classification can only be made for a class as it is used by an application.

The power of the RTSJ's no-heap context depends in large part on how many standard Java class libraries turn out to be at least class safe. A developer would not expect a random object to be usable by both heap and no-heap threads, but it is a serious inconvenience when an entire class (and perhaps all its subclasses) has to be reserved for exclusively heap or no-heap (probably heap) use.

There has not been a formal study that I know of, but after using RTSJ for a couple of years my experience has been that classes from the J2ME foundation classes are likely to be class safe. Many of them need to be invoked in a scoped memory because they (like most Java code) create temporary objects, but that is the only special provision that is usually required. Exceptions to this rule surprise the developer who encounters them.

One could complain about class library members that are not class safe, but I prefer to be pleased that no-heap threads can freely use most standard classes.

A list of safe classes would be useful. Dynamic analysis in the general case is a big project, but simple static analysis shows that 92% (9650 out of 10491) of members of the basic set of Java classes are class safe. A quick spot check of some of the classes the static analysis marks as unsafe suggests that many of those classes may actually be safe.

The Analysis

This paper includes information derived from a static analysis of the J2SE 1.4_02 class libraries (Macintosh version).

The static analysis labels classes that have non-final static reference fields as possibly not class safe. The rationale for this rule is:

o      Only static fields can associate a heap reference with a class, so instance fields are not interesting for this analysis

o      Only reference (in which I include array reference) fields can hold a heap reference.

o      If a field is final, and its value is allocated in the static initializer, it must be an immortal object, so it cannot be suspect.

o      A field that could be final (i.e., it is private and the only assignment to the field is in static initialization, or it is package protected and the same rule applies across the entire package) is treated the same as a field that is declared final.
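The first three rules above can be sketched with standard reflection. This is a minimal illustration under stated assumptions, not the analyzer used for this paper: the names (StaticFieldRule, suspectFields, and the nested example classes) are hypothetical, and the fourth rule (treating fields that could be final as final) is deliberately left out because it needs an inspection of every assignment in the bytecode, which reflection cannot provide.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class StaticFieldRule {

    /** Names of the static, non-final reference (including array) fields declared by the class. */
    static List<String> suspectFields(Class<?> c) {
        List<String> suspects = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) continue;                 // skip compiler or tool generated fields
            int m = f.getModifiers();
            boolean isReference = !f.getType().isPrimitive();
            if (Modifier.isStatic(m) && !Modifier.isFinal(m) && isReference) {
                suspects.add(f.getName());
            }
        }
        return suspects;
    }

    /** A class is labeled possibly not class safe when it declares any suspect field. */
    static boolean possiblyClassUnsafe(Class<?> c) {
        return !suspectFields(c).isEmpty();
    }

    // Hypothetical example classes.
    static class OnlyPrimitives { static int count; long stamp; }
    static class FinalTable { static final char[] TABLE = new char[16]; }
    static class PseudoFinal { private static char[] table = new char[16]; } // never reassigned, but not declared final
    static class Mutable { static Object last; Object instanceRef; }

    public static void main(String[] args) {
        System.out.println(suspectFields(PseudoFinal.class));
        System.out.println(possiblyClassUnsafe(FinalTable.class));
    }
}
```

Without the fourth rule this sketch flags PseudoFinal even though its field is only assigned during static initialization, which shows why treating such fields as final reduces false positives.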

In spot-checking the classes that this static analysis suspects might be unsafe, I find that the analysis may be a little too careful. For instance, the analysis classifies the Integer, Long, and Character classes as class unsafe, but they are all actually class safe:

o      The Character class has a static array of char that is allocated and initialized at class initialization and is never changed in the Character class[1]. It appears that the array could have been declared final, and this static reference does not actually make the class unsafe.

o      The Integer class has a private static ThreadLocal value that is allocated at class initialization and never changed. It appears that the field could have been declared final. Unless ThreadLocal is unsafe (which it doesn't appear to be), this does not make the class unsafe.

o      The Long class also has a private static ThreadLocal value that is used in the same way as the ThreadLocal buffer in Integer.

Looking at the distribution of potentially unsafe classes as shown in Table 1, it appears that CORBA (a total of 192 potentially unsafe classes) and GUI classes (with 201 classes, defining GUI broadly) are particularly likely to use static reference fields.

Table 1 Potentially Class Unsafe Classes

Package                      Count  Package                      Count
com.sun.corba                70     com.sun.imageio              8
com.sun.java                 33     com.sun.jdi                  1
com.sun.jndi                 6      com.sun.media                23
com.sun.naming               1      com.sun.org                  32
com.sun.security             10     com.sun.tools                62
java.awt.                    18     java.awt.color               2
java.awt.datatransfer        2      java.awt.dnd                 3
java.awt.event               1      java.awt.image               2
java.beans.                  6      java.io.                     5
java.lang.                   10     java.lang.ref                2
java.lang.reflect            2      java.math.                   1
java.net.                    11     java.nio.                    15
java.nio.channels            2      java.nio.charset             1
java.rmi.activation          4      java.rmi.dgc                 1
java.rmi.server              4      java.security.               12
java.security.cert           6      java.security.spec           1
java.sql.                    1      java.text.                   2
java.util.                   4      java.util.jar                2
java.util.logging            2      java.util.prefs              5
java.util.regex              2      javax.imageio.               1
javax.imageio.metadata       1      javax.imageio.plugins        1
javax.imageio.spi            2      javax.naming.                1
javax.naming.spi             1      javax.print.                 4
javax.rmi.CORBA              1      javax.rmi.                   1
javax.security.auth          6      javax.sound.midi             1
javax.sound.sampled          2      javax.swing.                 12
javax.swing.border           1      javax.swing.colorchooser     2
javax.swing.filechooser      4      javax.swing.plaf             42
javax.swing.table            1      javax.swing.text             20
org.apache.crimson           1      org.apache.xalan             18
org.apache.xml               3      org.apache.xpath             4
org.ietf.jgss                1      org.omg.CORBA                45
org.omg.CosNaming            19     org.omg.DynamicAny           19
org.omg.IOP                  14     org.omg.Messaging            1
org.omg.PortableInterceptor  6      org.omg.PortableServer       17
org.xml.sax                  1      sun.applet.                  5
sun.audio.                   2      sun.awt.                     10
sun.awt.color                2      sun.awt.datatransfer         4
sun.awt.dnd                  2      sun.awt.font                 5
sun.awt.im                   4      sun.awt.image                3
sun.awt.print                2      sun.awt.shell                1
sun.io.                      1      sun.java2d.                  3
sun.java2d.loops             17     sun.java2d.pipe              2
sun.misc.                    16     sun.net.                     2
sun.net.dns                  1      sun.net.ftp                  1
sun.net.www                  10     sun.nio.ch                   3
sun.nio.cs                   1      sun.print.                   3
sun.reflect.                 4      sun.rmi.log                  1
sun.rmi.registry             1      sun.rmi.rmic                 4
sun.rmi.runtime              3      sun.rmi.server               11
sun.rmi.transport            10     sun.security.jgss            6
sun.security.krb5            11     sun.security.pkcs            3
sun.security.provider        26     sun.security.tools           5
sun.security.util            4      sun.security.validator       3
sun.security.x509            7      sun.text.                    1
sun.text.resources           1      sun.tools.jar                1
sun.tools.java               3      sun.tools.javac              1
sun.tools.javap              1      sun.tools.native2ascii       1
sun.tools.serialver          2      sun.tools.tree               1

Three pitfalls

1.     At startup, the JVM creates a lot of configuration values, many of them String values. Since they are allocated at startup (not from a real-time thread with immortal as its allocation context) and are not interned, all these values are stored in heap. If a class library retrieves a configuration value from system properties, it is likely to encounter a heap value. This usually happens when the application is allocating resources (e.g., opening files), so it is not generally a problem for no-heap threads.

2.     I/O classes are often hard to share between no-heap context and heap context. They tend to buffer data that might be allocated in heap, and code in no-heap context may inadvertently touch a heap object waiting in the buffer. Well-written I/O classes are probably class-safe, but an instance-safe I/O class would be surprising. (The System.out class in the RI appears to be class safe. This is a happy surprise.)

3.     It is useful to know which classes are class safe under static analysis, but it is not enough. In actual use, classes that look unsafe to static analysis may be safe, and if safe classes invoke unsafe classes the combination may be unsafe.

Dynamic analysis for class reusable classes

A class with non-final static reference fields may still be no-heap class safe. The analysis above marks them as suspect because it is not able to establish that the initial immortal value is never replaced with a heap value, nor can it establish in general that objects reachable from the static reference never contain heap references.

I don't think this property can be proven statically, but it can be enforced dynamically.

Imagine that RTSJ included a sub-class of immortal memory called closed immortal memory[2] that could not contain references to scoped memory, heap memory, or normal immortal memory; that is, the only permissible reference values (static or instance) in closed immortal memory are to other objects in that memory area. Call this property the enforced no-heap class safety property.

This restriction is an inconvenience, but it is not crippling. If the JVM knew which classes were supposed to be no-heap class safe and used closed immortal memory as the allocation context for those classes and their static initializers, then heap could not be reached from those static fields.[3]

Here's an informal proof:

1.     Let x be a heap object that is reachable from a static field in an object, y, that has enforced no-heap class safety. Moreover, let it be the heap object that is reachable from y in the fewest steps.

2.     Since x is the heap object nearest to y, the reference to x must be from an object in closed immortal memory.

3.     But storing the reference to x in closed immortal memory is illegal.

Looking at it a different way, obviously the static field cannot reference a heap object directly, but can't we sneak up on it? Maybe allocate an object containing a heap reference, and then hook it to y? No, the rule that closed immortal can only contain references within itself closes that option.
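The argument can be exercised on a toy model. The sketch below is plain Java and does not use the RTSJ API or the TimeSys simulator: the Area enum, the Obj class, and the store and reachesHeap methods are all hypothetical stand-ins, and IllegalStateException stands in for whatever error a real implementation would raise on an illegal assignment. It models only the single rule that an object in closed immortal memory may reference nothing outside closed immortal memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClosedImmortalModel {

    enum Area { HEAP, SCOPED, IMMORTAL, CLOSED_IMMORTAL }

    /** A toy object: a memory-area tag plus its outgoing references. */
    static final class Obj {
        final Area area;
        final List<Obj> refs = new ArrayList<>();

        Obj(Area area) {
            this.area = area;
        }

        /** Models a reference store, applying the closed immortal assignment rule. */
        void store(Obj target) {
            if (area == Area.CLOSED_IMMORTAL && target.area != Area.CLOSED_IMMORTAL) {
                throw new IllegalStateException("closed immortal may only reference closed immortal");
            }
            refs.add(target);
        }
    }

    /** True if a HEAP object can be reached from root by following references. */
    static boolean reachesHeap(Obj root) {
        Set<Obj> seen = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>();
        seen.add(root);
        work.push(root);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.area == Area.HEAP) return true;
            for (Obj next : o.refs) {
                if (seen.add(next)) work.push(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Obj y = new Obj(Area.CLOSED_IMMORTAL);
        Obj carrier = new Obj(Area.IMMORTAL);
        carrier.store(new Obj(Area.HEAP));      // legal: ordinary immortal may reference heap
        try {
            y.store(carrier);                   // the "sneak up" attempt
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(reachesHeap(y));
    }
}
```

The carrier object cannot be in closed immortal memory (it would not be allowed to hold the heap reference), and if it is anywhere else it cannot be hooked to y, so in this model every store that would make heap reachable from y is rejected at the point of assignment.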

The next step in no-heap safety?

Can code executed in a no-heap context assume that it can safely use an instance of a class with enforced no-heap class safety? It depends.

How can no-heap code get a reference in the first place?

1.     It can be passed in any number of ways from elsewhere in that no-heap context. That is not interesting. No-heap code cannot contaminate the instance with a heap reference.

2.     It can be found in a static reference value. Those are safe if the objects are no-heap class safe.

3.     It can be found in a scoped object's portal. This could contain a heap reference.

4.     It can be found in an object read from a wait-free-queue. This could contain a heap reference.

5.     Heap-context code can store a reference in an object that the no-heap context is using.

Cases 3, 4, and 5 look dangerous, but they are only dangerous if there is an unsafe object involved.

So, if every object used in a no-heap context and directly accessible to heap-using code has the enforced no-heap class safe property, the no-heap code will not be able to reach a heap value.

Take an example that stretches the RTSJ a little. The getPortal method returns a value that behaves as if it were stored in the scoped memory object. If the scoped memory object has the enforced no-heap class safe property then the value returned by getPortal must be to an object allocated in closed immortal memory.[4] Similar arguments apply to cases 4 and 5.

Scoped memories are a problem in general. The enforced no-heap safe property effectively disables scope portals; it also forces all no-heap thread and AEH instances into closed immortal memory (which forces related objects such as parameter objects into closed immortal.) This is a problem, but fortunately it can be remedied by a generalization of closed immortal.

If an object allocated in closed immortal memory obeys the enforced no-heap safe property[5], it will also obey the general enforced no-heap class safe property. This is a generalization of the property that adds the concept of scoped memories that are closed to heap references and extends the RTSJ assignment rules appropriately; that is, objects in closed immortal and closed scoped memory are not permitted to contain references to heap objects.

The general enforced no-heap class safe property has two restrictions that could be considered unnecessary.

1.     Objects that have the property cannot reach heap memory even on code paths that are only used in heap context. There is no room in the property for code with ad hoc rules that preserve no-heap safety.

2.     As implemented (in the RTSJ Simulator), enforced no-heap class safety is a property of a class. This is fine for testing purposes, but the argument here has extended the general enforced no-heap safe property into a proxy for instance safety. This is adequate for testing, but for general use the property should be attached to instances, not classes.

Future work

The static analysis marks too many classes as potentially unsafe. Hand checking 841 classes would be tedious. The static analyzer now includes all the easy and obvious patterns for class safe classes, but further inspection of the unsafe classes would probably yield patterns that could be added to the analyzer.

Many of the CORBA classes are marked as potentially not class safe because they declare static references to a String called _id and a TypeCode named __typeCode. If these fields and some other similar fields were (or could be made) class safe the standard J2SE CORBA classes would look more promising for no-heap contexts.

Conclusions

The list of potentially unsafe classes suggests that heap and no-heap context can generally share classes provided that the no-heap code avoids CORBA and GUI classes.

The classification in this paper is of J2SE classes as they exist. It does not suggest that distributed and GUI classes are inherently not class safe. Perhaps it should be viewed as a challenge. All classes should be class safe.

References

This thinking started with discussions with En-Kuang Lung. He thinks about practical issues with using RTSJ on a massive scale.

The raw output from the static analysis can be found at www.rtsj.org/docs/noheapSafe1/classes_analyzed4.txt. This report includes a list of all the fields that were labeled as pseudo-final.

Updated raw output from November 7th can be found at www.rtsj.org/docs/noheapSafe1/nhs_rept_7Nov.txt



[1] The static array of char is package protected. It is not modified (or even referenced) in any obvious class, but to be careful the whole java.lang package should be searched for modifications of java.lang.Character.sharpsMap.

[2] It is convenient to think of closed immortal memory and closed scoped memory as separate classes, and that would work, but it is an implementation detail. For purposes of the enforced no-heap class safety properties, being closed to heap references is actually an attribute of objects, not of the memory areas where they are allocated.

[3] The TimeSys RTSJ Simulator can enforce this property.

[4] Actually, this means that no value can be stored in the portal since the portal can only reference objects in that scope and objects in the scope are not in closed immortal memory.

[5] It is sloppy to say that the class has the enforced no-heap class safe property since the property depends on each instance of the class and on all the classes that can reach them. Saying that the class obeys the property isn't right either, but maybe it's better.