Getting Real-(Time)

Peter Dibble
July 2007

This paper makes two arguments:

The Real-Time Specification for Java (RTSJ) is not the only way to improve real-time performance of Java applications.
The RTSJ, or something similar, is the only way to have a system that supports real-time and legacy Java code on the same JVM.

What’s Real-Time?

Activities are real-time if they have a meaningful deadline. We are surrounded by time-sensitive tasks; it is probably fair to say that almost all activities are real-time to some extent.

Real-time Travel

Daily life is full of real-time problems: getting to work, getting children to school, cooking, music, many aspects of sports, and so on. Here’s an example that illustrates the commonplace nature of real-time analysis: I get from my home to downtown Pittsburgh on a trolley, which runs every half hour. I usually select one of three places to meet the trolley.

The North Library park and ride lot is eight minute's drive from my home and 45 minutes trolley ride from Pittsburgh. It has never been too full for me to find a place to park.
The Lytle park and ride lot is about five minutes drive from my home and about 35 minutes trolley ride from Pittsburgh. It is full about 10 percent of the time.
The King School Road trolley stop is about 20 minutes walk from my home and about 40 trolley ride minutes from Pittsburgh. Since I walk there parking is not an issue.

What does a trip to Pittsburgh have to do with real-time analysis? On days when I place little weight on the day’s deadline for arriving in Pittsburgh, I compute the door-to-door travel time without considering the possible full lot: 53 minutes for Library, 40 minutes for Lytle, and an hour for King School Road. From that viewpoint, Lytle is the clear choice for times when I want to get to Pittsburgh in a hurry.

When there is a real deadline, worst-case travel time is useful, and typical travel time is misleading. Lytle’s tendency to have a full parking lot gives that route undependable “response time.” When I get to Lytle and find that the lot is full I have to drive to another stop; that adds at least a half hour to the trip…maybe an hour. That gives the Lytle route worst-case travel time of 100 minutes, a bad choice.

Computing in Real-Time

Moving to computers, today’s processors can perform billions of operations per second, but that brute strength is not enough for a Windows PC to even guarantee to echo keyboard input or respond to the mouse without a noticeable lag. The problem is not inadequate performance. It’s that the system, from chips up to application software, is designed for maximum throughput instead of predictable performance.

Improving Real-Time Performance

Here are some of the behaviors that cause poor real-time performance for Java applications:

Garbage collection. The JVM can stop everything and collect garbage. Most modern garbage collectors keep almost all garbage collection pauses down to a few milliseconds, but when memory gets tangled enough they need to use a slower type of garbage collection. The pauses generated by that garbage collector can take hundreds of milliseconds, or even several seconds. Most people think of garbage collection as Java’s worst real-time problem.
Failure to respect application priorities. On a priority-based platform (like Java), the application controls which threads execute using locks and priorities. If the system does not respect those priority assignments the system architect has no control and cannot be responsible for timely execution.
Priority inversions. What if a high-priority thread needs a lock that is held by a low-priority thread? It has to wait for the low-priority thread to release the lock. While the low-priority thread holds the lock the high-priority thread will wait. This causes priority inversion.
Uncontrolled load. A system designed to handle 500 events per second without missing deadlines cannot be expected to continue to meet all deadlines if it is given 1000 events per second.

Garbage collection and priority inversion can be seen as special cases of failure to respect priority, but they are big enough issues to deserve their own bullets.

Any problem in the list can cause an application to miss deadlines by a second or more. The possible delay from priority inversion is unbounded. Normal program optimization is not helpful. Even reducing tasks’ computation time to microseconds does not protect them from long delays.

Garbage collection

Of the above problems, garbage collection is probably the easiest to fix.

Brute force Nearly any JVM can run without garbage collection delays until there is no free heap memory. Say your application uses heap at a rate of one megabyte per second. That’s 60 megabytes a minute or about 3.5G per hour. With about 90G of heap the application could run for 24 hours without needing garbage collection. That much garbage could take a while to collect, but if you could give the system enough time to clean up its memory every 24 hours, giving it 90G of heap would be a simple way to prevent unexpected garbage collection delays.

Clever application code Some applications can be structured so they allocate all the objects they will need and then run without further allocation. Other applications can be structured so they include explicit garbage collection pauses frequently enough that all garbage collection is planned.

Clever JVMs Real-time garbage collection has become a convincing reality. Each real-time collector has tradeoffs, and none of them (yet) promise no garbage collection delays, but they reduce worst-case GC delays into the range of tolerable jitter for many real-time applications.

RTSJ The RTSJ treatment of garbage collection is designed to supplement other approaches. It gives the Java platform ways to allocate memory without using heap memory, and it supports no-neap threads that promise to run without garbage collection delays even if they are called on to run during garbage collection.

Priorities

A real-time application needs some way[1] to tell the JVM to concentrate its efforts on threads at risk of missing their deadlines. The operating system's scheduler will try to deliver good results, but without priorities it can only guess at what the application needs. The result is good for non-real-time applications, but it does not support the guarantees that characterize real-time scheduling.

Java takes a step toward better scheduling by offering ten priorities and suggesting that the processor run threads with higher priority in preference to threads with lower priorities. Ten priorities may be too few, but they would be a good start if they were real.

At a casual read, the Java specification seems to require priority scheduling among those ten priorities, but the language in the specification is permissive enough to permit the JVM to ignore threads’ priorities and many (probably almost all) JVMs do just that. Applications that have been tested on a system without support for priority scheduling may have hidden scheduling-related bugs, and consequently, any JVM that enforces priorities is effectively incompatible with other JVMs. Main-stream JVMs have to time-slice even though that makes it difficult to write real-time applications for them.

If you have the code for a JVM, it should be easy to give it meaningful priorities. Most operating systems (probably all major operating systems) support priority scheduling (Posix calls it SCHED_FIFO). Using SCHED_FIFO would also break some legacy applications, but it would greatly improve the JVM as a real-time platform.

Once you have usable priorities, the system architect can begin to engineer the real-time aspects of the system. An excellent real-time platform lets the architect design the system with little provision for interference by the system software. A poor real-time system will face the architect with a thicket of delays, system activities that run at higher priority than the application, and inversions.

The RTSJ does not tighten the requirements for the ten Java priorities (to permit the implementor to preserve compatibility with applications that have not been debugged with enforced priorities). Instead, the RTSJ adds at least 28 new priorities that are specified to be suitable for real-time applications. This lets RTSJ implementations preserve compatibility with legacy applications and support the strict priorities that enable real-time performance.

Inversions

Unbounded priority inversions are among the worst things that can happen to real-time systems. Here’s the scenario:

Start with three threads at high, medium, and low priorities.

We don’t know much about the low-priority thread except that it will lock resource A and do a little work and then unlock A.

The medium-priority thread has a job that will use the CPU steadily for a long time.

The high-priority thread wants to lock A, use a little CPU time, release the lock and meet its deadline.

The low-priority thread starts first. It runs long enough to lock A, and then the medium- and high-priority threads are released.

The system immediately stops running the low priority-thread and starts running the high-priority thread, but the high-priority thread soon tries to lock A and finds that the low-priority thread already has the lock. It has to wait until the low-priority thread releases A.

This is the inversion since the high-priority thread is waiting for the low-priority thread.

If the system had only the high-priority and low-priority threads, this would not be a problem. When the high-priority thread was blocked at the lock, the low-priority thread would be allowed to run again until it released the lock. Unfortunately, we’ve got the medium-priority thread in the picture. It won’t let the low-priority thread run until it has completed its lengthy computation.

Inversion sounds like an esoteric problem. It is not. When you watch the real-time performance of a system without a priority-inversion avoidance mechanism, it will perform nicely until the system is placed under heavy load; then inversions start causing timing glitches of hundreds of milliseconds. Heavy network use or heavy disk cause trouble particularly well.

Simple mutual-exclusion locks are the most common cause for priority inversion. This kind of priority inversion is easily addressed with priority inheritance. When the code that manages locks sees that a high-priority thread wants a lock held by a low-priority thread, it lends the high priority to the low-priority thread until it releases the lock.

Other inversions are harder to fix. For instance, a real-time system must try to keep all queues associated with scheduling sorted by priority; otherwise a high-priority thread may wait for a low-priority thread ahead of it in a queue. You can imagine the impact rigorous application of this principle would have on network queues and disk access.

A JVM can avoid many inversions by simply running on a real-time operating system. It will do much better if it asks the operating system to use priority inheritance on all mutex locks held by the JVM, especially the locks supporting Java synchronized methods and statements. The JVM should also make certain that its scheduling queues, particularly the queues associated with “condition waits” are sorted by priority.

Uncontrolled Load

Some systems can understand their load at design time:

List all the tasks the system must perform:
- When they will be released for execution
- The deadline
- The cost (CPU time)
- The resources (including locks) that they need
Analyze the system to find the priority assignment (if any) that will let the system meet all the deadlines.

Those steps may be tedious, but they lead to a dependable real-time system.

More complicated systems fail in step one. It’s hard to predict the load if the system includes aperiodic input, such as input from a switch or an internet connection. If the aperiodic tasks are unimportant, the system can accommodate them by ignoring their deadlines and giving them low priorities. Unfortunately, aperiodic tasks sometimes serve the most important events in the system. (Keyboard and mouse input are aperiodic, but I think responding to them should be among the most important tasks for a personal computer.)

The standard real-time approach to aperiodic tasks is to make them sporadic. A sporadic task is an aperiodic task with its rate bounded by an interval called the Minimum Interarrival Time or MIT. Using minimum interarrival times, a real-time system can budget for aperiodic tasks and give them a priority that will let them meet their deadlines. In some cases the MIT is a physical limitation of the system, but in the more interesting cases it’s a design parameter like, “this system is designed to handle up to 200 type-7 web inquiries per second, giving that task an MIT of 5 milliseconds.”

What should the system do if a sporadic task fires too often? That’s a policy question. It can shed load by ignoring some of the input, it can defer some of the input (accepting missed deadlines for the sporadic task during the overload), or it can engage an application-specific overload policy like decreasing the MIT and bringing new CPU resources online to handle the added load.

Sporadic scheduling, and sporadic servers that manage the load, are well-understood. They are included in the RTSJ, and can be added to any system that supports a real-time scheduling policy (like priority scheduling.)

So

Garbage collection is an obvious problem, and it can be addressed several ways:

Approach	Cost
Huge heap	Lots of RAM, and periodic intervals to stop responding and collect garbage.
Don’t do dynamic allocation	This is easy for some applications, but impossible for most Java applications
Use a real-time garbage collector	They generally hurt JVM performance. The less impact they have on real-time performance the more they hurt average performance
Use RTSJ	RTSJ helps with the above strategies by supporting non-heap allocation of objects. It also supports non-heap mode, which has no garbage collection delays.

Priorities are another problem. The problem is not so obvious, but the point is that you can’t be certain you will meet your deadlines unless:

the JVM supports a scheduling policy (like priority) that works,
you have only one task, or
you build a scheduler into your application and implement priorities (or something else) there.

Approach	Cost
Make the Java Platform’s 10 priorities work	It will find bugs in legacy code. For practical purposes, it will break the legacy code.
Use RTSJ or some other class library that adds a new sort of thread that can have priority scheduling without breaking legacy code	Need a more sophisticated JVM.

Priority inversions are rare in a lightly-loaded system, but operating system benchmarks show that an OS without a priority inversion avoidance mechanism is likely to have inversions that delay high-priority threads for hundreds of milliseconds. Adding a JVM to the picture makes the problem worse because Java applications tend to use locks heavily.

Approach	Cost
Use priority inheritance through OS, library, and JVM, and avoid other causes of inversion	Replace any component that does not support priority inheritance. It is hard to find all potential inversions. This could be an endless job.
Use priority ceiling emulation protocol through the OS, library and JVM, and avoid other causes of inversion	This paper ignores priority ceiling emulation protocol because its effect is roughly the same as priority inheritance, and it's implementation (while different) has similar issues.

Approach

Cost

Use priority inheritance through OS, library, and JVM, and avoid other causes of inversion

Replace any component that does not support priority inheritance.

It is hard to find all potential inversions. This could be an endless job.

Use priority ceiling emulation protocol through the OS, library and JVM, and avoid other causes of inversion

This paper ignores priority ceiling emulation protocol because its effect is roughly the same as priority inheritance, and it's implementation (while different) has similar issues.

Any system will fail to meet its deadlines if its load exceeds the design parameters. Aperiodic tasks are often important (“There’s a fire!”), and a real-time system needs sporadic scheduling to manage them.

Approach	Cost
Sporadic server	Given a system with fixed priority scheduling, there’s no cost, except to write and integrate one.
RTSJ sporadic scheduling	Need RTSJ

Conclusion

You should take several messages away from this paper:

When missed deadlines are acceptable, an application can improve its real-time performance on a regular JVM by managing garbage collection and using a sporadic server.
The supporting operating system is an important factor in real-time performance.
A JVM needs priority scheduling to support real-time performance, and Java platforms have serious problems with priority scheduling that are hard to fix without something like the RTSJ.

[1] Priority scheduling is common, but other types of scheduling would probably be better for real-time. For instance, real-time theorists think highly of earliest deadline first scheduling. Other than this footnote, this paper ignores all real-time scheduling methods other than priority scheduling. It really should use some more generic term than "priority scheduling", but it doesn't.