Concurrent Programming Using Java

The material shown here will be covered in much more detail in the forthcoming book, Concurrent Programming: The Java Programming Language, to be published by Oxford University Press in 1998.

Table of Contents

Introduction

Sequential Example Programs

    A Simple Genetic Algorithm
    Algorithm Animation
Threads

Semaphores

Monitors

    Java Monitors
Message Passing
    Synchronization Package Classes
    Example Programs
Rendezvous
    Synchronization Package Classes
    Example Programs
    Conditional Rendezvous
Remote Method Invocation

Parallel Processing

    Synchronization Package Classes
    Example Programs

Introduction

This is an introduction to using the Java programming language in concurrent or multithreaded applications. The context is the process synchronization material and related concurrent programming in operating systems courses as opposed to software engineering. Topics covered are race conditions when threads share data, critical sections, mutual exclusion, semaphores, monitors, message passing, the rendezvous, remote procedure calls, distributed or network programming, and parallel processing. Solutions to the classical problems talked about in operating systems courses (the dining philosophers, the bounded buffer producers and consumers, and the database readers and writers) are shown in Java. Also shown is how to animate algorithms using the command set of the Xtango animation interpreter, animator.

These example programs were developed and tested using Sun Microsystem's JDK version 1.0.2 and 1.1 for Solaris 2.x and Windows 95/NT (1996--97). They have been updated to remove all ``deprecated'' methods and constructors. The multimachine socket examples use the readObject() and writeObject() methods of the ObjectInputStream and ObjectOutPutStream classes, which are part of the RMI (remote method invocation) add-on for JDK 1.0.2 and included with JDK 1.1.

All of the examples described and hyperlinked here may be retrieved by anonymous ftp from site ftp.mcs.drexel.edu (144.118.40.9) in directory pub/shartley and file concProgsJava.tar.gz. A zipped version is also available. Java is designed to be a platform-independent language, so all of these examples, including the animated ones, will run without change on Sun's Solaris 2.x UNIX for Sparc and Microsoft Windows 95/NT for Intel-based PCs.

In this compressed (with GNU's gzip) tar archive, there is a directory lib that contains three subdirectories: Utilities, Synchronization, and XtangoAnimation. The path to the lib directory needs to be put into your CLASSPATH environment variable so that your Java programs can import the classes in the subdirectories of lib. For example, suppose you unpack the archive so that lib is in directory /home/you/Java. Then on a UNIX system, put the line

    setenv CLASSPATH /usr/local/JDK/lib/classes.zip:/home/you/Java/lib:.
    
into your .cshrc or .login file. (Replace /usr/local/JDK/lib/classes.zip with wherever your system administrator has installed the Java system classes.)

On a Windows 95/NT system, put the line

    SET CLASSPATH=C:\JAVA\JDK\LIB\CLASSES.ZIP;D:\LIB;.
    
in your AUTOEXEC.BAT file (assuming you put lib into D: and the JDK into C:\JAVA).

To test your CLASSPATH setting, try these commands.

    java Utilities.GetOpt
    java XtangoAnimation.XtangoAnimator
    
These will execute the main() methods in those classes, which have been included for testing the classes. You should not get a ``class not found'' error message.

Sequential Example Programs

These examples illustrate some of the sequential features of the language. An on-line tutorial is available from Sun Microsystems for a more detailed discussion of the syntax and semantics of the Java language.

These three examples show how to write a single-class program containing one or more static procedures or functions and a main() method where execution of the program begins.

    Compute Factorials.

    Sort Numbers.

    N-Queens Problem.

This example shows how to access a static method defined in another class.

    Two Class Factorial Program.

This example shows how to read numbers and words (strings) from the keyboard (or stdin in UNIX terminology).

    Tokenizing Keyboard Input.

This example shows how to open a file and write binary floating-point numbers (not translated into ASCII) to the file. The numbers are read from the keyboard with a tokenizer as in the previous example. Then the file is read and the numbers printed out. The file is deleted in the finally block whether or not there is an IO error.

    Writing and Reading a Data File.

This next example shows how to read and parse a text file containing some input data in a specific format.

    Tokenizing File Input.

    The Input File.

This example shows how to create objects from user-defined classes.

    Neighborhood Banking.

The GetOpt class can be used to process command line arguments when a Java program is run, like the following

    javac aClass.java
    java aClass -a -f theFile -w -80 -h3.33 arg1 arg2
    

    Class GetOpt Helps Parse Command Line Options.

This last example shows how to subclass the Bank and Account example.

    Commercial Banking.

A Simple Genetic Algorithm

A genetic algorithm is an optimization technique that uses randomization instead of a deterministic search strategy. To maximize f(x) over some domain, the values of x in the domain are encoded in strings over some alphabet, usually {0,1}. A population of such strings, called chromosomes, is created at random. Then the genetic operators selection for reproduction, crossover, and mutation are applied to the population members. Selection is based on the fitness value of the chromosome, usually f(x). After this is repeated many times, the population will contain mostly highly fit chromosomes, one of which (it is hoped) is close to the maximum value of the function f(x) over the domain.

This example shows how to split a program up into packages. The Chromosome class is defined as a hierarchy of subclasses for each gene data type.

    README file.

    sGA.java

    Debug.java

    Defaults.java

    Globals.java

    MyRandom.java

    ShellSort.java

    Chromosome.java

    BitChromosome.java

    DoubleChromosome.java

    BitCountChromosome.java

    ByteCountChromosome.java

    SinesChromosome.java

    XtoTenthChromosome.java

    MyChromosome.java

    Crossover.java

    OnePointCrossover.java

    UniformCrossover.java

    NPointCrossover.java

    SelectionException.java

    FitnessSumZeroException.java

    Selection.java

    ProportionalSelection.java

Algorithm Animation

The XtangoAnimator class in the XtangoAnimation directory of the lib directory contains a collection of methods that implement the command set of the Xtango algorithm animation interpreter, animator. A Java program can create an instance of XtangoAnimator and call its methods to create graphical icons of various colors in a window and then move them around.

The XtangoAnimator class is designed to be used in a stand-alone application for algorithm animation. In contrast, the Xtango animator program reads commands from a text file or UNIX pipe and interprets them. A group at Duke University has implemented a Samba interpreter, called lambada. Samba is a superset of Xtango. The lambada interpreter also reads a text stream of commands from a file or pipe. The XtangoAnimator class defines methods that correspond to the Xtango animator commands and that can be called in a Java program.

Here are two example programs.

    ``Hello, world!'' using XtangoAnimator.

    Animated Bubble Sorting.

A sorting screen snapshot:

Laboratory Exercises

  1. Write a Java program that generates prime numbers. Try both alternatives for getting input data into the program: command line arguments and keyboard input. Write the program first as one class, then as several classes.
  2. Animate the N-Queens program using the XtangoAnimator class.
  3. Look up the ``Game of Life'' in Scientific American and write a Java program to simulate it. Is the issue available on the Web? Animate your program.

Threads

A process is an executing program. It has been allocated memory by the operating system. A thread is an execution or flow of control in the address space of a process; the program counter register points to the next instruction to be executed. A process is a program with one thread. A process can have more than one thread. All the threads in a process have their own program counter and their own stack for local (also called automatic) variables and return addresses of invoked procedures.

In Java, a thread in the run-time interpreter calls the main() method of the class on the java command line. Each object created can have one or more threads, all sharing access to the data fields of the object.

The article An Introduction to Programming with Threads by Andrew D. Birrell (1989) motivates concurrent programming with threads:

The MyObject class contains some methods useful for multithreaded simulations:

    nap(milliseconds),
    age(),
    random(), and
    seed(number).
A program can sleep for some number of milliseconds. The age() method returns, in milliseconds, the elapsed time since the program started. The random number generator can be seeded and returns a double in the range zero to one. It also contains some ``syntactic sugar'' for use with semaphores, described later.

    Utility methods in Class MyObject.

This is a skeleton (model or template) for a multithreaded simulation.

    A Multiple Thread Simulation.

In JDK 1.0.2 and 1.1 for Solaris 2.x, Java threads are not time sliced, as the following test program, Beeping, shows. Java threads are time sliced on Windows 95/NT, though. The first two sample runs show the difference. Use a Scheduler object to get pseudo time slicing of threads on Sun's Solaris 2.x. The third sample run of Beeping shows it working on Solaris.

    A Class for Time Slicing Threads.

    Testing for Time Sliced Threads.

This program creates two windows, each a subclass of Frame, with some buttons to control the thread in each window that computes Fibonacci numbers. There is another thread in the Java virtual machine that is blocked, waiting for events like button clicks in the windows. When a button is clicked, the action() method is called by this thread to process the click. Meanwhile, the other two threads can be computing concurrently and displaying the Fibonacci numbers in their windows. Try coding this as a purely sequential program with only one thread or flow of control!

    Fibonacci Numbers and Stop Button.

A screen snapshot:

This example shows how to send the output of each thread to a different file. This technique may be useful for debugging.

    Sending Each Thread Output to a Different File.

Two threads doing N=N+1 at about the same time is called a race condition since one of the updates can get lost. In general, race conditions are possible when

Three different kinds of race conditions are illustrated with Java programs.

Multiple threads doing sum=fn(sum,m) to the shared variable sum is a race condition like N=N+1. Note that one Racer object is created; then two threads are created in that object and the two threads share the sum variable. The Windows 95 sample runs show that if M is large enough, the threads run in pseudo parallel due to time slicing; the Solaris sample runs show there is no time slicing.

    A Race Condition.

There is a fixed $10 million in this bank (10,000 accounts that each start out with $1000) but the auditor sometimes sees more, sometimes less. If the bank auditor is adding up the account balances while some funds are being moved from one account to another, an inaccurate total can be calculated. Can you explain how more money than is really there shows up sometimes in the sample output?

    A Different Kind of Race Condition.

If two threads try to manipulate a queue at the same time, a node can get lost. Remember that the two threads may be executing in round-robin fashion on a shared CPU and that context switches can occur at any time.

    A Queue Race Condition.

The following sequence of queue snapshots shows node 4 not in the queue even though thread A appended it.

These examples show that concurrently executing threads that share data need to synchronize their operations and processing in order to avoid race conditions on shared data. Thread synchronization can be done with flag variables and busy waiting, as this example shows. Since it uses a lot of CPU cycles, busy waiting is inefficient. Blocking somehow would be better.

    Busy Waiting Bounded Buffer for a Producer and Consumer.

    Producer and Consumer Driver.

We can try to remove busy waiting from the bounded buffer by using the thead suspend() and resume() methods, but we end up introducing race conditions. It is possible, if a context switch occurs at just the ``right'' time, for both the producer and the consumer to become suspended: the producer because the bounded buffer is full and the consumer waiting for a resume() that will never happen. Do you see how?

    Suspend/Resume Bounded Buffer for a Producer and Consumer.

A critical section is a block of code in a thread that accesses one or more shared variables in a read-update-write fashion. In such a situation we want mutual exclusion: only one thread at a time can access (read-update-write) a shared variable at a time. The mutual exclusion problem is how to keep two or more threads from being in their critical sections at the same time, where we make no assumptions about the number of CPUs or their relative speeds A thread outside its critical section should not keep other threads outside their critical sections from entering, also called a ``safety'' property (absence of unnecessary delay). And no thread should have to wait forever to enter its critical section, also called a ``liveness'' property (eventual entry).

An atomic action ``makes an indivisible state transition: any intermediate state that might exist in the implementation of the action must not be visible to other threads'' (p. 60 Andrews' Concurrent Programming book). This means that nothing from another thread can be interleaved in the implementation of the action for it to be atomic. Critical sections need to be done as if they were one atomic action to avoid race conditions.

The mutual exclusion problem is to devise a pre-protocol and a post-protocol based on either hardware or software

The ground rules are

Thread Ti, i = 1, 2, 3, ...

    while (true) {
       outsideCS();
       wantToEnterCS(i);   // pre-protocol
       insideCS();
       finishedInCS(i);    // post-protocol
    }
    

This sequence of examples shows successful and unsuccessful attempts to solve the mutual exclusion problem in software without specialized hardware instructions like test-and-set.

    Testing the Attempts.

    First Attempt: Strict Alternation.

    Second Attempt: Check Other's Flag Variable, Then Set Own.

    Third Attempt: Set Own Flag Variable, Then Check Other's.

    Fourth Attempt: Back Off.

    Dekker's Solution: Take Turns Backing Off.

    Peterson's Shorter Solution.

    Lamport's Bakery Algorithm: Two Threads Only.

    Lamport's Bakery Algorithm: Arbitrary Number of Threads.

Laboratory Exercises

  1. Use the ``beep'' program to determine if your platform time slices threads.
  2. Try some other values for the -M command line argument of the ``race'' program to determine the approximate time slice size on your system.
  3. What can go wrong (race condition) in the busy waiting bounded buffer if there is more than one producer thread and/or more than one consumer thread?

Semaphores

Semaphores can be used for mutual exclusion and thread synchronization. Instead of busy waiting and wasting CPU cycles, a thread can block on a semaphore (the operating system removes the thread from the CPU scheduling or ``ready'' queue) if it must wait to enter its critical section or if the resource it wants is not available.

Mutual exclusion pseudocode:
semaphore S = 1; ... P(S); N=N+1; V(S);

Condition synchronization pseudocode (resource availability):
semaphore tapeDrives = 7; ... P(tapeDrives); useTapeDrive(); V(tapeDrives);

Java has implicit binary semaphores of the form

    Object mutex = new Object();
       /*...*/
    synchronized (mutex) {
       /*...*/
    }
    
that can be used for mutual exclusion. Only one thread at a time can be executing inside the synchronized block.

    An Implicit Binary Semaphore Prevents the Race Condition.

    An Implicit Binary Semaphore Prevents Another Race Condition.

      Why is the semaphore mutex also used in the Auditor thread? Isn't it enough just to make the moving of money from one account to another in the ATM thread into a single atomic action?

Java does not have explicit binary and counting semaphores, so they are provided as classes in the Synchronization subdirectory of the lib directory. Their implementation will be shown later. Two explicit binary semaphores, one initialized to zero (impossible with an implicit binary semaphore), and one explicit counting semaphore are used to synchronized three threads so they obey the ``rules.'' The ``syntactic sugar'' in MyObject.java lets us write P(S) instead of S.P() for a semaphore S.

    Explicit Binary and Counting Semaphores Synchronize Three Threads.

A producer thread deposits items and blocks if the bounded buffer fills up. A consumer thread fetches items and blocks if the bounded buffer is empty.

The implementation shown uses a circular array and can be used only with a single producer thread and a single consumer thread. Do you see why?

    The Bounded Buffer for a Producer and Consumer.

    The Producer and Consumer Driver.

A linked list can be used to implement a first-in-first-out buffer that is not bounded and can be used by multiple producer threads and multiple consumer threads.

    The Unbounded Buffer for the Producers and Consumers.

    The Producers and Consumers Driver.

Bounded buffers can be used to communicate information from one thread to another in a pipeline.

    A Pipeline Using Bounded Buffers.

Semaphores can be used to solve the so-called ``classical'' synchronization problems found in many operating systems books: the sleeping barber, the five dining philosophers, and the database readers and writers.

A barber waits to cut hair. Customers enter the waiting room and take a seat if one is available. If the waiting room is full, they try again later. Otherwise, they wait until their turn for a hair cut.

    The Sleeping Barber.

Five philosophers sit around a table and think until hungry. Interspersed between the philosophers are five forks. A hungry philosopher must have exclusive access to both its left and right forks in order to eat. If they are not both free, the philosopher waits. The following algorithm does not deadlock (it never happens that all philosophers are hungry, each holding one fork and waiting for the other), allows maximal parallelism (a philosopher never picks up and holds a fork while waiting for the other fork to become available when the fork it is holding could be used for eating by its neighbor), an advantage, but also allows starvation (a philosopher's two neighbors can collaborate and alternate their eating so the one in the middle never can use the forks).

    The Server for the Dining Philosophers.

    The Dining Philosophers Driver.

If a philosopher can hold a fork while waiting for the other fork, deadlock is possible, an extreme case of not having maximal parallelism. However, starvation is not possible. Each fork is represented by a semaphore and each hungry philosopher will do a ``P'' on its left fork and then its right fork.

    The Dining Philosophers Server Where Each Fork is a Semaphore.

We can fix the deadlock problem and retain no starvation, but we still do not have maximal parallelism. All philosophers pick up left then right except one designated philosopher who picks up right then left.

    The Dining Philosophers Server Where One is ``Odd''.

A database can be accessed concurrently by threads that only want to read, but a writer thread must have exclusive access with respect to other readers and writers. The solution here allows writers to starve if enough readers keep coming along to read the database that the number of current readers is always above zero.

    The Database for the Readers and Writers.

    The Readers and Writers Driver.

This sequence of attempts to implement counting semaphores with binary semaphores illustrates some of the subtleties of pure binary semaphores and thread scheduling. The first two attempts have a problem if a context switch occurs at point A since then V's on the binary semaphore blocked may get lost. The third solves that problem but introduces more context switching overhead. The last two are good solutions, with the latter being somewhat simpler.

    Attempt 1.

    Attempt 2.

    Attempt 3.

    Attempt 4.

    Attempt 5.

Jurassic Park consists of a safari area, a number of single-passenger safari cars, some number of people, and a museum. Each person in the park will visit the museum for a random amount of time, then line up to take a safari ride, waiting for an empty car. Each car will wait for a passenger, then go out on safari for a random amount of time. The following solution has a major flaw. What is it? How can the flaw be fixed?

    Jurassic Park.

The XtangoAnimator class in the XtangoAnimation subdirectory of the lib directory can be used to animate the dining philosophers.

    Animated Dining Philosophers.

    A screen snapshot of the dining philosophers:

Laboratory Exercises

  1. Examine the bounded buffer for race conditions if there are multiple producer and/or consumer threads. Fix any race conditions you find.
  2. Look in some operating systems books for semaphore exercises and write the solutions in Java. Examples are classical problems like the cigarette smokers, the baboons crossing a canyon, the unisex bathroom, and the bakery.
  3. Modify the sleeping barber program so there are multiple barbers.
  4. Modify the dining philosophers server so that two philosophers cannot collaborate to starve the one between them.
  5. Modify the readers and writers database so that readers cannot starve writers.
  6. Animate one or more of these programs.

Monitors

Semaphores are like goto's and pointers: mistake prone, work okay but lack structure and ``discipline''.

For example, a disastrous typo:

    V(S); criticalSection(); V(S)

This leads to deadlock:

    P(S); criticalSection(); P(S)

Nested critical sections can lead to deadlock:

    P1: P(Q); P(S); ... V(S); V(Q);

    P2: P(S); P(Q); ... V(Q); V(S);

A monitor is an object with some built-in mutual exclusion and thread synchronization capabilities. They are an integral part of the programming language so the compiler can generate the correct code to implement the monitor. Only one thread can be active at a time in the monitor, where ``active'' means executing a method of the monitor. Monitors also have condition variables, on which a thread can wait if conditions are not right for it to continue executing in the monitor. Some other thread can then get in the monitor and perhaps change the state of the monitor. If conditions are now right, that thread can signal a waiting thread, moving the latter to the ready queue to get back into the monitor when it becomes free.

Monitors can use either signal-and-exit or signal-and-continue signaling discipline. In the former, a signaling thread must leave the monitor immediately, at which point it is guaranteed that the signaled thread is the next one in the monitor. In the latter, the signaled thread is not guaranteed to be the next one in the monitor. In fact, barging can take place: some thread that has called a monitor method and is blocked until the monitor is free can get into the monitor before a signaled thread.

Here are monitors for the three ``classical'' problems using the signal-and-exit signaling discipline and condition variables. They are written in a Java-like but not Java pseudocode. The synchronized attribute in a method definition means that only one thread at a time can be active in any synchronized method.

    Bounded Buffer Pseudocode Monitor.

Philosopher starvation is prevented by introducing a new state: very hungry. A philosopher is put into this state if it is hungry, if one of its neighbors puts down its forks, and if it cannot eat because the other fork is in use. A new rule is added: a hungry philosopher cannot eat if it has a very hungry neighbor. These changes will prevent a collaboration of two philosophers trying to starve the philosopher between them. Notice that signal-and-exit requires leaving and reentering the monitor to generate more than one signal.

    Starvation-Free Dining Philosophers Pseudocode Monitor.

Writer starvation is prevented by requiring readers that come along to read the database to wait if there is a waiting writer even if other readers are currently reading the database. When the current readers finish, the waiting writer writes the database and then signals into the database a waiting reader. Each entering reader signals another waiting reader into the database.

    Starvation-Free Readers and Writers Pseudocode Monitor.

Java Monitors

Java uses the synchronized keyword to indicate that only one thread at a time can be executing in this or any other synchronized method of the object representing the monitor. A thread can call wait() to block and leave the monitor until a notify() or notifyAll() places the thread back in the ready queue to resume execution inside the monitor when scheduled. A thread that has been sent a signal is not guaranteed to be the next thread executing inside the monitor compared to one that is blocked on a call to one of the monitor's synchronized methods. Also, it is not guaranteed that the thread that has been waiting the longest will be the one woken up with a notify(); an arbitrary thead is chosen by the JVM. Finally, when a notifyAll() is called to move all waiting threads back into the ready queue, the first thread to get back into the monitor is not necessarily the one that has been waiting the longest.

Each Java monitor has a single nameless anonymous condition variable on which a thread can wait() or signal one waiting thread with notify() or signal all waiting threads with notifyAll(). This nameless condition variable corresponds to a lock on the object that must be obtained whenever a thread calls a synchronized method in the object. Only inside a synchronized method may wait(), notify(), and notifyAll() be called.

Methods that are static can also be synchronized. There is a lock associated with the class that must be obtained when a static synchronized method is called.

Usually all the publicly accessible methods, the service or access methods, will be synchronized. But a Java monitor may be designed with some methods synchronized and some not. The non-synchronized methods may form the public access and call the synchronized methods, which will be private.

An experiment was performed to determine if Java monitors are signal-and-exit or signal-and-continue. They use signal-and-continue. When a thread executes a notify(), Java does not necessarily move to the ready queue the thread that has been waiting the longest. Also, Java allows barging.

    Java Monitors Use Signal-and-Continue.

Here are Java monitors for the three ``classical'' problems. Two important things to be aware of because of signal-and-continue, the lack of named condition variables, and barging. Most of the time it will be necessary to use a while loop instead of an if when doing a wait().

    while (condition) try {wait();} catch (InterruptedException e) {}
    
It is possible that some other thread barged in and got the resource or whatever, requiring a recheck of the waiting condition. Most of the time it will be necessary to use notifyAll() instead of notify() in order to awaken all waiting threads and let them recheck their waiting condition. It is not possible to direct a signal to the particular thread for whom the resource is now available or whatever.

The bounded buffer monitor can only be used by a single producer thread and a single consumer thread. The ``driver'' code is the same as that for the semaphore single-producer single-consumer bounded buffer. What could go wrong if more than one thread of each type used this monitor? How would you fix the monitor?

    The Bounded Buffer Monitor.

    The Producer and Consumer Driver.

Notice how inefficient the dining philosophers monitor is because a broadcast signal with notifyAll() must be sent whenever any philosopher puts down its forks due to Java's lack of named condition variables.

    A Starvation-Free Dining Philosophers Monitor.

    The Dining Philosophers Driver.

Since there are no named condition variables, another technique must be used to prevent starvation in the database readers and writers. The arrival times of readers forced to wait because of a waiting writer is maintained. When the waiting writer enters and then exits the database, all waiting readers that arrived before the time the writer just exiting finished writing are allowed to read the database.

    A Starvation-Free Readers and Writers Monitor.

    The Readers and Writers Driver.

As mentioned, Java does not have semaphores. Here is how they are implemented in the Synchronization package in the lib directory.

    Base Semaphore Class.

    An Exception.

    Binary Semaphore Monitor.

    Counting Semaphore Monitor.

A binary semaphore can be implemented in other ways than the above, for example. Compare and contrast the two implementations. Which do you prefer?

A lock acts like a binary semaphore except only the locking thread can unlock the lock.

    Lock Example.

It is possible to use an object somewhat like a condition variable in a Java monitor. We can pull the code for the elements and spaces semaphores of the bounded buffer semaphore version into the bounded buffer implementation. The resulting bounded buffer can be used with multiple producer and multiple consumer threads.

    Multiple Producer and Consumer Bounded Buffer.

    The Producers and Consumers Driver.

A callback technique can be used to avoid waking up all the philosopher threads with a notifyAll(). An array of callback objects, convey, is used, one for each philosopher. If the forks are not available when the philosopher gets hungry, it waits inside its callback object for a notify().

    Dining Philosophers Callback.

    The Dining Philosophers Driver.

The following implements a starvation-free synchronization algorithm for the readers and writers with a callback object for each thread to wait inside until it can access the database.

    Readers and Writers Callback.

    The Readers and Writers Driver.

In contrast to named condition variables, it is not possible with this callback technique to wait in the middle of a monitor service method for a signal and then continue executing inside the monitor service method at that point after receiving the signal. The signaled thread has to reenter the monitor via a service method.

A skeleton class for implementing named condition variables (exercise for the reader):

    Named Condition Variable.

Here is the code for the XtangoAnimator Class.

Laboratory Exercises

  1. Write a Java monitor for the baboons crossing a canyon (unisex bathroom) that is fair, that is, prevents starvation.
  2. Write a Java monitor for the single sleeping barber. Then modify it for multiple sleeping barbers.
  3. Enhance the XtangoAnimator class so that there are commands for rotation, GIF image icons, grouping icons, lines with arrow heads, hiding, and unhiding icons.
  4. Find all remaining race conditions, if any, in the XtangoAnimator class.

Message Passing

Sometimes the phrase ``send a message to an object'' is used to describe a thread in one object calling a method in another object. Here, that phrase will be used to describe a thread in one object sending a message to a thread in another object, where the message is itself an object.

This technique is used for thread communication and synchronization in a computing environment where the threads do not have shared memory (since the threads reside in different virtual or physical machines). Hence the threads cannot share semaphores or monitors and cannot use shared variables to communicate. Message passing can still be used, of course, in a shared memory platform.

Messages are sent through a port or channel with an operation like send(port, message) and received from a port or channel with an operation like receive(port, message). Messages can be passed synchronously, meaning the sender blocks until the received does a receive and the receiver blocks until the sender does a send. Since the sender and receiver are at specific known points in their code at a known specific instant of time, synchronous message passing is also called a simple rendezvous with a one-way flow of information from the sender to the receiver.

In asynchronous message passing, the sender does not block. If there is not a receiver waiting to receive the message, the message is queued or buffered. The receiver still blocks if there is no queued or buffered message when a receive is executed.

In conditional message passing, the message remains queued until some condition, specified by the receiver, becomes true. At that time, the message is passed to the receiver, unblocking it.

A two-way flow of information, perhaps over the network, is called an extended rendezvous and can be implemented with a pair of sends and receives. Typically a client thread will use this technique to communicate with a server thread and request a service to be performed on its behalf. A similar situation is a worker thread contacting a master thread, asking for more work to do.

    client or worker: send request; receive reply

    server or master: receive request; perform service; send reply

Messages are objects and can be

    passed by reference within the same Java Virtual Machine,

    or serialized through a pipe within the same JVM,

    or serialized through a socket between JVMs that are on the same physical machine or on different physical machines.

The base data types, int, double, etc., can be sent as messages in binary or raw data format through a pipe or socket using the DataInputStream and DataOutputStream methods. They can also be sent as objects using the wrapper classes Integer, Double, etc.

Synchronization Package Classes

Here is a collection of Java message passing classes. All of the message passing port classes implement the methods in the MessagePassing interface or the ConditionalMessagePassing interface. This exception is thrown when an error occurs. This exception is used in implementing restricted rights ports (below). All classes except the conditional ones extend this base class.

    Synchronous Port.

    Asynchronous Port. A Vector is used to queue sent but not yet received messages.

    Asynchronous Conditional Port. The receiver must pass an object that implements the Condition interface, that is the object must contain a checkCondition() method that is used to determine which messages sent are eligible to be received.

    Synchronous Conditional Port.

    Finite Buffer Asynchronous Port.

    Receive-Only Rights Port. Send-Only Rights Port. These two filter classes can be wrapped around a message passing port to permit only sending or receiving on the port. This is done by overriding the restricted method with one that throws NotImplementedMethodException.

    Integers and Floating-Point Numbers as Messages in a Pipe or Socket Port. The numbers are passed as binary or raw data types through a pipe within the same JVM or a socket between different JVMs.

    Serialized Objects as Messages in a Pipe or Socket Port. The objects are serialized and deserialized using the writeObject() and readObject() methods through a pipe within the same JVM or a socket between different JVMs.

Example Programs

This is a simple example illustrating both synchronous and asynchronous message passing.

We can implement the bounded buffer producer and consumer with a set of empty messages representing the buffer slots.

    Message Set Forms a Bounded Buffer.

This is a testing program for asynchronous, synchronous, finite buffer, and piped message passing within the same JVM. There are two types of threads in this collection of threads: those that produce work and those that perform or consume the produced work. A producer puts the work to be done into a message passing port that is called a bag of tasks because consumers reach into the bag to extract the next piece of work to do. If there is just a single thread producing work with many threads reaching into the bag, then this technique is called master/worker or worker crew.

    ``Bag of Tasks'' for Worker Crew.

Threads can use semaphores and monitors to handle mutual exclusion and condition synchronization. Suppose though the threads do not share memory but are in nodes that have private memories and CPUs on a LAN and suppose they still want to do condition synchronization to coordinate access to some shared resource. If all we have is message passing, can we implement some sort of distributed mutual exclusion algorithm? Suppose we also want to avoid a central server to avoid a bottleneck. We want to solve the N node mutual exclusion problem such that it

Assumptions: In other words, nodes eventually respond to all request messages.

Basic Idea:

    while (true) {
      outsideCS();
      chooseNumber();
      sendItToAllOtherNodes();
      waitForMessageFromAllOtherNodes();
      insideCS();
      postProtocol();
    }
    
Each of the N nodes is really three threads executing concurrently (the three threads are executing on the CPU/memory of the node).
  1. One does the above while loop.
  2. Another handles requests from other nodes.
  3. Another one waits for replies from all other nodes.

A node will send a ``reply'' or acknowledgement message to a node that has sent a request message, i.e., when ``asked'':

In the absence of shared memory, a collection of Java threads can use this technique to implement mutual exclusion. The threads send messages to all other threads, asking for permission to enter their critical sections. The threads are all in the same JVM, but we say memory is not shared here because the threads do not share any variables, semaphores, or monitors.

    Distributed Mutual Exclusion.

The quick sort algorithm can be parallelized for a shared memory multiple CPU machine by dedicating each CPU to a worker thread and using a message passing port as a bag of tasks. The main() method puts the whole array to be sorted into the bag. A worker extracts the task, chooses a pivot point, and partitions the array. Each of the two partitions is then put back into the bag for one of the workers to perform. Even though message passing is being used for a bag of tasks, shared memory is still required because the array is being sorted ``in place'' and the work requests being put into the bag are array index pairs and not pieces of the array itself.

    bag of tasks communication channel:
      AsyncMessagePassing task = new AsyncMessagePassing();
      

    quicksort threads get work:

      while (true) {
         m = (Task) receive(task);
         quickSort(id, m.left, m.right);
      }
      

    quicksort threads create work:

      if (right-(l+1) > 0) send(task, new Task(l+1, right));
      if ((l-1)-left > 0) send(task, new Task(left, l-1));
      

    Animated Worker Crew Quick Sort.

These consumers are picky and will only conditionally accept messages that are smaller than some limit. This program tests both synchronous and asynchronous conditional message passing.

    Conditional Consumers.

The distributed dining philosophers do not have a central server they can query for fork availability. Instead each philosopher has a servant who communicates with the two neighboring servants to negotiate the use of the forks. The servants pass needL, needR, passL, and passR messages back and forth. Each fork is always in the possession of some philosopher, one of the two on either side of the fork. When a philosopher finishes eating, it labels its two forks as dirty. A hungry philosopher's servant is required to give up a dirty fork in its possession, if asked for by its hungry neighbor's servant. This prevents starvation. Study carefully how conditional message passing is used. Does it matter if synchronous were used instead?

    Distributed Dining Philosophers.

    Here is a screen snapshot of an animated version.

Rendezvous

An extended rendezvous is also called a remote procedure call from a client to a server (or a worker to the master) because it resembles (and syntactic sugar can make it nearly identical to) a call to a procedure on a remote machine that is executed there. Typically the call represents a request for service, such as reading a file that resides on the remote machine. The server may handle the request in its main thread or the server may spawn a new thread to handle the request while the server's main thread handles additional requests for service from other clients. The latter gives greater throughput and efficiency since a lengthy request would otherwise delay the handling of requests from the other clients.

An addressing mechanism is needed so the client can contact an appropriate server. In the local case (everything in the same JVM), an object can be used as the place for the client and server to ``meet'' and establish a rendezvous. The server calls a method in the object and blocks until the client calls a method. At this point in time, both methods return a newly created object that the client and server subsequently use for the two-way flow of information. This object contains a message passing port shared by them. In the remote case, the client uses the server's machine name and a TCP/IP port number to address the server; the server ''listens'' on the TCP/IP port.

Synchronization Package Classes

The extended rendezvous class implements this interface.

An object created from this class is used for the addressing described above. In the local case, one such object is used by both client and server. In the remote case, a client creates such an object using the server's machine name and port number in the object's constructor; the server uses just the port number.

When the rendezvous occurs, and object constructed from this class is returned to both the client and server. In the local case (within the same JVM), the client and server share this object and use it to transact (synchronous message passing of object references). In the remote case (between JVMs that might be on different physical machines), each gets its own object and the object contains a socket to the other JVM (and machine). Objects are serialized through the socket. The case of sending raw data types through a pipe (same JVM) or a socket (different JVMs) is not implemented and is an exercise for the reader.

Example Programs

This is a local case example. A command line option controls whether or not the server spawns off a new thread to handle the request. The clients and server all share a EstablishRendezvous object for addressing. Each time a client wants to rendezvous with the server, it calls the clientToServer() method to get an ExtendedRendezvous object whose clientMakeRequestAwaitReply() method is used to transact with the server. The client passes a reference to a RendezvousRequestReply object to the server. The object contains the data and a method for the server to call. The ExtendedRendezvous object is only used once by the client; however, it could be reused for multiple clientMakeRequestAwaitReply() calls as is done in the next example.

    Client-Server Rendezvous.

This is a remote case example. Suppose there are workstations named client0, client1, client2, client3, and client4 connected together on a local area network, along with a machine named server. The example compile and run shows for UNIX how to run each philosopher is in its own JVM on a different physical machine. Each philosopher sends an Integer object containing its ID value to the server when it is hungry. Since this is a rendezvous, the philosopher is blocked until it gets a reply indicating that its forks are available. The server spawns a new thread for each philosopher to handle the transactions. Each philosopher sends an Integer object containing its -ID-1 value when putting its forks down. Each philosopher has its own ExtendedRendezvous object whose clientMakeRequestAwaitReply() it calls over and over again (in contrast to the previous example, in which the clients obtained a new ExtendedRendezvous object for each transaction with the server).

    Multi-Machine Dining Philosophers.

This program attempts to measure the amount of time it takes to transact a rendezvous. A client sends a message containing an array of length N to the server. The server adds one to each entry of the array and sends it back. The client does this M times and calculates the number of bytes sent per millisecond. The program can be run in two ways. The local run passes the message as a reference from the client to the server within the same JVM. The remote run serializes the message containing the array through a socket over the network to the server running in a different JVM, possibly on another physical machine.

    Measure Rendezvous Time.

Conditional Rendezvous

Synchronous conditional message passing corresponds to a conditional simple rendezvous, that is, once the condition is met there is a one-way flow of information in the form of a message object from the client to the server. This class implements an extended guarded or conditional rendezvous in which there is a two-way flow of information once a server finds a client message meeting the condition. This class is a combination of EstablishRendezvous and ExtendedRendezvous with the condition added. The client calls the clientTransactServer() method of the ConditionalRendezvous object, while the server first calls serverGetClient() to get a client meeting the condition and then the server uses the Rendezvous object returned to interact with the client (serverGetRequest() and serverMakeReply()). The condition class has been enhanced with more information passed to the checkCondition() method so the condition check can be based on information gathered about all outstanding messages.

An example using a condition that depends only on the message passed and not on other messages is the dining philosophers classical problem. As an exercise, add code to prevent starvation. As another exercise, implement the bounded buffer multiple producers and consumers with a conditional rendezvous. Then implement the starvation-free database readers and writers with a conditional rendezvous.

    Dining Philosophers Conditional Rendezvous.

    The Dining Philosophers Driver.

Multiple calls by servers to the serverGetClient() method can be outstanding, that is blocked, waiting for a client message meeting the condition. A server can call serverGetClient() and before completing the rendezvous with that client, the server can call it again, so a server can be handling several rendezvous with different clients simultaneously. This is illustrated with the following banking example. If starvationFree is false, a large withdrawal can sit in the queue, while later smaller withdrawals succeed. In contrast, if starvationFree is true, then the server (bank thread) will accept only deposits and no withdrawals until the bank balance is large enough to accept the waiting large withdrawal. The -a option sets the requirement that all deposits are processed before any withdrawals. The condition checking now depends not only on the message passed but also on the other messages in the queue. The -B option adds additional banks, which is not realistic, but was done to test simultaneously executing servers.

    Bank Deposits and Withdrawals.

Remote Method Invocation

Sun Microsystems has added a remote method invocation capability to Java, the ability to make remote procedure calls. The latter term we used above to describe an extended rendezvous between two threads in different JVMs, perhaps on different physical machines. Sun's RMI allows a thread in one JVM to invoke (call) a method in an object in another JVM that is perhaps on a different physical machine. A new thread is created in the other (remote) JVM to execute the called method. The following example shows how to use it. The ComputeServer remote object implements a Compute interface containing a compute() method that a local Client can call, passing a Work object whose doWork() method the server will call. The client is using the remote server to have work performed on its behalf (adding vectors). Presumably the server is running on a computer architecture that can perform the work more efficiently. Parameters to the remote method and the method's return result, if any, are passed from one JVM to the other using object serialization over the network.

Two sample runs are attached as a comment to the end of the program file. The first was executed on a PC running Windows 95 using the default port for communication. The second was executed on two Sun workstations, each running Solaris 2.x, using a port other than the default specified by the user.

    Remote Method Invocation Example.

Laboratory Exercises

  1. Will the bounded buffer formed from a set of empty messages work with multiple producers and/or consumers?
  2. Modify the dining philosophers conditional rendezvous so that it prevents starvation.
  3. Write a fair readers and writers database using a (conditional) rendezvous.
  4. Write a fair unisex bathroom using a (conditional) rendezvous.
  5. Write a fair dining philosophers, a fair readers and writers, or a fair unisex bathroom using RMI (remote method invocation).

Parallel Processing

We have seen one of example of parallelizing an algorithm to use multiple CPUs: the animated quick sort example. The algorithm requires shared memory since the array is sorted ``in place'' and needs to be accessed by all the worker threads. A bag of tasks is used to distribute the work.

We can categorize parallel algorithms along several lines, using the abbreviations shown in parentheses:

  1. coarse (CG) versus fine grain (FG), determined by the frequency of thread synchronization or communication relative to the amount of computation being done;

  2. shared memory (SM) multiprocessor versus distributed memory (DM) CPUs, determined by the presence of shared data;

  3. message passing (MP) versus semaphores, monitors, barriers, or join() (SY), where the latter set requires shared memory but the first does not imply distributed memory;

  4. worker crew (WC) with a bag of tasks and a fixed number of workers, perhaps based on the number of CPUs, versus data parallelism (DP), where the amount of data or work to do determines the number of worker threads spawned and CPUs needed.

In some cases the need for shared memory can be relaxed. If the shared data is read-only, it can be replicated or broadcast to each distributed memory and the copy stored there. In other situations, it might be possible to pass the shared data around as a message, to be updated by the currently owning thread. For example, it might be possible to replace

    /* shared */ int N;
    /* shared */ Object mutex = new Object();
                         // ...
    synchronized (mutex) { N = N + 1; }
    
with
    /* local */ int N;
                         // ...
    N = receive(port);
    N = N + 1;
    send(port, N);
    

There will be too much overhead and inefficiency with FG unless the communication and synchronization tools are highly optimized, perhaps with hardware support. DM can be a network of workstations (NOW) or a specialized parallel architecture in which each CPU has its own memory and there is some kind of fast interconnect or switch connecting them.

Synchronization Package Classes

On a shared memory platform, a barrier can be used for thread synchronization. It is an extension of the semaphore idea. No thread can continue past the barrier until all threads have arrived at the barrier; in other words, each thread arriving at the barrier blocks until all the other threads have arrived. There are two constructors and two versions of the gate() method: for a one-dimensional and a two-dimensional structure of the data and threads.

    A Barrier for Thread Synchronization.

Example Programs

These examples will be categorized according to CG or FG, SM or DM, MP or SY, WC or DP.

Calculate the first n prime numbers. Start up n filter threads. FG, SM or DM, MP, DP.

    Prime Number Sieve.

Sort an array of length n by creating a pipeline of length n. FG, SM or DM, MP, DP.

    Pipeline Sort.

Sort an array of length n by creating two sorting threads and a merge thread. Send half the array to each sorting thread. FG, SM or DM, MP, WC. Can you finish the code?

    Merge Sort Skeleton.

Sort an array of length n by creating n/2 worker threads. FG, SM or DM, MP, DP. Can you finish the code?

    Compare-Exchange Sort Skeleton.

For an N-by-N chess board, start up a thread for each row in column one that a queen can be placed. The size of the board could be broadcast to each thread for DM. CG, SM or DM, MP, DP.

    Data Parallel N-Queens.

For each CPU, start up a worker thread that reads a row number for the column one queen from the bag of tasks. CG, SM or DM, MP, WC.

    Master/Worker N-Queens.

Modify the previous example so each worker executes on a networked workstation. The rendezvous technique is used: each worker is like a client that asks the server for more work to do. The program can also be run entirely locally with the -w command line option. This tests the three kinds of EstablishRendezvous objects that can be constructed. CG, DM, MP, WC.

    Multi-Machine Master/Worker N-Queens.

Sort an array of length n. In this example, every thread needs to communicate with every other thread, possible in SM, networked workstations, or a specialized DM architecture. FG, SM or DM, MP, DP.

    Radix Sort.

FG, SM, SY, DP.

    Matrix Multiply Threads Using join().

FG, SM, SY, DP.

    Matrix Multiply Threads Using a Semaphore.

FG, SM, SY, DP.

    Laplace Grid Using a Barrier.

Create a new worker thread each time a section of the array is partitioned. CG, SM, SY, DP.

    Data Parallel Quick Sort.

FG, SM or DM, MP, DP.

    Systolic Array Matrix Multiply.

Replace the shared grid array and the barrier with message passing. FG, SM or DM, MP, DP.

    Laplace Grid Using Message Passing.

Laboratory Exercises

  1. Implement one-way merge sort in Java using message passing.
  2. Implement multi-way (binary tree) merge sort using message passing.
  3. Write a pipeline sieve of Eratosthenes using message passing.
  4. Complete the compare-exchange sort skeleton using message passing.
  5. Convert the sequential ``Game of Life'' simulation program into a multithreaded one where each cell has its own thread. Use semaphores, a barrier, or message passing for synchronization.
  6. Animate one or more of the above programs.

Last modified 07 July 1997.

© 1997 Stephen J. Hartley

SJH
shartley@mcs.drexel.edu