Building programs around threads of execution—that is, around
specific sequences of instructions—delivers significant performance
benefits. Consider, for example, a program that reads large amounts of
data from disk and processes that data before writing it to the screen
(such as a DVD player). On a traditional, single-threaded program (the
kind most client programs use today), where only one task executes at a
time, each of these activities happens as a part of a sequence of
distinct phases. No data is processed until a chunk of a defined size
has been read. So the program logic that could be processing the data
is not executed until disk reads are complete. This leads to inferior
performance.
On a threaded program, one thread can be assigned to read data,
another thread to process it, and a third to write it out to the
graphics card. These three threads can operate in parallel so that data
is being processed while disk reads are going on. And overall
performance improves. Many other examples can be devised in which the
ability to do two things at once will provide better performance. The
Java virtual machine (JVM) is itself heavily threaded for just this
reason.
This article discusses creating multithreaded Java code, a few
best-practices for designing parallel programs, and some of the tools
and resources available to developers. That's a lot to cover in one
article, so I'll just highlight the salient points and direct you to
resources for additional information.
Threading Java Code
All programs use at least one thread. In C/C++ and Java, this is the
thread that is started with the call to main(). The creation of
additional threads requires several steps: creating a new thread and
then assigning it work. Once the work is done, the thread is
automatically killed off by the JVM.
Java provides two methods for creating threads and assigning them
work. The first of these is to subclass Java's Thread class (in
java.lang package) and then over-ride the run() method with the
thread's work function. Here is an example of this:
public class SimpleThread extends Thread {
public SimpleThread(String str) {
super(str);
}
public void run() {
for (int i = 0; i < 10; i++) {
System.out.println(i + " " + getName());
try {
sleep((long)(Math.random() * 1000));
} catch (InterruptedException e) {}
}
System.out.println("DONE! " + getName());
}
}
This class subclasses Thread and provides its own run() method. That
function runs a loop which prints a passed string to the screen and
then waits for a random amount of time. After ten iterations of the
loop, the function prints "DONE!", then exits, which kills the thread.
Here is the main function that creates the threads:
public class TwoThreadsDemo {
public static void main (String[] args) {
new SimpleThread("Do it!").start();
new SimpleThread("Definitely not!").start();
}
}
Note the simplicity of this code: the function is started, given a
name (which is the string the thread will print) and start() is called.
start() will call the run() method. The results of this program are as
one would expect:
0 Do it!
0 Definitely not!
1 Definitely not!
2 Definitely not!
1 Do it!
2 Do it!
3 Do it!
3 Definitely not!
4 Do it!
4 Definitely not!
5 Do it!
5 Definitely not!
6 Do it!
7 Do it!
6 Definitely not!
8 Do it!
7 Definitely not!
8 Definitely not!
9 Do it!
DONE! Do it!
9 Definitely not!
DONE! Definitely not!
As can be seen, the output from the two threads is intermingled. In
a single-threaded program, all the "Do it!" commands would have printed
together, followed by all the "Definitely not!" writes.
Different runs of this program will generate different results. This
uncertainty is due to two aspects: there is a random pause between
iterations of the loop and, more importantly, because the timing of
thread execution is not guaranteed. This is a key principle. The JVM
will run the threads on its own schedule (which generally favors
running them as quickly as possible, but there is no guarantee of when
a given thread will be run). A priority can be associated with
individual threads to make sure that critical threads are handled by
the JVM before less important ones.
The second way to start up a thread is to use a class with the
Runnable interface, which is also defined in java.lang. This Runnable
interface specifies a run() method that then becomes the thread's
principal function, similar to the previous code.
The general style in Java these days is to favor interfaces over
inheritance. This rule of thumb is valid in this choice of options as
well. By using the interface option, a class can still inherit
(subclass) later on, if necessary. (This would occur, for example, if
the class were to be used later as an applet.)
Implications of Threading
While threading enhances performance, it adds complexity to the
program's internal operations. This complexity arises primarily from
interaction among the threads themselves. It is important to become
familiar with these issues, because as more cores are added to Intel®
processors, the number of threads to use will grow correspondingly. If
these issues are not understood when programming for many threads,
difficult to find bugs will undoubtedly ensue. So, let's look at some
of the issues and solutions.
Waiting on another thread to finish: Suppose we have an array of
integers that we want to process. We could step through the array, one
integer at a time, and perform the operation. Or, more efficiently, we
could assign several threads so that each one would process some
portion of the array. Suppose we had to wait for all threads to finish
before moving on to the next step. To synchronize activities temporally
between the threads, threads use the join() method, which makes one
thread wait on another to complete. The joining thread (thread B) waits
on the joined thread (thread A) to finish. An optional timeout in
join() enables the thread B to move forward with the other work if
thread A does not terminate within a given time frame. This provision
touches on a core complexity of threading: the problem of waiting on a
thread. We'll touch on this next.
Waiting on locked objects: Suppose we wrote an airline
seat-assignment system. On large programs of this kind, it is common
that a thread is assigned to every user connected to the software, such
as one to every ticket clerk. (In very large systems, this is not
always the case.) If two users are simultaneously attempting to assign
the same seat, problems will arise. Unless special steps are taken, one
thread will assign the seat while another thread is doing the same
thing. Both users will think they have an assigned place on the flight.
To avoid two threads simultaneously modifying the same data item, we
have one thread lock the data item just prior to modification. In this
way, when the second thread comes along to make the modification, it
will have to wait until the lock is released by the first thread. When
this occurs, the thread will see that the seat has just been assigned,
and the request for seat assignment will fail. The problem of two
threads racing to get the seat assignment completed is known as a race
condition, and it can wreak havoc when it occurs. Best practice is to
lock any code that accesses a variable that is accessible to more than
one thread.
There are several locking options in Java. The most common of these
is the use of the synchronized keyword. When the signature of a method
includes synchronized, only one thread can execute that method at any
given time. The lock on the method is then released when the method
finishes executing. For example,
protected synchronized int reserveSeat ( Seat seat_number )
{
if ( seat_number.getReserved() == false )
{
seat_number.setReserved();
return ( 0 );
}
else
return ( -1 );
}
is a method that can only be run by one thread at a time. This locking breaks the race condition described earlier.
Using synchronized is one of several methods for handling
interactions between threads. Java release J2SE 5.0 adds several
convenient ways to lock objects. Most of these are found in the
java.util.concurrent.locks package, which should be examined carefully
once you're comfortable with Java threads.
While locking mechanisms solve the problem of race conditions, they
introduce new complexity. The most difficult situation, in this regard,
is called deadlock. Suppose thread A is waiting on thread B, and thread
B is waiting on thread A, then both threads will be blocked forever,
which is why the term deadlock is so descriptive. Deadlocks can be very
difficult to identify (they can be hard to reproduce), and so care must
be exerted to make sure there are no dependencies of this kind between
threads.
Using Thread Pools
As mentioned earlier, when threads finish executing they are killed off
by the JVM and the memory allocated to them is eventually
garbage-collected. The trouble with the constant creation and
destruction of threads is that it wastes clock cycles, as creating
threads does have distinct overhead associated with it. A common
solution and best practice is to allocate a group of threads (called a
thread pool) early in the program and then reuse the threads as they
become available. Using this scheme, the function assigned to a thread
at creation is to sit in the pool and wait for a work assignment. Then,
when the assigned work is complete, the thread is returned to the
thread pool.
J2SE 5.0 introduced the java.util.concurrent package that includes a
pre-built thread pooling framework that greatly facilitates this
approach. More information on Java thread pools, including a tutorial
can be found here.
When designing threaded programs and thread pools, the question
naturally arises as to how many threads should be created. The answer
depends on how you plan to use the threads. If you divide the work
across threads on the basis of discrete tasks, then the number of
threads equals the number of tasks. For example, a word processor might
use one thread for display (the primary program thread in almost all
systems is the one entitled to update the user interface), one for
paginating the document, a third one for spell checking, and a fourth
for miscellaneous background operations. In this case, four threads are
ideal and they provide a very natural way to write the software.
However, if the program—like the one discussed earlier—uses multiple
threads to do similar work, then the optimal number of threads tends to
be a reflection of the system resources, especially the number of
execution pipelines on the processor and the number of processors. On
systems with Intel processors that rely on Hyper-Threading Technology
(HT Technology), there are currently two execution pipelines per
processor core. The new multi-core chips have two processor cores per
chip. Intel has indicated that future chips are likely to have more
cores, in good part because additional cores generate more performance
without substantially increasing heat or power consumption. So,
pipelines are going to become more plentiful.
Using the arithmetic suggested by these architectures, on a
dual-core Pentium® 4-based system, four execution pipelines will be
available and so four threads will provide ideal performance. On a
dual-processing Intel® Xeon™-based workstation, the ideal number of
thread is four, because at present the Xeon chips offer HT Technology
but no multi-core models.
Final Thoughts
When running threaded Java programs on Intel platforms, you will likely
want to be able to monitor the load on processors and the execution of
threads. One of the best JVMs for obtaining this data and managing how
the JVM handles parallel processing is BEA's WebLogic JRockit. JRockit
has the additional advantage of having been purpose-built for the Intel
platforms and optimized by engineers from BEA and Intel.
Regardless of which JVM you use, Intel® VTune™ Performance Analyzer
will give you a deep view into how the JVM is executing your code,
including per-thread performance bottlenecks. A white paper on using
VTune performance analyzer with Java is located here.
This article has provided an overview of how threading works on the
Java platform. As Intel continues shipping processors with HT
Technology and releases more multi-core chips, pressure to obtain the
performance benefits from these multiple pipelines will increase. And,
as the number of cores increases, the number of pipelines will
proliferate correspondingly. The only way to exploit there advantage is
to use threads, such as those discussed in this article. And the
benefit of threaded applications in Java will become even more clearly
evident. The following section provides additional resources to help
guide you along.
Additional References and Resources