Java堆栈跟踪介绍(An Introduction to Java Stack Traces)

Stack Traces

It's the night before your product release. You run your final tests and then it happens--you get a Java stack trace. You look through your Java programming books, but this seems to be an area that's missing! Well no need to panic, just follow the simple steps suggested here and you'll be home before midnight! This article shows you step by step how to recognize and collect the clues in a stack trace to solve your Java software problems.

What is a Java stack trace? A Java stack trace is a user-friendly snapshot of the threads and monitors in a Java1 Virtual Machine (JVM). Depending on how complex your application or applet is, a stack trace can range from fifty lines to thousands of lines of diagnostics.

However, regardless of the size of the stack trace there are a few key things that anyone can find to help diagnose most Java software problems, whether you are a Java programming expert or very new to the Java platform.

How are Java Stack Traces Generated?

There are three popular ways to generate a Java stack trace: sending a signal to the Java Virtual Machine; the Java Virtual Machine generates a stack trace for you; or using debugging tools or Java API calls.

Sending a signal to the Java Virtual Machine

On UNIX platforms you can send a signal to a program by using the kill command. This is the quit signal, which is handled by the JVM. For example, on Solaris you can use the command kill -QUIT process_id, where process_id is the process number of your Java program.

Alternatively you can enter the key sequence <ctrl>\ in the window where the Java program was started. Sending this signal instructs a signal handler in the JVM, to recursively print out all the information on the threads and monitors inside the JVM.

To generate a stack trace on Windows 95, or Windows NT platforms, enter the key sequence <ctrl><break> in the window where the Java program is running, or click the Close button on the window.

The Java Virtual Machine generates a stack trace for you

If the JVM experienced an internal error, for example a segmentation violation or an illegal page fault, it will call its own signal handler to print out the threads and monitors information.

Using debugging tools or Java API calls

You can generate a partial Java stack trace, which in this case is only the threads information, by using the Thread.dumpStack method, or the printStackTrace method of the Throwable class. You can also obtain similar information by entering the command "where" inside the Java debugger.

If you are successful at generating a stack trace you should see something like this: $ SIGQUIT   3*   quit
    si_signo [3]: SIGQUIT   3*   quit
    si_errno [0]: Error 0
    si_code [0]: SI_USER [pid: 11927, uid: 26432]
        stackbase=EE292000, stackpointer= EE291878

Full thread dump:
    "Thread-5" (TID:0xee703b78, sys_thread_t:0xee261db8, state:R) prio=5
        mythread.stopper(exec3.java:10)
        mythread.run(exec3.java:16)
    "Thread-4" (TID:0xee703bb8, sys_thread_t:0xee291db8, state:R) prio=5 *current thread*
        mythread.stopper(exec3.java:10)
        mythread.run(exec3.java:16)
    "Finalizer thread" (TID:0xee700220, sys_thread_t:0xee2c1d b8, state:R) prio=1
    "Async Garbage Collector" (TID:0xee700268, sys_thread_t:0 xee2f1db8, state:R) prio=1
    "Idle thread" (TID:0xee7002b0, sys_thread_t:0xee3c1db8, state:R) prio=0
    "Clock" (TID:0xee700088, sys_thread_t:0xee3f1db8, state:CW) prio=12
    "main" (TID:0xee7000b0, sys_thread_t:0x693a0, state:CW) prio=5
        exec3.main(exec3.java:32)
Monitor Cache Dump:
    mythread@EE703BB8/EE74E190: owner "Thread-4" (0xee291db8, 1 entry)
    mythread@EE703B78/EE74E270: owner "Thread-5" (0xee261db8, 1 entry)
    <unknown key> (0x693a0): <unowned>
        Waiting to be notified:
            "main" (0x693a0)
Registered Monitor Dump:
    Thread queue lock: <unowned>
    Name and type hash table lock: <unowned>
    String intern lock: <unowned>
    JNI pinning lock: <unowned>
    JNI global reference lock: <unowned>
    BinClass lock: <unowned>
    Class loading lock: <unowned>
    Java stack lock: <unowned>
    Code rewrite lock: <unowned>
    Heap lock: <unowned>
    Has finalization queue lock: <unowned>
    Finalize me queue lock: <unowned>
    Monitor IO lock: <unowned>
    Child death monitor: <unowned>
    Event monitor: <unowned>
    I/O monitor: <unowned>
    Alarm monitor: <unowned>
        Waiting to be notified:
            "Clock" (0xee3f1db8)
    Sbrk lock: <unowned>
    Monitor registry: owner "Thread-4" (0xee291db8, 1 entry)
Thread Alarm Q:
    sys_thread_t 0x693a0   [Timeout in 9997374 ms]
   

What are the First Things to Look for?

By following the next three steps you might find the solution to the application's problem without any more analysis. However, if you don't, the evidence you'll gain from this exercise is vital to any further investigation.

Finding the current thread

You now have a Java stack trace. The first piece of information you need to look for is the current thread. In theory the current thread should be the last thread that was running when the snapshot was taken. If you have seen a Java stack trace before you may have noticed that often the current thread is labeled  *current thread* next to the appropriate thread.

Is that the current thread you ask? Most of the time yes, but unfortunately this is only a good guess by the signal handler. Sometimes the signal handler will be marked as the current thread, which doesn't help your diagnosis at all! And on some platforms none of the threads will be marked as the current thread! But don't despair, more evidence about which thread is the current thread is explained in the section, "Examining Monitors".

Note:If you see the label Compiled Code next to the current thread then this stack trace came from a JVM using the JIT compiler. If possible generate another stack trace without the JIT enabled using the -nojit parameter to the JVM.

Runnable threads

Next you need to track down all the threads that have a state of R, which stands for Runnable. Threads in the R state were running, or were ready to run the next time the threads were scheduled. Make a note of these, because they could indicate where your problem lies.

There is more evidence about which thread is the current thread explained in the section, "Examining Monitors".

Core files

If the JVM generated the stack trace because of an internal error then some native code in your own application, or the JVM was probably to blame. If you are using UNIX, and you find a core file, run the following command to find out which JDK software it came from:

strings core | grep JAVA_HOME

In the JDK 1.2 software release, threads that called methods resulting in a call to native code are indicated in the stack trace.

Which Software Release and which Platform Generated the Stack Trace?

Wouldn't it be easier if the stack trace contained a JDK version string? Well, there is a request for a version string to be included in any stack trace in bug number 4047300. When this request is implemented it will eliminate the need for the next section!

However, without a version string you can take a pretty good guess as which release this stack trace came from. Obviously if you generated the stack trace yourself this shouldn't be much of an issue, but you may see a stack trace posted on a newsgroup or in an email article.

Here are some hints to help in your detective work:

First identify where the Registered Monitor Dump section is in the stack trace: If you see a utf8 hash table lock in the Registered Monitor Dump then this is a JDK 1.2 stack trace. If you see a JNI pinning lock and no utf8 hash lock then this is a JDK 1.1+ release. If neither of these appears in the Registered Monitor Dump then it is probably a  JDK 1.0.2 release.

What platform did the stack trace come from?You can also find out if the stack trace came from a Windows 95, an NT, or a UNIX machine by looking for any waiting threads.  On a UNIX machine the waiting threads are named explicitly. On a Windows 95, or NT machine only a count of the waiting threads is displayed: Windows 95/NT:

Finalize me queue lock: <unowned> Writer: 1 UNIX: Finalize me queue lock: <unowned>

waiting to be notified "Finalizer Thread"

Determining which thread package was used.

Windows 95 and Windows NT Java Virtual Machines are by default native thread JVMs. UNIX JVMs are by default green thread JVMs, they use a pseudo thread implementation. To make your JVM use native threads you need to supply the -native parameter, for example, java -native MyClass.

By verifying the existence of an Alarm monitor in the stack trace output you can identify that this stack trace came from a green threads JVM.

Determining the Thread States

You will see many different threads in many different states in a snapshot from a JVM stack trace. The key used is:

R Running or runnable thread
S Suspended thread
CW Thread waiting on a condition variable
MW Thread waiting on a monitor lock
MS Thread suspended waiting on a monitor lock

Normally only threads in R, S, CW or MW  should appear in the stack trace. If you see a thread in state MS, report it to Sun Microsystems, via the Bug Parade as there is a good chance it is a bug, because most of the time a thread in Monitor Wait state will appear in the S state when it is suspended. Monitors (covered in the next section) are used to manage access to code that should only be run by a single thread at a time.

Examining Monitors

This brings us to the other part of the stack trace: the monitor dump. If you consider that the threads section of a stack trace identifies the multithreaded part of your application then the monitors section represents the parts of your application that are single threaded.

It may be easier to imagine a monitor as a car wash. In most car washes, only one car can be in the wash at a time. In your Java code only one thread can have the lock to a synchronized piece of code at a time. All the other threads queue up to enter the synchronized code just as cars queue up to enter the car wash.

A monitor can be thought of as a lock on an object, and every object has a monitor. When you generate a stack trace, monitors are listed as being either registered or not registered. In the majority of cases these registered monitors, or system monitors, shouldn't be the cause of your software problems.  But it helps to be able to understand and recognize them. The following table describes the common registered monitors:

Thread queue lock  Protects the queue of active threads.
Name and type hash table lock  Protects the JVM hash tables of constants and their types. 
String intern lock Locks the hashtable of defined 
Strings that were loaded from the class constant pool. 
JNI pinning lock  Protects block copies of arrays to native method code. 
JNI global reference lock Locks the global reference table which holds values that need to be explicitly freed, and will outlive the lifetime of the native method call. 
BinClass lock Locks access to the loaded and resolved classes list. 
Class loading lock Ensures only one thread loads a class at a time.
Java stack lock Protects the free stack segments list.
Code rewrite lock  Protects code when an optimization is attempted .
Heap lock  Protects the Java heap during heap memory management 
Has finalization queue lock Protects the list of queue lock objects that have been garbage-collected, and deemed to need finalization. They are copied to the Finalize me queue.
Finalize me queue lock Protects a list of objects that can be finalized at leisure. 
Monitor IO lock Protects physical I/O for example, 
open and read. 

The monitor registry itself is protected by a monitor. This means that the thread that owns the lock is the last thread to use a monitor. It's very likely that this thread may also be the current thread.

Because only one thread can enter a synchronized block at a time, other threads queue up at the start of the synchronized code and appear as thread state MW. In the monitor cache dump they will be denoted as "waiting to enter" threads. In user code a monitor is called into action wherever a synchronized block or method is used.

Any code waiting on an object or event, that is a wait method, also has to be inside a synchronized block. However once the wait method is called, the lock on the synchronized object is given up.

When the thread in the wait state is notified of an event to the object, it has to compete for exclusive access to that object, and it has to obtain the monitor. Even when a thread has sent a "notify event" to the waiting threads, none of the waiting threads can actually gain control of the monitor lock until the notifying thread has left its synchronized code block.

You will see "Waiting to be notified" for threads at the wait method

Putting the Steps into Practice

Consider a real life problem: Bug 4098756 for example. You can find details on this bug in JDC Bug Parade. This bug documents a problem that occurred when using a Choice Component on Windows 95.

When the user selected one of the choices from the Choice Component using the mouse, everything was fine. However when the user tried to use an arrow key to move up or down the list of choices the Java application froze.

Fortunately this problem was reproducible and there was a Java stack trace to help track down the problem. The full stack trace is in the bug report page, but you only need to focus on two key threads, for example:

"AWT-Windows" (TID:0xf54b70, sys_thread_t:0x875a80,Win32ID:0x67, state:MW) prio=5 java.awt.Choice.select(Choice .java:293)         sun.awt.windows.WChoicePeer.handleAction(WChoicePeer.java:86)

"AWT-EventQueue-0" (TID:0xf54a 98,sys_thread_t:0x875c20, Win32ID:0x8f, state:R) prio=5         java.awt.Choice.remove(Choice.java:228) java.awt.Choice.removeAll(Choice.java:246)  

The AWT-EventQueue-0 thread is in a runnable state inside the remove method. Remove is synchronized so that explains why the other thread, AWT-Windows, cannot enter the select method. The AWT-Windows thread is in state MW, monitor wait.

However, if you keep taking stack traces this situation does not change, and the GUI appears to have frozen. This indicates that the remove call never returned. By following the code path to the ChoicePeer class, you can see that this is making a native MFC call that does not return. That's where the real problem lies and is a bug in the Java core classes. The user's code was OK.

Expert's Checklist

This covers the theory about Java stack traces, and you should now know what to look for next time you see one. To save yourself time, be sure to make full use of the JDC bug search to see if the problem you are having has already been reported.

To summarize, here are the steps to take when you next come across a problem Java program:

For hanging, deadlocked or frozen programs: If you think your program is hanging, generate a stack trace and examine the threads in states MW or CW. If the program is deadlocked then some of the system threads will probably show up as the current threads, because there is nothing else for the JVM to do.

For crashed, aborted programs: On UNIX look for a core file. You can analyze this file in a native debugging tool such as gdb or dbx. Look for threads that have called native methods. Because Java technology uses a safe memory model, any corruption probably occurred in the native code. Remember that the JVM also uses native code, so it may not necessarily be a bug in your application.

For busy programs: The best course of action you can take for busy programs is to generate frequent stack traces. This will narrow down the code path that is causing the errors, and you can then start your investigation from there.

Good Luck and Happy Debugging

Calvin Austin is a software engineer on the Java Developer Connection team. This article is based on a presentation he delivered at the 1998 JavaOneSM Developer Conference.

_______
1 As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.

你可能感兴趣的:(java,jvm)