Debugging hangs in JVM (on AIX but methodology applicable to other platforms)

The JVM is hanging if the process is still present but is not responding in some sense. The end user experience usually is timing out and get a 404 page not found error in the browser. This lack of response can be caused because:

* The process has come to a complete halt because of a deadlock condition
* The process has become caught in an infinite loop
* The process is running very slowly
If the process is not taking up any CPU time, it is deadlocked. Use the "ps -fp [process id]" command to investigate whether the process is still using CPU time. The ps command is described in AIX debugging commands. For example:

root@pw4 # ps -fp 790652
     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root  790652       1   0 13:54:37  pts/0  3:05 /usr/IBM/WebSphere/AppServer/java/bin/java -Declipse.se
[after 5 minutes while the app server is still being hit by requests:]   
root@pw4 # ps -fp 790652
     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root  790652       1   0 13:54:37  pts/0  3:05 /usr/IBM/WebSphere/AppServer/java/bin/java -Declipse.se

If the value of 'TIME' increases over the course of a few minutes, the process is still using the CPU and is not deadlocked. As the example shown above, the TIME column stayed "3:05" for the JVM process, indicating it hangs and mostly likely caused by deadlocks.

Next step is to take a java thread dump on the hanging JVM, by doing a "kill -3 [pid]", and look at the generated javacore file:

root@pw4 # kill -3 790652
root@pw4 # ls *790652*
should display a file like: javacore.20070126.122115.790652.txt

root@pw4 # vi javacore.20070126.122115.790652.txt
[and look for section similar to the following]

| LOCKS subcomponent dump routine
| ===============================
|     
|    Monitor pool info:
|     Current total number of monitors: 2
|           
|Monitor Pool Dump (flat & inflated object-monitors):
|      sys_mon_t:0x00039B40 infl_mon_t: 0x00039B80:
|       java/lang/Integer@004B22A0/004B22AC: Flat locked by "DeadLockThread 1"
|                                                 (0x41DAB100), entry count 1
|            Waiting to enter:
|                "DeadLockThread 0" (0x41DAAD00)
|      sys_mon_t:0x00039B98 infl_mon_t: 0x00039BD8:
|       java/lang/Integer@004B2290/004B229C: Flat locked by "DeadLockThread 0"
|                                                             (0x41DAAD00), entry count 1
|            Waiting to enter:
|                "DeadLockThread 1" (0x41DAB100)
| JVM System Monitor Dump (registered monitors):
|          Thread global lock (0x00034878):
|          Windows native HEAP lock lock (0x000348D0):
|          NLS hash table lock (0x00034928):
|          portLibrary_j9sig_async_monitor lock (0x00034980):
|          Hook Interface lock (0x000349D8):
|            < lines removed for brevity >
|
|=======================
|Deadlock detected !!!
|---------------------
|           
|  Thread "DeadLockThread 1" (0x41DAB100)
|    is waiting for:
|      sys_mon_t:0x00039B98 infl_mon_t: 0x00039BD8:
|      java/lang/Integer@004B2290/004B229C:
|    which is owned by:
|  Thread "DeadLockThread 0" (0x41DAAD00)
|    which is waiting for:
|      sys_mon_t:0x00039B40 infl_mon_t: 0x00039B80:
|      java/lang/Integer@004B22A0/004B22AC:
|    which is owned by:
|  Thread "DeadLockThread 1" (0x41DAB100)
Detecting Busy Hangs
If there is no deadlock between threads, consider other reasons why threads are not carrying out useful work. Usually, this state occurs for one of the following reasons:

1. Threads are in a 'wait' state waiting to be 'notified' of work to be done.
2. Threads are in explicit sleep cycles.
3. Threads are in I/O calls waiting to do work.

The first two reasons imply a fault in the Java code, either that of the application, or that of the standard class files included in the SDK.

The third reason, where threads are waiting (for instance, on sockets) for I/O, requires further investigation. Has the process at the other end of the I/O failed? Do any network problems exist?

To see how the javadump tool is used to diagnose loops, see Threads and stack trace (THREADS). If you cannot diagnose the problem from the javadump and if the process still seems to be using processor cycles, either it has entered an infinite loop or it is suffering from very bad performance. Using "ps -mp [process id] -o THREAD" allows individual threads in a particular process to be monitored to determine which threads are using the CPU time. If the process has entered an infinite loop, it is likely that a small number of threads will be using the time. For example:

$ ps -mp 43824 -o THREAD
    USER   PID  PPID     TID ST  CP PRI SC    WCHAN        F     TT BND COMMAND
  wsuser 43824 51762       - A   66  60 77        *   200001  pts/4   - java ...
       -     -     -    4021 S    0  60  1 22c4d670   c00400      -   - -
       -     -     -   11343 S    0  60  1 e6002cbc  8400400      -   - -
       -     -     -   14289 S    0  60  1 22c4d670   c00400      -   - -
       -     -     -   14379 S    0  60  1 22c4d670   c00400      -   - -
...
       -     -     -   43187 S    0  60  1 701e6114   400400      -   - -
       -     -     -   43939 R   33  76  1 20039c88   c00000      -   - -
       -     -     -   50275 S    0  60  1 22c4d670   c00400      -   - -
       -     -     -   52477 S    0  60  1 e600ccbc  8400400      -   - -
...
       -     -     -   98911 S    0  60  1 7023d46c   400400      -   - -
       -     -     -   99345 R   33  76  0        -   400000      -   - -
       -     -     -   99877 S    0  60  1 22c4d670   c00400      -   - -
       -     -     -  100661 S    0  60  1 22c4d670   c00400      -   - -
       -     -     -  102599 S    0  60  1 22c4d670   c00400      -   - -
...
Those threads with the value 'R' under 'ST' are in the 'runnable' state, and therefore are able to accumulate processor time. What are these threads doing? The output from ps shows the TID (Kernel Thread ID) for each thread. This can be mapped to the Java thread ID using dbx ( AIX debugger). The output of the dbx thread command gives an output of the form of:

root@pw4 # dbx -a 43824

dbx > thread
thread  state-k     wchan    state-u    k-tid   mode held scope function
$t1     wait      0xe60196bc blocked   104099     k   no   sys  _pthread_ksleep
>$t2     run                  blocked    68851     k   no   sys  _pthread_ksleep
$t3     wait      0x2015a458 running    29871     k   no   sys  pthread_mutex_lock
...
$t50    wait                 running    86077     k   no   sys  getLinkRegister
$t51    run                  running    43939     u   no   sys  reverseHandle     
$t52    wait                 running    56273     k   no   sys  getLinkRegister
$t53    wait                 running    37797     k   no   sys  getLinkRegister
$t60    wait                 running     4021     k   no   sys  getLinkRegister
$t61    wait                 running    18791     k   no   sys  getLinkRegister
$t62    wait                 running    99345     k   no   sys  getLinkRegister   
$t63    wait                 running    20995     k   no   sys  getLinkRegister
By matching the TID value from ps to the k-tid value from the dbx thread command, you can see that the currently running methods in this case are reverseHandle and getLinkRegister.

Now you can use dbx to generate the C thread stack for these two threads using the dbx thread command for the corresponding dbx thread numbers ($tx). To obtain the full stack trace including Java frames, map the dbx thread number to the threads pthread_t value, which is listed by the Javadump file, and can be obtained from the ExecEnv structure for each thread using the Dump Viewer. Do this with the dbx command thread info [dbx thread number], which produces an output of the form ( in the above output $t51 is actually running: the one that uses CPU cycles, so we use 51 in the next dbx command):

dbx > thread info 51
thread  state-k     wchan    state-u    k-tid   mode held scope function
$t51    run                  running    43939     u   no   sys  reverseHandle
      general:
         pthread addr = 0x220c2dc0         size         = 0x18c
         vp addr      = 0x22109f94         size         = 0x284
         thread errno = 61
         start pc     = 0xf04b4e64
         joinable     = yes
         pthread_t    = 3233
      scheduler:
         kernel       =
         user         = 1 (other)
      event :
         event        = 0x0
         cancel       = enabled, deferred, not pending
      stack storage:
         base         = 0x220c8018         size         = 0x40000
         limit        = 0x22108018
         sp           = 0x22106930

r   
$t63    wait                 running    20995     k   no   sys  getLinkRegister

Showing that the TID value from ps (k-tid in dbx) corresponds to dbx thread number 51, which has a pthread_t of 3233. Looking for the pthread_t in the Javadump file, you now have a full stack trace:

"Worker#31" (TID:0x36288b10, sys_thread_t:0x220c2db8) Native Thread State:
ThreadID: 00003233 Reuse: 1 USER SUSPENDED Native Stack Data : base: 22107f80
pointer 22106390 used(7152) free(250896)
----- Monitors held -----
java.io.OutputStreamWriter@3636a930
com.ibm.servlet.engine.webapp.BufferedWriter@3636be78
com.ibm.servlet.engine.webapp.WebAppRequestDispatcher@3636c270
com.ibm.servlet.engine.srt.SRTOutputStream@36941820
com.ibm.servlet.engine.oselistener.nativeEntry.NativeServerConnection@36d84490 JNI pinning lock

----- Native stack -----

_spin_lock_global_common pthread_mutex_lock - blocked on Heap Lock
sysMonitorEnterQuicker sysMonitorEnter unpin_object unpinObj
jni_ReleaseScalarArrayElements jni_ReleaseByteArrayElements
Java_com_ibm_servlet_engine_oselistener_nativeEntry_NativeServerConnection_nativeWrite

------ Java stack ------ () prio=5

com.ibm.servlet.engine.oselistener.nativeEntry.NativeServerConnection.write(Compiled Code)
com.ibm.servlet.engine.srp.SRPConnection.write(Compiled Code)
com.ibm.servlet.engine.srt.SRTOutputStream.write(Compiled Code)
java.io.OutputStreamWriter.flushBuffer(Compiled Code)
java.io.OutputStreamWriter.flush(Compiled Code)
java.io.PrintWriter.flush(Compiled Code)
com.ibm.servlet.engine.webapp.BufferedWriter.flushChars(Compiled Code)
com.ibm.servlet.engine.webapp.BufferedWriter.write(Compiled Code)
java.io.Writer.write(Compiled Code)
java.io.PrintWriter.write(Compiled Code)
java.io.PrintWriter.write(Compiled Code)
java.io.PrintWriter.print(Compiled Code)
java.io.PrintWriter.println(Compiled Code)
pagecompile._identifycustomer_xjsp.service(Compiled Code)
javax.servlet.http.HttpServlet.service(Compiled Code)
com.ibm.servlet.jsp.http.pagecompile.JSPState.service(Compiled Code)
com.ibm.servlet.jsp.http.pagecompile.PageCompileServlet.doService(Compiled Code)
com.ibm.servlet.jsp.http.pagecompile.PageCompileServlet.doGet(Compiled Code)
javax.servlet.http.HttpServlet.service(Compiled Code)
javax.servlet.http.HttpServlet.service(Compiled Code)

And, using the full stack trace, it should be possible to identify any infinite loop that might be occurring. The above example shows the use of spin_lock_global_common, which is a busy wait on a lock, hence the use of CPU time. For more information, consult the IBM AIX documentation.

你可能感兴趣的:(method)