Problem Description
A server hang is suspected when:
Problem Troubleshooting
Please note that not all of the following items need to be done. Some issues can be solved by following only a few of the items.

Quick Links:
Why does the problem occur?
Topic | Pattern Name | Link |
RMI, RJVM responses – all threads tied up waiting for RJVM, RMI responses. | EJB_RMI Server Hang | |
Application Deadlock – one thread locks resource1 and then waits for the lock on resource2, while another thread locks resource2 and then waits for the lock on resource1. | Application Deadlock Causes Server Hang | |
Threads are all used up, none available for new work. | Thread Usage Server Hang | TBD |
Garbage Collection taking too much time. | Garbage Collection Server Hang | TBD |
Improper JSP settings for servlet times, e.g. PageCheckSeconds. | JSP cause Server Hang | TBD |
Long Running JDBC calls or JDBC deadlocks lead to a hang. | JDBC Causes Server Hang | JDBC Causes Server Hang |
JVM hang during code optimization, which looks like a server hang. | Server Hang in Code Optimization | TBD |
JSP compilation causes server hang under heavy load. | JSP Compilation Server Hang | TBD |
Sun JVM bugs, e.g. in the lightweight thread library. | Sun JVM Bugs that Cause Server Hangs | TBD |
Ensure that the server is actually hanging and not doing garbage collection. To verify, restart the server with -verbosegc turned on, and redirect stdout and stderr to one file. When the server stops responding, the output shows whether it is doing garbage collection or is really hanging. If garbage collection takes too long (more than 10 seconds), the server may miss the heartbeats that servers use to keep each other informed of the topology of the cluster.

WebLogic Server uses the 'default' thread queue or a configured application-specific thread queue to service client requests. Client requests are handled in the default queue only if no application-specific thread queue is defined. Please see Tuning WebLogic Server Applications, Tuning the Default Execute Queue Threads, and Tuning WebLogic Server Performance Parameters for more information on defining application-specific thread queues.

In release 8.1, the thread architecture in WebLogic Server changed: a specific kernel thread group was created for internal WebLogic tasks. This was necessary to avoid deadlocks that occurred in earlier releases when all threads in the 'default' thread queue were in use and none were available for WebLogic internal tasks.

The threads in the 'default' queue or the application-specific thread queue (if one has been configured) are the threads that should be examined in the event of a server hang. Here is what Execute Thread '14' from the 'default' queue looks like in a thread dump when the thread is waiting for work. The latest method called by this thread is Object.wait(), and the thread is in the state "waiting on monitor".
"ExecuteThread: '14' for queue: 'default'" daemon prio=5 tid=0x8b0ab30 nid=0x1f4 waiting on monitor [0x96af000..0x96afdc4]
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:94)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:118)
Threads can be in one of several states. Please see the table below for a description of the thread states. The format of the thread dump varies with the vendor. Check on the vendor’s website for information regarding the format.
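Before reading thread dumps in detail, the garbage collection check described at the start of this section can be partly automated. The following is a minimal sketch, assuming the redirected output contains Sun 1.4-style -verbosegc lines; other JVMs and versions print different formats, so the pattern would need adjusting. The function name, the sample log lines, and the 10-second default are illustrative and not part of WebLogic.

```python
import re

# Matches Sun 1.4-style -verbosegc lines such as
#   [Full GC 267628K->83769K(776768K), 1.8479984 secs]
# This format is an assumption; other JVMs log collections differently.
GC_LINE = re.compile(r"\[(Full GC|GC)\s+\d+K->\d+K\(\d+K\),\s+([\d.]+) secs\]")


def long_gc_pauses(log_text, threshold_secs=10.0):
    """Return (kind, seconds) for every collection longer than the threshold."""
    pauses = []
    for match in GC_LINE.finditer(log_text):
        kind, secs = match.group(1), float(match.group(2))
        if secs > threshold_secs:
            pauses.append((kind, secs))
    return pauses


# Made-up sample log: one short minor collection, one long full collection.
sample = """\
[GC 325407K->83000K(776768K), 0.2300771 secs]
[Full GC 267628K->83769K(776768K), 12.8479984 secs]
"""
print(long_gc_pauses(sample))
```

If the list is empty while the server is unresponsive, garbage collection is less likely to be the cause and the thread dumps deserve a closer look.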
Below is an example of threads that may be hanging. ExecuteThread '9' is waiting to lock object <dde51520>; notice the "waiting to lock <dde51520>" line in the stack trace for this thread. ExecuteThread '6' is also waiting to lock the same object <dde51520>. The third thread, ExecuteThread '5', has locked object <dde51520> and is doing work.

This example demonstrates why one thread dump is not enough. If the server is hanging and the locked object <dde51520> is the suspected cause, subsequent thread dumps will show whether that object was released and whether a new thread has locked it. If, after several thread dumps, the threads have not progressed and object <dde51520> has not been released, suspect a problem with the routine(s) in the ExecuteThread '5' call stack, because the lock is not being released.
"ExecuteThread: '9' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0xf684c8 nid=0x13 waiting for monitor entry [cc2ff000..cc2ffc24]
    at weblogic.cluster.MemberManager.done(MemberManager.java:306)
    - waiting to lock <dde51520> (a weblogic.cluster.MemberManager)
    at weblogic.cluster.MulticastManager.execute(MulticastManager.java:399)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)

"ExecuteThread: '6' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x9df020 nid=0x10 waiting for monitor entry [cc5ff000..cc5ffc24]

"ExecuteThread: '5' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x9df020 nid=0x12 waiting for monitor entry [cc5ff000..cc5ffc24]
    at weblogic.cluster.MemberManager.checkTimeouts(MemberManager.java:346)
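Comparing several thread dumps by eye is error-prone, so the check described above can be sketched as a small script. This is only an illustration, assuming Sun-style dumps in which a thread header starts with a double quote and lock lines read "- waiting to lock <id>" or "- locked <id>". The function names and the abbreviated sample dump are hypothetical.

```python
import re

THREAD_HEADER = re.compile(r'^"([^"]+)"')
LOCK_LINE = re.compile(r"- (waiting to lock|locked) <([0-9a-fx]+)>")


def lock_summary(dump_text):
    """Map lock id -> {'holders': [...], 'waiters': [...]} for one thread dump."""
    summary = {}
    current = None
    for line in dump_text.splitlines():
        line = line.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        lock = LOCK_LINE.search(line)
        if lock and current:
            role = "waiters" if lock.group(1) == "waiting to lock" else "holders"
            entry = summary.setdefault(lock.group(2), {"holders": [], "waiters": []})
            entry[role].append(current)
    return summary


def stuck_locks(dumps):
    """Return (lock id, holder) pairs that are held and contended in every dump."""
    stuck = None
    for dump in dumps:
        held = {(lock_id, holder)
                for lock_id, entry in lock_summary(dump).items()
                if entry["waiters"]
                for holder in entry["holders"]}
        stuck = held if stuck is None else stuck & held
    return sorted(stuck or [])


# Abbreviated, illustrative dump: thread '9' waits on a lock that '5' holds.
dump = """\
"ExecuteThread: '9' for queue: 'weblogic.kernel.Default'" daemon prio=5 waiting for monitor entry
    at weblogic.cluster.MemberManager.done(MemberManager.java:306)
    - waiting to lock <dde51520> (a weblogic.cluster.MemberManager)
"ExecuteThread: '5' for queue: 'weblogic.kernel.Default'" daemon prio=5 runnable
    at weblogic.cluster.MemberManager.checkTimeouts(MemberManager.java:346)
    - locked <dde51520> (a weblogic.cluster.MemberManager)
"""
# Three identical dumps stand in for dumps taken some seconds apart.
print(stuck_locks([dump, dump, dump]))
```

A pair that survives every dump only suggests where to look; the holder's call stack still has to be read to decide whether the lock is genuinely stuck.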
Determine if the "default" ExecuteThread queue is overloaded. Use the console to determine if any of the ExecuteThreads in the 'default' queue are idle. If none are idle, the application probably needs to be configured with a larger number of ExecuteThreads. This value can be changed through the console and is stored in the config.xml file.
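When the console itself is unresponsive, a thread dump gives a rough cross-check of the same information. The sketch below assumes that an idle execute thread is one whose stack contains ExecuteThread.waitForRequest, as in the waiting thread shown earlier. The function names and the sample dump are illustrative, and com.example.SlowServlet is a made-up class.

```python
def split_threads(dump_text):
    """Split a thread dump into (header_line, [stack_lines]) pairs."""
    threads = []
    for line in dump_text.splitlines():
        line = line.strip()
        if line.startswith('"'):
            threads.append((line, []))
        elif line and threads:
            threads[-1][1].append(line)
    return threads


def idle_counts(dump_text, queue="default"):
    """Return (idle, total) execute threads for the named queue."""
    marker = "for queue: '%s'" % queue
    idle = total = 0
    for header, stack in split_threads(dump_text):
        if marker not in header:
            continue
        total += 1
        # Assumption: a thread parked in waitForRequest is waiting for work.
        if any("ExecuteThread.waitForRequest" in frame for frame in stack):
            idle += 1
    return idle, total


# Illustrative dump: one idle thread and one busy thread in the 'default' queue.
sample = """\
"ExecuteThread: '14' for queue: 'default'" daemon prio=5 waiting on monitor
    at java.lang.Object.wait(Native Method)
    at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:94)
"ExecuteThread: '15' for queue: 'default'" daemon prio=5 runnable
    at com.example.SlowServlet.service(SlowServlet.java:42)
"""
idle, total = idle_counts(sample)
print("%d of %d 'default' threads idle" % (idle, total))
```

Pass a different queue name, such as 'weblogic.kernel.Default', when the dump uses that name for the queue being examined.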
If the Execute Queue has idle threads, it is possible that not enough socket reader threads are allocated. By default, a WebLogic Server instance creates three socket reader threads upon booting. If a cluster system utilizes more than three sockets during peak periods, increase the number of socket reader threads. The number of socket reader threads should usually be small. However, configure one thread for each WebLogic Server that acts as a client of the server instance that is hanging.

If using a JDBC connection pool, ensure that the number of JDBC connections configured for the pool equals the number of simultaneous requests, i.e., execute threads, that use the pool.
Unix Systems (Solaris, HP, AIX)
Windows, XP, NT
If you have installed WebLogic as a Windows service, you will not be able to see the messages from the JVM or WebLogic Server that are printed to standard out or standard error. To view these messages, you must direct standard out and standard error to a file. To do this, take the following steps:
Linux
Use a grep argument that is a string that will be found in the process stack that matches the server startup command. The first PID reported will be the root process, assuming that the ps command has not been piped to another routine.
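The PID selection described above can be sketched as follows. This is a hedged illustration, assuming `ps -ef` style output in which the PID is the second column, and relying on the fact that a Unix JVM prints a thread dump when it receives SIGQUIT (the signal sent by `kill -3`). The function names, the sample process listing, and the server name are made up.

```python
import os
import signal


def find_server_pid(ps_output, needle):
    """Return the PID from the first `ps -ef` line containing `needle`, or None."""
    for line in ps_output.splitlines():
        # Skip the grep process itself if the listing was produced via grep.
        if needle in line and "grep" not in line:
            fields = line.split()
            return int(fields[1])  # `ps -ef` columns: UID PID PPID ...
    return None


def request_thread_dump(pid):
    """Send SIGQUIT to the JVM; the dump appears on the JVM's standard out."""
    os.kill(pid, signal.SIGQUIT)


# Made-up listing: the first matching line is taken as the root process.
ps_sample = """\
UID        PID  PPID  C STIME TTY          TIME CMD
wls       4211     1  0 09:12 ?        00:03:41 java -Dweblogic.Name=myserver weblogic.Server
wls       4250  4211  0 09:12 ?        00:00:00 java -Dweblogic.Name=myserver weblogic.Server
"""
print(find_server_pid(ps_sample, "weblogic.Server"))
```

Calling request_thread_dump with the returned PID only makes sense on the machine that runs the server, and only when standard out has been redirected somewhere it can be read.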
Another method of getting a thread dump is to use the THREAD_DUMP admin command. This method is independent of the OS on which the server instance is running.
NOTE: This command cannot be used if the server instance cannot be pinged.

If the JVM in use is Sun's, the thread dump goes to stdout. Sun enhanced the thread dump format between JVM 1.3.1 and 1.4. To obtain Sun's 1.4 style of thread dump, add the following option to the java command line for starting the 1.3.1 JVM:
Threads can be in one of the following states:
More information on thread states can be found at http://java.sun.com/developer/onlineTraining/Programming/JDCBook/stack.html#states. There is also a thread analysis tool at http://dev2dev.bea.com/resourcelibrary/utilitiestools/adminmgmt.jsp.
"ListenThread.Default" prio=10 tid=0x00037888 nid=93 lwp_id=6888343 runnable [0x1a81b000..0x1a81b530]
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:353)
    - locked <0x26d9d490> (a java.net.PlainSocketImpl)
    at java.net.ServerSocket.implAccept(ServerSocket.java:439)
    at java.net.ServerSocket.accept(ServerSocket.java:410)
    at weblogic.socket.WeblogicServerSocket.accept(WeblogicServerSocket.java:24)
    at weblogic.t3.srvr.ListenThread.accept(ListenThread.java:713)
    at weblogic.t3.srvr.ListenThread.run(ListenThread.java:290)
Socket Reader Threads accept the incoming request from the Listen Thread Queue and put it on the Execute Thread Queue. If there are no socket reader threads in the thread dump, then there is a bug somewhere that is causing the socket reader thread to vanish. There should always be at least 3 socket reader threads. One socket reader thread is usually in the poll function, while the other two are available to process requests. Below are Socket Reader threads from a sample thread dump.
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00036128 nid=75 lwp_id=6888070 waiting for monitor entry [0x1b12f000..0x1b12f530]
    at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:92)
    - waiting to lock <0x25c01198> (a java.lang.String)
    at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)

"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00035fc8 nid=74 lwp_id=6888067 runnable [0x1b1b0000..0x1b1b0530]
    at weblogic.socket.PosixSocketMuxer.poll(Native Method)

"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00035e68 nid=73 lwp_id=6888066 waiting for monitor entry [0x1b231000..0x1b231530]
The ThreadPoolPercentSocketReaders attribute sets the maximum percentage of execute threads that are set to read messages from a Java socket. The optimal value for this attribute is application-specific. The default value is 33, and the valid range is 1 to 99.
Allocating execute threads to act as socket reader threads increases the speed and the ability of the server to accept client requests. It is essential to balance the number of execute threads that are devoted to reading messages from a socket and those threads that perform the actual execution of tasks in the server. In release 8.1, the socket reader threads no longer use "ExecuteThreads" in the default queue. Instead they have their own thread group named.

Next Steps