上周在AIX6.1下安装weblogic10.3.0,并配置了hacmp集群环境,但是接下来的几天遇到了挂起问题,为此还加班了一天。
Weblogic启动后,10到30分钟就会hang住,应用和管理控制台都无法访问。强制kill -9 pid后端口无法释放,使用rmsock 命令查看端口显示Wait for exiting processes to be cleaned up before removing the socket。
1. 用ps –ef | grep java找到weblogic进程,每隔三分种执行kill -3 pid,在domain目录下生成javacore文件
2. 分析weblogic日志,发现如下内容
<Aug 21, 2009 4:33:37 AM CDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: ‘1′ for queue: ‘weblogic.kernel.Default (self-tuning)’ has been busy for “620″ seconds working on the request
“weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de”, which is more than the configured time (StuckThreadMaxTime) of “600″ seconds. Stack trace:
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
……
<Aug 21, 2009 4:34:37 AM CDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: ‘1′ for queue: ‘weblogic.kernel.Default (self-tuning)’ has been busy for “680″ seconds working on the request
“weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de”, which is more than the configured time (StuckThreadMaxTime) of “600″ seconds. Stack trace:
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
……
3. 用IBM Thread and Monitor Dump Analyzer for java分析刚才生成的thread dump,找到如下两个线程信息:
3XMTHREADINFO “[ACTIVE] ExecuteThread: ‘5′ for queue: ‘weblogic.kernel.Default (self-tuning)’” TID:0×39CBED00, j9thread_t:0×3751C83C, state:R, prio=5
3XMTHREADINFO1 (native thread ID:0xCE1DB, native priority:0×5, native policy:UNKNOWN)
4XESTACKTRACE at java/net/PlainSocketImpl.socketClose0(Native Method)
4XESTACKTRACE at java/net/PlainSocketImpl.socketPreClose(PlainSocketImpl.java:706)
4XESTACKTRACE at java/net/PlainSocketImpl.close(PlainSocketImpl.java:540)
4XESTACKTRACE at java/net/SocksSocketImpl.close(SocksSocketImpl.java:1041)
4XESTACKTRACE at java/net/Socket.close(Socket.java:1343)
4XESTACKTRACE at weblogic/socket/SocketMuxer.closeSocket(SocketMuxer.java:475)
4XESTACKTRACE at weblogic/socket/SocketMuxer.cancelIo(SocketMuxer.java:813)
4XESTACKTRACE at weblogic/socket/SocketMuxer$TimerListenerImpl.timerExpired(SocketMuxer.java:1021(Compiled Code))
4XESTACKTRACE at weblogic/timers/internal/TimerImpl.run(TimerImpl.java:273(Compiled Code))
4XESTACKTRACE at weblogic/work/SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:516(Compiled Code))
4XESTACKTRACE at weblogic/work/ExecuteThread.execute(ExecuteThread.java:201(Compiled Code))
4XESTACKTRACE at weblogic/work/ExecuteThread.run(ExecuteThread.java:173)
3XMTHREADINFO “ExecuteThread: ‘7′ for queue: ‘weblogic.socket.Muxer’” TID:0×35381D00, j9thread_t:0×35385864, state:R, prio=5
3XMTHREADINFO1 (native thread ID:0xB916F, native priority:0×5, native policy:UNKNOWN)
4XESTACKTRACE at weblogic/socket/PosixSocketMuxer.poll(Native Method)
4XESTACKTRACE at weblogic/socket/PosixSocketMuxer.processSockets(PosixSocketMuxer.java:102(Compiled Code))
4XESTACKTRACE at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29)
4XESTACKTRACE at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42)
4XESTACKTRACE at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145)
4XESTACKTRACE at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117)
4. 执行线程只有这两个是running状态,一个做CLOSE(),一个做POLL()。别的都是blocked或者wait状态。
5. 经过metalink查询以及和800支持人员确认,这是Weblogic在AIX的JVM上由来已久的bug,从8.1.4就开始在不同版本间出现。原因是IBM的JVM底层socket实现和weblogic配合问题,需要打patch CR370915_1030GA.jar解决。
1.在weblogic的启动脚本中,找到CLASSPATH一行
2.在CLASSPATH变量的第一位添加补丁jar包
Eg: CLASSPATH=”${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}”
—>
CLASSPATH=/路径/CR370915_1030GA.jar:”${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}”
3.以上操作仅对这个domain起作用,为了对所有domain起作用,可以添加到common/bin/的目录中的commEnv.sh文件中WEBLOGIC_CLASSPATH=最前面
这个bug在weblgoic和IBM的JVM相组合的平台上出现较为普遍,如果出现相关日志信息,基本可以断定需要打CR370915补丁。
更新:我这里的补丁仅仅 for weblogic 10.3.0.0,其它版本的可以自行用Smart Update下载
Patches for WLS 8.x can be found in My Oracle Support. Open the Patches & Updates tab. Search for patch ID 8173442 for the patches for WLS 8.1mp3, 8.1mp4, and 8.1mp5. Search for patch ID 8179792 for the patch for WLS 8.1mp6.
Patches for WLS 9.x and higher can be downloaded from Smart Update using these patch IDs and passcodes:
——————————————
PATCH REPOSITORY INFORMATION
——————————————
WLS Version | Patch ID | Passcode
————+———-+—————-
9.2 | T4DV | 7C7PYV9B
9.2mp1 | HZHQ | PTUYCCSI
9.2mp2 | WJD2 | GU1CW2AB
9.2mp3 | GNLT | 8J9L6Q4Y
10.0 | PMAJ | 9UQ69LLT
10.0mp1 | ITVL | K8RBHQQ2
10.3 | 9YT5 | I1DB5QSV
如果生产机无法联网,可以
我一般用第二种,对于单个补丁快捷方便,SmartUpdate可以单独安装,但是会让你选择应用到哪个BEA的主目录,不同的版本和平台能下的补丁不一样。在Windows平台上当然没有AIX的BEA版本,不过只要自己建个目录,然后拷贝一份register.xml进去就可以了。
备注:
本文转载自:http://www.hashei.me/2009/08/cr370915_in_weblogic10-3_and_jdk1-6.html