The JBoss server went down, and the logs showed nothing wrong, so I analyzed the dump file. The cause turned out to be excessive CPU usage.
top - 13:12:55 up 162 days, 3:05, 3 users, load average: 4.59, 5.35, 5.47
Tasks: 606 total, 5 running, 601 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.8%us, 0.7%sy, 0.0%ni, 0.2%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 32959248k total, 26160344k used, 6798904k free, 792072k buffers
Swap: 33551744k total, 0k used, 33551744k free, 7550464k cached
The top output shows that four threads had very high CPU usage when the server went down.
PID   USER PR NI VIRT  RES SHR S %CPU %MEM TIME+     COMMAND
17321 msp  25 0  15.4g 13g 27m R 95.0 44.0 17898:13  /opt/msp/pkg/sunjdk6_25/bin/java -D[Standalone] -server -XX:+UseCo
17704 msp  25 0  15.4g 13g 27m R 94.0 44.0 904:29.49 /opt/msp/pkg/sunjdk6_25/bin/java -D[Standalone] -server -XX:+UseCo
17697 msp  25 0  15.4g 13g 27m R 93.6 44.0 662:31.68 /opt/msp/pkg/sunjdk6_25/bin/java -D[Standalone] -server -XX:+UseCo
17699 msp  25 0  15.4g 13g 27m R 92.7 44.0 127:45.35 /opt/msp/pkg/sunjdk6_25/bin/java -D[Standalone] -server -XX:+UseCo
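The thread IDs in the PID column can be matched to entries in a HotSpot thread dump through the nid field, which holds the same native thread ID in hexadecimal. The following is a minimal sketch of that conversion, assuming the listing was taken in per-thread mode (for example with top -H); the class name NidLookup and its helper methods are hypothetical and only serve as illustration.

```java
class NidLookup {
    // Convert a decimal thread ID (as shown by top) to the hex form
    // used by the nid field of a HotSpot thread dump.
    static String toNid(int tid) {
        return "0x" + Integer.toHexString(tid);
    }

    // Convert a nid value such as "0x4521" back to a decimal thread ID.
    static int fromNid(String nid) {
        return Integer.parseInt(nid.substring(2), 16);
    }

    public static void main(String[] args) {
        // The four busy thread IDs from the top listing.
        int[] hotThreads = {17321, 17704, 17697, 17699};
        for (int tid : hotThreads) {
            System.out.println(tid + " -> nid=" + toNid(tid));
        }
    }
}
```

For instance, 17321 converts to 0x43a9 and 17697 to 0x4521, which are the nid values that appear in the thread dump entries quoted in this post.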
PID=17321:
"ApplePushNotificationReader" prio=10 tid=0x00002aeaa858a800 nid=0x43a9 runnable [0x00002aeaa29e3000]
java.lang.Thread.State: RUNNABLE
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2821)
at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2794)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2677)
at java.util.Formatter.format(Formatter.java:2433)
at java.util.Formatter.format(Formatter.java:2367)
at java.lang.String.format(String.java:2769)
at com.asurion.parasol.logging.log4j.Log4jLogger.debug(Log4jLogger.java:151)
at com.asurion.parasol.avalonia.apn.impl.ApplePushNotificationServiceImpl$ReaderThread.run(ApplePushNotificationServiceImpl.java:322)
Locked ownable synchronizers:
- None
PID=17704,17697,17699:
"Remoting "chtsydap02:MANAGEMENT" task-1" prio=10 tid=0x000000004598b800 nid=0x4521 runnable [0x00002aeac7e8d000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.put(HashMap.java:374)
at org.jboss.remoting3.jmx.RemotingConnectorServer.connectionOpened(RemotingConnectorServer.java:156)
at org.jboss.remoting3.jmx.protocol.v1.ServerProxy.start(ServerProxy.java:167)
at org.jboss.remoting3.jmx.protocol.v1.VersionOne.getProxy(VersionOne.java:55)
at org.jboss.remoting3.jmx.protocol.Versions.getVersionedProxy(Versions.java:66)
at org.jboss.remoting3.jmx.RemotingConnectorServer$ClientVersionReceiver.handleMessage(RemotingConnectorServer.java:244)
at org.jboss.remoting3.remote.RemoteConnectionChannel$5.run(RemoteConnectionChannel.java:435)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Locked ownable synchronizers:
- <0x000000054b70e5d8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
All three of these threads were executing java.util.HashMap.put(HashMap.java:374) when the server went down.
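Googling the problem: java.util.hashmap infinite loop
The cause is that HashMap is not thread-safe. When several threads call put() concurrently, a resize can corrupt a bucket's linked list and create a cycle, and any thread that later walks that bucket loops forever.
The JBoss code: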
org.jboss.remoting3.jmx.RemotingConnectorServer {
    // A plain HashMap, which is not thread-safe
    private final Map<String, VersionedProxy> registeredConnections = new HashMap<String, VersionedProxy>();

    public void connectionOpened(final VersionedProxy proxy) {
        String connectionId = proxy.getConnectionId();
        log.debugf("Connection '%s' now opened.", connectionId);
        registeredConnections.put(connectionId, proxy); // line 156
        connectionOpened(connectionId, "", null);
    }
}
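The Map in this JBoss module could be changed to a thread-safe implementation, but when modifying a third-party application the scope of regression testing is hard to control. For now, the only fix is to restart the server.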
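As an illustration of what a thread-safe Map could look like here, the sketch below swaps the HashMap for java.util.concurrent.ConcurrentHashMap. It is not the JBoss code and not a tested patch: the class ConnectionRegistry and the stress helper are hypothetical, and a String stands in for VersionedProxy so that the example is self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class ConnectionRegistry {
    // ConcurrentHashMap supports concurrent put() calls without external locking.
    // String is a stand-in for VersionedProxy (assumption for this sketch).
    private final Map<String, String> registeredConnections =
            new ConcurrentHashMap<String, String>();

    void connectionOpened(String connectionId, String proxy) {
        registeredConnections.put(connectionId, proxy);
    }

    int size() {
        return registeredConnections.size();
    }

    // Hypothetical stress helper: `threads` workers each register
    // `perThread` distinct connection IDs, all released at the same moment.
    static int stress(int threads, final int perThread) {
        final ConnectionRegistry registry = new ConnectionRegistry();
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    try {
                        start.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < perThread; i++) {
                        registry.connectionOpened(id + "-" + i, "proxy");
                    }
                }
            });
            workers[t].start();
        }
        start.countDown();
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return registry.size();
    }

    public static void main(String[] args) {
        // 4 threads x 10000 distinct keys each
        System.out.println(stress(4, 10000)); // prints 40000
    }
}
```

Wrapping the existing map with Collections.synchronizedMap would be another option. Either way, the change touches third-party code, so the regression-testing concern above still applies.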