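Recently the server has been misbehaving: CPU usage frequently reaches 200% and load climbs to 2, so an alarm could fire at any moment.
Running top -H -p xx showed: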
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27987 admin 25 0 1559m 542m 13m R 100.2 27.0 15995:57 java
27988 admin 25 0 1559m 542m 13m R 100.2 27.0 16027:01 java
So the CPU is being consumed entirely by two threads. A jstack thread dump showed:
"pool-4-thread-1" prio=10 tid=0x0000000059fa8000 nid=0x6d53 runnable [0x0000000043ce0000..0x0000000043ce0a90]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.put(HashMap.java:374)
at java.util.HashSet.add(HashSet.java:200)
at com.xx.smsgw.domain.MessagePool.put(MessagePool.java:29)
at com.xx.smsgw.domain.SmsMTQueue.putSmsMT(SmsMTQueue.java:43)
at com.xx.smsgw.smc.mt.MTChannelImpl.put(MTChannelImpl.java:45)
at com.xx.smsgw.smc.route.MTRouteTask.run(MTRouteTask.java:49)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
"pool-4-thread-2" prio=10 tid=0x000000005a621000 nid=0x6d54 runnable [0x00000000455f9000..0x00000000455f9a10]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.transfer(HashMap.java:484)
at java.util.HashMap.resize(HashMap.java:463)
at java.util.HashMap.addEntry(HashMap.java:755)
at java.util.HashMap.put(HashMap.java:385)
at java.util.HashSet.add(HashSet.java:200)
at com.xx.smsgw.domain.MessagePool.put(MessagePool.java:29)
at com.xx.smsgw.domain.SmsMTQueue.putSmsMT(SmsMTQueue.java:43)
at com.xx.smsgw.smc.mt.MTChannelImpl.put(MTChannelImpl.java:45)
at com.xx.smsgw.smc.route.MTRouteTask.run(MTRouteTask.java:49)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
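This is strange: the time is being spent inside HashMap. Stranger still, I took several thread dumps and this information stayed exactly the same, as if the threads were frozen in place.
I found the answer in the following issue: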
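A side note on tying the two outputs together: on Linux, top -H lists decimal thread IDs in the PID column, while jstack prints the same ID in hexadecimal as nid. The small sketch below converts between the two; the class and method names are made up for illustration and are not part of any tool.

```java
public class NidConverter {
    // Convert a jstack nid such as "0x6d53" to the decimal thread ID shown by top -H.
    public static int nidToDecimal(String nid) {
        String hex = nid.startsWith("0x") ? nid.substring(2) : nid;
        return Integer.parseInt(hex, 16);
    }

    // Convert a decimal thread ID from top -H to the nid form used by jstack.
    public static String decimalToNid(int tid) {
        return "0x" + Integer.toHexString(tid);
    }

    public static void main(String[] args) {
        System.out.println(nidToDecimal("0x6d53")); // prints 27987
        System.out.println(decimalToNid(27988));    // prints 0x6d54
    }
}
```

So thread 27987 is "pool-4-thread-1" and thread 27988 is "pool-4-thread-2", the two entries shown in the dump.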
https://jira.jboss.org/jira/browse/JGRP-525
It attributes the problem to concurrent access to a HashSet; the solution is
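Personally, I think giving each thread its own HashSet/HashMap instance would also solve it. The key is to avoid sharing a HashSet/HashMap between threads without synchronization.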
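When the set really does have to be shared between the pool threads, one common option is a set backed by ConcurrentHashMap. The following is only a minimal sketch under that assumption: SafeMessagePool, its String message IDs, and the fillConcurrently driver are hypothetical and are not the real com.xx.smsgw.domain.MessagePool, whose source is not shown here.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SafeMessagePool {
    // Thread-safe Set view over a ConcurrentHashMap (hypothetical replacement for a plain HashSet).
    private final Set<String> messages =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    // Safe to call from several threads at once; returns false if the ID was already present.
    public boolean put(String messageId) {
        return messages.add(messageId);
    }

    public int size() {
        return messages.size();
    }

    // Hypothetical driver: several workers insert distinct IDs into one shared pool.
    public static int fillConcurrently(int threads, int perThread) {
        final SafeMessagePool pool = new SafeMessagePool();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread;
            executor.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    pool.put("msg-" + (offset + i));
                }
            });
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pool.size();
    }

    public static void main(String[] args) {
        // Two workers, 100000 distinct IDs each.
        System.out.println(fillConcurrently(2, 100000));
    }
}
```

With a plain HashSet in the same position, two threads resizing the table at the same time can corrupt a bucket's linked list on older JDKs, which matches the symptom above of threads spinning forever in HashMap.put and HashMap.transfer. Wrapping the set with Collections.synchronizedSet would be another way to make the shared instance safe.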