First, how to handle this on Linux.
1. Use the top command to check CPU usage:
top - 17:07:58 up 2 days, 7:36, 4 users, load average: 1.49, 0.45, 0.20
Tasks: 115 total, 2 running, 105 sleeping, 8 stopped, 0 zombie
Cpu(s): 83.7%us, 2.0%sy, 0.0%ni, 14.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1035160k total, 976428k used, 58732k free, 43772k buffers
Swap: 2104472k total, 1138312k used, 966160k free, 289436k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2902 test 15 0 403m 146m 9160 S 83.0 14.5 3:17.50 java
2386 root 17 0 1860 480 464 S 0.3 0.0 0:46.27 hald-addon-stor
2. The output shows that the culprit is a java process owned by the user test. Use the ps command to see the details of that process:
ps -ef|grep 2902
test@linux-nc6d:~> ps -ef|grep 2902
test 2902 1 1 12:26 ? 00:03:20 java -Test -Xms256m -Xmx256m -jar ./plugins/org.eclipse.equinox.launcher_1.3.0.v20130327-1440.jar -clean -refresh
test 17340 9586 0 17:12 pts/6 00:00:00 grep 2902
3. Process 2902 is now confirmed as the one consuming most of the CPU. Next, find out which thread inside process 2902 is responsible:
top -Hp 2902
top - 17:20:19 up 2 days, 7:48, 4 users, load average: 0.38, 0.66, 0.47
Tasks: 46 total, 0 running, 46 sleeping, 0 stopped, 0 zombie
Cpu(s): 82.0%us, 1.7%sy, 0.0%ni, 15.7%id, 0.3%wa, 0.3%hi, 0.0%si, 0.0%st
Mem: 1035160k total, 941700k used, 93460k free, 65152k buffers
Swap: 2104472k total, 1149404k used, 955068k free, 244544k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17797 test 15 0 403m 146m 9160 S 83.0 14.5 0:02.67 java
2906 test 16 0 403m 146m 9160 S 0.3 14.5 0:06.96 java
2902 test 15 0 403m 146m 9160 S 0.0 14.5 0:00.00 java
2903 test 16 0 403m 146m 9160 S 0.0 14.5 0:01.09 java
2907 test 16 0 403m 146m 9160 S 0.0 14.5 0:00.60 java
2908 test 16 0 403m 146m 9160 S 0.0 14.5 0:00.53 java
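When the thread list is long, the hot thread can also be picked out non-interactively. The following is a minimal sketch, assuming a procps-style top that supports batch mode (-b); the helper name busiest_thread is made up for illustration. It locates the %CPU column from the header line instead of hard-coding its position.

```shell
# Hypothetical helper: reads `top -H -b` output on stdin and prints
# "<thread id> <%CPU>" for the thread with the highest %CPU.
# If the input contains several iterations, the last one wins,
# because the state is reset at every header line.
busiest_thread() {
  awk '
    $1 == "PID" {
      # Find the %CPU column in the header; reset the running maximum.
      for (i = 1; i <= NF; i++) if ($i == "%CPU") col = i
      max = -1; best = ""; cpu = ""
      next
    }
    col && $1 ~ /^[0-9]+$/ && $col + 0 > max {
      max = $col + 0; best = $1; cpu = $col
    }
    END { if (best != "") print best, cpu }
  '
}

# Possible usage (two samples, one second apart, for process 2902):
#   top -H -b -n 2 -d 1 -p 2902 | busiest_thread
```

Taking two samples is a cautious choice: the first batch iteration of top may not reflect current CPU usage.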
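4. Thread 17797 is using almost all of the CPU. To find out what it is doing, look at the thread stacks.
Use jstack to dump the state of every thread in the process to a file:
jstack -l 2902 > 2902.txt
Then convert the decimal thread ID 17797 to hexadecimal, which gives 4585 (the Windows Calculator in Programmer mode can do the conversion).
5. Now search 2902.txt for the keyword "4585" (it appears as nid=0x4585), and the thread's stack is in hand: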
"Worker of StableTimer" daemon prio=10 tid=0x9f801000 nid=0x4585 waiting on condition [0x9fece000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0xa6b51090> (a java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:588)
at test.StableTimer$2.run(StableTimer.java:87)
at java.lang.Thread.run(Thread.java:745)
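The conversion and the search from steps 4 and 5 can be combined in one small shell function. This is a sketch only: the name thread_stack is hypothetical, and it assumes the dump uses the layout shown above, where each thread starts with a line beginning with a double quote and containing nid=0x... as a separate field.

```shell
# Hypothetical helper: print the jstack section of one thread.
#   $1 = decimal thread ID as shown by `top -Hp`
#   $2 = jstack dump file
thread_stack() {
  hex=$(printf '0x%x' "$1")        # e.g. 17797 -> 0x4585
  awk -v key="nid=$hex" '
    # A thread header starts with a quote. Match the nid as a whole
    # field so that 0x4585 does not also match 0x45851.
    /^"/ { show = (index($0 " ", " " key " ") > 0) }
    show
  ' "$2"
}

# Possible usage:
#   jstack -l 2902 > 2902.txt
#   thread_stack 17797 2902.txt
```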
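6. Use the stack trace to track the problem down in the application code; here the first application frame is test.StableTimer$2.run (StableTimer.java:87). It may well turn out to be an infinite loop.
Also, if the thread you end up with is GC-related, then congratulations: the problem may be caused by a memory leak, so calmly move on to analyzing the objects in memory.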
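Whether the hot thread is GC-related can be roughly checked from its name in the dump. The sketch below is a heuristic and not a definitive test: the function name is_gc_thread is hypothetical, and the name patterns (names containing "GC" or "G1", or the "VM Thread") are an assumption about how HotSpot typically labels these threads. An application thread whose name happens to contain "GC" would be misclassified.

```shell
# Hypothetical helper: succeeds (exit status 0) if the thread with the
# given decimal ID looks like a JVM GC/VM thread in a jstack dump.
#   $1 = decimal thread ID, $2 = jstack dump file
is_gc_thread() {
  hex=$(printf '0x%x' "$1")
  # Find the thread header by nid, then test the quoted thread name.
  grep -F -- " nid=$hex " "$2" |
    grep -Eq '^"[^"]*(GC|G1|VM Thread)[^"]*"'
}

# Possible usage:
#   if is_gc_thread 17797 2902.txt; then
#     echo "hot thread looks GC-related; check heap usage next"
#   fi
```

If the check succeeds, watching collector activity with jstat -gcutil 2902 1000 is one way to confirm that the JVM is spending its time in GC before starting a heap analysis.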
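Next, what to do when the same problem shows up on AIX.
The approach is essentially the same.
1. Use the topas command to confirm that CPU usage is high: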
Name PID CPU% PgSp Owner
java 6029454 75.1 172.2 abs
topas 5505138 0.0 4.4 abs
topas 13500534 0.0 4.5 abs
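2. Now run kill -3 6029454; this generates the file javacore.20131213.085135.8978506.0004.txt in the abs directory.
3. Use ps -mp 6029454 -o THREAD to find the threads of the abs process that use the most CPU (the CP column indicates CPU usage). Below, these are 18612279 and 25755729: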
USER PID PPID TID S CP PRI SC WCHAN F TT BND COMMAND
abs 8978506 1 - A 442 60 91 * 242001 - -1 java -Dabs_node=CS1 -Xms1024m -Xmx1024m -verbose:gc -Xverbosegclog:/abslog/fjnxcslog1/gclog/20131212171643.gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:+PrintHeapAtGCExtended -XX:+HeapDumpOnOutOfMemoryError -XX:PermSize=512m -XX:MaxPermSize=512m -Xdump:heap:events=user -jar ./plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar -clean -refresh
- - - 13172911 S 0 82 1 f1000f0a1000c940 8410400 - -1 -
- - - 17760291 R 2 83 0 - 400000 - -1 -
- - - 18612279 S 57 120 1 f1000f0a10011c40 8410400 - -1 -
- - - 24772643 S 0 78 1 f1000f0a10017a40 8410400 - -1 -
- - - 25755729 S 57 60 1 f1000f0a10018940 8410400 - -1 -
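Reading the CP column by eye gets tedious when the process has many threads. As a sketch, assuming the column layout shown above (thread rows start with "-", the TID is the 4th field and CP the 6th), the rows can be ranked with awk and sort; top_cp_threads is a made-up name.

```shell
# Hypothetical helper: reads `ps -mp <pid> -o THREAD` output on stdin and
# prints "<CP> <TID>" for every thread row, highest CP first.
top_cp_threads() {
  # Thread rows have "-" in the USER column and a numeric TID in column 4;
  # the process summary row has "-" as its TID and is skipped.
  awk '$1 == "-" && $4 ~ /^[0-9]+$/ { print $6, $4 }' |
    sort -k1,1nr -k2,2n
}

# Possible usage:
#   ps -mp 6029454 -o THREAD | top_cp_threads | head -n 2
```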
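4. Use Integer.toHexString(18612279) to convert the thread ID to hexadecimal: 11c0037
5. Search the javacore file for 11c0037 to find the corresponding stack. Make the search case-insensitive, because the javacore writes the ID in upper case (native thread ID:0x11C0037). The stack shows that the problem is at line 79 of T671013.java; open T671013.java to find out what is consuming the CPU.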
3XMTHREADINFO "Acceptor0.Processor2.Worker16" J9VMThread:0x00000100163C4300, j9thread_t:0x00000100135046E0, java/lang/Thread:0x07000000014C34D0, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1F5, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x11C0037, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at trade/zhqy/t671013/T671013.tradeInit(T671013.java:79(Compiled Code))
4XESTACKTRACE at lib/controller/tradeHandler/TransactHandler.tradeInit(TransactHandler.java:33)
4XESTACKTRACE at lib/controller/common/Common.tradeInit(Common.java:21)
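Steps 4 and 5 can also be done from the shell without writing any Java. The following sketch uses a hypothetical function javacore_thread and relies on two assumptions about the javacore layout: each thread section begins with a 3XMTHREADINFO line, as in the excerpt above, and sections are separated by lines starting with NULL. The comparison is done in upper case, so the case of the hex digits does not matter.

```shell
# Hypothetical helper: print the javacore section of one thread.
#   $1 = decimal TID from `ps -mp <pid> -o THREAD`
#   $2 = javacore file
javacore_thread() {
  hex=$(printf '%X' "$1")          # e.g. 18612279 -> 11C0037
  awk -v key="NATIVE THREAD ID:0X${hex}," '
    # Print the buffered section if it contained the wanted thread ID.
    function emit() { if (hit) printf "%s", buf; buf = ""; hit = 0 }
    $1 == "3XMTHREADINFO" || $1 == "NULL" { emit() }
    {
      buf = buf $0 "\n"
      # The trailing comma in key prevents matching a longer ID.
      if (index(toupper($0), key) > 0) hit = 1
    }
    END { emit() }
  ' "$2"
}

# Possible usage:
#   javacore_thread 18612279 javacore.20131213.085135.8978506.0004.txt
```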