Troubleshooting High CPU Usage on a Linux Server

When CPU load is too high, track down the cause as follows:

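1. Use top to find the process with the highest CPU usage

2. Narrow it down with ps -ef or jps to find out which background program is causing the trouble

3. Locate the specific thread or code

4. Convert that thread's ID to hexadecimal (lowercase letters)

5. jstack <pid> | grep <thread ID in lowercase hex> -A60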


1. Use top to find the process with the highest CPU usage

top - 09:11:37 up 21 min,  3 users,  load average: 0.54, 0.25, 0.16
Tasks:  94 total,   1 running,  93 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.0 us,  6.4 sy,  0.0 ni, 89.3 id,  0.0 wa,  0.0 hi,  1.3 si,  0.0 st
KiB Mem :   499428 total,    81452 free,   131984 used,   285992 buff/cache
KiB Swap:  1572860 total,  1572860 free,        0 used.   325184 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 2485 root      20   0 2024360  25616  12256 S  8.6  5.1   0:08.54 java
 2436 root      20   0  154608   5500   4132 S  1.3  1.1   0:01.50 sshd
  580 root      20   0  376240   9256   6804 S  0.3  1.9   0:00.20 NetworkManager
 1331 root      20   0       0      0      0 S  0.3  0.0   0:00.41 kworker/0:1
    1 root      20   0  128036   6604   4144 S  0.0  1.3   0:01.48 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:00.08 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S  0.0  0.0   0:00.01 kworker/u2:0
    7 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 migration/0
    8 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 S  0.0  0.0   0:00.53 rcu_sched

2. Narrow it down with ps -ef or jps to find out which background program is causing the trouble

[root@node3 ~]# jps -l
2516 sun.tools.jps.Jps
2485 com.wu.pratice.jvm.UnableCreateNewThreadDemo
1413 -- process information unavailable
7162 -- process information unavailable
[root@node3 ~]# ps -ef | grep java
root      2485  2440  8 09:09 pts/2    00:00:26 java com.wu.pratice.jvm.UnableCreateNewThreadDemo
root      2527  2495  0 09:14 pts/1    00:00:00 grep --color=auto java

3. Locate the specific thread or code

ps -mp <pid> -o THREAD,tid,time
-m  show all threads
-p  specify the process ID
-o  user-defined output format (the columns to print)

[root@node3 ~]# ps -mp 2485 -o THREAD,tid,time
USER     %CPU PRI SCNT WCHAN  USER SYSTEM   TID     TIME
root      8.5   -    - -         -      -     - 00:00:30
root      0.0  19    - futex_    -      -  2485 00:00:00
root      8.3  19    - n_tty_    -      -  2486 00:00:29
root      0.0  19    - futex_    -      -  2487 00:00:00
root      0.0  19    - futex_    -      -  2488 00:00:00
root      0.0  19    - futex_    -      -  2489 00:00:00
root      0.0  19    - futex_    -      -  2490 00:00:00
root      0.0  19    - futex_    -      -  2491 00:00:00
root      0.0  19    - futex_    -      -  2492 00:00:00
root      0.0  19    - futex_    -      -  2493 00:00:00
root      0.0  19    - futex_    -      -  2494 00:00:00
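Picking the hot thread out of this listing by eye is easy here, but the step can also be scripted. Below is a minimal Java sketch that assumes exactly the column layout shown above (%CPU in the second column, TID second to last); the class HottestThread is a hypothetical helper, not part of ps or the JDK. For the sample output above it selects TID 2486 (8.3% CPU).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.OptionalLong;

class HottestThread {
    // Returns the TID with the highest %CPU in `ps -mp <pid> -o THREAD,tid,time` output.
    // Assumes that layout: %CPU is the 2nd column and TID the second-to-last one.
    static OptionalLong hottestTid(String psOutput) {
        long bestTid = -1;
        double bestCpu = -1;
        for (String line : psOutput.split("\n")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 4) {
                continue; // blank or truncated line
            }
            String tid = cols[cols.length - 2];
            // Skip the header row ("TID") and the per-process summary row ("-")
            if (!tid.chars().allMatch(Character::isDigit)) {
                continue;
            }
            double cpu = Double.parseDouble(cols[1]);
            if (cpu > bestCpu) {
                bestCpu = cpu;
                bestTid = Long.parseLong(tid);
            }
        }
        return bestTid < 0 ? OptionalLong.empty() : OptionalLong.of(bestTid);
    }

    public static void main(String[] args) throws IOException {
        // Reads the ps output from stdin and prints the hottest TID
        StringBuilder sb = new StringBuilder();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        for (String line; (line = in.readLine()) != null; ) {
            sb.append(line).append('\n');
        }
        hottestTid(sb.toString()).ifPresent(System.out::println);
    }
}
```

After compiling it, the sketch would be used as ps -mp 2485 -o THREAD,tid,time | java HottestThread. Parsing by column position is fragile if you change the -o format, which is why the assumption is spelled out in the comment.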

4. Convert that thread's ID to hexadecimal (lowercase letters)

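The hexadecimal form of 2486 is 9b6.

Conversion: printf "%x\n" 2486

  Note: the letters must be lowercase! jstack prints the nid in lowercase and grep is case-sensitive, so an uppercase ID (9B6) will not match the thread's stack.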
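If the tooling already runs on the JVM, the same conversion is a one-liner. A minimal sketch; the class name TidToNid is hypothetical. It prepends "0x" so the result matches the nid=0x... field that jstack prints, which also makes the grep pattern more specific than the bare digits.

```java
class TidToNid {
    // Converts a decimal native thread ID (the TID column of ps -mp) into jstack's nid form.
    // Long.toHexString always emits lowercase hex digits, matching what jstack prints.
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        long tid = args.length > 0 ? Long.parseLong(args[0]) : 2486L;
        System.out.println(toNid(tid)); // with no argument prints 0x9b6
    }
}
```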


5. jstack <pid> | grep <thread ID in lowercase hex> -A60

-A  number of lines to print after each matching line

[root@node3 ~]# jstack 2485 | grep 9b6 -A60
"main" #1 prio=5 os_prio=0 tid=0x00007f165004b800 nid=0x9b6 runnable [0x00007f16590c3000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        - locked <0x00000000fac20580> (a java.io.BufferedOutputStream)
        at java.io.PrintStream.write(PrintStream.java:482)
        - locked <0x00000000fac18170> (a java.io.PrintStream)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
        - locked <0x00000000fac18128> (a java.io.OutputStreamWriter)
        at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
        at java.io.PrintStream.newLine(PrintStream.java:546)
        - eliminated <0x00000000fac18170> (a java.io.PrintStream)
        at java.io.PrintStream.println(PrintStream.java:807)
        - locked <0x00000000fac18170> (a java.io.PrintStream)
        at com.wu.pratice.jvm.UnableCreateNewThreadDemo.main(UnableCreateNewThreadDemo.java:24)

"VM Thread" os_prio=0 tid=0x00007f16500cb800 nid=0x9b7 runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f165011a000 nid=0x9be waiting on condition 

JNI global references: 5
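grep -A60 prints a fixed 60 lines after the match, which can spill into the next thread's entry (as the "VM Thread" lines above show) or cut off a deeper stack. The minimal sketch below, built around a hypothetical JstackFilter class, instead relies on the layout seen in this output: each thread starts with a header line beginning with the quoted thread name, and entries are separated by a blank line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JstackFilter {
    // Returns the lines of the thread whose header carries `nid` (e.g. "0x9b6"),
    // stopping at the blank line jstack prints between threads.
    static List<String> threadBlock(String dump, String nid) {
        List<String> block = new ArrayList<>();
        boolean inBlock = false;
        for (String line : dump.split("\n")) {
            if (!inBlock) {
                // Headers start with the quoted thread name; match nid as a whole token
                boolean header = line.startsWith("\"")
                        && Arrays.asList(line.trim().split("\\s+")).contains("nid=" + nid);
                if (header) {
                    inBlock = true;
                    block.add(line);
                }
            } else if (line.trim().isEmpty()) {
                break; // end of this thread's entry
            } else {
                block.add(line);
            }
        }
        return block;
    }

    public static void main(String[] args) throws IOException {
        // usage: jstack 2485 | java JstackFilter 0x9b6
        StringBuilder dump = new StringBuilder();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        for (String line; (line = in.readLine()) != null; ) {
            dump.append(line).append('\n');
        }
        threadBlock(dump.toString(), args[0]).forEach(System.out::println);
    }
}
```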

Summary:

1. For Java applications, the following common performance problems can all be investigated starting from the thread stacks:

  • The system hangs and stops responding

  • High system CPU usage

  • Long response times

  • Thread deadlocks, etc.
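Deadlocks in particular can also be confirmed from inside the JVM. A minimal sketch, assuming Java 8 or later: two hypothetical worker threads take two locks in opposite order (a CyclicBarrier makes sure both hold their first lock before reaching for the second), and ThreadMXBean.findDeadlockedThreads() reports the cycle, the same situation jstack describes under "Found one Java-level deadlock".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class DeadlockCheck {
    // Starts a daemon thread that locks `first`, waits until its partner also holds
    // a lock, then tries to lock `second`. Two such threads with swapped locks deadlock.
    static Thread start(String name, Object first, Object second, CyclicBarrier bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    return;
                }
                synchronized (second) {
                    // never reached once both threads hold their first lock
                }
            }
        }, name);
        t.setDaemon(true); // deadlocked threads never finish; daemon status lets the JVM exit
        t.start();
        return t;
    }

    // Polls ThreadMXBean until it reports a deadlock or the timeout expires.
    static List<String> findDeadlock(long timeoutMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
            if (ids != null) {
                List<String> names = new ArrayList<>();
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    if (info != null) names.add(info.getThreadName());
                }
                return names;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Object lockA = new Object();
        Object lockB = new Object();
        CyclicBarrier barrier = new CyclicBarrier(2);
        start("worker-1", lockA, lockB, barrier);
        start("worker-2", lockB, lockA, barrier);
        System.out.println("deadlocked threads: " + findDeadlock(5000));
    }
}
```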

2. To tell whether a thread is working hard or resting, look at its state. The states you will see most often are RUNNABLE, BLOCKED, WAITING and TIMED_WAITING.

RUNNABLE

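From the JVM's point of view, RUNNABLE means the thread is running. A running thread usually consumes CPU, but not every RUNNABLE thread does. Take a thread doing network I/O: while it waits it is actually suspended, but the suspension happens in native code, so the JVM is not aware of it. The thread therefore does not move to WAITING or TIMED_WAITING the way it would after an explicit Java call such as sleep() or wait(); it stays RUNNABLE and uses a little CPU only when data arrives.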

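TIMED_WAITING/WAITING

Both states mean the thread is suspended and waiting to be woken up. If a timeout was set, the state is TIMED_WAITING; if not, it is WAITING, and a thread waiting in lock.wait() can only leave that state through lock.notify(), lock.notifyAll() or an interrupt. For threads in TIMED_WAITING/WAITING, also look at the following markers in the jstack output:

  • waiting on condition: the thread is waiting for some other condition to occur and wake it up;

  • on object monitor: the thread is executing obj.wait(), has given up the monitor and entered the "Wait Set";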
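The difference between the two states can be reproduced with a plain monitor wait. A minimal sketch; the class WaitStates and its helpers are hypothetical. The same lock.wait() call reports WAITING without a timeout and TIMED_WAITING with one, and Thread.getState() returns the value that jstack prints on its "java.lang.Thread.State:" line.

```java
class WaitStates {
    // Starts a daemon thread that calls lock.wait(); a positive timeout gives TIMED_WAITING,
    // zero gives WAITING (left only via notify/notifyAll or an interrupt).
    static Thread waiter(Object lock, long timeoutMs) {
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    if (timeoutMs > 0) {
                        lock.wait(timeoutMs);
                    } else {
                        lock.wait();
                    }
                } catch (InterruptedException e) {
                    // an interrupt is one of the ways out of the wait
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Polls getState() until the thread reaches `expected` or timeoutMs elapses.
    static boolean reaches(Thread t, Thread.State expected, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (t.getState() != expected) {
            if (System.currentTimeMillis() > deadline) return false;
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Object lock = new Object();
        Thread untimed = waiter(lock, 0);
        Thread timed = waiter(lock, 60_000);
        System.out.println("untimed wait -> WAITING: " + reaches(untimed, Thread.State.WAITING, 2000));
        System.out.println("timed wait -> TIMED_WAITING: " + reaches(timed, Thread.State.TIMED_WAITING, 2000));
        synchronized (lock) {
            lock.notifyAll(); // wakes both waiters, which then exit
        }
    }
}
```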

BLOCKED

The thread is blocked, usually waiting to enter a critical section ("waiting for monitor entry"). This state deserves close attention.
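A BLOCKED thread is easy to produce on purpose. A minimal sketch; the class BlockedState is hypothetical. The calling thread holds a monitor while a second thread tries to enter a synchronized block on the same object, and the second thread is observed in the BLOCKED state (in a jstack dump this is the "waiting for monitor entry" case).

```java
class BlockedState {
    // Holds `lock`'s monitor while another thread tries to enter synchronized(lock),
    // and returns the state observed for that contender (expected: BLOCKED).
    static Thread.State observeWhileHeld(long timeoutMs) {
        Object lock = new Object();
        synchronized (lock) {
            Thread contender = new Thread(() -> {
                synchronized (lock) {
                    // entered only after the holder releases the monitor
                }
            }, "contender");
            contender.setDaemon(true);
            contender.start();
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (contender.getState() != Thread.State.BLOCKED
                    && System.currentTimeMillis() < deadline) {
                Thread.yield();
            }
            return contender.getState(); // still inside synchronized, so the monitor is held
        }
    }

    public static void main(String[] args) {
        System.out.println(observeWhileHeld(2000));
    }
}
```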

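3. Which thread states consume CPU?

Threads in TIMED_WAITING, WAITING or BLOCKED do not consume CPU. For a RUNNABLE thread, whether it consumes CPU depends on what its code is doing:

  • Pure Java computation that is not suspended consumes CPU;

  • Network I/O does not consume CPU while waiting for data;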
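The computation case can be checked directly with per-thread CPU time. A minimal sketch, assuming a JVM where ThreadMXBean.isThreadCpuTimeSupported() is true (as on typical HotSpot builds for Linux); the class CpuByState and the thread names "busy" and "sleeper" are illustrative. A spinning RUNNABLE thread and a sleeping TIMED_WAITING thread run over the same wall-clock window, and only the first accumulates meaningful CPU time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicBoolean;

class CpuByState {
    // Runs a spinning thread and a sleeping thread side by side for windowMs of wall time
    // and returns {busyCpuNanos, sleeperCpuNanos}.
    static long[] measure(long windowMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported by this JVM");
        }
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread busy = new Thread(() -> {
            while (!stop.get()) {
                // pure computation: RUNNABLE and burning CPU the whole time
            }
        }, "busy");
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(windowMs * 10); // TIMED_WAITING, no CPU used while asleep
            } catch (InterruptedException ignored) {
                // woken early by the interrupt() call below
            }
        }, "sleeper");
        busy.setDaemon(true);
        sleeper.setDaemon(true);
        busy.start();
        sleeper.start();
        try {
            Thread.sleep(windowMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println(busy.getName() + ": " + busy.getState());
        System.out.println(sleeper.getName() + ": " + sleeper.getState());
        long[] cpu = { mx.getThreadCpuTime(busy.getId()), mx.getThreadCpuTime(sleeper.getId()) };
        stop.set(true);
        sleeper.interrupt();
        return cpu;
    }

    public static void main(String[] args) {
        long[] cpu = measure(500);
        System.out.printf("busy: %d ms CPU, sleeper: %d ms CPU%n", cpu[0] / 1_000_000, cpu[1] / 1_000_000);
    }
}
```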
