JVM jstack命令用法及相关知识小结

前言

很久没有写过与JVM相关的文章了。今天搬砖有点累,不太想啃源码,写一篇实用性比较强的吧。

在日常工作中,我们如果遇到JVM方面的问题,一般是采用各种现成的工具辅助定位解决,如VisualVM、JProfiler、Eclipse MAT、Arthas等。但是,我们也有必要了解JVM原生提供的那些命令行工具,本文就针对常用的jstack命令做个简单的总结,顺便聊一些与Java线程与并发相关的周边知识。今后有时间的话,也会将jmap、jhat、jstat等命令的用法总结出来。

jstack简介

jstack命令用来生成JVM中的线程快照(thread dump),其中包含有每个线程的方法调用栈以及其状态、锁信息等。其用法说明如下所示。

~ jstack -h
Usage:
    jstack [-l] 
        (to connect to running process)
    jstack -F [-m] [-l] 
        (to connect to a hung process)
    jstack [-m] [-l]  
        (to connect to a core file)
    jstack [-m] [-l] [server_id@]
        (to connect to a remote debug server)

Options:
    -F  to force a thread dump. Use when jstack  does not respond (process is hung)
    -m  to print both java and native frames (mixed mode)
    -l  long listing. Prints additional information about locks
    -h or -help to print this help message

说明一下三个参数的含义:

  • -F:如果正常执行jstack命令没有响应(比如进程hung住了),可以加上此参数强制执行thread dump。
  • -m:除了打印Java的方法调用栈之外,还会输出native方法的栈帧。
  • -l:打印与锁有关的附加信息。使用此参数会导致JVM停止时间变长,在生产环境需慎用。

jstack是在线程级别定位JVM问题的利器,但前提是得读懂thread dump,我们举例说明。

线程快照

下面是对一个正常运行的Spark Streaming作业执行jstack命令产生的线程快照的一部分。

~ jstack 18747
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode):

"SparkUI-1510250" #1510250 daemon prio=5 os_prio=0 tid=0x00007f9a6c00c800 nid=0x45cc waiting on condition [0x00007f9ce86e7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c0420db8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at org.spark_project.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.spark_project.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563)
        at org.spark_project.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)
        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
        at java.lang.Thread.run(Thread.java:748)

"shuffle-server-6-7" #190 daemon prio=5 os_prio=0 tid=0x00007f9b44009000 nid=0x4d80 runnable [0x00007f9ce99f8000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked <0x00000000c58e6498> (a io.netty.channel.nio.SelectedSelectionKeySet)
        - locked <0x00000000c59c1528> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000c59c1450> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:748)

"SparkUI-84-acceptor-3@3b331d23-ServerConnector@9826a7d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" #84 daemon prio=3 os_prio=0 tid=0x00007f9d7decc800 nid=0x4500 waiting for monitor entry [0x00007f9d112c8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)
        - waiting to lock <0x00000000c045e868> (a java.lang.Object)
        at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:373)
        at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:593)
        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)

"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #25 daemon prio=5 os_prio=0 tid=0x00007f9d7d8da000 nid=0x44c2 in Object.wait() [0x00007f9d19dc0000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
        - locked <0x00000000c0031c98> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
        at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3213)
        at java.lang.Thread.run(Thread.java:748)

"RecurringTimer - JobGenerator" #120 daemon prio=5 os_prio=0 tid=0x00007f9b04045000 nid=0x4cc3 waiting on condition [0x00007f9d10cd4000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.spark.util.SystemClock.waitTillTime(Clock.scala:63)
        at org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:93)
        at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106)
        at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)

在每个线程的快照的第一行,包含有线程名、是否为守护线程、优先级、线程ID等信息,第二行则是线程状态,下面就是方法调用栈了。下图是Java线程状态转换的示意,老生常谈。

jstack线程快照中的状态与图示相同,只是没有NEW状态而已。我们逐一进行分析,在分析之前,先放出Java管程对象ObjectMonitor的简图。看官也可以通过我之前写的这篇文章来了解管程。

RUNNABLE

线程正在运行。如果在其调用栈中看到locked <地址>的提示,表示该方法正持有锁,即该线程位于Owner区内。

BLOCKED

线程处于阻塞状态,即正在等待锁被其他线程释放。在其调用栈的栈顶方法会看到waiting to lock <地址>的提示,表示该方法试图持有锁,线程正在Entry Set区等待。

WAITING

线程处于无限等待的状态。又分为两种情况:

  • on object monitor:线程已经获得锁,调用了不带超时参数的Object.wait()/Thread.join()方法,线程进入管程的Wait Set区。在其调用栈中会看到locked <地址>的提示。
  • parking:调用了LockSupport.park()方法,线程直接进入挂起状态(park是Unsafe提供的低级原语)。在其调用栈的栈顶方法会看到parking to wait for <地址>的提示。
TIMED_WAITING

线程处于有限等待的状态。它分为三种情况,除了与WAITING相同的on object monitor(获得锁并调用带超时的Object.wait()/Thread.join()方法)和parking(调用带超时的LockSupport.parkNanos()/parkUntil()方法)之外,还有一种sleep,即通过Thread.sleep()使线程睡眠。

通过分析线程快照的状态和调用栈,可以让我们快速地定位造成Java程序表现异常的症结,如死锁、热锁(很多线程竞争同一块临界区造成大量BLOCKED)、高CPU占用、I/O长时间阻塞(注意此时线程状态可能是RUNNABLE)等。下面举两个具体的例子。

用jstack诊断死锁

死锁(deadlock)是操作系统理论中的基础概念,即在并发环境下,一个或多个线程在等待资源,但该资源又被其他进程所占用的困局。死锁的四个必要条件是:

  • 互斥(mutual exclusion)
  • 不可抢占(no preemption)
  • 持有并等待(hold and wait)
  • 循环等待(circular wait)

下面我们用Java造一个死锁,并用jstack来诊断它。

public class DeadlockExample {
  private static final Object lock1 = new Object();
  private static final Object lock2 = new Object();

  public static void main(String[] args) throws Exception {
    new Thread(() -> {
      for (int i = 0; i < 100; i++) {
        synchronized (lock1) {
          System.out.println("thread1 synchronized lock1");
          synchronized (lock2) {
            System.out.println("thread1 synchronized lock2");
          }
        }
      }
    }, "thread1").start();

    new Thread(() -> {
      for (int i = 0; i < 100; i++) {
        synchronized (lock2) {
          System.out.println("thread2 synchronized lock2");
          synchronized (lock1) {
            System.out.println("thread2 synchronized lock1");
          }
        }
      }
    }, "thread2").start();
  }
}

运行之,发现只输出了几句就停止了。

thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1
thread2 synchronized lock2

用jstack打印线程快照,节选部分如下。

"thread2" #20 prio=5 os_prio=31 tid=0x00007fad74020800 nid=0x6203 waiting for monitor entry [0x0000700006364000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
    - waiting to lock <0x00000007157d2a58> (a java.lang.Object)
    - locked <0x00000007157d2a68> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

"thread1" #19 prio=5 os_prio=31 tid=0x00007fad7401c000 nid=0x9903 waiting for monitor entry [0x0000700006261000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
    - waiting to lock <0x00000007157d2a68> (a java.lang.Object)
    - locked <0x00000007157d2a58> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

Found one Java-level deadlock:
=============================
"thread2":
  waiting to lock monitor 0x00007fad65004168 (object 0x00000007157d2a58, a java.lang.Object),
  which is held by "thread1"
"thread1":
  waiting to lock monitor 0x00007fad650056b8 (object 0x00000007157d2a68, a java.lang.Object),
  which is held by "thread2"

Java stack information for the threads listed above:
===================================================
"thread2":
    at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
    - waiting to lock <0x00000007157d2a58> (a java.lang.Object)
    - locked <0x00000007157d2a68> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)
"thread1":
    at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
    - waiting to lock <0x00000007157d2a68> (a java.lang.Object)
    - locked <0x00000007157d2a58> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.

可见,我们不仅能够发现两个线程都处于BLOCKED状态,并且jstack还直接给出了死锁的详细信息,方便我们修改代码消除死锁。

用jstack诊断高CPU占用

接下来造一个有死循环的程序,模拟异常的CPU占用。

public class InfiniteLoopExample {
  private static final Object lock = new Object();

  static class InfiniteLoopRunnable implements Runnable {
    @Override
    public void run() {
      synchronized (lock) {
        long l = 0;
        while (true) {
          l++;
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    new Thread(new InfiniteLoopRunnable(), "thread1").start();
    new Thread(new InfiniteLoopRunnable(), "thread2").start();
  }
}

运行该程序,用jps命令找出该Java进程的PID,然后利用top -Hp 命令找出吃CPU最多的那个线程。

使用jstack导出线程快照到文件中。由于线程ID是十六进制表示的,所以我们要将线程ID转换成十六进制再grep。

这下我们就可以定位到异常代码的位置,并进行修改了。

The End

晚安晚安。

你可能感兴趣的:(JVM jstack命令用法及相关知识小结)