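In day-to-day development and operations work, we often need to inspect a process and its threads to troubleshoot problems such as out-of-memory errors, deadlocks, and blocked threads. This article shows how to view a process tree and the thread stacks of a process.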
1. PSTREE
pstree [-acGhlnpuUV] [-H <PID>] [<PID>|<user name>]
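Note: if neither a PID nor a user name is given, the tree is rooted at the first process started at boot (init), and every process below it is shown. If a user name is given, the tree is rooted at the first process owned by that user, and all of that user's processes are shown.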
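The output of ps is precise but voluminous, which makes it hard to get an overall picture of the system. pstree fills this gap by displaying the running processes as a tree, and it can start the tree at a specific process (PID) or user (USER).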
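pstree builds its tree from the /proc file system (see FILES in the man page below): each /proc/<pid>/status file records the process name and its parent PID. The following Python sketch shows the idea under that assumption. read_processes and render_tree are hypothetical helper names, and the output is a simplified one-node-per-line layout with children sorted by PID, not pstree's exact format.

```python
import os
from collections import defaultdict


def read_processes(proc_root="/proc"):
    """Return {pid: (name, ppid)} for every process listed under /proc (Linux only)."""
    procs = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited or is unreadable
        procs[int(entry)] = (fields["Name"].strip(), int(fields["PPid"]))
    return procs


def render_tree(procs, root, show_pids=True):
    """Render the subtree rooted at `root`, one process per line, children sorted by PID."""
    children = defaultdict(list)
    for pid, (_name, ppid) in procs.items():
        children[ppid].append(pid)

    lines = []

    def walk(pid, prefix, is_last, is_root):
        name = procs[pid][0]
        label = f"{name}({pid})" if show_pids else name
        if is_root:
            lines.append(label)
            child_prefix = ""
        else:
            lines.append(prefix + ("`-" if is_last else "|-") + label)
            child_prefix = prefix + ("  " if is_last else "| ")
        kids = sorted(children.get(pid, []))
        for i, kid in enumerate(kids):
            walk(kid, child_prefix, i == len(kids) - 1, False)

    walk(root, "", True, True)
    return lines


if __name__ == "__main__":
    processes = read_processes() if os.path.isdir("/proc") else {}
    if 1 in processes:
        print("\n".join(render_tree(processes, 1)))
```

Rooting the walk at a different PID gives the equivalent of pstree <pid>.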
PSTREE(1) User Commands PSTREE(1)
NAME
pstree - display a tree of processes
SYNOPSIS
pstree [-a] [-c] [-h|-Hpid] [-l] [-n] [-p] [-u] [-Z] [-A|-G|-U] [pid|user]
pstree -V
DESCRIPTION
pstree shows running processes as a tree. The tree is rooted at either pid or init if pid is omitted. If a user name is specified, all process trees rooted at processes owned by that user are shown.
pstree visually merges identical branches by putting them in square brackets and prefixing them with the repetition count, e.g.
init-+-getty
     |-getty
     |-getty
     `-getty
becomes
init---4*[getty]
Child threads of a process are found under the parent process and are shown with the process name in curly braces, e.g.
icecast2---13*[{icecast2}]
If pstree is called as pstree.x11 then it will prompt the user at the end of the line to press return and will not return until that has happened. This is useful for when pstree is run in a xterminal.
OPTIONS
-a Show command line arguments. If the command line of a process is swapped out, that process is shown in parentheses. -a implicitly disables compaction.
-A Use ASCII characters to draw the tree.
-c Disable compaction of identical subtrees. By default, subtrees are compacted whenever possible.
-G Use VT100 line drawing characters.
-h Highlight the current process and its ancestors. This is a no-op if the terminal doesn’t support highlighting or if neither the current process nor any of its ancestors are in the subtree being shown.
-H Like -h, but highlight the specified process instead. Unlike with -h, pstree fails when using -H if highlighting is not available.
-l Display long lines. By default, lines are truncated to the display width or 132 if output is sent to a non-tty or if the display width is unknown.
-n Sort processes with the same ancestor by PID instead of by name. (Numeric sort.)
-p Show PIDs. PIDs are shown as decimal numbers in parentheses after each process name. -p implicitly disables compaction.
-u Show uid transitions. Whenever the uid of a process differs from the uid of its parent, the new uid is shown in parentheses after the process name.
-U Use UTF-8 (Unicode) line drawing characters. Under Linux 1.1-54 and above, UTF-8 mode is entered on the console with echo -e '\033%8' and left with echo -e '\033%@'.
-V Display version information.
-Z (SELinux) Show security context for each process.
FILES
/proc location of the proc file system
AUTHORS
Werner Almesberger, Craig Small
BUGS
Some character sets may be incompatible with the VT100 characters.
SEE ALSO
ps(1), top(1).
Linux 2004-11-09 PSTREE(1)
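The compaction rule in the man page (identical children merged into N*[name], threads labeled with the process name in braces) can be sketched in Python for the simplest case: a parent whose children are all leaves. render_one_level and thread_labels are hypothetical names. Real pstree also merges identical multi-level subtrees and handles layout details this sketch ignores; thread_labels assumes Linux, where each thread has an entry under /proc/<pid>/task and the main thread's TID equals the PID.

```python
import os
from collections import Counter


def render_one_level(parent, children, compact=True):
    """Lay out `parent` and its leaf children the way pstree draws one level.

    compact=True merges identical names into "N*[name]" (pstree's default);
    compact=False lists every child separately (like pstree -c).
    """
    if compact:
        counts = Counter(children)
        merged = [f"{n}*[{name}]" if n > 1 else name for name, n in sorted(counts.items())]
    else:
        merged = sorted(children)
    if not merged:
        return [parent]
    if len(merged) == 1:
        return [f"{parent}---{merged[0]}"]
    pad = " " * (len(parent) + 1)  # aligns "|-" under the "+" of "-+-"
    lines = [f"{parent}-+-{merged[0]}"]
    lines += [f"{pad}|-{name}" for name in merged[1:-1]]
    lines.append(f"{pad}`-{merged[-1]}")
    return lines


def thread_labels(pid, name, proc_root="/proc"):
    """Label every non-main thread of `pid` as "{name}", as pstree does by default."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return [f"{{{name}}}" for tid in os.listdir(task_dir) if int(tid) != pid]


print("\n".join(render_one_level("init", ["getty"] * 4)))                # init---4*[getty]
print("\n".join(render_one_level("init", ["getty"] * 4, compact=False)))  # four separate lines
print("\n".join(render_one_level("icecast2", ["{icecast2}"] * 13)))      # icecast2---13*[{icecast2}]
```

Passing compact=False corresponds to -c; -p and -a also turn compaction off, which is why pstree -p output lists every thread individually.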
Problem background
One day, Java processes could no longer be started on a host: every launch failed with "Error occurred during initialization of VM java.lang.OutOfMemoryError".
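My first guess was that memory was exhausted, or that the Java options gave the JVM too little memory. However, free -g showed that the machine still had 110 GB of memory available, which ruled out the first cause. Raising the heap limit in the startup script from -Xmx1g to -Xmx2g did not help either.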
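free reads its figures from /proc/meminfo, whose sizes are given in KiB. A small Python sketch of the same check, assuming a kernel recent enough to report MemAvailable (added in Linux 3.14); parse_meminfo and available_gib are hypothetical names.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: number}; sizes are in KiB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def available_gib(info):
    """MemAvailable in whole GiB, rounded down (1 GiB = 1024 * 1024 KiB)."""
    return info["MemAvailable"] // (1024 * 1024)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(available_gib(parse_meminfo(f.read())), "GiB available")
```

Plenty of available memory does not rule out an OutOfMemoryError at JVM startup: HotSpot also reports OutOfMemoryError when the operating system refuses to create a new native thread.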
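Next I suspected that thread resources had run out. pstree -p | wc -l gives a rough count of the running threads, because -p disables compaction so every thread is listed (pass a user name, as in pstree -p <user>, to restrict the tree to one user). ulimit -u shows the maximum number of processes the current user may run; on Linux this limit counts threads too. Sure enough, the thread count had reached the limit.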
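The same comparison can be scripted. This Python sketch assumes Linux, where the Uid: and Threads: lines of /proc/<pid>/status give each process's real UID and thread count, and where RLIMIT_NPROC (the soft limit that ulimit -u prints) is checked against the number of threads owned by the real UID. threads_for_uid and headroom are hypothetical names; the total is approximate because processes come and go during the scan.

```python
import os
import resource


def threads_for_uid(uid, proc_root="/proc"):
    """Approximate number of threads owned by real UID `uid` (sum of Threads: fields)."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                status = dict(line.split(":", 1) for line in f if ":" in line)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited during the scan
        if int(status["Uid"].split()[0]) == uid:  # first Uid column is the real UID
            total += int(status.get("Threads", "0"))
    return total


def headroom(used, soft_limit):
    """Threads that can still be created before hitting the limit; None if unlimited."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return max(soft_limit - used, 0)


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)  # what `ulimit -u` prints
    used = threads_for_uid(os.getuid())
    print(f"threads={used} limit={soft} headroom={headroom(used, soft)}")
```

A headroom of 0 matches the symptom above. Note that the kernel does not enforce RLIMIT_NPROC for root or for processes holding CAP_SYS_ADMIN or CAP_SYS_RESOURCE.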
Running pstree -p | more then revealed the IDs of the offending processes and their threads.
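Two processes stood out: a C process with PID 30265 and a Java process with PID 3637. The Java process had spawned 10,540 threads, most likely because a logic error in the program kept creating threads (a thread leak).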
|-masaike(30265)---masaike(30266)-+-{masaike}(30336)
|                                 |-{masaike}(30339)
|                                 |-{masaike}(30340)
|                                 |-{masaike}(30341)
|                                 |-{masaike}(30342)
|                                 |-{masaike}(30343)
|                                 |-{masaike}(30344)
|                                 |-{masaike}(30345)
|                                 |-{masaike}(30346)
|                                 |-{masaike}(30347)
|                                 |-{masaike}(30348)
|                                 |-{masaike}(30349)
|                                 |-{masaike}(30350)
|                                 |-{masaike}(30351)
|                                 |-{masaike}(30352)
|                                 |-{masaike}(30353)
|                                 |-{masaike}(30354)
|                                 |-{masaike}(30355)
|                                 |-{masaike}(30356)
|                                 |-{masaike}(30357)
|                                 |-{masaike}(30358)
|                                 |-{masaike}(30359)
|                                 |-{masaike}(30360)
|                                 |-{masaike}(30361)
|                                 |-{masaike}(30362)
|                                 |-{masaike}(30382)
|                                 |-{masaike}(30383)
|                                 |-{masaike}(30384)
|                                 |-{masaike}(30385)
|                                 |-{masaike}(30386)
|                                 |-{masaike}(30391)
|                                 |-{masaike}(30392)
|                                 |-{masaike}(30393)
|                                 |-{masaike}(30394)
|                                 |-{masaike}(30395)
|                                 |-{masaike}(30396)
|                                 |-{masaike}(30397)
|                                 |-{masaike}(30398)
|                                 |-{masaike}(30399)
|                                 |-{masaike}(30400)
|                                 |-{masaike}(30401)
|                                 |-{masaike}(30402)
|                                 |-{masaike}(30403)
|                                 |-{masaike}(30404)
|                                 |-{masaike}(30405)
|                                 |-{masaike}(30406)
|                                 |-{masaike}(30407)
|                                 |-{masaike}(30408)
|                                 |-{masaike}(30409)
|                                 |-{masaike}(30410)
|                                 |-{masaike}(30411)
|                                 |-{masaike}(30412)
|                                 |-{masaike}(30413)
|                                 |-{masaike}(30414)
|                                 |-{masaike}(30415)
|                                 |-{masaike}(30416)
|                                 |-{masaike}(30417)
|                                 |-{masaike}(30418)
|                                 |-{masaike}(30419)
|                                 |-{masaike}(30420)
|                                 |-{masaike}(30421)
|                                 |-{masaike}(30422)
|                                 |-{masaike}(30423)
|                                 |-{masaike}(30424)
|                                 |-{masaike}(30425)
|                                 |-{masaike}(30426)
|                                 |-{masaike}(30427)
|-java(3637)-+-{java}(3638)
|            |-{java}(3639)
|            |-{java}(3640)
|            |-{java}(3641)
|            |-{java}(3642)
|            |-{java}(3643)
|            |-{java}(3644)
|            |-{java}(3645)
|            |-{java}(3646)
|            |-{java}(3647)
|            |-{java}(3648)
|            |-{java}(3649)
|            |-{java}(3650)
|            |-{java}(3651)
|            |-{java}(3652)
|            |-{java}(3653)
|            |-{java}(3654)
|            |-{java}(3655)
|            |-{java}(3656)
|            |-{java}(3657)
|            |-{java}(3658)
|            |-{java}(3659)
|            |-{java}(3660)
|            |-{java}(3661)
|            |-{java}(3662)
|            |-{java}(3663)
|            |-{java}(3664)
|            |-{java}(3665)
|            |-{java}(3666)
|            |-{java}(3667)
|            |-{java}(3668)
|            |-{java}(3669)
|            |-{java}(3670)
|            |-{java}(3671)
|            |-{java}(3678)
|            |-{java}(3679)
|            |-{java}(3680)
|            |-{java}(3681)
|            |-{java}(3682)
|            |-{java}(3683)
|            |-{java}(3691)
|            |-{java}(3692)
|            |-{java}(3693)
|            |-{java}(3695)
|            |-{java}(3696)
|            |-{java}(3697)
|            |-{java}(3698)
|            |-{java}(3699)
|            |-{java}(3701)
|            |-{java}(3702)
|            |-{java}(3703)
|            |-{java}(3704)
|            |-{java}(3705)
|            |-{java}(3706)
|            |-{java}(3707)
|            |-{java}(3708)
|            |-{java}(3709)
|            |-{java}(3710)
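When the pstree -p output runs to thousands of lines, the thread entries (written as {name}(tid)) can be tallied instead of paged through with more. A minimal Python sketch; count_threads_by_name is a hypothetical helper. Because it groups by name, separate processes that share a name are counted together (per-PID counts are available from the Threads: line of /proc/<pid>/status), and it only works on -p output, since compacted output prints N*[{name}] without TIDs.

```python
import re
from collections import Counter

# A thread in `pstree -p` output looks like "{java}(3638)".
THREAD_ENTRY = re.compile(r"\{([^}]+)\}\((\d+)\)")


def count_threads_by_name(pstree_output):
    """Count thread entries per name in `pstree -p` output."""
    return Counter(m.group(1) for m in THREAD_ENTRY.finditer(pstree_output))


sample = """\
|-masaike(30265)---masaike(30266)-+-{masaike}(30336)
|                                 |-{masaike}(30339)
|-java(3637)-+-{java}(3638)
|            |-{java}(3639)
|            |-{java}(3640)
"""
print(count_threads_by_name(sample).most_common())  # [('java', 3), ('masaike', 2)]
```

To run it against a live system, pass it the text of subprocess.run(["pstree", "-p"], capture_output=True, text=True).stdout.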
With the PIDs in hand, the next step is to dump the thread stacks: pstack for the C process and jstack for the Java process. The program logic is not analyzed in detail here.
pstack 30265
Thread 7 (Thread 1084229984 (LWP 4552)):
#0  0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1  0x00000000006f0730 in ub::EPollEx::poll ()
#2  0x00000000006f172a in ub::NetReactor::callback ()
#3  0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 6 (Thread 1094719840 (LWP 4553)):
#0  0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1  0x00000000006f0730 in ub::EPollEx::poll ()
#2  0x00000000006f172a in ub::NetReactor::callback ()
#3  0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 5 (Thread 1105209696 (LWP 4554)):
#0  0x000000302b80baa5 in __nanosleep_nocancel ()
#1  0x000000000079e758 in comcm::ms_sleep ()
#2  0x00000000006c8581 in ub::UbClientManager::healthyCheck ()
#3  0x00000000006c8471 in ub::UbClientManager::start_healthy_check ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 4 (Thread 1115699552 (LWP 4555)):
#0  0x000000302b80baa5 in __nanosleep_nocancel ()
#1  0x0000000000482b0e in armor::armor_check_thread ()
#2  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#3  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#4  0x0000000000000000 in ?? ()
Thread 3 (Thread 1126189408 (LWP 4556)):
#0  0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2  0x000000000044c972 in Business_config_manager::run ()
#3  0x0000000000457b83 in Thread::run_thread ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 2 (Thread 1136679264 (LWP 4557)):
#0  0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2  0x00000000004524bb in Process_thread::sleep_period ()
#3  0x0000000000452641 in Process_thread::run ()
#4  0x0000000000457b83 in Thread::run_thread ()
#5  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#7  0x0000000000000000 in ?? ()
Thread 1 (Thread 182894129792 (LWP 4551)):
#0  0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2  0x0000000000420d79 in Ad_preprocess::run ()
#3  0x0000000000450ad0 in main ()
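With thousands of threads, it helps to group pstack output by identical call stacks: a leaking thread pool shows up as one stack shared by a very large number of LWPs. A Python sketch, assuming the gdb-style layout pstack prints for multi-threaded processes (a "Thread N (... (LWP x)):" header followed by "#k" frame lines); group_by_stack is a hypothetical name, and single-threaded processes, which print no Thread header, are ignored.

```python
import re
from collections import defaultdict

# Header and frame lines as printed by pstack (gdb backtraces).
HEADER = re.compile(r"^Thread \d+ \(.*\(LWP (\d+)\)\):")
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def group_by_stack(pstack_text):
    """Map each distinct call stack (tuple of function names) to the LWPs running it."""
    stacks = {}
    current = None
    for raw in pstack_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = int(header.group(1))  # LWP id of the thread that follows
            stacks[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(1))  # function name of this frame
    groups = defaultdict(list)
    for lwp, frames in stacks.items():
        groups[tuple(frames)].append(lwp)
    return dict(groups)
```

Sorting the result with sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True) puts the most common stack first.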
jstack
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.45-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000054995000 nid=0x69b6 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"p: default-threadpool; w: Idle" daemon prio=10 tid=0x00002aaab02b5800 nid=0x7f11 in Object.wait() [0x0000000040dbe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x000000078000d188> (a com.sun.corba.se.impl.orbutil.threadpool.WorkQueueImpl)
	at com.sun.corba.se.impl.orbutil.threadpool.WorkQueueImpl.requestWork(WorkQueueImpl.java:121)
	- locked <0x000000078000d188> (a com.sun.corba.se.impl.orbutil.threadpool.WorkQueueImpl)
	at com.sun.corba.se.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:484)

"MultiThreadedHttpConnectionManager cleanup" daemon prio=10 tid=0x0000000054c58000 nid=0x7b6a in Object.wait() [0x00000000439e1000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x0000000780004078> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
	- locked <0x0000000780004078> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

"Thread-6" prio=10 tid=0x00002aaab0753000 nid=0x7b47 waiting on condition [0x00000000438e0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x0000000780000108> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	at com.linkage.serv.hwcall.HWCallMessageSender$1.run(HWCallMessageSender.java:33)

"Thread-4" daemon prio=10 tid=0x0000000054aa8800 nid=0x7b45 waiting on condition [0x00000000437df000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at com.linkage.system.utils.corba.CorbaService$NSHeartbeat.run(CorbaService.java:793)

"p: default-threadpool; w: Idle" daemon prio=10 tid=0x00002aaab029e800 nid=0x7b44 in Object.wait() [0x00000000436de000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x000000078000d188> (a com.sun.corba.se.impl.orbutil.threadpool.WorkQueueImpl)
	at com.sun.corba.se.impl.orbutil.threadpool.WorkQueueImpl.requestWork(WorkQueueImpl.java:121)
	- locked <0x000000078000d188> (a com.sun.corba.se.impl.orbutil.threadpool.WorkQueueImpl)
	at com.sun.corba.se.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:484)
Time is limited, so I will not go into further detail here.