Analyzing a Thread Deadlock

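Symptoms:
The service process receives messages but never sends a reply. Its log is still written normally, and CPU and memory usage look normal.
The initial suspicion is a deadlock.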

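Analysis
1. Find the process ID with ps -ef | grep xxx; here it is 16207.
2. Run top -H -p 16207 to list the threads the process has started and their thread IDs (sid).
3. For each thread, run:
strace -T -tt -e trace=all -p sid
watch pstack sid
If strace keeps producing output, the thread is doing its work normally.
Compare the system calls you see with what the threads started by your own code are expected to do, and rule out the threads that cannot be involved.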
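Steps 2 and 3 can be partly scripted. The following is a minimal sketch, not the tooling used in this investigation: list_threads is a hypothetical helper that reads /proc on Linux and prints each thread's ID, state, kernel wait channel and name. The wait-channel column depends on kernel configuration and permissions and may simply show 0.

```shell
#!/usr/bin/env bash
# Print "TID STATE WCHAN NAME" for every thread of the given process.
# Everything is read from /proc/<pid>/task, so this only works on Linux.
list_threads() {
    local pid=$1 task tid name state wchan
    for task in /proc/"$pid"/task/*; do
        [ -d "$task" ] || continue            # process gone or no threads
        tid=${task##*/}                       # directory name is the thread ID
        name=$(cat "$task/comm" 2>/dev/null)
        state=$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null)
        wchan=$(cat "$task/wchan" 2>/dev/null)
        printf '%s %s %s %s\n' "$tid" "${state:-?}" "${wchan:-?}" "${name:-?}"
    done
}

# Example: list_threads 16207
```

Threads that stay in state S with a futex-related wait channel are the candidates to look at with strace and pstack.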

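Going by the symptoms: the log is written normally and strace on the logging thread shows steady output, so that thread is fine.
Only the 4 main worker threads are stuck. strace shows every one of them blocked in the futex system call, with output like this: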

09:38:03.493247 futex(0x2577760, FUTEX_WAIT_PRIVATE, 2, NULL^Cstrace: Process 16041 detached
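
How to read that line: the first argument is the address of the futex word, FUTEX_WAIT_PRIVATE means the thread sleeps until another thread wakes it, and the NULL timeout means it waits with no time limit. In glibc's mutex implementation a futex value of 2 generally means "locked, with waiters", though the source does not say which lock implementation is in use. If the strace output of each thread is saved to a file, counting the distinct futex addresses shows whether the threads are all waiting on the same lock. A small sketch; futex_wait_addrs and the log file names are made up for illustration:

```shell
#!/usr/bin/env bash
# Extract the address argument of every futex(..., FUTEX_WAIT*, ...) call
# from strace output (files given as arguments, or stdin) and count how
# often each address occurs.
futex_wait_addrs() {
    sed -n 's/.*futex(\(0x[0-9a-fA-F]*\), FUTEX_WAIT[A-Z_]*,.*/\1/p' "$@" \
        | sort | uniq -c
}

# Example: futex_wait_addrs strace.16040.log strace.16041.log
```

A single address in the result means every traced thread is parked on the same futex word.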

The pstack output for each of the four threads is as follows:

root@CenterSvr_LiftTest:~# pstack 16040

16040: ./family_mgt_svr.bin -s family_mgt_svr.xml
(No symbols found in )
(No symbols found in /lib/x86_64-linux-gnu/libdl.so.2)
(No symbols found in /lib/x86_64-linux-gnu/librt.so.1)
(No symbols found in /lib/x86_64-linux-gnu/libz.so.1)
(No symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
(No symbols found in /lib/x86_64-linux-gnu/libm.so.6)
(No symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)
(No symbols found in /lib/x86_64-linux-gnu/libc.so.6)
(No symbols found in /lib64/ld-linux-x86-64.so.2)
0x7ffaf019a26d: _fini + 0x7ffaefa7adcd (2577760, 7ffae3ffdc10)
0x005d3604: _ZN4base18Thread_Mutex_GuardC2ERNS_12Thread_MutexE + 0x2a (567bf8, ce4d00009b0, 7ffac8000c70, 257cf50, 2577760, 4a0691) + 10
0x0058f28c: _ZN12CommTimerMgt3addERKSsjPFiS1_E + 0x32 (7ffac8000c70, 7ffae3ffdcf0, 7ffae3ffdd90, 0, 7ffadc008030, 7ffadc007b80) + 290
0x00487581: _ZN16GetFamilyHandler7processEv + 0xf7b (7ffae3ffe800, 7ffae3ffe7f0, 7ffae3ffe7e0, 7ffae3ffe7d8, 7ffae3ffe7d0, 7ffae3ffe850) + fa0
0x004ae92e: _ZN9Processor3svcEv + 0x3dce (0, 7ffc5df6bc10, 0, 7ffc5df6bc10)
0x005d3757: _ZN4base11thread_procEPv + 0xd7 (0, 7ffae3fff700, 7ffae3fff700, 378e9993de61449d, 0, 7ffc5df6b9ff) + ffff80051c0010b0


root@CenterSvr_LiftTest:~# pstack 16041

16041: ./family_mgt_svr.bin -s family_mgt_svr.xml
(No symbols found in )
(No symbols found in /lib/x86_64-linux-gnu/libdl.so.2)
(No symbols found in /lib/x86_64-linux-gnu/librt.so.1)
(No symbols found in /lib/x86_64-linux-gnu/libz.so.1)
(No symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
(No symbols found in /lib/x86_64-linux-gnu/libm.so.6)
(No symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)
(No symbols found in /lib/x86_64-linux-gnu/libc.so.6)
(No symbols found in /lib64/ld-linux-x86-64.so.2)
0x7ffaf019a26d: _fini + 0x7ffaefa7adcd (2577760, 7ffae96ebd80)
0x005d3604: _ZN4base18Thread_Mutex_GuardC2ERNS_12Thread_MutexE + 0x2a (567bf8, ce4d00009b0, 7ffac8001e30, 257cf50, 2577760, 4a0691) + 10
0x0058f28c: _ZN12CommTimerMgt3addERKSsjPFiS1_E + 0x32 (7ffac8001e30, 7ffae96ebe00, 7ffae96ebe60, 7ffacc001a10, 7ffae96ebf00, 7ffac8001e30) + 120
0x0048b1b4: _ZN20GetMemberListHandler7processEv + 0x400 (7ffae96ec800, 7ffae96ec7f0, 7ffae96ec7e0, 7ffae96ec7d8, 7ffae96ec7d0, 7ffae96ec850) + fa0
0x004afa15: _ZN9Processor3svcEv + 0x4eb5 (0, 7ffc5df6bc10, 0, 7ffc5df6bc10)
0x005d3757: _ZN4base11thread_procEPv + 0xd7 (0, 7ffae96ed700, 7ffae96ed700, 378e9993de61449d, 0, 7ffc5df6b9ff) + ffff8005169130b0


root@CenterSvr_LiftTest:~# pstack 16042

16042: ./family_mgt_svr.bin -s family_mgt_svr.xml
(No symbols found in )
(No symbols found in /lib/x86_64-linux-gnu/libdl.so.2)
(No symbols found in /lib/x86_64-linux-gnu/librt.so.1)
(No symbols found in /lib/x86_64-linux-gnu/libz.so.1)
(No symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
(No symbols found in /lib/x86_64-linux-gnu/libm.so.6)
(No symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)
(No symbols found in /lib/x86_64-linux-gnu/libc.so.6)
(No symbols found in /lib64/ld-linux-x86-64.so.2)
0x7ffaf019a26d: _fini + 0x7ffaefa7adcd (2577760, 7ffae8eeac40)
0x005d3604: _ZN4base18Thread_Mutex_GuardC2ERNS_12Thread_MutexE + 0x2a (567bf8, ce4d00009b0, 7ffac8001090, 257cf50, 2577760, 4a0691) + 10
0x0058f28c: _ZN12CommTimerMgt3addERKSsjPFiS1_E + 0x32 (7ffac8001090, 7ffae8eead00, 7ffae8eeada0, 7ffad4001420, 0, 0) + 260
0x004894a3: _ZN20GetFamilyListHandler7processEv + 0xd7f (7ffae8eeb800, 7ffae8eeb7f0, 7ffae8eeb7e0, 7ffae8eeb7d8, 7ffae8eeb7d0, 7ffae8eeb850) + fa0
0x004aea7e: _ZN9Processor3svcEv + 0x3f1e (0, 7ffc5df6bc10, 0, 7ffc5df6bc10)
0x005d3757: _ZN4base11thread_procEPv + 0xd7 (0, 7ffae8eec700, 7ffae8eec700, 378e9993de61449d, 0, 7ffc5df6b9ff) + ffff8005171140b0


root@CenterSvr_LiftTest:~# pstack 16043

16043: ./family_mgt_svr.bin -s family_mgt_svr.xml
(No symbols found in )
(No symbols found in /lib/x86_64-linux-gnu/libdl.so.2)
(No symbols found in /lib/x86_64-linux-gnu/librt.so.1)
(No symbols found in /lib/x86_64-linux-gnu/libz.so.1)
(No symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
(No symbols found in /lib/x86_64-linux-gnu/libm.so.6)
(No symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)
(No symbols found in /lib/x86_64-linux-gnu/libc.so.6)
(No symbols found in /lib64/ld-linux-x86-64.so.2)
0x7ffaf019a26d: _fini + 0x7ffaefa7adcd (2577760, 7ffae37fcd80)
0x005d3604: _ZN4base18Thread_Mutex_GuardC2ERNS_12Thread_MutexE + 0x2a (567bf8, ce4d00009b0, 7ffac8001160, 257cf50, 2577760, 4a0691) + 10
0x0058f28c: _ZN12CommTimerMgt3addERKSsjPFiS1_E + 0x32 (7ffac8001160, 7ffae37fce00, 7ffae37fce60, 7ffad0018c50, 7ffae37fcf00, 7ffac8001160) + 120
0x0048b1b4: _ZN20GetMemberListHandler7processEv + 0x400 (7ffae37fd800, 7ffae37fd7f0, 7ffae37fd7e0, 7ffae37fd7d8, 7ffae37fd7d0, 7ffae37fd850) + fa0
0x004afa15: _ZN9Processor3svcEv + 0x4eb5 (0, 7ffc5df6bc10, 0, 7ffc5df6bc10)
0x005d3757: _ZN4base11thread_procEPv + 0xd7 (0, 7ffae37fe700, 7ffae37fe700, 378e9993de61449d, 0, 7ffc5df6b9ff) + ffff80051c8020b0

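From the pstack output we can conclude that the main worker threads are deadlocked, and that it happens after they enter CommTimerMgt::add (mangled as _ZN12CommTimerMgt3addERKSsjPFiS1_E). All four stacks go from there into the base::Thread_Mutex_Guard constructor, and the innermost frame of each carries the argument 2577760, the same address as the futex in the strace output.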
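
The frames in the pstack output are mangled C++ names. Assuming c++filt from binutils is installed, they can be decoded into readable signatures; demangle below is a hypothetical wrapper, not part of the original investigation:

```shell
#!/usr/bin/env bash
# Demangle one or more C++ symbol names with c++filt (part of binutils).
demangle() {
    printf '%s\n' "$@" | c++filt
}

# Example:
#   demangle _ZN12CommTimerMgt3addERKSsjPFiS1_E \
#            _ZN4base18Thread_Mutex_GuardC2ERNS_12Thread_MutexE
```

The two symbols decode to CommTimerMgt::add, taking a string, an unsigned int and a callback, and to the base::Thread_Mutex_Guard constructor taking a base::Thread_Mutex reference. Because c++filt also rewrites mangled names embedded in ordinary text, piping the whole dump through it (pstack 16040 | c++filt) works as well.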

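4. Analyze with a core dump
gdb -p pid (attach to the live process, since it has not crashed)
generate-core-file inside gdb writes a core file and leaves the process running; alternatively, kill -6 pid sends SIGABRT, which terminates the process and dumps core if core dumps are enabled
info threads lists all threads and their IDs. From the analysis above we already have a rough idea of which threads are stuck.
thread thread_no switches to the corresponding thread
bt prints its call stack
At this point the problem can essentially be pinpointed.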
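
The same gdb session can be run non-interactively. This is a hedged sketch: dump_deadlock_info is a hypothetical helper, while -p, -batch and -ex are standard gdb options. The last command assumes the lock is a glibc pthread_mutex_t on x86-64 located at the futex address, in which case the third 32-bit word printed is the thread ID (LWP) of the current owner. The source does not confirm how base::Thread_Mutex is implemented, so treat that part as an assumption.

```shell
#!/usr/bin/env bash
# Attach to a live process, print every thread's backtrace and dump the
# first four 32-bit words at the lock address, then detach again.
# Assumption: the lock address is a glibc pthread_mutex_t on x86-64, whose
# layout starts with __lock, __count, __owner, __nusers.
dump_deadlock_info() {
    local pid=$1 lock_addr=$2
    gdb -p "$pid" -batch \
        -ex 'info threads' \
        -ex 'thread apply all bt' \
        -ex "x/4dw $lock_addr"
}

# Example: dump_deadlock_info 16207 0x2577760
```

If the owner's LWP appears in the info threads listing, its backtrace shows where the lock is being held; if it does not appear, the owning thread may have exited without releasing the lock.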
