Tracking down a core dump in model prediction

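A core dump does not always happen in the main thread, so in gdb you have to switch to the thread that crashed before looking at its frames.
1. Note the thread ID printed with the error (140334385465088)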

Thread [140334385465088] Forwarding right_sbinary_fc,
*** Aborted at 1545984713 (unix time) try "date -d @1545984713" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x7fa1f01da600) received by PID 7017 (TID 0x7fa225320700) from PID 18446744073443059200; stack trace: ***
    @     0x7fa23530d160 (unknown)
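The log prints the thread ID in decimal, while gdb shows thread IDs in hexadecimal, so the two cannot be compared directly. Here is a minimal sketch of pulling the ID out of the log and converting it. The function names and regular expressions are illustrative assumptions based only on the two log lines shown above, not part of any logging library.

```python
import re


def thread_id_to_hex(log_line: str) -> str:
    """Return the decimal ID from a 'Thread [N] ...' log line as a hex string."""
    match = re.search(r"Thread \[(\d+)\]", log_line)
    if match is None:
        raise ValueError("no 'Thread [<decimal id>]' marker in log line")
    return hex(int(match.group(1)))


def crashed_tid(signal_line: str) -> str:
    """Return the hex TID from a '*** SIGSEGV ... (TID 0x...) ...' line."""
    match = re.search(r"\(TID (0x[0-9a-fA-F]+)\)", signal_line)
    if match is None:
        raise ValueError("no '(TID 0x...)' marker in signal line")
    return match.group(1).lower()


# Lines copied from the log above.
log_line = "Thread [140334385465088] Forwarding right_sbinary_fc,"
signal_line = (
    "*** SIGSEGV (@0x7fa1f01da600) received by PID 7017 "
    "(TID 0x7fa225320700) from PID 18446744073443059200; stack trace: ***"
)

print(thread_id_to_hex(log_line))  # 0x7fa225320700
print(crashed_tid(signal_line))    # 0x7fa225320700
```

Both lines give the same value here, so the thread that logged "Forwarding right_sbinary_fc" is the one that received SIGSEGV.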

2. In gdb, run info threads to list all threads

(gdb) info threads
  Id   Target Id         Frame
  23   Thread 0x7fa11ce49700 (LWP 10682) 0x00007fa23215afc9 in syscall () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  22   Thread 0x7fa217fff700 (LWP 7031) 0x00007fa23215f753 in epoll_wait () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  21   Thread 0x7fa226722700 (LWP 7028) 0x00007fa2320f4414 in malloc () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  20   Thread 0x7fa22bfff700 (LWP 7025) 0x00007fa23215afc9 in syscall () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  19   Thread 0x7fa228525700 (LWP 7024) 0x00007fa23530979e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /opt/compiler/gcc-4.8.2/lib/libpthread.so.0
  18   Thread 0x7fa2157fb700 (LWP 7038) 0x00007fa23212e9cd in nanosleep () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  17   Thread 0x7fa22a493700 (LWP 7021) 0x00007fa23530979e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /opt/compiler/gcc-4.8.2/lib/libpthread.so.0
  16   Thread 0x7fa11d84a700 (LWP 10681) 0x00007fa23215afc9 in syscall () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  15   Thread 0x7fa230c92700 (LWP 7019) 0x00007fa23530979e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /opt/compiler/gcc-4.8.2/lib/libpthread.so.0
  14   Thread 0x7fa229a92700 (LWP 7022) 0x00007fa23530979e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /opt/compiler/gcc-4.8.2/lib/libpthread.so.0
  13   Thread 0x7fa228f26700 (LWP 7023) 0x00007fa23530979e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /opt/compiler/gcc-4.8.2/lib/libpthread.so.0
  12   Thread 0x7fa216bfd700 (LWP 7033) 0x00007fa2320f1fe7 in _int_malloc () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  11   Thread 0x7fa22afff700 (LWP 7020) 0x00007fa23530979e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /opt/compiler/gcc-4.8.2/lib/libpthread.so.0
  10   Thread 0x7fa231e69700 (LWP 7018) 0x00007fa23212e9cd in nanosleep () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  9    Thread 0x7fa225320700 (LWP 7030) 0x00007fa2320ad6a0 in sigprocmask () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  8    Thread 0x7fa227123700 (LWP 7027) 0x00007fa23215afc9 in syscall () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  7    Thread 0x7fa225d21700 (LWP 7029) 0x00007fa23215afc9 in syscall () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  6    Thread 0x7fa227b24700 (LWP 7026) ul_utf8_to_gbk (pDest=0x7fa0e090804a "", nDest=32768,
    pSrc=0x7fa1187c3ee8 "{\"keywords\":[],\"title\":\"\346\242\246\345\271\273\350\245\277\346\270\270\357\274\232\344\270\244\344\270\252\345\233\233\346\212\200\350\203\275\345\256\235\345\256\235\345\220\210\345\256\240\357\274\214\345\261\205\347\204\266\345\220\210\345\207\272\346\236\201\345\223\201\357\274\214\347\275\221\345\217\213\357\274\232\347\213\227\346\211\230\346\227\240\347\226\221\357\274\201\",\"fircate\":\"\346\270\270\346\210\217\",\"cls\":[[\"1116\",0.867,\"1116@\"],[\"111\",0.1,\"111@\"],[\"88\",0.028,\"88@"...) at ul_gbk.inl:7757
  5    Thread 0x7fa09f61e700 (LWP 10683) 0x00007fa2320f4414 in malloc () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  4    Thread 0x7fa2175fe700 (LWP 7032) 0x00007fa2320f0e35 in malloc_consolidate () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  3    Thread 0x7fa2161fc700 (LWP 7034) 0x00000000004cd33a in operator() (this=0x7fa20c01eb00, __den=47, __num=16758873455973868894)
    at /home/opt/gcc-4.8.2.bpkg-r4/gcc-4.8.2.bpkg-r4/include/c++/4.8.2/bits/hashtable_policy.h:345
  2    Thread 0x7fa0a761e700 (LWP 10684) _M_dispose (__a=..., this=0x7fa0960060f0)
    at /home/opt/gcc-4.8.2.bpkg-r4/gcc-4.8.2.bpkg-r4/include/c++/4.8.2/bits/basic_string.h:245
* 1    Thread 0x7fa235731a40 (LWP 7017) 0x00007fa23212e9cd in nanosleep () from /opt/compiler/gcc-4.8.2/lib/libc.so.6

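3. Convert the decimal thread ID to hexadecimal: 0x7fa225320700 (the same value appears as the TID in the SIGSEGV line)
4. Find the gdb thread Id whose Target Id shows that value, enter thread {id} in gdb, then run bt to inspect the stack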

(gdb) thread 9
[Switching to thread 9 (Thread 0x7fa225320700 (LWP 7030))]
#0  0x00007fa2320ad6a0 in sigprocmask () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
(gdb) bt
#0  0x00007fa2320ad6a0 in sigprocmask () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#1  0x00007fa23212e7e9 in sleep () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#2  0x00007fa232ccd61a in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#3  <signal handler called>
#4  0x00007fa2320ad3f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#5  0x00007fa2320ae7d8 in abort () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#6  0x00007fa232cce31b in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#7  0x00007fa232ccecda in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#8  0x00007fa232ccf3a2 in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#9  0x00007fa232ccd35b in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#10 0x00007fa232ccdbd7 in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#11 <signal handler called>
#12 0x00007fa232f8d3e7 in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#13 0x00007fa232f857a4 in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#14 0x00007fa232f7c5a3 in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#15 0x00007fa232d9df66 in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#16 0x00007fa232e27a69 in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#17 0x00007fa232cbf8e6 in ?? () from /home/work/users/xushuda/video-middlepage-online/rank_run_time/new_recomm_online/bin/libpaddle_capi_shared.so
#18 0x00000000004e4e96 in RankDnnBatchDurV4::batch_predict (this=0x7fa1ec133c40, uid=2499685119, user=..., source_post=..., target_posts=..., ms_input=..., results=...)
    at baidu/tb-recom/recomm-middlepage/src/server/base/hot_object/RankDnnBatchDurV4.cpp:2082
#19 0x0000000000571a6c in recomm_middlepage::DnnModelControlGroup::Predict (this=<optimized out>, ms_data=<optimized out>)
    at baidu/tb-recom/recomm-middlepage/src/server/model/DnnModelControlGroup.cpp:26
#20 0x000000000062ba05 in recomm_middlepage::MultiModel::Run (this=0x7fa0f76097b0) at baidu/tb-recom/recomm-middlepage/src/server/process/video/4rank/MultiModel.cpp:38
#21 0x000000000065dc21 in RunTask (arg=0x7fa0f7710090) at baidu/tb-recom/recomm-baselib/src/recom/base/task_pool.cpp:164
#22 0x000000000065e395 in RunTaskOnSignleThread (arg=0x7fa079c513d0) at baidu/tb-recom/recomm-baselib/src/recom/base/task_pool.cpp:206
#23 0x000000000089ae3a in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at baidu/base/bthread/bthread/task_group.cpp:290
#24 0x0000000000892231 in bthread_make_fcontext ()
Backtrace stopped: Cannot access memory at address 0x7fa0d8ece000
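
With a few dozen threads, scanning the info threads listing by eye is error-prone. The lookup can be scripted if the listing is saved as text. The sketch below assumes each row has the layout shown in this post (optional *, gdb Id, then "Thread 0x... (LWP n)"); the helper name find_gdb_thread_id and the regular expression are illustrative, and other gdb versions or targets may format rows differently.

```python
import re
from typing import Optional

# One row of `info threads`: optional '*', gdb Id, "Thread 0x... (LWP n)".
ROW = re.compile(r"^\*?\s*(\d+)\s+Thread (0x[0-9a-fA-F]+) \(LWP (\d+)\)")


def find_gdb_thread_id(listing: str, thread_hex: str) -> Optional[int]:
    """Return the gdb Id of the row whose thread ID equals thread_hex."""
    target = int(thread_hex, 16)
    for line in listing.splitlines():
        match = ROW.match(line.strip())
        # Header and wrapped continuation lines do not match and are skipped.
        if match and int(match.group(2), 16) == target:
            return int(match.group(1))
    return None


# A few rows copied from the listing in step 2.
listing = """\
  Id   Target Id         Frame
  10   Thread 0x7fa231e69700 (LWP 7018) 0x00007fa23212e9cd in nanosleep () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
  9    Thread 0x7fa225320700 (LWP 7030) 0x00007fa2320ad6a0 in sigprocmask () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
* 1    Thread 0x7fa235731a40 (LWP 7017) 0x00007fa23212e9cd in nanosleep () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
"""

gdb_id = find_gdb_thread_id(listing, hex(140334385465088))
print(f"thread {gdb_id}")  # the command to type in gdb
```

Comparing the values as integers rather than as strings avoids mismatches caused by letter case or leading zeros in the hex form.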
