How to analyse a core dump on AIX with dbx

This article explains how to analyse a core dump on AIX with dbx.

We have two problems with Totalview (TV) when analysing core dumps:

  • The stack trace it displays for Visibroker is not relevant.
  • TV cannot change the path to the loaded libraries used for the core dump analysis.

This article explains:

  1. How to start dbx.
  2. How to change the path to the loaded libraries with dbx.
  3. A small set of commands for dbx.
  4. What to do if dbx has trouble displaying the stack.
  5. Where to find the documentation for dbx.
  6. How to automate the core analysis.

1/ How to start dbx

Simply run "dbx ./myProgram ./myCoreFile", giving the executable first and the core file second.

2/ How to change the path to the loaded libraries with dbx

On AIX, dbx accepts the "-p" flag:

-p oldpath=newpath:...| pathfile
Specifies a substitution for library paths when examining core files in the format oldpath=newpath. oldpath specifies the value to be substituted (as stored in the core file) and newpath specifies what it is to be replaced with. These may be complete or partial, relative or absolute paths. Multiple substitutions may be specified, separated by colons. Alternatively, the -p flag may specify the name of a file from which mappings in the previously described format are to be read. Only one mapping per line is allowed when mappings are read from a file.

Example :  dbx -p /soft=/users/username ./myProgram ./myCoreFile
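
When several library directories have moved, the file form of -p described above may be easier to maintain than a long colon-separated list. The following is a minimal sketch, not a definitive recipe: the file name dbx_paths.map and the second mapping are hypothetical, and only the /soft=/users/username pair comes from the example above.

```shell
#!/bin/sh
# Sketch: keep the -p substitutions in a file, one oldpath=newpath per line.
# MAPFILE and the second mapping are hypothetical; adjust them to your site.
MAPFILE=./dbx_paths.map

cat > "$MAPFILE" <<'EOF'
/soft=/users/username
/opt/oldlibs=/users/username/libs
EOF

# Print the command to run rather than starting dbx from this script.
echo "dbx -p $MAPFILE ./myProgram ./myCoreFile"
```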

3/ A small set of commands for dbx

corefile
        displays high-level data about a core file
where
        stack trace (defaults to the faulting thread)
proc
        traits of the process when it dumped core
thread
        pthreads data
kthread
        information about kernel threads
fd
        file descriptors at the time of the dump
map
        shows which modules were loaded at the time of the dump
help
        help information
up
        go up one level in the stack
down
        go down one level in the stack
print
        display the contents of one variable

example :

 > dbx DEBUG/server_d core_spare
Type 'help' for help.
[using memory image in core_spare]
reading symbolic information ...warning: sep_version_info.cxx is newer than /users/username/xyz/shlib/libProcessAdapter_ss_d.so

Segmentation fault in _event_sleep at 0x9000000001677dc ($t14)
0x9000000001677dc (_event_sleep+0x108) e8410028         ld   r2,0x28(r1)
(dbx) corefile
 Process Name:  DEBUG/server_d
 Version:       500
 Flags:         FULL_CORE | CORE_VERSION_1 | MSTS_VALID | UBLOCK_VALID | USTACK_VALID | LE_VALID
 Signal:        SEGV
 Process Mode:  64 bit
(dbx) thread
 thread  state-k     wchan  state-u    k-tid    mode held scope function
 $t1     run                          running  2236513   u   no   pro      _p_nsleep        
 $t2     run                          blocked  4874355   u   no   pro      _event_sleep     
 $t3     run                          blocked  4923583   u   no   pro      _event_sleep     
 $t4     run                          running  5263371   u   no   pro      poll             
 $t5     run                          running  2875597   u   no   pro      poll             
 $t6     run                          blocked                 u   no   pro      _usched_swtch    
 $t7     run                          running  3203225   u   no   pro      __fd_select      
 $t8     run                          blocked                 u   no   pro      _usched_swtch    
>$t14    run                        blocked  5316739   k   no   pro      _event_sleep     
 $t10    run                         blocked  1404935   u   no   pro      _event_sleep     
 $t11    run                         blocked  1835179   u   no   pro      _event_sleep     
 $t12    run                         blocked  4440309   u   no   pro      _event_sleep     
 $t13    run                         running  1790123   u   no   pro      poll             
 $t15    run                         terminated              u   no   pro                   
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0x9000000001677dc
_p_sigtimedwait(??, ??, ??) at 0x90000000016c7a0
pth_signal.sigwait(??, ??) at 0x90000000016d7e4
unnamed block in SignalHandlerImpl(void*)(0x9001000a1c671f8), line 96 in "SigHandler_AIX.cxx"
SignalHandlerImpl(void*)(0x9001000a1c671f8), line 96 in "SigHandler_AIX.cxx"
unnamed block in invoke_i()(0x1100c3e50), line 150 in "Thread_Adapter.cpp"
invoke_i()(0x1100c3e50), line 150 in "Thread_Adapter.cpp"
invoke()(0x1100c3e50), line 94 in "Thread_Adapter.cpp"
ace_thread_adapter(0x1100c3e50), line 132 in "Base_Thread_Adapter.cpp"
(dbx) thread current 1
(dbx) where
_p_nsleep(??, ??) at 0x90000000016cd58
raise.nsleep(??, ??) at 0x9000000002cb49c
nanosleep(??, ??) at 0x9000000002faabc
OS_NS_unistd.sleep(unsigned int)(0x493e0000493e0), line 1093 in "OS_NS_unistd.inl"
unnamed block in run()(0x110002850), line 42 in "server.cxx"
run()(0x110002850), line 42 in "server.cxx"
run()(0x11009c310), line 690 in "ProcessGuts.cxx"
unnamed block in processMain(int,char**)(0x11009c310, 0x300000003, 0xfffffffffffeca0), line 1058 in "ProcessGuts.cxx"
processMain(int,char**)(0x11009c310, 0x300000003, 0xfffffffffffeca0), line 1058 in "ProcessGuts.cxx"
main2(int,char**)(argc = 3, argv = 0x0fffffffffffeca0), line 80 in "main.cxx"
main(argc = 3, argv = 0x0fffffffffffeca0), line 90 in "main.cxx"
(dbx) thread
 thread  state-k     wchan            state-u    k-tid mode held scope function
>$t1     run                          running  2236513   u   no   pro  _p_nsleep        
 $t2     run                          blocked  4874355   u   no   pro  _event_sleep     
 $t3     run                          blocked  4923583   u   no   pro  _event_sleep     
 $t4     run                          running  5263371   u   no   pro  poll             
 $t5     run                          running  2875597   u   no   pro  poll             
 $t6     run                          blocked           u   no   pro  _usched_swtch    
 $t7     run                          running  3203225   u   no   pro  __fd_select      
 $t8     run                          blocked           u   no   pro  _usched_swtch    
*$t14    run                          blocked  5316739   k   no   pro  _event_sleep     
 $t10    run                          blocked  1404935   u   no   pro  _event_sleep     
 $t11    run                          blocked  1835179   u   no   pro  _event_sleep     
 $t12    run                          blocked  4440309   u   no   pro  _event_sleep     
 $t13    run                          running  1790123   u   no   pro  poll             
 $t15    run                          terminated          u   no   pro  
(dbx) up
ProcessGuts.append(const char*,unsigned long)(0x110eda470, 0x1102a0930, 0x24), line 1062 in "cstring.h"
(dbx) print xstr
    Object:(_guts = (nil))
    SEPCString:(data_ = "/hedevecs01/SEPxxx/Main             ")
()
(dbx)
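
The first few subcommands of such a session tend to be the same for every core, so they can be kept in a small command file and passed to dbx with -c. This is only a sketch: the file name firstlook.dbx is hypothetical, and the selection of subcommands is one possible choice from the list in this section.

```shell
#!/bin/sh
# Sketch: collect some first-look subcommands in one dbx command file.
# FIRSTLOOK is a hypothetical file name.
FIRSTLOOK=./firstlook.dbx

cat > "$FIRSTLOOK" <<'EOF'
corefile
where
thread
map
EOF

# Print the command to run rather than starting dbx from this script.
echo "dbx -c $FIRSTLOOK ./myProgram ./myCoreFile"
```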

4/ If you have trouble displaying the stack

If dbx prints a message like the following:

warning: cannot open /soft/nsmsoft/nsm1/CCTServer/current/shlib/5/libSEP_ss.so(libSEP_ss.o)

use the -p option (see point 2 of this article) so that dbx can find the library at its new location.
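
To see which directories need a -p mapping, the warnings can be reduced to a list of unique directories. The function below is a sketch under the assumption that every such warning has exactly the form quoted above; the name missing_lib_dirs is hypothetical, and the dbx messages are expected on standard input (for example from a saved copy of the session).

```shell
#!/bin/sh
# Sketch: list the directories from which dbx could not open libraries.
# Reads dbx messages on stdin; assumes warnings look like
#   warning: cannot open /some/dir/libX.so(libX.o)
missing_lib_dirs() {
    # Keep everything between "cannot open " and the last "/", then dedupe.
    sed -n 's|^warning: cannot open \(.*\)/[^/]*$|\1|p' | sort -u
}
```

Each directory printed is a candidate for the oldpath side of an oldpath=newpath substitution.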

5/ Where to find more explanations for dbx

Simply run "man dbx" on the machine.

6/ Automate core analysis

a) Install the application that crashed, with all the required libraries:

 
helabct05-operator% ls -l /soft/nsmsoft/nsm2/epm_light/2.0STD9-A5.3/*

 
/soft/nsmsoft/nsm2/epm_light/2.0STD9-A5.3/bin:
-rwxr--r--   1 operator users        613564 May 30 10:08 EPMLight

 
/soft/nsmsoft/nsm2/epm_light/2.0STD9-A5.3/param:
-rw-r--r--   1 operator users      55866955 Nov 23 11:58 core
 
/soft/nsmsoft/nsm2/epm_light/2.0STD9-A5.3/shlib:
lrwxrwxrwx   1 operator users             8 Nov 23 11:56 1 -> ../shlib
lrwxrwxrwx   1 operator users            24 Nov 23 11:56 2 -> /soft/nsmsoft/nsm2/shlib
lrwxrwxrwx   1 operator users             1 Nov 23 11:56 3 -> .
lrwxrwxrwx   1 operator users            45 Nov 23 11:56 4 -> /soft/local/common/sep/2.4STD13-A5.3/vb_shlib
lrwxrwxrwx   1 operator users            42 Nov 23 11:56 5 -> /soft/local/common/sep/2.4STD13-A5.3/shlib
lrwxrwxrwx   1 operator users            40 Nov 23 11:56 6 -> /soft/local/common/osagent/current/shlib
lrwxrwxrwx   1 operator users             8 Nov 23 11:59 7 -> /usr/lib
-rw-r--r--   1 operator users         51975 May 30 10:08 libMOB_EPMEvents.so
-rw-r--r--   1 operator users        317046 May 30 10:08 libMOB_EPM_ss.so
 

b) Go into the param directory and start dbx to find out how many threads were active at crash time

 
helabct05-operator% dbx ../bin/EPMLight core
(dbx) thread
... --> lists the threads
(dbx) quit
 

c) Generate a dbx script that displays every thread in the core (adjust the bound of 95 to the thread list found in step b)

 
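#!/bin/ksh
# Build a dbx command file that prints the stack of every thread.
# Start from an empty SCRIPT file.
rm -f SCRIPT
touch SCRIPT

I=1
# 95 is one more than the highest thread number to dump.
while [ $I -lt 95 ]
do
   echo "print \"\"" >> SCRIPT             # blank separator line
   echo "print \"THREAD $I\"" >> SCRIPT    # label for the stack that follows
   echo "thread current $I" >> SCRIPT      # switch to thread $I
   echo "where" >> SCRIPT                  # dump its stack
   let I=$I+1
done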
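
Thread numbers are not always contiguous (the listing in section 3 goes from $t8 to $t14 and has no $t9), so a fixed upper bound can ask dbx for threads that do not exist. As a hedged alternative, the sketch below derives the thread numbers from a saved copy of the "thread" listing. The function name gen_thread_script is hypothetical, and the parsing assumes rows that begin with $tN, optionally preceded by a ">" or "*" marker, as in the listings above.

```shell
#!/bin/sh
# Sketch: build the dbx script from a saved "thread" listing instead of a
# fixed upper bound. Reads the listing on stdin, writes dbx subcommands.
gen_thread_script() {
    # Keep only the number N from rows that start with $tN (after an
    # optional ">" or "*" marker); the header row does not match.
    sed -n 's/^[ >*]*\$t\([0-9][0-9]*\).*/\1/p' |
    while read -r id; do
        printf 'print ""\n'
        printf 'print "THREAD %s"\n' "$id"
        printf 'thread current %s\n' "$id"
        printf 'where\n'
    done
}
```

With the listing from step b saved in a file (say THREADS, a hypothetical name), "gen_thread_script < THREADS > SCRIPT" would produce the same kind of SCRIPT file as the loop above.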
 

d) Execute the dbx script and collect the result in a flat file

 
helabct05-operator% dbx -c SCRIPT ../bin/EPMLight core > RESULT
(dbx) quit
 

This generates a file called RESULT containing the stack of every thread in EPMLight at crash time.
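
With many threads, RESULT gets long, and a one-line-per-thread summary can help to spot the interesting ones. The sketch below prints each thread number with the function at the top of its stack. It assumes that dbx echoes each label as a bare line of the form "THREAD N" and that the first non-empty line after it is the top frame; the function name summarize_result is hypothetical.

```shell
#!/bin/sh
# Sketch: print "thread-number top-frame-function" for each thread in RESULT.
# Assumes each stack follows a line of the form "THREAD N".
summarize_result() {
    awk '
        /^THREAD [0-9]+$/ { id = $2; want = 1; next }
        want && NF > 0 {
            name = $0
            sub(/\(.*/, "", name)   # drop the argument list and location
            print id, name
            want = 0
        }
    '
}
```

It would be used as "summarize_result < RESULT".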

