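Recently I have been chasing some problems whose symptoms were hard to explain at first. The address of a function parameter was passed from inside that function to another function, and by the time it arrived the address had changed. This is what is called a corrupted stack: the function's stack has been damaged, so the function can no longer be re-entered.
The called function then accessed an illegal address, which caused a segmentation fault and produced a core dump file. The problem was rather tricky.
After reading some material, I decided to start with the compiler's stack-protection options and then move on to gdb.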
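1) Add compile options when building
-fstack-protector and -fstack-protector-all tell the compiler (gcc) to enable stack protection. -fstack-protector only instruments functions the compiler considers vulnerable (for example those with local character buffers), while -fstack-protector-all instruments every function. The guard value is checked when a function returns, so a smashed stack aborts the program at that point and leaves a dump close to the scene, instead of crashing somewhere unrelated later. The options can be added in the Makefile. As an aside, anyone working on open source software really does need to know Makefiles well; when I casually tried to write one for fun, my mind suddenly went blank.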
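To make the idea behind stack protection concrete, here is a minimal sketch in C that imitates the check by hand. It is not what gcc actually emits: the struct fake_frame, the constant CANARY_VALUE and the function copy_and_check are all hypothetical names invented for this illustration. A guard word sits right after a local buffer, and the function checks it after an unchecked copy, much as the instrumented function epilogue would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a local buffer
 * immediately followed by a guard word (the "canary"). */
struct fake_frame {
    char     buf[16];
    uint64_t canary;
};

#define CANARY_VALUE UINT64_C(0xdeadc0de5afe5afe)

/* Copies n bytes from src to the start of the frame without checking n
 * against sizeof buf, which is the bug being simulated.
 * Returns 1 if the canary is intact afterwards, 0 if it was overwritten.
 * Precondition: n <= sizeof(struct fake_frame), so the copy never leaves
 * the struct and the behaviour stays well defined. */
int copy_and_check(const char *src, size_t n)
{
    struct fake_frame frame;

    assert(n <= sizeof frame);
    frame.canary = CANARY_VALUE;

    /* Writes past buf (into the canary) whenever n > sizeof frame.buf. */
    memcpy(&frame, src, n);

    return frame.canary == CANARY_VALUE;
}
```

Copying 16 bytes leaves the guard untouched and the function returns 1; copying 24 bytes overwrites it and the function returns 0. A real build with -fstack-protector-all would abort the process at that point instead of returning a flag.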
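2) gdb's multi-threading features
bt  show the call stack of the current thread
bt full  show the call stack in detail, including the local variables of each frame
info threads  show information about all threads
thread N  switch to thread N
f N  (frame N) select stack frame N of the current thread
i locals  (info locals) show all local variables of the selected frame
i register  (info registers) show the register values for the selected frame, mainly to check addresses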
With the help of these commands we can work out a great deal from a core dump file.
Here is an example:
gdb /lab/testtools/rhel664/dallas/testRelease/R10A06_dynamic_udpport_5/mnsserv/bin/mhlif core-mhlif-18310-1384802382
(gdb) bt
#0 0x0000003383488611 in memcpy () from /lib64/libc.so.6
#1 0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71
(gdb) bt full
#0 0x0000003383488611 in memcpy () from /lib64/libc.so.6
No symbol table info available.
#1 0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71
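Generally speaking, bt full is not much help here. It can show the values of some local variables, but some of those values are unreliable, so we still cannot pinpoint the problem precisely.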
(gdb) info threads
16 Thread 0x2b3151cb7100 (LWP 18310) 0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15 Thread 0x2b315c54b700 (LWP 18428) 0x00000033834df443 in select () from /lib64/libc.so.6
14 Thread 0x2b315c224700 (LWP 18423) 0x00000033834df443 in select () from /lib64/libc.so.6
13 Thread 0x2b31525e5700 (LWP 18422) 0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x2b3151fb0700 (LWP 18313) 0x00000033834df443 in select () from /lib64/libc.so.6
11 Thread 0x2b3194873700 (LWP 18535) 0x00000033834df443 in select () from /lib64/libc.so.6
10 Thread 0x2b319454c700 (LWP 18534) 0x00000033834df443 in select () from /lib64/libc.so.6
9 Thread 0x2b3194225700 (LWP 18533) 0x00000033834df443 in select () from /lib64/libc.so.6
8 Thread 0x2b3188425700 (LWP 18531) 0x00000033834df443 in select () from /lib64/libc.so.6
7 Thread 0x2b3188200700 (LWP 18530) 0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x2b3178602700 (LWP 18529) 0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x2b3178401700 (LWP 18435) 0x00000033834df443 in select () from /lib64/libc.so.6
4 Thread 0x2b3178200700 (LWP 18434) 0x00000033834df443 in select () from /lib64/libc.so.6
3 Thread 0x2b3169f6b700 (LWP 18433) 0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x2b3169d6a700 (LWP 18432) 0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x2b3168200700 (LWP 18429) 0x0000003383488611 in memcpy () from /lib64/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x2b3168200700 (LWP 18429))]#0 0x0000003383488611 in memcpy () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003383488611 in memcpy () from /lib64/libc.so.6
#1 0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71
(gdb) f 1
#1 0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71
(gdb) i register
rax 0x2b0000001197 47278999998871
rbx 0x4fc780004fc71 1403492233444465
rcx 0x7 7
rdx 0x118 280
rsi 0x2b31682a4340 47491200992064
rdi 0x4fc780004fc71 1403492233444465
rbp 0x2b31681ffb80 0x2b31681ffb80
rsp 0x2b31681ffb20 0x2b31681ffb20
r8 0x1c0002000ce527 7881307938678055
r9 0x2b310003517c 47489453609340
r10 0x0 0
r11 0x202 514
r12 0x525a00005259 90546500555353
r13 0x2b3100005330 47489453413168
r14 0x20c49ba5e353f7cf 2361183241434822607
r15 0x2b316c106d70 47491266407792
rip 0x41a9aa 0x41a9aa
eflags 0x10203 [ CF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
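One way to read the register dump above: the msg argument in frame #1 and the registers rbx and rdi all hold 0x4fc780004fc71. Under the System V AMD64 calling convention rdi carries the first argument, which for memcpy is the destination, so this is very likely the address memcpy faulted on. The sketch below is only an illustration, assuming an x86-64 machine with 48-bit virtual addresses; the function names is_canonical48, high32, low32 and check_dump_values are made up for this example. It checks whether a value can be a valid pointer at all and splits it into two 32-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* With 48-bit virtual addresses (an assumption about the machine),
 * bits 63..47 of a valid pointer must be all zeros or all ones. */
int is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 most significant bits */
    return top == 0 || top == UINT64_C(0x1ffff);
}

/* Upper and lower 32-bit halves of a 64-bit value. */
uint32_t high32(uint64_t v) { return (uint32_t)(v >> 32); }
uint32_t low32(uint64_t v)  { return (uint32_t)(v & UINT64_C(0xffffffff)); }

/* Usage: values copied from the register dump. */
void check_dump_values(void)
{
    uint64_t msg = UINT64_C(0x4fc780004fc71);   /* rdi, rbx and msg */
    uint64_t src = UINT64_C(0x2b31682a4340);    /* rsi */

    assert(!is_canonical48(msg));   /* cannot be a valid address */
    assert(is_canonical48(src));    /* looks like a normal heap/mmap address */

    /* The bad "pointer" splits into two small, nearly equal integers. */
    assert(high32(msg) == UINT32_C(0x0004fc78));
    assert(low32(msg)  == UINT32_C(0x0004fc71));
}
```

That the bad value splits into two small, nearly equal 32-bit numbers may suggest that the pointer was overwritten by ordinary data, such as a pair of counters, rather than being a slightly wrong address. This is a guess from the dump, not something the dump proves.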
This only demonstrates some ways of examining a core dump file. In fact, while the process is still alive, we can attach to it directly and analyze the code.
(gdb) attach 2467
Attaching to process 2467
Reading symbols from /root/algorithm/testBh...done.
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/tls/i686/cmov/libm.so.6...Reading symbols from /usr/lib/debug/lib/tls/i686/cmov/libm-2.11.1.so...done.
done.
Loaded symbols for /lib/tls/i686/cmov/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/i686/cmov/libc.so.6...Reading symbols from /usr/lib/debug/lib/tls/i686/cmov/libc-2.11.1.so...done.
done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.11.1.so...done.
done.
Loaded symbols for /lib/ld-linux.so.2
0x005f7422 in __kernel_vsyscall ()
(gdb) break testBh.cc:38
Breakpoint 1 at 0x80488ff: file testBh.cc, line 38.
(gdb) c
Continuing.
These methods let you suspend the process and then single-step through it or print local variables.
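When the interesting code runs too early to attach by hand, a common trick is to make the process wait until a debugger shows up. The following is a minimal sketch of that idea, assuming a POSIX system and a binary built with -g; the names wait_for_debugger, format_attach_hint, pause_for_debugger and check_hint are hypothetical and not taken from the program debugged above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Cleared from inside gdb with:  set var wait_for_debugger = 0
 * volatile so the loop below re-reads it on every iteration. */
volatile int wait_for_debugger = 1;

/* Writes the command for attaching to pid into buf.
 * Returns what snprintf returns (the length the text needs). */
int format_attach_hint(char *buf, size_t size, long pid)
{
    return snprintf(buf, size, "gdb -p %ld", pid);
}

/* Prints the attach command for this process, then sleeps
 * until a debugger clears wait_for_debugger. */
void pause_for_debugger(void)
{
    char hint[64];

    format_attach_hint(hint, sizeof hint, (long)getpid());
    fprintf(stderr, "waiting for debugger, run: %s\n", hint);

    while (wait_for_debugger)
        sleep(1);
}

/* Usage check for the formatting helper only; calling
 * pause_for_debugger() would block until gdb clears the flag. */
void check_hint(void)
{
    char hint[64];

    format_attach_hint(hint, sizeof hint, 2467L);
    assert(strcmp(hint, "gdb -p 2467") == 0);
}
```

Call pause_for_debugger() at the top of the code you want to inspect, attach with the printed command, set breakpoints as in the session above, then run set var wait_for_debugger = 0 followed by c to let the process continue.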