Making good use of GDB to debug stack corruption problems

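Some problems I ran into recently were hard to explain at first. A function passed the address it had received as a parameter on to another function, and by the time it got there the address had changed. This is what is called stack corruption: something has overwritten the function's stack frame, so the function can no longer run correctly or be re-entered.

The called function then accessed an invalid address, which caused a segmentation fault and produced a core dump file. Problems like this are tricky.

After some reading, I decided to start with the compiler's stack protection options and then go through the core dump with gdb.

1) Add compile options at build time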

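-fstack-protector and -fstack-protector-all tell GCC to enable stack protection. The compiler places a guard value (a canary) in the stack frame and checks it when the function returns. If the guard has been overwritten, the program aborts right there, so the failure is caught close to the corruption rather than at some later, unrelated crash. -fstack-protector only instruments functions the compiler considers vulnerable (for example, those with character arrays), while -fstack-protector-all instruments every function. The options can go in the Makefile. As an aside, anyone working on open source software really does need to know Makefiles well. When I sat down to write one from scratch just for practice, my mind went completely blank.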
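To make the guard-value idea concrete, here is a minimal sketch in plain C. It is only a model of the mechanism, not what GCC actually emits: the struct guarded_frame, the constant GUARD_VALUE and the function fill_and_check are all names made up for this illustration, and a real canary is a random value chosen at program start rather than a fixed constant.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed guard value; real canaries are random per process. */
#define GUARD_VALUE 0xC0FFEE11u

/* Stand-in for a stack frame: a buffer followed by a guard word. */
struct guarded_frame {
    char     buf[16];
    uint32_t guard;
};

/*
 * Copy len bytes into the frame without checking len against sizeof(buf),
 * then verify the guard the way a function epilogue would.
 * The copy is clamped to the size of the whole struct so that the demo
 * itself never writes outside the object.
 * Returns 1 if the guard is intact, 0 if it was overwritten.
 */
int fill_and_check(struct guarded_frame *f, const char *src, size_t len)
{
    f->guard = GUARD_VALUE;            /* "prologue": plant the guard */
    if (len > sizeof(*f))
        len = sizeof(*f);
    memcpy(f, src, len);               /* may run past buf into guard */
    return f->guard == GUARD_VALUE;    /* "epilogue": check the guard */
}
```

Copying 16 bytes or fewer leaves the guard intact. Copying 20 bytes of 'A' overwrites it, and the check fails. With the real compiler option the failed check aborts the process instead of returning 0.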

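2) gdb commands for stacks and threads

bt shows the call stack (backtrace) of the current thread

bt full shows the call stack in detail, including the local variables of each frame

info threads lists all threads

thread N switches to a specific thread

f N (short for frame) selects a frame in the call stack

i locals (short for info locals) shows the local variables of the selected frame

i register (short for info registers) shows the register values for the selected frame, which is mainly useful for checking addresses

With these commands we can work out a great deal from a core dump file.

Here is an example: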

gdb /lab/testtools/rhel664/dallas/testRelease/R10A06_dynamic_udpport_5/mnsserv/bin/mhlif core-mhlif-18310-1384802382 

(gdb) bt

#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6

#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 , size=280, 

time=21081) at ltsosdep.c:443

#2  0x000000000041b552 in OSH_ReceiveMsgQMillisec (q=0x647580, msg=0x4fc780004fc71 , 

size=280, time=21081) at ltsosdep.c:1370

#3  0x000000000042d47d in RPS::ReceiveMsg (this=0x2b3100005330, delay=21081) at rps.cc:590

#4  0x000000000042d731 in RPS::Execute (this=0x2b31681ffdf0) at rps.cc:572

#5  0x000000000042dbe8 in StartRps (arg=0x157a680) at rps.cc:181

#6  0x0000003383c077e1 in start_thread () from /lib64/libpthread.so.0

#7  0x00000033834e68ed in clone () from /lib64/libc.so.6

(gdb) bt full

#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6

No symbol table info available.

#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 , size=280, 

time=21081) at ltsosdep.c:443

row = 0x2b31682a433c

answer = LTS_OK

#2  0x000000000041b552 in OSH_ReceiveMsgQMillisec (q=0x647580, msg=0x4fc780004fc71 , 

size=280, time=21081) at ltsosdep.c:1370

No locals.

#3  0x000000000042d47d in RPS::ReceiveMsg (this=0x2b3100005330, delay=21081) at rps.cc:590

rpsMsg = {msgId = 4501, type = 0 '\000', data = {loadReplayReq = {

fileName = "Ú%\004\000\002\000\000\000\001\000\000\000\235ú\004\000tQ\003\000GP\003\000¸U\000\000Oû\004\000pR\000\000\206ü\004\000\bú\004\000ÅS\000\000vR\000\000\067P\003\000fP\003\000 ü\004\000Úü\004\000¢P\003\000ÿT\000\000\vý\004\000²O\003\000Z\002\002\000Nú\004\000+ú\004\000>ú\004\000\233T\000\000íÿ\001\000ÊT\000\000G\001\002\000M\001\002\000Y\003\002\000£ú\004\000\020ú\004\000\032\000\002\000ÎU\000\000x\000\002\000\035\001\002\000K\002\002\000æù\004\000\206S\000\000\071U\000\000\232ü\004\000õP\003\000ë\000\002\000\202S\003\000Ø\000\002\000xú\004\000\201\001\002\000=T\000\000oR\000\000"..., natType = 48 '0', timeStretch = 11057, 

rpsType = 2156588448}, replayConReq = {msIndex = 271834, contextIndex = 2 '\002', resend = 0 '\000', replayId = 0, 

sessionId = 1, sessionTime = 326301, destIp1 = {addr64 = {932690803249524, 1402216627852728}, 

b = "tQ\003\000GP\003\000¸U\000\000Oû\004", addr16 = {20852, 3, 20551, 3, 21944, 0, 64335, 4}, ui = {i1 = 217460, 

i2 = 217159, i3 = 21944, ipv4 = 326479}}, destIp2 = {addr64 = {1403552362680944, 92105573988872}, 

b = "pR\000\000\206ü\004\000\bú\004\000ÅS\000", addr16 = {21104, 0, 64646, 4, 64008, 4, 21445, 0}, ui = {

i1 = 21104, i2 = 326790, i3 = 326152, ipv4 = 21445}}, reqPackets = 21110, timeStretch = 217143, type = 102 'f', 

radiotype = 80 'P', kernelMsId = 933081645382874}, msgQid = {_qId = 0x2000425da}, payloadPropReq = {

payloadPropId = 271834, groupId = 2 '\002', msgLength = 0, userBw = 1}, connectionReq = {msIndex = 271834, 

contextIndex = 2 '\002', payloadPropId = 0, sessionId = 1, addresses = {GiIpAddr = {addr64 = {932690803249524, 

1402216627852728}, b = "tQ\003\000GP\003\000¸U\000\000Oû\004", addr16 = {20852, 3, 20551, 3, 21944, 0, 64335, 

4}, ui = {i1 = 217460, i2 = 217159, i3 = 21944, ipv4 = 326479}}, msPortNo = 21104, GiPortNo = 0}, 

reqPackets = 326152, initiator = 197 'Å', type = 83 'S', radiotype = 0 '\000', kernelMsId = 932622083576438}, 

rpsDeactReq = {msIndex = 271834, contextIndex = 2 '\002', sendMhlResponse = LTS_TRUE, sessionId = {326301, 217460, 

217159, 21944, 326479, 21104, 326790, 326152, 21445, 21110, 217143, 217190, 326688, 326874, 217250, 21759, 326923, 

217010, 131674, 326222}, pdpcontextId = 326187, sessionnum = 62 '>'}, moveUpdateDataReq = {msIndex = 271834, 

toDevice = 2 '\002', moveIndex = 1, status = 326301}, suspendResumeReq = {msIndex = 271834, sessionId = {2, 1, 

326301, 217460, 217159, 21944, 326479, 21104, 326790, 326152, 21445, 21110, 217143, 217190, 326688, 326874, 

217250, 21759, 326923, 217010}, sessionnum = 90 'Z', contextIndex = 2 '\002'}, rabCreateReleaseReq = {

msIndex = 271834, contextId = 2 '\002'}, peMoveResp = {msIndex = 271834, toDevice = 2 '\002', moveIndex = 1, 

peIndex = 326301, status = 217460}, scalePayloadReq = {scaleFactor = 271834}, magQid = {_qId = 0x2000425da}}}

count =

#4  0x000000000042d731 in RPS::Execute (this=0x2b31681ffdf0) at rps.cc:572

nowTime = 12394937602

nextTime =

count = 844209533

entry =

pEngine = 0x2b3168518270

#5  0x000000000042dbe8 in StartRps (arg=0x157a680) at rps.cc:181

Rps = {mhlifQId = {_qId = 0x2b315c225000}, magifQId = {_qId = 0x2b319454d000}, initQId = {_qId = 0x2b315c225000}, 

mDeviceNo = 12, mRpsState = RPS_RUNNING_STATE, sessionRepository = {rpsSessionPolymer = {buckets = 100001, 

hash_func = 0x42a5c0 , p_dataRepository = 0x2b316c001070}}, 

log = @0x1538040, apnDev = 10, vpReplayStore = std::vector of length 0, capacity 0, 

mpAlreadyLoaded = std::map with 0 elements}

#6  0x0000003383c077e1 in start_thread () from /lib64/libpthread.so.0

No symbol table info available.

#7  0x00000033834e68ed in clone () from /lib64/libc.so.6

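Generally bt full does not add much beyond bt, but it does show the values of local variables. Some of those values are unreliable, though, because the stack is already corrupted here, so we still cannot pinpoint the problem precisely.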

(gdb) info threads

16 Thread 0x2b3151cb7100 (LWP 18310)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

15 Thread 0x2b315c54b700 (LWP 18428)  0x00000033834df443 in select () from /lib64/libc.so.6

14 Thread 0x2b315c224700 (LWP 18423)  0x00000033834df443 in select () from /lib64/libc.so.6

13 Thread 0x2b31525e5700 (LWP 18422)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

12 Thread 0x2b3151fb0700 (LWP 18313)  0x00000033834df443 in select () from /lib64/libc.so.6

11 Thread 0x2b3194873700 (LWP 18535)  0x00000033834df443 in select () from /lib64/libc.so.6

10 Thread 0x2b319454c700 (LWP 18534)  0x00000033834df443 in select () from /lib64/libc.so.6

9 Thread 0x2b3194225700 (LWP 18533)  0x00000033834df443 in select () from /lib64/libc.so.6

8 Thread 0x2b3188425700 (LWP 18531)  0x00000033834df443 in select () from /lib64/libc.so.6

7 Thread 0x2b3188200700 (LWP 18530)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

6 Thread 0x2b3178602700 (LWP 18529)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

5 Thread 0x2b3178401700 (LWP 18435)  0x00000033834df443 in select () from /lib64/libc.so.6

4 Thread 0x2b3178200700 (LWP 18434)  0x00000033834df443 in select () from /lib64/libc.so.6

3 Thread 0x2b3169f6b700 (LWP 18433)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

2 Thread 0x2b3169d6a700 (LWP 18432)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

* 1 Thread 0x2b3168200700 (LWP 18429)  0x0000003383488611 in memcpy () from /lib64/libc.so.6

(gdb) thread 1

[Switching to thread 1 (Thread 0x2b3168200700 (LWP 18429))]#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6

(gdb) bt

#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6

#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 , size=280, 

time=21081) at ltsosdep.c:443

#2  0x000000000041b552 in OSH_ReceiveMsgQMillisec (q=0x647580, msg=0x4fc780004fc71 , 

size=280, time=21081) at ltsosdep.c:1370

#3  0x000000000042d47d in RPS::ReceiveMsg (this=0x2b3100005330, delay=21081) at rps.cc:590

#4  0x000000000042d731 in RPS::Execute (this=0x2b31681ffdf0) at rps.cc:572

#5  0x000000000042dbe8 in StartRps (arg=0x157a680) at rps.cc:181

#6  0x0000003383c077e1 in start_thread () from /lib64/libpthread.so.0

#7  0x00000033834e68ed in clone () from /lib64/libc.so.6

(gdb) f 1

#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 , size=280, 

time=21081) at ltsosdep.c:443

443     ltsosdep.c: No such file or directory.

in ltsosdep.c

(gdb) i locals

row = 0x2b31682a433c

answer = LTS_OK

(gdb) i register

rax            0x2b0000001197   47278999998871

rbx            0x4fc780004fc71  1403492233444465

rcx            0x7      7

rdx            0x118    280

rsi            0x2b31682a4340   47491200992064

rdi            0x4fc780004fc71  1403492233444465

rbp            0x2b31681ffb80   0x2b31681ffb80

rsp            0x2b31681ffb20   0x2b31681ffb20

r8             0x1c0002000ce527 7881307938678055

r9             0x2b310003517c   47489453609340

r10            0x0      0

r11            0x202    514

r12            0x525a00005259   90546500555353

r13            0x2b3100005330   47489453413168

r14            0x20c49ba5e353f7cf       2361183241434822607

r15            0x2b316c106d70   47491266407792

rip            0x41a9aa 0x41a9aa

eflags         0x10203  [ CF IF RF ]

cs             0x33     51

ss             0x2b     43

ds             0x0      0

es             0x0      0

fs             0x0      0
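One way to read a suspicious pointer such as msg=0x4fc780004fc71 (the same value appears in rdi and rbx) is to split it into its two 32-bit halves and compare them with data seen elsewhere in the dump. The helper below is just a sketch for doing that arithmetic. The names struct halves and split_u64 are made up for this illustration.

```c
#include <stdint.h>

/* Low and high 32-bit halves of a 64-bit value. */
struct halves {
    uint32_t lo;
    uint32_t hi;
};

/* Split a 64-bit value, e.g. a pointer copied from a gdb register dump. */
struct halves split_u64(uint64_t v)
{
    struct halves h;
    h.lo = (uint32_t)(v & 0xFFFFFFFFu);  /* bits 0..31  */
    h.hi = (uint32_t)(v >> 32);          /* bits 32..63 */
    return h;
}
```

For 0x4fc780004fc71 the halves are 326769 and 326776, which are in the same range as the integers in the rpsMsg dump in frame 3 (326301, 326479, 326790 and so on). That suggests, although it does not prove, that the pointer was overwritten by an array of 32-bit integers. Likewise, this=0x2b3100005330 in frame 3 keeps the upper half 0x2b31 of a plausible address (compare this=0x2b31681ffdf0 in frame 4), but its lower half is 0x5330, so it looks partially overwritten as well.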

This only demonstrates some ways of examining a core dump file. While the process is still alive, we can also attach to it directly and analyze the code.

(gdb) attach 2467

Attaching to process 2467

Reading symbols from /root/algorithm/testBh...done.

Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.

Loaded symbols for /usr/lib/libstdc++.so.6

Reading symbols from /lib/tls/i686/cmov/libm.so.6...Reading symbols from /usr/lib/debug/lib/tls/i686/cmov/libm-2.11.1.so...done.

done.

Loaded symbols for /lib/tls/i686/cmov/libm.so.6

Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.

Loaded symbols for /lib/libgcc_s.so.1

Reading symbols from /lib/tls/i686/cmov/libc.so.6...Reading symbols from /usr/lib/debug/lib/tls/i686/cmov/libc-2.11.1.so...done.

done.

Loaded symbols for /lib/tls/i686/cmov/libc.so.6

Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.11.1.so...done.

done.

Loaded symbols for /lib/ld-linux.so.2

0x005f7422 in __kernel_vsyscall ()

(gdb) break testBh.cc:38

Breakpoint 1 at 0x80488ff: file testBh.cc, line 38.

(gdb) c

Continuing.

These methods let you stop the process and then single-step through it or print local variables.
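When the interesting code runs shortly after startup, there may not be enough time to attach before it executes. A common trick, sketched here under my own assumptions, is to have the program print its PID and wait on a flag that you flip from inside gdb. The function name wait_for_debugger and its parameters are hypothetical and not part of the program debugged above.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Print the PID, then poll *go once per second until a debugger sets it to
 * a non-zero value (for example with "set var *go = 1" while this frame is
 * selected) or until max_polls polls have been done.
 * Returns the number of polls performed.
 */
int wait_for_debugger(volatile int *go, int max_polls)
{
    int polls = 0;

    fprintf(stderr, "pid %ld waiting for gdb attach\n", (long)getpid());
    while (!*go && polls < max_polls) {
        sleep(1);   /* gives you time to run: gdb, then attach <pid> */
        polls++;
    }
    return polls;
}
```

Call it early in main with a flag initialised to 0 and a generous max_polls. After attaching, select this function's frame, set the flag, place your breakpoints and continue. The flag is volatile so that the compiler re-reads it on every loop iteration.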


Printing the stacks of all threads

In gdb, use thread apply all bt to view the call stack of every thread.

