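When writing C code, sooner or later a program will dump core. Why does that happen? A typical cause is that the program accesses memory it is not allowed to touch, for example an address that is not mapped into its address space. The kernel then sends the process a signal such as SIGSEGV to report the invalid access, and the default action for that signal terminates the process and produces a core dump. There is no need to panic when this happens, because there is a very good way to deal with it.
The core dump is a file that records the state of the process at the moment of the crash: memory, registers, the stack, and so on. We can open that file in gdb and locate the problem.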
Core files are often disabled by default, so first check whether the system allows them to be generated:
root@zhr-workstation:~/test# ulimit -c
0
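ulimit shows the resource limits that apply to the current shell and the processes it starts, and -c selects the maximum core file size. A result of 0 means core files are disabled, so we first have to lift the limit. To do that, pass unlimited to the same command, which removes the size limit: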
root@zhr-workstation:~/test# ulimit -c unlimited
Check the limit again to confirm that core files are now enabled:
root@zhr-workstation:~/test# ulimit -c
unlimited
root@zhr-workstation:~/test#
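The same limit can also be changed from inside a program. The following is a minimal sketch, not part of the original workflow, and the helper name enable_core_dumps is invented for this example. It uses getrlimit and setrlimit with RLIMIT_CORE to raise the soft limit up to the hard limit, which has the same effect as ulimit -c unlimited when the hard limit is unlimited.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core file size limit of the calling
 * process to its hard limit. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    if (lim.rlim_cur == RLIM_INFINITY)
        printf("core file size: unlimited\n");
    else
        printf("core file size: %llu bytes\n", (unsigned long long)lim.rlim_cur);
    return 0;
}
```

A program would call enable_core_dumps() early in main. The change applies only to that process and its children, just as ulimit applies only to the current shell session.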
Now run the crashing program again:
root@zhr-workstation:~/test# ./search.o
Segmentation fault (core dumped)
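Running ls -al shows that there is still no core file in the current directory. What now? On this system the core file name contains the PID of the crashed process, so once we know that PID we can search for the file. To get it, add the following to the crashing program: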
#include <stdio.h>
#include <unistd.h>
printf("pid is %d\n", getpid());
The PID turned out to be 212206, so we can use find to locate the core file:
root@zhr-workstation:~/test# find / -name '*.212206.*'
find: ‘/run/user/1000/gvfs’: Permission denied
find: ‘/run/user/126/gvfs’: Permission denied
/var/lib/apport/coredump/core._root_test_search_o.0.7304495b-f7bd-4c87-a009-f5c63b165ceb.212206.223266434
We can also find the name of the core file in the apport log. apport is Ubuntu's crash reporting system, and on this machine it is apport that writes the core file. Besides the file name, the log also shows which signal caused the program to dump core:
root@zhr-workstation:~/test# tail /var/log/apport.log
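Why the core file ends up under /var/lib/apport instead of the current directory is decided by the kernel setting /proc/sys/kernel/core_pattern. When that pattern starts with a | character, the kernel pipes the core dump to the program named after it instead of writing a file directly. The sketch below reads the setting and checks for that case. Both helper names are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the pattern sends core dumps to a
 * helper program (leading '|'), 0 otherwise. */
int core_pattern_is_piped(const char *pattern)
{
    return pattern != NULL && pattern[0] == '|';
}

/* Hypothetical helper: copy the current core pattern into buf.
 * Returns 0 on success, -1 if the file cannot be read. */
int read_core_pattern(char *buf, size_t size)
{
    FILE *fp = fopen("/proc/sys/kernel/core_pattern", "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
    return 0;
}
```

If read_core_pattern succeeds and core_pattern_is_piped returns 1, the core file will not appear next to the binary, and the handler named in the pattern decides where it is stored.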
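Now we can debug the core file with gdb. Note that the program has to be compiled with the -g option (gcc) so that the binary contains debug information.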
gdb is started with a command made of three parts: gdb itself, then the executable binary, and finally the core file:
gdb binary_file core_file
root@zhr-workstation:~/test# gdb search.o core._root_test_search_o.0.7304495b-f7bd-4c87-a009-f5c63b165ceb.212237.223362044
GNU gdb (Ubuntu 11.1-0ubuntu2) 11.1
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from search.o...
[New LWP 212237]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./search.o'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055f27615227b in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=) at search.c:16
16 int binary_search(int* data, int key, int low, int hight){
(gdb)
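As the output above shows, gdb stops right at the location where the problem occurred.
First run bt in gdb to print the call stack at the time of the crash: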
(gdb) bt
#0 0x000055f27615227b in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=) at search.c:16
#1 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#2 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#3 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#4 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#5 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#6 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#7 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#8 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#9 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#10 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#11 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#12 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#13 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
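Because the function is recursive, the backtrace contains far too many frames to list them all, so we go straight to the innermost frame (#0), where the crash happened.
Use frame 0 to select that frame and info local (short for info locals) to look at its local variables, which appear to be fine. Printing the arguments one by one then shows that the failure comes from reading hight: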
(gdb) frame 0
#0 0x000055f27615227b in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=) at search.c:16
16 int binary_search(int* data, int key, int low, int hight){
(gdb) info local
mid = 0
index = 0
(gdb) p key
$1 = 2
(gdb) p low
$2 = 0
(gdb) p hight
Cannot access memory at address 0x7fffca3cfffc
(gdb)
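One possible reading of these addresses, offered as an interpretation rather than something gdb states directly: data points into the frame of main near the top of the stack, while the address that cannot be read lies far below it. The small sketch below computes the distance between the two addresses from the session. The helper name address_gap is invented for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: distance in bytes between two addresses. */
uint64_t address_gap(uint64_t higher, uint64_t lower)
{
    return higher - lower;
}

/* Print the gap between the data pointer and the unreadable address. */
void print_stack_gap(void)
{
    uint64_t data_addr  = 0x7fffcabcd590ULL; /* data, from the backtrace */
    uint64_t fault_addr = 0x7fffca3cfffcULL; /* address gdb could not read */
    uint64_t gap = address_gap(data_addr, fault_addr);

    /* 8377748 bytes, just under 8 MiB (8388608 bytes) */
    printf("gap: %llu bytes\n", (unsigned long long)gap);
}
```

The gap is 8377748 bytes, just under 8 MiB. Assuming the common default stack limit of 8 MiB (ulimit -s shows the actual value), this suggests the endless chain of binary_search frames used up the whole stack, and the crash happened when the next frame tried to store hight below the end of it.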
Now that we have a rough idea of the problem, we can start examining the code. list prints the source around the faulting location:
(gdb) list
11 int result = binary_search(data, 2, 0, sizeof(data)/4);
12
13 printf("result is %d\n", result);
14 }
15
16 int binary_search(int* data, int key, int low, int hight){
17 int mid = (low + hight) / 2;
18 int index;
19 if( data[mid] > key ){
20 index = binary_search(data, key, low, mid);
(gdb)
21 }else if( data[mid] < key ){
22 index = binary_search(data, key, mid, hight);
23 }else if( data[mid] == key ){
24 return mid;
25 }
26 return index;
27
28 }
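The listing has no case that ends the recursion: when low and hight are both 0, as in the backtrace, mid is 0 as well and the call at line 20 repeats with the same arguments forever. The following is one possible way to repair it, a sketch rather than the author's own fix. It treats hight as one past the last valid index, which matches the element count passed from main, returns -1 when the range is empty, and skips mid when searching the upper half.

```c
#include <stdio.h>

/* Search the sorted half-open range data[low, hight) for key.
 * Returns the index of key, or -1 if key is not present. */
int binary_search(const int *data, int key, int low, int hight)
{
    if (low >= hight)
        return -1;                      /* empty range ends the recursion */

    int mid = low + (hight - low) / 2;  /* avoids overflow of low + hight */

    if (data[mid] > key)
        return binary_search(data, key, low, mid);
    if (data[mid] < key)
        return binary_search(data, key, mid + 1, hight);
    return mid;
}
```

With a sorted array such as {1, 3, 5, 7, 9} and hight set to 5, searching for 7 returns 3 and searching for 2 returns -1. When computing the element count in the caller, sizeof(data) / sizeof(data[0]) is safer than dividing by 4, and it only works where data is a real array and not a pointer.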