[valgrind in practice] Tracking down the cause of "stack smashing detected"

While hand-writing merge sort today, I ran into this error:

0 94 25 31 47 76 8 6 21 83 0 19 61 12 46 74 
0 6 6 8 8 12 12 25 25 25 31 31 31 61 74 94 
*** stack smashing detected ***: ./a.out terminated

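On closer inspection, the merged result is wrong too. The top-level function merge_sort only checks the boundary, recurses twice, and merges the two sorted subarrays, so I guessed the bug was in the merge function: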

void merge(int* a, int mid, int n) {
    int* buf = new int[n];
    memcpy(buf, a, n * sizeof(int));

    int* pL = buf;
    int* pR = buf + mid;
    int* const pEndL = buf + mid;
    int* const pEndR = buf + n;
    
    for (; pL != pEndL && pR != pEndR; ++a) {
        if (*pL <= *pR) {
            *a = *pL;
            ++pL;
        }
        else {
            *a = *pR;
            ++pR;
        }
    }
    if (pL == pEndL)
        memcpy(a, pR, (pEndR - pR) * sizeof(int));
    else
        memcpy(a, pL, (pEndR - pL) * sizeof(int));

    delete[] buf;
}

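PS: The loop body could be written more compactly as *a = *pL++, and plenty of library source code does exactly that. But the interviewer in my first-round Tencent interview a few days ago said this style is not great, because the reader has to work out the relative precedence of ++ and *, so I no longer take the shortcut.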
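An earlier version of merge that used index access worked fine; this pointer version is the one that broke. Look carefully and you can spot the mistake at a glance, but eyeballing code while debugging is not that reliable.
So, as usual, I turned to valgrind: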

$ valgrind ./a.out --leak-check=full
==14866== Memcheck, a memory error detector
==14866== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==14866== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==14866== Command: ./a.out --leak-check=full
==14866== 
6 47 66 81 63 28 31 86 67 95 81 22 89 59 86 12 
6 12 22 22 28 28 47 63 66 67 81 81 81 86 86 95 
*** stack smashing detected ***: ./a.out terminated
==14866== 
==14866== Process terminating with default action of signal 6 (SIGABRT)
==14866==    at 0x540E428: raise (raise.c:54)
==14866==    by 0x5410029: abort (abort.c:89)
==14866==    by 0x54507E9: __libc_message (libc_fatal.c:175)
==14866==    by 0x54F215B: __fortify_fail (fortify_fail.c:37)
==14866==    by 0x54F20FF: __stack_chk_fail (stack_chk_fail.c:28)
==14866==    by 0x400BBF: main (in /home/xyz/cpp/sort/a.out)
==14866== 
==14866== HEAP SUMMARY:
==14866==     in use at exit: 72,704 bytes in 1 blocks
==14866==   total heap usage: 17 allocs, 16 frees, 73,984 bytes allocated
==14866== 
==14866== LEAK SUMMARY:
==14866==    definitely lost: 0 bytes in 0 blocks
==14866==    indirectly lost: 0 bytes in 0 blocks
==14866==      possibly lost: 0 bytes in 0 blocks
==14866==    still reachable: 72,704 bytes in 1 blocks
==14866==         suppressed: 0 bytes in 0 blocks
==14866== Rerun with --leak-check=full to see details of leaked memory
==14866== 
==14866== For counts of detected and suppressed errors, rerun with: -v
==14866== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (core dumped)

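The LEAK SUMMARY shows no leaked heap memory. (Note that I put --leak-check=full after ./a.out, so valgrind passed it to the program as an argument instead of using it itself, as the "Command:" line and the "Rerun with --leak-check=full" hint show; valgrind options must come before the program name.) The single still reachable block is not a concern: I did not allocate it, and its size of 72,704 bytes matches the emergency pool that the C++ runtime (libstdc++) most likely allocates at startup for exception handling. The SIGABRT was raised from __stack_chk_fail, and, as the name suggests, that means an error on the stack.
The array being sorted is a fixed-size local array, so it lives on the stack. Stack memory is allocated and reclaimed automatically, simply by moving the stack pointer, so there is no per-array bookkeeping for valgrind to hook into, and Memcheck cannot detect an out-of-bounds access on a stack array the way it can for a heap block. Still, the backtrace does show that the failure was detected in main.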
The test code in my main function looks like this:

    int arr[N];
    // ...
    merge_sort(arr, N);

I tried changing arr to a heap-allocated array:

    int* arr = new int[N];
    // ...
    merge_sort(arr, N);
    // ...
    delete[] arr;

After recompiling with g++ -g and rerunning valgrind, the error shows up. The key part of the report:

==15746== Invalid write of size 8
==15746==    at 0x4C326CB: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15746==    by 0x4009EB: merge(int*, int, int) (merge.cpp:31)
==15746==    by 0x400A7C: merge_sort(int*, int) (merge.cpp:43)
==15746==    by 0x400B22: main (merge.cpp:60)
==15746==  Address 0x5ab6cc0 is 0 bytes after a block of size 64 alloc'd
==15746==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15746==    by 0x400A9E: main (merge.cpp:54)

This pinpoints the faulty location: the call to the library function memcpy inside merge, at line 31. Looking at that line:

$ awk 'NR==31' merge.cpp
        memcpy(a, pL, (pEndR - pL) * sizeof(int));

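Once you land here, the bug is obvious. Merging combines the left region L with the right region R; pEndR is the right boundary of R, and pL is the pointer iterating over the left region. This branch is meant to copy whatever remains of the left region to the tail of the merged array, so the count should be pEndL - pL. With pEndR - pL the copy also takes in the entire right half, so it writes n - mid elements past the end of a.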

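One last point. For an array of size N, an access that is only slightly out of bounds usually triggers no error at all: the address is still valid memory that may well belong to another variable, so at run time the access is treated as legal (even though in C/C++ terms it is undefined behavior). For example: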

$ cat a.cc
#include <stdio.h>

int main() {
    int a[3];
    a[3] = 10;
    printf("%d\n", a[3]);
    return 0;
}
$ g++ a.cc
$ ./a.out 
10

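This runs without reporting any error, but accessing a[100] instead ends in Segmentation fault (core dumped).
The stack is full of traps like this, and they are hard to pin down; I wrote about a similar case earlier in the post "Local variables being reused in C programs" (C程序的局部变量被重用现象). Out-of-bounds accesses on stack arrays are hard to detect, so, as in this post, you can try switching to a heap array to pinpoint the error precisely.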
