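For a project, we needed HDR low-light enhancement.
So we used an existing HDR low-light enhancement module, but while testing on Android it occasionally crashed with an error.
The crash could not be reproduced on demand, followed no obvious pattern, and left no useful information behind.
From what I read, the likely cause was an illegal address operation in the JNI layer that corrupted Android's stack or heap memory and crashed the thread.
Since the previous build had passed a 20-hour stress test, suspicion fell on the newly added HDR low-light enhancement feature.
But I had not been granted access to its source code, so all I could do was report the problem to the people responsible.
They investigated for a long time and found only a single memory leak.
After that fix the program ran normally for a long time and I thought it was solved... then I came back from a meal and it had crashed again.
Fine. I had no choice but to get source access and do it myself.
The feature really is complex, though, and the old approach of reading it line by line would have taken far too long.
In the end I decided to hunt for the bug with a memory leak detection tool.
Enough preamble, on to the topic!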
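Valgrind is an instrumentation framework for building dynamic analysis tools. It ships with a set of tools, each of which performs some kind of debugging, profiling, or similar task to help you improve your program. Valgrind's architecture is modular, so new tools can be created easily without disturbing the existing structure.
It provides different tools for different situations: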
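Tool | Description | Purpose |
---|---|---|
Memcheck | Memory error detector | Finds the vast majority of memory misuse errors made during development |
Callgrind | Call-graph generating cache profiler | Finds problems in the program's function call behaviour |
Cachegrind | Cache and branch-prediction profiler | Finds problems in the program's cache usage |
Helgrind | Thread error detector | Finds data races in multithreaded programs |
Massif | Heap profiler | Finds problems in the program's heap usage |
Extension | DHAT, BBV, SGCheck | Using the functionality the core provides, you can write your own specialised memory debugging tools |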
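Memcheck is the most widely used Valgrind tool. It is a heavyweight memory checker that finds the vast majority of memory-related errors that make C or C++ programs crash or behave unpredictably, such as using uninitialised memory, using freed memory, and accessing memory out of bounds.
This article focuses on the Memcheck tool.
Download: valgrind Current Releases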
# Extract
bzip2 -d valgrind-3.14.0.tar.bz2
tar -xvf valgrind-3.14.0.tar
# Enter the directory
cd valgrind-3.14.0/
# Generate the Makefile
./configure
# Build
make
# Install (if you are not root, use: sudo make install)
make install
Check that valgrind works
Test command: valgrind ls -l
(this checks the memory behaviour of the ls -l command)
You should see output similar to the following:
==17057== Memcheck, a memory error detector
==17057== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17057== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==17057== Command: ls -l
==17057==
total 372
-rw-r--r-- 1 root root 11699 Feb 27 07:46 CMakeCache.txt
drwxr-xr-x 5 root root 4096 Feb 27 07:30 CMakeFiles
-rw-r--r-- 1 root root 6344 Feb 27 07:46 Makefile
-rw-r--r-- 1 root root 1384 Feb 27 07:46 cmake_install.cmake
-rw-r--r-- 1 root root 351584 Feb 27 07:46 judgeHdr
==17057==
==17057== HEAP SUMMARY:
==17057== in use at exit: 19,666 bytes in 12 blocks
==17057== total heap usage: 122 allocs, 110 frees, 75,309 bytes allocated
==17057==
==17057== LEAK SUMMARY:
==17057== definitely lost: 0 bytes in 0 blocks
==17057== indirectly lost: 0 bytes in 0 blocks
==17057== possibly lost: 0 bytes in 0 blocks
==17057== still reachable: 19,666 bytes in 12 blocks
==17057== suppressed: 0 bytes in 0 blocks
==17057== Rerun with --leak-check=full to see details of leaked memory
==17057==
==17057== For counts of detected and suppressed errors, rerun with: -v
==17057== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
That means valgrind is ready to use.
Note: you may instead see a message like the following:
==15505== Memcheck, a memory error detector
==15505== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==15505== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==15505== Command: ./a.out
==15505==
valgrind: Fatal error at startup: a function redirection
valgrind: which is mandatory for this platform-tool combination
valgrind: cannot be set up. Details of the redirection are:
valgrind:
valgrind: A must-be-redirected function
valgrind: whose name matches the pattern: strlen
valgrind: in an object with soname matching: ld-linux-x86-64.so.2
valgrind: was not found whilst processing
valgrind: symbols from the object with soname: ld-linux-x86-64.so.2
valgrind:
valgrind: Possible fixes: (1, short term): install glibc's debuginfo
valgrind: package on this machine. (2, longer term): ask the packagers
valgrind: for your Linux distribution to please in future ship a non-
valgrind: stripped ld.so (or whatever the dynamic linker .so is called)
valgrind: that exports the above-named function using the standard
valgrind: calling conventions for this platform. The package you need
valgrind: to install for fix (1) is called
valgrind:
valgrind: On Debian, Ubuntu: libc6-dbg
valgrind: On SuSE, openSuSE, Fedora, RHEL: glibc-debuginfo
valgrind:
valgrind: Note that if you are debugging a 32 bit process on a
valgrind: 64 bit system, you will need a corresponding 32 bit debuginfo
valgrind: package (e.g. libc6-dbg:i386).
valgrind:
valgrind: Cannot continue -- exiting now. Sorry.
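In that case, install the debug package that matches your Linux distribution.
Debian, Ubuntu: libc6-dbg
SuSE, openSuSE, Fedora, RHEL: glibc-debuginfo
If you are debugging a 32-bit process, install the 32-bit version of the package.
On 64-bit Ubuntu, for example, the install command is:
sudo apt-get install libc6-dbg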
First, prepare a test program (test_valgrind.cpp):
// test_valgrind.cpp: leaks one malloc() block and eight new[] blocks on purpose
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char * getTestStr() {
    char * str = (char *) malloc(10); // malloc
    char test[] = "hello";
    strcpy(str, test);
    return str;
}

int main() {
    char * str = getTestStr();
    printf("test : %s\n", str);
    // free(str); // free
    for (int i = 0; i < 8; ++i) {
        char * s = new char[10]; // new
        s[0] = i + 'a';
        s[1] = 0;
        // delete[] s; // delete
    }
    return 0;
}
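In the code above, both the free and the delete[] calls are commented out.
Run it under the memcheck tool:
# -g keeps debug information so that reports can be mapped to source lines
g++ test_valgrind.cpp -g -o test_valgrind
# --tool=memcheck use the memcheck tool
# --leak-check=full report every individual leak in detail
# --show-reachable=yes also report still reachable and indirectly lost blocks
# --trace-children=yes follow child processes
# --max-stackframe=<number> [default: 2000000] maximum stack frame size; if the stack pointer moves by more than this amount, valgrind assumes the program is switching to a different stack
# --log-file=./valgrind.log file that valgrind writes its log to
valgrind --tool=memcheck --leak-check=full --show-reachable=yes --trace-children=yes --log-file=./valgrind.log ./test_valgrind &
Result: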
==17157== Memcheck, a memory error detector
==17157== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17157== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==17157== Command: ./test_valgrind
==17157==
test : hello
==17157==
==17157== HEAP SUMMARY:
==17157== in use at exit: 72,794 bytes in 10 blocks
==17157== total heap usage: 11 allocs, 1 frees, 73,818 bytes allocated
==17157==
==17157== 10 bytes in 1 blocks are definitely lost in loss record 1 of 3
==17157== at 0x4C2DE56: malloc (vg_replace_malloc.c:299)
==17157== by 0x400866: getTestStr() (test_valgrind.cpp:7)
==17157== by 0x4008B1: main (test_valgrind.cpp:14)
==17157==
==17157== 80 bytes in 8 blocks are definitely lost in loss record 2 of 3
==17157== at 0x4C2EB1B: operator new[](unsigned long) (vg_replace_malloc.c:423)
==17157== by 0x4008E2: main (test_valgrind.cpp:18)
==17157==
==17157== 72,704 bytes in 1 blocks are still reachable in loss record 3 of 3
==17157== at 0x4C2DE56: malloc (vg_replace_malloc.c:299)
==17157== by 0x4EC4EFF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==17157== by 0x40106B9: call_init.part.0 (dl-init.c:72)
==17157== by 0x40107CA: call_init (dl-init.c:30)
==17157== by 0x40107CA: _dl_init (dl-init.c:120)
==17157== by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==17157==
==17157== LEAK SUMMARY:
==17157== definitely lost: 90 bytes in 9 blocks
==17157== indirectly lost: 0 bytes in 0 blocks
==17157== possibly lost: 0 bytes in 0 blocks
==17157== still reachable: 72,704 bytes in 1 blocks
==17157== suppressed: 0 bytes in 0 blocks
==17157==
==17157== For counts of detected and suppressed errors, rerun with: -v
==17157== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
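The report shows that getTestStr() (test_valgrind.cpp:7) and main (test_valgrind.cpp:18) are where the leaked blocks were allocated. The remaining still reachable block does not come from the test code: its call stack shows it was allocated inside libstdc++ while the dynamic loader was initialising the library.
Output for the corrected program (uncomment the free and delete[] calls):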
==17164== Memcheck, a memory error detector
==17164== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17164== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==17164== Command: ./a.out
==17164==
test : hello
==17164==
==17164== HEAP SUMMARY:
==17164== in use at exit: 72,704 bytes in 1 blocks
==17164== total heap usage: 11 allocs, 10 frees, 73,818 bytes allocated
==17164==
==17164== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1
==17164== at 0x4C2DE56: malloc (vg_replace_malloc.c:299)
==17164== by 0x4EC4EFF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==17164== by 0x40106B9: call_init.part.0 (dl-init.c:72)
==17164== by 0x40107CA: call_init (dl-init.c:30)
==17164== by 0x40107CA: _dl_init (dl-init.c:120)
==17164== by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==17164==
==17164== LEAK SUMMARY:
==17164== definitely lost: 0 bytes in 0 blocks
==17164== indirectly lost: 0 bytes in 0 blocks
==17164== possibly lost: 0 bytes in 0 blocks
==17164== still reachable: 72,704 bytes in 1 blocks
==17164== suppressed: 0 bytes in 0 blocks
==17164==
==17164== For counts of detected and suppressed errors, rerun with: -v
==17164== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
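Instead of remembering to call free and delete[] by hand, the two leaks in the test program can also be avoided by letting RAII own the allocations. The following is only a minimal sketch under that assumption, not the fix used in the original project; makeTestStr and fillLetters are hypothetical names introduced for illustration.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical RAII counterpart of getTestStr(): the buffer is owned by a
// std::unique_ptr<char[]>, so delete[] runs automatically when the owner
// is destroyed.
std::unique_ptr<char[]> makeTestStr() {
    std::unique_ptr<char[]> str(new char[10]);
    std::strcpy(str.get(), "hello");  // 6 bytes including '\0', fits in 10
    return str;
}

// Hypothetical counterpart of the loop in main(): each iteration's buffer is
// released at the end of that iteration instead of being leaked.
std::string fillLetters(int count) {
    std::string result;
    for (int i = 0; i < count; ++i) {
        std::unique_ptr<char[]> s(new char[10]);
        s[0] = static_cast<char>(i + 'a');
        s[1] = 0;
        result += s.get();
    }
    return result;
}
```

Called from a main() and run under valgrind --leak-check=full, a program built from these two functions should no longer list definitely lost blocks for them.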
For the options of the valgrind command, run valgrind -h.
See the manual at http://www.valgrind.org/docs/manual/mc-manual.html, and pay particular attention to the possible leaks reported as definitely lost and possibly lost.
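---> a. Illegal read / Illegal write errors
Reading or writing memory after it has been freed, accessing an array out of bounds, and similar mistakes. Valgrind reports that the program is trying to read or write illegal memory; such a program is prone to segmentation faults (core dumps).
Invalid read of size X / Invalid write of size X: the program tried to read or write X bytes at an address outside any valid block.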
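To make the out-of-bounds case concrete, here is a small sketch. Both function names are hypothetical. writeUnchecked is deliberately unsafe: called with index 10 it writes one byte past a 10-byte heap block, which Memcheck should report as an invalid write of size 1 (compile with -g and without optimisation, since an optimiser may remove the store). writeChecked shows one possible bounds-checked alternative.

```cpp
#include <cstddef>
#include <vector>

// Unchecked write into a 10-byte heap block. Valid indices are 0..9;
// passing 10 writes past the end of the block. Only do that as a
// Valgrind experiment.
void writeUnchecked(std::size_t index) {
    char* buf = new char[10];
    buf[index] = 'x';
    delete[] buf;
}

// Bounds-checked counterpart: refuses indices outside the buffer and
// tells the caller whether the write happened.
bool writeChecked(std::vector<char>& buf, std::size_t index, char value) {
    if (index >= buf.size()) {
        return false;
    }
    buf[index] = value;
    return true;
}
```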
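---> b. Use of uninitialised values
Detects the use of uninitialised values. Memcheck reports them when they influence the program's behaviour, for example when a conditional statement depends on one, so get into the habit of initialising variables where you declare them.
Conditional jump or move depends on uninitialised value(s)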
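A sketch of how such a report can arise, using a hypothetical function isPositive. When called with initialise == false, the comparison reads heap memory that was never written, which is the situation Memcheck flags; with initialise == true the value is defined and the function is well behaved. How the report looks in practice depends on the compiler and optimisation level.

```cpp
#include <cstdlib>

// Returns 1 if the stored value is positive, 0 if not, -1 if allocation
// failed. With initialise == false the comparison depends on uninitialised
// heap memory; only call it that way as a Valgrind experiment.
int isPositive(bool initialise) {
    int* value = static_cast<int*>(std::malloc(sizeof(int)));
    if (value == nullptr) {
        return -1;
    }
    if (initialise) {
        *value = 42;
    }
    int result = (*value > 0) ? 1 : 0;  // the branch Memcheck would flag
    std::free(value);
    return result;
}
```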
---> c. When a heap block is freed with an inappropriate deallocation function
Releasing memory with the wrong function, for example mixing up new/delete and malloc/free so that memory allocated with new is released with free.
Mismatched free() / delete / delete []
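One way to rule out this kind of mismatch is to bind the deallocation function to the pointer type. The sketch below is an illustration under that assumption; FreeDeleter, MallocString, dupString, and makeBuffer are hypothetical names. The buggy pattern appears only as a comment.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Buggy pattern that Memcheck reports as a mismatch:
//     char* s = new char[10];
//     free(s);              // should be delete[] s;

// Deleter that releases malloc'd memory with free().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};
using MallocString = std::unique_ptr<char, FreeDeleter>;

// Copies text into malloc'd storage; the returned owner calls free().
MallocString dupString(const char* text) {
    std::size_t size = std::strlen(text) + 1;
    char* copy = static_cast<char*>(std::malloc(size));
    if (copy != nullptr) {
        std::memcpy(copy, text, size);
    }
    return MallocString(copy);
}

// Storage from new[] is owned by unique_ptr<char[]>, which calls delete[].
std::unique_ptr<char[]> makeBuffer(std::size_t size) {
    return std::unique_ptr<char[]>(new char[size]());
}
```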
---> d. Illegal frees
Freeing the same memory more than once, for example calling free twice on the same block.
Invalid free() / delete / delete[]
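A common defensive habit against double frees is to reset the pointer as part of releasing it, because free(nullptr) is defined to do nothing. The helper below, releaseOnce, is a hypothetical name used for this sketch; it only protects against freeing twice through the same pointer variable, not through other copies of the pointer.

```cpp
#include <cstdlib>

// Buggy pattern that Memcheck reports as an invalid free:
//     free(p);
//     free(p);              // p was already freed

// Frees the block and resets the caller's pointer, so a second call on the
// same variable becomes free(nullptr), which is a no-op.
void releaseOnce(char*& slot) {
    std::free(slot);
    slot = nullptr;
}
```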
---> e. Overlapping source and destination blocks
For the common C copy functions memcpy, strcpy, strncpy, strcat, and strncat, Memcheck reports calls in which the source and destination blocks overlap.
Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
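When the two ranges can overlap, the standard remedy is memmove, which is specified to handle overlapping blocks. A minimal sketch, with shiftRight as a hypothetical helper name; the caller is assumed to guarantee that the buffer holds at least len + shift bytes.

```cpp
#include <cstddef>
#include <cstring>

// Moves the first len bytes of buf to start at buf + shift. The source and
// destination ranges overlap whenever shift < len, so memmove is used;
// memcpy in its place is what Memcheck would report as an overlap.
void shiftRight(char* buf, std::size_t len, std::size_t shift) {
    std::memmove(buf + shift, buf, len);
}
```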
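---> f. Fishy argument values
Every memory allocation function takes a size argument. Memcheck reports a size that is implausibly large, that is, one that would be negative if interpreted as a signed integer; on a 64-bit machine this means a size of 2**63 or more. This usually happens when a negative value ends up being passed as the size.
Argument 'size' of function malloc has a fishy (possibly negative) value: -3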
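The typical source of a fishy size is a signed length that went negative and was then converted to the unsigned size_t parameter of malloc. A minimal sketch of checking before converting; checkedAlloc is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Buggy pattern: a negative signed length is converted to a huge size_t.
//     long long len = -3;
//     void* p = malloc(len);   // Memcheck: fishy (possibly negative) value

// Rejects negative lengths before they reach malloc. Returns nullptr for a
// negative request; otherwise returns whatever malloc returns.
void* checkedAlloc(long long requested) {
    if (requested < 0) {
        return nullptr;
    }
    return std::malloc(static_cast<std::size_t>(requested));
}
```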
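---> g. Leak detection
HEAP SUMMARY: shows how much heap memory is still in use at exit and the total number of allocations and frees. If --show-reachable=yes is given, memcheck also lists reachable and indirectly lost blocks when the program exits.
LEAK SUMMARY: memcheck keeps track of every heap block created by malloc/new and similar functions, so when the program exits it knows which blocks were never freed. For example: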
definitely lost: 4 bytes in 1 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 95 bytes in 6 blocks
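definitely lost: when the program exits, memcheck cannot find any pointer to the block. This usually means the code never freed it, and it deserves the most attention.
possibly lost: when the program exits, memcheck finds only an interior pointer to the block (a pointer to somewhere in the middle of the block rather than its start), so the block is considered possibly lost.
still reachable: memcheck still finds a start pointer to the block when the program exits, so the block is still reachable.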
Using mtrace
Add the header: #include <mcheck.h>.
Set the environment variable: setenv("MALLOC_TRACE", "mtrace.out", 1);
Add the function call: mtrace();
Add the compiler flag: -g
Skeleton:
#include <mcheck.h>
#include <stdlib.h>
...
int main() {
    setenv("MALLOC_TRACE", "mtrace.out", 1);
    mtrace();
    ...
    return 0;
}
// g++ -g -Wall test_valgrind_mtrace.cpp && ./a.out && mtrace a.out mtrace.out
Memory leak test code (test_valgrind_mtrace.cpp):
// test_valgrind_mtrace.cpp: same leaks as before, traced with mtrace()
#include <mcheck.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char * getTestStr() {
    char * str = (char *) malloc(10); // malloc
    char test[] = "hello";
    strcpy(str, test);
    return str;
}

int main() {
    setenv("MALLOC_TRACE", "mtrace.out", 1);
    mtrace();
    char * str = getTestStr();
    printf("test : %s\n", str);
    // free(str); // free
    for (int i = 0; i < 8; ++i) {
        char * s = new char[10]; // new
        s[0] = i + 'a';
        s[1] = 0;
        // delete[] s; // delete
    }
    return 0;
}
Test result:
# Command
g++ -g -Wall test_valgrind_mtrace.cpp && ./a.out && mtrace a.out mtrace.out
# Output
test : hello
- 0x0000000001951c20 Free 12 was never alloc'd 0x7faf1cfdbe9d
- 0x0000000001951cb0 Free 13 was never alloc'd 0x7faf1d0a691f
- 0x0000000001951cd0 Free 14 was never alloc'd 0x7faf1d11623c
Memory not freed:
-----------------
Address Size Caller
0x0000000001952140 0xa at /data/code/zone/test_valgrind_mtrace.cpp:9
0x0000000001952570 0xa at 0x7faf1d3f9e78
0x0000000001952590 0xa at 0x7faf1d3f9e78
0x00000000019525b0 0xa at 0x7faf1d3f9e78
0x00000000019525d0 0xa at 0x7faf1d3f9e78
0x00000000019525f0 0xa at 0x7faf1d3f9e78
0x0000000001952610 0xa at 0x7faf1d3f9e78
0x0000000001952630 0xa at 0x7faf1d3f9e78
0x0000000001952650 0xa at 0x7faf1d3f9e78
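For malloc(), mtrace can trace the leak back to the exact line of code.
For new, however, it only reports an address and cannot trace the leak back to the source line, because the caller it records lies inside the C++ runtime rather than in the test program.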
Using uninitialised memory
==1001== Use of uninitialised value of size 8
Reading or writing memory after it has been freed
==1001== Invalid read of size 1
Reading or writing past the end of an allocated block
==1001== Invalid read of size 1
Memory leaks
==1001== LEAK SUMMARY
Mismatched use of malloc/new/new [] and free/delete/delete []
==1001== Mismatched free() / delete / delete []
Freeing memory twice
==1001== Invalid free() / delete / delete[]