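Using WinDbg to find a memory leak
The system we develop has to run continuously on the customer's machines, but the customer reported that after a few days the program uses more and more memory, until one of the following two errors stops it: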
1. The application does not have enough resources to display.
2. The application crashes with a log entry like "memory allocation failed".
What to do? Bring out WinDbg.
Preparation
Gflags.exe -i excel.exe +ust
First snapshot
Let the system run for a while, attach WinDbg to its process, and run the following command:
0:025> !heap -s
LFH Key : 0xeaafe2e0
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00160000 00000002 32768 28204 28492 1460 239 19 0 f LFH
00260000 00001002 64 40 40 3 1 1 0 0 L
00270000 00008000 64 12 12 10 1 1 0 0
004e0000 00000002 64 8 8 0 0 1 0 0 L
00030000 00001002 1088 72 72 9 2 1 0 0 L
00480000 00001002 7232 3444 3444 36 5 2 0 0 L
004c0000 00001002 1088 252 252 5 1 1 0 0 L
004d0000 00001002 64 12 12 4 1 1 0 0 L
01060000 00001002 64 16 16 2 2 1 0 0 L
01120000 00000002 1024 24 24 0 0 1 0 0 L
010b0000 00001002 256 32 32 0 0 1 0 0 L
01660000 00001002 3136 2796 2828 377 13 7 0 0 L
External fragmentation 13 % (13 free blocks)
01680000 00001002 64 32 32 3 0 1 0 0 L
01690000 00041002 256 12 12 0 0 1 0 0 L
01790000 00001003 256 104 116 60 9 3 0 bad
017d0000 00001003 256 4 4 2 1 1 0 bad
01810000 00001003 256 4 4 2 1 1 0 bad
030d0000 00001003 256 4 4 2 1 1 0 bad
03110000 00001003 256 4 4 2 1 1 0 bad
01850000 00001002 64 20 20 2 1 1 0 0 L
03560000 00001002 1280 664 776 22 7 4 0 0 L
04780000 00001003 256 8 8 2 1 1 0 bad
047c0000 00001003 256 4 4 2 1 1 0 bad
04800000 00001003 256 4 4 2 1 1 0 bad
04840000 00001003 256 4 4 2 1 1 0 bad
04880000 00001003 256 4 4 2 1 1 0 bad
048e0000 00001002 256 16 16 4 1 1 0 0 L
04920000 00001002 1088 1012 1024 111 7 3 0 0 L
04930000 00001002 3136 940 940 153 9 2 0 8d L
04ce0000 00001002 64 16 16 0 0 1 0 0 L
04cf0000 00001002 1088 192 192 6 2 1 0 0 L
05850000 00001002 64 28 28 1 1 1 0 0 L
05de0000 00001002 64 12 12 3 1 1 0 0 L
Second snapshot
Detach WinDbg from the Excel process.
Let it run for a while longer, attach WinDbg to the process again, and run the same command:
0:025> !heap -s
LFH Key : 0xeaafe2e0
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00160000 00000002 32768 28204 28492 1460 239 19 0 f LFH
00260000 00001002 64 40 40 3 1 1 0 0 L
00270000 00008000 64 12 12 10 1 1 0 0
004e0000 00000002 64 8 8 0 0 1 0 0 L
00030000 00001002 1088 72 72 9 2 1 0 0 L
00480000 00001002 7232 3444 3444 36 5 2 0 0 L
004c0000 00001002 1088 252 252 5 1 1 0 0 L
004d0000 00001002 64 12 12 4 1 1 0 0 L
01060000 00001002 64 16 16 2 2 1 0 0 L
01120000 00000002 1024 24 24 0 0 1 0 0 L
010b0000 00001002 256 32 32 0 0 1 0 0 L
01660000 00001002 3136 2796 2828 377 13 7 0 0 L
External fragmentation 13 % (13 free blocks)
01680000 00001002 64 32 32 3 0 1 0 0 L
01690000 00041002 256 12 12 0 0 1 0 0 L
01790000 00001003 256 104 116 60 9 3 0 bad
017d0000 00001003 256 4 4 2 1 1 0 bad
01810000 00001003 256 4 4 2 1 1 0 bad
030d0000 00001003 256 4 4 2 1 1 0 bad
03110000 00001003 256 4 4 2 1 1 0 bad
01850000 00001002 64 20 20 2 1 1 0 0 L
03560000 00001002 1280 664 776 22 7 4 0 0 L
04780000 00001003 256 8 8 2 1 1 0 bad
047c0000 00001003 256 4 4 2 1 1 0 bad
04800000 00001003 256 4 4 2 1 1 0 bad
04840000 00001003 256 4 4 2 1 1 0 bad
04880000 00001003 256 4 4 2 1 1 0 bad
048e0000 00001002 256 16 16 4 1 1 0 0 L
04920000 00001002 1088 3012 3024 511 7 3 0 0 L
04930000 00001002 3136 940 940 153 9 2 0 8d L
04ce0000 00001002 64 16 16 0 0 1 0 0 L
04cf0000 00001002 1088 192 192 6 2 1 0 0 L
05850000 00001002 64 28 28 1 1 1 0 0 L
05de0000 00001002 64 12 12 3 1 1 0 0 L
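Comparing the first and second snapshots shows that the heap at 0x04920000 has grown noticeably: its committed memory rose from 1012 KB to 3012 KB.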
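With this many heaps, comparing two snapshots by eye is error-prone. The following is a minimal sketch, not part of the original workflow, that diffs two pasted `!heap -s` outputs and lists the heaps whose Commit column grew. The function names `parse_heap_summary` and `growing_heaps` are made up for illustration, and the parser assumes the row layout shown in the snapshots (hex heap address, hex flags, then decimal Reserv/Commit/Virt in KB). Heaps that exist in only one snapshot are ignored.

```python
import re

# A heap row starts with an 8-digit hex heap address and an 8-digit hex flags
# word, followed by the decimal Reserv, Commit and Virt sizes in KB.
ROW = re.compile(
    r"^\s*([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s+(\d+)\s+(\d+)\s+(\d+)\b"
)


def parse_heap_summary(text):
    """Map heap address -> committed KB from pasted `!heap -s` output."""
    committed = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # headers, separators and fragmentation notes do not match
            committed[m.group(1).lower()] = int(m.group(4))
    return committed


def growing_heaps(before, after):
    """Return (heap, before_kb, after_kb) for heaps whose commit grew,
    largest growth first."""
    a = parse_heap_summary(before)
    b = parse_heap_summary(after)
    grown = [(h, a[h], b[h]) for h in a if h in b and b[h] > a[h]]
    return sorted(grown, key=lambda t: t[2] - t[1], reverse=True)


# Two rows taken from each snapshot; in practice paste the full output.
first = """\
04920000 00001002 1088 1012 1024 111 7 3 0 0 L
04930000 00001002 3136 940 940 153 9 2 0 8d L
"""
second = """\
04920000 00001002 1088 3012 3024 511 7 3 0 0 L
04930000 00001002 3136 940 940 153 9 2 0 8d L
"""

for heap, old, new in growing_heaps(first, second):
    print(f"{heap}: commit {old} KB -> {new} KB (+{new - old} KB)")
```

For the two snapshots in this article, only heap 04920000 is reported.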
Run !heap -stat -h 04920000 to examine this heap in detail:
0:025> !heap -stat -h 04920000
heap @ 04920000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
4 21a29 - 82cd0 (94.77)
d0 2a - 2220 (1.06)
20 cd - 19a0 (0.79)
90 2d - 1950 (0.78)
be0 2 - 17c0 (0.74)
e0 1b - 17a0 (0.73)
f0 19 - 1770 (0.73)
1f0 b - 1550 (0.66)
200 a - 1400 (0.62)
40 4f - 13c0 (0.61)
240 7 - fc0 (0.49)
bd0 1 - bd0 (0.37)
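This heap consists mostly of blocks with size=4 (0x21a29 blocks, 94.77% of the busy bytes). A memory leak is usually caused by blocks of one size being allocated over and over without ever being freed, so this size is highly suspicious.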
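The same check can be done mechanically. The sketch below is an illustration only: it parses pasted `!heap -stat -h` output and returns the block sizes that account for more than a chosen share of the busy bytes. The names `parse_heap_stat` and `suspects` and the 50% threshold are assumptions for this example. Note that the size, #blocks and total columns are hexadecimal.

```python
import re

# Rows look like "<size> <#blocks> - <total> (<percent>)";
# size, #blocks and total are hexadecimal, percent is decimal.
STAT_ROW = re.compile(
    r"^\s*([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+-\s+([0-9a-fA-F]+)\s+"
    r"\((\d+(?:\.\d+)?)\)"
)


def parse_heap_stat(text):
    """Parse the size-grouped rows of `!heap -stat -h` output."""
    rows = []
    for line in text.splitlines():
        m = STAT_ROW.match(line)
        if m:
            rows.append({
                "size": int(m.group(1), 16),
                "blocks": int(m.group(2), 16),
                "total": int(m.group(3), 16),
                "percent": float(m.group(4)),
            })
    return rows


def suspects(text, threshold=50.0):
    """Return rows whose share of busy bytes is at least `threshold` percent."""
    return [r for r in parse_heap_stat(text) if r["percent"] >= threshold]


# The first rows of the output shown above.
sample = """\
heap @ 04920000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
4 21a29 - 82cd0 (94.77)
d0 2a - 2220 (1.06)
20 cd - 19a0 (0.79)
"""

for row in suspects(sample):
    print(f"size={row['size']:#x} blocks={row['blocks']} share={row['percent']}%")
```

A single size that dominates the heap, as size 4 does here, is the candidate to pass to the next command.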
Run !heap -flt s 4 to list every block of size=4 in the process:
_HEAP @ 04920000
03659ab8 0002 0002 [01] 03659ac0 00004 - (busy)
03659ac8 0003 0002 [01] 03659ad0 00004 - (busy)
0365e8e8 0002 0003 [01] 0365e8f0 00004 - (busy)
0f2b9fe8 0003 0002 [11] 0f2b9ff0 00004 - (busy)
0f2d9760 0003 0003 [01] 0f2d9768 00004 - (busy)
0f2dcc20 0002 0003 [01] 0f2dcc28 00004 - (busy)
0f2dcc50 0002 0002 [01] 0f2dcc58 00004 - (busy)
0f2dd790 0002 0002 [01] 0f2dd798 00004 - (busy)
0f2dd7c0 0002 0002 [01] 0f2dd7c8 00004 - (busy)
0f2de260 0002 0002 [01] 0f2de268 00004 - (busy)
0f2de290 0002 0002 [01] 0f2de298 00004 - (busy)
0f2de2a0 0003 0002 [01] 0f2de2a8 00004 - (busy)
0f2df740 0002 0003 [01] 0f2df748 00004 - (busy)
0f2e0270 0002 0002 [01] 0f2e0278 00004 - (busy)
0f2e02a0 0002 0002 [01] 0f2e02a8 00004 - (busy)
0f2e02e0 0003 0002 [01] 0f2e02e8 00004 - (busy)
0f2e1270 0002 0003 [01] 0f2e1278 00004 - (busy)
0f2e1ce0 0002 0002 [01] 0f2e1ce8 00004 - (busy)
0f2e1d10 0002 0002 [01] 0f2e1d18 00004 - (busy)
0f2e27d0 0002 0002 [01] 0f2e27d8 00004 - (busy)
0f2e2800 0002 0002 [01] 0f2e2808 00004 - (busy)
0f2e2cc0 0002 0002 [01] 0f2e2cc8 00004 - (busy)
0f2e2cf0 0002 0002 [01] 0f2e2cf8 00004 - (busy)
0f2e3340 0003 0002 [01] 0f2e3348 00004 - (busy)
0f2e3d20 0002 0003 [01] 0f2e3d28 00004 - (busy)
0f2e4890 0002 0002 [01] 0f2e4898 00004 - (busy)
0f2e48c0 0003 0002 [01] 0f2e48c8 00004 - (busy)
Then run !heap -p -a 0365e8f0 to display the stack trace recorded when that block was allocated. This pinpoints the root cause of the leak.
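One block may not be representative, so it can help to inspect a few of them and check whether they share the same allocation stack. As a rough sketch (the helper name `sample_commands` and the sampling strategy are assumptions for this example, not something WinDbg provides), the following turns pasted `!heap -flt s 4` output into a short list of `!heap -p -a` commands, using the user pointer column of busy blocks:

```python
import re

# Rows look like:
#   03659ab8 0002 0002 [01] 03659ac0 00004 - (busy)
# The fifth field (03659ac0 here) is the user pointer passed to `!heap -p -a`.
FLT_ROW = re.compile(
    r"^\s*[0-9a-fA-F]{8}\s+[0-9a-fA-F]{4}\s+[0-9a-fA-F]{4}\s+"
    r"\[[0-9a-fA-F]{2}\]\s+([0-9a-fA-F]{8})\s+[0-9a-fA-F]+\s+-\s+\((\w+)\)"
)


def sample_commands(text, count=5):
    """Build `!heap -p -a` commands for up to `count` busy blocks,
    spread across the listing rather than taken only from its start."""
    ptrs = []
    for line in text.splitlines():
        m = FLT_ROW.match(line)
        if m and m.group(2) == "busy":
            ptrs.append(m.group(1))
    if not ptrs:
        return []
    step = max(1, len(ptrs) // count)
    return [f"!heap -p -a {p}" for p in ptrs[::step][:count]]


# The first rows of the output shown above.
listing = """\
_HEAP @ 04920000
03659ab8 0002 0002 [01] 03659ac0 00004 - (busy)
03659ac8 0003 0002 [01] 03659ad0 00004 - (busy)
0365e8e8 0002 0003 [01] 0365e8f0 00004 - (busy)
"""

for cmd in sample_commands(listing, count=2):
    print(cmd)
```

If the sampled blocks all report the same call stack, that stack is very likely the leaking code path.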
Note:
1. Make sure the symbols are loaded correctly.
2. Make sure to use gflags to set the image file options that enable stack traces.
3. It turned out that the memory leak was caused by a stringstream bug in VC2005:
http://social.msdn.microsoft.com/forums/en-US/vclanguage/thread/9a1fb540-3b40-48ac-95bd-a5d2d1af928d/
It can be fixed either by not using stringstream or by installing VS2005 SP1.