From:http://www.rainsts.net/article.asp?id=765
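Debugging a dump file with Windbg is a common technique. Unlike Attach Process, it lets us debug "offline": a dump file is a snapshot of the target process's memory and related program information at a single point in time. This article is only a brief walkthrough of one debugging session; for more detail, see the Windbg and SOS.dll documentation.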
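using System;
using System.Collections.Generic;
using System.Threading;

namespace Learn.CUI
{
    class Program
    {
        // Holds a reference to every allocated buffer, so the GC can never reclaim them.
        private List<byte[]> list = new List<byte[]>();

        // Memory hog: allocates ten 10 MB arrays and keeps them all alive.
        void Test1()
        {
            for (int i = 0; i < 10; i++)
            {
                list.Add(new byte[1024 * 1024 * 10]);
            }
        }

        // CPU hog: starts a thread that spins in an empty loop forever.
        void Test2()
        {
            new Thread(() =>
            {
                while (true)
                {
                }
            }).Start();
        }

        static void Main(string[] args)
        {
            var o = new Program();
            o.Test1();
            o.Test2();

            Console.WriteLine("Press any key to exit...");
            Console.ReadKey(true);
            Environment.Exit(0);
        }
    }
}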
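This test sample causes both high CPU usage and high memory usage, the two problems we run into most often in day-to-day debugging.
1. Capturing the dump file
We can do this directly with ADPlus.vbs, which ships with Windbg.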
C:\...\Windbg> adplus.vbs -hang -o z:\temp -p 6876
Attaching the debugger to: LEARN.CUI.EXE
(Process ID: 6876)
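Parameters:
- -hang: attach to the target process, capture a dump image, then detach. Its counterpart is -crash (crash mode), which terminates the target process.
- -o: the directory in which to save the dump file.
- -p: the PID of the target process.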
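The -p parameter needs the PID of the running process. Task Manager shows it, but it can also be looked up by name. The following is a minimal sketch that is not part of the original session: the class name PidLookup and the method FindPids are hypothetical, and "Learn.CUI" is simply the sample's process name as it appears in the ADPlus output.

```csharp
using System;
using System.Diagnostics;

static class PidLookup
{
    // Returns the PIDs of all running processes with the given name.
    // Process.GetProcessesByName expects the name without the ".exe" extension.
    public static int[] FindPids(string processName)
    {
        Process[] matches = Process.GetProcessesByName(processName);
        int[] pids = new int[matches.Length];
        for (int i = 0; i < matches.Length; i++)
        {
            pids[i] = matches[i].Id;
            matches[i].Dispose();
        }
        return pids;
    }

    static void Main(string[] args)
    {
        // Assumption: the sample is running under its default name, Learn.CUI.exe.
        string name = args.Length > 0 ? args[0] : "Learn.CUI";
        foreach (int pid in FindPids(name))
        {
            Console.WriteLine("{0}.exe -> PID {1} (value for adplus -p)", name, pid);
        }
    }
}
```

If several instances are running, every PID is listed and the right one still has to be picked by hand.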
2. Debugging the dump file with Windbg
(1) Start Windbg and open the dump file (File -> Open Crash Dump...).
Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [Z:\Temp\...\PID-6876__LEARN.CUI.EXE__full_1e84_2008-12-19_13-01-28-781_1adc.dmp]
User Mini Dump File with Full Memory: Only application data is available
...
Executable search path is:
Windows XP Version 2600 (Service Pack 3) MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Fri Dec 19 13:01:28.000 2008 (GMT+8)
System Uptime: 0 days 1:31:57.003
Process Uptime: 0 days 0:00:21.000
......................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(1adc.1aa4): Wake debugger - code 80000007 (first/second chance not available)
eax=00000014 ebx=00000000 ecx=79153810 edx=7c92e4f4 esi=0012f274 edi=00000000
eip=7c92e4f4 esp=0012f228 ebp=0012f248 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
7c92e4f4 c3 ret
(2) Load SOS.dll
0:000> .load sos
(3) When CPU usage is too high, remember that the ThreadPool can also be the root cause.
0:000> !threadpool
CPU utilization 0%
Worker Thread: Total: 0 Running: 0 Idle: 0 MaxLimit: 0 MinLimit: 0
Work Request in Queue: 0
--------------------------------------
Number of Timers: 0
--------------------------------------
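Completion Port Thread:Total: 0 Free: 0 MaxFree: 0 CurrentLimit: 0 MaxLimit: 1000 MinLimit: 0
"CPU utilization 0%", and there are no worker threads or queued work requests either, so the ThreadPool is not the culprit this time.
(4) Let's see which thread is consuming the most CPU time.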
0:000> !runaway
User Mode Time
Thread Time
3:1a1c 0 days 0:00:20.984
2:1a30 0 days 0:00:00.000
1:1a14 0 days 0:00:00.000
0:1aa4 0 days 0:00:00.000
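Thread 3 is clearly our target: it has accumulated 20.984 seconds of user-mode time in a process that has only been up for 21 seconds.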
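!runaway reads the per-thread times out of the dump. For a process that is still running, similar figures are available through System.Diagnostics. The sketch below is not part of the original session, and the names RunawayDemo and UserTimesByThread are hypothetical. It starts the same kind of busy loop as Test2 and then lists the user-mode time of every thread, busiest first.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

static class RunawayDemo
{
    // Returns (OS thread id, user-mode CPU time) pairs, busiest thread first.
    public static List<KeyValuePair<int, TimeSpan>> UserTimesByThread(Process process)
    {
        var result = new List<KeyValuePair<int, TimeSpan>>();
        foreach (ProcessThread t in process.Threads)
        {
            try
            {
                result.Add(new KeyValuePair<int, TimeSpan>(t.Id, t.UserProcessorTime));
            }
            catch (Exception)
            {
                // A thread can exit (or deny access) between enumeration and query; skip it.
            }
        }
        result.Sort((a, b) => b.Value.CompareTo(a.Value));
        return result;
    }

    static void Main()
    {
        // Same empty loop as Test2, but on a background thread so the demo can exit.
        var spinner = new Thread(() => { while (true) { } });
        spinner.IsBackground = true;
        spinner.Start();
        Thread.Sleep(2000);

        using (Process self = Process.GetCurrentProcess())
        {
            foreach (var entry in UserTimesByThread(self))
            {
                // Thread id in hex, as in the !runaway listing.
                Console.WriteLine("{0:x}  {1}", entry.Key, entry.Value);
            }
        }
    }
}
```

The spinning thread should appear at the top of the list with roughly two seconds of user time, while the others stay near zero.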
(5) Switch to that thread and look at its call stack.
0:000> ~3 s
eax=00993034 ebx=012d4e98 ecx=012d2f18 edx=012d2f18 esi=012d2f18 edi=00000000
eip=00cd0285 esp=0115f8b4 ebp=0115f8b8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
00cd0285 90 nop
0:003> !clrstack
OS Thread Id: 0x1a1c (3)
ESP EIP
0115f8b4 00cd0285 Learn.CUI.Program.b__0()
0115f8c0 792d6cf6 System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
0115f8cc 792e019f System.Threading.ExecutionContext.Run(...)
0115f8e4 792d6c74 System.Threading.ThreadHelper.ThreadStart()
0115fb0c 79e71b4c [GCFrame: 0115fb0c]
(6) Look at the IL code of "Learn.CUI.Program.b__0()".
0:003> !name2ee * Learn.CUI.Program
Module: 790c1000 (mscorlib.dll)
--------------------------------------
Module: 00992c5c (Learn.CUI.exe)
Token: 0x02000002
MethodTable: 0099304c
EEClass: 00991320
Name: Learn.CUI.Program
0:003> !dumpmt -md 0099304c
EEClass: 00991320
Module: 00992c5c
Name: Learn.CUI.Program
mdToken: 02000002 (D:\...\Learn.CUI.exe)
BaseSize: 0xc
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 9
--------------------------------------
MethodDesc Table
Entry MethodDesc JIT Name
79286a70 79104934 PreJIT System.Object.ToString()
79286a90 7910493c PreJIT System.Object.Equals(System.Object)
79286b00 7910496c PreJIT System.Object.GetHashCode()
792f72f0 79104990 PreJIT System.Object.Finalize()
00cd00f8 0099302c JIT Learn.CUI.Program..ctor()
00cd0150 00993008 JIT Learn.CUI.Program.Test1()
00cd01d8 00993014 JIT Learn.CUI.Program.Test2()
00cd0070 00993020 JIT Learn.CUI.Program.Main(System.String[])
00cd0268 00993034 JIT Learn.CUI.Program.b__0()
0:003> !dumpmd 00993034
Method Name: Learn.CUI.Program.b__0()
Class: 00991320
MethodTable: 0099304c
mdToken: 06000005
Module: 00992c5c
IsJitted: yes
CodeAddr: 00cd0268
0:003> !dumpil 00993034
ilAddr = 00402088
IL_0000: nop
IL_0001: br.s IL_0005
IL_0003: nop
IL_0004: nop
IL_0005: ldc.i4.1
IL_0006: stloc.0
IL_0007: br.s IL_0103
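Something is off: the IL is incomplete. This is usually caused by anonymous methods. No matter, let's move on.
(6, continued) Export the module, then view the source with Reflector.exe.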
0:003> !dumpdomain
Domain 1: 001647b0
LowFrequencyHeap: 001647d4
HighFrequencyHeap: 00164820
StubHeap: 0016486c
Stage: OPEN
SecurityDescriptor: 001532a0
Name: Learn.CUI.exe
Assembly: 00171418 [C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll]
ClassLoader: 00171488
SecurityDescriptor: 0015d288
Module Name
790c1000 C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll
Assembly: 0017a788 [D:\...\Learn.CUI.exe]
ClassLoader: 0017a7f8
SecurityDescriptor: 00172458
Module Name
00992c5c D:\...\Learn.CUI.exe
0:003> !SaveModule 00992c5c z:\temp\a.dll
3 sections in file
section 0 - VA=2000, VASize=af4, FileAddr=200, FileSize=c00
section 1 - VA=4000, VASize=5c0, FileAddr=e00, FileSize=600
section 2 - VA=6000, VASize=c, FileAddr=1400, FileSize=200
Open a.dll in Reflector.exe and look at the decompiled Learn.CUI.Program.b__0() method.
[CompilerGenerated]
private static void b__0()
{
while (true)
{
}
}
Clearly, this empty infinite loop is the source of the CPU problem.
(7) Next, we need to find the biggest memory consumers.
0:003> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x012d2e8c
generation 1 starts at 0x012d2e80
generation 2 starts at 0x012d1000
ephemeral segment allocation context: none
segment begin allocated size
012d0000 012d1000 012d8e98 0x00007e98(32408)
Large object heap starts at 0x022d1000
segment begin allocated size
022d0000 022d1000 02cd3260 0x00a02260(10494560)
032d0000 032d1000 03cd1010 0x00a00010(10485776)
042d0000 042d1000 04cd1020 0x00a00020(10485792)
052d0000 052d1000 05cd1020 0x00a00020(10485792)
062d0000 062d1000 06cd1020 0x00a00020(10485792)
072d0000 072d1000 07cd1020 0x00a00020(10485792)
082d0000 082d1000 08cd1020 0x00a00020(10485792)
092d0000 092d1000 09cd1020 0x00a00020(10485792)
0a760000 0a761000 0b161020 0x00a00020(10485792)
0b760000 0b761000 0c161020 0x00a00020(10485792)
Total Size 0x640a208(104899080)
------------------------------
GC Heap Size 0x640a208(104899080)
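Well, well: the LOH (large object heap) is crowded with big consumers, ten segments of about 10 MB each.
(8) Confirm what these big consumers are.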
0:003> !dumpheap -min 85000 -stat
total 10 objects
Statistics:
MT Count TotalSize Class Name
7933335c 10 104857760 System.Byte[]
Total 10 objects
All ten are "System.Byte[]". Good, let's continue.
(9) Find the memory addresses of these objects.
0:003> !dumpheap -type Byte[] -min 85000
Address MT Size
022d3250 7933335c 10485776
032d1000 7933335c 10485776
042d1000 7933335c 10485776
052d1000 7933335c 10485776
062d1000 7933335c 10485776
072d1000 7933335c 10485776
082d1000 7933335c 10485776
092d1000 7933335c 10485776
0a761000 7933335c 10485776
0b761000 7933335c 10485776
total 10 objects
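The -min 85000 filter in both !dumpheap commands matches the CLR's default large-object threshold: objects of 85,000 bytes or more are allocated on the LOH rather than in generation 0. As a rough illustration that is not part of the original session (LohDemo and its helper are hypothetical names), GC.GetGeneration reports a large array as belonging to the oldest generation immediately after allocation, which is how the runtime exposes LOH membership through this API.

```csharp
using System;

static class LohDemo
{
    // The GC reports large-object-heap objects as belonging to its oldest generation.
    public static bool IsReportedAsOldestGeneration(object o)
    {
        return GC.GetGeneration(o) == GC.MaxGeneration;
    }

    static void Main()
    {
        byte[] small = new byte[1024];              // far below the threshold: small object heap
        byte[] large = new byte[1024 * 1024 * 10];  // same size as the arrays in Test1

        Console.WriteLine("small: generation {0}", GC.GetGeneration(small));
        Console.WriteLine("large: generation {0}", GC.GetGeneration(large));
        Console.WriteLine("large reported as oldest generation: {0}",
            IsReportedAsOldestGeneration(large));
    }
}
```

This is why each 10 MB array from Test1 shows up under "Large object heap" in the !eeheap -gc output instead of in the small ephemeral segment.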
(10) Pick one and see who holds a reference to it.
0:003> !gcroot 022d3250
Scan Thread 0 OSTHread 1aa4
ESP:12f468:Root:012d2e2c(Learn.CUI.Program)->
012d2e38(System.Collections.Generic.List`1[[System.Byte[], mscorlib]])->
012d2ec8(System.Byte[][])->
022d3250(System.Byte[])
Scan Thread 2 OSTHread 1a30
Scan Thread 3 OSTHread 1a1c
Aha: the Program object has a field of type "List<byte[]>" that keeps this array alive. Let's look at the Program object itself.
0:003> !do 012d2e2c
Name: Learn.CUI.Program
MethodTable: 0099304c
EEClass: 00991320
Size: 12(0xc) bytes
Fields:
MT Field Offset Type VT Attr Value Name
00000000 4000001 4 0 instance 012d2e38 list
79317cc4 4000002 4 ...ading.ThreadStart 0 static 012d2f18 CS$<>9__CachedAnonymousMethodDelegate1
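Clearly, the problem is this list field.
(11) All that remains is to open a.dll, the module we exported earlier with SaveModule, in Reflector.exe and read its source. Searching for Byte[] and list hardly needs a demonstration.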
private void Test1()
{
for (int i = 0; i < 10; i++)
{
this.list.Add(new byte[0xa00000]);
}
}
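As a cross-check, the numbers in the decompiled code and in the debugger output agree with each other. The sketch below only redoes that arithmetic; SizeCheck and its members are hypothetical names, and every constant is copied from the source code or from the !dumpheap output above. It does not try to explain how the extra bytes per object are laid out.

```csharp
using System;

static class SizeCheck
{
    public const int RequestedBytes = 1024 * 1024 * 10; // array length requested in Test1
    public const int ReportedObjectSize = 10485776;     // Size column of !dumpheap
    public const int ObjectCount = 10;                  // loop count in Test1

    // Bytes the dump reports per array beyond its element data.
    public static int PerObjectOverhead()
    {
        return ReportedObjectSize - RequestedBytes;
    }

    // Combined size of all arrays as reported by the debugger.
    public static long TotalReportedBytes()
    {
        return (long)ObjectCount * ReportedObjectSize;
    }

    static void Main()
    {
        Console.WriteLine(RequestedBytes == 0xa00000); // True: Reflector's 0xa00000 is 10 MB
        Console.WriteLine(PerObjectOverhead());        // 16
        Console.WriteLine(TotalReportedBytes());       // 104857760, the TotalSize from !dumpheap -stat
    }
}
```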