How to analyze a dmp file with a broken stack

Original post: http://bbs.kanxue.com/showthread.php?t=51141
Title: [Discussion] How to analyze a dmp file with a broken stack
Author: 小喂
Date: 2007-09-05, 15:41:07
Link: http://bbs.pediy.com/showthread.php?t=51141

Analyzing the dmp file that a crashed program leaves behind is the main job of post-mortem debugging. The first step is usually WinDbg's built-in !analyze -v command for a first-pass analysis; once it has given you the faulting address and the faulting stack, you dig into the details.

This article describes how to obtain the faulting address and the faulting stack when !analyze -v does not work.

    #include <stdio.h>

    int sum(int x, int y)
    {
        // Deliberately zero the frame pointer (MSVC x86 inline assembly).
        __asm mov ebp, 0
        return (x + y);   // x and y are read through ebp, so this faults
    }

    int sumstub(int x, int y)
    {
        int tmp = 0;
        printf("enter fun() ...\n");
        tmp = sum(x, y);
        printf("leave fun() ...\n");
        return tmp;
    }

    int main(int argc, char* argv[])
    {
        printf("enter main() ...\n");
        printf("sum = %d\n", sumstub(0x1234, 0x5678));
        printf("leave main() ...\n");
        return 0;
    }

The sample program is simple: sum zeroes ebp, so the next read of x or y through ebp faults.

Open the resulting dmp file in WinDbg and run !analyze -v first. The result:

    0:000> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Exception Analysis                                   *
    *                                                                             *
    *******************************************************************************

    *** WARNING: Unable to verify checksum for Dump01.exe
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for lpk.dll -
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for Sysfer.dll -
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for usp10.dll -
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for imm32.dll -
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for apphelp.dll -
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for version.dll -
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for advapi32.dll -
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for shlwapi.dll -

    FAULTING_IP:
    +0
    00000000 ??              ???

    EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
    ExceptionAddress: 00000000
       ExceptionCode: 80000007 (Wake debugger)
      ExceptionFlags: 00000000
    NumberParameters: 0

    BUGCHECK_STR:  80000007

    PROCESS_NAME:  Dump01.exe

    ERROR_CODE: (NTSTATUS) 0x80000007 - {

    NTGLOBALFLAG:  0

    APPLICATION_VERIFIER_FLAGS:  0

    DERIVED_WAIT_CHAIN:

    Dl Eid Cid     WaitType
    -- --- ------- --------------------------
       0   62c.928 Unknown

    WAIT_CHAIN_COMMAND:  ~0s;k;;

    BLOCKING_THREAD:  00000928

    DEFAULT_BUCKET_ID:  APPLICATION_HANG_HungIn_ExceptionHandler

    PRIMARY_PROBLEM_CLASS:  APPLICATION_HANG_HungIn_ExceptionHandler

    LAST_CONTROL_TRANSFER:  from 7c92e9ab to 7c92eb94

    FAULTING_THREAD:  00000928

    STACK_TEXT:
    0012f3b8 7c92e9ab 7c86372c 00000002 0012f53c ntdll!KiFastSystemCallRet
    0012f3bc 7c86372c 00000002 0012f53c 00000001 ntdll!ZwWaitForMultipleObjects+0xc
    0012fb38 00401dda 0012fb74 0012ffb0 0012ffc0 kernel32!UnhandledExceptionFilter+0x8e4
    0012fb48 00401198 c0000005 0012fb74 0040261b Dump01!_XcptFilter+0x13e
    0012ffc0 7c816fd7 011dd65c 011dd664 7ffd6000 Dump01!mainCRTStartup+0xd1
    0012fff0 00000000 004010c7 00000000 00000000 kernel32!BaseProcessStart+0x23

    FOLLOWUP_IP:
    Dump01!_XcptFilter+13e
    00401dda 5b              pop     ebx

    SYMBOL_STACK_INDEX:  3

    SYMBOL_NAME:  Dump01!_XcptFilter+13e

    FOLLOWUP_NAME:  MachineOwner

    MODULE_NAME: Dump01

    IMAGE_NAME:  Dump01.exe

    DEBUG_FLR_IMAGE_TIMESTAMP:  46de4ed1

    STACK_COMMAND:  ~0s ; kb

    FAILURE_BUCKET_ID:  80000007_Dump01!_XcptFilter+13e

    BUCKET_ID:  80000007_Dump01!_XcptFilter+13e

    Followup: MachineOwner
    ---------

The analysis reports a faulting address of 0, and the stack it shows is inside system code (ntdll and kernel32 on top) rather than at the point of failure. Clearly !analyze -v has gone wrong this time, and the information we want has to be dug out by hand.

    0:000> ~*kv

    .  0  Id: 62c.928 Suspend: 1 Teb: 7ffdf000 Unfrozen
    ChildEBP RetAddr  Args to Child
    0012f3b8 7c92e9ab 7c86372c 00000002 0012f53c ntdll!KiFastSystemCallRet (FPO: [0,0,0])
    0012f3bc 7c86372c 00000002 0012f53c 00000001 ntdll!ZwWaitForMultipleObjects+0xc (FPO: [5,0,0])
    0012fb38 00401dda 0012fb74 0012ffb0 0012ffc0 kernel32!UnhandledExceptionFilter+0x8e4 (FPO: [Non-Fpo])
    0012fb48 00401198 c0000005 0012fb74 0040261b Dump01!_XcptFilter+0x13e
    0012ffc0 7c816fd7 011dd65c 011dd664 7ffd6000 Dump01!mainCRTStartup+0xd1
    0012fff0 00000000 004010c7 00000000 00000000 kernel32!BaseProcessStart+0x23 (FPO: [Non-Fpo])

    0:000> !teb
    TEB at 7ffdf000
        ExceptionList:        0012fb28
        StackBase:            00130000
        StackLimit:           0012a000
        SubSystemTib:         00000000
        FiberData:            00001e00
        ArbitraryUserPointer: 00000000
        Self:                 7ffdf000
        EnvironmentPointer:   00000000
        ClientId:             0000062c . 00000928
        RpcHandle:            00000000
        Tls Storage:          00000000
        PEB Address:          7ffd6000
        LastErrorValue:       0
        LastStatusValue:      103
        Count Owned Locks:    0
        HardErrorMode:        0

First dump the stacks of all threads and pick the one that looks like it faulted. This example has only one thread, so that thread must be the culprit. Then display the TEB of the faulting thread.

    0:000> dps 0x0012a000 0x00130000

Using the stack's location and size (StackLimit and StackBase from the TEB), dump the entire stack.

In the Windows exception-handling flow, every exception that a debugger does not handle ends up in ntdll!KiUserExceptionDispatcher, which looks for an SEH handler to deal with it. So search the dumped stack for the string ntdll!KiUserExceptionDispatcher.

    0012fc50  00000000
    0012fc54  7c92eafa ntdll!KiUserExceptionDispatcher+0xe
    0012fc58  00000000
    0012fc5c  0012fc84

Then use the prototype of KiUserExceptionDispatcher to locate the CONTEXT structure that was saved when the exception occurred.

    ; VOID
    ; KiUserExceptionDispatcher (
    ;    IN PEXCEPTION_RECORD ExceptionRecord,
    ;    IN PCONTEXT ContextRecord
    ;    )

The second parameter points to the CONTEXT structure; here it is the value 0x0012fc84 stored at 0012fc5c. Use WinDbg's .cxr command to display and switch to that context.

    0:000> .cxr 0x0012fc84
    eax=00005678 ebx=7ffd6000 ecx=00001234 edx=7c92eb94 esi=011dd664 edi=011dd65c
    eip=0040100b esp=0012ff50 ebp=00000000 iopl=0         nv up ei pl nz na pe nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206
    Dump01!sum+0xb:
    0040100b 8b4508          mov     eax,dword ptr [ebp+8] ss:0023:00000008=????????
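The search for the CONTEXT pointer described above can be sketched in C. This is a minimal illustration, not part of the original post: the function name find_context_ptr is hypothetical, and the offset of two DWORDs between the return-address slot and the CONTEXT pointer is an assumption taken from this particular dump (return address at 0012fc54, pointer at 0012fc5c). Other Windows versions may lay the frame out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: scan a stack snapshot for a known return address
 * and return the DWORD stored two slots above it. Returns 0 when the
 * return address is not found or the snapshot ends too early.
 *
 * stack    - the dumped DWORDs, lowest address first
 * count    - number of DWORDs in the snapshot
 * ret_addr - return address to look for
 *            (ntdll!KiUserExceptionDispatcher+0xe in the article)
 */
uint32_t find_context_ptr(const uint32_t *stack, size_t count, uint32_t ret_addr)
{
    assert(stack != NULL);
    for (size_t i = 0; i + 2 < count; i++) {
        if (stack[i] == ret_addr) {
            /* Assumption from this dump: CONTEXT pointer is at slot + 8 bytes. */
            return stack[i + 2];
        }
    }
    return 0;
}
```

Fed the four DWORDs shown at 0012fc50 through 0012fc5c and the return address 0x7c92eafa, the function returns 0x0012fc84, the value passed to .cxr.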

    0:000> kv
      *** Stack trace for last set context - .thread/.cxr resets it
    ChildEBP RetAddr  Args to Child
    00000000 00000000 00000000 00000000 00000000 Dump01!sum+0xb (CONV: cdecl) [E:/Works/Dump01/Dump01.cpp @ 10]

We now have the faulting address, 0x0040100b. Next, recover the correct faulting stack.

    0:000> ?? sizeof(ntdll!_CONTEXT)
    unsigned int 0x2cc
    0:000> ? 0x0012fc84 + 0x2cc
    Evaluate expression: 1245008 = 0012ff50

The calculation shows that the stack pointer just before the fault was at 0x0012ff50, which matches the esp value in the saved context.

    0:000> ub 0x0040100b L 6
    Dump01!sum [E:/Works/Dump01/Dump01.cpp @ 7]:
    00401000 55              push    ebp
    00401001 8bec            mov     ebp,esp
    00401003 53              push    ebx
    00401004 56              push    esi
    00401005 57              push    edi
    00401006 bd00000000      mov     ebp,0

    0:000> dps 0x0012ff50 L 0x10
    0012ff50  011dd65c
    0012ff54  011dd664
    0012ff58  7ffd6000
    0012ff5c  0012ff70
    0012ff60  0040103b Dump01!sumstub+0x25 [E:/Works/Dump01/Dump01.cpp @ 19]
    0012ff64  00001234
    0012ff68  00005678
    0012ff6c  00000000
    0012ff70  0012ff80
    0012ff74  00401074 Dump01!main+0x1f [E:/Works/Dump01/Dump01.cpp @ 30]
    0012ff78  00001234
    0012ff7c  00005678
    0012ff80  0012ffc0
    0012ff84  0040117b Dump01!mainCRTStartup+0xb4
    0012ff88  00000001
    0012ff8c  00520eb0

    0:000> r
    Last set context:
    eax=00005678 ebx=7ffd6000 ecx=00001234 edx=7c92eb94 esi=011dd664 edi=011dd65c
    eip=0040100b esp=0012ff50 ebp=00000000 iopl=0         nv up ei pl nz na pe nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206
    Dump01!sum+0xb:
    0040100b 8b4508          mov     eax,dword ptr [ebp+8] ss:0023:00000008=????????
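The address arithmetic above (CONTEXT address plus sizeof(CONTEXT)) can be written out as a small C sketch. The helper name end_of_context is hypothetical, and the size 0x2cc is the value WinDbg reported for ntdll!_CONTEXT in this 32-bit dump, so it should be treated as an assumption for any other target. The idea only works because, in this dump, the saved CONTEXT record sits directly below the stack pointer of the faulting code.

```c
#include <assert.h>
#include <stdint.h>

/* Size reported by "?? sizeof(ntdll!_CONTEXT)" in this dump (32-bit x86). */
#define X86_CONTEXT_SIZE 0x2ccu

/*
 * Hypothetical helper: address of the first byte past a CONTEXT record
 * stored at context_addr. In the article this equals the stack pointer
 * at the moment of the fault.
 */
uint32_t end_of_context(uint32_t context_addr)
{
    /* Guard against 32-bit wrap-around on a bogus input address. */
    assert(context_addr <= UINT32_MAX - X86_CONTEXT_SIZE);
    return context_addr + X86_CONTEXT_SIZE;
}
```

Calling end_of_context(0x0012fc84) gives 0x0012ff50 (1245008 in decimal), the same result as the WinDbg expression evaluator.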
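Disassembling the few instructions before the faulting address shows the cause: the instruction at 0x00401006 sets ebp to zero, so the following instruction, which fetches a parameter through ebp, faults. The stack tells the same story. Before the fault the function pushed ebx, esi and edi, and the three values starting at 0x0012ff50 match edi, esi and ebx in the register dump, so 0x0012ff50 really is the stack pointer at the time of the fault. The slot right above them, 0x0012ff5c, holds the ebp value saved by push ebp, and with it we can rebuild the correct faulting stack.

    0:000> kv L = 0x0012ff5c
    ChildEBP RetAddr  Args to Child
    0012ff5c 0040103b 00001234 00005678 00000000 Dump01!sum+0xb (CONV: cdecl)
    0012ff70 00401074 00001234 00005678 0012ffc0 Dump01!sumstub+0x25 (CONV: cdecl)
    0012ff80 0040117b 00000001 00520eb0 00520e20 Dump01!main+0x1f (CONV: cdecl)
    0012ffc0 7c816fd7 011dd65c 011dd664 7ffd6000 Dump01!mainCRTStartup+0xb4
    0012fff0 00000000 004010c7 00000000 00000000 kernel32!BaseProcessStart+0x23 (FPO: [Non-Fpo])

This stack starts at kernel32!BaseProcessStart and ends exactly at the faulting address, so it should be the correct faulting stack.

==========
小喂
2007.09.05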
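What kv does once it is given the frame pointer 0x0012ff5c can be approximated by following the saved-ebp chain by hand. The C sketch below illustrates that idea under stated assumptions; it is not how WinDbg is implemented. The names frame_t and walk_ebp_chain are hypothetical, and the code assumes standard x86 frames in which each frame pointer addresses the saved ebp, with the return address in the next DWORD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One recovered frame: the frame pointer and the return address above it. */
typedef struct {
    uint32_t ebp;
    uint32_t ret;
} frame_t;

/*
 * Hypothetical helper: follow the saved-ebp chain through a stack snapshot.
 *
 * stack     - dumped DWORDs, lowest address first
 * count     - number of DWORDs in the snapshot
 * base      - address of stack[0]
 * first_ebp - address of the first saved-ebp slot
 * out, max  - output buffer and its capacity
 *
 * Returns the number of frames written. The walk stops when the next frame
 * pointer is misaligned, leaves the snapshot, or does not move upward.
 */
size_t walk_ebp_chain(const uint32_t *stack, size_t count, uint32_t base,
                      uint32_t first_ebp, frame_t *out, size_t max)
{
    assert(stack != NULL && out != NULL);
    size_t n = 0;
    uint32_t ebp = first_ebp;
    while (n < max) {
        if (ebp < base || (ebp - base) % 4 != 0)
            break;
        size_t i = (ebp - base) / 4;
        if (i + 1 >= count)
            break;                  /* need both saved ebp and return address */
        out[n].ebp = ebp;
        out[n].ret = stack[i + 1];
        n++;
        if (stack[i] <= ebp)
            break;                  /* frames must grow toward higher addresses */
        ebp = stack[i];
    }
    return n;
}
```

With the 16 DWORDs from dps 0x0012ff50 L 0x10 as the snapshot, base 0x0012ff50 and first_ebp 0x0012ff5c, the walk yields three frames with return addresses 0x0040103b, 0x00401074 and 0x0040117b, then stops because the next frame pointer, 0x0012ffc0, lies outside the snapshot. Those are the first three rows of the kv output.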
