[Author: Zhang Pei] [Original: http://www.yiiyee.cn/Blog/0x19-1/]
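To use memory efficiently, the kernel serves small allocations (smaller than one PAGE) from memory pools. The system has two kinds of pool: paged pool and nonpaged pool. The difference is clear-cut: pages used by paged pool can be paged out at any time, while the virtual pages used by nonpaged pool always stay resident in physical memory.
Code running at an elevated IRQL (>= DISPATCH_LEVEL, i.e. IRQL 2) must only use memory allocated from nonpaged pool, because the system cannot service page faults at those IRQLs.
Apart from that difference, the system manages the two pools in very similar ways. So how are the pools managed? When a caller requests memory, the pool manager first checks its current reserve to see whether it can satisfy the request. If it can, the allocation is carved out of the reserve. Otherwise the manager obtains a new memory page, hands part of it to the caller, and keeps the remainder as reserve for later requests.
The pool manager divides a page into a number of small blocks (call each one an Entry). The first Entry starts at the beginning of the page, and the last Entry ends exactly at the end of the page. Every Entry has a descriptor, and the manager uses these descriptors to keep track of the many small blocks. The descriptor is defined as POOL_HEADER. Here is its definition on 32-bit Win7: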
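The reserve-then-new-page strategy described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class name `ToyPool`, the starting address 0x10000, and the first-fit search are all invented here, and the real pool manager is considerably more elaborate.

```python
PAGE_SIZE = 0x1000  # 4 KB page, as on the x86 system discussed in this article


class ToyPool:
    """Simplified model: free (address, size) blocks are kept as a reserve."""

    def __init__(self, first_page_base=0x10000):
        self.free_blocks = []              # the reserve
        self.next_page = first_page_base   # hypothetical address of the next fresh page

    def allocate(self, size):
        if size > PAGE_SIZE:
            raise ValueError("this toy model only handles sub-page requests")
        # 1) Try to satisfy the request from the reserve (first fit).
        for i, (addr, free_size) in enumerate(self.free_blocks):
            if free_size >= size:
                del self.free_blocks[i]
                if free_size > size:
                    # Keep the unused tail in the reserve.
                    self.free_blocks.append((addr + size, free_size - size))
                return addr
        # 2) Otherwise take a fresh page, carve the block from its start,
        #    and keep the remainder as reserve.
        page = self.next_page
        self.next_page += PAGE_SIZE
        if size < PAGE_SIZE:
            self.free_blocks.append((page + size, PAGE_SIZE - size))
        return page


pool = ToyPool()
a = pool.allocate(0x38)    # no reserve yet: comes from a fresh page
b = pool.allocate(0x48)    # fits in the remainder of that page
c = pool.allocate(0xF90)   # too big for the reserve: needs a second page
print(hex(a), hex(b), hex(c))
```

The model ignores headers, free-list buckets, and coalescing; it only shows why consecutive small allocations tend to end up adjacent within one page.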
0: kd> dt nt!_pool_header
   +0x000 PreviousSize     : Pos 0, 9 Bits
   +0x000 PoolIndex        : Pos 9, 7 Bits
   +0x002 BlockSize        : Pos 0, 9 Bits
   +0x002 PoolType         : Pos 9, 7 Bits
 //+0x000 Ulong1           : Uint4B
   +0x004 PoolTag          : Uint4B
 //+0x004 AllocatorBackTraceIndex : Uint2B
 //+0x006 PoolTagHash      : Uint2B
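On 64-bit systems the structure is defined slightly differently, but that does not affect this discussion. The descriptor sits at the head of each Entry, so the name POOL_HEADER is well deserved. It contains five members, described below: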
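PreviousSize and BlockSize are the interesting ones. Taken together they are very powerful: the system uses them to check the integrity of a pool page. If the integrity check fails, the system raises a BAD_POOL_HEADER (0x19) bugcheck whose first parameter is 0x20.
The dump file at hand is exactly such a case. Running the automatic analysis command gives: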
0: kd> !analyze -v
*******************************************************************
*                                                                 *
*                        Bugcheck Analysis                        *
*                                                                 *
*******************************************************************
BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of
the problem, and then special pool applied to the suspect tags or the driver
verifier to a suspect driver.
Arguments:
Arg1: 00000020, a pool block header size is corrupt.
Arg2: 8739ed50, The pool entry we were looking for within the page.
Arg3: 8739ed88, The next pool entry.
Arg4: 18070009, (reserved)
Debugging Details:
-------------------------------------------------------
BUGCHECK_STR: 0x19_20
POOL_ADDRESS: 8739ed50 Nonpaged pool
DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT
PROCESS_NAME: xxxusermode.exe
CURRENT_IRQL: 2
IRP_ADDRESS: 013e8e48
LAST_CONTROL_TRANSFER: from 82c8c588 to 82d2cc6b
STACK_TEXT:
99f8f8d8 82c8c588 8739ed58 00000000 b99f0218 nt!ExFreePoolWithTag+0x1b1
99f8f924 82c8403f 013e8e88 99f8f96c 99f8f964 nt!IopCompleteRequest+0xe6
99f8f974 82f3db64 00000000 b13e8e48 b13e8e48 nt!IopfCompleteRequest+0x3b4
99f8f9dc 95af0c5a 8a1c8978 86c97008 99f8f9fc nt!IovCompleteRequest+0x133
99f8f9ec 95aef48f 86c94180 b13e8e48 99f8fa14 ks!CKsFilter::DispatchDeviceIoControl+0x68
99f8f9fc 95adf0ba 86c94180 b13e8e48 b13e8e48 ks!KsDispatchIrp+0xb0
99f8fa14 82f3d6c3 86c94180 b13e8e48 8a98fa48 ks!CKsDevice::PassThroughIrp+0x46
99f8fa38 82c42bd5 00000000 b13e8e48 86c94180 nt!IovCallDriver+0x258
99f8fa4c 82e36bf9 8a98fa48 b13e8e48 b13e8fd8 nt!IofCallDriver+0x1b
99f8fa6c 82e39de2 86c94180 8a98fa48 00000000 nt!IopSynchronousServiceTail+0x1f8
99f8fb08 82e80764 86c94180 b13e8e48 00000000 nt!IopXxxControlFile+0x6aa
99f8fb3c 8932cc90 000003c0 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
WARNING: Stack unwind information not available. Following frames may be wrong.
99f8fd04 82c498a6 000003c0 00000000 00000000 DgSafe+0x19c90
99f8fd04 77967094 000003c0 00000000 00000000 nt!KiSystemServicePostCall
0027ed44 00000000 00000000 00000000 00000000 0x77967094
STACK_COMMAND: kb
FOLLOWUP_IP:
ks!CKsFilter::DispatchDeviceIoControl+68
95af0c5a 8bc7 mov eax,edi
SYMBOL_STACK_INDEX: 4
SYMBOL_NAME: ks!CKsFilter::DispatchDeviceIoControl+68
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: ks
IMAGE_NAME: ks.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4ce799d9
FAILURE_BUCKET_ID: 0x19_20_VRF_ks!CKsFilter::DispatchDeviceIoControl+68
BUCKET_ID: 0x19_20_VRF_ks!CKsFilter::DispatchDeviceIoControl+68
Followup: MachineOwner
---------
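A few points are worth noting:
On 32-bit systems, Windows kernel DDIs use the STDCALL calling convention, which passes arguments on the stack. That lets us read the arguments of the topmost call, ExFreePoolWithTag, straight from the stack trace: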
99f8f8d8 82c8c588 8739ed58 00000000 b99f0218 nt!ExFreePoolWithTag+0x1b1
Reconstructed, the call looks like this:
ExFreePoolWithTag (8739ed58, 0);
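As a small sketch of how that call was recovered, the stack line can be split into its columns. On x86, each `kb` line shows ChildEBP, the return address, and the first three stack dwords. The helper name `parse_kb_frame` is made up for this illustration; `ExFreePoolWithTag` takes two parameters (P and Tag), so only the first two dwords are real arguments.

```python
def parse_kb_frame(line):
    """Split one x86 `kb` stack line into ChildEBP, RetAddr, three stack dwords, symbol."""
    fields = line.split()
    child_ebp, ret_addr = (int(f, 16) for f in fields[:2])
    args = [int(f, 16) for f in fields[2:5]]
    symbol = fields[5]
    return child_ebp, ret_addr, args, symbol


frame = "99f8f8d8 82c8c588 8739ed58 00000000 b99f0218 nt!ExFreePoolWithTag+0x1b1"
child_ebp, ret_addr, args, symbol = parse_kb_frame(frame)

# ExFreePoolWithTag(P, Tag) has two parameters; the third dword is unrelated stack data.
p, tag = args[:2]
print(f"{symbol.split('+')[0]}(P=0x{p:08x}, Tag=0x{tag:x})")
```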
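Compare this with the two addresses in Arg2 and Arg3: the pointer being freed, 0x8739ed58, is very close to the address of the Entry being processed, 0x8739ed50. The difference is 8 bytes, which is exactly the size of the POOL_HEADER structure: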
0: kd> dt nt!_pool_header 8739ed50
   +0x000 PreviousSize     : 0y000001001 (0x9)    // 9 * 8 = 0x48
   +0x000 PoolIndex        : 0y0000000 (0)
   +0x002 BlockSize        : 0y000000111 (0x7)    // 7 * 8 = 0x38
   +0x002 PoolType         : 0y0001100 (0xc)
   +0x000 Ulong1           : 0x18070009
   +0x004 PoolTag          : 0x7070534b
   +0x004 AllocatorBackTraceIndex : 0x534b
   +0x006 PoolTagHash      : 0x7070
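The same fields can be recovered by hand from the raw Ulong1 and PoolTag values, which is also what Arg4 of the bugcheck (18070009) contains. The sketch below assumes the 32-bit layout shown earlier (two 16-bit words, each split into a 9-bit and a 7-bit field) and the 8-byte size unit used in the comments above; the function name `decode_pool_header32` is invented for the example.

```python
import struct

GRANULARITY = 8  # size fields count 8-byte units on this 32-bit system


def decode_pool_header32(ulong1, pool_tag):
    """Decode the 32-bit POOL_HEADER bitfields from Ulong1 and PoolTag."""
    low, high = ulong1 & 0xFFFF, ulong1 >> 16
    return {
        "PreviousSize": low & 0x1FF,   # bits 0..8 of the word at +0x000
        "PoolIndex": low >> 9,         # bits 9..15 of the word at +0x000
        "BlockSize": high & 0x1FF,     # bits 0..8 of the word at +0x002
        "PoolType": high >> 9,         # bits 9..15 of the word at +0x002
        # The tag is stored little-endian, so its bytes read as ASCII in memory order.
        "PoolTag": struct.pack("<I", pool_tag).decode("ascii"),
    }


hdr = decode_pool_header32(0x18070009, 0x7070534B)
print(hdr)
print(hex(hdr["PreviousSize"] * GRANULARITY), hex(hdr["BlockSize"] * GRANULARITY))
```

The decoded tag matches the KSpp tag that !pool reports for this Entry.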
So where is the problem? Run the !pool command:
0: kd> !pool 8739ed50
Pool page 8739ed50 region is Nonpaged pool
8739e000 size: d0 previous size: 0 (Free) Ntfx
8739e0d0 size: 68 previous size: d0 (Allocated) FMsl
8739e138 size: 68 previous size: 68 (Allocated) EtwR (Protected)
8739e1a0 size: 68 previous size: 68 (Allocated) EtwR (Protected)
8739e208 size: 68 previous size: 68 (Allocated) Mdl
8739e270 size: 18 previous size: 68 (Allocated) MmSi
8739e288 size: 20 previous size: 18 (Allocated) USBB
8739e2a8 size: 68 previous size: 20 (Allocated) FMsl
8739e310 size: 10 previous size: 68 (Free) NSpg
8739e320 size: 48 previous size: 10 (Allocated) Vad
8739e368 size: 10 previous size: 48 (Free) NKBS
8739e378 size: 48 previous size: 10 (Allocated) Vad
8739e3c0 size: c8 previous size: 48 (Allocated) File (Protected)
8739e488 size: 1f8 previous size: c8 (Allocated) z...
8739e680 size: 68 previous size: 1f8 (Allocated) EtwR (Protected)
8739e6e8 size: 18 previous size: 68 (Allocated) MmSi
8739e700 size: 40 previous size: 18 (Allocated) Even (Protected)
8739e740 size: 2e8 previous size: 40 (Allocated) Thre (Protected)
8739ea28 size: 68 previous size: 2e8 (Allocated) FMsl
8739ea90 size: 68 previous size: 68 (Allocated) FMsl
8739eaf8 size: 18 previous size: 68 (Allocated) MmSi
8739eb10 size: 40 previous size: 18 (Allocated) SeTl
8739eb50 size: 40 previous size: 40 (Allocated) Even (Protected)
8739eb90 size: 40 previous size: 40 (Allocated) Even (Protected)
8739ebd0 size: 38 previous size: 40 (Allocated) AlIn
8739ec08 size: 50 previous size: 38 (Free) z...
8739ec58 size: 40 previous size: 50 (Allocated) MmIo
8739ec98 size: 28 previous size: 40 (Allocated) VadS
8739ecc0 size: 48 previous size: 28 (Allocated) Vad
8739ed08 size: 48 previous size: 48 (Allocated) Vad
*8739ed50 size: 38 previous size: 48 (Free ) *KSpp
Pooltag KSpp : irp system buffer property/method/event parameter
8739ed88 doesn't look like a valid small pool allocation, checking to see
if the entire page is actually part of a large page allocation...
8739ed88 is not a valid large pool allocation, checking large session pool...
8739ed88 is freed (or corrupt) pool
Bad allocation size @8739ed88, zero is invalid
***
*** An error (or corruption) in the pool was detected;
*** Attempting to diagnose the problem.
***
*** Use !poolval 8739e000 for more details.
Pool page [ 8739e000 ] is __inVALID.
Analyzing linked list...
[ 8739ed50 --> 8739edf0 (size = 0xa0 bytes)]: Corrupt region
Scanning for single bit errors...
None found
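Whatever address it is given, !pool jumps to the start of the page containing that address and analyzes from there. The page size on this system is 4 KB, so the page containing 0x8739ed50 starts at 0x8739e000, which is also the address of the first Entry. Reading down from the first Entry, the size and previous size values mirror each other and form a zigzag chain.
The zigzag starts at the first Entry, whose previous size is 0. From there on, each Entry's size must equal the previous size of the Entry that follows it, otherwise the page is inconsistent. This relationship has to hold all the way to the last Entry.
Looking at the chain in this dump, the problem is easy to spot: the link between the Entry being processed and the Entry right after it is broken.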
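The zigzag rule can be written as a short check. The entries below are the last few rows of the !pool output above, plus the all-zero header found at 0x8739ed88; the function names are invented for this sketch, and it is not how !pool itself is implemented.

```python
PAGE_SIZE = 0x1000  # 4 KB pages on this system

# (address, size, previous size) for the tail of the page, copied from the !pool output
entries = [
    (0x8739EC98, 0x28, 0x40),
    (0x8739ECC0, 0x48, 0x28),
    (0x8739ED08, 0x48, 0x48),
    (0x8739ED50, 0x38, 0x48),
    (0x8739ED88, 0x00, 0x00),  # header reads back as all zeros in the dump
]


def page_base(address):
    """Start of the page containing the address."""
    return address & ~(PAGE_SIZE - 1)


def first_broken_link(entries):
    """Return the address of the first entry that breaks the chain, or None."""
    for (addr, size, _), (next_addr, next_size, next_prev) in zip(entries, entries[1:]):
        # Each entry must end where the next begins, the next entry's
        # previous size must echo this entry's size, and no size may be zero.
        if addr + size != next_addr or next_prev != size or next_size == 0:
            return next_addr
    return None


print(hex(page_base(0x8739ED50)))
print(hex(first_broken_link(entries)))
```

The check fails at 0x8739ed88, the same address the bugcheck reports as the next pool entry (Arg3).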
0: kd> dt nt!_pool_header 8739ed88
   +0x000 PreviousSize     : 0y000000000 (0)
   +0x000 PoolIndex        : 0y0000000 (0)
   +0x002 BlockSize        : 0y000000000 (0)
   +0x002 PoolType         : 0y0000000 (0)
   +0x000 Ulong1           : 0
   +0x004 PoolTag          : 0
   +0x004 AllocatorBackTraceIndex : 0
   +0x006 PoolTagHash      : 0
0: kd> db 8739ed88 L8
8739ed88 00 00 00 00 00 00 00 00
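Where a POOL_HEADER should be, the memory has been zeroed. What could cause this? The most likely explanation is that the driver allocated a 0x30-byte block at address 0x8739ed58 and then wrote more than 0x30 bytes into it, overwriting the 8 bytes that belong to the next POOL_HEADER.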
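The arithmetic behind that conclusion is worth spelling out. The constants come from the dump above (Entry at 0x8739ed50, block size 0x38, 8-byte header); the helper `clobbered_header_bytes` is a hypothetical name used only in this sketch.

```python
POOL_HEADER_SIZE = 8          # size of POOL_HEADER on 32-bit Win7
entry = 0x8739ED50            # Arg2: the Entry being freed
block_size = 0x38             # BlockSize 7 * 8, header included

user_ptr = entry + POOL_HEADER_SIZE        # pointer handed to the driver
usable = block_size - POOL_HEADER_SIZE     # bytes the driver may write
next_header = user_ptr + usable            # where the next POOL_HEADER starts


def clobbered_header_bytes(bytes_written):
    """How many bytes of the next POOL_HEADER a write of this length overwrites."""
    overrun = max(0, bytes_written - usable)
    return min(overrun, POOL_HEADER_SIZE)


print(hex(user_ptr), hex(usable), hex(next_header))
print(clobbered_header_bytes(0x30), clobbered_header_bytes(0x38))
```

A write of exactly 0x30 bytes is harmless, while any write of 0x38 bytes or more wipes out the whole header at 0x8739ed88.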
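Debugging with this analysis in mind, and looking closely at the user-mode program that was running when the crash occurred, showed that the program sends a structure to the kernel for processing, but the user-mode and kernel-mode definitions of that structure do not match: the kernel's definition has two more members than the user program's. With that found, the problem was finally resolved.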
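To illustrate the kind of mismatch involved, here is a sketch using ctypes. The structure names, field names, and field types are all invented, since the article does not show the real definitions; only the shape of the bug (two extra members on the kernel side) comes from the text. The length check at the end is one common defensive measure, not necessarily the fix that was applied here.

```python
import ctypes


class UserRequest(ctypes.Structure):
    """Hypothetical user-mode definition of the shared structure."""
    _fields_ = [("Id", ctypes.c_uint32), ("Flags", ctypes.c_uint32)]


class KernelRequest(ctypes.Structure):
    """Hypothetical kernel-mode definition with two extra members."""
    _fields_ = [
        ("Id", ctypes.c_uint32),
        ("Flags", ctypes.c_uint32),
        ("Extra1", ctypes.c_uint32),  # not present in the user-mode definition
        ("Extra2", ctypes.c_uint32),  # not present in the user-mode definition
    ]


def is_buffer_large_enough(buffer_length, expected_type):
    """Check the caller's buffer length before treating it as expected_type."""
    return buffer_length >= ctypes.sizeof(expected_type)


user_len = ctypes.sizeof(UserRequest)              # size the system buffer is based on
overrun = ctypes.sizeof(KernelRequest) - user_len  # bytes written past the buffer
print(user_len, overrun, is_buffer_large_enough(user_len, KernelRequest))
```

If the kernel side fills in every member of its larger structure, the extra bytes land past the end of the buffer sized for the user-mode structure, which is the pattern seen in this dump.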