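Yesterday I wrote a simple driver. Its write dispatch routine marks each IRP pending and places it in a driver-defined queue, and another thread later cancels these pending IRPs: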
NTSTATUS SampleCharWriteAsync(PDEVICE_OBJECT devObj, PIRP irp)
{
    KIRQL oldIrql;
    SampleCharDevContext* devCtx = (SampleCharDevContext*)devObj->DeviceExtension;
    IoMarkIrpPending(irp);
    KeAcquireSpinLock(&devCtx->devWriteLock, &oldIrql);
    IoSetCancelRoutine(irp, SampleCharCancelIrp);
    if (irp->Cancel == TRUE)
    {
        if (IoSetCancelRoutine(irp, NULL) != NULL)
        {
            irp->IoStatus.Status = STATUS_CANCELLED;
            irp->IoStatus.Information = 0;
            IoCompleteRequest(irp, IO_NO_INCREMENT);
            KeReleaseSpinLock(&devCtx->devWriteLock, oldIrql);
            //return STATUS_PENDING;
            return STATUS_CANCELLED; // returning STATUS_CANCELLED here causes a bugcheck
        }
    }
    InsertTailList(&devCtx->pendWrListHead, &irp->Tail.Overlay.ListEntry);
    KeReleaseSpinLock(&devCtx->devWriteLock, oldIrql);
    return STATUS_PENDING;
}
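At first I had not written any test code, so I simply set irp->Cancel to TRUE by hand in the debugger. Execution then entered the if branch and ran IoCompleteRequest, and the whole thing went so smoothly that it took me by surprise. But as soon as return STATUS_CANCELLED executed, the system bugchecked: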
kd> kb
ChildEBP RetAddr Args to Child
8cccac34 82a814bc 851d06e8 868473f8 868473f8 SampleChar!SampleCharWriteAsync+0x3a [c:\studio\samplechar0x44\samplechar\samplechar.c @ 243]
WARNING: Stack unwind information not available. Following frames may be wrong.
8cccac4c 82c82eee 86916038 868473f8 8684748c nt!IofCallDriver+0x64
8cccac6c 82c837a2 851d06e8 86916038 00000001 nt!RtlRandomEx+0x1340
8cccad08 82a8842a 851d06e8 0000008c 00000000 nt!NtWriteFile+0x6ee
kd> ?? irp
struct _IRP * 0x868473f8
+0x024 Cancel : 0 ''
kd> eb 0x868473f8+0x024 1
kd> ?? irp
struct _IRP * 0x868473f8
+0x024 Cancel : 0x1 ''
kd> !analyze -v
REFERENCE_BY_POINTER (18)
Arguments:
Arg1: 84feb378, Object type of the object whose reference count is being lowered
Arg2: 86c76a08, Object whose reference count is being lowered
...
Debugging Details:
------------------
IRP_ADDRESS: 00129450
LAST_CONTROL_TRANSFER: from 82b24e71 to 82ab3394
STACK_TEXT:
...
a49c1bec 82ab0ed0 86c76a08 82aea363 612ca77a nt!ObfDereferenceObjectWithTag+0x4b
a49c1bf4 82aea363 612ca77a 87086f80 87129450 nt!ObfDereferenceObject+0xd
a49c1c38 82c85f36 00129490 a49c1c64 00000000 nt!IopCompleteRequest+0x24d
a49c1c6c 82c867a2 c0000120 87086f80 00000001 nt!IopSynchronousServiceTail+0x240
a49c1d08 82a8b42a 86312b98 0000008c 00000000 nt!NtWriteFile+0x6e8
a49c1d08 773464f4 86312b98 0000008c 00000000 nt!KiFastCallEntry+0x12a
00f6d5dc 77345ebc 756b90e3 00000080 0000008c ntdll!KiFastSystemCallRet
00f6d5e0 756b90e3 00000080 0000008c 00000000 ntdll!ZwWriteFile+0xc
00f6d644 7708121a 00000080 00f6e778 0000000b KERNELBASE!WriteFile+0xaa
00f6d660 01138af4 00000080 00f6e778 0000000b kernel32!WriteFileImplementation+0x76
...
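Among the many lines of !analyze -v output, I noticed the address of the IRP involved in the failure. On the off chance it would help, I tried to display that IRP: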
Debugging Details:
------------------
IRP_ADDRESS: 00129450 <---- address of the failing IRP
kd> !irp 00129450
00129450: Could not read Irp
Unfortunately, WinDbg could not parse the IRP. With no other option, I had to work out the cause from the stack backtrace:
NtWriteFile performs a series of checks on the write request and then calls IopSynchronousServiceTail, which sends the IRP down the device stack with IoCallDriver:
status = IoCallDriver( DeviceObject, Irp );
if (DeferredIoCompletion) {
    if (status != STATUS_PENDING) {
        PKNORMAL_ROUTINE normalRoutine;
        PVOID normalContext;
        KIRQL irql = PASSIVE_LEVEL; // Just to shut up the compiler
        ASSERT( !Irp->PendingReturned );
        if (!SynchronousIo) {
            KeRaiseIrql( APC_LEVEL, &irql );
        }
        IopCompleteRequest( &Irp->Tail.Apc,
                            &normalRoutine,
                            &normalContext,
                            (PVOID *) &FileObject,
                            &normalContext );
        ...
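From the backtrace, at the time of the crash the kernel had already returned from IoCallDriver and entered IopCompleteRequest to complete the IRP. At this point the cause is almost obvious: my driver had already completed this very IRP with IoCompleteRequest, and when the driver returned, the I/O manager tried to complete it again, which failed (while debugging, WinDbg also reported bug check 0x44, MULTIPLE_IRP_COMPLETE_REQUESTS, i.e. the same IRP being completed more than once). What made the kernel call IopCompleteRequest is the status != STATUS_PENDING check: my driver returned STATUS_CANCELLED. So I had simply written the wrong return value in the dispatch routine. Once a dispatch routine has called IoMarkIrpPending, that is, whenever the IRP is not completed synchronously, it must return STATUS_PENDING, even if it ends up completing the IRP itself before returning. The ASSERT( !Irp->PendingReturned ) in the excerpt expresses the same assumption from the I/O manager's side: a non-pending status means the IRP was never marked pending.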
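Although the problem has been located, some other important clues can corroborate this analysis and confirm that the bugcheck really originated in my driver.
Frame 06 of the call stack before the bugcheck shows: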
06 a49c1c38 82c85f36 00129490 a49c1c64 00000000 nt!IopCompleteRequest+0x24d
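The first argument to IopCompleteRequest is 0x00129490, which is close to the IRP address 0x129450 that !analyze -v reported for the crash, so the two may well be related. Look at the IRP structure: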
kd> dt ntkrpamp!_IRP
+0x000 Type : Int2B
+0x002 Size : Uint2B
+0x004 MdlAddress : Ptr32 _MDL
...
+0x02c UserEvent : Ptr32 _KEVENT
+0x030 Overlay :
+0x038 CancelRoutine : Ptr32 void
+0x03c UserBuffer : Ptr32 Void
+0x040 Tail :
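The Tail member is indeed 0x40 bytes from the start of the IRP, which matches the faulting IRP from the bug check analysis (IRP_ADDRESS: 00129450 = 0x00129490 - 0x40). In the source, the I/O manager passes &Irp->Tail.Apc as the first argument to IopCompleteRequest, and Apc sits at the start of the Tail union. So we now at least know what the IRP reported by WinDbg was doing when the system failed: it was being completed by the I/O manager in IopCompleteRequest.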
The WinDbg documentation says the second parameter of bug check 0x18 is the object whose reference count was being lowered, here 0x86c76a08. Look at that object:
kd> !object 86c76a08 9
Object: 86c76a08 Type: (84feb378) Event
ObjectHeader: 86c769f0 (new version)
HandleCount: 1 PointerCount: 0
kd> !findhandle 86c76a08
Now checking process 84fcea20...Now checking process 863764c0...
[86ecb030 onlyForWrite.e] #<-------- owner of the handle to object 86c76a08: process 0x86ecb030
8c: Entry a5320118 Granted Access 1f0003
Listing all processes confirms that 86ecb030 is indeed onlyForWrite.exe:
kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
...
PROCESS 88892d40 SessionId: 1 Cid: 0dfc Peb: 7ffdb000 ParentCid: 01ac
DirBase: 7eb7a440 ObjectTable: a52b8c30 HandleCount: 52.
Image: conhost.exe
PROCESS 86ecb030 SessionId: 1 Cid: 0e50 Peb: 7ffdc000 ParentCid: 0df4
DirBase: 7eb7a3a0 ObjectTable: 98258ec8 HandleCount: 50.
Image: onlyForWrite.exe
Finally, I wanted to confirm that this Event is the OVERLAPPED.hEvent handle that onlyForWrite passed to WriteFile:
OVERLAPPED overlapRd = { 0 }, overlapWr = { 0 };
overlapRd.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
overlapWr.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
char writeBuff[4096] = { "_ThreadProc" }, readBuff[4096] = { 0 };
DWORD len = 0, writeLen, readLen;
while (1)
{
    WriteFile(hDev, writeBuff, (DWORD)strlen(writeBuff), &writeLen, &overlapWr); // overlapWr.hEvent is handed to the kernel here
    //WaitForSingleObject(overlapWr.hEvent, INFINITE);
    Sleep(1000);
}
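When the crash happened I was not debugging onlyForWrite itself, so I had no record of the handle value stored in overlapWr.hEvent. It can still be recovered from the arguments passed to ZwWriteFile, though: WriteFile forwards OVERLAPPED.hEvent as the second parameter, Event, which MSDN documents as follows: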
NTSTATUS ZwWriteFile(
_In_ HANDLE FileHandle,
_In_opt_ HANDLE Event,
_In_opt_ PIO_APC_ROUTINE ApcRoutine,
_In_opt_ PVOID ApcContext,
_Out_ PIO_STATUS_BLOCK IoStatusBlock,
_In_ PVOID Buffer,
_In_ ULONG Length,
_In_opt_ PLARGE_INTEGER ByteOffset,
_In_opt_ PULONG Key
);
Parameters
Event [in, optional]
Optionally, a handle to an event object to set to the signaled state after the write operation completes. Device and intermediate drivers should set this parameter to NULL.
The stack backtrace shows that the second argument passed to ZwWriteFile is handle 0x8C:
0a 00f6d5dc 77345ebc 756b90e3 00000080 0000008c ntdll!KiFastSystemCallRet
0b 00f6d5e0 756b90e3 00000080 0000008c 00000000 ntdll!ZwWriteFile+0xc
0c 00f6d644 7708121a 00000080 00f6e778 0000000b KERNELBASE!WriteFile+0xaa
With this handle value, all that remains is to switch to onlyForWrite.exe's process context and check whether handle 0x8c refers to object 0x86c76a08:
kd> .process 86ecb030
Implicit process is now 86ecb030
WARNING: .cache forcedecodeuser is not enabled
kd> !handle 0000008c
PROCESS 86ecb030 SessionId: 1 Cid: 0e50 Peb: 7ffdc000 ParentCid: 0df4
DirBase: 7eb7a3a0 ObjectTable: 98258ec8 HandleCount: 50.
Image: onlyForWrite.exe
Handle table at 98258ec8 with 50 entries in use
008c: <- handle value on the left, kernel object on the right -> Object: 86c76a08 GrantedAccess: 001f0003 Entry: a5320118
Object: 86c76a08 Type: (84feb378) Event
ObjectHeader: 86c769f0 (new version)
HandleCount: 1 PointerCount: 0
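This kernel object is indeed the Event involved in the bugcheck. In the final stage of IRP completion (IopCompleteRequest), the I/O manager calls KeSetEvent on the event supplied by the caller to signal that the write has finished, so that a thread waiting on it can return and continue; it then releases its reference to the event. Because IopCompleteRequest ran twice for the same write, that reference was released twice, and the second ObfDereferenceObject is what bugchecked.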