Canceling an IRP Causes a Blue Screen (BugCheck:0x18)

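    Yesterday I wrote a simple driver. Its write dispatch routine marks each IRP pending and puts it in a custom queue, and a separate thread later cancels these pending IRPs: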

NTSTATUS SampleCharWriteAsync(PDEVICE_OBJECT devObj, PIRP irp)
{
	KIRQL oldIrql;
	SampleCharDevContext* devCtx = (SampleCharDevContext*)devObj->DeviceExtension;
	
	IoMarkIrpPending(irp);
	KeAcquireSpinLock(&devCtx->devWriteLock, &oldIrql);

	IoSetCancelRoutine(irp, SampleCharCancelIrp);

	if (irp->Cancel == TRUE)
	{
		if (IoSetCancelRoutine(irp, NULL) != NULL)
		{
			irp->IoStatus.Status = STATUS_CANCELLED;
			irp->IoStatus.Information = 0;
			IoCompleteRequest(irp, IO_NO_INCREMENT);

			KeReleaseSpinLock(&devCtx->devWriteLock, oldIrql);

			//return STATUS_PENDING;   // correct: the IRP has already been marked pending
			return STATUS_CANCELLED; // returning STATUS_CANCELLED here causes the blue screen
		}
	}

	InsertTailList(&devCtx->pendWrListHead, &irp->Tail.Overlay.ListEntry);
	KeReleaseSpinLock(&devCtx->devWriteLock, oldIrql);

	return STATUS_PENDING;
}
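    At first I didn't write any test code. I simply planned to set irp->Cancel to TRUE by hand in the debugger. After that change, execution entered the if branch and ran IoCompleteRequest, and everything went surprisingly smoothly. But as soon as return STATUS_CANCELLED executed, the blue screen arrived: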

kd> kb
ChildEBP RetAddr  Args to Child              
8cccac34 82a814bc 851d06e8 868473f8 868473f8 SampleChar!SampleCharWriteAsync+0x3a [c:\studio\samplechar0x44\samplechar\samplechar.c @ 243]
WARNING: Stack unwind information not available. Following frames may be wrong.
8cccac4c 82c82eee 86916038 868473f8 8684748c nt!IofCallDriver+0x64
8cccac6c 82c837a2 851d06e8 86916038 00000001 nt!RtlRandomEx+0x1340
8cccad08 82a8842a 851d06e8 0000008c 00000000 nt!NtWriteFile+0x6ee
kd> ?? irp
struct _IRP * 0x868473f8
   +0x024 Cancel           : 0 ''
kd> eb 0x868473f8+0x024 1
kd> ?? irp
struct _IRP * 0x868473f8
   +0x024 Cancel           : 0x1 ''


kd> !analyze -v
REFERENCE_BY_POINTER (18)
Arguments:
Arg1: 84feb378, Object type of the object whose reference count is being lowered
Arg2: 86c76a08, Object whose reference count is being lowered
...

Debugging Details:
------------------

IRP_ADDRESS: 00129450

LAST_CONTROL_TRANSFER:  from 82b24e71 to 82ab3394

STACK_TEXT:  
...
a49c1bec 82ab0ed0 86c76a08 82aea363 612ca77a nt!ObfDereferenceObjectWithTag+0x4b
a49c1bf4 82aea363 612ca77a 87086f80 87129450 nt!ObfDereferenceObject+0xd
a49c1c38 82c85f36 00129490 a49c1c64 00000000 nt!IopCompleteRequest+0x24d
a49c1c6c 82c867a2 c0000120 87086f80 00000001 nt!IopSynchronousServiceTail+0x240
a49c1d08 82a8b42a 86312b98 0000008c 00000000 nt!NtWriteFile+0x6e8
a49c1d08 773464f4 86312b98 0000008c 00000000 nt!KiFastCallEntry+0x12a
00f6d5dc 77345ebc 756b90e3 00000080 0000008c ntdll!KiFastSystemCallRet
00f6d5e0 756b90e3 00000080 0000008c 00000000 ntdll!ZwWriteFile+0xc
00f6d644 7708121a 00000080 00f6e778 0000000b KERNELBASE!WriteFile+0xaa
00f6d660 01138af4 00000080 00f6e778 0000000b kernel32!WriteFileImplementation+0x76
...
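Among the many lines of !analyze -v output, I noticed the address of the IRP involved in the failure. On the off chance it would help, I tried to examine this IRP: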

Debugging Details:
------------------
IRP_ADDRESS: 00129450 <---- IRP address at the time of the failure

kd> !irp 00129450
00129450: Could not read Irp
Unfortunately, WinDbg could not parse the IRP. With no better option, I had to work out the cause of the failure from the stack backtrace:

NtWriteFile performs a series of checks on the write request and then calls IopSynchronousServiceTail. That function calls IoCallDriver to send the IRP down the device stack:

    status = IoCallDriver( DeviceObject, Irp );

    if (DeferredIoCompletion) {

        if (status != STATUS_PENDING) {

            PKNORMAL_ROUTINE normalRoutine;
            PVOID normalContext;
            KIRQL irql = PASSIVE_LEVEL; // Just to shut up the compiler

            ASSERT( !Irp->PendingReturned );

            if (!SynchronousIo) {
                KeRaiseIrql( APC_LEVEL, &irql );
            }
            IopCompleteRequest( &Irp->Tail.Apc,
                                &normalRoutine,
                                &normalContext,
                                (PVOID *) &FileObject,
                                &normalContext );

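According to the stack backtrace, the kernel had already returned from IoCallDriver and had entered IopCompleteRequest to finish the IRP when the crash happened. At this point the cause is practically staring us in the face. My driver had already completed this IRP with IoCompleteRequest, and when the dispatch routine returned, the I/O manager tried to process the same IRP a second time, which is where it failed. At one point during debugging, WinDbg reported bug check 0x44 (MULTIPLE_IRP_COMPLETE_REQUESTS), meaning the same IRP was completed more than once.

The kernel called IopCompleteRequest because status != STATUS_PENDING: my driver returned STATUS_CANCELLED. Sure enough, a careless slip had put the wrong return value in the dispatch routine. The rule is that once a dispatch routine has called IoMarkIrpPending, it must return STATUS_PENDING, even if it completes the IRP itself before returning. The ASSERT( !Irp->PendingReturned ) in the excerpt above shows that the I/O manager relies on this rule.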


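Although the problem has been located, there is some other important information that corroborates this analysis and confirms that the blue screen really was caused by my driver: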

In the call stack just before the blue screen, frame 06 shows:

06 a49c1c38 82c85f36 00129490 a49c1c64 00000000 nt!IopCompleteRequest+0x24d

The first argument to IopCompleteRequest is 0x00129490. This is close to the IRP address 0x129450 reported by !analyze -v, so the two may well be related. Let's look at the IRP structure:

kd> dt ntkrpamp!_IRP
   +0x000 Type             : Int2B
   +0x002 Size             : Uint2B
   +0x004 MdlAddress       : Ptr32 _MDL
...
   +0x02c UserEvent        : Ptr32 _KEVENT
   +0x030 Overlay          : 
   +0x038 CancelRoutine    : Ptr32     void 
   +0x03c UserBuffer       : Ptr32 Void
   +0x040 Tail             : 
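Irp->Tail is indeed 0x40 bytes from the start of the IRP structure, and 0x00129490 - 0x40 = 0x00129450. This matches the faulting IRP from the bug check analysis ("IRP_ADDRESS: 00129450"). In the source, the I/O manager passes &Irp->Tail.Apc as the first argument when it calls IopCompleteRequest. So at least we know what the IRP reported by WinDbg was doing in the device stack when the crash happened.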
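The offset check above is the same pointer arithmetic the kernel uses to get from &Irp->Tail.Apc back to the IRP: subtract the member's offset. The WDK's CONTAINING_RECORD macro does exactly this. Below is a minimal sketch of the arithmetic. The macro is written out with offsetof so the code compiles outside the WDK (the WDK ships its own definition). MOCK_IRP and irp_from_tail_apc are hypothetical. MOCK_IRP only reproduces the 0x40 offset of Tail from the x86 dt output, and the size of its apc member is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the WDK's CONTAINING_RECORD: step back from a pointer to a
   member to the start of the enclosing structure. */
#define CONTAINING_RECORD(address, type, field) \
    ((type *)((char *)(address) - offsetof(type, field)))

/* Hypothetical stand-in for the x86 _IRP layout shown by "dt _IRP":
   0x40 bytes of header fields, then the Tail union. Only the offset of
   Tail matters here; the real structure has many named fields. */
typedef struct MOCK_IRP {
    unsigned char header[0x40];   /* +0x000 .. +0x03f */
    struct {
        unsigned char apc[0x30];  /* +0x040 Tail.Apc (size illustrative) */
    } Tail;
} MOCK_IRP;

/* Recover the IRP address from the Tail.Apc address seen on the stack,
   using the raw offset read from the debugger (0x40 on this x86 build). */
static uintptr_t irp_from_tail_apc(uintptr_t apc_addr, uintptr_t tail_offset)
{
    assert(apc_addr >= tail_offset);
    return apc_addr - tail_offset;
}
```

With the values from this crash, irp_from_tail_apc(0x00129490, 0x40) yields 0x00129450, the IRP_ADDRESS printed by !analyze -v.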

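The WinDbg documentation says that the second parameter of bug check 0x18 is the object whose reference count was being lowered, here 0x86c76a08. Let's look at this object: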

kd> !object 86c76a08 9
Object: 86c76a08  Type: (84feb378) Event
    ObjectHeader: 86c769f0 (new version)
    HandleCount: 1  PointerCount: 0
kd> !findhandle 86c76a08
Now checking process 84fcea20...Now checking process 863764c0...
                   [86ecb030 onlyForWrite.e] #<-------- owning process of object 86c76a08: 0x86ecb030
    8c: Entry a5320118 Granted Access 1f0003
Listing all processes with their IDs confirms that 86ecb030 is indeed onlyForWrite.exe:
kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
...
PROCESS 88892d40  SessionId: 1  Cid: 0dfc    Peb: 7ffdb000  ParentCid: 01ac
    DirBase: 7eb7a440  ObjectTable: a52b8c30  HandleCount:  52.
    Image: conhost.exe

PROCESS 86ecb030  SessionId: 1  Cid: 0e50    Peb: 7ffdc000  ParentCid: 0df4
    DirBase: 7eb7a3a0  ObjectTable: 98258ec8  HandleCount:  50.
    Image: onlyForWrite.exe

Finally, I wanted to confirm whether this Event is the OVERLAPPED.hEvent handle that onlyForWrite passed when it called WriteFile:

	OVERLAPPED overlapRd = { 0 },overlapWr = {0};
	overlapRd.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
	overlapWr.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);

	char writeBuff[4096] = { "_ThreadProc" }, readBuff[4096] = { 0 };
	DWORD len = 0, writeLen, readLen;

	while (1)
	{
		WriteFile(hDev, writeBuff, (DWORD)strlen(writeBuff), &writeLen, &overlapWr); //pass the overlapWr.hEvent event handle to WriteFile
		//WaitForSingleObject(overlapWr.hEvent, INFINITE);
		Sleep(1000);
	}
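When the blue screen occurred I was not debugging onlyForWrite, so I had no record of the handle value stored in overlapWr.hEvent. The value can still be recovered from the arguments passed to ZwWriteFile, because WriteFile passes OVERLAPPED.hEvent as the second parameter, Event. MSDN documents it as follows: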

NTSTATUS ZwWriteFile(
  _In_     HANDLE           FileHandle,
  _In_opt_ HANDLE           Event,
  _In_opt_ PIO_APC_ROUTINE  ApcRoutine,
  _In_opt_ PVOID            ApcContext,
  _Out_    PIO_STATUS_BLOCK IoStatusBlock,
  _In_     PVOID            Buffer,
  _In_     ULONG            Length,
  _In_opt_ PLARGE_INTEGER   ByteOffset,
  _In_opt_ PULONG           Key
);
Parameters
Event [in, optional]

    Optionally, a handle to an event object to set to the signaled state after the write operation completes. Device and intermediate drivers should set this parameter to NULL.
In the stack backtrace, the second argument passed to ZwWriteFile is the handle value 0x8C:

0a 00f6d5dc 77345ebc 756b90e3 00000080 0000008c ntdll!KiFastSystemCallRet
0b 00f6d5e0 756b90e3 00000080 0000008c 00000000 ntdll!ZwWriteFile+0xc
0c 00f6d644 7708121a 00000080 00f6e778 0000000b KERNELBASE!WriteFile+0xaa
With this handle value, I only need to switch to the onlyForWrite.exe process context and check whether handle 0x8c refers to object 0x86c76a08:
kd> .process 86ecb030  
Implicit process is now 86ecb030
WARNING: .cache forcedecodeuser is not enabled
kd> !handle 0000008c 

PROCESS 86ecb030  SessionId: 1  Cid: 0e50    Peb: 7ffdc000  ParentCid: 0df4
    DirBase: 7eb7a3a0  ObjectTable: 98258ec8  HandleCount:  50.
    Image: onlyForWrite.exe

Handle table at 98258ec8 with 50 entries in use

008c: <- handle value on the left, kernel object on the right -> Object: 86c76a08  GrantedAccess: 001f0003 Entry: a5320118
Object: 86c76a08  Type: (84feb378) Event
    ObjectHeader: 86c769f0 (new version)
        HandleCount: 1  PointerCount: 0
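This kernel object is indeed the Event that triggered the blue screen. If the caller supplied an event, the I/O manager calls KeSetEvent on it toward the end of IopCompleteRequest. This signals that the read or write has finished, so the application returns from its wait and continues. For an asynchronous (overlapped) request, IopCompleteRequest also drops the reference that NtWriteFile took on that event. My IRP went through IopCompleteRequest twice, so the event was dereferenced twice. That is consistent with the PointerCount of 0 alongside a HandleCount of 1, and with the REFERENCE_BY_POINTER bug check raised from ObfDereferenceObject.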



