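/* x86 IRQL values, as defined in the WDK header wdm.h */
#define PASSIVE_LEVEL 0     /* passive release level */
#define LOW_LEVEL 0         /* lowest interrupt level (same value as PASSIVE_LEVEL) */
#define APC_LEVEL 1         /* APC interrupt level */
#define DISPATCH_LEVEL 2    /* dispatcher (scheduler) and DPC level */
#define PROFILE_LEVEL 27    /* profiling timer */
#define CLOCK1_LEVEL 28     /* interval clock 1 */
#define CLOCK2_LEVEL 28     /* interval clock 2 */
#define IPI_LEVEL 29        /* interprocessor interrupt */
#define POWER_LEVEL 30      /* power failure */
#define HIGH_LEVEL 31       /* highest interrupt level */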
1.3.1 Run Levels of Kernel Code
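Windows NT assigns different levels to its kernel-mode code. On a given CPU, a routine running at a lower level can be interrupted by any routine running at a higher level. From lowest to highest, the levels are: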
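Level name | Routines that run at this level
PASSIVE_LEVEL | DriverEntry, Unload, Shutdown, DispatchXxx
APC_LEVEL | In certain special cases, drivers for mass-storage devices run at this level
DISPATCH_LEVEL | StartIo, AdapterControl, ControllerControl, IoTimer, DPC routines
DIRQLs | Interrupt service routines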
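Table 16.1 Default IRQLs of driver routines
IRQL (low to high) | Interrupts masked | Routines that run at this IRQL
PASSIVE_LEVEL | None | Dispatch, DriverEntry, AddDevice, Reinitialize, and Unload routines; driver-created threads; worker-thread callbacks; file system drivers
DISPATCH_LEVEL | DISPATCH_LEVEL and APC_LEVEL interrupts are masked; device, clock, and power-failure interrupts can still occur | StartIo, AdapterControl, AdapterListControl, ControllerControl, IoTimer, Cancel (while holding the cancel spin lock), DpcForIsr, CustomTimerDpc, and CustomDpc routines
DIRQL | All interrupts at IRQL <= the DIRQL of the driver's interrupt object; clock and power-failure interrupts can still occur | ISR and SynchCritSection routines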
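A higher-level driver may need one or more IoCompletion routines, which at a minimum check the I/O status block and then call IoCompleteRequest. If necessary, they also update the data structures and contents of the device extension. One thing must be kept clearly in mind: the level at which code runs, that is, the IRQL. The most common levels are PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL, and DIRQL.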
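When reading function descriptions in the NT DDK Help, pay attention to the IRQL at which each function may be called. Some functions can be called only at PASSIVE_LEVEL; others can be called at any IRQL up to and including DISPATCH_LEVEL. The higher the level, the stricter the requirements on the code: at DISPATCH_LEVEL, for example, paged memory cannot be used. In general, code should run at as low a level as possible, such as PASSIVE_LEVEL; running at a high level for a long time lowers system throughput and hurts real-time responsiveness. Sometimes, however, you cannot control the level. For example, when you call a lower-level driver with IoCallDriver, your completion routine runs after the lower driver finishes, and the IRQL it runs at is decided by the lower driver. Completion routines should therefore be written so that they can run at DISPATCH_LEVEL.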
The following is reprinted from: http://blog.163.com/hao_dsliu/blog/static/131578908201373171111173/
This article addresses two questions:
1. Why a thread cannot be suspended at APC_LEVEL.
2. Why paged memory can be used at APC_LEVEL.
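How a thread responds to an APC depends on the kind of APC; see the MSDN documentation. While reading the material Microsoft provides, I found two questions that are hard to understand, and I discuss them separately here.
First, the interrupt request levels: IRQL (Interrupt Request Levels)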
IRQL | Value on x86 | Value on IA64 | Value on AMD64 | Description
PASSIVE_LEVEL | 0 | 0 | 0 | User threads and most kernel-mode operations
APC_LEVEL | 1 | 1 | 1 | Asynchronous procedure calls and page faults
DISPATCH_LEVEL | 2 | 2 | 2 | Thread scheduler and deferred procedure calls (DPCs)
CMC_LEVEL | N/A | 3 | N/A | Correctable machine-check level (IA64 platforms only)
Device interrupt levels (DIRQL) | 3-26 | 4-11 | 3-11 | Device interrupts
PC_LEVEL | N/A | 12 | N/A | Performance counter (IA64 platforms only)
PROFILE_LEVEL | 27 | 15 | 15 | Profiling timer for releases earlier than Windows 2000
SYNCH_LEVEL | 27 | 13 | 13 | Synchronization of code and instruction streams across processors
CLOCK_LEVEL | N/A | 13 | 13 | Clock timer
CLOCK2_LEVEL | 28 | N/A | N/A | Clock timer for x86 hardware
IPI_LEVEL | 29 | 14 | 14 | Interprocessor interrupt for enforcing cache consistency
POWER_LEVEL | 30 | 15 | 14 | Power failure
HIGH_LEVEL | 31 | 15 | 15 | Machine checks and catastrophic errors; profiling timer for Windows XP and later releases
Microsoft says:
When a processor is running at a given IRQL, interrupts at that IRQL and lower are masked off (blocked) on the processor.
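In fact, this statement applies only from DISPATCH_LEVEL up to HIGH_LEVEL. APC_LEVEL and PASSIVE_LEVEL need special treatment, and because threads at these two levels can still be scheduled by the scheduler, the situation is more complicated.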
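So Microsoft goes on to say:
IRQLs fall into two groups: processor-specific IRQLs and thread-specific IRQLs.
Rule for processor-specific IRQLs: When a processor is running at a given IRQL, interrupts at that IRQL and lower are masked off (blocked) on the processor.
Threads running at the three thread-specific levels (PASSIVE_LEVEL, PASSIVE_LEVEL inside a critical region, and APC_LEVEL) can all be scheduled by the scheduler, which itself runs at DISPATCH_LEVEL. The scheduler considers only priority: a higher-priority thread can preempt a lower-priority one. So a low-priority thread running at APC_LEVEL can be preempted by a high-priority thread running at PASSIVE_LEVEL. As Microsoft puts it: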
The thread scheduler considers only thread priority, and not IRQL, when preempting a thread. If a thread running at IRQL=APC_LEVEL blocks, the scheduler might select a new thread for the processor that was previously running at PASSIVE_LEVEL.
For thread-specific IRQLs, you can think of a thread as a pseudo-CPU that has only three levels: PASSIVE_LEVEL, an intermediate level, and APC_LEVEL.
IRQL PASSIVE_LEVEL in a critical region
Code that is running at PASSIVE_LEVEL in a critical region is effectively running at an intermediate level between PASSIVE_LEVEL and APC_LEVEL. Calls to KeGetCurrentIrql return PASSIVE_LEVEL. Driver code can determine whether it is operating in a critical region by calling the function KeAreApcsDisabled (available in Windows XP and later releases).
Driver code that is running above PASSIVE_LEVEL (either at PASSIVE_LEVEL in a critical region or at APC_LEVEL or higher) cannot be suspended. Almost every operation that a driver can perform at PASSIVE_LEVEL can also be performed in a critical region. Two notable exceptions are raising hard errors and opening a file on storage media.
IRQL APC_LEVEL
APC_LEVEL is a thread-specific IRQL that is most commonly associated with paging I/O. Applications cannot suspend code that is running at IRQL=APC_LEVEL. The system implements fast mutexes (a type of synchronization mechanism) at APC_LEVEL. The KeAcquireFastMutex routine raises the IRQL to APC_LEVEL, and KeReleaseFastMutex returns the IRQL to its original value.
The only difference between a thread that is running at PASSIVE_LEVEL with APCs disabled and a thread that is running at APC_LEVEL is that while running at APC_LEVEL, the thread cannot be interrupted to deliver a special kernel-mode APC.
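How a thread enters APC_LEVEL:
After a thread enters APC_LEVEL by acquiring a fast mutex, any other thread that tries to acquire the same mutex is put into a wait state. For the thread itself, Microsoft says: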
Code paths that are protected by a fast mutex run at IRQL=APC_LEVEL, thus disabling delivery of all APCs and preventing the thread from suspension.
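That is, the thread responds to no APC at all, and it cannot be suspended. Why can't it be suspended? Because the operating system implements thread suspension by delivering an APC whose routine waits on a semaphore (I found this answer by reading the WRK). Running at APC_LEVEL disables the delivery of all APCs, so the suspend APC cannot run while the thread stays at that level.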
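Think of a thread as a pseudo-CPU with only three levels: PASSIVE_LEVEL, an intermediate level, and APC_LEVEL. Then treat APC delivery as an interrupt and apply Microsoft's interrupt rule: "When a processor is running at a given IRQL, interrupts at that IRQL and lower are masked off (blocked) on the processor." It follows that a pseudo-CPU at APC_LEVEL masks every interrupt at or below its own level, including APC delivery.
There are three kinds of APCs: normal kernel APCs, special kernel APCs, and user-mode APCs. Delivering an APC does not mean it runs immediately; whether and when it runs depends on conditions that are too involved to cover here. See MSDN for details.
Another question: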
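Paged memory can be used at APC_LEVEL. But while researching, I found that when a page fault occurs on paged memory, the system delivers an APC. Wouldn't that APC also be blocked? Wouldn't that deadlock, and crash the system just as at DISPATCH_LEVEL? Later I found what I took to be the key point. At APC_LEVEL, once a page fault occurs, the system first blocks the thread, which waits on a kernel synchronization object. Although the thread is at APC_LEVEL, it is blocked, so the scheduler runs other threads. When the page fault has been handled, the system delivers an APC to the thread, which temporarily becomes schedulable; the scheduler runs the APC, and the APC signals the kernel synchronization object the thread is waiting on. The thread does not resume from its wait point immediately, though: it must wait until the APC returns, because the APC executes in this thread's context. Once the page-fault APC returns, the object the thread waits on is signaled, the thread becomes schedulable, and the scheduler resumes it at an appropriate time. Because the page fault has been resolved, the memory access then succeeds.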
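This explanation is wrong. Once a thread is at APC_LEVEL and in a wait state, an APC queued to it will not run: queuing an APC is equivalent to sending the thread a software interrupt, and that interrupt is masked. The main reason paged memory cannot be used at DISPATCH_LEVEL is that a page fault must wait for the paging I/O to complete, and KeWaitXxx cannot be called to wait at DISPATCH_LEVEL. The reason is not the queuing of an APC; in fact, synchronous paging I/O does not queue an APC at all. IoCompleteRequest contains the following code: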
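/* Excerpt from IoCompleteRequest (WRK); `thread` is a local declared
   earlier in the function. */
if (Irp->Flags & (IRP_PAGING_IO | IRP_CLOSE_OPERATION | IRP_SET_USER_EVENT)) {
    if (Irp->Flags & (IRP_SYNCHRONOUS_PAGING_IO | IRP_CLOSE_OPERATION | IRP_SET_USER_EVENT)) {
        /* Synchronous paging I/O (and close/user-event IRPs): copy the
           status back and signal the event the requester waits on.
           No APC is queued on this path. */
        ULONG flags;
        flags = Irp->Flags & (IRP_SYNCHRONOUS_PAGING_IO | IRP_PAGING_IO);
        *Irp->UserIosb = Irp->IoStatus;
        (VOID) KeSetEvent(Irp->UserEvent, PriorityBoost, FALSE);
        if (flags) {
            /* Paging IRPs are freed here. */
            if (IopIsReserveIrp(Irp)) {
                IopFreeReserveIrp(PriorityBoost);
            } else {
                IoFreeIrp(Irp);
            }
        }
    } else {
        /* Asynchronous paging I/O (page write): finish in the originating
           thread's context. NormalRoutine is NULL, so this is a special
           kernel APC. */
        thread = Irp->Tail.Overlay.Thread;
        KeInitializeApc(&Irp->Tail.Apc,
                        &thread->Tcb,
                        Irp->ApcEnvironment,
                        IopCompletePageWrite,
                        (PKRUNDOWN_ROUTINE) NULL,
                        (PKNORMAL_ROUTINE) NULL,
                        KernelMode,
                        (PVOID) NULL);
        (VOID) KeInsertQueueApc(&Irp->Tail.Apc,
                                (PVOID) NULL,
                                (PVOID) NULL,
                                PriorityBoost);
    }
    return;
}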
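If the thread is running at DISPATCH_LEVEL, then on a single-core CPU the scheduler obviously cannot run (on a multi-core CPU it can run on another core). But the main reason is that the queued APC is equivalent to an APC_LEVEL interrupt, which cannot be serviced at DISPATCH_LEVEL, so the system crashes. In addition, at DISPATCH_LEVEL you cannot wait on a kernel object with a nonzero timeout.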