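During initialization, Windows associates the clock interrupt with its handler, and it is this handler that performs thread scheduling at the appropriate moments.
The initialization-phase function HalpInitPhase0 calls HalpEnableInterruptHandler to enable the clock interrupt. Internally, HalpEnableInterruptHandler installs the clock handler HalpClockInterrupt in the corresponding interrupt vector. HalpClockInterrupt itself is generated by a macro in an assembly file: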
TRAP_ENTRY HalpClockInterrupt, KI_PUSH_FAKE_ERROR_CODE
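TRAP_ENTRY implements the two parts of interrupt handling: the first part builds a trap frame, and the second part calls the interrupt handler. The name of the real handler is composed inside the macro by appending Handler to the Trap argument, so here it is HalpClockInterruptHandler.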
MACRO(TRAP_ENTRY, Trap, Flags)
    EXTERN @&Trap&Handler@4 :PROC
    PUBLIC _&Trap
    .PROC _&Trap
        /* Generate proper debugging symbols */
        FPO 0, 0, 0, 0, 1, FRAME_TRAP
        /* Common code to create the trap frame */
        KiEnterTrap Flags
        /* Call the C handler */
        KiCallHandler @&Trap&Handler@4
    .ENDP
ENDM
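The way the macro derives the handler name can be imitated in C with the preprocessor's token-pasting operator. This is only a rough analogue: `TRAP_ENTRY_C`, `trap_frame_t`, and `DemoClockInterrupt` are hypothetical names invented for the sketch, and the trap-frame setup is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the trap frame; the real KTRAP_FRAME has many more fields. */
typedef struct {
    unsigned long eip;
    int handled;
} trap_frame_t;

/* Hypothetical C analogue of TRAP_ENTRY: defines the entry stub Name and
 * makes it call NameHandler, whose identifier is built by token pasting. */
#define TRAP_ENTRY_C(Name)                        \
    void Name##Handler(trap_frame_t *frame);      \
    void Name(trap_frame_t *frame)                \
    {                                             \
        /* building the trap frame is omitted */  \
        Name##Handler(frame);                     \
    }

/* Expands to DemoClockInterrupt(), which calls DemoClockInterruptHandler(). */
TRAP_ENTRY_C(DemoClockInterrupt)

/* The handler whose name the macro composed. */
void DemoClockInterruptHandler(trap_frame_t *frame)
{
    assert(frame != NULL);
    frame->handled = 1;
}
```

Calling `DemoClockInterrupt(&frame)` ends up in `DemoClockInterruptHandler`, just as the generated `_HalpClockInterrupt` stub ends up in `HalpClockInterruptHandler`.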
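HalpClockInterruptHandler takes a single parameter, the trap frame. After entering the trap with that frame, the handler begins the system interrupt, reads RTC register C so that the next clock interrupt can fire, applies any pending clock-rate change, and then passes control to KeUpdateSystemTime. Most of the real interrupt work is done in KeUpdateSystemTime.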
VOID
FASTCALL
HalpClockInterruptHandler(IN PKTRAP_FRAME TrapFrame)
{
    ULONG LastIncrement;
    KIRQL Irql;
    /* Enter trap */
    KiEnterInterruptTrap(TrapFrame);
    /* Start the interrupt */
    if (!HalBeginSystemInterrupt(CLOCK_LEVEL, HalpClockVector, &Irql))
    {
        /* Spurious, just end the interrupt */
        KiEoiHelper(TrapFrame);
    }
    /* Read register C, so that the next interrupt can happen */
    HalpReadCmos(RTC_REGISTER_C);
    /* Save increment */
    LastIncrement = HalpCurrentTimeIncrement;
    /* Check if someone changed the time rate */
    if (HalpClockSetMSRate)
    {
        /* Set new clock rate */
        RtcSetClockRate(HalpCurrentRate);
        /* We're done */
        HalpClockSetMSRate = FALSE;
    }
    /* Update the system time -- on x86 the kernel will exit this trap */
    KeUpdateSystemTime(TrapFrame, LastIncrement, Irql);
}
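KeUpdateSystemTime does two things. First, it checks whether any timer has expired and, if so, requests a software interrupt so that the expired timers are processed. Second, it updates the system run time, and thread switching is driven by that update. To do this, KeUpdateSystemTime calls KeUpdateRunTime, which accounts the running time, and all resulting requests are handed to HalRequestSoftwareInterrupt. That function invokes the handler that corresponds to the requested IRQL. HalRequestSoftwareInterrupt only accepts IRQLs less than or equal to DISPATCH_LEVEL, and it runs a handler only when the pending level is higher than the current IRQL, so scheduling can take place only while the processor is running below DISPATCH_LEVEL.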
VOID
FASTCALL
HalRequestSoftwareInterrupt(IN KIRQL Irql)
{
    ULONG EFlags;
    PKPCR Pcr = KeGetPcr();
    KIRQL PendingIrql;
    /* Save EFlags and disable interrupts */
    EFlags = __readeflags();
    _disable();
    /* Set the requested bit in the pending mask */
    Pcr->IRR |= (1 << Irql);
    /* Check for pending software interrupts and compare with current IRQL */
    PendingIrql = SWInterruptLookUpTable[Pcr->IRR & 3];
    if (PendingIrql > Pcr->Irql) SWInterruptHandlerTable[PendingIrql]();
    /* Restore interrupt state */
    __writeeflags(EFlags);
}
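The request path in the listing above can be modeled in user space to see why a dispatch request made at DISPATCH_LEVEL stays pending instead of running. This is a simplified sketch, not the HAL code: `sim_cpu_t`, `highest_pending`, and `sim_request_software_interrupt` are hypothetical names, and the table contents and its three-bit index are assumptions of the model rather than values taken from the listing.

```c
#include <assert.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

typedef struct {
    unsigned irr;       /* pending software interrupt bits, one per IRQL */
    int irql;           /* current IRQL of the simulated processor */
    int apc_runs;       /* times the simulated APC handler ran */
    int dispatch_runs;  /* times the simulated dispatch handler ran */
} sim_cpu_t;

/* Assumed lookup table: highest pending software level for each value of
 * the low three pending bits. */
static const int highest_pending[8] = { 0, 0, 1, 1, 2, 2, 2, 2 };

void sim_request_software_interrupt(sim_cpu_t *cpu, int irql)
{
    assert(irql == APC_LEVEL || irql == DISPATCH_LEVEL);

    /* Record the request. */
    cpu->irr |= 1u << irql;

    /* Run a handler only if the pending level outranks the current IRQL. */
    int pending = highest_pending[cpu->irr & 7];
    if (pending > cpu->irql) {
        cpu->irr &= ~(1u << pending);   /* the handler consumes the request */
        if (pending == DISPATCH_LEVEL)
            cpu->dispatch_runs++;
        else
            cpu->apc_runs++;
    }
}
```

With `irql` at PASSIVE_LEVEL a dispatch request runs at once; with `irql` already at DISPATCH_LEVEL the bit simply stays set in `irr` until the IRQL is lowered.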
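In HalRequestSoftwareInterrupt, the bit for the requested IRQL is OR-ed into the pending mask kept in the PCR (IRR). The low bits of that mask then index SWInterruptLookUpTable, which yields the highest pending software IRQL. If that level is above the current IRQL, it is used as an index into the function array SWInterruptHandlerTable. The array is filled in at compile time; for scheduling, the third entry, HalpDispatchInterrupt, is called (listed here as HalpDispatchInterrupt2).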
VOID
HalpDispatchInterrupt2(VOID)
{
    ULONG PendingIrqlMask, PendingIrql;
    KIRQL OldIrql;
    PIC_MASK Mask;
    PKPCR Pcr = KeGetPcr();
    /* Do the work */
    OldIrql = _HalpDispatchInterruptHandler();
    /* Restore IRQL */
    Pcr->Irql = OldIrql;
    /* Check for pending software interrupts and compare with current IRQL */
    PendingIrqlMask = Pcr->IRR & FindHigherIrqlMask[OldIrql];
    if (PendingIrqlMask)
    {
        /* Check if pending IRQL affects hardware state */
        BitScanReverse(&PendingIrql, PendingIrqlMask);
        if (PendingIrql > DISPATCH_LEVEL)
        {
            /* Set new PIC mask */
            Mask.Both = Pcr->IDR & 0xFFFF;
            __outbyte(PIC1_DATA_PORT, Mask.Master);
            __outbyte(PIC2_DATA_PORT, Mask.Slave);
            /* Clear IRR bit */
            Pcr->IRR ^= (1 << PendingIrql);
        }
        /* Now handle pending interrupt */
        SWInterruptHandlerTable[PendingIrql]();
    }
}
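The tail of the listing above looks for the highest pending level that outranks the restored IRQL. A portable sketch of that search is given here. `highest_set_bit` stands in for the BitScanReverse intrinsic, and `higher_irql_mask` is a simplified assumption (all bits strictly above the given level); the real FindHigherIrqlMask table is not shown in the source and may contain different values.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for BitScanReverse: stores the index of the highest
 * set bit in *index and returns 1, or returns 0 if mask is zero. */
int highest_set_bit(uint32_t mask, unsigned *index)
{
    if (mask == 0)
        return 0;
    unsigned i = 31;
    while (!(mask & (UINT32_C(1) << i)))
        i--;
    *index = i;
    return 1;
}

/* Assumed mask: every bit strictly above the given IRQL. */
uint32_t higher_irql_mask(unsigned irql)
{
    assert(irql < 31);
    return ~((UINT32_C(2) << irql) - 1);
}

/* Highest pending level above old_irql, or -1 if nothing outranks it. */
int next_pending_irql(uint32_t irr, unsigned old_irql)
{
    unsigned index;
    if (!highest_set_bit(irr & higher_irql_mask(old_irql), &index))
        return -1;
    return (int)index;
}
```

For example, with APC and dispatch both pending (`irr == 0x6`) and the IRQL restored to 0, the dispatch level is chosen first; with the IRQL restored to 2, neither request qualifies.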
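Inside HalpDispatchInterrupt2, the function that really performs the dispatch is _HalpDispatchInterruptHandler. The call chain from HalpDispatchInterrupt2 only sets up some state before it reaches KiDispatchInterrupt. KiDispatchInterrupt takes the thread that the PRCB member NextThread points to, makes it the next thread to run, and then switches the thread context.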
VOID
NTAPI
KiDispatchInterrupt(VOID)
{
    PKIPCR Pcr = (PKIPCR)KeGetPcr();
    PKPRCB Prcb = &Pcr->PrcbData;
    PVOID OldHandler;
    PKTHREAD NewThread, OldThread;
    /* Disable interrupts */
    _disable();
    /* Check for pending timers, pending DPCs, or pending ready threads */
    if ((Prcb->DpcData[0].DpcQueueDepth) ||
        (Prcb->TimerRequest) ||
        (Prcb->DeferredReadyListHead.Next))
    {
        /* Switch to safe execution context */
        OldHandler = Pcr->NtTib.ExceptionList;
        Pcr->NtTib.ExceptionList = EXCEPTION_CHAIN_END;
        /* Retire DPCs while under the DPC stack */
        KiRetireDpcListInDpcStack(Prcb, Prcb->DpcStack);
        /* Restore context */
        Pcr->NtTib.ExceptionList = OldHandler;
    }
    /* Re-enable interrupts */
    _enable();
    /* Check for quantum end */
    if (Prcb->QuantumEnd)
    {
        /* Handle quantum end */
        Prcb->QuantumEnd = FALSE;
        KiQuantumEnd();
    }
    else if (Prcb->NextThread)
    {
        OldThread = Prcb->CurrentThread;
        NewThread = Prcb->NextThread;
        Prcb->NextThread = NULL;
        Prcb->CurrentThread = NewThread;
        NewThread->State = Running;
        OldThread->WaitReason = WrDispatchInt;
        KxQueueReadyThread(OldThread, Prcb);
        KiSwapContext(APC_LEVEL, OldThread);
    }
}
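The decision at the end of the listing above (quantum end takes priority, otherwise switch to NextThread if one is set) can be reduced to a small state model. The sketch below is illustrative only: all `sim_` names are hypothetical, the ready queue is a single slot, and the DPC processing and the actual context switch are omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { THREAD_READY, THREAD_RUNNING } thread_state_t;

typedef struct {
    int id;
    thread_state_t state;
} sim_thread_t;

typedef struct {
    int quantum_end;         /* set when the running thread used up its quantum */
    sim_thread_t *current;   /* thread running on this processor */
    sim_thread_t *next;      /* thread selected to run next, or NULL */
    sim_thread_t *requeued;  /* one-slot stand-in for the ready queue */
    int quantum_end_calls;   /* times the quantum-end path was taken */
} sim_prcb_t;

typedef enum { DID_NOTHING, DID_QUANTUM_END, DID_SWITCH } dispatch_result_t;

dispatch_result_t sim_dispatch(sim_prcb_t *prcb)
{
    assert(prcb != NULL && prcb->current != NULL);

    /* Quantum end is handled first and excludes the direct switch. */
    if (prcb->quantum_end) {
        prcb->quantum_end = 0;
        prcb->quantum_end_calls++;
        return DID_QUANTUM_END;
    }

    /* Otherwise promote the preselected thread and requeue the old one. */
    if (prcb->next != NULL) {
        sim_thread_t *old_thread = prcb->current;
        sim_thread_t *new_thread = prcb->next;
        prcb->next = NULL;
        prcb->current = new_thread;
        new_thread->state = THREAD_RUNNING;
        old_thread->state = THREAD_READY;
        prcb->requeued = old_thread;
        return DID_SWITCH;
    }
    return DID_NOTHING;
}
```

Note that in this model, as in the listing, a pending `next` thread is left untouched when the quantum-end branch is taken.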
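KxQueueReadyThread decides, from the current processor's membership bit and the thread's affinity, whether OldThread is inserted into the ready queue. If the affinity includes the current processor, the thread is inserted; otherwise no insertion takes place. This guarantees that a thread runs only on processors in its affinity set. KiSwapContext eventually transfers control to KiSwapContextEntry.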
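The affinity test described above amounts to a bitmask intersection. The sketch below assumes one bit per processor in a 32-bit mask; `affinity_mask_t`, `processor_set_member`, and `may_queue_on_processor` are hypothetical names, and the source does not show how KxQueueReadyThread itself is written.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed representation: bit n set means processor n is allowed. */
typedef uint32_t affinity_mask_t;

/* Membership bit of a processor, i.e. a mask with only its own bit set. */
affinity_mask_t processor_set_member(unsigned cpu_number)
{
    assert(cpu_number < 32);
    return (affinity_mask_t)1 << cpu_number;
}

/* Nonzero when the thread's affinity includes the given processor, in which
 * case the thread may be placed on that processor's ready queue. */
int may_queue_on_processor(affinity_mask_t thread_affinity, unsigned cpu_number)
{
    return (thread_affinity & processor_set_member(cpu_number)) != 0;
}
```

A thread with affinity `0x5` would qualify for processors 0 and 2 but not for processor 1.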
VOID
FASTCALL
KiSwapContextEntry(IN PKSWITCHFRAME SwitchFrame,
                   IN ULONG_PTR OldThreadAndApcFlag)
{
    PKIPCR Pcr = (PKIPCR)KeGetPcr();
    PKTHREAD OldThread, NewThread;
    ULONG Cr0, NewCr0;
    /* Save APC bypass disable */
    SwitchFrame->ApcBypassDisable = OldThreadAndApcFlag & 3;
    SwitchFrame->ExceptionList = Pcr->NtTib.ExceptionList;
    /* Increase context switch count and check if tracing is enabled */
    Pcr->ContextSwitches++;
    if (Pcr->PerfGlobalGroupMask)
    {
        /* We don't support this yet on x86 either */
        DPRINT1("WMI Tracing not supported\n");
        ASSERT(FALSE);
    }
    /* Get thread pointers */
    OldThread = (PKTHREAD)(OldThreadAndApcFlag & ~3);
    NewThread = Pcr->PrcbData.CurrentThread;
    /* Get the old thread and set its kernel stack */
    OldThread->KernelStack = SwitchFrame;
    /* ISRs can change FPU state, so disable interrupts while checking */
    _disable();
    /* Get current and new CR0 and check if they've changed */
    Cr0 = __readcr0();
    NewCr0 = NewThread->NpxState |
             (Cr0 & ~(CR0_MP | CR0_EM | CR0_TS)) |
             KiGetThreadNpxArea(NewThread)->Cr0NpxState;
    if (Cr0 != NewCr0) __writecr0(NewCr0);
    /* Now enable interrupts and do the switch */
    _enable();
    KiSwitchThreads(OldThread, NewThread->KernelStack);
}
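The OldThreadAndApcFlag parameter in the listing above carries two values in one word: the thread pointer and, in its two low bits, a small flag. This works because the pointer is aligned, so those bits are otherwise always zero. A minimal sketch of the packing follows; `tagged_thread_t` and the three helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a thread object; it only needs at least 4-byte alignment. */
typedef struct {
    void *kernel_stack;
    int id;
} tagged_thread_t;

/* Store a flag (0..3) in the two low bits of an aligned pointer. */
uintptr_t pack_thread_and_flag(const tagged_thread_t *thread, unsigned flag)
{
    uintptr_t bits = (uintptr_t)thread;
    assert((bits & 3) == 0);   /* the low bits must be free */
    assert(flag <= 3);
    return bits | flag;
}

/* Recover the pointer by clearing the two low bits (the "& ~3" step). */
tagged_thread_t *unpack_thread(uintptr_t packed)
{
    return (tagged_thread_t *)(packed & ~(uintptr_t)3);
}

/* Recover the flag from the two low bits (the "& 3" step). */
unsigned unpack_flag(uintptr_t packed)
{
    return (unsigned)(packed & 3);
}
```

Packing both values into one register-sized word lets them travel through the assembly stub as a single argument.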
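Once this function has recorded the old thread's kernel stack and adjusted CR0 for the new thread's FPU state, it enters KiSwitchThreads to switch stacks. The stack switch itself is simple: the processor starts using the new thread's saved kernel stack pointer. The address space is a separate matter. Only when the new thread belongs to a different process is the new page directory base written to CR3, and that happens in KiSwapContextExit, which also updates the per-processor structures (the TEB pointer and its GDT entry, and the TSS). At this point the thread switch is essentially complete. Because the new thread may have APCs to receive, the code still has to check after the switch whether a kernel APC is pending so that it can be delivered.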
BOOLEAN
FASTCALL
KiSwapContextExit(IN PKTHREAD OldThread,
                  IN PKSWITCHFRAME SwitchFrame)
{
    PKIPCR Pcr = (PKIPCR)KeGetPcr();
    PKPROCESS OldProcess, NewProcess;
    PKGDTENTRY GdtEntry;
    PKTHREAD NewThread;
    /* We are on the new thread stack now */
    NewThread = Pcr->PrcbData.CurrentThread;
    /* Now we are the new thread. Check if it's in a new process */
    OldProcess = OldThread->ApcState.Process;
    NewProcess = NewThread->ApcState.Process;
    if (OldProcess != NewProcess)
    {
        /* Check if there is a different LDT */
        if (*(PULONGLONG)&OldProcess->LdtDescriptor != *(PULONGLONG)&NewProcess->LdtDescriptor)
        {
            DPRINT1("LDT switch not implemented\n");
            ASSERT(FALSE);
        }
        /* Switch address space and flush TLB */
        __writecr3(NewProcess->DirectoryTableBase[0]);
    }
    /* Clear GS */
    Ke386SetGs(0);
    /* Set the TEB */
    Pcr->NtTib.Self = (PVOID)NewThread->Teb;
    GdtEntry = &Pcr->GDT[KGDT_R3_TEB / sizeof(KGDTENTRY)];
    GdtEntry->BaseLow = (USHORT)((ULONG_PTR)NewThread->Teb & 0xFFFF);
    GdtEntry->HighWord.Bytes.BaseMid = (UCHAR)((ULONG_PTR)NewThread->Teb >> 16);
    GdtEntry->HighWord.Bytes.BaseHi = (UCHAR)((ULONG_PTR)NewThread->Teb >> 24);
    /* Set new TSS fields */
    Pcr->TSS->Esp0 = (ULONG_PTR)NewThread->InitialStack;
    if (!((KeGetTrapFrame(NewThread))->EFlags & EFLAGS_V86_MASK))
    {
        Pcr->TSS->Esp0 -= (FIELD_OFFSET(KTRAP_FRAME, V86Gs) - FIELD_OFFSET(KTRAP_FRAME, HardwareSegSs));
    }
    Pcr->TSS->Esp0 -= NPX_FRAME_LENGTH;
    Pcr->TSS->IoMapBase = NewProcess->IopmOffset;
    /* Increase thread context switches */
    NewThread->ContextSwitches++;
    /* Load data from switch frame */
    Pcr->NtTib.ExceptionList = SwitchFrame->ExceptionList;
    /* DPCs shouldn't be active */
    if (Pcr->PrcbData.DpcRoutineActive)
    {
        /* Crash the machine */
        KeBugCheckEx(ATTEMPTED_SWITCH_FROM_DPC,
                     (ULONG_PTR)OldThread,
                     (ULONG_PTR)NewThread,
                     (ULONG_PTR)OldThread->InitialStack,
                     0);
    }
    /* Kernel APCs may be pending */
    if (NewThread->ApcState.KernelApcPending)
    {
        /* Are APCs enabled? */
        if (!NewThread->SpecialApcDisable)
        {
            /* Request APC delivery */
            if (SwitchFrame->ApcBypassDisable)
                HalRequestSoftwareInterrupt(APC_LEVEL);
            else
                return TRUE;
        }
    }
    /* Return stating that no kernel APCs are pending */
    return FALSE;
}
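One detail in the listing above that is easy to misread is the TEB update: an x86 segment descriptor stores its 32-bit base address in three separate fields, so the TEB address has to be split before it is written to the GDT entry. The sketch below reproduces only that split and its inverse. `descriptor_base_t`, `split_base`, and `join_base` are hypothetical names, and the struct is not the real KGDTENTRY layout.

```c
#include <assert.h>
#include <stdint.h>

/* The three places where a descriptor keeps its base address. */
typedef struct {
    uint16_t base_low;  /* bits 0..15 */
    uint8_t  base_mid;  /* bits 16..23 */
    uint8_t  base_hi;   /* bits 24..31 */
} descriptor_base_t;

/* Reassemble the 32-bit base from its three parts. */
uint32_t join_base(descriptor_base_t d)
{
    return (uint32_t)d.base_low |
           ((uint32_t)d.base_mid << 16) |
           ((uint32_t)d.base_hi << 24);
}

/* Split a 32-bit base the same way the listing does with the TEB address. */
descriptor_base_t split_base(uint32_t base)
{
    descriptor_base_t d;
    d.base_low = (uint16_t)(base & 0xFFFF);
    d.base_mid = (uint8_t)(base >> 16);
    d.base_hi  = (uint8_t)(base >> 24);
    assert(join_base(d) == base);   /* the split must be lossless */
    return d;
}
```

For a base of `0x7FFDE000` the parts are `0xE000`, `0xFD`, and `0x7F`; the casts to narrower types perform the truncation that the listing gets from its USHORT and UCHAR casts.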