Author: 姚开健 (Yao Kaijian)
Original work; please credit the source when reposting.
"Linux Kernel Analysis" MOOC course: http://mooc.study.163.com/course/USTC-1000029000
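1. When process scheduling happens
In Linux, process scheduling generally happens at one of two kinds of points: during interrupt handling (timer interrupts, I/O interrupts, system calls, and exceptions), where kernel code calls the kernel function schedule() directly, or on the return to user mode, where schedule() is called if the need_resched flag is set. Scheduling can be active (voluntary) or passive (involuntary).
Kernel threads (special processes that have no user-mode part) can call schedule() directly to switch processes, and can also be scheduled during interrupt handling. In other words, a kernel thread can be scheduled both actively and passively. A user-mode process cannot call schedule() itself; it can only be scheduled at some point after it has trapped into kernel mode, that is, during interrupt handling. schedule() is implemented as follows: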
asmlinkage __visible void __sched schedule(void)
{
    struct task_struct *tsk = current;

    sched_submit_work(tsk);
    __schedule();
}
The __schedule() function called here does the detailed scheduling work, including:
next = pick_next_task(rq, prev);
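This picks the process to run next. The scheduling policy is implemented inside pick_next_task(), but the particular algorithm does not matter here: whatever the algorithm, its job is to choose the process to switch to. Once a process has been chosen, the next step is the switch itself.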
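The passive scheduling path described above can be sketched as a small user-space simulation. This is a minimal sketch under stated assumptions, not kernel code: struct task, the sim_* functions, the fixed two-tick time slice, and the round-robin picker are all hypothetical stand-ins chosen for illustration. The point it shows is that the timer tick only sets a need_resched flag, and the switch itself happens later, when the return-to-user path checks that flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's task_struct. */
struct task {
    int  pid;
    bool need_resched;   /* models the need_resched flag */
    int  time_slice;     /* remaining timer ticks */
};

#define NR_TASKS 3
#define SLICE    2       /* assumed time slice, in ticks */

static struct task tasks[NR_TASKS];
static int current_idx;
static int switch_count;

void sim_init(void)
{
    for (int i = 0; i < NR_TASKS; i++) {
        tasks[i].pid = i + 1;
        tasks[i].need_resched = false;
        tasks[i].time_slice = SLICE;
    }
    current_idx = 0;
    switch_count = 0;
}

/* Round-robin stand-in for pick_next_task(); the real policy is irrelevant here. */
static int sim_pick_next_task(int prev)
{
    return (prev + 1) % NR_TASKS;
}

/* Models schedule(): clear the flag, pick the next task, switch to it. */
void sim_schedule(void)
{
    assert(current_idx >= 0 && current_idx < NR_TASKS);
    int next = sim_pick_next_task(current_idx);

    tasks[current_idx].need_resched = false;
    tasks[current_idx].time_slice = SLICE;
    if (next != current_idx) {
        current_idx = next;
        switch_count++;
    }
}

/* Models the timer interrupt: it only marks the task, it does not switch. */
void sim_timer_tick(void)
{
    struct task *cur = &tasks[current_idx];

    if (cur->time_slice > 0 && --cur->time_slice == 0)
        cur->need_resched = true;
}

/* Models the return-to-user path: switch only if the flag is set. */
void sim_return_to_user(void)
{
    if (tasks[current_idx].need_resched)
        sim_schedule();
}

int sim_current_pid(void)  { return tasks[current_idx].pid; }
int sim_switch_count(void) { return switch_count; }
```

Calling sim_schedule() directly corresponds to the active case (what a kernel thread does), while the sim_timer_tick() plus sim_return_to_user() pair corresponds to the passive case.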
2. Process switching
In __schedule(), once the next process has been chosen, context_switch() is called to switch the process context. It is implemented as follows:
static inline void
context_switch(struct rq *rq, struct task_struct *prev,
               struct task_struct *next)
{
    struct mm_struct *mm, *oldmm;

    prepare_task_switch(rq, prev, next);

    mm = next->mm;
    oldmm = prev->active_mm;
    /*
     * For paravirt, this is coupled with an exit in switch_to to
     * combine the page table reload and the switch backend into
     * one hypercall.
     */
    arch_start_context_switch(prev);

    if (!mm) {
        next->active_mm = oldmm;
        atomic_inc(&oldmm->mm_count);
        enter_lazy_tlb(oldmm, next);
    } else
        switch_mm(oldmm, mm, next);

    if (!prev->mm) {
        prev->active_mm = NULL;
        rq->prev_mm = oldmm;
    }
    /*
     * Since the runqueue lock will be released by the next
     * task (which is an invalid locking op but in the case
     * of the scheduler it's an obvious special-case), so we
     * do an early lockdep release here:
     */
    spin_release(&rq->lock.dep_map, 1, _THIS_IP_);

    context_tracking_task_switch(prev, next);
    /* Here we just switch the register state and the stack. */
    switch_to(prev, next, prev);

    barrier();
    /*
     * this_rq must be evaluated again because prev may have moved
     * CPUs since it called schedule(), thus the 'rq' on its stack
     * frame will be invalid.
     */
    finish_task_switch(this_rq(), prev);
}
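The address-space handling in context_switch() above (the two if blocks around mm and active_mm) can be sketched in user space as follows. This is a simplified model, not the kernel's code: struct mm, struct task, struct runqueue, and sim_switch_address_space() are hypothetical names, the plain integer counter stands in for the atomic mm_count, and the page-table switch is reduced to a return value. It shows that a task without its own mm (a kernel thread) borrows the previous task's active_mm instead of switching page tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for mm_struct, task_struct and rq. */
struct mm       { int id; int mm_count; };
struct task     { struct mm *mm; struct mm *active_mm; };
struct runqueue { struct mm *prev_mm; };

/*
 * Mirrors the mm/active_mm logic of context_switch().
 * Returns 1 if the page tables would be switched, 0 if the
 * incoming task simply borrows the outgoing task's address space.
 */
int sim_switch_address_space(struct runqueue *rq,
                             struct task *prev, struct task *next)
{
    struct mm *mm = next->mm;
    struct mm *oldmm = prev->active_mm;
    int switched = 0;

    assert(oldmm != NULL);      /* a running task always has an active_mm */

    if (!mm) {
        /* next is a kernel thread: borrow oldmm and take a reference. */
        next->active_mm = oldmm;
        oldmm->mm_count++;
    } else {
        /* next has its own address space: stand-in for switch_mm(). */
        switched = (mm != oldmm);
    }

    if (!prev->mm) {
        /* prev was a kernel thread: hand its borrowed mm back via the rq. */
        prev->active_mm = NULL;
        rq->prev_mm = oldmm;
    }
    return switched;
}
```

In this model, switching from a user process to a kernel thread returns 0 and raises the borrowed mm's reference count, while switching from that kernel thread to another user process returns 1 and leaves the borrowed mm in rq->prev_mm so that its reference can be dropped after the switch.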
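The prev parameter is the process being switched out, and next is the process chosen to run. Within this function, the switch_to() macro switches the register state and the kernel stack: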
#define switch_to(prev, next, last)                                     \
do {                                                                    \
    /*                                                                  \
     * Context-switching clobbers all registers, so we clobber          \
     * them explicitly, via unused output variables.                    \
     * (EAX and EBP is not listed because EBP is saved/restored         \
     * explicitly for wchan access and EAX is the return value of       \
     * __switch_to())                                                   \
     */                                                                 \
    unsigned long ebx, ecx, edx, esi, edi;                              \
                                                                        \
    asm volatile("pushfl\n\t"               /* save flags */            \
                 "pushl %%ebp\n\t"          /* save EBP */              \
                 "movl %%esp,%[prev_sp]\n\t" /* save ESP */             \
                 "movl %[next_sp],%%esp\n\t" /* restore ESP */          \
                 "movl $1f,%[prev_ip]\n\t"  /* save EIP */              \
                 "pushl %[next_ip]\n\t"     /* restore EIP */           \
                 __switch_canary                                        \
                 "jmp __switch_to\n"        /* regparm call */          \
                 "1:\t"                                                 \
                 "popl %%ebp\n\t"           /* restore EBP */           \
                 "popfl\n"                  /* restore flags */         \
                                                                        \
                 /* output parameters */                                \
                 : [prev_sp] "=m" (prev->thread.sp),                    \
                   [prev_ip] "=m" (prev->thread.ip),                    \
                   "=a" (last),                                         \
                                                                        \
                   /* clobbered output registers: */                    \
                   "=b" (ebx), "=c" (ecx), "=d" (edx),                  \
                   "=S" (esi), "=D" (edi)                               \
                                                                        \
                   __switch_canary_oparam                               \
                                                                        \
                   /* input parameters: */                              \
                 : [next_sp] "m" (next->thread.sp),                     \
                   [next_ip] "m" (next->thread.ip),                     \
                                                                        \
                   /* regparm parameters for __switch_to(): */          \
                   [prev] "a" (prev),                                   \
                   [next] "d" (next)                                    \
                                                                        \
                   __switch_canary_iparam                               \
                                                                        \
                 : /* reloaded segment registers */                     \
                   "memory");                                           \
} while (0)
Within this macro, the instructions
"pushfl\n\t"                /* save flags */
"pushl %%ebp\n\t"           /* save EBP */
"movl %%esp,%[prev_sp]\n\t" /* save ESP */
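first save the CPU execution state of the outgoing process: the flags register, the frame base pointer (EBP), and the stack pointer (ESP, stored in prev->thread.sp). After ESP has been loaded from next->thread.sp, the macro saves the address at which the outgoing process will resume when it is switched back in (label 1) into prev->thread.ip, and pushes next->thread.ip onto the new stack, so that the return from __switch_to() continues at the incoming process's saved EIP:
"movl $1f,%[prev_ip]\n\t"   /* save EIP */
"pushl %[next_ip]\n\t"      /* restore EIP */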
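If the incoming process has been switched out before, it resumes at label 1 and pops the EBP and flags that it pushed when it was switched out. (A newly forked process that has never run has thread.ip set to ret_from_fork and starts there instead.)
"1:\t"
"popl %%ebp\n\t"            /* restore EBP */
"popfl\n"                   /* restore flags */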
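The "resume right after the switch point" behaviour can be observed from user space with the POSIX ucontext API (getcontext, makecontext, swapcontext), which saves and restores register state and the stack pointer in roughly the same spirit as switch_to. This is only an analogy under stated assumptions, not how the kernel does it: worker(), record(), run_switch_demo(), and the 64 KiB stack size are hypothetical names and values chosen for the sketch. A context that has never run starts at its entry function, while a context that was switched out continues at the statement after its swapcontext() call, much like label 1.

```c
#include <assert.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* assumed stack size for the worker */
#define TRACE_MAX  8

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[STACK_SIZE];
static int trace[TRACE_MAX];
static int trace_len;

/* Append one step number to the execution trace. */
static void record(int step)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = step;
}

static void worker(void)
{
    record(2);                            /* first run: starts at the entry point */
    swapcontext(&worker_ctx, &main_ctx);  /* switched out here */
    record(4);                            /* switched back in: resumes right here */
}                                         /* returning continues at uc_link */

/* Runs the demo, copies the trace into out, and returns the trace length. */
int run_switch_demo(int *out, int max)
{
    trace_len = 0;

    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = worker_stack;   /* give the worker its own stack */
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;             /* where to go when worker returns */
    makecontext(&worker_ctx, worker, 0);

    record(1);
    swapcontext(&main_ctx, &worker_ctx);        /* first switch: worker starts */
    record(3);
    swapcontext(&main_ctx, &worker_ctx);        /* second switch: worker resumes */
    record(5);

    for (int i = 0; i < trace_len && i < max; i++)
        out[i] = trace[i];
    return trace_len;
}
```

Calling run_switch_demo() with an array of at least five ints fills it with the order in which the numbered steps ran; the worker's second activation continues after its own swapcontext() call rather than restarting from the top of worker().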
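In Linux, when an interrupt arrives while a user-mode process X is running, the CPU switches to kernel mode and saves X's user-mode context (the user stack pointer ss:esp, eflags, and cs:eip) on X's kernel stack; the interrupt entry code then saves the remaining registers (SAVE_ALL). If schedule() is called during interrupt handling or just before the interrupt returns, switch_to performs the key process context switch. The new user-mode process Y resumes at label 1 in the assembly above (assuming Y has been switched out before), restores its saved registers (restore_all), and executes iret, which pops the user-mode context that was saved on Y's kernel stack, so that Y continues running in user mode. This is the general mechanism of process scheduling and switching in Linux.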
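When scheduling happens at a point reached through interrupt handling, switches between a user-mode process and a kernel thread, and between two kernel threads, work much like the general case above. When a kernel thread calls schedule() voluntarily, only a process context switch takes place; there is no interrupt context switch.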