Linux Kernel Analysis, Part 8: Process Scheduling and the Process Switch

Author: Yao Kaijian (姚开健)

Original work; please credit the source when reposting.

"Linux Kernel Analysis" MOOC course: http://mooc.study.163.com/course/USTC-1000029000

1. When process scheduling happens

In Linux, process scheduling generally happens during interrupt handling (clock interrupts, I/O interrupts, system calls, and exceptions): either the kernel calls schedule() directly, or schedule() is invoked on the return path to user mode when the need_resched flag is set. Scheduling can therefore be active (the code voluntarily calls schedule()) or passive (the kernel preempts the task at one of these points).
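The passive path described above can be sketched in user-space C. This is not kernel code: NEED_RESCHED here is an illustrative stand-in for the real TIF_NEED_RESCHED thread flag, and exit_to_user_mode()/schedule_stub() are hypothetical names for the return-to-user-mode check and the scheduler entry point.

```c
#include <assert.h>

#define NEED_RESCHED 0x1   /* illustrative stand-in for TIF_NEED_RESCHED */

static int schedule_calls;                      /* counts stub invocations */
static void schedule_stub(void) { schedule_calls++; }

/* Passive scheduling path: just before returning to user mode, the
 * kernel checks need_resched and calls schedule() if it is set.
 * Returns the thread flags with NEED_RESCHED cleared. */
int exit_to_user_mode(int thread_flags)
{
	if (thread_flags & NEED_RESCHED) {
		schedule_stub();                /* stands in for schedule() */
		thread_flags &= ~NEED_RESCHED;
	}
	return thread_flags;
}
```

The point of the sketch is only the shape of the check: the flag is tested on every kernel exit, so a task that never calls schedule() itself can still be switched out.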

A kernel thread (a special kind of process with no user-mode context) can call schedule() directly to switch processes, and it can also be scheduled during interrupt handling; in other words, kernel threads can be scheduled both actively and passively. A user-mode process, by contrast, cannot schedule itself directly: it can only be scheduled after trapping into kernel mode, i.e. during interrupt handling. schedule() is implemented as follows:

asmlinkage __visible void __sched schedule(void)
{
	struct task_struct *tsk = current;

	sched_submit_work(tsk);
	__schedule();
}
The __schedule() call does the detailed scheduling work, including:

next = pick_next_task(rq, prev);

This selects the process to run next. The scheduling algorithms are implemented inside pick_next_task(); the particular algorithm is not important here, since whatever policy is used, its job is to choose one process to switch to. Once a process has been chosen, the actual switch takes place.
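To make the idea concrete, here is a deliberately simplified picker. The real pick_next_task() walks the scheduling classes (deadline, rt, fair, idle); the struct task and pick_next_rr() below are hypothetical stand-ins modeling only a round-robin pass over runnable tasks.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified runnable-task descriptor (not the kernel's task_struct). */
struct task {
	int pid;
	int runnable;   /* 1 if the task may be scheduled */
};

/* Pick the next runnable task after 'prev', wrapping around the array.
 * Returns NULL if nothing is runnable (a real kernel runs the idle task). */
struct task *pick_next_rr(struct task *tasks, size_t n, struct task *prev)
{
	size_t start = prev ? (size_t)(prev - tasks) : 0;

	for (size_t i = 1; i <= n; i++) {
		struct task *t = &tasks[(start + i) % n];
		if (t->runnable)
			return t;
	}
	return NULL;
}
```

Whatever policy replaces this loop, its contract is the same: given the run queue and the outgoing task, hand back exactly one task to switch to.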


2. The process switch

Inside __schedule(), once a process has been chosen, context_switch() is called to switch the process context. Its implementation:

static inline void
context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next)
{
	struct mm_struct *mm, *oldmm;

	prepare_task_switch(rq, prev, next);

	mm = next->mm;
	oldmm = prev->active_mm;
	/*
	 * For paravirt, this is coupled with an exit in switch_to to
	 * combine the page table reload and the switch backend into
	 * one hypercall.
	 */
	arch_start_context_switch(prev);

	if (!mm) {
		next->active_mm = oldmm;
		atomic_inc(&oldmm->mm_count);
		enter_lazy_tlb(oldmm, next);
	} else
		switch_mm(oldmm, mm, next);

	if (!prev->mm) {
		prev->active_mm = NULL;
		rq->prev_mm = oldmm;
	}
	/*
	 * Since the runqueue lock will be released by the next
	 * task (which is an invalid locking op but in the case
	 * of the scheduler it's an obvious special-case), so we
	 * do an early lockdep release here:
	 */
	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);

	context_tracking_task_switch(prev, next);
	/* Here we just switch the register state and the stack. */
	switch_to(prev, next, prev);

	barrier();
	/*
	 * this_rq must be evaluated again because prev may have moved
	 * CPUs since it called schedule(), thus the 'rq' on its stack
	 * frame will be invalid.
	 */
	finish_task_switch(this_rq(), prev);
}
The prev parameter is the process being switched out, and next is the process chosen to run. Within this function, the switch_to() macro performs the actual stack switch:

#define switch_to(prev, next, last)					\
do {									\
	/*								\
	 * Context-switching clobbers all registers, so we clobber	\
	 * them explicitly, via unused output variables.		\
	 * (EAX and EBP is not listed because EBP is saved/restored	\
	 * explicitly for wchan access and EAX is the return value of	\
	 * __switch_to())						\
	 */								\
	unsigned long ebx, ecx, edx, esi, edi;				\
									\
	asm volatile("pushfl\n\t"		/* save    flags */	\
		     "pushl %%ebp\n\t"		/* save    EBP   */	\
		     "movl %%esp,%[prev_sp]\n\t"	/* save    ESP   */ \
		     "movl %[next_sp],%%esp\n\t"	/* restore ESP   */ \
		     "movl $1f,%[prev_ip]\n\t"	/* save    EIP   */	\
		     "pushl %[next_ip]\n\t"	/* restore EIP   */	\
		     __switch_canary					\
		     "jmp __switch_to\n"	/* regparm call  */	\
		     "1:\t"						\
		     "popl %%ebp\n\t"		/* restore EBP   */	\
		     "popfl\n"			/* restore flags */	\
									\
		     /* output parameters */				\
		     : [prev_sp] "=m" (prev->thread.sp),		\
		       [prev_ip] "=m" (prev->thread.ip),		\
		       "=a" (last),					\
									\
		       /* clobbered output registers: */		\
		       "=b" (ebx), "=c" (ecx), "=d" (edx),		\
		       "=S" (esi), "=D" (edi)				\
									\
		       __switch_canary_oparam				\
									\
		       /* input parameters: */				\
		     : [next_sp]  "m" (next->thread.sp),		\
		       [next_ip]  "m" (next->thread.ip),		\
									\
		       /* regparm parameters for __switch_to(): */	\
		       [prev]     "a" (prev),				\
		       [next]     "d" (next)				\
									\
		       __switch_canary_iparam				\
									\
		     : /* reloaded segment registers */			\
			"memory");					\
} while (0)
The instructions

	"pushfl\n\t"			/* save    flags */
	"pushl %%ebp\n\t"		/* save    EBP   */
	"movl %%esp,%[prev_sp]\n\t"	/* save    ESP   */

first save the CPU state of the outgoing process: the flags register, the frame pointer (EBP), and the stack pointer (ESP). Next, the address at which the outgoing process should resume (its eip, the address of label 1) is recorded, and the saved eip of the incoming process is pushed onto the new stack so that the following jmp to __switch_to effectively "returns" there:

	"movl $1f,%[prev_ip]\n\t"	/* save    EIP   */
	"pushl %[next_ip]\n\t"		/* restore EIP   */

If the incoming process has been switched out before, its saved eip points at label 1, so it resumes there and pops back the frame pointer and flags it pushed when it was switched out:

	"1:\t"
	"popl %%ebp\n\t"		/* restore EBP   */
	"popfl\n"			/* restore flags */
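The save-SP/save-IP/restore-SP/restore-IP dance above has a user-space analogy in POSIX ucontext. This is not kernel code: swapcontext() plays the role of switch_to, saving the current stack pointer and resume address into one context and loading them from another, so execution continues where the other context last stopped (just as a process resumes at label 1). The names run_demo and task_body are hypothetical.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;   /* the two "processes" */
static char task_stack[64 * 1024];      /* private stack for the task */
static int trace;                       /* records the execution order */

static void task_body(void)
{
	trace = trace * 10 + 1;                 /* first time slice */
	swapcontext(&task_ctx, &main_ctx);      /* switched out: SP/IP saved */
	trace = trace * 10 + 3;                 /* resumes here, like label 1 */
}

int run_demo(void)
{
	getcontext(&task_ctx);
	task_ctx.uc_stack.ss_sp = task_stack;
	task_ctx.uc_stack.ss_size = sizeof(task_stack);
	task_ctx.uc_link = &main_ctx;           /* where to go when task ends */
	makecontext(&task_ctx, task_body, 0);

	swapcontext(&main_ctx, &task_ctx);      /* "switch_to" the task */
	trace = trace * 10 + 2;                 /* task is switched out */
	swapcontext(&main_ctx, &task_ctx);      /* switch it back in */
	return trace;                           /* order: 1, 2, 3 */
}
```

Running run_demo() interleaves the two contexts in the order 1, 2, 3: the task runs, is switched out mid-function, and later resumes exactly at its saved instruction pointer, which is the essence of what switch_to does with ESP and EIP.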


Summary

In Linux, when a running user-mode process X takes an interrupt, the CPU switches to kernel mode: the user-mode stack pointer and return address are pushed onto the kernel stack and the CPU saves the rest of the context. If schedule() is called during interrupt handling or before the interrupt returns, the switch_to inside it performs the key process context switch. The incoming user-mode process Y (assuming it was switched out before) resumes at label 1 in the assembly above, restores its CPU state, and then returns to user mode using the information saved earlier on its kernel stack, continuing execution as user-mode process Y. This is the general process scheduling and switching mechanism in Linux.

When scheduling is triggered through interrupt handling, switches can occur both between user-mode processes and kernel threads and among kernel threads themselves. When one kernel thread actively calls schedule() to switch to another, only the process context is switched; no interrupt context switch is involved.







