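A process context switch involves switching the kernel stack, the registers, the page tables, and so on. This article is based on Linux 3.10 (x86_64) and analyzes only the kernel stack and register switch.
The kernel stack and the registers are switched in the context_switch() function. Its parameter prev points to the current process and next points to the target process.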
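/*
 * context_switch - switch to the new MM and the new
 * thread's register state.
 */
static inline void
context_switch(struct rq *rq, struct task_struct *prev,
               struct task_struct *next)
context_switch() hands the actual stack and register switch to the switch_to macro, which it invokes as switch_to(prev, next, prev). On x86_64 the macro is defined as follows: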
/* Save restore flags to clear handle leaking NT */
#define switch_to(prev, next, last) \
        asm volatile(SAVE_CONTEXT \
             "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */ \
             "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */ \
             "call __switch_to\n\t" \
             "movq "__percpu_arg([current_task])",%%rsi\n\t" \
             __switch_canary \
             "movq %P[thread_info](%%rsi),%%r8\n\t" \
             "movq %%rax,%%rdi\n\t" \
             "testl %[_tif_fork],%P[ti_flags](%%r8)\n\t" \
             "jnz ret_from_fork\n\t" \
             RESTORE_CONTEXT \
             : "=a" (last) \
               __switch_canary_oparam \
             : [next] "S" (next), [prev] "D" (prev), \
               [threadrsp] "i" (offsetof(struct task_struct, thread.sp)), \
               [ti_flags] "i" (offsetof(struct thread_info, flags)), \
               [_tif_fork] "i" (_TIF_FORK), \
               [thread_info] "i" (offsetof(struct task_struct, stack)), \
               [current_task] "m" (current_task) \
               __switch_canary_iparam \
             : "memory", "cc" __EXTRA_CLOBBER)
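Among the input operands, rsi holds next (constraint "S") and rdi holds prev (constraint "D"). The following uses switches among processes A, B, and C to show how switch_to works.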
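Before walking through the example, note that operands such as %P[threadrsp](%[prev]) are plain base-plus-displacement addressing: the compile-time constant produced by offsetof() is added to the task pointer held in a register. The following user-space sketch illustrates the idea. The names toy_thread, toy_task, TOY_THREAD_SP, toy_save_sp and toy_load_sp are hypothetical and the struct layout is invented for illustration; it is not the kernel's task_struct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's thread_struct / task_struct. */
struct toy_thread {
    unsigned long sp0;
    unsigned long sp;            /* saved kernel stack pointer */
};

struct toy_task {
    long state;
    void *stack;
    struct toy_thread thread;
};

/* Compile-time constant, playing the role of the [threadrsp] "i" operand. */
#define TOY_THREAD_SP offsetof(struct toy_task, thread.sp)

/* Like "movq %rsp,threadrsp(prev)": store through base + constant offset. */
static void toy_save_sp(struct toy_task *prev, unsigned long rsp)
{
    assert(prev != NULL);
    memcpy((char *)prev + TOY_THREAD_SP, &rsp, sizeof(rsp));
}

/* Like "movq threadrsp(next),%rsp": load through base + constant offset. */
static unsigned long toy_load_sp(const struct toy_task *next)
{
    unsigned long rsp;

    assert(next != NULL);
    memcpy(&rsp, (const char *)next + TOY_THREAD_SP, sizeof(rsp));
    return rsp;
}
```

Calling toy_save_sp(&a, value) writes exactly the a.thread.sp field, and toy_load_sp(&b) reads b.thread.sp, without the code ever naming the field at the access site. This is why the macro only needs the two task pointers in registers plus a few immediates.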
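Suppose a switch from process A to process B is in progress. SAVE_CONTEXT pushes the flags register and rbp and then copies rsi into rbp, while __EXTRA_CLOBBER tells the compiler that the remaining general-purpose registers are clobbered, so it must preserve any live values itself. As a result, once SAVE_CONTEXT has executed, the kernel stack of process A should look as follows.
----------------------------------------------- <- rsp rbp = rsi = B, rdi = A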
register save for A except rsp/rdi/rsi
----------------------------------------------- <- when switch_to runs
local data
----------------------------------------------- <- when context_switch runs
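After that, rsp is saved into the task_struct of process A (thread.sp) and the stack is switched to that of process B. Process B's registers are then restored from its own stack, which completes the switch.
Now suppose that, some time later, process C switches back to process A. Once SAVE_CONTEXT has executed in C, the kernel stack of process C looks as follows.
----------------------------------------------- <- rsp rbp = rsi = A, rdi = C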
register save for C except rsp/rdi/rsi
----------------------------------------------- <- when switch_to runs
local data
----------------------------------------------- <- when context_switch runs
After the stack switch (the two assembly lines that follow SAVE_CONTEXT), the stack of process A is loaded back into rsp. The kernel stack now looks as follows:
----------------------------------------------- <- rsp rbp = rsi = A, rdi = C
register save for A except rsp/rdi/rsi
----------------------------------------------- <- when switch_to runs
local data
----------------------------------------------- <- when context_switch runs
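rdi is then set to the return value of __switch_to (rax), which is the address of the task_struct of process C. If the _TIF_FORK flag is not set (it is set only when A is a newly forked child process), execution reaches RESTORE_CONTEXT. After RESTORE_CONTEXT has run and the registers listed in __EXTRA_CLOBBER have been restored automatically, the kernel stack looks as shown below, and all registers of process A have been restored correctly. Note that rbp is callee-saved, so __switch_to does not change it; RESTORE_CONTEXT therefore ends up restoring rsi to A.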
----------------------------------------------- <- rsp rdi = C, rsi = A
local data
----------------------------------------------- <- when context_switch runs
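The A -> B -> C -> A sequence above can be replayed with a small user-space simulation. This is only a sketch under stated assumptions: toy_task, toy_cpu, toy_task_init, toy_cpu_boot and toy_switch_to are hypothetical names, registers hold small task ids instead of task_struct addresses, every task is assumed to already own a saved frame (as if it had been switched out earlier), and the stack canary and _TIF_FORK paths are left out.

```c
#include <assert.h>

#define TOY_STACK_WORDS 16

/* Hypothetical task: an id, a private kernel stack and the saved rsp. */
struct toy_task {
    long id;
    long stack[TOY_STACK_WORDS];
    int sp;                         /* saved stack index; stack grows down */
};

/* Hypothetical CPU with only the registers discussed in the text. */
struct toy_cpu {
    struct toy_task *stack_owner;   /* task whose stack rsp points into */
    int rsp;
    long rflags, rbp, rsi, rdi, rax;
};

static void toy_push(struct toy_cpu *cpu, long v)
{
    assert(cpu->rsp > 0);
    cpu->stack_owner->stack[--cpu->rsp] = v;
}

static long toy_pop(struct toy_cpu *cpu)
{
    assert(cpu->rsp < TOY_STACK_WORDS);
    return cpu->stack_owner->stack[cpu->rsp++];
}

/* Assumption: give each task a frame as if SAVE_CONTEXT had run before. */
static void toy_task_init(struct toy_task *t, long id, long rflags, long rbp)
{
    t->id = id;
    t->sp = TOY_STACK_WORDS;
    t->stack[--t->sp] = rflags;     /* pushf */
    t->stack[--t->sp] = rbp;        /* pushq %rbp */
}

/* Start running a task by consuming its saved frame. */
static void toy_cpu_boot(struct toy_cpu *cpu, struct toy_task *t)
{
    cpu->stack_owner = t;
    cpu->rsp = t->sp;
    cpu->rbp = toy_pop(cpu);
    cpu->rflags = toy_pop(cpu);
    cpu->rsi = cpu->rdi = cpu->rax = 0;
}

/* Mirrors the order of operations in switch_to; returns "last". */
static long toy_switch_to(struct toy_cpu *cpu, struct toy_task *prev,
                          struct toy_task *next)
{
    cpu->rdi = prev->id;            /* [prev] "D" (prev) */
    cpu->rsi = next->id;            /* [next] "S" (next) */

    /* SAVE_CONTEXT: pushf ; pushq %rbp ; movq %rsi,%rbp */
    toy_push(cpu, cpu->rflags);
    toy_push(cpu, cpu->rbp);
    cpu->rbp = cpu->rsi;

    /* save rsp into prev, then load rsp from next: the stack switch */
    prev->sp = cpu->rsp;
    cpu->stack_owner = next;
    cpu->rsp = next->sp;

    /* call __switch_to: returns prev in rax and leaves rbp untouched */
    cpu->rax = cpu->rdi;

    /* movq %rax,%rdi */
    cpu->rdi = cpu->rax;

    /* RESTORE_CONTEXT: movq %rbp,%rsi ; popq %rbp ; popf */
    cpu->rsi = cpu->rbp;
    cpu->rbp = toy_pop(cpu);
    cpu->rflags = toy_pop(cpu);

    return cpu->rax;                /* "=a" (last) */
}
```

With A, B and C given the ids 1, 2 and 3, booting the toy CPU into A and then calling toy_switch_to for A -> B, B -> C and C -> A leaves rdi = 3 (C), rsi = 1 (A), and rbp and rflags equal to the values A started with, while the last call returns 3. This matches the final diagram: A's saved frame has been popped, and the task A sees as "last" is C rather than the B it originally switched to.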