Process Switching, Interrupts, Exceptions, and the execve() System Call

  Process switching

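Every process has its own address space, but all processes share the same CPU registers. Before a process resumes execution, the kernel must therefore make sure that each register holds the value it had when that process was suspended.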

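The set of data that must be loaded into the registers before a process resumes execution is called the hardware context. In Linux, part of the hardware context is stored in the process descriptor (the thread field, described below), and the rest is saved on the Kernel Mode stack. The following comment from entry.S documents the layout of the part saved on the stack: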
/*
 * Stack layout in 'ret_from_system_call':
 * 	ptrace needs to have all regs on the stack.
 *	if the order here is changed, it needs to be
 *	updated in fork.c:copy_process, signal.c:do_signal,
 *	ptrace.c and ptrace.h
 *
 *	 0(%esp) - %ebx
 *	 4(%esp) - %ecx
 *	 8(%esp) - %edx
 *	 C(%esp) - %esi
 *	10(%esp) - %edi
 *	14(%esp) - %ebp
 *	18(%esp) - %eax
 *	1C(%esp) - %ds
 *	20(%esp) - %es
 *	24(%esp) - orig_eax
 *	28(%esp) - %eip
 *	2C(%esp) - %cs
 *	30(%esp) - %eflags
 *	34(%esp) - %oldesp
 *	38(%esp) - %oldss
 *
 * "current" is in register %ebx during any slow entries.
 */
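The offsets in the comment above can be checked mechanically. The following is a minimal sketch, not the kernel's own definition: struct saved_frame is a hypothetical name, and it assumes that every saved value (including the 16-bit segment registers) occupies one 32-bit slot, as the 4-byte spacing in the comment suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the frame described in the comment:
 * one 32-bit slot per saved value, lowest address (0(%esp)) first. */
struct saved_frame {
    uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
    uint32_t ds, es;          /* segment registers, padded to 32 bits */
    uint32_t orig_eax;        /* system call number as passed in      */
    uint32_t eip, cs, eflags; /* pushed by the CPU on kernel entry    */
    uint32_t oldesp, oldss;   /* user stack, pushed on a ring change  */
};

/* Compile-time checks against the offsets listed in the comment. */
_Static_assert(offsetof(struct saved_frame, esi) == 0x0C, "esi offset");
_Static_assert(offsetof(struct saved_frame, orig_eax) == 0x24, "orig_eax offset");
_Static_assert(offsetof(struct saved_frame, oldss) == 0x38, "oldss offset");
```

If the order of the fields is changed, the static assertions fail at compile time, which is the same coupling the comment warns about for fork.c, signal.c and ptrace.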


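The thread field
At every process switch, the hardware context of the process being replaced must be saved somewhere. Linux uses one TSS per processor rather than one per process, so on a process switch the kernel saves the hardware context in the thread field of the process descriptor, which is of type thread_struct. This field holds most of the CPU registers, but not the general-purpose registers such as eax and ebx; their values are saved on the kernel stack.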

The switch_to macro:

#define switch_to(prev,next,last) do {\
unsigned long esi,edi;\
asm volatile("pushfl\n\t"\
    "pushl %%ebp\n\t"\
    "movl %%esp,%0\n\t"/* save ESP */ \
    "movl %5,%%esp\n\t"/* restore ESP */ \
    "movl $1f,%1\n\t"/* save EIP */ \
    "pushl %6\n\t"/* restore EIP */ \
    "jmp __switch_to\n"\
    "1:\t" \
    "popl %%ebp\n\t"\
    "popfl" \
    :"=m" (prev->thread.esp),"=m" (prev->thread.eip),\
     "=a" (last),"=S" (esi),"=D" (edi)\
    :"m" (next->thread.esp),"m" (next->thread.eip),\
     "2" (prev), "d" (next));\
} while (0)

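The output section has five operands, meaning that five items change when this code runs. %0 and %1 are in memory: prev->thread.esp and prev->thread.eip. %2 is bound to register EAX and corresponds to the macro argument last. %3 and %4 are bound to ESI and EDI and are written to the dummy locals esi and edi, which tells the compiler that these two registers do not survive the switch. The input section has four operands. %5 and %6 are in memory: next->thread.esp and next->thread.eip. %7 uses the matching constraint "2", so it shares EAX with last and carries prev; %8 is bound to EDX and carries next. EAX and EDX are also the registers in which the fastcall function __switch_to expects its two arguments.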
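Look first at the push instructions at the start (pushfl, pushl %ebp) and the matching pop instructions at the end (popl %ebp, popfl). They look ordinary, but they hide a subtlety. movl %%esp,%0 stores the current ESP, that is, the kernel stack pointer of the current process prev, into prev->thread.esp. movl %5,%%esp then loads ESP with next->thread.esp, the kernel stack pointer of the process next that has just been selected to run. Between this instruction and the following movl $1f,%1 the CPU has therefore already switched stacks. Suppose there are two processes A and B, with prev pointing to A and next pointing to B, so that A is being switched out and B is being switched in. Everything up to and including movl %5,%%esp runs on A's stack; from movl $1f,%1 onward the code runs on B's stack. In other words, from that instruction on the "current process" is B, not A. The pointer current, used in kernel code to reach the task_struct of the current process, is a macro that computes the address from the current stack pointer ESP. If current were referenced at movl $1f,%1, it would already point to B's task_struct. In this sense the process switch is complete as soon as movl %5,%%esp has executed.

However, a process also consists of a flow of execution, and that part of the switch has clearly not happened yet. So why are the values pushed onto A's stack at the start, but popped from B's stack at the end? This is the trick. The pops restore what the newly switched-in process pushed onto its own stack when it was last switched out.

How is the flow of execution switched? movl $1f,%1 stores the address of label 1, which is the address of the popl %ebp instruction, into prev->thread.eip; this is the "return" address for process A the next time it is scheduled. pushl %6 then pushes next->thread.eip onto the stack. That value is exactly what process B saved with movl $1f,%1 when it was last switched out, so it also points to label 1. Next, a jmp instruction, not a call, transfers control to the function __switch_to(). Whatever __switch_to() does, when the CPU reaches its ret instruction, the last value pushed, next->thread.eip, acts as the return address because the function was entered with jmp. That address is label 1, the popl %ebp instruction. Since every process executes movl $1f,%1 when it is switched out, every process resumes at label 1 when it is scheduled again.

There is one exception: a newly created process. It has never executed the save sequence "the last time it was switched out", so the thread.eip in its task_struct must be set up in advance, and the "return address" placed there need not be label 1; it depends on how the kernel stack of the new process was set up. In fact, as shown in the section on fork(), copy_thread() (see arch/i386/kernel/process.c) sets this address to ret_from_fork, whose code is in entry.S: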


ENTRY(ret_from_fork)
pushl %eax
call schedule_tail
GET_THREAD_INFO(%ebp)
popl %eax
jmp syscall_exit
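GET_THREAD_INFO(%ebp) in the listing above, like the current macro mentioned earlier, derives its result from the stack pointer alone: it rounds ESP down to the start of the stack block, where thread_info is stored. A minimal C sketch of that arithmetic follows. It assumes an 8 KiB, 8 KiB-aligned kernel stack; thread_info_base and the sample addresses are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumed size and alignment of a kernel stack */

/* Round a kernel stack pointer down to the start of its THREAD_SIZE-aligned
 * block. Any ESP value inside the same stack yields the same base. */
static inline uint32_t thread_info_base(uint32_t esp)
{
    /* The mask trick only works for a power-of-two size. */
    assert((THREAD_SIZE & (THREAD_SIZE - 1u)) == 0);
    return esp & ~(THREAD_SIZE - 1u);
}
```

This is why loading a new value into ESP is enough to change what current refers to: two stack pointers inside different stack blocks map to different bases, with no other per-process state consulted.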


syscall_exit:
cli # make sure we don't miss an interrupt
# setting need_resched or sigpending
# between sampling and the iret
movl TI_flags(%ebp), %ecx
testw $_TIF_ALLWORK_MASK, %cx	# current->work
jne syscall_exit_work

syscall_exit_work:
testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP), %cl
jz work_pending
sti # could let do_syscall_trace() call
# schedule() instead
movl %esp, %eax
movl $1, %edx
call do_syscall_trace
jmp resume_userspace
/*
 * Return to user mode is not as complex as all this looks,
 * but we want the default path for a system call return to
 * go as quickly as possible which is why some of this is
 * less clear than it otherwise should be.
 */
That is, this is the path that returns from kernel space to user space.
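The two flag tests in syscall_exit and syscall_exit_work amount to a three-way decision. The sketch below restates it in C under stated assumptions: the DEMO_ bit values are illustrative only and are not copied from the kernel's thread_info.h, and pick_exit_path and the enum names are hypothetical.

```c
#include <assert.h>

/* Illustrative flag bits; the real _TIF_* values are NOT reproduced here. */
enum {
    DEMO_TIF_SYSCALL_TRACE = 1u << 0,
    DEMO_TIF_SIGPENDING    = 1u << 2,
    DEMO_TIF_NEED_RESCHED  = 1u << 3,
    DEMO_TIF_SINGLESTEP    = 1u << 4,
    DEMO_TIF_SYSCALL_AUDIT = 1u << 7
};

#define DEMO_TRACE_MASK \
    (DEMO_TIF_SYSCALL_TRACE | DEMO_TIF_SYSCALL_AUDIT | DEMO_TIF_SINGLESTEP)
#define DEMO_ALLWORK_MASK \
    (DEMO_TRACE_MASK | DEMO_TIF_SIGPENDING | DEMO_TIF_NEED_RESCHED)

enum exit_path {
    EXIT_FAST,          /* no work: fall through and return to user mode */
    EXIT_WORK_PENDING,  /* jz work_pending: reschedule or signals        */
    EXIT_TRACE          /* call do_syscall_trace, then resume_userspace  */
};

enum exit_path pick_exit_path(unsigned flags)
{
    if (!(flags & DEMO_ALLWORK_MASK))   /* testw ...; jne not taken */
        return EXIT_FAST;
    if (!(flags & DEMO_TRACE_MASK))     /* testb ...; jz work_pending */
        return EXIT_WORK_PENDING;
    return EXIT_TRACE;
}
```

The common case, no flag set, costs a single test and no branch taken, which matches the remark in the source comment about keeping the default return path fast.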

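 After the parent process has fork()ed the child, it does not call schedule() right away. It only sets the need_resched flag in its task_struct to 1 and then returns from do_fork() and sys_fork(). (The labels ret_from_sys_call and ret_with_reschedule and the need_resched field used in this paragraph follow an older kernel source than the listings above, where the corresponding exit path is syscall_exit.) When the parent passes through ret_from_sys_call in entry.S and reaches ret_with_reschedule, it would return directly if need_resched in its task_struct were 0; at that point its stack pointer already points to regs, so RESTORE_ALL takes the process back to user space (see Chapter 3). But need_resched is now 1, so schedule() has to be called, and the stack pointer turns around and grows downward again. If the scheduler decides that the parent keeps running, schedule() returns at once, as if nothing had happened. If another process is chosen to run, the parent's kernel stack ends up as shown in Figure 4.5. At the "top" of the stack is the point at which the process will resume the next time it is scheduled, which is the address of label 1 saved by movl $1f,%1 in switch_to. Note that switch_to is a macro, not a function, so the stack holds no return address from switch_to. Later, when the parent is scheduled again, movl %5,%%esp in switch_to restores its stack pointer, and the ret instruction in __switch_to() "returns" to label 1, so this stack entry can also be regarded as "the return address from __switch_to()". The parent finally returns to line 289 of entry.S and immediately jumps to ret_from_sys_call. By contrast, the child's "return address" was set to ret_from_fork, so as soon as __switch_to() executes ret, control "returns" there.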
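The two behaviours described above, a process resuming exactly where it was switched out and a new process starting at a preset entry point, can be imitated in user space. The sketch below uses the glibc ucontext functions (getcontext, makecontext, swapcontext), which is an assumption about the platform; it is an analogy, not how the kernel implements switch_to. All names such as task_a, task_b and run_switch_demo are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define DEMO_STACK_SIZE (64 * 1024)

static ucontext_t ctx_main, ctx_a, ctx_b;
static char stack_a[DEMO_STACK_SIZE], stack_b[DEMO_STACK_SIZE];
static char trace[32];              /* order in which the code ran */

static void task_a(void)
{
    strcat(trace, "A1 ");
    swapcontext(&ctx_a, &ctx_b);    /* A is switched out here ...  */
    strcat(trace, "A2 ");           /* ... and later resumes here  */
    swapcontext(&ctx_a, &ctx_b);
}

static void task_b(void)
{
    strcat(trace, "B1 ");           /* first run: starts at its entry point */
    swapcontext(&ctx_b, &ctx_a);
    strcat(trace, "B2 ");
}                                   /* on return, uc_link resumes ctx_main */

/* Give a context its own stack and a preset entry point, much as
 * copy_thread() presets thread.eip for a new process. */
static void prepare(ucontext_t *ctx, char *stack, void (*entry)(void))
{
    int rc = getcontext(ctx);
    assert(rc == 0);
    (void)rc;
    ctx->uc_stack.ss_sp = stack;
    ctx->uc_stack.ss_size = DEMO_STACK_SIZE;
    ctx->uc_link = &ctx_main;
    makecontext(ctx, entry, 0);
}

const char *run_switch_demo(void)
{
    trace[0] = '\0';
    prepare(&ctx_a, stack_a, task_a);
    prepare(&ctx_b, stack_b, task_b);
    swapcontext(&ctx_main, &ctx_a); /* save main, switch to A */
    return trace;
}
```

Calling run_switch_demo() records the interleaving of the two tasks. The first switch into B begins at task_b's entry, like a child starting at ret_from_fork, while every later switch continues just after the swapcontext call that suspended the task, like a process resuming at label 1.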
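The jmp __switch_to instruction in the assembly above enters __switch_to(), whose main job is to update the TSS. The per-CPU TSS is initialized and declared as follows: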
/*
 * Note that the .io_bitmap member must be extra-big. This is because
 * the CPU will access an additional byte beyond the end of the IO
 * permission bitmap. The extra byte must be all 1 bits, and must
 * be within the limit.
 */
#define INIT_TSS  { \
.esp0 = sizeof(init_stack) + (long)&init_stack,\
.ss0 = __KERNEL_DS,\
.ss1 = __KERNEL_CS,\
.ldt = GDT_ENTRY_LDT,\
.io_bitmap_base= INVALID_IO_BITMAP_OFFSET, \
.io_bitmap = { [ 0 ... IO_BITMAP_LONGS] = ~0 }, \
}

DEFINE_PER_CPU(struct tss_struct, init_tss) ____cacheline_maxaligned_in_smp = INIT_TSS;

struct tss_struct {
unsigned short back_link,__blh;
unsigned long esp0;
unsigned short ss0,__ss0h;
unsigned long esp1;
unsigned short ss1,__ss1h; /* ss1 is used to cache MSR_IA32_SYSENTER_CS */
unsigned long esp2;
unsigned short ss2,__ss2h;
unsigned long __cr3;
unsigned long eip;
unsigned long eflags;
unsigned long eax,ecx,edx,ebx;
unsigned long esp;
unsigned long ebp;
unsigned long esi;
unsigned long edi;
unsigned short es, __esh;
unsigned short cs, __csh;
unsigned short ss, __ssh;
unsigned short ds, __dsh;
unsigned short fs, __fsh;
unsigned short gs, __gsh;
unsigned short ldt, __ldth;
unsigned short trace, io_bitmap_base;
/*
* The extra 1 is there because the CPU will access an
* additional byte beyond the end of the IO permission
* bitmap. The extra byte must be all 1 bits, and must
* be within the limit.
*/
unsigned long io_bitmap[IO_BITMAP_LONGS + 1];
/*
* Cache the current maximum and the last task that used the bitmap:
*/
unsigned long io_bitmap_max;
struct thread_struct *io_bitmap_owner;
/*
* pads the TSS to be cacheline-aligned (size is 0x100)
*/
unsigned long __cacheline_filler[35];
/*
* .. and then another 0x100 bytes for emergency kernel stack
*/
unsigned long stack[64];
} __attribute__((packed));
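INIT_TSS fills io_bitmap with ~0, and the struct reserves one extra long beyond IO_BITMAP_LONGS. On x86 a set bit in the I/O permission bitmap denies access to the corresponding port, so an all-ones bitmap denies every port. The following sketch models that convention. It is a simplification under stated assumptions: demo_io_tss, demo_io_init, port_allowed and allow_port are hypothetical names, the 65536-port count is assumed, and the privilege-level check the CPU performs before consulting the bitmap is left out.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_IO_BITS   65536   /* assumed: one bit per x86 I/O port */
#define DEMO_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_IO_LONGS  (DEMO_IO_BITS / DEMO_LONG_BITS)

struct demo_io_tss {
    /* +1: trailing all-ones word, as in the tss_struct comment */
    unsigned long io_bitmap[DEMO_IO_LONGS + 1];
};

/* Mirror of the INIT_TSS initializer: every bit set, every port denied. */
void demo_io_init(struct demo_io_tss *t)
{
    memset(t->io_bitmap, 0xff, sizeof t->io_bitmap);
}

/* A clear bit means the port may be accessed. */
int port_allowed(const struct demo_io_tss *t, unsigned port)
{
    return !((t->io_bitmap[port / DEMO_LONG_BITS] >> (port % DEMO_LONG_BITS)) & 1UL);
}

/* Grant access to one port by clearing its bit. */
void allow_port(struct demo_io_tss *t, unsigned port)
{
    t->io_bitmap[port / DEMO_LONG_BITS] &= ~(1UL << (port % DEMO_LONG_BITS));
}
```

Granting a port touches only its own bit, and the trailing word stays all ones, which is the property the comment in tss_struct requires.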

What __switch_to() does, from process.c:
/*
 *	switch_to(x,yn) should switch tasks from x to y.
 *
 * We fsave/fwait so that an exception goes off at the right time
 * (as a call from the fsave or fwait in effect) rather than to
 * the wrong process. Lazy FP saving no longer makes any sense
 * with modern CPU's, and this simplifies a lot of things (SMP
 * and UP become the same).
 *
 * NOTE! We used to use the x86 hardware context switching. The
 * reason for not using it any more becomes apparent when you
 * try to recover gracefully from saved state that is no longer
 * valid (stale segment register values in particular). With the
 * hardware task-switch, there is no way to fix up bad state in
 * a reasonable manner.
 *
 * The fact that Intel documents the hardware task-switching to
 * be slow is a fairly red herring - this code is not noticeably
 * faster. However, there _is_ some room for improvement here,
 * so the performance issues may eventually be a valid point.
 * More important, however, is the fact that this allows us much
 * more flexibility.
 *
 * The return value (in %eax) will be the "prev" task after
 * the task-switch, and shows up in ret_from_fork in entry.S,
 * for example.
 */
struct task_struct fastcall * __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
	struct thread_struct *prev = &prev_p->thread,
				 *next = &next_p->thread;
	int cpu = smp_processor_id();
	struct tss_struct *tss = &per_cpu(init_tss, cpu);

	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

	__unlazy_fpu(prev_p);

	/*
	 * Reload esp0, LDT and the page table pointer:
	 */
	load_esp0(tss, next);

	/*
	 * Load the per-thread Thread-Local Storage descriptor.
	 */
	load_TLS(next, cpu);

	/*
	 * Save away %fs and %gs. No need to save %es and %ds, as
	 * those are always kernel segments while inside the kernel.
	 */
	asm volatile("movl %%fs,%0":"=m" (*(int *)&prev->fs));
	asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));

	/*
	 * Restore %fs and %gs if needed.
	 */
	if (unlikely(prev->fs | prev->gs | next->fs | next->gs)) {
		loadsegment(fs, next->fs);
		loadsegment(gs, next->gs);
	}

	/*
	 * Now maybe reload the debug registers
	 */
	if (unlikely(next->debugreg[7])) {
		loaddebug(next, 0);
		loaddebug(next, 1);
		loaddebug(next, 2);
		loaddebug(next, 3);
		/* no 4 and 5 */
		loaddebug(next, 6);
		loaddebug(next, 7);
	}

	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
		handle_io_bitmap(next, tss);

	return prev_p;
}
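The core of __switch_to() can be restated as a small user-space model: the per-CPU TSS receives the esp0 of the incoming task, %fs and %gs are saved for the outgoing task and reloaded only if any of the four values is nonzero, and the function returns prev_p so that switch_to can store it in last. This is a sketch under stated assumptions: every name beginning with mini_ is hypothetical, the structures are heavily reduced, and TLS, FPU, debug-register and I/O-bitmap handling are omitted.

```c
#include <assert.h>
#include <stddef.h>

struct mini_thread { unsigned long esp0; unsigned fs, gs; };
struct mini_task   { int pid; struct mini_thread thread; };

/* One of these per CPU, never per task. */
struct mini_cpu {
    unsigned long tss_esp0;   /* stands in for tss->esp0           */
    unsigned fs, gs;          /* stand in for the segment registers */
    int seg_reloads;          /* counts how often fs/gs were loaded */
};

struct mini_task *mini_switch_to(struct mini_cpu *cpu,
                                 struct mini_task *prev_p,
                                 struct mini_task *next_p)
{
    struct mini_thread *prev = &prev_p->thread, *next = &next_p->thread;

    cpu->tss_esp0 = next->esp0;       /* load_esp0(tss, next) */

    prev->fs = cpu->fs;               /* save away %fs and %gs */
    prev->gs = cpu->gs;

    /* Reload only when needed, as in the unlikely() test. */
    if (prev->fs | prev->gs | next->fs | next->gs) {
        cpu->fs = next->fs;
        cpu->gs = next->gs;
        cpu->seg_reloads++;
    }
    return prev_p;                    /* becomes "last" in switch_to */
}
```

Because the TSS belongs to the CPU, switching tasks only rewrites its esp0 field; no per-task TSS is loaded, which is the design choice the source comment in process.c defends.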





