Introduction
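The goal of this lab is to let JOS run user environments, i.e. user programs (processes). Running a user program involves switching between kernel mode and user mode, so two things must be implemented: creating environments, and handling interrupts and exceptions.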
Part A: User Environments and Exception Handling
Exercise 1: Allocating memory for the envs array
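In Linux, a process is represented in the kernel by a PCB (process control block). In JOS, struct Env plays the same role. Its definition is shown below; some fields are only used in Lab 4 and can be ignored for now.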
struct Env {
struct Trapframe env_tf; // Saved registers
struct Env *env_link; // Next free Env
envid_t env_id; // Unique environment identifier
envid_t env_parent_id; // env_id of this env's parent
enum EnvType env_type; // Indicates special system environments
unsigned env_status; // Status of the environment
uint32_t env_runs; // Number of times environment has run
int env_cpunum; // The CPU that the env is running on
// Address space
pde_t *env_pgdir; // Kernel virtual address of page dir
// Exception handling
void *env_pgfault_upcall; // Page fault upcall entry point
// Lab 4 IPC
bool env_ipc_recving; // Env is blocked receiving
void *env_ipc_dstva; // VA at which to map received page
uint32_t env_ipc_value; // Data value sent to us
envid_t env_ipc_from; // envid of the sender
int env_ipc_perm; // Perm of page mapping received
};
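JOS supports at most NENV environments, and all Env structures live in the envs array. The first step is to allocate physical memory for envs and map it read-only for user space at UENVS. This is analogous to pages, where each PageInfo describes one physical page, and the allocation is done exactly the same way as for pages in Lab 2.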
void
mem_init(void)
{
...
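envs = (struct Env*)boot_alloc(sizeof(struct Env) * NENV);
memset(envs, 0, sizeof(struct Env) * NENV);
...
// Map envs read-only for users at UENVS (kernel R, user R),
// so that user programs can read the Env structures.
boot_map_region(kern_pgdir, UENVS, PTSIZE, PADDR(envs), PTE_U);
...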
}
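To make the "same as pages" point concrete, here is a hypothetical host-side model of the bump allocator that boot_alloc() implements in Lab 2. Addresses are plain integers, kernel_end is an assumed stand-in for the linker symbol end, and the out-of-memory panic of the real function is omitted; treat it as a sketch, not the JOS source.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096u
/* Round x up to a multiple of the power-of-two n. */
#define ROUNDUP(x, n) (((x) + (n) - 1) & ~((uint32_t)(n) - 1))

/* Hypothetical stand-in for the linker symbol 'end': the first
 * address past the kernel image. */
static uint32_t kernel_end = 0xf0117abcu;
static uint32_t nextfree;  /* next free address; 0 = not initialized yet */

/* Bump allocator modeled on boot_alloc(): return the start of an
 * n-byte region and advance nextfree to the next page boundary.
 * Nothing is actually mapped; this only tracks addresses. */
static uint32_t boot_alloc_sim(uint32_t n)
{
    if (nextfree == 0)
        nextfree = ROUNDUP(kernel_end, PGSIZE);
    uint32_t result = nextfree;
    nextfree = ROUNDUP(nextfree + n, PGSIZE);
    assert(nextfree >= result);  /* no wrap-around */
    return result;
}
```

Called first for pages and then for envs, each array starts on its own page boundary, and boot_alloc_sim(0) just reports the current end.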
Exercise 2: Creating and running environments
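JOS has no file system yet, so user programs cannot be started from a command line as on Linux. Instead, each user program is turned into raw binary data with the linker's -b binary option and embedded in the kernel image. In obj/kern/kernel.sym (excerpt below) there are many symbols such as _binary_obj_user_hello_start, _binary_obj_user_hello_end and _binary_obj_user_hello_size, which describe the compiled user/hello.c. The kernel can run hello by reading the program from these addresses.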
0000891c A _binary_obj_user_hello_size
0000891c A _binary_obj_user_softint_size
0000891c A _binary_obj_user_yield_size
00008920 A _binary_obj_user_badsegment_size
00008920 A _binary_obj_user_breakpoint_size
00008920 A _binary_obj_user_buggyhello_size
00008920 A _binary_obj_user_evilhello_size
00008920 A _binary_obj_user_faultread_size
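This exercise requires completing the following functions.
env_init(): initialize every Env and put it on the free list env_free_list. Insert the entries in descending index order, so that the list ends up in array order and the first call to env_alloc returns envs[0].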
void
env_init(void)
{
// Set up envs array
// LAB 3: Your code here.
for (int i = NENV-1; i >= 0; --i) {
envs[i].env_id = 0;
envs[i].env_status = ENV_FREE;
envs[i].env_link = env_free_list;
env_free_list = &envs[i];
}
// Per-CPU part of the initialization
env_init_percpu();
}
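A minimal host-side sketch of why the loop runs backwards: pushing envs[NENV-1] first and envs[0] last leaves envs[0] at the head. NENV_DEMO, take_free_env() and the trimmed struct Env are hypothetical simplifications; take_free_env() only mimics the way env_alloc() takes the head of env_free_list.

```c
#include <assert.h>
#include <stddef.h>

#define NENV_DEMO 4          /* hypothetical small NENV for the demo */
enum { ENV_FREE = 0 };

struct Env {                 /* only the fields env_init touches */
    struct Env *env_link;
    int env_id;
    unsigned env_status;
};

static struct Env envs[NENV_DEMO];
static struct Env *env_free_list;

/* Same loop as env_init(): walking the array backwards and pushing each
 * element onto the head of the list leaves envs[0] at the head. */
static void env_init_demo(void)
{
    env_free_list = NULL;
    for (int i = NENV_DEMO - 1; i >= 0; --i) {
        envs[i].env_id = 0;
        envs[i].env_status = ENV_FREE;
        envs[i].env_link = env_free_list;
        env_free_list = &envs[i];
    }
}

/* Pop the head of the free list, as env_alloc() does. */
static struct Env *take_free_env(void)
{
    struct Env *e = env_free_list;
    if (e != NULL)
        env_free_list = e->env_link;
    return e;
}
```

Iterating upwards instead would hand out envs[NENV-1] first, which breaks the expectation that the first environment lives in envs[0].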
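env_setup_vm(): every environment has its own virtual address space, so allocate one page for the environment's page directory (env_pgdir) and initialize it by copying kern_pgdir, since all environments share the same layout above UTOP. The UVPT entry is the exception: it must map the environment's own page directory rather than kern_pgdir, so it is set separately.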
static int
env_setup_vm(struct Env *e)
{
int i;
struct PageInfo *p = NULL;
// Allocate a page for the page directory
if (!(p = page_alloc(ALLOC_ZERO)))
return -E_NO_MEM;
// Now, set e->env_pgdir and initialize the page directory.
//
// Hint:
// - The VA space of all envs is identical above UTOP
// (except at UVPT, which we've set below).
// See inc/memlayout.h for permissions and layout.
// Can you use kern_pgdir as a template? Hint: Yes.
// (Make sure you got the permissions right in Lab 2.)
// - The initial VA below UTOP is empty.
// - You do not need to make any more calls to page_alloc.
// - Note: In general, pp_ref is not maintained for
// physical pages mapped only above UTOP, but env_pgdir
// is an exception -- you need to increment env_pgdir's
// pp_ref for env_free to work correctly.
// - The functions in kern/pmap.h are handy.
// LAB 3: Your code here.
e->env_pgdir = (pde_t*)page2kva(p);
p->pp_ref++;
memset(e->env_pgdir, 0, PGSIZE);
memcpy(e->env_pgdir, kern_pgdir, PGSIZE);
// UVPT maps the env's own page table read-only.
// Permissions: kernel R, user R
e->env_pgdir[PDX(UVPT)] = PADDR(e->env_pgdir) | PTE_P | PTE_U;
return 0;
}
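The UVPT entry creates a "self-map": the page directory doubles as a page table, so the environment's page tables (and the directory itself) become readable at fixed virtual addresses. The arithmetic below uses UVPT = 0xef400000 and the PDX/PTX definitions as found in JOS's inc/memlayout.h and inc/mmu.h; uvpd_entry_addr() and uvpt_entry_addr() are hypothetical helpers showing where a given entry appears.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as defined in JOS's inc/memlayout.h and inc/mmu.h. */
#define UVPT     0xef400000u
#define PDXSHIFT 22
#define PTXSHIFT 12
#define PDX(la)  (((uint32_t)(la) >> PDXSHIFT) & 0x3ff)
#define PTX(la)  (((uint32_t)(la) >> PTXSHIFT) & 0x3ff)

/* Virtual address at which the page-directory entry for 'va' is
 * visible once pgdir[PDX(UVPT)] points back at pgdir itself. */
static uint32_t uvpd_entry_addr(uint32_t va)
{
    uint32_t uvpd = UVPT + (UVPT >> 12) * 4;   /* formula from lib/entry.S */
    return uvpd + PDX(va) * 4;
}

/* Virtual address of the page-table entry that maps 'va'. */
static uint32_t uvpt_entry_addr(uint32_t va)
{
    return UVPT + (va >> PTXSHIFT) * 4;
}
```

Both the PDX and the PTX of uvpd equal PDX(UVPT) (0x3bd), so a lookup through uvpd walks the self-mapped entry twice and lands on the page directory.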
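region_alloc(): the previous steps only initialize the environment's descriptor; the memory the environment itself needs has not been allocated yet. This function allocates physical pages and maps them at [va, va+len), effectively giving the environment len bytes of memory. Because of page alignment, the amount actually allocated may be larger than len. The mappings are recorded in env_pgdir.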
static void
region_alloc(struct Env *e, void *va, size_t len)
{
// LAB 3: Your code here.
// (But only if you need it for load_icode.)
//
// Hint: It is easier to use region_alloc if the caller can pass
// 'va' and 'len' values that are not page-aligned.
// You should round va down, and round (va + len) up.
// (Watch out for corner-cases!)
void* down = (void*)ROUNDDOWN(va, PGSIZE);
void* up = (void*)ROUNDUP(va+len, PGSIZE);
for (; down < up; down += PGSIZE) {
struct PageInfo* p = page_alloc(ALLOC_ZERO);
if (p == NULL) {
panic("region_alloc: out of memory");
}
// page_insert can fail if it needs a new page table and none is left
if (page_insert(e->env_pgdir, p, down, PTE_U | PTE_W) < 0) {
panic("region_alloc: page_insert failed");
}
}
}
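The rounding is where the corner cases hide. This sketch counts how many pages region_alloc() would map for a given va and len; ROUNDDOWN/ROUNDUP here are simplified integer versions, not the typeof-based macros in JOS's inc/types.h, and region_pages() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096u
#define ROUNDDOWN(a, n) ((uint32_t)(a) - (uint32_t)(a) % (n))
#define ROUNDUP(a, n)   ROUNDDOWN((uint32_t)(a) + (n) - 1, (n))

/* Number of pages mapped for [va, va + len): round the start down and
 * the end up, so an unaligned request covers every page it touches. */
static uint32_t region_pages(uint32_t va, uint32_t len)
{
    uint32_t start = ROUNDDOWN(va, PGSIZE);
    uint32_t end = ROUNDUP(va + len, PGSIZE);
    return (end - start) / PGSIZE;
}
```

For example, 0x20 bytes at 0x800ff0 straddle a page boundary and need two pages. Note also that a zero-length request at an unaligned va still maps one page.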
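load_icode(): the function that actually sets up the environment's program image. The user program was compiled into a binary embedded in the kernel, and this function reads it to initialize the environment. The process is much like bootmain; the main difference is that bootmain reads the ELF header from disk, whereas here it is read from memory. The segments described by the program headers (struct Proghdr *ph) are then copied into the environment's address space. So that memcpy can write to the segment addresses directly, cr3 must hold env_pgdir while the segments are copied; switch back to kern_pgdir afterwards. Finally, set the saved eip (env_tf.tf_eip) to the program's entry point, and allocate one page for the user stack at USTACKTOP - PGSIZE. region_alloc works on env_pgdir directly, so the stack allocation does not depend on which page directory is loaded.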
static void
load_icode(struct Env *e, uint8_t *binary)
{
// Hints:
// Load each program segment into virtual memory
// at the address specified in the ELF segment header.
// You should only load segments with ph->p_type == ELF_PROG_LOAD.
// Each segment's virtual address can be found in ph->p_va
// and its size in memory can be found in ph->p_memsz.
// The ph->p_filesz bytes from the ELF binary, starting at
// 'binary + ph->p_offset', should be copied to virtual address
// ph->p_va. Any remaining memory bytes should be cleared to zero.
// (The ELF header should have ph->p_filesz <= ph->p_memsz.)
// Use functions from the previous lab to allocate and map pages.
//
// All page protection bits should be user read/write for now.
// ELF segments are not necessarily page-aligned, but you can
// assume for this function that no two segments will touch
// the same virtual page.
//
// You may find a function like region_alloc useful.
//
// Loading the segments is much simpler if you can move data
// directly into the virtual addresses stored in the ELF binary.
// So which page directory should be in force during
// this function?
//
// You must also do something with the program's entry point,
// to make sure that the environment starts executing there.
// What? (See env_run() and env_pop_tf() below.)
// LAB 3: Your code here.
struct Elf* env_elf;
struct Proghdr *ph, *eph;
env_elf = (struct Elf*) binary;
if (env_elf->e_magic != ELF_MAGIC)
panic("load_icode: not an ELF binary");
ph = (struct Proghdr*)(binary + env_elf->e_phoff);
eph = ph + env_elf->e_phnum;
// Use the env's page directory so that memcpy/memset can write
// directly to the virtual addresses recorded in the ELF headers.
lcr3(PADDR(e->env_pgdir));
for (; ph < eph; ph++) {
if (ph->p_type == ELF_PROG_LOAD) {
if (ph->p_filesz > ph->p_memsz)
panic("load_icode: p_filesz > p_memsz");
region_alloc(e, (void*)ph->p_va, ph->p_memsz);
memcpy((void*)ph->p_va, (void*)(binary + ph->p_offset), ph->p_filesz);
memset((void*)(ph->p_va + ph->p_filesz), 0, ph->p_memsz-ph->p_filesz);
}
}
// Switch back to the kernel's page directory.
lcr3(PADDR(kern_pgdir));
e->env_tf.tf_eip = env_elf->e_entry;
// Now map one page for the program's initial stack
// at virtual address USTACKTOP - PGSIZE.
region_alloc(e, (void*)(USTACKTOP-PGSIZE), PGSIZE); // user stack
}
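The segment loop can be tried outside the kernel. The sketch below assumes a Linux host with glibc's <elf.h> (Elf32_Ehdr, Elf32_Phdr, PT_LOAD) instead of JOS's struct Elf and struct Proghdr, and uses a flat byte array as a stand-in for the environment's address space, so virtual address v is mem[v]. load_image() is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Load every PT_LOAD segment of a 32-bit ELF image into 'mem'.
 * Returns the entry point, or 0 if the image is not ELF or a segment
 * is malformed or does not fit. Headers are memcpy'd into locals so
 * the image needs no particular alignment. */
static uint32_t load_image(uint8_t *mem, size_t memsize, const uint8_t *binary)
{
    Elf32_Ehdr eh;
    memcpy(&eh, binary, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                               /* bad magic */

    for (int i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        memcpy(&ph, binary + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;
        if (ph.p_filesz > ph.p_memsz ||
            (uint64_t)ph.p_vaddr + ph.p_memsz > memsize)
            return 0;
        /* Copy the file-backed bytes, then zero the rest (.bss). */
        memcpy(mem + ph.p_vaddr, binary + ph.p_offset, ph.p_filesz);
        memset(mem + ph.p_vaddr + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    return eh.e_entry;
}
```

In the kernel the same copy goes through the environment's page tables, which is why load_icode loads env_pgdir into cr3 first.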
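env_create(): with the groundwork done, the environment can be created in two steps: allocate and initialize the Env descriptor (env_alloc), then load the program image into it (load_icode).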
void
env_create(uint8_t *binary, enum EnvType type)
{
// LAB 3: Your code here.
struct Env* new_env;
int r;
if ((r = env_alloc(&new_env, 0)) < 0)
panic("env_create: %e", r);
load_icode(new_env, binary);
new_env->env_type = type;
}
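env_run(): run an environment. Switch the current environment curenv to the target environment e, update the status fields, load e's page directory into cr3, and restore e's saved registers (done in env_pop_tf, whose popal restores the general-purpose registers).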
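env_pop_tf(): points esp at the environment's env_tf, pops the saved general-purpose registers (popal) and es/ds into the corresponding registers, skips tf_trapno and tf_err, and finally executes iret. iret pops EIP, CS and EFLAGS and jumps to CS:EIP; because the return goes to a different privilege level (ring 3), it also pops ESP and SS. After this, all the relevant registers hold the user program's values instead of the kernel's, and EIP holds the program's entry point.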
At this point there is still no way to get from user mode back into kernel mode.
void
env_pop_tf(struct Trapframe *tf)
{
// Restore the registers saved in *tf and return to user mode via iret.
asm volatile(
"\tmovl %0,%%esp\n"
"\tpopal\n"
"\tpopl %%es\n"
"\tpopl %%ds\n"
"\taddl $0x8,%%esp\n" /* skip tf_trapno and tf_errcode */
"\tiret\n"
: : "g" (tf) : "memory");
panic("iret failed"); /* mostly to placate the compiler */
}
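Why env_pop_tf skips exactly 8 bytes follows from the offsets in struct Trapframe. The layout below follows JOS's inc/trap.h, except that tf_eip and tf_esp are declared uint32_t instead of uintptr_t so the 32-bit layout also holds on a 64-bit host; __attribute__((packed)) assumes GCC or Clang. The static assertions check what the assembly relies on: popal consumes the first 32 bytes, popl %es and popl %ds the next 8, addl $0x8 skips tf_trapno and tf_err, and iret starts at tf_eip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct PushRegs {            /* order in which pushal leaves the registers */
    uint32_t reg_edi;
    uint32_t reg_esi;
    uint32_t reg_ebp;
    uint32_t reg_oesp;       /* esp saved by pushal; popal discards it */
    uint32_t reg_ebx;
    uint32_t reg_edx;
    uint32_t reg_ecx;
    uint32_t reg_eax;
} __attribute__((packed));

struct Trapframe {
    struct PushRegs tf_regs;
    uint16_t tf_es;
    uint16_t tf_padding1;
    uint16_t tf_ds;
    uint16_t tf_padding2;
    uint32_t tf_trapno;
    /* below here defined by x86 hardware */
    uint32_t tf_err;
    uint32_t tf_eip;
    uint16_t tf_cs;
    uint16_t tf_padding3;
    uint32_t tf_eflags;
    /* below here only when crossing rings, e.g. user to kernel */
    uint32_t tf_esp;
    uint16_t tf_ss;
    uint16_t tf_padding4;
} __attribute__((packed));

_Static_assert(sizeof(struct PushRegs) == 32, "popal pops 8 registers");
_Static_assert(offsetof(struct Trapframe, tf_es) == 32, "popl %es follows popal");
_Static_assert(offsetof(struct Trapframe, tf_trapno) == 40, "addl $0x8 skips trapno and err");
_Static_assert(offsetof(struct Trapframe, tf_eip) == 48, "iret starts at tf_eip");
_Static_assert(sizeof(struct Trapframe) == 68, "tf_ss is the last word iret pops");
```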
void
env_run(struct Env *e)
{
// Step 1: If this is a context switch (a new environment is running):
// 1. Set the current environment (if any) back to
// ENV_RUNNABLE if it is ENV_RUNNING (think about
// what other states it can be in),
// 2. Set 'curenv' to the new environment,
// 3. Set its status to ENV_RUNNING,
// 4. Update its 'env_runs' counter,
// 5. Use lcr3() to switch to its address space.
// Step 2: Use env_pop_tf() to restore the environment's
// registers and drop into user mode in the
// environment.
// Hint: This function loads the new environment's state from
// e->env_tf. Go back through the code you wrote above
// and make sure you have set the relevant parts of
// e->env_tf to sensible values.
// LAB 3: Your code here.
if (curenv != NULL && curenv->env_status == ENV_RUNNING) {
curenv->env_status = ENV_RUNNABLE;
}
curenv = e;
curenv->env_status = ENV_RUNNING;
curenv->env_runs++;
lcr3(PADDR(curenv->env_pgdir));
env_pop_tf(&curenv->env_tf);
}
Summary:
The call sequence that creates and runs the first environment:
- start (kern/entry.S)
- i386_init (kern/init.c)
  - cons_init
  - mem_init
  - env_init
  - trap_init (still incomplete at this point)
  - env_create
  - env_run
    - env_pop_tf
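Running make qemu at this point makes the machine reboot over and over. The cprintf in hello.c calls sys_cputs(), which tries to switch from user mode to kernel mode with a system call, i.e. the int 0x30 instruction. Since no handler for this interrupt is installed yet, the processor cannot handle the resulting exceptions, ends in a triple fault and resets.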
Exercise 3: Interrupts and exceptions
Read Chapter 9, Exceptions and Interrupts, of the 80386 Programmer's Manual to learn about exceptions and interrupts.
Basics of protected control transfer
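Exceptions and interrupts are both "protected control transfers": they move the processor from user mode to kernel mode (CPL 0) without giving user-mode code any chance to interfere with the kernel or with other environments.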
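Synchronous interrupts (exceptions): raised by the CPU itself as a direct result of executing an instruction, so the point at which they occur is predictable.
Asynchronous interrupts: caused by signals from devices outside the CPU; when they arrive cannot be predicted.
Interrupt: usually generated by an external device, e.g. for an I/O operation.
Exception: generated by the running program, e.g. a divide-by-zero error.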
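How do interrupts and exceptions let the processor enter the kernel safely? JOS relies on two mechanisms.
- Interrupt Descriptor Table (IDT): a table set up by the kernel that gives, for each interrupt or exception vector it handles, the entry point of the handler. The processor can only enter the kernel at these kernel-chosen entry points.
- Task State Segment (TSS): when switching from user mode to kernel mode, the processor must save some state of the interrupted program, such as cs and eip, so that it can resume after the handler returns. This state is saved on a stack, but if user and kernel shared one stack, user code could corrupt kernel data. So the kernel has its own stack, and on an interrupt or exception the processor switches ss and esp to it. The kernel stack's segment and pointer, ss0 and esp0, are stored in the TSS.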
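An example
Suppose a user program executes a division by zero:
1. The processor switches to the kernel stack given by ss0 and esp0 in the TSS, and pushes the old ss, esp, eflags, cs and eip onto it, much like a function call saves its return address.
2. For exceptions that carry an error code (e.g. a page fault), the processor also pushes the error code. A divide error (vector 0) has none.
3. The processor reads IDT entry 0 (the divide-error vector) to get the handler's entry point and loads it into cs:eip.
4. The handler deals with the exception (for example, by destroying the environment) and, where appropriate, returns to user mode.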
Exercise 4: Setting up the Interrupt Descriptor Table (IDT)
Fill in trapentry.S and trap.c to implement trap handling. First, here is how user/hello.c ends up raising an interrupt.
- hello.c calls cprintf
- vcprintf calls sys_cputs
int
vcprintf(const char *fmt, va_list ap)
{
struct printbuf b;
b.idx = 0;
b.cnt = 0;
vprintfmt((void*)putch, &b, fmt, ap);
sys_cputs(b.buf, b.idx);
return b.cnt;
}
- sys_cputs calls syscall
void
sys_cputs(const char *s, size_t len)
{
syscall(SYS_cputs, 0, (uint32_t)s, len, 0, 0, 0);
}
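- syscall executes int %1 (here %1 is the immediate T_SYSCALL). The processor switches to the kernel stack and pushes ss, esp, eflags, cs and eip. The int instruction looks up the handler for vector %1 in the IDT (in hardware) and starts executing it.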
asm volatile("int %1\n"
: "=a" (ret)
: "i" (T_SYSCALL),
"a" (num),
"d" (a1),
"c" (a2),
"b" (a3),
"D" (a4),
"S" (a5)
: "cc", "memory");
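First, define the handlers in trapentry.S with the TRAPHANDLER and TRAPHANDLER_NOEC macros. Use TRAPHANDLER for exceptions for which the CPU pushes an error code, and TRAPHANDLER_NOEC (which pushes a dummy 0 instead) for all others.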
TRAPHANDLER_NOEC(handler0, T_DIVIDE)
TRAPHANDLER_NOEC(handler1, T_DEBUG)
TRAPHANDLER_NOEC(handler2, T_NMI)
TRAPHANDLER_NOEC(handler3, T_BRKPT)
TRAPHANDLER_NOEC(handler4, T_OFLOW)
TRAPHANDLER_NOEC(handler5, T_BOUND)
TRAPHANDLER_NOEC(handler6, T_ILLOP)
TRAPHANDLER_NOEC(handler7, T_DEVICE)
TRAPHANDLER(handler8, T_DBLFLT)
TRAPHANDLER(handler10, T_TSS)
TRAPHANDLER(handler11, T_SEGNP)
TRAPHANDLER(handler12, T_STACK)
TRAPHANDLER(handler13, T_GPFLT)
TRAPHANDLER(handler14, T_PGFLT)
TRAPHANDLER_NOEC(handler16, T_FPERR)
TRAPHANDLER_NOEC(handler48, T_SYSCALL)
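Which macro to use depends only on whether the CPU pushes an error code for that vector. The helper below encodes that rule, as given in the Intel manuals, for vectors 0 to 19 (the range JOS's inc/trap.h defines) plus the system call vector; pushes_error_code() is a hypothetical name, not a JOS function.

```c
#include <assert.h>
#include <stdbool.h>

/* Whether the CPU pushes an error code for exception vector 'vec'.
 * Only #DF (8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14) and
 * #AC (17) do; for the others the handler pushes a dummy 0 so that
 * every Trapframe has the same layout. */
static bool pushes_error_code(int vec)
{
    switch (vec) {
    case 8: case 10: case 11: case 12: case 13: case 14: case 17:
        return true;
    default:
        return false;   /* includes T_SYSCALL (48): int never pushes one */
    }
}
```

This is why T_DEVICE (7) uses TRAPHANDLER_NOEC while T_DBLFLT (8) uses TRAPHANDLER.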
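Every handler ends with jmp _alltraps, which eventually calls trap(tf) in trap.c. So _alltraps must push ds, es and the general-purpose registers so that the stack matches the layout of struct Trapframe, and then pass a pointer to it to trap(tf). trap_dispatch then routes each trap to its handler, much like the mediator pattern.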
_alltraps:
pushl %ds
pushl %es
pushal
movw $GD_KD, %ax
movw %ax, %ds
movw %ax, %es
pushl %esp
call trap
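To check that these pushes really produce a struct Trapframe, the following hypothetical model replays them on an array that grows downward, using tags instead of register values. It assumes a trap from user mode handled by TRAPHANDLER_NOEC, so the CPU also pushes ss and esp; for a trap raised in kernel mode those two words would be absent.

```c
#include <assert.h>
#include <stdint.h>

/* Tags in Trapframe field order (lowest address first). */
enum { F_EDI = 1, F_ESI, F_EBP, F_OESP, F_EBX, F_EDX, F_ECX, F_EAX,
       F_ES, F_DS, F_TRAPNO, F_ERR, F_EIP, F_CS, F_EFLAGS, F_ESP, F_SS };

static uint32_t stack[32];
static int sp = 32;                 /* stack[sp] is the top of the stack */

static void push(uint32_t v) { stack[--sp] = v; }

/* Replay CPU pushes, then TRAPHANDLER_NOEC, then _alltraps. Returns the
 * index that _alltraps passes to trap() as the Trapframe pointer. */
static int build_frame(void)
{
    sp = 32;
    push(F_SS); push(F_ESP); push(F_EFLAGS); push(F_CS); push(F_EIP); /* CPU */
    push(F_ERR);        /* TRAPHANDLER_NOEC: pushl $0 */
    push(F_TRAPNO);     /* pushl $(num) */
    push(F_DS);         /* _alltraps: pushl %ds */
    push(F_ES);         /*            pushl %es */
    /* pushal: eax, ecx, edx, ebx, original esp, ebp, esi, edi */
    push(F_EAX); push(F_ECX); push(F_EDX); push(F_EBX);
    push(F_OESP); push(F_EBP); push(F_ESI); push(F_EDI);
    return sp;          /* value of %esp that pushl %esp hands to trap() */
}
```

Reading upward from the returned index visits the tags in exactly the order of the Trapframe fields, from reg_edi to tf_ss.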
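When an int instruction executes, the processor finds the IDT through the IDTR register and looks up the handler's entry point there. The IDT is set up during kernel initialization.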
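In trap_init(), use the SETGATE macro to connect each IDT entry to its handler in trapentry.S. The assembly handlers must be declared before they can be referenced from C.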
void handler0();
void handler1();
void handler2();
void handler3();
void handler4();
void handler5();
void handler6();
void handler7();
void handler8();
void handler10();
void handler11();
void handler12();
void handler13();
void handler14();
void handler16();
void handler48();
SETGATE(idt[T_DIVIDE], 0, GD_KT, handler0, 0);
SETGATE(idt[T_DEBUG], 0, GD_KT, handler1, 0);
SETGATE(idt[T_NMI], 0, GD_KT, handler2, 0);
// DPL 3 so that user code may raise T_BRKPT with int3
SETGATE(idt[T_BRKPT], 0, GD_KT, handler3, 3);
SETGATE(idt[T_OFLOW], 0, GD_KT, handler4, 0);
SETGATE(idt[T_BOUND], 0, GD_KT, handler5, 0);
SETGATE(idt[T_ILLOP], 0, GD_KT, handler6, 0);
SETGATE(idt[T_DEVICE], 0, GD_KT, handler7, 0);
SETGATE(idt[T_DBLFLT], 0, GD_KT, handler8, 0);
SETGATE(idt[T_TSS], 0, GD_KT, handler10, 0);
SETGATE(idt[T_SEGNP], 0, GD_KT, handler11, 0);
SETGATE(idt[T_STACK], 0, GD_KT, handler12, 0);
SETGATE(idt[T_GPFLT], 0, GD_KT, handler13, 0);
SETGATE(idt[T_PGFLT], 0, GD_KT, handler14, 0);
SETGATE(idt[T_FPERR], 0, GD_KT, handler16, 0);
SETGATE(idt[T_SYSCALL], 0, GD_KT, handler48, 3);
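The last SETGATE argument is what makes int3 and int $0x30 usable from user mode. The sketch below packs a gate into its 8-byte hardware format (offset low/high, selector, type, DPL, present bit), i.e. the fields SETGATE fills in struct Gatedesc, and adds the privilege check the CPU applies to software interrupts. make_gate() and int_allowed() are hypothetical helpers; the type values 0xE and 0xF correspond to STS_IG32 and STS_TG32 in JOS's inc/mmu.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STS_IG32 0xE   /* 32-bit interrupt gate */
#define STS_TG32 0xF   /* 32-bit trap gate */

/* Pack an IDT gate descriptor the way SETGATE(gate, istrap, sel, off, dpl)
 * lays out its fields. */
static uint64_t make_gate(bool istrap, uint16_t sel, uint32_t off, unsigned dpl)
{
    uint64_t type = istrap ? STS_TG32 : STS_IG32;
    uint64_t attr = 0x80 | ((dpl & 3) << 5) | type;   /* P=1, DPL, S=0, type */
    return (uint64_t)(off & 0xffff)
         | ((uint64_t)sel << 16)
         | (attr << 40)
         | ((uint64_t)(off >> 16) << 48);
}

/* A software 'int n' is allowed only if CPL <= gate DPL; otherwise the CPU
 * raises a general protection fault instead. Exceptions raised by the
 * hardware itself are not subject to this check. */
static bool int_allowed(unsigned cpl, unsigned dpl)
{
    return cpl <= dpl;
}
```

With DPL 0, an int3 from user mode (CPL 3) would turn into T_GPFLT, which is why T_BRKPT and T_SYSCALL are installed with DPL 3.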
Part B: Page Faults, Breakpoint Exceptions, and System Calls
The kernel can now catch traps, but it does not actually handle them yet.
Exercise 5: Handling page faults
After saving the interrupted state, trap(tf) calls trap_dispatch to dispatch the trap. Modify trap_dispatch() so that page faults are passed to page_fault_handler().
static void
trap_dispatch(struct Trapframe *tf)
{
// Handle processor exceptions.
// LAB 3: Your code here.
switch(tf->tf_trapno) {
case T_PGFLT:
page_fault_handler(tf);
return;
...
}
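page_fault_handler() reads the faulting address from cr2, while the cause of the fault is in tf->tf_err. A small, hypothetical decoder for that error code is sketched below, using the bit values of FEC_PR, FEC_WR and FEC_U from JOS's inc/mmu.h; describe_pgfault() is not part of JOS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FEC_PR 0x1  /* 1: protection violation, 0: page not present */
#define FEC_WR 0x2  /* 1: caused by a write, 0: by a read */
#define FEC_U  0x4  /* 1: occurred in user mode */

/* Write a one-line description of a page-fault error code into buf. */
static void describe_pgfault(uint32_t err, char *buf, size_t n)
{
    snprintf(buf, n, "%s, %s, %s mode",
             (err & FEC_WR) ? "write" : "read",
             (err & FEC_PR) ? "protection violation" : "page not present",
             (err & FEC_U) ? "user" : "kernel");
}
```

For example, a user program reading an unmapped address produces error code 0x4, "read, page not present, user mode".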
Exercise 6: Breakpoint exceptions
Similar to Exercise 5: after catching T_BRKPT, invoke the kernel monitor.
static void
trap_dispatch(struct Trapframe *tf)
{
// Handle processor exceptions.
// LAB 3: Your code here.
switch(tf->tf_trapno) {
case T_PGFLT:
page_fault_handler(tf);
return;
case T_BRKPT:
monitor(tf);
return;
...
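Exercise 7: System calls
A system call is a deliberate exception that lets a user program ask the kernel to perform an operation on its behalf. The path from cprintf down to the int instruction was described in Exercise 4. Below is the handling once the trap has been caught.
trap_dispatch(): the system call number is in tf_regs.reg_eax, and the arguments are in the edx, ecx, ebx, edi and esi fields of tf_regs (lib/syscall.c loads the arguments into these registers, and _alltraps in trapentry.S saves them into the Trapframe). The return value is written back to reg_eax, which becomes eax when the environment resumes.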
static void
trap_dispatch(struct Trapframe *tf)
{
// Handle processor exceptions.
// LAB 3: Your code here.
switch(tf->tf_trapno) {
case T_PGFLT:
page_fault_handler(tf);
return;
case T_BRKPT:
monitor(tf);
return;
case T_SYSCALL:
tf->tf_regs.reg_eax = syscall(tf->tf_regs.reg_eax,
tf->tf_regs.reg_edx,
tf->tf_regs.reg_ecx,
tf->tf_regs.reg_ebx,
tf->tf_regs.reg_edi,
tf->tf_regs.reg_esi);
return;
}
...
}
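The register convention is easy to get wrong (a5 comes from esi, not edi). The sketch below checks the mapping with a fake syscall() that only records its arguments; fake_syscall(), dispatch_syscall() and last_args are hypothetical, and only the PushRegs field order is taken from JOS.

```c
#include <assert.h>
#include <stdint.h>

struct PushRegs {   /* same field order as JOS inc/trap.h */
    uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
    uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};

/* Hypothetical stand-in for kern/syscall.c's syscall(): record the
 * arguments so the register mapping can be checked. */
static uint32_t last_args[6];
static int32_t fake_syscall(uint32_t no, uint32_t a1, uint32_t a2,
                            uint32_t a3, uint32_t a4, uint32_t a5)
{
    uint32_t args[6] = { no, a1, a2, a3, a4, a5 };
    for (int i = 0; i < 6; i++)
        last_args[i] = args[i];
    return 0;
}

/* The T_SYSCALL case of trap_dispatch(): number in eax, a1..a5 in
 * edx, ecx, ebx, edi, esi; the result goes back into eax. */
static void dispatch_syscall(struct PushRegs *r)
{
    r->reg_eax = (uint32_t)fake_syscall(r->reg_eax, r->reg_edx, r->reg_ecx,
                                        r->reg_ebx, r->reg_edi, r->reg_esi);
}
```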
The actual system call handling is implemented in kern/syscall.c.
int32_t
syscall(uint32_t syscallno, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
// Call the function corresponding to the 'syscallno' parameter.
// Return any appropriate return value.
// LAB 3: Your code here.
// panic("syscall not implemented");
switch (syscallno) {
case SYS_cputs:
sys_cputs((void *)a1, a2);
return 0;
case SYS_cgetc:
return sys_cgetc();
case SYS_getenvid:
return sys_getenvid();
case SYS_env_destroy:
return sys_env_destroy(a1);
default:
return -E_INVAL;
}
}
Running make run-hello now prints "hello, world", but the second cprintf causes a page fault.
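Exercise 8: Fixing the page fault in hello.c
The page fault happens because thisenv has not been initialized. hello.c runs via lib/entry.S -> lib/libmain.c -> user/hello.c. lib/entry.S defines envs, pages, uvpt and uvpd as symbols at fixed user-visible addresses, so libmain can use them directly.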
.globl envs
.set envs, UENVS
.globl pages
.set pages, UPAGES
.globl uvpt
.set uvpt, UVPT
.globl uvpd
.set uvpd, (UVPT+(UVPT>>12)*4)
void
libmain(int argc, char **argv)
{
// set thisenv to point at our Env structure in envs[].
// LAB 3: Your code here.
thisenv = &envs[ENVX(sys_getenvid())];
// save the name of the program so that panic() can use it
if (argc > 0)
binaryname = argv[0];
// call user main routine
umain(argc, argv);
// exit gracefully
exit();
}
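Why envs[ENVX(sys_getenvid())] finds the current Env: an env id keeps the slot index in its low LOG2NENV bits. The sketch below reproduces, as far as I understand it, the id scheme of env_alloc() in kern/env.c (LOG2NENV = 10, ENVGENSHIFT = 12); next_envid() is a hypothetical wrapper around that logic, and it uses unsigned arithmetic where JOS uses int32_t.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t envid_t;

#define LOG2NENV    10
#define NENV        (1 << LOG2NENV)
#define ENVX(envid) ((envid) & (NENV - 1))
#define ENVGENSHIFT 12   /* must be >= LOG2NENV */

/* New id for slot 'idx', given the id the slot had before (0 if never
 * used): a generation counter in the upper bits, the index below. */
static envid_t next_envid(envid_t old_id, int idx)
{
    uint32_t generation = ((uint32_t)old_id + (1u << ENVGENSHIFT))
                          & ~(uint32_t)(NENV - 1);
    if ((int32_t)generation <= 0)   /* never hand out a non-positive id */
        generation = 1u << ENVGENSHIFT;
    return (envid_t)(generation | (uint32_t)idx);
}
```

With these constants the very first environment gets id 0x1000, and ENVX maps it back to envs[0]; reusing the slot later yields 0x2000, which still maps to envs[0].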
Exercise 9: Page faults and memory protection
Work in progress.
References:
https://github.com/shishujuan/mit6.828-2017/blob/master/docs/lab3-exercize.md
https://www.jianshu.com/p/3d3d79abd5d1
I also consulted many other articles but did not record their addresses at the time.