这个 lab 分为三个部分:
操作系统必须知道自己还可以用哪部分的 physical RAM,这点很重要。虚拟内存的出现是为了解决计算机内存中的几大问题。
原文:https://blog.csdn.net/do2jiang/article/details/4690967
每个进程都有自己的虚拟内存空间,做到隔离,只可以访问自己那一部分。而操作系统会记录空闲的物理地址,写时拷贝的效果可以不浪费空间,用到的时候再分配空间。程序员开发的时候也不必关心自己写的程序被映射到了哪段物理地址,只用关心自己虚拟地址就好。地址也确定。
Exercise 1:
static void *
boot_alloc(uint32_t n)
{
static char *nextfree; // virtual address of next byte of free memory
char *result;
// Initialize nextfree if this is the first time.
// 'end' is a magic symbol automatically generated by the linker,
// which points to the end of the kernel's bss segment:
// the first virtual address that the linker did *not* assign
// to any kernel code or global variables.
if (!nextfree) {
extern char end[]; //在/kern/kernel.ld中定义的符号,位于bss段的末尾
nextfree = ROUNDUP((char *) end, PGSIZE);
}
// Allocate a chunk large enough to hold 'n' bytes, then update
// nextfree. Make sure nextfree is kept aligned
// to a multiple of PGSIZE.
//
// LAB 2: Your code here.
result = nextfree;
nextfree = ROUNDUP((char *)result + n, PGSIZE);
cprintf("boot_alloc memory at %x, next memory allocate at %x\n", result, nextfree);
return result;
}
linker 会产生一个 symbol 叫做 end,位于 bss 段的末尾,我们就能找到最开始可以用的 free memory 的地址,然后根据输入的参数确定下一个 free page 的物理地址。这个函数并没有实际占用到 memory 只是会 update nextfree 这个指针,注意这个 static 的作用。
mem_init()调用boot_alloc(),将返回值赋给全局变量kern_pgdir,kern_pgdir保存的是内核页目录的物理地址。
接着根据mem_init()中的注释:
kern_pgdir = (pde_t *) boot_alloc(PGSIZE); //分配一个页的空间保存页目录表
// Allocate an array of npages 'struct PageInfo's and store it in 'pages'.
// The kernel uses this array to keep track of physical pages: for
// each physical page, there is a corresponding struct PageInfo in this
// array. 'npages' is the number of physical pages in memory. Use memset
// to initialize all fields of each struct PageInfo to 0.
pages = (struct PageInfo*)boot_alloc(sizeof(struct PageInfo) * npages);
memset(pages, 0, sizeof(struct PageInfo) * npages);
这边要分配 npages 这么多个 pages,npages,这个值是一个全局变量具体是由自带的 i386_detect_memory(),这个函数检测出来的。
void
page_init(void)
{
// The example code here marks all physical pages as free.
// However this is not truly the case. What memory is free?
// 1) Mark physical page 0 as in use.
// This way we preserve the real-mode IDT and BIOS structures
// in case we ever need them. (Currently we don't, but...)
// 2) The rest of base memory, [PGSIZE, npages_basemem * PGSIZE)
// is free.
// 3) Then comes the IO hole [IOPHYSMEM, EXTPHYSMEM), which must
// never be allocated.
// 4) Then extended memory [EXTPHYSMEM, ...).
// Some of it is in use, some is free. Where is the kernel
// in physical memory? Which pages are already in use for
// page tables and other data structures?
//
// Change the code to reflect this.
// NB: DO NOT actually touch the physical memory corresponding to
// free pages!
// 这里初始化pages中的每一项,建立page_free_list链表
// 已使用的物理页包括如下几部分:
// 1)第一个物理页是IDT所在,需要标识为已用
// 2)[IOPHYSMEM, EXTPHYSMEM)称为IO hole的区域,需要标识为已用。
// 3)EXTPHYSMEM是内核加载的起始位置,终止位置可以由boot_alloc(0)给出(理由是boot_alloc()分配的内存是内核的最尾部),这块区域也要标识
size_t i;
size_t io_hole_start_page = (size_t)IOPHYSMEM / PGSIZE;
size_t kernel_end_page = PADDR(boot_alloc(0)) / PGSIZE; //这里调了半天,boot_alloc返回的是虚拟地址,需要转为物理地址
for (i = 0; i < npages; i++) {
if (i == 0) {
pages[i].pp_ref = 1;
pages[i].pp_link = NULL;
} else if (i >= io_hole_start_page && i < kernel_end_page) {
pages[i].pp_ref = 1;
pages[i].pp_link = NULL;
} else {
pages[i].pp_ref = 0;
pages[i].pp_link = page_free_list;
page_free_list = &pages[i];
}
}
}
page 的 structure 如下所示。这里要理解的是 pages,free_page_list,无非就是指针,可以看做是链表,记录 page 有没有被使用,仅此而已,这串链表占用的空间不大,实际的 page 之后会分配。
struct PageInfo {
// Next page on the free list.
struct PageInfo *pp_link;
// pp_ref is the count of pointers (usually in page table entries)
// to this page, for pages allocated using page_alloc.
// Pages allocated at boot time using pmap.c's
// boot_alloc do not have valid reference count fields.
uint16_t pp_ref;
};
// Allocates a physical page. If (alloc_flags & ALLOC_ZERO), fills the entire
// returned physical page with '\0' bytes. Does NOT increment the reference
// count of the page - the caller must do these if necessary (either explicitly
// or via page_insert).
//
// Be sure to set the pp_link field of the allocated page to NULL so
// page_free can check for double-free bugs.
//
// Returns NULL if out of free memory.
//
// Hint: use page2kva and memset
struct PageInfo *
page_alloc(int alloc_flags)
{
struct PageInfo *ret = page_free_list;
if (ret == NULL) {
cprintf("page_alloc: out of free memory\n");
return NULL;
}
page_free_list = ret->pp_link;
ret->pp_link = NULL;
if (alloc_flags & ALLOC_ZERO) {
memset(page2kva(ret), 0, PGSIZE);
}
return ret;
}
看代码很容易理解把作用是把当前的free_page占用,更新整个链表。
page2kva(ret) page to kernel virtual address。就是把记录当前这个 page 的 address 转换成 kernel 的虚拟地址,然后释放整个 page 的 内存。
page2pa(struct PageInfo *pp)
{
return (pp - pages) << PGSHIFT;
}
这个是指针的加减法 pp - pages 实际的结果应该是 pp - pages 的绝对结果除以 sizeof(struct PageInfo),也就是第几个 page 然后左移一个page的size出来。
// Return a page to the free list.
// (This function should only be called when pp->pp_ref reaches 0.)
//
void
page_free(struct PageInfo *pp)
{
// Fill this function in
// Hint: You may want to panic if pp->pp_ref is nonzero or
// pp->pp_link is not NULL.
if (pp->pp_ref != 0 || pp->pp_link != NULL) {
panic("page_free: pp->pp_ref is nonzero or pp->pp_link is not NULL\n");
}
pp->pp_link = page_free_list;
page_free_list = pp;
}
Exercise 2 Virtual Memory.
这部分主要的目的是实现一些函数操作页目录和页表从而达到实现虚拟地址到物理地址映射的目的。
JOS内核有时候在仅知道物理地址的情况下,想要访问该物理地址,但是没有办法绕过MMU的线性地址转换机制,所以没有办法用物理地址直接访问。JOS将虚拟地址0xf0000000映射到物理地址0x0处的一个原因就是希望能有一个简便的方式实现物理地址和线性地址的转换。在知道物理地址pa的情况下可以加0xf0000000得到对应的线性地址,可以用KADDR(pa)宏实现。在知道线性地址va的情况下减0xf0000000可以得到物理地址,可以用宏PADDR(va)实现。
Exercise 4
// Given 'pgdir', a pointer to a page directory, pgdir_walk returns
// a pointer to the page table entry (PTE) for linear address 'va'.
// This requires walking the two-level page table structure.
//
// The relevant page table page might not exist yet.
// If this is true, and create == false, then pgdir_walk returns NULL.
// Otherwise, pgdir_walk allocates a new page table page with page_alloc.
// - If the allocation fails, pgdir_walk returns NULL.
// - Otherwise, the new page's reference count is incremented,
// the page is cleared,
// and pgdir_walk returns a pointer into the new page table page.
//
// Hint 1: you can turn a PageInfo * into the physical address of the
// page it refers to with page2pa() from kern/pmap.h.
//
// Hint 2: the x86 MMU checks permission bits in both the page directory
// and the page table, so it's safe to leave permissions in the page
// directory more permissive than strictly necessary.
//
// Hint 3: look at inc/mmu.h for useful macros that manipulate page
// table and page directory entries.
//
pte_t *
pgdir_walk(pde_t *pgdir, const void *va, int create)
{
// Fill this function in
pde_t* pde_ptr = pgdir + PDX(va);
if (!(*pde_ptr & PTE_P)) { //页表还没有分配
if (create) {
//分配一个页作为页表
struct PageInfo *pp = page_alloc(1);
if (pp == NULL) {
return NULL;
}
pp->pp_ref++;
*pde_ptr = (page2pa(pp)) | PTE_P | PTE_U | PTE_W; //更新页目录
} else {
return NULL;
}
}
return (pte_t *)KADDR(PTE_ADDR(*pde_ptr)) + PTX(va);
}
pgdir 最早的时候已经在 linker 的 end 后面分配了一个 page 去装这个 page table。virtual address PDX (高10位转换来的 page directory entry index) PTX(高10位后面10位转换来的 page table entry index)
boot_map_region()
参数:
pgdir:页目录指针
va:虚拟地址
size:大小
pa:物理地址
perm:权限
作用:通过修改pgdir指向的树,将[va, va+size)对应的虚拟地址空间映射到物理地址空间[pa, pa+size)。va和pa都是页对齐的。
// Map [va, va+size) of virtual address space to physical [pa, pa+size)
// in the page table rooted at pgdir. Size is a multiple of PGSIZE, and
// va and pa are both page-aligned.
// Use permission bits perm|PTE_P for the entries.
//
// This function is only intended to set up the ``static'' mappings
// above UTOP. As such, it should *not* change the pp_ref field on the
// mapped pages.
//
// Hint: the TA solution uses pgdir_walk
static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
// Fill this function in
size_t pgs = size / PGSIZE;
if (size % PGSIZE != 0) {
pgs++;
} //计算总共有多少页
for (int i = 0; i < pgs; i++) {
pte_t *pte = pgdir_walk(pgdir, (void *)va, 1);//获取va对应的PTE的地址
if (pte == NULL) {
panic("boot_map_region(): out of memory\n");
}
*pte = pa | PTE_P | perm; //修改va对应的PTE的值
pa += PGSIZE; //更新pa和va,进行下一轮循环
va += PGSIZE;
}
}
page_insert()
参数:
pgdir:页目录指针
pp:PageInfo结构指针,代表一个物理页
va:线性地址
perm:权限
返回值:0代表成功,-E_NO_MEM代表物理空间不足。
作用:修改pgdir对应的树结构,使va映射到pp对应的物理页处。
// Map the physical page 'pp' at virtual address 'va'.
// The permissions (the low 12 bits) of the page table entry
// should be set to 'perm|PTE_P'.
//
// Requirements
// - If there is already a page mapped at 'va', it should be page_remove()d.
// - If necessary, on demand, a page table should be allocated and inserted
// into 'pgdir'.
// - pp->pp_ref should be incremented if the insertion succeeds.
// - The TLB must be invalidated if a page was formerly present at 'va'.
//
// Corner-case hint: Make sure to consider what happens when the same
// pp is re-inserted at the same virtual address in the same pgdir.
// However, try not to distinguish this case in your code, as this
// frequently leads to subtle bugs; there's an elegant way to handle
// everything in one code path.
//
// RETURNS:
// 0 on success
// -E_NO_MEM, if page table couldn't be allocated
//
// Hint: The TA solution is implemented using pgdir_walk, page_remove,
// and page2pa.
//
int
page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
// Fill this function in
pte_t *pte = pgdir_walk(pgdir, va, 1); //拿到va对应的PTE地址,如果va对应的页表还没有分配,则分配一个物理页作为页表
if (pte == NULL) {
return -E_NO_MEM;
}
pp->pp_ref++; //引用加1
if ((*pte) & PTE_P) { //当前虚拟地址va已经被映射过,需要先释放
page_remove(pgdir, va); //这个函数目前还没实现
}
physaddr_t pa = page2pa(pp); //将PageInfo结构转换为对应物理页的首地址
*pte = pa | perm | PTE_P; //修改PTE
pgdir[PDX(va)] |= perm;
return 0;
}
如果将page_insert()函数看作是“增”,那么page_lookup()就是“查”。查找一个虚拟地址va,对应的物理地址信息。
//
// Return the page mapped at virtual address 'va'.
// If pte_store is not zero, then we store in it the address
// of the pte for this page. This is used by page_remove and
// can be used to verify page permissions for syscall arguments,
// but should not be used by most callers.
//
// Return NULL if there is no page mapped at va.
//
// Hint: the TA solution uses pgdir_walk and pa2page.
//
struct PageInfo *
page_lookup(pde_t *pgdir, void *va, pte_t **pte_store)
{
// Fill this function in
struct PageInfo *pp;
pte_t *pte = pgdir_walk(pgdir, va, 0); //如果对应的页表不存在,不进行创建
if (pte == NULL) {
return NULL;
}
if (!(*pte) & PTE_P) {
return NULL;
}
physaddr_t pa = PTE_ADDR(*pte); //va对应的物理
pp = pa2page(pa); //物理地址对应的PageInfo结构地址
if (pte_store != NULL) {
*pte_store = pte;
}
return pp;
}
page_remve()
参数:
pgdir:页目录地址
va:虚拟地址
作用:修改pgdir指向的树结构,解除va的映射关系。
// Unmaps the physical page at virtual address 'va'.
// If there is no physical page at that address, silently does nothing.
//
// Details:
// - The ref count on the physical page should decrement.
// - The physical page should be freed if the refcount reaches 0.
// - The pg table entry corresponding to 'va' should be set to 0.
// (if such a PTE exists)
// - The TLB must be invalidated if you remove an entry from
// the page table.
//
// Hint: The TA solution is implemented using page_lookup,
// tlb_invalidate, and page_decref.
//
void
page_remove(pde_t *pgdir, void *va)
{
// Fill this function in
pte_t *pte_store;
struct PageInfo *pp = page_lookup(pgdir, va, &pte_store); //获取va对应的PTE的地址以及pp结构
if (pp == NULL) { //va可能还没有映射,那就什么都不用做
return;
}
page_decref(pp); //将pp->pp_ref减1,如果pp->pp_ref为0,需要释放该PageInfo结构(将其放入page_free_list链表中)
*pte_store = 0; //将PTE清空
tlb_invalidate(pgdir, va); //失效化TLB缓存
}
Part 3: Kernel Address Space
JOS将线性地址空间分为两部分,由定义在inc/memlayout.h中的ULIM分割。ULIM以上的部分用户没有权限访问,内核有读写权限。
/*
* Virtual memory map: Permissions
* kernel/user
*
* 4 Gig --------> +------------------------------+
* | | RW/--
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* : . :
* : . :
* : . :
* |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
* | | RW/--
* | Remapped Physical Memory | RW/--
* | | RW/--
* KERNBASE, ----> +------------------------------+ 0xf0000000 --+
* KSTACKTOP | CPU0's Kernel Stack | RW/-- KSTKSIZE |
* | - - - - - - - - - - - - - - -| |
* | Invalid Memory (*) | --/-- KSTKGAP |
* +------------------------------+ |
* | CPU1's Kernel Stack | RW/-- KSTKSIZE |
* | - - - - - - - - - - - - - - -| PTSIZE
* | Invalid Memory (*) | --/-- KSTKGAP |
* +------------------------------+ |
* : . : |
* : . : |
* MMIOLIM ------> +------------------------------+ 0xefc00000 --+
* | Memory-mapped I/O | RW/-- PTSIZE
* ULIM, MMIOBASE --> +------------------------------+ 0xef800000
* | Cur. Page Table (User R-) | R-/R- PTSIZE
* UVPT ----> +------------------------------+ 0xef400000
* | RO PAGES | R-/R- PTSIZE
* UPAGES ----> +------------------------------+ 0xef000000
* | RO ENVS | R-/R- PTSIZE
* UTOP,UENVS ------> +------------------------------+ 0xeec00000
* UXSTACKTOP -/ | User Exception Stack | RW/RW PGSIZE
* +------------------------------+ 0xeebff000
* | Empty Memory (*) | --/-- PGSIZE
* USTACKTOP ---> +------------------------------+ 0xeebfe000
* | Normal User Stack | RW/RW PGSIZE
* +------------------------------+ 0xeebfd000
* | |
* | |
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* . .
* . .
* . .
* |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
* | Program Data & Heap |
* UTEXT --------> +------------------------------+ 0x00800000
* PFTEMP -------> | Empty Memory (*) | PTSIZE
* | |
* UTEMP --------> +------------------------------+ 0x00400000 --+
* | Empty Memory (*) | |
* | - - - - - - - - - - - - - - -| |
* | User STAB Data (optional) | PTSIZE
* USTABDATA ----> +------------------------------+ 0x00200000 |
* | Empty Memory (*) | |
* 0 ------------> +------------------------------+ --+
*
* (*) Note: The kernel ensures that "Invalid Memory" is *never* mapped.
* "Empty Memory" is normally unmapped, but user programs may map pages
* there if desired. JOS user programs map pages temporarily at UTEMP.
*/
Exercise 5
该实验需要我们填充mem_init()中缺失的代码,使用part2的增删改函数初始化内核线性地址空间。
在mem_init()函数的check_page();后添加如下语句:
//
// Map 'pages' read-only by the user at linear address UPAGES
// Permissions:
// - the new image at UPAGES -- kernel R, user R
// (ie. perm = PTE_U | PTE_P)
// - pages itself -- kernel RW, user NONE
// Your code goes here:
// 将虚拟地址的UPAGES映射到物理地址pages数组开始的位置
boot_map_region(kern_pgdir, UPAGES, PTSIZE, PADDR(pages), PTE_U);
//
// Use the physical memory that 'bootstack' refers to as the kernel
// stack. The kernel stack grows down from virtual address KSTACKTOP.
// We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
// to be the kernel stack, but break this into two pieces:
// * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
// * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
// the kernel overflows its stack, it will fault rather than
// overwrite memory. Known as a "guard page".
// Permissions: kernel RW, user NONE
// Your code goes here:
// 'bootstack'定义在/kernel/entry.
boot_map_region(kern_pgdir, KSTACKTOP-KSTKSIZE, KSTKSIZE, PADDR(bootstack), PTE_W);
//
// Map all of physical memory at KERNBASE.
// Ie. the VA range [KERNBASE, 2^32) should map to
// the PA range [0, 2^32 - KERNBASE)
// We might not have 2^32 - KERNBASE bytes of physical memory, but
// we just set up the mapping anyway.
// Permissions: kernel RW, user NONE
// Your code goes here:
boot_map_region(kern_pgdir, KERNBASE, 0xffffffff - KERNBASE, 0, PTE_W);