MIT6.828 Fall 2012 Lab 1: Booting a PC

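First, a sigh: it has been a long time since I last wrote a blog post. I sat down to write several times, but each time I ran out of steam partway through. I started this blog to practice my written communication, so here is a word of encouragement to myself: even when writing feels like a slog, I just have to grit my teeth and keep going.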


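As for the 6.828 labs: this is only the first one, and I can already tell how much ground it covers. Understanding it takes reading material from many different areas. Fortunately, the lab handout lists the references needed for each part.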


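Environment setup:

My machine runs Ubuntu 12.04 LTS with gcc 4.6.3. To clone the source code and install QEMU, just follow the steps at http://pdos.csail.mit.edu/6.828/2012/labs/lab1/ one by one. One tip: pick the most recent offering of the course (currently Fall 2012). Because of gcc version differences and similar issues, the Bochs emulator and the source code used by much older offerings may fail to install or compile on Ubuntu 12.04. Also, the following command takes quite a while to finish, so be patient.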

git clone http://pdos.csail.mit.edu/6.828/2012/jos.git lab

Part 1: PC Bootstrap

Getting Started with x86 assembly

Exercise 1. Familiarize yourself with the assembly language materials available on the 6.828 reference page. You don't have to read them now, but you'll almost certainly want to refer to some of this material when reading and writing x86 assembly.

We do recommend reading the section "The Syntax" in Brennan's Guide to Inline Assembly. It gives a good (and quite brief) description of the AT&T assembly syntax we'll be using with the GNU assembler in JOS.


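Exercise 1 is mainly an introduction to x86 assembly. I had learned some assembly before, but with NASM, which uses Intel syntax, whereas 6.828 mostly uses AT&T syntax. The differences between the two are summarized in Brennan's Guide to Inline Assembly, mentioned in Exercise 1. For reading assembly, the main difference to remember is operand order: in Intel syntax the destination (address or register) is on the left and the source is on the right, while AT&T syntax is exactly the opposite. The following example is enough to keep it straight: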

 load ebx with the value in eax:
AT&T:  movl %eax, %ebx
Intel: mov ebx, eax

Simulating the x86

(Skipped.)

The PC's Physical Address Space

+------------------+  <- 0xFFFFFFFF (4GB)
|      32-bit      |
|  memory mapped   |
|     devices      |
|                  |
/\/\/\/\/\/\/\/\/\/\

/\/\/\/\/\/\/\/\/\/\
|                  |
|      Unused      |
|                  |
+------------------+  <- depends on amount of RAM
|                  |
|                  |
| Extended Memory  |
|                  |
|                  |
+------------------+  <- 0x00100000 (1MB)
|     BIOS ROM     |
+------------------+  <- 0x000F0000 (960KB)
|  16-bit devices, |
|  expansion ROMs  |
+------------------+  <- 0x000C0000 (768KB)
|   VGA Display    |
+------------------+  <- 0x000A0000 (640KB)
|                  |
|    Low Memory    |
|                  |
+------------------+  <- 0x00000000

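The diagram above shows the overall layout of a modern PC's physical address space. As it shows, RAM is split into two parts: low memory at 0x00000000-0x000A0000, and extended memory from 0x00100000 up to however much RAM is installed. The reason is that the earlier 16-bit processors could address at most 1MB (using segmented addressing), and the physical address space of that era looked like the 0x00000000-0x00100000 (1MB) range in the diagram. Later, when processors grew wider and could address more memory, the layout of the low 1MB was kept for backward compatibility, which produced the address space we have today.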
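
To make the boundaries in the diagram concrete, here is a minimal sketch that maps a physical address to the region it falls in. The struct, the table, and the function name `find_region` are my own illustration and not part of JOS; the boundaries are the ones in the diagram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct phys_region {
	uint32_t start;     /* first physical address of the region */
	const char *name;   /* label copied from the diagram */
};

/* Sorted by start address; each region runs up to the next entry's start. */
static const struct phys_region region_map[] = {
	{ 0x00000000u, "Low Memory" },
	{ 0x000A0000u, "VGA Display" },
	{ 0x000C0000u, "16-bit devices, expansion ROMs" },
	{ 0x000F0000u, "BIOS ROM" },
	{ 0x00100000u, "above 1MB (extended memory, unused, 32-bit devices)" },
};

/* Return the region whose range contains physical address pa. */
const struct phys_region *find_region(uint32_t pa)
{
	size_t i = sizeof(region_map) / sizeof(region_map[0]);

	/* Scan from the highest region down; entry 0 starts at address 0,
	 * so some entry always matches. */
	while (i-- > 0) {
		if (pa >= region_map[i].start)
			return &region_map[i];
	}
	assert(!"region_map must start at address 0");
	return NULL;
}
```

For example, `find_region(0x7C00)` lands in "Low Memory" and `find_region(0xFFFF0)` lands in "BIOS ROM", which matches where the boot sector and the first BIOS instruction live.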


The ROM BIOS

Exercise 2. Use GDB's si (Step Instruction) command to trace into the ROM BIOS for a few more instructions, and try to guess what it might be doing. You might want to look at Phil Storrs I/O Ports Description, as well as other materials on the 6.828 reference materials page. No need to figure out all the details - just the general idea of what the BIOS is doing first.


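I used gdb to trace the first few BIOS instructions. They do some setup related to interrupts and NMI. 0x70 and 0x71 are port I/O addresses belonging to the CMOS, and they appear to be involved in NMI control: the top bit of the value written to port 0x70 enables or disables NMI (http://wiki.osdev.org/CMOS). %cr0 is a control register; it mainly governs things like switching the addressing mode (enabling protected mode) and turning on paging (http://en.wikipedia.org/wiki/Control_register). I did not look at the instructions beyond that.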

Part 2: The Boot Loader

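After power-on, an 80x86 CPU automatically starts in real mode and begins executing the BIOS code at 0xFFFF0. The BIOS runs its system checks and initializes the interrupt vector table starting at physical address 0. It then reads the first sector of the bootable device (the boot sector, 512 bytes) into memory at absolute address 0x7C00 and jumps to 0x7C00.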
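
Both addresses in this paragraph are real-mode addresses, formed as segment * 16 + offset. A minimal sketch of that calculation follows; the function name `real_mode_phys` is mine, and the sketch does not model the A20 wrap-around of the original 8086.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
	uint32_t pa = ((uint32_t) segment << 4) + offset;

	/* The largest possible result is 0xFFFF0 + 0xFFFF. */
	assert(pa <= 0x10FFEFu);
	return pa;
}
```

With the reset values CS:IP = 0xF000:0xFFF0 this gives 0xFFFF0, and 0x0000:0x7C00 gives 0x7C00. The same physical address can be written with different segment:offset pairs, for instance 0x07C0:0x0000.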

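JOS's boot.S (boot/boot.S) is written in assembly and is what the BIOS loads at absolute address 0x7C00. Its main job is to switch the CPU from real mode to protected mode. Once in protected mode, the CPU can access addresses above 1MB.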

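boot.S then jumps to bootmain() (boot/main.c), which is written in C. The kernel image sits on disk starting at sector 1. bootmain first reads the beginning of that image (one page, which contains the ELF header) into memory at 0x10000, and then, following the ELF program headers, reads each segment to its specified load address. When loading is finished, it jumps to the kernel's entry point.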
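
The "read the header, then load each segment" flow can be sketched in plain C against an in-memory image. Everything here is a simplified assumption for illustration: `mini_elf`, `mini_proghdr`, and `load_image` are hypothetical cut-down stand-ins, not the real ELF structures and not the code in boot/main.c. The sketch also zero-fills the part of a segment beyond its file size, which is the generic ELF convention and not necessarily what the JOS loader does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC 0x464C457FU   /* "\x7FELF" read as a little-endian word */

/* Hypothetical cut-down headers: only the fields the loader loop needs. */
struct mini_elf {
	uint32_t magic;    /* must equal ELF_MAGIC */
	uint32_t entry;    /* address to jump to after loading */
	uint32_t phoff;    /* byte offset of the program header table */
	uint32_t phnum;    /* number of program headers */
};

struct mini_proghdr {
	uint32_t offset;   /* where the segment starts inside the image */
	uint32_t pa;       /* load address (an index into the fake memory) */
	uint32_t filesz;   /* bytes present in the image */
	uint32_t memsz;    /* bytes occupied in memory, >= filesz */
};

/* Copy every segment of image into mem and store the entry point.
 * Returns 0 on success, -1 if the image is malformed. */
int load_image(const uint8_t *image, size_t image_len,
               uint8_t *mem, size_t mem_len, uint32_t *entry)
{
	struct mini_elf hdr;
	uint32_t i;

	assert(image != NULL && mem != NULL && entry != NULL);
	if (image_len < sizeof(hdr))
		return -1;
	memcpy(&hdr, image, sizeof(hdr));
	if (hdr.magic != ELF_MAGIC)
		return -1;

	for (i = 0; i < hdr.phnum; i++) {
		struct mini_proghdr ph;
		uint64_t at = (uint64_t) hdr.phoff + (uint64_t) i * sizeof(ph);

		if (at + sizeof(ph) > image_len)
			return -1;
		memcpy(&ph, image + at, sizeof(ph));
		if (ph.filesz > ph.memsz ||
		    (uint64_t) ph.offset + ph.filesz > image_len ||
		    (uint64_t) ph.pa + ph.memsz > mem_len)
			return -1;
		/* Bytes stored in the image, then zero fill up to memsz. */
		memcpy(mem + ph.pa, image + ph.offset, ph.filesz);
		memset(mem + ph.pa + ph.filesz, 0, ph.memsz - ph.filesz);
	}
	*entry = hdr.entry;
	return 0;
}
```

In the real boot loader the "image" is the disk and the copy is a sector read, but the control flow is the same: validate the magic number, walk the program headers, place each segment, then jump to the entry address.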

Part 3: The Kernel

Exercise 8. We have omitted a small fragment of code - the code necessary to print octal numbers using patterns of the form "%o". Find and fill in this code fragment.


		// (unsigned) octal
		case 'o':
			num = getuint(&ap, lflag);
			base = 8;
			goto number;

		// pointer
		case 'p':
			putch('0', putdat);
			putch('x', putdat);
			num = (unsigned long long)
				(uintptr_t) va_arg(ap, void *);
			base = 16;
			goto number;
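
The `%o` case only fetches the argument, sets base = 8, and jumps to the shared number-printing code, so all the real work is done by a routine that is parameterized on the base. The following is a standalone sketch of such a routine. The name `format_unsigned` and its buffer-based interface are my own, not the kernel's printing code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Write num in the given base (2..16) into buf as a NUL-terminated string.
 * Returns the number of digits written, or 0 if buf is too small. */
size_t format_unsigned(unsigned long long num, unsigned base,
                       char *buf, size_t buflen)
{
	static const char digits[] = "0123456789abcdef";
	char tmp[sizeof(unsigned long long) * CHAR_BIT];  /* worst case: base 2 */
	size_t n = 0;
	size_t i;

	assert(base >= 2 && base <= 16);

	/* Peel off the least significant digit first. */
	do {
		tmp[n++] = digits[num % base];
		num /= base;
	} while (num != 0);

	if (n + 1 > buflen)
		return 0;

	/* The digits were produced in reverse order, so flip them. */
	for (i = 0; i < n; i++)
		buf[i] = tmp[n - 1 - i];
	buf[n] = '\0';
	return n;
}
```

Calling it with base 8 turns 8 into "10" and 511 into "777"; the same function with base 16 covers the `%x` and `%p` cases.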


Exercise 9. Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which "end" of this reserved area is the stack pointer initialized to point to?


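The kernel initializes its stack in entry.S by loading the address of bootstacktop into %esp. The space itself is reserved in the data section of the same file, between the bootstack and bootstacktop labels. Because the x86 stack grows downward, the stack pointer starts at the high-address end of the reserved area: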

relocated:

	# Clear the frame pointer register (EBP)
	# so that once we get into debugging C code,
	# stack backtraces will be terminated properly.
	movl	$0x0,%ebp			# nuke frame pointer

	# Set the stack pointer
	movl	$(bootstacktop),%esp

	# now to C code
	call	i386_init

Exercise 11. Implement the backtrace function as specified above. Use the same format as in the example, since otherwise the grading script will be confused. When you think you have it working right, run make grade to see if its output conforms to what our grading script expects, and fix it if it doesn't. After you have handed in your Lab 1 code, you are welcome to change the output format of the backtrace function any way you like.


Exercise 12. Modify your stack backtrace function to display, for each eip, the function name, source file name, and line number corresponding to that eip.

In debuginfo_eip, where do __STAB_* come from? This question has a long answer; to help you to discover the answer, here are some things you might want to do:

  • look in the file kern/kernel.ld for __STAB_*
  • run i386-jos-elf-objdump -h obj/kern/kernel
  • run i386-jos-elf-objdump -G obj/kern/kernel
  • run i386-jos-elf-gcc -pipe -nostdinc -O2 -fno-builtin -I. -MD -Wall -Wno-format -DJOS_KERNEL -gstabs -c -S kern/init.c, and look at init.s.
  • see if the bootloader loads the symbol table in memory as part of loading the kernel binary

Complete the implementation of debuginfo_eip by inserting the call to stab_binsearch to find the line number for an address.

Add a backtrace command to the kernel monitor, and extend your implementation of mon_backtrace to call debuginfo_eip and print a line for each stack frame of the form:

K> backtrace
Stack backtrace:
  ebp f010ff78  eip f01008ae  args 00000001 f010ff8c 00000000 f0110580 00000000
         kern/monitor.c:143: monitor+106
  ebp f010ffd8  eip f0100193  args 00000000 00001aac 00000660 00000000 00000000
         kern/init.c:49: i386_init+59
  ebp f010fff8  eip f010003d  args 00000000 00000000 0000ffff 10cf9a00 0000ffff
         kern/entry.S:70: <unknown>+0
K> 

Each line gives the file name and line within that file of the stack frame's eip, followed by the name of the function and the offset of the eip from the first instruction of the function (e.g., monitor+106 means the return eip is 106 bytes past the beginning of monitor).

Be sure to print the file and function names on a separate line, to avoid confusing the grading script.

Tip: printf format strings provide an easy, albeit obscure, way to print non-null-terminated strings like those in STABS tables. printf("%.*s", length, string) prints at most length characters of string. Take a look at the printf man page to find out why this works.
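
A small hosted-C sketch of the `%.*s` trick from the tip above. The helper names `fn_name_len` and `format_symbol` are hypothetical, and "monitor:F(0,1)" is a made-up string in the style of a stabs entry, where the function name is followed by a colon and type information.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of the function name at the front of a stabs-style string:
 * everything before the first ':' (or the whole string if there is none). */
int fn_name_len(const char *stab_str)
{
	const char *colon;

	assert(stab_str != NULL);
	colon = strchr(stab_str, ':');
	return colon ? (int) (colon - stab_str) : (int) strlen(stab_str);
}

/* Format "name+offset" without copying or modifying the stabs string.
 * "%.*s" takes the precision (maximum characters to print) from an int
 * argument that precedes the string argument. */
int format_symbol(char *out, size_t outlen,
                  const char *stab_str, unsigned offset)
{
	return snprintf(out, outlen, "%.*s+%u",
	                fn_name_len(stab_str), stab_str, offset);
}
```

`format_symbol(out, sizeof out, "monitor:F(0,1)", 106)` leaves "monitor+106" in `out`, which is the shape of the second line of each backtrace entry.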

You may find that some functions are missing from the backtrace. For example, you will probably see a call to monitor() but not to runcmd(). This is because the compiler in-lines some function calls. Other optimizations may cause you to see unexpected line numbers. If you get rid of the -O2 from GNUMakefile, the backtraces may make more sense (but your kernel will run more slowly).


Exercises 11 and 12 require adding code of my own. One place is debuginfo_eip() in kern/kdebug.c. The complete function after my additions is:

// debuginfo_eip(addr, info)
//
//	Fill in the 'info' structure with information about the specified
//	instruction address, 'addr'.  Returns 0 if information was found, and
//	negative if not.  But even if it returns negative it has stored some
//	information into '*info'.
//
int
debuginfo_eip(uintptr_t addr, struct Eipdebuginfo *info)
{
	const struct Stab *stabs, *stab_end;
	const char *stabstr, *stabstr_end;
	int lfile, rfile, lfun, rfun, lline, rline;

	// Initialize *info
	info->eip_file = "<unknown>";
	info->eip_line = 0;
	info->eip_fn_name = "<unknown>";
	info->eip_fn_namelen = 9;
	info->eip_fn_addr = addr;
	info->eip_fn_narg = 0;

	// Find the relevant set of stabs
	if (addr >= ULIM) {
		stabs = __STAB_BEGIN__;
		stab_end = __STAB_END__;
		stabstr = __STABSTR_BEGIN__;
		stabstr_end = __STABSTR_END__;
	} else {
		// Can't search for user-level addresses yet!
		panic("User address");
	}

	// String table validity checks
	if (stabstr_end <= stabstr || stabstr_end[-1] != 0)
		return -1;

	// Now we find the right stabs that define the function containing
	// 'eip'.  First, we find the basic source file containing 'eip'.
	// Then, we look in that source file for the function.  Then we look
	// for the line number.

	// Search the entire set of stabs for the source file (type N_SO).
	lfile = 0;
	rfile = (stab_end - stabs) - 1;
	stab_binsearch(stabs, &lfile, &rfile, N_SO, addr);
	if (lfile == 0)
		return -1;

	// Search within that file's stabs for the function definition
	// (N_FUN).
	lfun = lfile;
	rfun = rfile;
	stab_binsearch(stabs, &lfun, &rfun, N_FUN, addr);

	if (lfun <= rfun) {
		// stabs[lfun] points to the function name
		// in the string table, but check bounds just in case.
		if (stabs[lfun].n_strx < stabstr_end - stabstr)
			info->eip_fn_name = stabstr + stabs[lfun].n_strx;
		info->eip_fn_addr = stabs[lfun].n_value;
		addr -= info->eip_fn_addr;
		// Search within the function definition for the line number.
		lline = lfun;
		rline = rfun;
	} else {
		// Couldn't find function stab!  Maybe we're in an assembly
		// file.  Search the whole file for the line number.
		info->eip_fn_addr = addr;
		lline = lfile;
		rline = rfile;
	}
	// Ignore stuff after the colon.
	info->eip_fn_namelen = strfind(info->eip_fn_name, ':') - info->eip_fn_name;


	// Search within [lline, rline] for the line number stab.
	// If found, set info->eip_line to the right line number.
	// If not found, return -1.
	//
	// Hint:
	//	There's a particular stabs type used for line numbers.
	//	Look at the STABS documentation and <inc/stab.h> to find
	//	which one.
	// Your code here.
	stab_binsearch(stabs, &lline, &rline, N_SLINE, addr);
	if (lline <= rline) {
		info->eip_line = stabs[lline].n_desc;		
	} else {
		return -1;
	}


	// Search backwards from the line number for the relevant filename
	// stab.
	// We can't just use the "lfile" stab because inlined functions
	// can interpolate code from a different file!
	// Such included source files use the N_SOL stab type.
	while (lline >= lfile
	       && stabs[lline].n_type != N_SOL
	       && (stabs[lline].n_type != N_SO || !stabs[lline].n_value))
		lline--;
	if (lline >= lfile && stabs[lline].n_strx < stabstr_end - stabstr)
		info->eip_file = stabstr + stabs[lline].n_strx;


	// Set eip_fn_narg to the number of arguments taken by the function,
	// or 0 if there was no containing function.
	if (lfun < rfun)
		for (lline = lfun + 1;
		     lline < rfun && stabs[lline].n_type == N_PSYM;
		     lline++)
			info->eip_fn_narg++;

	return 0;
}
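
debuginfo_eip leans on stab_binsearch three times, each time to find the entry that "contains" an address within a sorted range. The core idea, stripped of the stab-type filtering and the left/right range narrowing that the real function does, is a binary search for the last entry whose address is not greater than the target. Here is a simplified sketch; `struct sym` and `find_containing` are hypothetical, not the JOS interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified symbol entry, sorted by addr in the table. */
struct sym {
	uint32_t addr;      /* start address of the function */
	const char *name;   /* function name */
};

/* Return the index of the last entry with addr <= target,
 * or -1 if target lies below every entry. */
int find_containing(const struct sym *tab, int n, uint32_t target)
{
	int lo = 0;
	int hi = n - 1;
	int best = -1;

	assert(n >= 0);
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= target) {
			best = mid;       /* candidate; look for a later one */
			lo = mid + 1;
		} else {
			hi = mid - 1;     /* starts after target; go left */
		}
	}
	return best;
}
```

Given entries starting at, say, 0xf0100000, 0xf0100040 and 0xf0100800 (made-up addresses), a lookup of 0xf010086a returns index 2, and subtracting that entry's start address gives the "+offset" printed in the backtrace.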

The other, and main, addition is the mon_backtrace() function in kern/monitor.c. The function I wrote is:

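int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
	uint32_t *ebp;
	uintptr_t eip = (uintptr_t) mon_backtrace;
	struct Eipdebuginfo info;
	int narg;
	int i;

	cprintf("Stack backtrace:\n");

	// Start with the debug info of mon_backtrace itself, the function
	// that owns the first frame.
	debuginfo_eip(eip, &info);

	// Frame layout: ebp[0] = caller's saved ebp, ebp[1] = return eip,
	// ebp[2..] = arguments of the function that owns the frame.
	// The chain ends at the ebp of 0 set in entry.S.
	for (ebp = (uint32_t *) read_ebp(); ebp; ebp = (uint32_t *) ebp[0]) {
		// info still describes the owner of this frame, so it
		// gives the number of arguments stored at ebp[2..].
		narg = info.eip_fn_narg;

		// The return address is what gets printed, so resolve it
		// to file:line and function+offset before printing.
		eip = ebp[1];
		debuginfo_eip(eip, &info);

		// Use %08x, not %p: the %p case above prepends "0x",
		// which the expected output format does not have.
		cprintf("  ebp %08x  eip %08x  args", (uint32_t) ebp, eip);
		for (i = 0; i < narg; i++)
			cprintf(" %08x", ebp[2 + i]);
		cprintf("\n         %s:%d: %.*s+%u\n",
			info.eip_file,
			info.eip_line,
			info.eip_fn_namelen,
			info.eip_fn_name,
			eip - info.eip_fn_addr);
	}
	return 0;
}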
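
The ebp chain that mon_backtrace follows can be tried outside the kernel with a fake stack. In this sketch the stack is an array of 32-bit words and "addresses" are array indices, with index 0 playing the role of the null ebp that ends the chain. The function name `walk_frames` and this index-based model are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Walk a simulated stack. For a frame at index e:
 *   stack[e]     = saved ebp of the caller (0 terminates the chain)
 *   stack[e + 1] = return eip
 *   stack[e + 2] = first argument, and so on.
 * Stores up to max return addresses in eips and returns how many. */
int walk_frames(const uint32_t *stack, uint32_t nwords,
                uint32_t ebp, uint32_t *eips, int max)
{
	int n = 0;

	while (ebp != 0 && n < max) {
		assert(ebp + 1 < nwords);   /* frame must lie inside the stack */
		eips[n++] = stack[ebp + 1];
		ebp = stack[ebp];           /* follow the saved-ebp link */
	}
	return n;
}
```

Filling three frames with the return addresses from the sample output (f01008ae, f0100193, f010003d) and linking them through their saved-ebp slots yields those three values in the same order, innermost frame first.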

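The function above has a flaw: when printing a function's arguments it only takes the number of arguments into account and treats each one as a single 32-bit word, ignoring the arguments' actual types. Note also that the sample output in the handout prints five words for every frame, regardless of how many arguments the function takes.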
