Section 0: Introduction
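This MIT open course is built around a simple x86 OS kernel. To understand OS topics, we first need to know how the whole machine powers on and boots: for example, what software runs after you press the power button on a computer, or the reset button on an embedded device, until execution finally reaches the familiar main function.
How the hardware brings up (boots) the OS differs from one architecture to another, but the ARM and x86 boot flows turn out to be much the same in spirit. Put simply, the OS always comes up through ROM -> Boot Loader -> OS. On power-up, an ARM core (Cortex-M, for example) reads physical addresses 0x0 and 0x4 to initialize the stack pointer (SP) and the program counter (PC) respectively, and the CPU then executes its first instruction at the Reset Handler address now held in PC. Next, let's look at the x86 power-on sequence in detail.
Lab 1 has 12 exercises in total; I answer them one by one below.
Environment: I run Ubuntu in a virtual machine.
For the detailed setup, see the course tools page: https://pdos.csail.mit.edu/6.828/2018/tools.html
Once the environment is set up and you enter the lab directory, it should look like the figure. If you get stuck along the way, other blog posts on setting up the MIT 6.828 JOS environment can help.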
Section 1: PC Bootstrap
ROM -> Bootloader -> Kernel
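This part is mainly about the sequence right after the hardware powers on. On any platform, the code involved must sit in non-volatile memory, that is, it is still there after power is removed, whereas RAM loses its contents across a power cycle. The location of this code is also fixed: at 0x0 on ARM, and in the BIOS ROM area on x86. The memory map discussed later describes this in detail.
Exercise 1 does not really require you to understand x86 assembly completely, but the prerequisites for this course are still fairly high: you should have a solid C background and know computer organization.
Exercise 2 gets you using GDB; I won't go into detail here.
Reading the memory map should give you a sense of which ranges are RAM and which are ROM, what code each holds, what the linker does, and where the boot process copies code from and to, and why.
To summarize:
Because of how the CPU is designed, after power-on it starts running the BIOS at physical address 0xFFFF0: "The PC starts executing with CS = 0xf000 and IP = 0xfff0."
The course notes state this, and the first instruction shown in gdb agrees.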
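The relation between CS = 0xf000, IP = 0xfff0 and the physical address 0xFFFF0 is the real-mode rule "segment times 16 plus offset". Here is a minimal sketch of that arithmetic; the function name real_mode_phys is my own and does not come from the lab sources.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
 * Address wraparound at 1 MB is not modeled in this sketch. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With segment 0xf000 and offset 0xfff0 this gives 0xffff0, only 16 bytes below the 1 MB boundary, which is why the first BIOS instruction is a jump to an earlier location.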
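The BIOS's next job is to copy the boot loader from the hard disk (or other storage) into RAM and start running it. (Note that the RAM address it is loaded at is fixed: 0x7C00.)
You can build (make) the boot loader and the kernel however you like, but since the BIOS is fixed, the boot loader's link address must match 0x7C00.
When the BIOS is done, the boot loader runs. Its code prepares the ground for the kernel: it likewise copies the kernel image from disk to a specific address (0x100000) and jumps to it.
In short, you can view the whole flow as two stages:
stage 1: ROM (the BIOS, burned into memory) -> Bootloader (fetched from external storage into RAM and run there)
stage 2: Bootloader -> Kernel
The point of doing it this way is to make sensible use of the memory architecture.
Right after power-on there is no environment for running C code, so the BIOS first configures the hardware, and then a small piece of code (the boot loader) copies the much larger kernel image.
With this overall picture in mind, the course introduction is much clearer on a second read.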
When the BIOS runs, it sets up an interrupt descriptor table and initializes various devices such as the VGA display. This is where the “Starting SeaBIOS” message you see in the QEMU window comes from.
After initializing the PCI bus and all the important devices the BIOS knows about, it searches for a bootable device such as a floppy, hard drive, or CD-ROM. Eventually, when it finds a bootable disk, the BIOS reads the boot loader from the disk and transfers control to it.
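With that picture, the passage above is easier to follow. We cannot see inside the BIOS; what it does is described above, and in short it gets the hardware environment ready.
The ARM ROM bootup code I worked on before (the counterpart of the BIOS) likewise starts by enabling some hardware, mainly by programming CPU and peripheral registers.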
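How the BIOS decides that a disk is bootable can be sketched in a few lines. A PC BIOS conventionally treats the first 512-byte sector as a boot sector when its last two bytes are 0x55 0xAA, and loads it at 0x7c00. The helper names below are hypothetical and only model that check; they are not BIOS code.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE    512u      /* one disk sector */
#define BOOT_LOAD_ADDR 0x7c00u   /* where the BIOS places the boot sector */

/* Returns 1 if the sector ends with the 0x55 0xAA boot signature. */
int is_boot_sector(const uint8_t *sector, size_t len)
{
    if (sector == NULL || len < SECTOR_SIZE)
        return 0;
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Last byte address occupied by the boot sector once it is loaded. */
uint32_t boot_sector_end(void)
{
    return BOOT_LOAD_ADDR + SECTOR_SIZE - 1;
}
```

The boot loader therefore has to fit in 510 bytes of code and data, occupying 0x7c00 through 0x7dff in RAM.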
Section 2: The Boot Loader
ROM -> Bootloader -> Kernel
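Now we enter the boot loader. boot.S and main.c in the boot directory together make up the boot image. When you make the whole project, a new directory ./obj is created, containing two folders, boot and kern. boot.out is the generated image. The .asm file is extremely useful, especially for debugging; if you cannot debug with the .asm file, you do not really meet the prerequisites for this course.
Roughly speaking, an image is produced by compiling C source files such as 1.c, 2.c, 3.c and then linking them; the tool chain also gives you an .asm file. If there is a final .bin (pure machine code, all 0s and 1s), it comes from the assembler, which plays the role of translator in the tool chain (each platform has its own, matching the CPU's instruction set) and turns assembly into machine code. The .asm file tells us which instruction sits at which address and the order in which the CPU executes them, among other important information.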
Q1 :At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?
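A: The comments in boot.S explain this clearly. Setting the PE bit in %cr0 enables protected mode, and the ljmp that follows reloads %cs so that the processor begins executing 32-bit code at protcseg: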
lgdt gdtdesc
movl %cr0, %eax
orl $CR0_PE_ON, %eax
movl %eax, %cr0
ljmp $PROT_MODE_CSEG, $protcseg
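What these instructions do to the register values can be modeled in plain C. This is only a sketch: the constant values 0x1, 0x8, and 0x10 are the ones I believe boot.S defines, and the function names are my own.

```c
#include <stdint.h>

#define CR0_PE_ON      0x1u    /* protected mode enable flag, bit 0 of CR0 */
#define PROT_MODE_CSEG 0x8u    /* code segment selector used by the ljmp */
#define PROT_MODE_DSEG 0x10u   /* data segment selector loaded afterwards */

/* Models `orl $CR0_PE_ON, %eax`: set PE and leave every other bit alone. */
uint32_t cr0_with_pe(uint32_t cr0)
{
    return cr0 | CR0_PE_ON;
}

/* The upper 13 bits of a selector index the descriptor table. */
uint32_t selector_index(uint16_t selector)
{
    return selector >> 3u;
}
```

Selector 0x8 thus picks GDT entry 1 (the code segment) and 0x10 picks entry 2 (the data segment); entry 0 is the mandatory null descriptor.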
Q2: What is the last instruction of the boot loader executed?
A: After copying the kernel from disk, it jumps to the kernel. In main.c:
((void (*)(void)) (ELFHDR->e_entry))();
In boot.asm (the exact numbers may differ on your platform):
7d6b: ff 15 18 00 01 00 call *0x10018
Q3: and what is the first instruction of the kernel it just loaded?
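A: Look this one up in kernel.asm. We just did call *0x10018; the word stored at 0x10018 is 0x0010000c, the physical address of the kernel's entry point, which kernel.asm lists at its link address 0xf010000c.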
f010000c: 66 c7 05 72 04 00 00 movw $0x1234,0x472
(gdb) x/1x 0x10018
0x10018: 0x0010000c
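Why the call goes through address 0x10018 follows from the ELF header layout: the boot loader reads the header to 0x10000, and e_entry sits 0x18 bytes into a 32-bit ELF header. The struct below follows the standard ELF32 layout with JOS-style field names; treat it as a sketch for checking the offset rather than a copy of inc/elf.h.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit ELF file header, 52 bytes in total. */
struct Elf {
    uint32_t e_magic;      /* 0x7f 'E' 'L' 'F' */
    uint8_t  e_elf[12];    /* rest of the identification bytes */
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;      /* address of the first instruction */
    uint32_t e_phoff;      /* file offset of the program headers */
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;      /* number of program headers */
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

#define ELFHDR_ADDR 0x10000u   /* scratch address the header is read to */

/* Address of the e_entry field once the header sits at ELFHDR_ADDR. */
uint32_t entry_field_addr(void)
{
    return ELFHDR_ADDR + (uint32_t)offsetof(struct Elf, e_entry);
}
```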
Q4:How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?
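A: This information is stored in structures: the ELF header and program headers sit at fixed offsets in the kernel image we build, so the boot loader just reads them. Each program header gives a segment's load address, size, and file offset.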
ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
eph = ph + ELFHDR->e_phnum;
for (; ph < eph; ph++)
readseg(ph->p_pa, ph->p_memsz, ph->p_offset);
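Each readseg call then has to turn a byte range into whole disk sectors. The sketch below models that arithmetic under two assumptions of mine: sectors are 512 bytes, and the kernel image starts at disk sector 1, right after the boot sector. The function names are hypothetical.

```c
#include <stdint.h>

#define SECTSIZE 512u

/* Disk sector that holds a given byte offset of the kernel image.
 * Sector 0 is the boot sector, so the kernel starts at sector 1. */
uint32_t first_sector(uint32_t file_offset)
{
    return file_offset / SECTSIZE + 1;
}

/* Number of sectors needed to cover [pa, pa + count) after
 * rounding pa down to a sector boundary. */
uint32_t sectors_needed(uint32_t pa, uint32_t count)
{
    uint32_t end = pa + count;
    uint32_t start = pa & ~(SECTSIZE - 1);
    return (end - start + SECTSIZE - 1) / SECTSIZE;
}
```

A 4 KB segment loaded at 0x100000 takes 8 sectors. Because the start is rounded down, a segment that begins mid-sector causes a few extra bytes to be read before it.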
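Exercise 4 is about C pointers. I won't go into it here; see other articles on pointers. I'll upload a post of my own when I have time.
Exercise 5 changes the link address. If we change the boot loader's link address to something other than the default 0x7c00, it obviously will not boot, because the BIOS always loads it at 0x7c00. As mentioned, the BIOS is fixed code and cannot be changed.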
ljmp $PROT_MODE_CSEG, $protcseg
Right:
[ 0:7c2d] 0x7c2d: ljmp $0x8,$0x7c32
The target architecture is assumed to be i386
0x7c32: mov $0x10,%ax
0x7c36: mov %eax,%ds
0x7c38: mov %eax,%es
0x7c3a: mov %eax,%fs
Wrong:
[ 0:7c2d] 0x7c2d: ljmp $0x8,$0x7c36
[f000:e05b] 0xfe05b: cmpl $0x0,%cs:0x66d4
[f000:e062] 0xfe062: jne 0xfd3da
[f000:d3da] 0xfd3da: cli
[f000:d3db] 0xfd3db: cld
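This is worth digging into, so let's run an experiment: change the link address to 0x7e00 and set a breakpoint at 0x7c00. At first both 0x7c00 and 0x7e00 are empty, all 0x0. When we hit the breakpoint, 0x7c00 holds data, which the BIOS copied there. The jump instruction, however, sends execution to the 0x7e00 region, where there is nothing, so the CPU takes an exception.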
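The mismatch can be stated as simple arithmetic. In the sketch below, the label offset 0x32 is taken from the "Right" trace (0x7c32 - 0x7c00); the function names are my own and the model ignores everything except absolute addresses.

```c
#include <stdint.h>

#define LOAD_ADDR 0x7c00u   /* fixed by the BIOS, independent of the link address */

/* Address the linker bakes into an absolute reference such as
 * `ljmp $PROT_MODE_CSEG, $protcseg`. */
uint32_t absolute_target(uint32_t link_base, uint32_t label_offset)
{
    return link_base + label_offset;
}

/* Address where the label's bytes really are after the BIOS load. */
uint32_t actual_location(uint32_t label_offset)
{
    return LOAD_ADDR + label_offset;
}

/* The far jump lands on real code only when the two agree. */
int jump_lands_on_code(uint32_t link_base, uint32_t label_offset)
{
    return absolute_target(link_base, label_offset) == actual_location(label_offset);
}
```

Instructions before the ljmp still run with the wrong link address because they use no absolute code addresses; the far jump is the first instruction that depends on one.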
Exercise 6 asks you to work out the reason without gdb. It is simply a memory copy: bootmain in the boot loader loads the kernel there.
Section 3: Kernel
ROM -> Bootloader -> Kernel
I'm not yet familiar with virtual addresses and will fill that gap after lab 2, so we skip the virtual memory part for now.
Exercise 7: Use QEMU and GDB to trace into the JOS kernel and stop at the movl %eax, %cr0. Examine memory at 0x00100000 and at 0xf0100000. Now, single step over that instruction using the stepi GDB command. Again, examine memory at 0x00100000 and at 0xf0100000. Make sure you understand what just happened.
A: Paging enabled.
I won't paste the gdb session; after this instruction executes, the contents at 0xf0100000 match those at 0x100000.
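Why the two addresses show the same bytes can be modeled without knowing much about paging yet. The sketch assumes, as I understand the lab's entry page directory, that only two 4 MB ranges are mapped once paging is on: virtual [0, 4 MB) and [KERNBASE, KERNBASE + 4 MB), both onto physical [0, 4 MB). The function name is hypothetical.

```c
#include <stdint.h>

#define KERNBASE 0xf0000000u
#define PTSIZE   0x00400000u   /* 4 MB, the range covered by one page table */

/* Translate va under the two assumed mappings. Returns 1 and stores
 * the physical address on success, 0 if va is not mapped. */
int entry_translate(uint32_t va, uint32_t *pa)
{
    if (va < PTSIZE) {                       /* identity mapping */
        *pa = va;
        return 1;
    }
    if (va >= KERNBASE && va - KERNBASE < PTSIZE) {
        *pa = va - KERNBASE;                 /* high mapping */
        return 1;
    }
    return 0;
}
```

Both 0x00100000 and 0xf0100000 translate to physical 0x00100000, so gdb reads the same kernel bytes through either address.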
Exercise 8: see GitHub https://github.com/Jason0LiYaoCN/JOS
Q1: Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?
console.c exports cputchar, getchar, and iscons. printf.c wraps cputchar in its putch function, which it passes as the output callback when it calls vprintfmt in printfmt.c.
Q2: Explain the following from console.c:
if (crt_pos >= CRT_SIZE) {
int i;
memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
crt_buf[i] = 0x0700 | ' ';
crt_pos -= CRT_COLS;
}
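When the screen is full, the contents scroll up one row: every row is copied over the row above it, the last row is filled with blanks, and the cursor moves back one row to make room for new output.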
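The same logic can be tried outside the kernel on a tiny buffer. This is a sketch with made-up dimensions (4 columns by 3 rows) and a hypothetical function name; it uses memmove because the source and destination ranges overlap.

```c
#include <stdint.h>
#include <string.h>

#define COLS  4                  /* tiny screen, for illustration only */
#define ROWS  3
#define SIZE  (COLS * ROWS)
#define BLANK (0x0700 | ' ')     /* space character with attribute 0x07 */

/* Scroll buf up one row if pos ran off the end; returns the new pos. */
int scroll_if_full(uint16_t *buf, int pos)
{
    if (pos >= SIZE) {
        /* shift rows 1..ROWS-1 up to rows 0..ROWS-2 */
        memmove(buf, buf + COLS, (SIZE - COLS) * sizeof(uint16_t));
        /* blank the last row */
        for (int i = SIZE - COLS; i < SIZE; i++)
            buf[i] = BLANK;
        pos -= COLS;
    }
    return pos;
}
```

After a scroll the cursor sits at the start of the freshly blanked last row, and the oldest row is gone.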
Q3: For the following questions you might wish to consult the notes for Lecture 2. These notes cover GCC’s calling convention on the x86. Trace the execution of the following code step-by-step:
int x = 1, y = 3, z = 4;
cprintf("x %d, y %x, z %d\n", x, y, z);
In the call to cprintf(), to what does fmt point? To what does ap point?
In the call to cprintf(), fmt points to the format string, and ap points to the first of the variable arguments that follow fmt.
Q4: Run the following code.
unsigned int i = 0x00646c72;
cprintf("H%x Wo%s", 57616, &i);
The output is He110 World. 57616 = 0xe110, so the first half prints as He110. x86 is little-endian, so i = 0x00646c72 is stored in memory as the bytes 0x72 0x6c 0x64 0x00; read as a string these are 'r', 'l', 'd', and a terminating 0x00.
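This can be reproduced on any host with the standard library. The sketch writes the bytes of i in little-endian order explicitly, so the result does not depend on the byte order of the machine running it; the function names are my own.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store a 32-bit value least significant byte first, as x86 does. */
void store_le32(uint8_t out[4], uint32_t v)
{
    for (int k = 0; k < 4; k++)
        out[k] = (uint8_t)(v >> (8 * k));
}

/* Rebuild the string printed by cprintf("H%x Wo%s", 57616, &i). */
void build_message(char *dst, size_t n)
{
    uint8_t bytes[4];

    store_le32(bytes, 0x00646c72u);   /* bytes become 'r', 'l', 'd', 0 */
    snprintf(dst, n, "H%x Wo%s", 57616u, (const char *)bytes);
}
```

On a big-endian machine the in-memory bytes of i would be 0x00 0x64 0x6c 0x72, so %s would see an empty string; i would have to be 0x726c6400 to get the same output, while 57616 would stay unchanged.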
Q5: In the following code, what is going to be printed after ‘y=’? (note: the answer is not a specific value.) Why does this happen?
cprintf("x=%d y=%d", 3);
It will be the decimal value of the 4 bytes right above where 3 is placed on the stack, because cprintf reads a second variable argument that was never passed.
Q6: Let’s say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change cprintf or its interface so that it would still be possible to pass it a variable number of arguments?
Push an integer after the last argument indicating the number of arguments.
Challenge: Enhance the console to allow text to be printed in different colors. The traditional way to do this is to make it interpret ANSI escape sequences embedded in the text strings printed to the console, but you may use any mechanism you like. There is plenty of information on the 6.828 reference page and elsewhere on the web on programming the VGA display hardware. If you’re feeling really adventurous, you could try switching the VGA hardware into a graphics mode and making the console draw text onto the graphical frame buffer.
See the code on my GitHub; kernel.c, monitor.c, and printfmt.c were all modified.
Exercise 9. Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which “end” of this reserved area is the stack pointer initialized to point to?
The stack is initialized in entry.S:
# Clear the frame pointer register (EBP)
# so that once we get into debugging C code,
# stack backtraces will be terminated properly.
movl $0x0,%ebp # nuke frame pointer
# Set the stack pointer
movl $(bootstacktop),%esp
In kernel.asm, the concrete address is 0xf0110000:
# Clear the frame pointer register (EBP)
# so that once we get into debugging C code,
# stack backtraces will be terminated properly.
movl $0x0,%ebp # nuke frame pointer
f010002f: bd 00 00 00 00 mov $0x0,%ebp
# Set the stack pointer
movl $(bootstacktop),%esp
f0100034: bc 00 00 11 f0 mov $0xf0110000,%esp
.data
###################################################################
# boot stack
###################################################################
.p2align PGSHIFT # force page alignment
.globl bootstack
bootstack:
.space KSTKSIZE
.globl bootstacktop
bootstacktop:
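The .space directive reserves KSTKSIZE bytes between bootstack and bootstacktop, and %esp starts at the high end because the x86 stack grows downward. A small sketch of the resulting bounds, assuming KSTKSIZE is 8 pages of 4096 bytes (my recollection of the lab's memlayout.h, not shown in this post); the function names are hypothetical.

```c
#include <stdint.h>

#define PGSIZE   4096u
#define KSTKSIZE (8 * PGSIZE)   /* assumed kernel stack size: 32 KB */

/* Lowest address of the reserved stack area, given its top. */
uint32_t stack_bottom(uint32_t stack_top)
{
    return stack_top - KSTKSIZE;
}

/* Value of esp after pushing n 32-bit words onto an empty stack. */
uint32_t esp_after_pushes(uint32_t stack_top, uint32_t n)
{
    return stack_top - 4 * n;
}
```

With the top at 0xf0110000 the stack spans 0xf0108000 to 0xf0110000 under that assumption. Two pushes (a return address and a saved ebp) leave esp at 0xf010fff8, which matches the outermost ebp in the backtrace output further down.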
Exercise 10. To become familiar with the C calling conventions on the x86, find the address of the test_backtrace function in obj/kern/kernel.asm, set a breakpoint there, and examine what happens each time it gets called after the kernel starts. How many 32-bit words does each recursive nesting level of test_backtrace push on the stack, and what are those words?
Note that, for this exercise to work properly, you should be using the patched version of QEMU available on the tools page or on Athena. Otherwise, you’ll have to manually translate all breakpoint and memory addresses to linear addresses.
Exercise 11. Implement the backtrace function as specified above. Use the same format as in the example, since otherwise the grading script will be confused.
Exercise 12. Modify your stack backtrace function to display, for each eip, the function name, source file name, and line number corresponding to that eip.
See GitHub.
int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
    // Your code here.
    uint32_t *ebp = (uint32_t *)read_ebp(); // current value of ebp
    while (ebp != 0) { // stop when ebp is 0
        // print ebp, eip, and the first five arguments
        uint32_t eip = *(ebp + 1);
        cprintf("ebp %08x eip %08x args %08x %08x %08x %08x %08x\n", ebp, eip, *(ebp + 2), *(ebp + 3), *(ebp + 4), *(ebp + 5), *(ebp + 6));
        // move on to the caller's frame
        ebp = (uint32_t *)(*ebp);
    }
    return 0;
}
result:
6828 decimal is 15254 octal!
entering test_backtrace 5
entering test_backtrace 4
entering test_backtrace 3
entering test_backtrace 2
entering test_backtrace 1
entering test_backtrace 0
Stack backtrace:
ebp f010ff18 eip f0100087 args 00000000 00000000 00000000 00000000 f0100a8c
kern/init.c:21: test_backtrace+71
ebp f010ff38 eip f0100069 args 00000000 00000001 f010ff78 00000000 f0100a8c
kern/init.c:18: test_backtrace+41
ebp f010ff58 eip f0100069 args 00000001 00000002 f010ff98 00000000 f0100a8c
kern/init.c:18: test_backtrace+41
ebp f010ff78 eip f0100069 args 00000002 00000003 f010ffb8 00000000 f0100a8c
kern/init.c:18: test_backtrace+41
ebp f010ff98 eip f0100069 args 00000003 00000004 00000000 00000000 00000000
kern/init.c:18: test_backtrace+41
ebp f010ffb8 eip f0100069 args 00000004 00000005 00000000 00010094 00010094
kern/init.c:18: test_backtrace+41
ebp f010ffd8 eip f01000ea args 00000005 00001aac 00000648 00000000 00000000
kern/init.c:45: i386_init+77
ebp f010fff8 eip f010003e args 00111021 00000000 00000000 00000000 00000000
kern/entry.S:83: +0
leaving test_backtrace 0
leaving test_backtrace 1
leaving test_backtrace 2
leaving test_backtrace 3
leaving test_backtrace 4
leaving test_backtrace 5
Welcome to the JOS kernel monitor!
Type 'help' for a list of commands.
blue
green
red
K>
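In the backtrace above, successive ebp values differ by 0x20 bytes, so each test_backtrace nesting level uses eight 32-bit words of stack. The walk itself can be tried in user space on a simulated stack. The sketch below rebuilds only the saved-ebp chain from the printed ebp values; the memory model and all function names are my own and are not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_TOP   0xf0110000u
#define STACK_WORDS 64u                          /* model the top 256 bytes */
#define STACK_BASE  (STACK_TOP - 4 * STACK_WORDS)

/* Read the 32-bit word at a simulated kernel address. */
static uint32_t load(const uint32_t *mem, uint32_t addr)
{
    return mem[(addr - STACK_BASE) / 4];
}

/* Fill mem with frames 0x20 bytes apart. Each frame's first word is
 * the caller's ebp; the outermost frame stores 0 as the terminator. */
void build_chain(uint32_t *mem, uint32_t first_ebp, uint32_t last_ebp)
{
    for (uint32_t e = first_ebp; e <= last_ebp; e += 0x20)
        mem[(e - STACK_BASE) / 4] = (e == last_ebp) ? 0 : e + 0x20;
}

/* Follow the saved-ebp links until the 0 terminator, recording each
 * ebp visited. Returns the number of frames found. */
size_t walk_frames(const uint32_t *mem, uint32_t ebp, uint32_t *out, size_t max)
{
    size_t n = 0;

    while (ebp != 0 && n < max) {
        out[n++] = ebp;
        ebp = load(mem, ebp);      /* *ebp holds the caller's ebp */
    }
    return n;
}
```

Building the chain from 0xf010ff18 to 0xf010fff8 and walking it yields eight frames, the same count as the eight ebp lines in the output. The walk stops at the 0 that entry.S puts in %ebp before the first call.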