The previous post described motherboards and the memory map in Intel computers to set the scene for the initial phases of boot. Booting is an involved, hacky, multi-stage affair – fun stuff. Here’s an outline of the process:
An outline of the boot sequence
Things start rolling when you press the power button on the computer (no! do tell!). Once the motherboard is powered up it initializes its own firmware – the chipset and other tidbits – and tries to get the CPU running. If things fail at this point (e.g., the CPU is busted or missing) then you will likely have a system that looks completely dead except for rotating fans. A few motherboards manage to emit beeps for an absent or faulty CPU, but the zombie-with-fans state is the most common scenario based on my experience. Sometimes USB or other devices can cause this to happen: unplugging all non-essential devices is a possible cure for a system that was working and suddenly appears dead like this. You can then single out the culprit device by elimination.
If all is well the CPU starts running. In a multi-processor or multi-core system one CPU is dynamically chosen to be the bootstrap processor (BSP) that runs all of the BIOS and kernel initialization code. The remaining processors, called application processors (AP) at this point, remain halted until they are explicitly activated by the kernel. Intel CPUs have evolved over the years, but they're fully backwards compatible, so modern CPUs can behave like the original 1978 Intel 8086, which is exactly what they do after power-up. In this primitive power-up state the processor is in real mode with memory paging disabled. This is like ancient MS-DOS, where only 1 MB of memory can be addressed and any code can write to any place in memory: there's no notion of protection or privilege.
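The 1 MB limit falls straight out of real-mode address arithmetic: a 16-bit segment value is multiplied by 16 and added to a 16-bit offset, giving a 20-bit physical address. The sketch below models that calculation. The `a20_enabled` parameter is an illustrative addition not mentioned in the text: it stands for the extra address line later CPUs gained, which lets 0xFFFF:0xFFFF reach just past 1 MB instead of wrapping around to the bottom of memory as it did on the 8086.

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool = False) -> int:
    """Physical address for a real-mode segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    address = (segment << 4) + offset  # segment * 16 + offset
    if not a20_enabled:
        # The 8086 had 20 address lines, so addresses wrap around at 1 MB.
        address &= 0xFFFFF
    return address


print(hex(real_mode_address(0x07C0, 0x0000)))                    # 0x7c00
print(hex(real_mode_address(0xFFFF, 0xFFFF)))                    # 0xffef
print(hex(real_mode_address(0xFFFF, 0xFFFF, a20_enabled=True)))  # 0x10ffef
```

Note that many segment:offset pairs name the same byte (0x07C0:0x0000 and 0x0000:0x7C00 both give 0x7C00), and nothing in this calculation checks who is allowed to touch that byte.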
Most registers in the CPU have well-defined values after power-up, including the instruction pointer (EIP), which holds the memory address of the instruction being executed by the CPU. Intel CPUs use a hack whereby, even though only 1 MB of memory can be addressed at power-up, a hidden base address (an offset, essentially) is applied to EIP so that the first instruction executed is at address 0xFFFFFFF0 (16 bytes short of the end of 4 GB of memory and well above one megabyte). Concretely, the base address held in the hidden descriptor cache of the CS segment register is initialized to 0xFFFF0000 at power-up or reset, EIP is initialized to 0xFFF0, and the two add up to 0xFFFFFFF0. This magical address is called the reset vector and is standard for modern Intel CPUs.
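The reset-vector arithmetic can be checked in a few lines. This sketch assumes the documented IA-32 power-up state (CS selector 0xF000 with a hidden base of 0xFFFF0000, EIP 0xFFF0) and models a real-mode far jump as reloading CS, so the hidden base is recomputed as selector × 16. The jump target F000:E05B is the entry point used by the original IBM PC BIOS and appears here only as an example; a particular modern BIOS may jump elsewhere.

```python
# IA-32 power-up state, per Intel's documentation.
CS_SELECTOR_AT_RESET = 0xF000
CS_HIDDEN_BASE_AT_RESET = 0xFFFF0000
EIP_AT_RESET = 0x0000FFF0


def fetch_address(cs_base: int, eip: int) -> int:
    """Linear address of the next instruction: hidden CS base + EIP."""
    return (cs_base + eip) & 0xFFFFFFFF


def far_jump(selector: int, offset: int) -> tuple:
    """Real-mode far jump: CS is reloaded, so its hidden base becomes selector * 16."""
    return selector << 4, offset


reset_vector = fetch_address(CS_HIDDEN_BASE_AT_RESET, EIP_AT_RESET)
print(hex(reset_vector))                  # 0xfffffff0
print(2**32 - reset_vector)               # 16 bytes left before the 4 GB boundary
print(hex(CS_SELECTOR_AT_RESET << 4))     # 0xf0000: what plain real mode would compute

cs_base, eip = far_jump(0xF000, 0xE05B)   # example target: classic IBM PC BIOS entry
print(hex(fetch_address(cs_base, eip)))   # 0xfe05b, back inside the first megabyte
```

The mismatch between `CS_SELECTOR_AT_RESET << 4` and the hidden base is the whole trick: until CS is reloaded, the CPU uses the cached base rather than the one real mode would normally derive from the selector.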
The motherboard ensures that the instruction at the reset vector is a jump to the memory location mapped to the BIOS entry point. This jump reloads the CS register, which implicitly clears the hidden base address present at power-up. All of these memory locations have the right contents needed by the CPU thanks to the memory map kept by the chipset. They are all mapped to flash memory containing the BIOS since at this point the RAM modules have random crap in them. An example of the relevant memory regions is shown below:
Important memory regions during boot
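To see how one flash chip can answer fetches both at the reset vector and inside the first megabyte, here is a toy address decoder. It is modeled loosely on Intel chipsets, which decode the top of the 4 GB space to the BIOS flash and also alias the last 128 KiB of flash at 0xE0000 to 0xFFFFF. The 8 MiB flash size and the exact window sizes are assumptions for illustration; real chipsets make these ranges configurable.

```python
from typing import Optional

FOUR_GB = 1 << 32
FLASH_SIZE = 8 * 1024 * 1024                  # assumed 8 MiB BIOS flash
LEGACY_START, LEGACY_END = 0xE0000, 0x100000  # assumed 128 KiB window below 1 MB


def flash_offset(phys: int, flash_size: int = FLASH_SIZE) -> Optional[int]:
    """Return the flash offset a physical address decodes to, or None if the
    (toy) chipset routes it elsewhere, e.g. to RAM."""
    top_start = FOUR_GB - flash_size
    if top_start <= phys < FOUR_GB:
        return phys - top_start  # flash mapped at the top of 4 GB
    if LEGACY_START <= phys < LEGACY_END:
        # Alias of the last 128 KiB of flash: the same bytes appear twice.
        return flash_size - (LEGACY_END - phys)
    return None


# The reset vector and its real-mode twin at 0xFFFF0 hit the same flash byte.
print(flash_offset(0xFFFFFFF0) == flash_offset(0xFFFF0))  # True
# So do both copies of F000:E05B (0xFE05B), the classic IBM PC BIOS entry point.
print(flash_offset(0xFFFFE05B) == flash_offset(0xFE05B))  # True
print(flash_offset(0x7C00))                               # None (ordinary RAM)
```

Because both windows point at the same chip, a far jump from the reset vector can land below 1 MB and simply keep executing BIOS code.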
The CPU then starts executing BIOS code, which initializes some of the hardware in the machine. Afterwards the BIOS kicks off the Power-on Self Test (POST), which tests various components in the computer. Lack of a working video card fails the POST and causes the BIOS to halt and emit beeps to let you know what’s wrong, since messages on the screen aren’t an option. A working video card takes us to a stage where the computer looks alive: manufacturer logos are printed, memory starts to be tested, angels blare their horns. Other POST failures, like a missing keyboard, lead to halts with an error message on the screen. The POST involves a mixture of testing and initialization, including sorting out all the resources (interrupts, memory ranges, I/O ports) for PCI devices. Modern BIOSes that follow the Advanced Configuration and Power Interface (ACPI) build a number of data tables that describe the devices in the computer; these tables are later used by the kernel.
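One concrete piece of that ACPI work is how the kernel later finds the tables. Per the ACPI specification, the BIOS leaves a Root System Description Pointer (RSDP) whose 8-byte signature is "RSD PTR " on a 16-byte boundary between 0xE0000 and 0xFFFFF (or in the first KiB of the Extended BIOS Data Area), and its first 20 bytes must sum to zero modulo 256. The sketch below scans a byte copy of that BIOS area. It handles only the ACPI 1.0 fields, assumes `region` starts at a 16-byte-aligned physical address, and uses a made-up OEM ID and RSDT address for its test data.

```python
import struct
from typing import Optional, Tuple

RSDP_SIGNATURE = b"RSD PTR "
BIOS_AREA_START, BIOS_AREA_END = 0xE0000, 0x100000


def find_rsdp(region: bytes, base: int = BIOS_AREA_START) -> Optional[Tuple[int, int]]:
    """Return (physical address of the RSDP, RSDT address) or None."""
    for off in range(0, len(region) - 20 + 1, 16):  # RSDP is 16-byte aligned
        candidate = region[off:off + 20]            # ACPI 1.0 part of the RSDP
        if candidate[:8] != RSDP_SIGNATURE:
            continue
        if sum(candidate) % 256 != 0:               # bad checksum: keep looking
            continue
        (rsdt_address,) = struct.unpack_from("<I", candidate, 16)
        return base + off, rsdt_address
    return None


def make_rsdp(rsdt_address: int, oem_id: bytes = b"TOYBIO") -> bytes:
    """Build an ACPI 1.0 RSDP: signature, checksum, 6-byte OEM ID, revision, RSDT address."""
    body = bytearray(RSDP_SIGNATURE + b"\x00" + oem_id + b"\x00"
                     + struct.pack("<I", rsdt_address))
    body[8] = (-sum(body)) % 256                    # make all 20 bytes sum to 0
    return bytes(body)


region = bytearray(BIOS_AREA_END - BIOS_AREA_START)
region[0x1A0:0x1A0 + 20] = make_rsdp(0x7FFE0000)   # hypothetical RSDT address
found = find_rsdp(bytes(region))
print([hex(v) for v in found])                      # ['0xe01a0', '0x7ffe0000']
```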
After the POST the BIOS wants to boot up an operating system, which must be found somewhere: hard drives, CD-ROM drives, floppy disks, etc. The actual order in which the BIOS seeks a boot device is user configurable. If there is no suitable boot device the BIOS halts with a complaint like “Non-System Disk or Disk Error.” A dead hard drive might present with this symptom. Hopefully this doesn’t happen and the BIOS finds a working disk allowing the boot to proceed.
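The search can be sketched as a loop over the user's configured order. The device names and the `read_first_sector` callback are hypothetical stand-ins for BIOS internals, and the test for "suitable" is an assumption: many BIOSes accept a device only if its first sector ends with the bytes 0x55 0xAA, but the exact checks vary from BIOS to BIOS.

```python
from typing import Callable, Optional, Sequence, Tuple

SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"


def pick_boot_device(order: Sequence[str],
                     read_first_sector: Callable[[str], Optional[bytes]]) -> Tuple[str, bytes]:
    """Return the first device in `order` whose first sector looks bootable."""
    for device in order:
        sector = read_first_sector(device)
        if sector is None or len(sector) != SECTOR_SIZE:
            continue  # no media, or the read failed (e.g. a dead drive)
        if sector[510:512] == BOOT_SIGNATURE:
            return device, sector
    raise RuntimeError("Non-System Disk or Disk Error")


# Hypothetical machine: empty CD drive, dead first disk, bootable second disk.
good = bytes(510) + BOOT_SIGNATURE
disks = {"cdrom": None, "disk0": None, "disk1": good}
device, _ = pick_boot_device(["cdrom", "disk0", "disk1"], disks.get)
print(device)  # disk1
```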
The BIOS now reads the first 512-byte sector (sector zero) of the hard disk. This is called the Master Boot Record (MBR) and it normally contains two vital components: a tiny OS-specific bootstrapping program at the start of the MBR followed by a partition table for the disk. The BIOS however does not care about any of this: apart from the check many BIOSes make for the 0x55 0xAA boot signature in the sector's last two bytes, it simply loads the contents of the MBR into memory location 0x7c00 and jumps to that location to start executing whatever code is in the MBR.
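(Translator's note: strictly speaking, sector numbers in CHS addressing start at 1, so this sector is cylinder 0, head 0, sector 1; "sector zero" is its LBA number.)
Master Boot Record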
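The BIOS side of this step is little more than a copy. The sketch below splits a sector into the classic MBR layout (446 bytes of boot code, a 64-byte partition table at offset 446, and the two-byte 0x55 0xAA signature at offset 510) and models the load as copying the sector verbatim into simulated memory at 0x7C00. The `memory` bytearray is an assumption standing in for physical RAM, and some modern MBRs carve extra fields out of the code area, which this ignores.

```python
from typing import Tuple

SECTOR_SIZE = 512
MBR_LOAD_ADDRESS = 0x7C00


def split_mbr(sector: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split an MBR into (boot code, partition table, signature)."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    return sector[:446], sector[446:510], sector[510:]


def load_mbr(memory: bytearray, sector: bytes) -> int:
    """Copy the sector to 0x7C00 without interpreting it; return the jump target."""
    end = MBR_LOAD_ADDRESS + SECTOR_SIZE
    if len(sector) != SECTOR_SIZE or len(memory) < end:
        raise ValueError("need one full sector and memory reaching 0x7E00")
    memory[MBR_LOAD_ADDRESS:end] = sector
    return MBR_LOAD_ADDRESS


sector = bytes(range(256)) + bytes(range(254)) + b"\x55\xaa"  # toy 512-byte MBR
code, table, signature = split_mbr(sector)
print(len(code), len(table), signature.hex())  # 446 64 55aa

ram = bytearray(640 * 1024)                    # simulated conventional memory
print(hex(load_mbr(ram, sector)))              # 0x7c00
```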
The specific code in the MBR could be a Windows MBR loader, code from Linux loaders such as LILO or GRUB, or even a virus. In contrast, the partition table is standardized: it is a 64-byte area with four 16-byte entries describing how the disk has been divided up (so you can run multiple operating systems or have separate volumes on the same disk). Traditionally, Microsoft MBR code takes a look at the partition table, finds the (only) partition marked as active, loads the boot sector for that partition, and runs that code. The boot sector is the first sector of a partition, as opposed to the first sector of the whole disk. If something is wrong with the partition table you would get messages like “Invalid Partition Table” or “Missing Operating System.” These messages do not come from the BIOS but rather from the MBR code loaded from disk, so the specific message depends on the MBR flavor.
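Because the partition table format is standardized, the "find the active partition" step of a traditional MBR is easy to sketch. Each 16-byte entry holds a status byte (0x80 marks the active partition), a CHS start address, a partition type byte, a CHS end address, and a little-endian starting LBA and sector count. The mapping from failures to the messages quoted above is modeled on classic MS-DOS-style MBR code and is an assumption, since the wording depends on the MBR flavor; `make_entry` and the toy disk image are hypothetical test scaffolding.

```python
import struct
from typing import List, NamedTuple, Optional

SECTOR_SIZE = 512


class PartitionEntry(NamedTuple):
    status: int       # 0x80 = active (bootable), 0x00 = inactive
    part_type: int    # e.g. 0x07 (NTFS), 0x83 (Linux)
    lba_start: int    # first sector of the partition, i.e. its boot sector
    num_sectors: int


def parse_partition_table(mbr: bytes) -> List[PartitionEntry]:
    """Decode the four 16-byte entries that start at offset 446."""
    entries = []
    for i in range(4):
        (status, _chs_first, part_type, _chs_last,
         lba_start, num_sectors) = struct.unpack_from("<B3sB3sII", mbr, 446 + 16 * i)
        entries.append(PartitionEntry(status, part_type, lba_start, num_sectors))
    return entries


def boot_sector_of_active_partition(disk: bytes) -> Optional[bytes]:
    """Mimic a traditional MBR: find the single active partition, load its boot sector."""
    entries = parse_partition_table(disk[:SECTOR_SIZE])
    if any(e.status not in (0x00, 0x80) for e in entries):
        raise ValueError("Invalid Partition Table")
    active = [e for e in entries if e.status == 0x80]
    if len(active) > 1:
        raise ValueError("Invalid Partition Table")
    if not active:
        return None  # no active partition: what happens next depends on the MBR code
    start = active[0].lba_start * SECTOR_SIZE
    boot_sector = disk[start:start + SECTOR_SIZE]
    if len(boot_sector) != SECTOR_SIZE or boot_sector[510:] != b"\x55\xaa":
        raise ValueError("Missing Operating System")
    return boot_sector


def make_entry(status: int, part_type: int, lba_start: int, num_sectors: int) -> bytes:
    """Build one 16-byte entry (CHS fields zeroed) for a toy disk image."""
    return struct.pack("<B3sB3sII", status, bytes(3), part_type, bytes(3),
                       lba_start, num_sectors)


# Toy 4-sector disk: MBR, an unused sector, then a 2-sector active Linux partition at LBA 2.
mbr = bytes(446) + make_entry(0x80, 0x83, 2, 2) + bytes(48) + b"\x55\xaa"
vbr = b"\xab" * 510 + b"\x55\xaa"  # stand-in partition boot sector
disk = mbr + bytes(SECTOR_SIZE) + vbr + bytes(SECTOR_SIZE)
print(parse_partition_table(mbr)[0])
print(boot_sector_of_active_partition(disk) == vbr)  # True
```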
Boot loading has gotten more sophisticated and flexible over time. The Linux boot loaders LILO and GRUB can handle a wide variety of operating systems, file systems, and boot configurations. Their MBR code does not necessarily follow the “boot the active partition” approach described above. But functionally the process goes like this:
There’s a complication worth mentioning (aka, I told you this thing is hacky). The image for a current Linux kernel, even compressed, does not fit into the 640K of RAM available in real mode. My vanilla Ubuntu kernel is 1.7 MB compressed. Yet the boot loader must run in real mode in order to call the BIOS routines for reading from the disk, since the kernel is clearly not available at that point. The solution is the venerable unreal mode. This is not a true processor mode (I wish the engineers at Intel were allowed to have fun like that), but rather a technique where a program switches between real mode and protected mode in order to access memory above 1 MB while still using the BIOS. In its classic form, the program enters protected mode just long enough to load segment registers with 4 GB limits and then drops back to real mode, where the CPU keeps using those cached limits. If you read GRUB source code, you’ll see such transitions all over the place (look under stage2/ for calls to real_to_prot and prot_to_real). At the end of this sticky process the loader has stuffed the kernel in memory, by hook or by crook, but it leaves the processor in real mode when it’s done.
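The core of the problem is that BIOS disk reads run in real mode and can only deposit data below 1 MB, while the kernel has to end up above it. The sketch below models one way around that, in the spirit of the GRUB transitions mentioned above: read a chunk into a low "bounce buffer", switch to protected mode, copy it above 1 MB, and switch back. Memory is a simulated `bytearray`, the mode switches are comments rather than real `real_to_prot`/`prot_to_real` calls, and the buffer address, 64 KiB chunk size, and 1 MB load address are assumptions for illustration.

```python
BOUNCE_BUFFER = 0x10000         # assumed buffer address below 1 MB
CHUNK_SIZE = 64 * 1024          # assumed bytes fetched per BIOS read
KERNEL_LOAD_ADDRESS = 0x100000  # 1 MB


def load_kernel_high(image: bytes, memory: bytearray) -> int:
    """Copy `image` to KERNEL_LOAD_ADDRESS via the bounce buffer.

    Returns the number of real -> protected -> real round trips."""
    if KERNEL_LOAD_ADDRESS + len(image) > len(memory):
        raise MemoryError("simulated RAM is too small for the kernel image")
    dest = KERNEL_LOAD_ADDRESS
    round_trips = 0
    for off in range(0, len(image), CHUNK_SIZE):
        chunk = image[off:off + CHUNK_SIZE]
        # Real mode: the (simulated) BIOS disk read can only target memory below 1 MB.
        memory[BOUNCE_BUFFER:BOUNCE_BUFFER + len(chunk)] = chunk
        # real_to_prot: protected mode can address memory above 1 MB.
        memory[dest:dest + len(chunk)] = memory[BOUNCE_BUFFER:BOUNCE_BUFFER + len(chunk)]
        # prot_to_real: back to real mode so the next BIOS call works.
        dest += len(chunk)
        round_trips += 1
    return round_trips


kernel = bytes(range(256)) * 6800     # ~1.7 MB stand-in for a compressed kernel
ram = bytearray(4 * 1024 * 1024)      # simulated 4 MiB of RAM
print(load_kernel_high(kernel, ram))  # 27
print(ram[KERNEL_LOAD_ADDRESS:KERNEL_LOAD_ADDRESS + len(kernel)] == kernel)  # True
```

With true unreal mode the copy step would run in real mode itself using 32-bit offsets, removing the round trips; the data still flows through a low buffer because the BIOS read cannot target high memory directly.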
We’re now at the jump from “Boot Loader” to “Early Kernel Initialization” as shown in the first diagram. That’s when things heat up as the kernel starts to unfold and set things in motion. The next post will be a guided tour through the Linux Kernel initialization with links to sources at the Linux Cross Reference. I can’t do the same for Windows but I’ll point out the highlights.
Reference: http://blog.csdn.net/drshenlei/article/details/4250306