推荐下载:《Linux内核修炼之道》精华版之方法论
首先感谢国家。其次感谢上大的钟莉颖,让我知道了大学不仅有校花,还有校鸡,而且很多时候这两者其实没什么差别。最后感谢清华女刘静,让我深刻体会到了素质教育的重要性,让我感到有责任写写子系统的初始化。
各个子系统的初始化是内核整个初始化过程必然要完成的基本任务,这些任务按照固定的模式来处理,可以归纳为两个部分:内核选项的解析以及那些子系统入口(初始化)函数的调用。
内核选项
Linux允许用户传递内核配置选项给内核,内核在初始化过程中调用parse_args函数对这些选项进行解析,并调用相应的处理函数。
parse_args函数能够解析形如“变量名=值”的字符串,在模块加载时,它也会被调用来解析模块参数。
内核选项的使用格式同样为“变量名=值”,打开系统的grub文件,然后找到kernel行,比如:
kernel/boot/vmlinuz-2.6.18 root=/dev/sda1 ro splash=silent vga=0x314 pci=noacpi
其中的“pci=noacpi”等都表示内核选项。
内核选项不同于模块参数,模块参数通常在模块加载时通过“变量名=值”的形式指定,而不是内核启动时。如果希望在内核启动时使用模块参数,则必须添加模块名做为前缀,使用“模块名.参数=值”的形式,比如,使用下面的命令在加载usbcore时指定模块参数autosuspend的值为2。
$ modprobe usbcore autosuspend=2
若是在内核启动时指定,则必须使用下面的形式:
usbcore.autosuspend=2
从Documentation/kernel-parameters.txt文件里可以查询到某个子系统已经注册的内核选项,比如PCI子系统注册的内核选项为:
pci=option[,option...][PCI] various PCI subsystem options:
off[X86-32] don't probe for the PCI bus
bios[X86-32] force use of PCI BIOS, don't access
the hardware directly. Use this if your machine
has a non-standard PCI host bridge.
nobios[X86-32] disallow use of PCI BIOS, only direct
hardware access methods are allowed. Use this
if you experience crashes upon bootup and you
suspect they are caused by the BIOS.
conf1[X86-32] Force use of PCI Configuration
Mechanism 1.
conf2[X86-32] Force use of PCI Configuration
Mechanism 2.
nommconf[X86-32,X86_64] Disable use of MMCONFIG for PCI
Configuration
nomsi[MSI] If the PCI_MSI kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of MSI interrupts system-wide.
nosort[X86-32] Don't sort PCI devices according to
order given by the PCI BIOS. This sorting is
done to get a device order compatible with
older kernels.
biosirq[X86-32] Use PCI BIOS calls to get the interrupt
routing table. These calls are known to be buggy
on several machines and they hang the machine
when used, but on other computers it's the only
way to get the interrupt routing table. Try
this option if the kernel is unable to allocate
IRQs or discover secondary PCI buses on your
motherboard.
rom[X86-32] Assign address space to expansion ROMs.
Use with caution as certain devices share
address decoders between ROMs and other
resources.
irqmask=0xMMMM[X86-32] Set a bit mask of IRQs allowed to be
assigned automatically to PCI devices. You can
make the kernel exclude IRQs of your ISA cards
this way.
pirqaddr=0xAAAAA[X86-32] Specify the physical address
of the PIRQ table (normally generated
by the BIOS) if it is outside the
F0000h-100000h range.
lastbus=N[X86-32] Scan all buses thru bus #N. Can be
useful if the kernel is unable to find your
secondary buses and you want to tell it
explicitly which ones they are.
assign-busses[X86-32] Always assign all PCI bus
numbers ourselves, overriding
whatever the firmware may have done.
usepirqmask[X86-32] Honor the possible IRQ mask stored
in the BIOS $PIR table. This is needed on
some systems with broken BIOSes, notably
some HP Pavilion N5400 and Omnibook XE3
notebooks. This will have no effect if ACPI
IRQ routing is enabled.
noacpi[X86-32] Do not use ACPI for IRQ routing
or for PCI scanning.
routeirqDo IRQ routing for all PCI devices.
This is normally done in pci_enable_device(),
so this option is a temporary workaround
for broken drivers that don't call it.
firmware[ARM] Do not re-enumerate the bus but instead
just use the configuration from the
bootloader. This is currently used on
IXP2000 systems where the bus has to be
configured a certain way for adjunct CPUs.
noearly[X86] Don't do any early type 1 scanning.
This might help on some broken boards which
machine check when some devices' config space
is read. But various workarounds are disabled
and some IOMMU drivers will not work.
bfsortSort PCI devices into breadth-first order.
This sorting is done to get a device
order compatible with older (<= 2.4) kernels.
nobfsortDon't sort PCI devices into breadth-first order.
cbiosize=nn[KMG]The fixed amount of bus space which is
reserved for the CardBus bridge's IO window.
The default value is 256 bytes.
cbmemsize=nn[KMG]The fixed amount of bus space which is
reserved for the CardBus bridge's memory
window. The default value is 64 megabytes.
注册内核选项
就像我们不需要明白钟莉颖是如何走上校鸡的修炼之道,我们也不必理解parse_args函数的实现细节。但我们必须知道如何注册内核选项:模块参数使用module_param系列的宏注册,内核选项则使用__setup宏来注册。
__setup宏在include/linux/init.h文件中定义。
171 #define __setup(str, fn)/
172 __setup_param(str, fn, fn, 0)
__setup需要两个参数,其中str是内核选项的名字,fn是该内核选项关联的处理函数。__setup宏告诉内核,在启动时如果检测到内核选项str,则执行函数fn。str除了包括内核选项名字之外,必须以“=”字符结束。
不同的内核选项可以关联相同的处理函数,比如内核选项netdev和ether都关联了netdev_boot_setup函数。
除了__setup宏之外,还可以使用early_param宏注册内核选项。它们的使用方式相同,不同的是,early_param宏注册的内核选项必须要在其他内核选项之前被处理。
两次解析
相应于__setup宏和early_param宏两种注册形式,内核在初始化时,调用了两次parse_args函数进行解析。
parse_early_param();
parse_args("Booting kernel", static_command_line, __start___param,
__stop___param - __start___param,
&unknown_bootoption);
parse_args的第一次调用就在parse_early_param函数里面,为什么会出现两次调用parse_args的情况?这是因为内核选项又分成了两种,就像现实世界中的我们,一种是普普通通的,一种是有特权的,有特权的需要在普通选项之前进行处理。
现实生活中特权的定义好像很模糊,不同的人有不同的诠释,比如哈医大二院的纪委书记在接受央视的采访“老人住院费550万元”时如是说:“我们就是一所人民医院……就是一所贫下中农的医院,从来不用特权去索取自己身外的任何利益……我们不但没有多收钱还少收了。”
人生就是如此的复杂和奇怪。内核选项相对来说就要单纯得多,特权都是阳光下的,不会藏着掖着,直接使用early_param宏去声明,让你一眼就看出它是有特权的。使用early_param声明的那些选项就会首先由parse_early_param去解析。