sysvinit Source Code Analysis (Linux-init-process-analyse)

Exploring the init Process
Preface
Analysis of the init configuration file
Official init documentation
The init manual page
The /etc/inittab manual page
Detailed analysis of init
How is the init process started?
Analysis of the init process
Running init 1
Main flow
Helper functions
Running init 2
Running init 3
Main flow
Helper functions
Afterword
Contact
Appendix
Environment
Notes on inittab actions
Shutdown analysis
Shutdown flow
Shutdown source code
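Preface
init is an ordinary user-space process, yet it is the junction between kernel initialization and user-space initialization on a Unix system, and it is the ancestor of all other processes. Everything that happens before init runs is kernel initialization, and the last act of that phase is to execute /sbin/init. From the moment the init process starts, the system is in user-space initialization. I define system initialization as everything from power-on until the login screen appears; init splits that process in two. init does more than start user-space initialization, though: it plays an important role for the whole time the system is running. For example:
- While the system is running, a user with root privileges can run init again to switch to a different run level.
- The init process is responsible for adopting every orphaned process in the system.
- When someone at the console presses Ctrl-Alt-Del to reboot the system, it is ultimately init that handles the request.
- Suppose you want a daemon that keeps running for as long as the system is up: even if it stops for whatever reason (it hits an error and exits, or some user kills it), it should be restarted immediately, without anyone starting it by hand. You can add a line like the following to init's configuration file:
    myrun::ondemand:/home/wzhou/mydaemon
  Then /home/wzhou/mydaemon runs for as long as the system runs. Even if someone kills it, the init process starts it again shortly afterwards.
- And so on.
All of this depends on the init process.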
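The restart-on-exit behaviour described in the list above can be sketched in a few lines of C. This is only an illustration and not sysvinit's code: respawn_n is a hypothetical helper that restarts a command a fixed number of times so that the example terminates, whereas a real init keeps doing this for as long as the system is up and also rate-limits respawning.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of sysvinit): start argv[0], wait for it
 * to terminate, and start it again, at most max_runs times.
 * Returns how many times the child was started and reaped.
 */
int respawn_n(char *const argv[], int max_runs)
{
    int runs = 0;

    while (runs < max_runs) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            break;                  /* fork failed: give up */
        if (pid == 0) {
            execv(argv[0], argv);   /* child: become the command */
            _exit(127);             /* only reached if exec failed */
        }
        if (waitpid(pid, &status, 0) != pid)
            break;
        runs++;                     /* the child is gone: loop starts it again */
    }
    return runs;
}
```

The parent never cares why the child ended; any termination simply leads to another fork and exec, which is the essence of the behaviour the inittab line asks for.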

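Analysis of the init configuration file
The behaviour of the init process is controlled by its configuration file, /etc/inittab, which is analysed here.
Let us take the /etc/inittab of a real system as an example.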
    [wzhou@dcmp10 ~]$ cat /etc/inittab
    # inittab This file describes how the INIT process should set up
    # the system in a certain run-level.
    # Author: Miquel van Smoorenburg, [email protected]
    # Modified for RHS Linux by Marc Ewing and Donnie Barnes

    # Default runlevel. The runlevels used by RHS are:
    # 0 - halt (Do NOT set initdefault to this)
    # 1 - Single user mode
    # 2 - Multiuser, without NFS (The same as 3, if you do not have networking)
    # 3 - Full multiuser mode
    # 4 - unused
    # 5 - X11
    # 6 - reboot (Do NOT set initdefault to this)
    id:5:initdefault:

    # System initialization.
    si::sysinit:/etc/rc.d/rc.sysinit
    l0:0:wait:/etc/rc.d/rc 0
    l1:1:wait:/etc/rc.d/rc 1
    l2:2:wait:/etc/rc.d/rc 2
    l3:3:wait:/etc/rc.d/rc 3
    l4:4:wait:/etc/rc.d/rc 4
    l5:5:wait:/etc/rc.d/rc 5
    l6:6:wait:/etc/rc.d/rc 6

    # Trap CTRL-ALT-DELETE
    ca::ctrlaltdel:/sbin/shutdown -t3 -r now

    # When our UPS tells us power has failed, assume we have a few minutes
    # of power left. Schedule a shutdown for 2 minutes from now.
    # This does, of course, assume you have powerd installed and your
    # UPS connected and working correctly.
    pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

    # If power was restored before the shutdown kicked in, cancel it.
    pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"

    # Run gettys in standard runlevels
    1:2345:respawn:/sbin/mingetty tty1
    2:2345:respawn:/sbin/mingetty tty2
    3:2345:respawn:/sbin/mingetty tty3
    4:2345:respawn:/sbin/mingetty tty4
    5:2345:respawn:/sbin/mingetty tty5
    6:2345:respawn:/sbin/mingetty tty6

    # Run xdm in runlevel 5
    x:5:respawn:/etc/X11/prefdm -nodaemon
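    id:5:initdefault:
This line says that after boot the system runs at run level 5, i.e. full multiuser mode with X Window.
    si::sysinit:/etc/rc.d/rc.sysinit
sysinit marks user-space system startup: whatever the run level, the script /etc/rc.d/rc.sysinit is executed.
To trace the kernel-side initialization of the operating system, start from start_kernel() in init/main.c; to trace the user-space startup, start from the /etc/rc.d/rc.sysinit script.
    l0:0:wait:/etc/rc.d/rc 0    if the run level is 0, run the /etc/rc.d/rc script with argument 0
    l1:1:wait:/etc/rc.d/rc 1    if the run level is 1, run the /etc/rc.d/rc script with argument 1
    l2:2:wait:/etc/rc.d/rc 2    if the run level is 2, run the /etc/rc.d/rc script with argument 2
    l3:3:wait:/etc/rc.d/rc 3    if the run level is 3, run the /etc/rc.d/rc script with argument 3
    l4:4:wait:/etc/rc.d/rc 4    if the run level is 4, run the /etc/rc.d/rc script with argument 4
    l5:5:wait:/etc/rc.d/rc 5    if the run level is 5, run the /etc/rc.d/rc script with argument 5
    l6:6:wait:/etc/rc.d/rc 6    if the run level is 6, run the /etc/rc.d/rc script with argument 6
Clearly /etc/rc.d/rc is another important system initialization script. The wait action means that the init process must wait for the process started by this line to finish before it starts any other action.
    ca::ctrlaltdel:/sbin/shutdown -t3 -r now
This line says that, at any run level, when Ctrl+Alt+Del is pressed on the console the following command is run:
    /sbin/shutdown -t3 -r now
In other words, init watches for the Ctrl+Alt+Del event and runs this command once it arrives. Starting immediately (now), shutdown first sends a warning to all processes in the system, waits 3 seconds, then kills them and reboots the system.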

    # When our UPS tells us power has failed, assume we have a few minutes
    # of power left. Schedule a shutdown for 2 minutes from now.
    # This does, of course, assume you have powerd installed and your
    # UPS connected and working correctly.
    pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"
The comment above explains what this line does. Like the previous entry, it applies at every run level; all that matters is whether a "powerfail" event occurs.
    # If power was restored before the shutdown kicked in, cancel it.
    pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"
At run levels 1, 2, 3, 4 and 5, when the "powerokwait" event occurs, the command /sbin/shutdown -c "Power Restored; Shutdown Cancelled" is run, which cancels the shutdown that was scheduled.

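    # Run gettys in standard runlevels
    1:2345:respawn:/sbin/mingetty tty1
    2:2345:respawn:/sbin/mingetty tty2
    3:2345:respawn:/sbin/mingetty tty3
    4:2345:respawn:/sbin/mingetty tty4
    5:2345:respawn:/sbin/mingetty tty5
    6:2345:respawn:/sbin/mingetty tty6
These six lines tell the init process to run the program /sbin/mingetty, each time with a different argument, at run levels 2, 3, 4 and 5. Their purpose is to start a text login prompt on each of the terminals tty1 to tty6.
In the figure above, the six virtual terminals that were started are outlined in blue. I am logged in as root on tty1, so that terminal shows "login -- root"; nobody is logged in on the other five virtual terminals, so mingetty is still waiting there.
    # Run xdm in runlevel 5
    x:5:respawn:/etc/X11/prefdm -nodaemon
This line says that at run level 5 the script /etc/X11/prefdm -nodaemon is to be run, which starts X Window and brings up the GUI.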
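Every entry discussed above has the same four colon-separated fields, id:runlevels:action:process. As a rough sketch of how such a line can be taken apart, the following C code splits a line at its first three colons. The structure inittab_entry, its field sizes, and the function names are assumptions made for this illustration; sysvinit's own parser and its CHILD structure are different.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical record; not sysvinit's CHILD structure. */
struct inittab_entry {
    char id[8];
    char runlevels[16];
    char action[32];
    char process[128];
};

/* Copy len bytes into dst (capacity cap) and terminate; -1 if it does not fit. */
static int copy_field(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/*
 * Split one "id:runlevels:action:process" line. Only the first three ':'
 * act as separators, so the process field may itself contain ':'.
 * Returns 0 on success, -1 for comment, empty or malformed lines.
 */
int parse_inittab_line(const char *line, struct inittab_entry *e)
{
    const char *c1, *c2, *c3;

    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return -1;
    if ((c1 = strchr(line, ':')) == NULL ||
        (c2 = strchr(c1 + 1, ':')) == NULL ||
        (c3 = strchr(c2 + 1, ':')) == NULL)
        return -1;

    if (copy_field(e->id, sizeof e->id, line, (size_t)(c1 - line)) ||
        copy_field(e->runlevels, sizeof e->runlevels, c1 + 1, (size_t)(c2 - c1 - 1)) ||
        copy_field(e->action, sizeof e->action, c2 + 1, (size_t)(c3 - c2 - 1)) ||
        copy_field(e->process, sizeof e->process, c3 + 1, strcspn(c3 + 1, "\n")))
        return -1;
    return 0;
}
```

For "l5:5:wait:/etc/rc.d/rc 5" this yields id "l5", runlevels "5", action "wait" and process "/etc/rc.d/rc 5"; for "si::sysinit:/etc/rc.d/rc.sysinit" the runlevels field is simply empty.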
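The above is a static reading of the inittab file. What follows describes how the init process behaves at run time according to it.
- From "initdefault" the init process learns that the system is to run at run level 5.
- The init process first runs the action marked "sysinit", i.e. the /etc/rc.d/rc.sysinit script.
- It then runs the entry whose identifier is "l5":
    l5:5:wait:/etc/rc.d/rc 5
  Because the action on this line is "wait", the init process must wait for "/etc/rc.d/rc 5" to finish before it carries on with the other entries in inittab.
- Next it runs the following six entries:
    1:2345:respawn:/sbin/mingetty tty1
    2:2345:respawn:/sbin/mingetty tty2
    3:2345:respawn:/sbin/mingetty tty3
    4:2345:respawn:/sbin/mingetty tty4
    5:2345:respawn:/sbin/mingetty tty5
    6:2345:respawn:/sbin/mingetty tty6
  The run-level field of these six lines tells the init process to run the commands "/sbin/mingetty tty1" through "/sbin/mingetty tty6" at run levels 2, 3, 4 and 5. The "respawn" action means that whenever the process started as /sbin/mingetty stops running (whether it exits by itself or crashes), the init process is responsible for starting it again. These lines are the reason why, after Linux has booted, Alt-F1 through Alt-F6 switch to the corresponding terminals. Likewise, when you log in on a terminal, say tty1, and then type exit on the command line, the login prompt reappears on that terminal: that is the init process acting on "respawn". When you type exit, the process that was started as /sbin/mingetty exits; the init process notices this and immediately runs "/sbin/mingetty tty1" again, so the login prompt reappears on tty1.
- Finally it runs /etc/X11/prefdm -nodaemon, which starts the X Window login.
- The following entries are not executed at this point, but the init process records them and runs them only when the corresponding event occurs:
    ca::ctrlaltdel:/sbin/shutdown -t3 -r now
    pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"
    pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"
  For example, when Ctrl-Alt-Del is pressed, the init process runs "/sbin/shutdown -t3 -r now" to reboot the machine; when the UPS reports a power failure and the power is about to go, it runs /sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"; and when the UPS reports that power has been restored, it runs /sbin/shutdown -c "Power Restored; Shutdown Cancelled". How does the init process learn of these events, that is, how does it know that Ctrl-Alt-Del was pressed or that the UPS reported a power failure or recovery? In every case through the Unix signal mechanism. The init process only has to handle the corresponding signals correctly.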
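The run-time order just described (sysinit first, then the entries matching the run level in file order, with event-driven entries held back) can be sketched as a small selection function. This is a simplified model under stated assumptions: boot_entry and boot_order are hypothetical names, an empty runlevels field is assumed to match every run level, and boot/bootwait entries are not given their own early pass as a real init would.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, already-parsed inittab entry (the process field is omitted). */
struct boot_entry {
    const char *id;
    const char *runlevels;
    const char *action;
};

/* Actions that init only records and runs later, when the event occurs. */
static int is_event_driven(const char *action)
{
    static const char *const events[] = {
        "ctrlaltdel", "kbrequest", "powerwait", "powerfail",
        "powerokwait", "powerfailnow", NULL
    };
    for (int i = 0; events[i] != NULL; i++)
        if (strcmp(action, events[i]) == 0)
            return 1;
    return 0;
}

/*
 * Fill out[] with the ids of the entries that would be started at boot
 * for the given run level, in the order they would be started.
 * Returns the number of ids stored (at most max).
 */
int boot_order(const struct boot_entry *tab, int n, char level,
               const char **out, int max)
{
    int count = 0;

    /* Pass 1: sysinit entries run first, whatever the run level. */
    for (int i = 0; i < n && count < max; i++)
        if (strcmp(tab[i].action, "sysinit") == 0)
            out[count++] = tab[i].id;

    /* Pass 2: entries for this run level, in file order. */
    for (int i = 0; i < n && count < max; i++) {
        const char *a = tab[i].action;

        if (!strcmp(a, "sysinit") || !strcmp(a, "initdefault") ||
            !strcmp(a, "off") || is_event_driven(a))
            continue;
        if (tab[i].runlevels[0] == '\0' || strchr(tab[i].runlevels, level))
            out[count++] = tab[i].id;
    }
    return count;
}
```

Fed a shortened version of the table above at run level '5', the function returns si, then l5, then the getty entries, then x, and leaves ca out because it waits for its event.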
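Official init documentation
The init package comes with its own manual pages, man init and man inittab. Reading them carefully helps a great deal in understanding the init process.
The init manual page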
NAME
init, telinit - process control initialization
SYNOPSIS
/sbin/init [ -a ] [ -s ] [ -b ] [ -z xxx ] [ 0123456Ss ]
/sbin/telinit [ -t sec ] [ 0123456sSQqabcUu ]
DESCRIPTION
Init
Init is the parent of all processes. Its primary role is to create processes
from a script stored in the file /etc/inittab (see inittab(5)). This file
usually has entries which cause init to spawn gettys on each line that users
can log in. It also controls autonomous processes required by any particular
system.
RUNLEVELS
A runlevel is a software configuration of the system which allows only a
selected group of processes to exist. The processes spawned by init for each
of these runlevels are defined in the /etc/inittab file. Init can be in one
of eight runlevels: 0–6 and S or s. The runlevel is changed by having a
privileged user run telinit, which sends appropriate signals to init, telling it
which runlevel to change to.
Runlevels 0, 1, and 6 are reserved. Runlevel 0 is used to halt the system,
runlevel 6 is used to reboot the system, and runlevel 1 is used to get the
system down into single user mode. Runlevel S is not really meant to be used
directly, but more for the scripts that are executed when entering runlevel 1.
For more information on this, see the manpages for shutdown(8) and inittab(5).
Runlevels 7-9 are also valid, though not really documented. This is because
“traditional” Unix variants don’t use them. In case you’re curious, runlevels
S and s are in fact the same. Internally they are aliases for the same
runlevel.
BOOTING
After init is invoked as the last step of the kernel boot sequence, it looks
for the file /etc/inittab to see if there is an entry of the type initdefault
(see inittab(5)). The initdefault entry determines the initial runlevel of the
system. If there is no such entry (or no /etc/inittab at all), a runlevel
must be entered at the system console.
Runlevel S or s bring the system to single user mode and do not require an
/etc/inittab file. In single user mode, a root shell is opened on
/dev/console.
When entering single user mode, init initializes the consoles stty settings to
sane values. Clocal mode is set. Hardware speed and handshaking are not
changed.
When entering a multi-user mode for the first time, init performs the boot and
bootwait entries to allow file systems to be mounted before users can log in.
Then all entries matching the runlevel are processed.
When starting a new process, init first checks whether the file
/etc/initscript exists. If it does, it uses this script to start the process.
Each time a child terminates, init records the fact and the reason it died in
/var/run/utmp and /var/log/wtmp, provided that these files exist.
CHANGING RUNLEVELS
After it has spawned all of the processes specified, init waits for one of its
descendant processes to die, a powerfail signal, or until it is signaled by
telinit to change the system’s runlevel. When one of the above three
conditions occurs, it re-examines the /etc/inittab file. New entries can be added
to this file at any time. However, init still waits for one of the above
three conditions to occur. To provide for an instantaneous response, the
telinit Q or q command can wake up init to re-examine the /etc/inittab file.
If init is not in single user mode and receives a powerfail signal (SIGPWR),
it reads the file /etc/powerstatus. It then starts a command based on the
contents of this file:
F(AIL) Power is failing, UPS is providing the power. Execute the powerwait and
powerfail entries.
O(K) The power has been restored, execute the powerokwait entries.
L(OW) The power is failing and the UPS has a low battery. Execute the
powerfailnow entries.
If /etc/powerstatus doesn’t exist or contains anything else then the letters
F, O or L, init will behave as if it has read the letter F.
Usage of SIGPWR and /etc/powerstatus is discouraged. Someone wanting to
interact with init should use the /dev/initctl control channel - see the source
code of the sysvinit package for more documentation about this.
When init is requested to change the runlevel, it sends the warning signal
SIGTERM to all processes that are undefined in the new runlevel. It then
waits 5 seconds before forcibly terminating these processes via the SIGKILL
signal. Note that init assumes that all these processes (and their
descendants) remain in the same process group which init originally created for
them. If any process changes its process group affiliation it will not
receive these signals. Such processes need to be terminated separately.
TELINIT
/sbin/telinit is linked to /sbin/init. It takes a one-character argument and
signals init to perform the appropriate action. The following arguments serve
as directives to telinit:
0,1,2,3,4,5 or 6
tell init to switch to the specified run level.
a,b,c tell init to process only those /etc/inittab file entries having
runlevel a,b or c.
Q or q tell init to re-examine the /etc/inittab file.
S or s tell init to switch to single user mode.
U or u tell init to re-execute itself (preserving the state). No re-examining
of /etc/inittab file happens. Run level should be one of Ss12345,
otherwise request would be silently ignored.
telinit can also tell init how long it should wait between sending processes
the SIGTERM and SIGKILL signals. The default is 5 seconds, but this can be
changed with the -t sec option.
telinit can be invoked only by users with appropriate privileges.
The init binary checks if it is init or telinit by looking at its process id;
the real init’s process id is always 1. From this it follows that instead of
calling telinit one can also just use init instead as a shortcut.
ENVIRONMENT
Init sets the following environment variables for all its children:
PATH /usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin
INIT_VERSION
As the name says. Useful to determine if a script runs directly from
init.
RUNLEVEL
The current system runlevel.
PREVLEVEL
The previous runlevel (useful after a runlevel switch).
CONSOLE
The system console. This is really inherited from the kernel; however
if it is not set init will set it to /dev/console by default.
BOOTFLAGS
It is possible to pass a number of flags to init from the boot monitor (eg.
LILO). Init accepts the following flags:
-s, S, single
Single user mode boot. In this mode /etc/inittab is examined and the
bootup rc scripts are usually run before the single user mode shell is
started.
1-5 Runlevel to boot into.
-b, emergency
Boot directly into a single user shell without running any other startup
scripts.
-a, auto
The LILO boot loader adds the word “auto” to the command line if it
booted the kernel with the default command line (without user
intervention). If this is found init sets the “AUTOBOOT” environment variable to
“yes”. Note that you cannot use this for any security measures - of
course the user could specify “auto” or -a on the command line manually.
-z xxx
The argument to -z is ignored. You can use this to expand the command
line a bit, so that it takes some more space on the stack. Init can then
manipulate the command line so that ps(1) shows the current runlevel.
INTERFACE
Init listens on a fifo in /dev, /dev/initctl, for messages. Telinit uses this
to communicate with init. The interface is not very well documented or
finished. Those interested should study the initreq.h file in the src/
subdirectory of the init source code tar archive.
SIGNALS
Init reacts to several signals:
SIGHUP
Has the same effect as telinit q.
SIGUSR1
On receipt of this signals, init closes and re-opens its control fifo,
/dev/initctl. Useful for bootscripts when /dev is remounted.
SIGINT
Normally the kernel sends this signal to init when CTRL-ALT-DEL is
pressed. It activates the ctrlaltdel action.
SIGWINCH
The kernel sends this signal when the KeyboardSignal key is hit. It
activates the kbrequest action.
CONFORMING TO
Init is compatible with the System V init. It works closely together with the
scripts in the directories /etc/init.d and /etc/rc{runlevel}.d. If your
system uses this convention, there should be a README file in the directory
/etc/init.d explaining how these scripts work.
FILES
/etc/inittab
/etc/initscript
/dev/console
/var/run/utmp
/var/log/wtmp
/dev/initctl
WARNINGS
Init assumes that processes and descendants of processes remain in the same
process group which was originally created for them. If the processes change
their group, init can’t kill them and you may end up with two processes
reading from one terminal line.
DIAGNOSTICS
If init finds that it is continuously respawning an entry more than 10 times
in 2 minutes, it will assume that there is an error in the command string,
generate an error message on the system console, and refuse to respawn this
entry until either 5 minutes has elapsed or it receives a signal. This
prevents it from eating up system resources when someone makes a typographical
error in the /etc/inittab file or the program for the entry is removed.
AUTHOR
Miquel van Smoorenburg ([email protected]), initial manual page by Michael
Haardt ([email protected]).
SEE ALSO
getty(1), login(1), sh(1), runlevel(8), shutdown(8), kill(1), inittab(5),
initscript(5), utmp(5)
18 April 2003 INIT(8)
The /etc/inittab manual page
INITTAB(5) Linux System Administrator’s Manual INITTAB(5)
NAME
inittab - format of the inittab file used by the sysv-compatible init process
DESCRIPTION
The inittab file describes which processes are started at bootup and during normal operation
(e.g. /etc/init.d/boot, /etc/init.d/rc, gettys…). Init(8) distinguishes multiple runlevels,
each of which can have its own set of processes that are started. Valid runlevels are 0-6 plus
A, B, and C for ondemand entries. An entry in the inittab file has the following format:
id:runlevels:action:process
Lines beginning with ‘#’ are ignored.
id is a unique sequence of 1-4 characters which identifies an entry in inittab (for
versions of sysvinit compiled with the old libc5 (< 5.2.18) or a.out libraries the limit is
2 characters).
Note: traditionally, for getty and other login processes, the value of the id field is
kept the same as the suffix of the corresponding tty, e.g. 1 for tty1. Some ancient
login accounting programs might expect this, though I can’t think of any.
runlevels
lists the runlevels for which the specified action should be taken.
action describes which action should be taken.
process
specifies the process to be executed. If the process field starts with a ‘+’ character,
init will not do utmp and wtmp accounting for that process. This is needed for gettys
that insist on doing their own utmp/wtmp housekeeping. This is also a historic bug.
The runlevels field may contain multiple characters for different runlevels. For example, 123
specifies that the process should be started in runlevels 1, 2, and 3. The runlevels for
ondemand entries may contain an A, B, or C. The runlevels field of sysinit, boot, and bootwait
entries are ignored.
When the system runlevel is changed, any running processes that are not specified for the new
runlevel are killed, first with SIGTERM, then with SIGKILL.
Valid actions for the action field are:
respawn
The process will be restarted whenever it terminates (e.g. getty).
wait The process will be started once when the specified runlevel is entered and init will
wait for its termination.
once The process will be executed once when the specified runlevel is entered.
boot The process will be executed during system boot. The runlevels field is ignored.
bootwait
The process will be executed during system boot, while init waits for its termination
(e.g. /etc/rc). The runlevels field is ignored.
off This does nothing.
ondemand [1]
A process marked with an ondemand runlevel will be executed whenever the specified
ondemand runlevel is called. However, no runlevel change will occur (ondemand runlevels are
‘a’, ‘b’, and ‘c’).
initdefault
An initdefault entry specifies the runlevel which should be entered after system boot.
If none exists, init will ask for a runlevel on the console. The process field is
ignored.
sysinit
The process will be executed during system boot. It will be executed before any boot or
bootwait entries. The runlevels field is ignored.
powerwait
The process will be executed when the power goes down. Init is usually informed about
this by a process talking to a UPS connected to the computer. Init will wait for the
process to finish before continuing.
powerfail
As for powerwait, except that init does not wait for the process’s completion.
[1] Note on ondemand: it differs from respawn in that it is independent of the run level.
powerokwait
This process will be executed as soon as init is informed that the power has been
restored.
powerfailnow
This process will be executed when init is told that the battery of the external UPS is
almost empty and the power is failing (provided that the external UPS and the monitoring
process are able to detect this condition).
ctrlaltdel
The process will be executed when init receives the SIGINT signal. This means that
someone on the system console has pressed the CTRL-ALT-DEL key combination. Typically
one wants to execute some sort of shutdown either to get into single-user level or to
reboot the machine.
kbrequest
The process will be executed when init receives a signal from the keyboard handler that
a special key combination was pressed on the console keyboard.
The documentation for this function is not complete yet; more documentation can be found
in the kbd-x.xx packages (most recent was kbd-0.94 at the time of this writing).
Basically you want to map some keyboard combination to the “KeyboardSignal” action. For
example, to map Alt-Uparrow for this purpose use the following in your keymaps file:
alt keycode 103 = KeyboardSignal
EXAMPLES
This is an example of a inittab which resembles the old Linux inittab:
    # inittab for linux
    id:1:initdefault:
    rc::bootwait:/etc/rc
    1:1:respawn:/etc/getty 9600 tty1
    2:1:respawn:/etc/getty 9600 tty2
    3:1:respawn:/etc/getty 9600 tty3
    4:1:respawn:/etc/getty 9600 tty4
This inittab file executes /etc/rc during boot and starts gettys on tty1-tty4.
A more elaborate inittab with different runlevels (see the comments inside):
    # Level to run in
    id:2:initdefault:

    # Boot-time system configuration/initialization script.
    si::sysinit:/etc/init.d/rcS

    # What to do in single-user mode.
    ~:S:wait:/sbin/sulogin

    # /etc/init.d executes the S and K scripts upon change
    # of runlevel.
    #
    # Runlevel 0 is halt.
    # Runlevel 1 is single-user.
    # Runlevels 2-5 are multi-user.
    # Runlevel 6 is reboot.
    l0:0:wait:/etc/init.d/rc 0
    l1:1:wait:/etc/init.d/rc 1
    l2:2:wait:/etc/init.d/rc 2
    l3:3:wait:/etc/init.d/rc 3
    l4:4:wait:/etc/init.d/rc 4
    l5:5:wait:/etc/init.d/rc 5
    l6:6:wait:/etc/init.d/rc 6

    # What to do at the "3 finger salute".
    ca::ctrlaltdel:/sbin/shutdown -t1 -h now

    # Runlevel 2,3: getty on virtual consoles
    # Runlevel 3: getty on terminal (ttyS0) and modem (ttyS1)
    1:23:respawn:/sbin/getty tty1 VC linux
    2:23:respawn:/sbin/getty tty2 VC linux
    3:23:respawn:/sbin/getty tty3 VC linux
    4:23:respawn:/sbin/getty tty4 VC linux
    S0:3:respawn:/sbin/getty -L 9600 ttyS0 vt320
    S1:3:respawn:/sbin/mgetty -x0 -D ttyS1
FILES
/etc/inittab
AUTHOR
Init was written by Miquel van Smoorenburg ([email protected]). This manual page was written
by Sebastian Lederer ([email protected]) and modified by Michael Haardt
([email protected]).
SEE ALSO
init(8), telinit(8)
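Detailed analysis of init
How is the init process started?
The init process is the first user-space process on a Linux system, so it has no parent in the usual sense: it is started directly by the Linux kernel.
start_kernel() is the point where the kernel hands over from assembly to C. The code that runs before it is largely assembly and does the most basic initialization and environment setup: loading the kernel image into memory and decompressing it (kernels today are generally compressed), the most basic CPU initialization, and preparing the environment that C code needs in order to run (C code has certain requirements, such as a stack being set up). A loose analogy is that start_kernel() is to the kernel what main() is to a C program. To an application programmer main() is the entry point, but the real entry point of the program is wrapped up in the C library and linked into the program by the linker. One of its jobs is to prepare the environment for main(): argc, argv and so on do not appear out of nowhere, they are prepared by the code that runs before main() is called.
In start_kernel() Linux completes the kernel-side initialization of the whole system. The last step of kernel initialization is to start the init process, the ancestor of all processes.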
Linux-2.6.20/init/main.c
    /* Entry point of the kernel's C code; what runs before it is mostly assembly. */
    asmlinkage void __init start_kernel(void)
    {
        char * command_line;
        extern struct kernel_param __start___param[], __stop___param[];

        smp_setup_processor_id();

        /*
         * Need to run as early as possible, to initialize the
         * lockdep hash:
         */
        unwind_init();
        lockdep_init();

        local_irq_disable();
        early_boot_irqs_off();
        early_init_irq_lock_class();

        /*
         * Interrupts are still disabled. Do necessary setups, then
         * enable them
         */
        lock_kernel();
        boot_cpu_init();
        page_address_init();
        printk(KERN_NOTICE);
        printk(linux_banner);
        /* ... */
        cpuset_init();
        taskstats_init_early();
        delayacct_init();

        check_bugs();

        acpi_early_init(); /* before LAPIC and SMP init */

        /* Do the rest non-__init'ed, we're now alive */
        rest_init();    /* the closing step of kernel initialization */
    }
    static void noinline rest_init(void)
        __releases(kernel_lock)
    {
        /*
         * Create a kernel thread, which is really a kernel-side process:
         * the Linux kernel has no separate thread concept of the Windows NT
         * kind and essentially only supports processes. The init named here
         * is just a function (shown below); do not confuse it with the
         * init process.
         */
        kernel_thread(init, NULL, CLONE_FS | CLONE_SIGHAND);
        numa_default_policy();
        unlock_kernel();

        /*
         * The boot idle thread must execute schedule()
         * at least one to get things moving:
         */
        preempt_enable_no_resched();
        schedule();
        preempt_disable();

        /* Call into cpu_idle with preempt disabled */
        cpu_idle();
    }
    /* The kernel thread created above runs this function; its final act is to start the init process. */
    static int init(void * unused)
    {
        lock_kernel();
        /*
         * init can run on any cpu.
         */
        set_cpus_allowed(current, CPU_MASK_ALL);
        /*
         * Tell the world that we're going to be the grim
         * reaper of innocent orphaned children.
         *
         * We don't want people to have to make incorrect
         * assumptions about where in the task array this
         * can be found.
         */
        init_pid_ns.child_reaper = current;

        cad_pid = task_pid(current);

        smp_prepare_cpus(max_cpus);

        do_pre_smp_initcalls();

        smp_init();
        sched_init_smp();

        cpuset_init_smp();

        do_basic_setup();

        /*
         * check if there is an early userspace init. If yes, let it do all
         * the work
         */

        if (!ramdisk_execute_command)
            ramdisk_execute_command = "/init";

        if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
            ramdisk_execute_command = NULL;
            prepare_namespace();
        }

        /*
         * Ok, we have completed the initial bootup, and
         * we're essentially up and running. Get rid of the
         * initmem segments and start the user-mode stuff...
         */
        free_initmem();
        unlock_kernel();
        mark_rodata_ro();
        system_state = SYSTEM_RUNNING;
        numa_default_policy();

        if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
            printk(KERN_WARNING "Warning: unable to open an initial console.\n");

        (void) sys_dup(0);
        (void) sys_dup(0);

        if (ramdisk_execute_command) {
            run_init_process(ramdisk_execute_command);
            printk(KERN_WARNING "Failed to execute %s\n",
                    ramdisk_execute_command);
        }

        /*
         * We try each of these until one succeeds.
         *
         * The Bourne shell can be used instead of init if we are
         * trying to recover a really broken machine.
         */
        if (execute_command) {
            run_init_process(execute_command);
            printk(KERN_WARNING "Failed to execute %s. Attempting "
                    "defaults...\n", execute_command);
        }
        /*
         * run_init_process() uses inline assembly to issue a sys_execve()
         * call much as user-space code would. Its argument is the name of
         * the executable to run, i.e. the on-disk file of the init process.
         */
        run_init_process("/sbin/init");
        run_init_process("/etc/init");
        run_init_process("/bin/init");
        run_init_process("/bin/sh");

        panic("No init found. Try passing init= option to kernel.");
    }
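run_init_process() runs the init program through execve(). If an init= boot parameter was given (for example init=/home/wzhou/init), that program is tried first. Otherwise, or if it cannot be executed, the kernel tries "/sbin/init", then "/etc/init", then "/bin/init", and finally "/bin/sh" as a last resort; in other words the init executable may live in any of the three directories searched by the code above. If every attempt fails, the kernel panics and suggests passing init= on the kernel command line. This is the boundary where kernel initialization ends and user-space initialization begins.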
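The "try each candidate until one succeeds" pattern relies on the fact that a successful exec never returns. A user-space sketch of the same idea follows; run_first_available is a hypothetical helper written for this illustration, and it forks first only so that the caller survives and can observe the result, which the kernel of course does not do.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: in a child process, try to exec each path in the
 * NULL-terminated list in order. execv() returns only on failure, so
 * falling out of the loop means no candidate could be executed.
 * Returns the child's exit status, or -1 on error.
 */
int run_first_available(const char *const paths[], char *argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (int i = 0; paths[i] != NULL; i++) {
            argv[0] = (char *)paths[i];
            execv(paths[i], argv);  /* on success, never returns */
        }
        _exit(127);                 /* user-space stand-in for the kernel's panic() */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With the candidate list {"/nonexistent/init", "/bin/sh"} the first exec fails and the second takes over, just as the kernel falls through from /sbin/init to the later candidates.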
    static void run_init_process(char *init_filename)
    {
        argv_init[0] = init_filename;
        kernel_execve(init_filename, argv_init, envp_init);
    }
Linux-2.6.20/arch/i386/kernel/sys_i386.c
    /*
     * Do a system call from kernel instead of calling sys_execve so we
     * end up with proper pt_regs.
     */
    /* Builds the sys_execve system call. */
    int kernel_execve(const char *filename, char *const argv[], char *const envp[])
    {
        long __res;
        asm volatile ("push %%ebx ; movl %2,%%ebx ; int $0x80 ; pop %%ebx"
        : "=a" (__res)
        : "0" (__NR_execve),"ri" (filename),"c" (argv), "d" (envp) : "memory");
        return __res;
    }
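Analysis of the init process
The init code as a whole is hard to read. That is not because the work the init process has to do is so complicated; in my view, most of the complexity is self-inflicted by the designer.

init is normally executed in one of three ways:
  1. During boot, at the very end of kernel initialization, the kernel runs init as the first user-space program (it becomes the common ancestor of all user-space processes). It then performs the user-space initialization of the system according to the /etc/inittab configuration file.
  2. While the system is running, the root user can run the init command to switch the system to a different run level. For example, if the current run level is 3 (full multiuser mode on the console) and root wants to do maintenance, he can run:
        init 1    switch to single user mode, somewhat like Safe Mode on Windows
     The init command started by the user does not carry out the run-level switch itself. It only packs the command into a request and passes it through a pipe to the init that runs as a daemon.
  3. Once the system is up, init runs as a daemon. Its first job is to supervise the execution of the commands listed in /etc/inittab; its second is to receive, through the pipe, the run-level switch requests sent in case 2 and process them.

The designer merged these three functions into one and thoroughly mixed up the logic, which makes the code hard to read. I do not presume to question the author's skill; I just think this may be a trait of some Linux developers. In the same way, the father of Linux firmly refuses to bring kernel-level debugger support into the official kernel he controls. He has his reasons, but I suspect many Linux kernel hackers do not share them and can only accept this "trait" with resignation.
I call the three states of the init process "init 1", "init 2" and "init 3", corresponding to the three cases listed above. Discussing all three at once easily turns into a tangle, so below I describe init role by role, even though there is only one executable. Normally there is one init process, there are two while a run level is being switched with init, and there are never three.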
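The hand-off between the init command typed by root and the init daemon can be pictured as one process writing a small fixed-size request into a named pipe and another reading it. The sketch below illustrates that idea only: the structure demo_request, the magic value and all function names are made up for this example, and the real request layout used on /dev/initctl is the one defined in initreq.h in the sysvinit sources.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_MAGIC 0x1234           /* hypothetical marker, not sysvinit's */

/* Hypothetical, simplified request record. */
struct demo_request {
    int  magic;
    char runlevel;
};

/* Role of "init 2": pack the wish into a request and write it to the FIFO. */
int send_request(const char *fifo, char runlevel)
{
    struct demo_request req;
    int fd;
    ssize_t n;

    memset(&req, 0, sizeof req);
    req.magic = DEMO_MAGIC;
    req.runlevel = runlevel;
    if ((fd = open(fifo, O_WRONLY)) < 0)
        return -1;
    n = write(fd, &req, sizeof req);
    close(fd);
    return n == (ssize_t)sizeof req ? 0 : -1;
}

/* Role of "init 3": read one request; returns the run level or -1. */
int receive_request(const char *fifo)
{
    struct demo_request req;
    int fd;
    ssize_t n;

    if ((fd = open(fifo, O_RDONLY)) < 0)
        return -1;
    n = read(fd, &req, sizeof req);
    close(fd);
    if (n != (ssize_t)sizeof req || req.magic != DEMO_MAGIC)
        return -1;
    return req.runlevel;
}

/* Create the FIFO, let a child play the sender, and read the request here. */
int demo_switch(const char *fifo, char runlevel)
{
    pid_t pid;
    int got;

    unlink(fifo);
    if (mkfifo(fifo, 0600) != 0)
        return -1;
    if ((pid = fork()) < 0) {
        unlink(fifo);
        return -1;
    }
    if (pid == 0)
        _exit(send_request(fifo, runlevel) == 0 ? 0 : 1);
    got = receive_request(fifo);
    waitpid(pid, NULL, 0);
    unlink(fifo);
    return got;
}
```

Calling demo_switch("/tmp/initctl_demo.fifo", '1') returns '1': the sender exits as soon as the request is written, and only the long-running reader would go on to act on it.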
Running init 1
Main flow
init 1 is started by the kernel without any command-line arguments, i.e. /sbin/init is executed directly.
    2594 /*
    2595  * Main entry for init and telinit.
    2596  */
    2597 int main(int argc, char **argv)    /* here argc is 1 and argv[0] = "/sbin/init" */
    2598 {
    2599     char *p;
    2600     int f;
    2601     int isinit;
    2602
    2603     /* Get my own name */
    2604     if ((p = strrchr(argv[0], '/')) != NULL)    /* argv[0] = /sbin/init */
    2605         p++;                                     /* p points to "init" */
    2606     else
    2607         p = argv[0];
    2608     umask(022);
    2609
    2610     /* Quick check */
    2611     if (geteuid() != 0) {    /* init must run with root privileges; started by the kernel, it has them */
    2612         fprintf(stderr, "%s: must be superuser.\n", p);
    2613         exit(1);
    2614     }
    2615
    2616     /*
    2617      * Is this telinit or init ?
    2618      */
    2619     isinit = (getpid() == 1);    /* the init started by the kernel has PID 1, so isinit is true */
    2620     for (f = 1; f < argc; f++) {    /* init 1 has no arguments, so the loop body is not entered */
    2621         if (!strcmp(argv[f], "-i") || !strcmp(argv[f], "--init"))
    2622             isinit = 1;
    2623         break;
    2624     }
    2625     if (!isinit) exit(telinit(p, argc, argv));    /* not executed by init 1 */
    2626
    2627     /*
    2628      * Check for re-exec
    2629      */
    2630     if (check_pipe(STATE_PIPE)) {    /* Is there a state pipe, i.e. is this a re-exec of a running init? */
    2631                                      /* This init was just started by the kernel and nothing is set up yet, */
    2632         receive_state(STATE_PIPE);   /* so check_pipe() returns 0: init 1 skips this branch and goes on at line 2646 */
    2633
    2634         myname = istrdup(argv[0]);
    2635         argv0 = argv[0];
    2636         maxproclen = 0;
    2637         for (f = 0; f < argc; f++)
    2638             maxproclen += strlen(argv[f]) + 1;
    2639         reload = 1;
    2640         setproctitle("init [%c]",runlevel);
    2641
    2642         init_main();
    2643     }
    2644
    2645     /* Check command line arguments */    /* init 1 was started without arguments, so the loop below is */
    2646     maxproclen = strlen(argv[0]) + 1;     /* not entered and execution continues at line 2666 */
    2647     for(f = 1; f < argc; f++) {
    2648         if (!strcmp(argv[f], "single") || !strcmp(argv[f], "-s"))
    2649             dfl_level = 'S';
    2650         else if (!strcmp(argv[f], "-a") || !strcmp(argv[f], "auto"))
    2651             putenv("AUTOBOOT=YES");
    2652         else if (!strcmp(argv[f], "-b") || !strcmp(argv[f],"emergency"))
    2653             emerg_shell = 1;
    2654         else if (!strcmp(argv[f], "-z")) {
    2655             /* Ignore -z xxx */
    2656             if (argv[f + 1]) f++;
    2657         } else if (strchr("0123456789sS", argv[f][0])
    2658             && strlen(argv[f]) == 1)
    2659             dfl_level = argv[f][0];
    2660         /* "init u" in the very beginning makes no sense */
    2661         if (dfl_level == 's') dfl_level = 'S';
    2662         maxproclen += strlen(argv[f]) + 1;
    2663     }
    2664
    2665     /* Start booting. */
    2666     argv0 = argv[0];    /* at this point argv0 of init 1 is /sbin/init */
    2667     argv[1] = NULL;
    2668     setproctitle("init boot");
    2669     init_main(dfl_level);    /* init 1 calls init_main(0); dfl_level is statically initialized to 0 */
    2670
    2671     /*NOTREACHED*/
    2672     return 0;
    2673 }
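The argument loop of main() is easier to follow when isolated as a pure function. The sketch below mirrors the decisions visible in the listing above (single or -s, -b or emergency, -z with its ignored argument, and a one-character run level); parse_boot_level is a hypothetical name, the -a/auto flag is left out, and this is a distillation for illustration rather than sysvinit's own code.

```c
#include <string.h>

/*
 * Hypothetical distillation of the boot-flag parsing: returns the default
 * run level requested on the command line, or 0 if none was given, and
 * sets *emergency when -b or "emergency" is present.
 */
char parse_boot_level(int argc, char *const argv[], int *emergency)
{
    char level = 0;

    *emergency = 0;
    for (int f = 1; f < argc; f++) {
        if (!strcmp(argv[f], "single") || !strcmp(argv[f], "-s"))
            level = 'S';
        else if (!strcmp(argv[f], "-b") || !strcmp(argv[f], "emergency"))
            *emergency = 1;
        else if (!strcmp(argv[f], "-z")) {
            if (f + 1 < argc)
                f++;                /* skip the ignored argument of -z */
        } else if (strlen(argv[f]) == 1 &&
                   strchr("0123456789sS", argv[f][0]))
            level = argv[f][0];
        if (level == 's')
            level = 'S';            /* 's' and 'S' are the same run level */
    }
    return level;
}
```

For init 1, which is started with argv[0] only, the loop body never runs and the function returns 0, matching the init_main(0) call noted in the listing.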
OK, init 1 now enters the main function init_main().
    2340 /*
    2341  * The main loop
    2342  */
    2343 int init_main()
    2344 {
    2345     CHILD *ch;
    2346     struct sigaction sa;
    2347     sigset_t sgt;
    2348     pid_t rc;
    2349     int f, st;
    2350
    2351     if (!reload) {    /* init 1 never changes reload, so it is still 0 and this branch is taken */
    2352
    2353 #if INITDEBUG    /* Only for debugging init; ignore it. Debugging init 1 takes some ingenuity: /sbin/init is an */
    2354     /*           /* ordinary user program, but at the time it runs no debugger has had a chance to step in. */
    2355      * Fork so we can debug the init process.
    2356      */
    2357     if ((f = fork()) > 0) {
    2358         static const char killmsg[] = "PRNT: init killed.\r\n";
    2359         pid_t rc;
    2360
    2361         while((rc = wait(&st)) != f)
    2362             if (rc < 0 && errno == ECHILD)
    2363                 break;
    2364         write(1, killmsg, sizeof(killmsg) - 1);
    2365         while(1) pause();
    2366     }
    2367 #endif
    2368
    2369 #ifdef __linux__    /* the init code is also used on FreeBSD, so this macro marks the Linux-only part */
    2370     /*
    2371      * Tell the kernel to send us SIGINT when CTRL-ALT-DEL
    2372      * is pressed, and that we want to handle keyboard signals.
    2373      */
    2374     init_reboot(BMAGIC_SOFT);    /* calls reboot(BMAGIC_SOFT) so that pressing CTRL-ALT-DEL sends SIGINT to the init process; see man 2 reboot */
    2375     if ((f = open(VT_MASTER, O_RDWR | O_NOCTTY)) >= 0) {
    2376         (void) ioctl(f, KDSIGACCEPT, SIGWINCH);
    2377         close(f);
    2378     } else
    2379         (void) ioctl(0, KDSIGACCEPT, SIGWINCH);
    2380 #endif
    2381
    2382     /*
    2383      * Ignore all signals.    (set every signal to "ignore" first, because the handlers
    2384      */                       /* for the signals init cares about are installed right below) */
    2385     for(f = 1; f <= NSIG; f++)
    2386         SETSIG(sa, f, SIG_IGN, SA_RESTART);
    2387     }
    2388
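Background
Handlers are installed for the signals that need special treatment.
SIGALRM: timer signal, delivered when a previously set alarm expires.
SIGHUP: hang-up signal. For example, if you log in to a Linux machine over telnet, start top, and then leave the telnet session, the top you just started receives this signal.
SIGINT: interrupt signal. The foreground process group receives it when the user presses Ctrl-C; the system also converts CTRL-ALT-DEL into this signal for init.
SIGCHLD: sent to the parent when a process terminates or stops.
SIGPWR: sent to init when mains power fails and the UPS takes over.
SIGWINCH (WINdow CHange): normally reports a terminal size change; here it is the signal that the KDSIGACCEPT ioctl above asks the keyboard driver to deliver to init.
SIGUSR1: user-defined signal.
SIGSTOP: stop signal.
SIGTSTP: interactive stop signal; the current process is suspended when the user presses Ctrl-Z on the terminal.
SIGCONT: the "continue" counterpart of SIGSTOP.
SIGSEGV: segmentation violation, usually an access to invalid memory.
When init receives SIGALRM, SIGHUP, SIGINT, SIGPWR, SIGWINCH or SIGUSR1 it runs signal_handler(), which does nothing more than record the signal in the global variable got_signals.
When init receives SIGCHLD it runs chld_handler(): whenever a child of init dies, chld_handler() collects the exit status of the dead child.
When init receives SIGSTOP or SIGTSTP it runs stop_handler(). (SIGSTOP cannot actually be caught, so in practice only SIGTSTP reaches this handler.)
When init receives SIGCONT it runs cont_handler().
When init receives SIGSEGV it runs segv_handler(), which handles an invalid memory access by init itself: init merely sleeps for 30 seconds and then carries on. An ordinary process that touches invalid memory normally dies.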
2389 SETSIG(sa, SIGALRM, signal_handler, 0);   Re-install the handlers for the signals init cares about
2390 SETSIG(sa, SIGHUP, signal_handler, 0);
2391 SETSIG(sa, SIGINT, signal_handler, 0);
2392 SETSIG(sa, SIGCHLD, chld_handler, SA_RESTART);
2393 SETSIG(sa, SIGPWR, signal_handler, 0);
2394 SETSIG(sa, SIGWINCH, signal_handler, 0);
2395 SETSIG(sa, SIGUSR1, signal_handler, 0);
2396 SETSIG(sa, SIGSTOP, stop_handler, SA_RESTART);
2397 SETSIG(sa, SIGTSTP, stop_handler, SA_RESTART);
2398 SETSIG(sa, SIGCONT, cont_handler, SA_RESTART);
2399 SETSIG(sa, SIGSEGV, (void (*)(int))segv_handler, SA_RESTART);
Let us look at what the signal handlers registered here actually do.
SIGALRM, SIGHUP, SIGINT, SIGPWR, SIGWINCH and SIGUSR1 are all handled by signal_handler.
543 /*
544 * We got a signal (HUP PWR WINCH ALRM INT)
545 */
546 void signal_handler(int sig)
547 {
548 ADDSET(got_signals, sig);   For HUP, PWR, WINCH, ALRM and INT the signal is merely recorded; the real work is done in
549 }   process_signals(), which is called from init_main()
got_signals is a global variable.
106 sig_atomic_t got_signals; /* Set if we received a signal. */
ADDSET() simply records the signal that arrived in this global variable.
#define ADDSET(set, val) ((set) |= (1 << (val)))
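A minimal sketch of this bookkeeping follows. ADDSET is quoted from the line above; ISMEMBER and DELSET are written here as the obvious counterparts and are assumptions of the sketch, as are the names pending, record_signal and take_signal. One consequence of the bitmask is that several deliveries of the same signal collapse into a single pending bit.

```c
#include <assert.h>
#include <signal.h>

/* ADDSET is taken from the text; the other two macros are assumed. */
#define ADDSET(set, val)   ((set) |= (1 << (val)))
#define ISMEMBER(set, val) ((set) & (1 << (val)))
#define DELSET(set, val)   ((set) &= ~(1 << (val)))

static volatile sig_atomic_t pending;   /* stand-in for got_signals */

/* What a recording handler does: set one bit and return. */
static void record_signal(int sig)
{
    ADDSET(pending, sig);
}

/* Returns 1 and clears the bit if sig was pending, 0 otherwise. */
static int take_signal(int sig)
{
    assert(sig > 0 && sig < 31);
    if (!ISMEMBER(pending, sig))
        return 0;
    DELSET(pending, sig);
    return 1;
}
```

If record_signal(SIGHUP) runs twice before the main loop looks at the mask, take_signal(SIGHUP) reports it only once.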
The signals recorded in got_signals are processed in the function process_signals().
2238 void process_signals()   Handles the SIGALRM, SIGHUP, SIGINT, SIGPWR, SIGWINCH and SIGUSR1 signals received by init
2239 {
2240 CHILD *ch;
2241 int pwrstat;
2242 int oldlevel;
2243 int fd;
2244 char c;
2245
2246 if (ISMEMBER(got_signals, SIGPWR)) {   A SIGPWR was received, i.e. the UPS reported a power failure
2247 INITDBG(L_VB, "got SIGPWR");
2248 /* See what kind of SIGPWR this is. */
2249 pwrstat = 0;
2250 if ((fd = open(PWRSTAT, O_RDONLY)) >= 0) {   Open /etc/powerstatus. If the file exists it should hold one of three characters, "F", "O" or "L", which presumably give the reason for the SIGPWR: F means fail, O means OK, L means low
2251 c = 0;
2252 read(fd, &c, 1);
2253 pwrstat = c;   Record the reason for the power event in pwrstat
2254 close(fd);
2255 unlink(PWRSTAT);   Delete /etc/powerstatus
2256 }
2257 do_power_fail(pwrstat);   Process the actions in the family list according to the reason for the power failure. The actions in that list follow the "powerfail" configuration in inittab exactly, for example
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"
With this line, init runs /sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"
2258 DELSET(got_signals, SIGPWR);   Clear the SIGPWR flag in the global got_signals
2259 }
2260
2261 if (ISMEMBER(got_signals, SIGINT)) {   A SIGINT was received
2262 INITDBG(L_VB, "got SIGINT");
2263 /* Tell ctrlaltdel entry to start up */
2264 for(ch = family; ch; ch = ch->next)
2265 if (ch->action == CTRLALTDEL)
2266 ch->flags &= ~XECUTED;   Allow the Ctrl-Alt-Del handler to run
2267 DELSET(got_signals, SIGINT);
2268 }
2269
2270 if (ISMEMBER(got_signals, SIGWINCH)) {
2271 INITDBG(L_VB, "got SIGWINCH");
2272 /* Tell kbrequest entry to start up */
2273 for(ch = family; ch; ch = ch->next)
2274 if (ch->action == KBREQUEST)
2275 ch->flags &= ~XECUTED;
2276 DELSET(got_signals, SIGWINCH);
2277 }
2278
2279 if (ISMEMBER(got_signals, SIGALRM)) {   For SIGALRM the flag is simply cleared and nothing else happens
2280 INITDBG(L_VB, "got SIGALRM");
2281 /* The timer went off: check it out */
2282 DELSET(got_signals, SIGALRM);
2283 }
2284
2285 if (ISMEMBER(got_signals, SIGCHLD)) {   The notification that a child of init has died was received
2286 INITDBG(L_VB, "got SIGCHLD");
2287 /* First set flag to 0 */
2288 DELSET(got_signals, SIGCHLD);   First clear the flag that records this signal
2289
2290 /* See which child this was */
2291 for(ch = family; ch; ch = ch->next)   Walk the family list; for every zombie found, clear three flags
2292 if (ch->flags & ZOMBIE) {
2293 INITDBG(L_VB, "Child died, PID= %d", ch->pid);
2294 ch->flags &= ~(RUNNING|ZOMBIE|WAITING);
2295 if (ch->process[0] != '+')
2296 write_utmp_wtmp("", ch->id, ch->pid, DEAD_PROCESS, NULL);
2297 }
2298
2299 }
2300
2301 if (ISMEMBER(got_signals, SIGHUP)) {   SIGHUP is conventionally used to tell a process to re-read its configuration file; here it tells init
2302 INITDBG(L_VB, "got SIGHUP");   to re-read the configuration in inittab
2303 #if CHANGE_WAIT
2304 /* Are we waiting for a child? */
2305 for(ch = family; ch; ch = ch->next)
2306 if (ch->flags & WAITING) break;
2307 if (ch == NULL)
2308 #endif
2309 {
2310 /* We need to go into a new runlevel */
2311 oldlevel = runlevel;
2312 #ifdef INITLVL
2313 runlevel = read_level(0);
2314 #endif
2315 if (runlevel == 'U') {
2316 runlevel = oldlevel;
2317 re_exec();
2318 } else {
2319 if (oldlevel != 'S' && runlevel == 'S') console_stty();
2320 if (runlevel == '6' || runlevel == '0' ||
2321 runlevel == '1') console_stty();
2322 read_inittab();
2323 fail_cancel();
2324 setproctitle("init [%c]", runlevel);
2325 DELSET(got_signals, SIGHUP);
2326 }
2327 }
2328 }
2329 if (ISMEMBER(got_signals, SIGUSR1)) {   A user-defined signal was received; init uses it to reopen the /dev/initctl pipe
2330 /*
2331 * SIGUSR1 means close and reopen /dev/initctl
2332 */
2333 INITDBG(L_VB, "got SIGUSR1");
2334 close(pipe_fd);
2335 pipe_fd = -1;
2336 DELSET(got_signals, SIGUSR1);
2337 }
2338 }
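The SIGPWR branch above reads a one-character status file and then removes it, so each power event is consumed exactly once. A minimal sketch of that read-and-delete step is shown below; the function name read_power_status is hypothetical, and the path is passed in as a parameter instead of using the PWRSTAT constant.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Reads a one-byte status from path and removes the file afterwards.
 * Returns the byte read, or 0 if the file is missing or empty. */
static int read_power_status(const char *path)
{
    char c = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;            /* no file: treat as "no specific reason" */
    if (read(fd, &c, 1) != 1)
        c = 0;
    close(fd);
    unlink(path);            /* the status is consumed exactly once */
    return c;
}
```

A second call on the same path returns 0, because the first call already deleted the file.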
We now continue with the rest of init_main().
2400
2401 console_init();   Initializes the system console
2402
2403 if (!reload) {   As analysed above, this value is 0 for init 1, so the following if branch is taken
2404
2405 /* Close whatever files are open, and reset the console. */
2406 close(0);   This sequence (close 0, 1 and 2, then call setsid()) is the classic routine for writing a daemon on Linux.
2407 close(1);   The init process is a daemon process too
2408 close(2);
2409 console_stty();
2410 setsid();
2411
2412 /*
2413 * Set default PATH variable.
2414 */
2415 putenv(PATH_DFL);   Sets PATH in the environment of init 1: "PATH=/bin:/usr/bin:/sbin:/usr/sbin"
2416
2417 /*
2418 * Initialize /var/run/utmp (only works if /var is on
2419 * root and mounted rw)
2420 */
2421 (void) close(open(UTMP_FILE, O_WRONLY|O_CREAT|O_TRUNC, 0644));   Creates the file /var/run/utmp
2422
2423 /*
2424 * Say hello to the world
2425 */
2426 initlog(L_CO, bootmsg, "booting");
2427
2428 /*
2429 * See if we have to start an emergency shell.
2430 */
2431 if (emerg_shell) {   Taken only if an emergency shell was requested on the command line; the command line of init 1 is empty, so this if branch is not executed
2432 SETSIG(sa, SIGCHLD, SIG_DFL, SA_RESTART);
2433 if (spawn(&ch_emerg, &f) > 0) {
2434 while((rc = wait(&st)) != f)
2435 if (rc < 0 && errno == ECHILD)
2436 break;
2437 }
2438 SETSIG(sa, SIGCHLD, chld_handler, SA_RESTART);
2439 }
2440
2441 /*
2442 * Start normal boot procedure.
2443 */
2444 runlevel = '#';   Means that it is not yet known which runlevel init 1 will enter
2445 read_inittab();   Reads the settings in /etc/inittab; a crucial function, see its annotated listing
2446
2447 } else {   re-exec mode, i.e. the handling for re-reading the inittab configuration while init is running; init 1 does not meet this condition, so control continues at line 2455
2448 /*
2449 * Restart: unblock signals and let the show go on
2450 */
2451 initlog(L_CO, bootmsg, "reloading");
2452 sigfillset(&sgt);
2453 sigprocmask(SIG_UNBLOCK, &sgt, NULL);
2454 }
2455 start_if_needed();   Walks the family list, whose nodes represent the lines of /etc/inittab, and starts their actions. See the helper function analysis for this function.
During system boot this is what runs the command lines specified in the inittab file.
2456
2457 while(1) {   Enter the main loop of init. From now on init spins in this endless loop and never leaves it. This is really the job of init 3,
2458   that is, of init running as a daemon process.
2459 /* See if we need to make the boot transitions. */
2460 boot_transitions();
2461 INITDBG(L_VB, "init_main: waiting...");
2462
2463 /* Check if there are processes to be waited on. */
2464 for(ch = family; ch; ch = ch->next)
2465 if ((ch->flags & RUNNING) && ch->action != BOOT) break;
2466
2467 #if CHANGE_WAIT
2468 /* Wait until we get hit by some signal. */
2469 while (ch != NULL && got_signals == 0) {   The daemon checks whether an event it cares about has occurred.
2470 if (ISMEMBER(got_signals, SIGHUP)) {   Has SIGHUP arrived, i.e. the event that asks init to re-read inittab?
2471 /* See if there are processes to be waited on. */
2472 for(ch = family; ch; ch = ch->next)   Lines marked "wait" are like "boot" lines in that they cannot run
2473 if (ch->flags & WAITING) break;   concurrently with other configuration lines
2474 }
2475 if (ch != NULL) check_init_fifo();
2476 }
2477 #else /* CHANGE_WAIT */
2478 if (ch != NULL && got_signals == 0) check_init_fifo();
2479 #endif /* CHANGE_WAIT */
check_init_fifo() first sets up the "/dev/initctl" pipe, the communication channel between init 2 and init 3.
2480
2481 /* Check the 'failing' flags */
2482 fail_check();
2483
2484 /* Process any signals. */
2485 process_signals();   Processes the recorded signals. After init receives a signal and records it in got_signals, the signal is acted on only inside this main loop. The real handling of these signals is therefore not immediate: a signal is not handled the moment it arrives, but only once the loop at line 2457 notices it
2486
2487 /* See what we need to start up (again) */
2488 start_if_needed();   The state of the nodes in the family list may have changed, so the whole list is walked again to see whether
2489 }   any action that could not run before can run now
2490 /*NOTREACHED*/
2491 }
Next comes a very important part of init: reading the inittab configuration file and building the data structures init uses for its own bookkeeping. The format of the inittab file has been described above. Every process that init manages is described by the following structure:
/* Information about a process in the in-core inittab */
typedef struct child {
int flags; /* Status of this entry */
int exstat; /* Exit status of process */
int pid; /* Pid of this process */
time_t tm; /* When respawned last */
int count; /* Times respawned in the last 2 minutes */
char id[8]; /* Inittab id (must be unique) */
char rlevel[12]; /* run levels */
int action; /* what to do (see list below) */
char process[128]; /* The command line */
struct child *new; /* New entry (after inittab re-read) */
struct child *next; /* For the linked list */
} CHILD;
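The original comments in this structure are fairly detailed; a few points are added here.
A configuration line in inittab looks roughly like this:
3:2345:respawn:/sbin/mingetty tty3
For such a line init has one CHILD that corresponds to it, and the information above fills some of the fields of the structure:
id[8]   rlevel[12]   action   process[128]
3   2345   Respawn   /sbin/mingetty tty3
flags reflects the state of the process.
exstat is the exit code of the process after it has exited, i.e. the argument passed to the exit() system call.
pid is the process identifier of the process.
tm is meaningful only for respawn-type processes: it is the time at which init last respawned the process.
count is meaningful for respawn and ondemand processes: it is the number of times the process was started (spawned) in the last 2 minutes.
init links the processes it manages through next into the list pointed to by the global variable family.
[Figure: family points to the first CHILD node; each node's next pointer leads to the following CHILD node, up to CHILD node N.]
read_inittab() turns every line of the inittab file into one node of the family list and fills that node with the information on the line.
[Figure: lines of the /etc/inittab configuration file and the nodes built from them]
id:5:initdefault:   CHILD Node ①
si::sysinit:/etc/rc.d/rc.sysinit   CHILD Node ②
l0:0:wait:/etc/rc.d/rc 0   CHILD Node ③
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
3:2345:respawn:/sbin/mingetty tty3   CHILD Node N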
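To make the line-to-node mapping concrete, here is a minimal sketch that splits one inittab line into its four fields with strsep(). The type struct entry and the function parse_inittab_line are hypothetical simplifications for illustration (the action is kept as text, not decoded to a number), not the structures init itself uses.

```c
#define _DEFAULT_SOURCE   /* for strsep() on glibc */
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for CHILD: text fields only. */
struct entry {
    char id[8];
    char rlevel[12];
    char action[33];
    char process[128];
};

/* Splits "id:runlevels:action:process" into e.
 * The line buffer is modified in place, as strsep() requires.
 * Returns 0 on success, -1 if a field is missing or too long. */
static int parse_inittab_line(char *line, struct entry *e)
{
    char *p = line;
    char *id, *rlevel, *action, *process;

    assert(line != NULL && e != NULL);
    id      = strsep(&p, ":");
    rlevel  = strsep(&p, ":");
    action  = strsep(&p, ":");
    process = strsep(&p, "\n");

    if (!id || !*id || !rlevel || !action || !*action || !process)
        return -1;
    if (strlen(id) >= sizeof(e->id) ||
        strlen(rlevel) >= sizeof(e->rlevel) ||
        strlen(action) >= sizeof(e->action) ||
        strlen(process) >= sizeof(e->process))
        return -1;

    strcpy(e->id, id);
    strcpy(e->rlevel, rlevel);
    strcpy(e->action, action);
    strcpy(e->process, process);
    return 0;
}
```

An empty runlevel field, as in the sysinit line, is accepted and stored as an empty string; a line without any colon is rejected.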
1108 void read_inittab(void)
1109 {
1110 FILE *fp; /* The INITTAB file */
1111 CHILD *ch, *old, *i; /* Pointers to CHILD structure */
1112 CHILD *head = NULL; /* Head of linked list */
1113 #ifdef INITLVL
1114 struct stat st; /* To stat INITLVL */
1115 #endif
1116 sigset_t nmask, omask; /* For blocking SIGCHLD. */
1117 char buf[256]; /* Line buffer */
1118 char err[64]; /* Error message. */
1119 char *id, *rlevel,
1120 *action, *process; /* Fields of a line */
1121 char *p;
1122 int lineNo = 0; /* Line number in INITTAB file */
1123 int actionNo; /* Decoded action field */
1124 int f; /* Counter */
1125 int round; /* round 0 for SIGTERM, 1 for SIGKILL */
1126 int foundOne = 0; /* No killing no sleep */
1127 int talk; /* Talk to the user */
1128 int done = 0; /* Ready yet? */
1129
1130 #if DEBUG
1131 if (newFamily != NULL) {
1132 INITDBG(L_VB, "PANIC newFamily != NULL");
1133 exit(1);
1134 }
1135 INITDBG(L_VB, "Reading inittab");
1136 #endif
1137
1138 /*
1139 * Open INITTAB and real line by line.
1140 */
1141 if ((fp = fopen(INITTAB, "r")) == NULL)   Open the /etc/inittab file
1142 initlog(L_VB, "No inittab file found");
1143
1144 while(!done) {   Each iteration handles one line of inittab and extends the newFamily list. Note: the newFamily list, not the family list
1145 /*
1146 * Add single user shell entry at the end.
1147 */
1148 if (fp == NULL || fgets(buf, sizeof(buf), fp) == NULL) {
1149 done = 1;   All lines have been handled; leave the loop after this pass
1150 /*
1151 * See if we have a single user entry.
1152 */
1153 for(old = newFamily; old; old = old->next)   Looks for an entry that already covers runlevel S; if there is none, a default "~~:S:wait:" entry running SULOGIN is synthesized and processed like a line read from the file
1154 if (strpbrk(old->rlevel, "S")) break;
1155 if (old == NULL)
1156 snprintf(buf, sizeof(buf), "~~:S:wait:%s\n", SULOGIN);
1157 else
1158 continue;
1159 }
1160 lineNo++;
1161 /*
1162 * Skip comments and empty lines
1163 */
1164 for(p = buf; *p == ' ' || *p == '\t'; p++)   Skip leading white space
1165 ;
1166 if (*p == '#' || *p == '\n') continue;   Lines starting with "#" are comments and are ignored
1167
1168 /*
1169 * Decode the fields
1170 */
Split id:runlevels:action:process into its 4 parts
1171 id = strsep(&p, ":");   The parts of a configuration line are separated by ":", so strsep is used to extract them one by one
1172 rlevel = strsep(&p, ":");
1173 action = strsep(&p, ":");
1174 process = strsep(&p, "\n");
1175
The code below reveals limits that the init manual does not mention, for example that the command line must not be too long (no more than 127 characters).
1176 /*
1177 * Check if syntax is OK. Be very verbose here, to
1178 * avoid newbie postings on comp.os.linux.setup
1179 */
1180 err[0] = 0;
1181 if (!id || !*id) strcpy(err, "missing id field");
1182 if (!rlevel) strcpy(err, "missing runlevel field");
1183 if (!process) strcpy(err, "missing process field");
1184 if (!action || !*action)
1185 strcpy(err, "missing action field");
1186 if (id && strlen(id) > sizeof(utproto.ut_id))
1187 sprintf(err, "id field too long (max %d characters)",
1188 (int)sizeof(utproto.ut_id));
1189 if (rlevel && strlen(rlevel) > 11)
1190 strcpy(err, "rlevel field too long (max 11 characters)");
1191 if (process && strlen(process) > 127)
1192 strcpy(err, "process field too long");
1193 if (action && strlen(action) > 32)
1194 strcpy(err, "action field too long");
1195 if (err[0] != 0) {
1196 initlog(L_VB, "%s[%d]: %s", INITTAB, lineNo, err);
1197 INITDBG(L_VB, "%s:%s:%s:%s", id, rlevel, action, process);
1198 continue;
1199 }
1200
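1201 /*
1202 * Decode the "action" field
1203 */
All action types that init accepts are listed in the actions[] array; string comparison is used here to convert the name into its numeric identifier
1204 actionNo = -1;
1205 for(f = 0; actions[f].name; f++)
1206 if (strcasecmp(action, actions[f].name) == 0) {
1207 actionNo = actions[f].act;
1208 break;
1209 }
1210 if (actionNo == -1) {   An illegal action (one not in the actions[] array) causes the line to be ignored
1211 initlog(L_VB, "%s[%d]: %s: unknown action field",
1212 INITTAB, lineNo, action);
1213 continue;
1214 }
1215
1216 /*
1217 * See if the id field is unique
1218 */
The first part of a configuration line is its identifier. It must be unique, but there seems to be no rule about how to name it; anything goes. Lines that have already been processed are recorded as CHILD nodes in the list, so while handling the current line the code checks whether an existing node has the same id. If one does, the duplicate is logged and the line is skipped: the break only leaves the inner for loop, and the continue on line 1226 moves on to the next line of /etc/inittab. The id "~~" is exempt from this check.
1219 for(old = newFamily; old; old = old->next) {
1220 if(strcmp(old->id, id) == 0 && strcmp(id, "~~")) {
1221 initlog(L_VB, "%s[%d]: duplicate ID field \"%s\"",
1222 INITTAB, lineNo, id);
1223 break;
1224 }
1225 }
1226 if (old) continue;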
1227
1228 /*
1229 * Allocate a CHILD structure
1230 */
1231 ch = imalloc(sizeof(CHILD));   Allocate a CHILD node for the current configuration line
1232
1233 /*
1234 * And fill it in.   Fill the CHILD node with the information from the current line
1235 */
1236 ch->action = actionNo;   The action type
1237 strncpy(ch->id, id, sizeof(utproto.ut_id) + 1); /* Hack for different libs. */   The unique identifier of the line
1238 strncpy(ch->process, process, sizeof(ch->process) - 1);   The command line that the entry runs
1239 if (rlevel[0]) {   Fill in the run levels
1240 for(f = 0; f < sizeof(rlevel) - 1 && rlevel[f]; f++) {
1241 ch->rlevel[f] = rlevel[f];
1242 if (ch->rlevel[f] == 's') ch->rlevel[f] = 'S';
1243 }
1244 strncpy(ch->rlevel, rlevel, sizeof(ch->rlevel) - 1);
1245 } else {   If no run level is given, the process part of the line is to be run in every run level
1246 strcpy(ch->rlevel, "0123456789");
1247 if (ISPOWER(ch->action))
1248 strcpy(ch->rlevel, "S0123456789");
1249 }
The handling of the action follows
1250 /*
1251 * We have the fake runlevel '#' for SYSINIT and
1252 * '*' for BOOT and BOOTWAIT.
1253 */
According to the comment above, the SYSINIT action is represented by '#' and the BOOT actions by '*', while the truly legal run levels are 0 to 9 plus 'S'.
"#" and "*" mean that the entry runs no matter which run level is chosen. SYSINIT has the highest priority, so it should run before the BOOT actions.
1254 if (ch->action == SYSINIT) strcpy(ch->rlevel, "#");
1255 if (ch->action == BOOT || ch->action == BOOTWAIT)
1256 strcpy(ch->rlevel, "*");
1257
1258 /*
1259 * Now add it to the linked list. Special for powerfail.
1260 */
The list built from the configuration lines read out of /etc/inittab is headed by newFamily. During system boot the list that family refers to is naturally empty. If init is merely being run to switch run level and the like, the family list is not empty: it is the list that the running init built the last time it read /etc/inittab.
1261 if (ISPOWER(ch->action)) {   True if the action is one of POWERWAIT, POWERFAIL, POWEROKWAIT,
1262   POWERFAILNOW or CTRLALTDEL, i.e. related to power events or to the user pressing Ctrl+Alt+Del
1263 /*
1264 * Disable by default
1265 */
1266 ch->flags |= XECUTED;   Set the "do not execute" flag.
If startup() finds XECUTED set in the flags of the action that a configuration line represents, it skips that line. This makes sense, because none of the actions matched by ISPOWER() should run under normal conditions; they are needed only when the corresponding event has really happened. For example, if the user has never pressed Ctrl+Alt+Del, there is no need to run the process given for the CTRLALTDEL action in /etc/inittab. So these entries are disabled by default (by setting the XECUTED flag) and are enabled only once the key press has been detected.
These actions are also inserted at the front of the family list, so that they run first if they need to run. This is understandable: they are all fairly serious events, so they get a higher priority.
1267
1268 /*
1269 * Tricky: insert at the front of the list...
1270 */
1271 old = NULL;
1272 for(i = newFamily; i; i = i->next) {
1273 if (!ISPOWER(i->action)) break;
1274 old = i;
1275 }
1276 /*
1277 * Now add after entry "old"
1278 */
1279 if (old) {
1280 ch->next = i;
1281 old->next = ch;
1282 if (i == NULL) head = ch;
1283 } else {
1284 ch->next = newFamily;
1285 newFamily = ch;
1286 if (ch->next == NULL) head = ch;
1287 }
1288 } else {   All other actions are appended at the tail. KBREQUEST is not executed by default; according to the init manual, the SIGWINCH signal triggers that action
1289 /*
1290 * Just add at end of the list
1291 */
1292 if (ch->action == KBREQUEST) ch->flags |= XECUTED;
1293 ch->next = NULL;
1294 if (head)
1295 head->next = ch;
1296 else
1297 newFamily = ch;
1298 head = ch;
1299 }
1300
1301 /*
1302 * Walk through the old list comparing id fields
1303 */
1304 for(old = family; old; old = old->next)
1305 if (strcmp(old->id, ch->id) == 0) {
1306 old->new = ch;
1307 break;
1308 }
The handling of one line ends here
1309 }
1310 /*
1311 * We're done.
1312 */
1313 if (fp) fclose(fp);   Close /etc/inittab. For init 1 the function is essentially finished at this point.
1314
Everything below is the processing done by init as a daemon process, i.e. init 3. I think the code as a whole could be organized more clearly: the logic of the init started by the kernel (init 1) and of the init running as a daemon (init 3) should be separated, not tangled together as it is now, which is rather messy.
1315 /*
1316 * Loop through the list of children, and see if they need to
1317 * be killed.
1318 */
1319
1320 INITDBG(L_VB, "Checking for children to kill");
1321 for(round = 0; round < 2; round++) {
1322 talk = 1;
1323 for(ch = family; ch; ch = ch->next) {   When init runs during system boot (init 1) the family list is empty, so this loop
1324 ch->flags &= ~KILLME;   is not entered; at this point round = 0, talk = 1 and foundOne = 0, and execution
1325   continues at line 1393.
1326 /*
1327 * Is this line deleted?
1328 */
1329 if (ch->new == NULL) ch->flags |= KILLME;
1330
1331 /*
1332 * If the entry has changed, kill it anyway. Note that
1333 * we do not check ch->process, only the "action" field.
1334 * This way, you can turn an entry "off" immediately, but
1335 * changes in the command line will only become effective
1336 * after the running version has exited.
1337 */
1338 if (ch->new && ch->action != ch->new->action) ch->flags |= KILLME;
1339
1340 /*
1341 * Only BOOT processes may live in all levels
1342 */
1343 if (ch->action != BOOT &&
1344 strchr(ch->rlevel, runlevel) == NULL) {
1345 /*
1346 * Ondemand procedures live always,
1347 * except in single user
1348 */
1349 if (runlevel == 'S' || !(ch->flags & DEMAND))
1350 ch->flags |= KILLME;
1351 }
1352
1353 /*
1354 * Now, if this process may live note so in the new list
1355 */
1356 if ((ch->flags & KILLME) == 0) {
1357 ch->new->flags = ch->flags;
1358 ch->new->pid = ch->pid;
1359 ch->new->exstat = ch->exstat;
1360 continue;
1361 }
1362
1363
1364 /*
1365 * Is this process still around?
1366 */
1367 if ((ch->flags & RUNNING) == 0) {
1368 ch->flags &= ~KILLME;
1369 continue;
1370 }
1371 INITDBG(L_VB, "Killing \"%s\"", ch->process);
1372 switch(round) {
1373 case 0: /* Send TERM signal */
1374 if (talk)
1375 initlog(L_CO,
1376 "Sending processes the TERM signal");
1377 kill(-(ch->pid), SIGTERM);
1378 foundOne = 1;
1379 break;
1380 case 1: /* Send KILL signal and collect status */
1381 if (talk)
1382 initlog(L_CO,
1383 "Sending processes the KILL signal");
1384 kill(-(ch->pid), SIGKILL);
1385 break;
1386 }
1387 talk = 0;
1388
1389 }
1390 /*
1391 * See if we have to wait 5 seconds
1392 */
1393 if (foundOne && round == 0) {   For the init that runs during system boot (init 1), round = 0 but foundOne = 0, so this if branch is not entered and execution continues at line 1419
1394 /*
1395 * Yup, but check every second if we still have children.
1396 */
1397 for(f = 0; f < sltime; f++) {
1398 for(ch = family; ch; ch = ch->next) {
1399 if (!(ch->flags & KILLME)) continue;
1400 if ((ch->flags & RUNNING) && !(ch->flags & ZOMBIE))
1401 break;
1402 }
1403 if (ch == NULL) {
1404 /*
1405 * No running children, skip SIGKILL
1406 */
1407 round = 1;
1408 foundOne = 0; /* Skip the sleep below. */
1409 break;
1410 }
1411 do_sleep(1);
1412 }
1413 }
1414 }
1415
1416 /*
1417 * Now give all processes the chance to die and collect exit statuses.
1418 */
1419 if (foundOne) do_sleep(1);   When init 1 runs, foundOne = 0, so there is no one-second sleep
1420 for(ch = family; ch; ch = ch->next)   The family list is empty at this point, so the loop is not entered; execution continues at line 1437
1421 if (ch->flags & KILLME) {
1422 if (!(ch->flags & ZOMBIE))
1423 initlog(L_CO, "Pid %d [id %s] seems to hang", ch->pid,
1424 ch->id);
1425 else {
1426 INITDBG(L_VB, "Updating utmp for pid %d [id %s]",
1427 ch->pid, ch->id);
1428 ch->flags &= ~RUNNING;
1429 if (ch->process[0] != '+')
1430 write_utmp_wtmp("", ch->id, ch->pid, DEAD_PROCESS, NULL);
1431 }
1432 }
1433
1434 /*
1435 * Both rounds done; clean up the list.
1436 */
1437 sigemptyset(&nmask);
1438 sigaddset(&nmask, SIGCHLD);
1439 sigprocmask(SIG_BLOCK, &nmask, &omask);
1440 for(ch = family; ch; ch = old) {   When init 1 runs, the family list is still empty, so the loop is not entered; execution continues at line 1444
1441 old = ch->next;
1442 free(ch);
1443 }
1444 family = newFamily;   newFamily is the list built at the top of this function from the current /etc/inittab lines; only now is it assigned to family
1445 for(ch = family; ch; ch = ch->next) ch->new = NULL;
1446 newFamily = NULL;
1447 sigprocmask(SIG_SETMASK, &omask, NULL);
1448
1449 #ifdef INITLVL
1450 /*
1451 * Dispose of INITLVL file.
1452 */   Removes /etc/initrunlvl; the method differs depending on whether it is a regular file or a symbolic link
1453 if (lstat(INITLVL, &st) >= 0 && S_ISLNK(st.st_mode)) {   Check whether /etc/initrunlvl is a file or a symbolic link
1454 /*
1455 * INITLVL is a symbolic link, so just truncate the file.
1456 */
1457 close(open(INITLVL, O_WRONLY|O_TRUNC));
1458 } else {
1459 /*
1460 * Delete INITLVL file.
1461 */
1462 unlink(INITLVL);
1463 }
1464 #endif
1465 #ifdef INITLVL2
1466 /*
1467 * Dispose of INITLVL2 file.
1468 */   Removes /var/log/initrunlvl; the method differs depending on whether it is a regular file or a symbolic link
1469 if (lstat(INITLVL2, &st) >= 0 && S_ISLNK(st.st_mode)) {
1470 /*
1471 * INITLVL2 is a symbolic link, so just truncate the file.
1472 */
1473 close(open(INITLVL2, O_WRONLY|O_TRUNC));
1474 } else {
1475 /*
1476 * Delete INITLVL2 file.
1477 */
1478 unlink(INITLVL2);
1479 }
1480 #endif
1481 }
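The analysis above shows that, for init 1, the work of read_inittab() is effectively finished at line 1313. The rest of the function serves init 3, the init that runs as a daemon, and is analysed in detail in the section on init 3. The job of read_inittab() is to read the lines of the inittab configuration file and build the family list of CHILD nodes. In other words, it converts the on-disk description of what init has to do into an in-memory configuration, which start_if_needed() then executes.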
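The ordering rule that read_inittab() applies when it adds a node (power-related entries grouped at the front, everything else appended at the tail) can be sketched in isolation. The type struct node, its is_power field (standing in for ISPOWER()) and the function add_entry are hypothetical names used only in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; is_power stands in for ISPOWER(action). */
struct node {
    int is_power;
    struct node *next;
};

/* Inserts n so that all power-type nodes stay grouped at the front of
 * the list (in arrival order) and every other node goes to the tail. */
static void add_entry(struct node **list, struct node **tail, struct node *n)
{
    struct node *prev = NULL, *i;

    assert(list != NULL && tail != NULL && n != NULL);
    if (n->is_power) {
        /* Skip over the power nodes that are already at the front. */
        for (i = *list; i && i->is_power; i = i->next)
            prev = i;
        n->next = i;
        if (prev)
            prev->next = n;
        else
            *list = n;
        if (i == NULL)
            *tail = n;      /* n is also the last node of the list */
    } else {
        n->next = NULL;
        if (*tail)
            (*tail)->next = n;
        else
            *list = n;
        *tail = n;
    }
}
```

Adding an ordinary node, a power node, another ordinary node and a second power node, in that order, yields the list: first power node, second power node, first ordinary node, second ordinary node.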
1483 /*
1484 * Walk through the family list and start up children.
1485 * The entries that do not belong here at all are removed
1486 * from the list.
1487 */
As the comment above says, this function walks the family list and performs the action given on each node, i.e. it runs the command held in process[128] of the CHILD node.
1488 void start_if_needed(void)
1489 {
1490 CHILD *ch; /* Pointer to child */
1491 int delete; /* Delete this entry from list? */
1492
1493 INITDBG(L_VB, "Checking for children to start");
After read_inittab(), every valid line of the /etc/inittab configuration file is represented by one node of the family list; that list is now walked
1494
1495 for(ch = family; ch; ch = ch->next) {
1496
1497 #if DEBUG
1498 if (ch->rlevel[0] == 'C') {
1499 INITDBG(L_VB, "%s: flags %d", ch->process, ch->flags);
1500 }
1501 #endif
1502
1503 /* Are we waiting for this process? Then quit here. */
1504 if (ch->flags & WAITING) break;   The WAITING flag means that init must wait for this child to exit before it can go on with the work that follows. Later work may depend on the current process, so init has to wait for it to finish before starting anything new
1505
1506 /* Already running? OK, don't touch it */
1507 if (ch->flags & RUNNING) continue;   The process that this node represents is running, so nothing needs to be done
1508
1509 /* See if we have to start it up */
1510 delete = 1;   Set the delete flag by default
1511 if (strchr(ch->rlevel, runlevel) ||   The variable runlevel holds the current run level. A node is started if its run levels include the current one, or if it is a DEMAND (start on demand) entry and the current run level is not one of "#*Ss" (the pseudo levels for sysinit and boot, and single-user mode)
1512 ((ch->flags & DEMAND) && !strchr("#*Ss", runlevel))) {
1513 startup(ch);   Run the command; see the helper function section for details
1514 delete = 0;   It was started, so the node need not be deleted; clear the flag
1515 }
1516
1517 if (delete) {   Not started, which means the node does not belong to the current run level
1518 /* FIXME: is this OK? */
1519 ch->flags &= ~(RUNNING|WAITING);
1520 if (!ISPOWER(ch->action) && ch->action != KBREQUEST)
1521 ch->flags &= ~XECUTED;
1522 ch->pid = 0;
1523 } else
1524 /* Do we have to wait for this process? */
1525 if (ch->flags & WAITING) break;
1526 }
1527 /* Done. */
1528 }
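The start condition on lines 1511 and 1512 is compact, so a small sketch may help when reading it. The function name should_start and the value given to DEMAND here are placeholders for this sketch, not the real definitions.

```c
#include <assert.h>
#include <string.h>

#define DEMAND 0x1   /* placeholder value for this sketch, not the real flag */

/* Returns 1 if an entry with the given runlevel string and flags should
 * be started in the current runlevel, 0 otherwise. */
static int should_start(const char *rlevel, int flags, char runlevel)
{
    assert(rlevel != NULL && runlevel != '\0');
    /* The entry lists the current runlevel explicitly. */
    if (strchr(rlevel, runlevel))
        return 1;
    /* On-demand entries run everywhere except in the pseudo levels
     * '#' and '*' and in single-user mode. */
    if ((flags & DEMAND) && !strchr("#*Ss", runlevel))
        return 1;
    return 0;
}
```

With this predicate an entry for "2345" starts in runlevel 3 but not in runlevel 1, while the same entry with the on-demand flag set also starts in runlevel 1 and still stays off in runlevel S.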
Helper functions
The startup function
1063 /*
1064 * Start a child running!
1065 */
Runs the command line of the configuration line that the CHILD node represents. For the line l2:2:wait:/etc/rc.d/rc 2, for example, this means running "/etc/rc.d/rc 2", where "/etc/rc.d/rc" is the executable script and "2" is its argument.
1066 void startup(CHILD *ch)
1067 {
1068 /*
1069 * See if it's disabled
1070 */
1071 if (ch->flags & FAILING) return;
1072
1073 switch(ch->action) {
1074
1075 case SYSINIT:
1076 case BOOTWAIT:
1077 case WAIT:
1078 case POWERWAIT:
1079 case POWERFAILNOW:
1080 case POWEROKWAIT:
1081 case CTRLALTDEL:
1082 if (!(ch->flags & XECUTED)) ch->flags |= WAITING;
For the actions above, the WAITING flag is set if execution is allowed (XECUTED is not set). There is no break here, so control falls through to the spawn() call on line 1091, which starts the command. WAITING is set because actions of these types are exclusive.
1083 case KBREQUEST:
1084 case BOOT:
1085 case POWERFAIL:
1086 case ONCE:
1087 if (ch->flags & XECUTED) break;   XECUTED is the flag that temporarily forbids execution
1088 case ONDEMAND:
1089 case RESPAWN:
1090 ch->flags |= RUNNING;
1091 if (spawn(ch, &(ch->pid)) < 0) break;   Start the script; ch->pid receives the returned pid. For spawn() see the helper function description
1092 /*
1093 * Do NOT log if process field starts with '+'
1094 * FIXME: that's for compatibility with very
1095 * old getties - probably it can be taken out.
1096 */
1097 if (ch->process[0] != '+')
1098 write_utmp_wtmp("", ch->id, ch->pid,
1099 INIT_PROCESS, "");
1100 break;
1101 }
1102 }
The spawn function
This is the function that actually launches the command line given in the CHILD (process[128]).
823 /*
824 * Fork and execute.
825 *
826 * This function is too long and indents too deep.
827 *
828 */   Starts the command of the configuration line that ch corresponds to; *res receives the pid of the new process
The following configuration line from /etc/inittab is used as the example below:
si::sysinit:/etc/rc.d/rc.sysinit
The contents of CHILD *ch are then
ch->id = "si"
ch->rlevel = "#" (set on line 1254 of read_inittab() for SYSINIT entries)
ch->action = SYSINIT
ch->process[128] = /etc/rc.d/rc.sysinit
829 int spawn(CHILD *ch, int *res)
830 {
831 char *args[16]; /* Argv array */
832 char buf[136]; /* Line buffer */
833 int f, st, rc; /* Scratch variables */
834 char *ptr; /* Ditto */
835 time_t t; /* System time */
836 int oldAlarm; /* Previous alarm value */
837 char *proc = ch->process; /* Command line */
838 pid_t pid, pgrp; /* child, console process group. */
839 sigset_t nmask, omask; /* For blocking SIGCHLD */
840 struct sigaction sa;
841
842 *res = -1;
843 buf[sizeof(buf) - 1] = 0;
844
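845 /* Skip '+' if it's there */
846 if (proc[0] == '+') proc++;
847
848 ch->flags |= XECUTED;   Set the "do not start" flag on the ch that is about to be started, to prevent re-entry
849
The code below assembles the command line string according to the different cases.
init monitors commands whose action is RESPAWN or ONDEMAND: if such a command stops running, init starts it again, so that it keeps running for as long as the system is up. The handling below keeps init from restarting a command too often within a short time when the command keeps dying abnormally. If the command is started 10 times within 2 minutes, it is put aside for 5 minutes before it is started again.
850 if (ch->action == RESPAWN || ch->action == ONDEMAND) {
851 /* Is the date stamp from less than 2 minutes ago? */
852 time(&t);   Get the current system time; ch->tm is the time at which this process was last started (spawned)
853 if (ch->tm + TESTTIME > t) {   TESTTIME is 2 minutes: if less than 2 minutes have passed since the last start, increment the start counter
854 ch->count++;
855 } else {
856 ch->count = 0;   More than 2 minutes have passed: restart the count of starts within 2 minutes
857 ch->tm = t;   Reset the time
858 }
859
860 /* Do we try to respawn too fast? */
861 if (ch->count >= MAXSPAWN) {   Started too often: 10 starts within 2 minutes (which means the program died that many times within 2 minutes);
862   set the FAILING flag
863 initlog(L_VB,
864 "Id \"%s\" respawning too fast: disabled for %d minutes",
865 ch->id, SLEEPTIME / 60);
866 ch->flags &= ~RUNNING;
867 ch->flags |= FAILING;   Temporarily set the FAILING flag
868
869 /* Remember the time we stopped */
870 ch->tm = t;
871
872 /* Try again in 5 minutes */
873 oldAlarm = alarm(0);   Cancel the alarm; returns the time that was left
874 if (oldAlarm > SLEEPTIME || oldAlarm <= 0) oldAlarm = SLEEPTIME;   Disable for at most 5 minutes
875 alarm(oldAlarm);   Start the alarm; wait at most 5 minutes
876 return(-1);   Return (-1) without having started anything
877 }
878 }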
879
880 /* See if there is an "initscript" (except in single user mode). */
881 if (access(INITSCRIPT, R_OK) == 0 && runlevel != 'S') {   Is /etc/initscript readable (in effect: does it exist), and is the current run level something other than single-user mode?
882 /* Build command line using "initscript" */
883 args[1] = SHELL;
884 args[2] = INITSCRIPT;
885 args[3] = ch->id;
886 args[4] = ch->rlevel;
887 args[5] = "unknown";
888 for(f = 0; actions[f].name; f++) {
889 if (ch->action == actions[f].act) {
890 args[5] = actions[f].name;
891 break;
892 }
893 }
894 args[6] = proc;
895 args[7] = NULL;
These lines prepare the arguments for the execvp() call on line 1036. For the example entry the call is roughly execvp("/bin/sh", args + 1), with
args[1] = "/bin/sh"
args[2] = "/etc/initscript"
args[3] = "si"
args[4] = "#" (the rlevel that line 1254 stores for SYSINIT entries)
args[5] = "sysinit"
args[6] = "/etc/rc.d/rc.sysinit"
args[7] = NULL
896 } else if (strpbrk(proc, "~!$^&*()=|\\{}[];\"'<>?")) {   Searches the command line for the first character that matches any character of this set of shell metacharacters
897 /* See if we need to fire off a shell for this command */
898 /* Give command line to shell */
899 args[1] = SHELL;   The command is run as /bin/sh -c with "exec " plus the process field as its argument
900 args[2] = "-c";
901 strcpy(buf, "exec ");
902 strncat(buf, proc, sizeof(buf) - strlen(buf) - 1);
903 args[3] = buf;
904 args[4] = NULL;
905 } else {
906 /* Split up command line arguments */
907 buf[0] = 0;
908 strncat(buf, proc, sizeof(buf) - 1);
909 ptr = buf;
910 for(f = 1; f < 15; f++) {
911 /* Skip white space */
912 while(*ptr == ' ' || *ptr == '\t') ptr++;
913 args[f] = ptr;
914
915 /* May be trailing space... */
916 if (*ptr == 0) break;
917
918 /* Skip this `word' */
919 while(*ptr && *ptr != ' ' && *ptr != '\t' && *ptr != '#')
920 ptr++;
921
922 /* If end-of-line, break */
923 if (*ptr == '#' || *ptr == 0) {
924 f++;
925 *ptr = 0;
926 break;
927 }
928 /* End word with \0 and continue */
929 *ptr++ = 0;
930 }
931 args[f] = NULL;
932 }
933 args[0] = args[1];
934 while(1) {
935 /*
936 * Block sigchild while forking.
937 */
938 sigemptyset(&nmask);
939 sigaddset(&nmask, SIGCHLD);
940 sigprocmask(SIG_BLOCK, &nmask, &omask);
The process is started the way a daemon process would be
941
942 if ((pid = fork()) == 0) {   init uses a child process of its own to run the command
943
944 close(0);   Close file handles 0, 1 and 2, i.e. STDIN, STDOUT and STDERR
945 close(1);
946 close(2);
947 if (pipe_fd >= 0) close(pipe_fd);
948
949 sigprocmask(SIG_SETMASK, &omask, NULL);
950
951 /*
952 * In sysinit, boot, bootwait or single user mode:
953 * for any wait-type subprocess we force the console
954 * to be its controlling tty.
955 */
956 if (strchr("*#sS", runlevel) && ch->flags & WAITING) {
957 /*
958 * We fork once extra. This is so that we can
959 * wait and change the process group and session
960 * of the console after exit of the leader.
961 */
962 setsid();
963 if ((f = console_open(O_RDWR|O_NOCTTY)) >= 0) {
964 /* Take over controlling tty by force */
965 (void)ioctl(f, TIOCSCTTY, 1);
966 dup(f);
967 dup(f);
968 }
969 if ((pid = fork()) < 0) {   If fork fails, exit. The original init process does not exit, of course; what exits here
970 initlog(L_VB, "cannot fork");   is its child, the process created by the fork() on line 942
971 exit(1);
972 }
973 if (pid > 0) {   The parent waits (through the waitpid system call) for the child, i.e. the process that runs the command line, to return
974 /*
975 * Ignore keyboard signals etc.
976 * Then wait for child to exit.
977 */
978 SETSIG(sa, SIGINT, SIG_IGN, SA_RESTART);
979 SETSIG(sa, SIGTSTP, SIG_IGN, SA_RESTART);
980 SETSIG(sa, SIGQUIT, SIG_IGN, SA_RESTART);
981 SETSIG(sa, SIGCHLD, SIG_DFL, SA_RESTART);
982
983 while ((rc = waitpid(pid, &st, 0)) != pid)
984 if (rc < 0 && errno == ECHILD)
985 break;
986
987 /*
988 * Small optimization. See if stealing
989 * controlling tty back is needed.
990 */
991 pgrp = tcgetpgrp(f);
992 if (pgrp != getpid())
993 exit(0);
994
995 /*
996 * Steal controlling tty away. We do
997 * this with a temporary process.
998 */
999 if ((pid = fork()) < 0) {
1000 initlog(L_VB, "cannot fork");
1001 exit(1);
1002 }
1003 if (pid == 0) {
1004 setsid();
1005 (void)ioctl(f, TIOCSCTTY, 1);
1006 exit(0);
1007 }
1008 while((rc = waitpid(pid, &st, 0)) != pid)
1009 if (rc < 0 && errno == ECHILD)
1010 break;
1011 exit(0);
1012 }
1013
1014 /* Set ioctl settings to default ones */
1015 console_stty();
1016
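1017 } else {   The child process; the command line will be executed in the context of this process
1018 setsid();
1019 if ((f = console_open(O_RDWR|O_NOCTTY)) < 0) {
1020 initlog(L_VB, "open(%s): %s", console_dev,
1021 strerror(errno));
1022 f = open("/dev/null", O_RDWR);
1023 }
1024 dup(f);
1025 dup(f);
Lines 944, 945 and 946 closed handles 0, 1 and 2 (the child had inherited them from its parent), so at this point the process has no handles 0, 1 and 2. The system console is now opened to provide the new handles 0, 1 and 2: when console_open() opens /dev/console, the f it returns is 0; line 1024 then makes handle 1 of the current process refer to /dev/console, and line 1025 does the same for handle 2. In this way STDIN (standard input), STDOUT (standard output) and STDERR (standard error) are all set up correctly.
Note:
In a Unix environment every program finds its first three handles (0, 1, 2) already set up when it starts. This is where that originates.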
1026 }
1027
1028 /
Reset all the signals, set up environment /
1029 for(f = 1; f < NSIG; f++) SETSIG(sa, f, SIG_DFL, SA_RESTART);
把该 process 的 signal 都设为忽略
1030 environ = init_buildenv(1); 建立命令行执行的环境
1031
1032 /*
1033 * Execute prog. In case of ENOEXEC try again
1034 * as a shell script.
1035 */
1036 execvp(args[1], args + 1); The call that actually runs the command
1037 if (errno == ENOEXEC) {
1038 args[1] = SHELL;
1039 args[2] = "-c";
1040 strcpy(buf, "exec ");
1041 strncat(buf, proc, sizeof(buf) - strlen(buf) - 1);
1042 args[3] = buf;
1043 args[4] = NULL;
1044 execvp(args[1], args + 1);
1045 }
1046 initlog(L_VB, "cannot execute \"%s\"", args[1]);
1047 exit(1);
1048 }
1049 res = pid; Hand the pid of the process running the command back to the init process
1050 sigprocmask(SIG_SETMASK, &omask, NULL);
1051
1052 INITDBG(L_VB, "Started id %s (pid %d)", ch->id, pid);
1053
1054 if (pid == -1) {
1055 initlog(L_VB, "cannot fork, retry…");
1056 do_sleep(5);
1057 continue;
1058 }
1059 return(pid);
1060 }
1061 }
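The descriptor handling annotated at L1024 and L1025 relies on open() and dup() always returning the lowest free descriptor. The sketch below is not sysvinit code; it is a small experiment, with a hypothetical helper name and /dev/null standing in for the console, that repeats the close/open/dup/dup sequence in a forked child and reports whether the new descriptors came out as 0, 1 and 2.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of sysvinit). In a forked child, close
 * descriptors 0, 1 and 2, open /dev/null once and dup() it twice.
 * Returns 0 if the three new descriptors are 0, 1 and 2, 1 if they are
 * not, and -1 if fork() or waitpid() failed.
 */
int stdio_rebuild_demo(void)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int f, d1, d2;

        close(0);
        close(1);
        close(2);
        f = open("/dev/null", O_RDWR);  /* lowest free descriptor */
        d1 = dup(f);                    /* next lowest free descriptor */
        d2 = dup(f);                    /* and the one after that */
        _exit((f == 0 && d1 == 1 && d2 == 2) ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The work is done in a child so that the caller keeps its own standard descriptors, much as init itself only rearranges descriptors after fork().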
The check_init_fifo function
1991 /*
1992 * Read from the init FIFO. Processes like telnetd and rlogind can
1993 * ask us to create login processes on their behalf.
1994 *
1995 * FIXME: this needs to be finished. NOT that it is buggy, but we need
1996 * to add the telnetd/rlogind stuff so people can start using it.
1997 * Maybe move to using an AF_UNIX socket so we can use
1998 * the 2.2 kernel credential stuff to see who we're talking to.
1999 *
2000 */
2001 void check_init_fifo(void)
2002 {
2003 struct init_request request;
2004 struct timeval tv;
2005 struct stat st, st2;
2006 fd_set fds;
2007 int n;
2008 int quit = 0;
2009
2010 /*
2011 * First, try to create /dev/initctl if not present.
2012 */
2013 if (stat(INIT_FIFO, &st2) < 0 && errno == ENOENT) If the named pipe /dev/initctl does not exist yet,
2014 (void)mkfifo(INIT_FIFO, 0600); create it, readable and writable by root only
2015
2016 /*
2017 * If /dev/initctl is open, stat the file to see if it
2018 * is still the same inode.
2019 */
2020 if (pipe_fd >= 0) { pipe_fd is the file descriptor of the /dev/initctl pipe; >= 0 means it is already open
2021 fstat(pipe_fd, &st);
2022 if (stat(INIT_FIFO, &st2) < 0 ||
2023 st.st_dev != st2.st_dev || Check that this is still the pipe that was originally opened (by comparing device and inode numbers)
2024 st.st_ino != st2.st_ino) {
2025 close(pipe_fd); If it is not, close it, so that the code below opens the pipe again
2026 pipe_fd = -1; This assignment makes the if branch at L2033 true
2027 }
2028 }
2029
2030 /*
2031 * Now finally try to open /dev/initctl
2032 */
2033 if (pipe_fd < 0) { pipe_fd is the file descriptor of /dev/initctl; -1 means it
2034 if ((pipe_fd = open(INIT_FIFO, O_RDWR|O_NONBLOCK)) >= 0) { is not open yet, so open it here in non-blocking mode
2035 fstat(pipe_fd, &st);
2036 if (!S_ISFIFO(st.st_mode)) {
2037 initlog(L_VB, "%s is not a fifo", INIT_FIFO);
2038 close(pipe_fd);
2039 pipe_fd = -1;
2040 }
2041 }
2042 if (pipe_fd >= 0) {
2043 /*
2044 * Don't use fd's 0, 1 or 2.
2045 */
2046 (void) dup2(pipe_fd, PIPE_FD); Duplicate the /dev/initctl descriptor onto PIPE_FD (10)
2047 close(pipe_fd);
2048 pipe_fd = PIPE_FD; so that the /dev/initctl pipe is always on descriptor 10 (PIPE_FD)
2049
2050 /*
2051 * Return to caller - we'll be back later.
2052 */
2053 }
2054 }
2055
By this point the /dev/initctl pipe should have been opened correctly.
2056 /* Wait for data to appear, if the pipe was opened. */
2057 if (pipe_fd >= 0) while(!quit) {
2058
2059 /* Do select, return on EINTR. */
2060 FD_ZERO(&fds);
2061 FD_SET(pipe_fd, &fds);
2062 tv.tv_sec = 5; Let the select call block on the pipe for at most 5 seconds (timeout of 5)
2063 tv.tv_usec = 0;
2064 n = select(pipe_fd + 1, &fds, NULL, NULL, &tv); Wait for input on /dev/initctl with select: if there is
2065 if (n <= 0) { input it returns at once, otherwise the call blocks until the timeout
2066 if (n == 0 || errno == EINTR) return; i.e. init 3 is waiting for a request from init 2
2067 continue;
2068 }
2069
When select returns a positive value, some process has written to the /dev/initctl pipe. In practice a user has run init X, which writes a request to switch to run level X into the pipe and so makes select return.
2070 /* Read the data, return on EINTR. */
2071 n = read(pipe_fd, &request, sizeof(request));
This reads the request written to /dev/initctl. The request has the following format:
struct init_request {
int magic; /* Magic number */
int cmd; /* What kind of request */
int runlevel; /* Runlevel to change to */
int sleeptime; /* Time between TERM and KILL */
union {
struct init_request_bsd bsd;
char data[368];
} i;
};
2072 if (n == 0) {
2073 /*
2074 * End of file. This can't happen under Linux (because
2075 * the pipe is opened O_RDWR - see select() in the
2076 * kernel) but you never know...
2077 */
2078 close(pipe_fd);
2079 pipe_fd = -1;
2080 return;
2081 }
2082 if (n <= 0) {
2083 if (errno == EINTR) return; The read() was interrupted by a signal, so return to the caller
2084 initlog(L_VB, "error reading initrequest");
2085 continue;
2086 }
2087
In the normal case, reaching this point means that a request really has been written to /dev/initctl.
2088 /*
2089 * This is a convenient point to also try to
2090 * find the console device or check if it changed.
2091 */
2092 console_init();
2093
2094 /*
2095 * Process request.
2096 */
2097 if (request.magic != INIT_MAGIC || n != sizeof(request)) { Check that the request that was written is well-formed
2098 initlog(L_VB, "got bogus initrequest");
2099 continue;
2100 }
2101 switch(request.cmd) {
The request is well-formed, so decide which action to take. The cases below are the list of commands that init 2 can send to the init 3 daemon. The code with which init 2 sends a request to init 3 is in telinit(); see the analysis of init 2.
The global variable sltime used below sets the interval between SIGTERM and SIGKILL. When the init process has to kill one of the processes it manages, it first sends SIGTERM, waits sltime seconds (5 by default), and then sends SIGKILL.
2102 case INIT_CMD_RUNLVL: Switch to a new run level
2103 sltime = request.sleeptime; Take the wait time from the request
2104 fifo_new_level(request.runlevel); Set the new run level: the inittab file is read again and the command scripts that match the new run level are started
2105 quit = 1;
2106 break;
2107 case INIT_CMD_POWERFAIL:
2108 sltime = request.sleeptime;
2109 do_power_fail('F');
2110 quit = 1;
2111 break;
2112 case INIT_CMD_POWERFAILNOW:
2113 sltime = request.sleeptime;
2114 do_power_fail('L');
2115 quit = 1;
2116 break;
2117 case INIT_CMD_POWEROK:
2118 sltime = request.sleeptime;
2119 do_power_fail('O');
2120 quit = 1;
2121 break;
The three requests above all concern power events reported by a UPS. All of them are handled by do_power_fail(); the argument distinguishes the power state.
2122 case INIT_CMD_SETENV:
2123 initcmd_setenv(request.i.data, sizeof(request.i.data));
2124 break;
2125 case INIT_CMD_CHANGECONS:
2126 if (user_console) {
2127 free(user_console);
2128 user_console = NULL;
2129 }
2130 if (!request.i.bsd.reserved[0])
2131 user_console = NULL;
2132 else
2133 user_console = strdup(request.i.bsd.reserved);
2134 console_init();
2135 quit = 1;
2136 break;
2137 default:
2138 initlog(L_VB, "got unimplemented initrequest.");
2139 break;
2140 }
2141 }
2142
2143 /*
2144 * We come here if the pipe couldn't be opened.
2145 */
2146 if (pipe_fd < 0) pause();
2147
2148 }
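The heart of check_init_fifo() is a select() call with a timeout on a single descriptor. The sketch below isolates that step under stated assumptions: the helper names are hypothetical, and an anonymous pipe stands in for /dev/initctl so that the example needs no root privileges.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to `seconds` for `fd` to become readable.
 * Returns 1 if data is waiting, 0 on timeout, -1 on error (EINTR included).
 */
int wait_readable(int fd, int seconds)
{
    fd_set fds;
    struct timeval tv;

    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return select(fd + 1, &fds, NULL, NULL, &tv);
}

/*
 * Hypothetical self-check: an empty pipe times out, a pipe holding one
 * byte is reported as readable. Returns 0 when both hold.
 */
int wait_readable_demo(void)
{
    int p[2];
    int before, after;

    if (pipe(p) < 0)
        return -1;
    before = wait_readable(p[0], 0);    /* nothing written yet */
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    after = wait_readable(p[0], 0);     /* one byte is pending */
    close(p[0]);
    close(p[1]);
    return (before == 0 && after == 1) ? 0 : 1;
}
```

init uses a 5 second timeout so that the main loop regains control regularly even when nobody writes to the pipe; the demo uses 0 seconds only to keep the check instantaneous.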
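Running init 2
init 2 refers to a user with root privileges running init in order to switch the run level, or to set parameters of the running init 3 (the daemon process), for example the number of seconds init waits between sending SIGTERM and SIGKILL to a process (sltime), or the environment that init passes to the programs it starts.
Again the analysis starts from main().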
2597 int main(int argc, char **argv)
2598 {
2599 char *p;
2600 int f;
2601 int isinit;
2602
2603 /* Get my own name */
2604 if ((p = strrchr(argv[0], '/')) != NULL)
2605 p++;
2606 else
2607 p = argv[0];
2608 umask(022); If argv[0] is /sbin/init, p now points at "init"
2609
2610 /* Quick check */
2611 if (geteuid() != 0) { Check for root privileges; running init requires them
2612 fprintf(stderr, "%s: must be superuser.\n", p);
2613 exit(1);
2614 }
2615
2616 /*
2617 * Is this telinit or init ?
2618 */
2619 isinit = (getpid() == 1); The pid of the init process started now (init 2) cannot be 1 (pid 1 is held by the init 3 daemon), so isinit = 0
2620 for (f = 1; f < argc; f++) { init 2 can be made to behave like init 1 by passing the command-line argument -i or --init; the manual does not mention this
2621 if (!strcmp(argv[f], "-i") || !strcmp(argv[f], "--init"))
2622 isinit = 1;
2623 break;
2624 }
2625 if (!isinit) exit(telinit(p, argc, argv)); isinit = 0 here, so the execution of init 2 goes into telinit()
2626
2627 /*
2628 * Check for re-exec
2629 */
......
telinit() is the main function of init 2. Its progname parameter is the executable name of the init process, "init".
2502 int telinit(char *progname, int argc, char **argv)
2503 {
2504 #ifdef TELINIT_USES_INITLVL
2505 FILE *fp;
2506 #endif
2507 struct init_request request;
init 2 communicates with init 3 through the named pipe "/dev/initctl" (created by init 3). What travels through the pipe is the action that init 2 asks init 3 to perform. In other words, init 2 does no real work itself; it only forwards the user's request to init 3. The init_request structure is the format of such an action request:
struct init_request {
int magic; /* Magic number */
int cmd; /* What kind of request */
int runlevel; /* Runlevel to change to */
int sleeptime; /* Time between TERM and KILL */
union {
struct init_request_bsd bsd;
char data[368];
} i;
};
2508 struct sigaction sa;
2509 int f, fd, l;
2510 char *env = NULL;
2511
2512 memset(&request, 0, sizeof(request));
2513 request.magic = INIT_MAGIC; The signature of the request
2514
The code below builds the request from the command-line arguments.
2515 while ((f = getopt(argc, argv, "t:e:")) != EOF) switch(f) {
2516 case 't': The -t option gives the number of seconds (sltime) init waits between sending SIGTERM and SIGKILL to a process
2517 sltime = atoi(optarg);
2518 break;
2519 case 'e': The manual does not describe this option, but from the code it lets the user set environment variables
2520 if (env == NULL)
2521 env = request.i.data;
2522 l = strlen(optarg);
2523 if (env + l + 2 > request.i.data + sizeof(request.i.data)) {
2524 fprintf(stderr, "%s: -e option data "
2525 "too large\n", progname);
2526 exit(1);
2527 }
2528 memcpy(env, optarg, l);
2529 env += l;
2530 *env++ = 0;
2531 break;
2532 default:
2533 usage(progname);
2534 break;
2535 }
2536
2537 if (env) *env++ = 0;
2538
2539 if (env) {
2540 if (argc != optind)
2541 usage(progname);
2542 request.cmd = INIT_CMD_SETENV;
2543 } else {
2544 if (argc - optind != 1 || strlen(argv[optind]) != 1)
2545 usage(progname);
2546 if (!strchr("0123456789SsQqAaBbCcUu", argv[optind][0]))
2547 usage(progname);
2548 request.cmd = INIT_CMD_RUNLVL; The code that handles this request is in check_init_fifo() (L2102)
2549 request.runlevel = env ? 0 : argv[optind][0]; request.runlevel is the run level to switch to
2550 request.sleeptime = sltime;
2551 }
2552
2553 /* Open the fifo and write a command. */
2554 /* Make sure we don't hang on opening /dev/initctl */
2555 SETSIG(sa, SIGALRM, signal_handler, 0);
2556 alarm(3); Arm an alarm: after 3 seconds this process receives SIGALRM
2557 if ((fd = open(INIT_FIFO, O_WRONLY)) >= 0 && Open the named pipe "/dev/initctl" for writing
2558 write(fd, &request, sizeof(request)) == sizeof(request)) { Write the request
2559 close(fd);
2560 alarm(0); Cancel the alarm armed by alarm(3) at line 2556
2561 return 0;
2562 }
Lines 2556 and 2560 rely on the fact that a normal open and write of the named pipe /dev/initctl finishes in well under 3 seconds. If it takes longer, the process receives SIGALRM, ISMEMBER() at line 2585 below returns true, and an error is reported.
2563
2564 #ifdef TELINIT_USES_INITLVL
2565 if (request.cmd == INIT_CMD_RUNLVL) {
2566 /* Fallthrough to the old method. */
2567
2568 /* Now write the new runlevel. */
2569 if ((fp = fopen(INITLVL, "w")) == NULL) { Open the file /etc/initrunlvl for writing
2570 fprintf(stderr, "%s: cannot create %s\n",
2571 progname, INITLVL);
2572 exit(1);
2573 }
2574 fprintf(fp, "%s %d", argv[optind], sltime);
2575 fclose(fp);
2576
2577 /* And tell init about the pending runlevel change. */
2578 if (kill(INITPID, SIGHUP) < 0) perror(progname); Send SIGHUP to process 1, which makes
2579 init 3 (not init 2) re-read /etc/inittab
2580 return 0;
2581 }
2582 #endif
2583
2584 fprintf(stderr, "%s: ", progname);
2585 if (ISMEMBER(got_signals, SIGALRM)) {
2586 fprintf(stderr, "timeout opening/writing control channel %s\n",
2587 INIT_FIFO);
2588 } else {
2589 perror(INIT_FIFO);
2590 }
2591 return 1;
2592 }
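The request exchange between telinit() and check_init_fifo() can be tried out without touching /dev/initctl. The following is a minimal sketch under several assumptions: every demo_ and DEMO_ name is hypothetical, the two constant values are assumed stand-ins for INIT_MAGIC and INIT_CMD_RUNLVL, the union of init_request is reduced to its data array, and an anonymous pipe replaces the named pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_MAGIC      0x03091969  /* assumed stand-in for INIT_MAGIC */
#define DEMO_CMD_RUNLVL 1           /* assumed stand-in for INIT_CMD_RUNLVL */

struct demo_request {
    int  magic;
    int  cmd;
    int  runlevel;
    int  sleeptime;
    char data[368];                 /* stands in for the union in init_request */
};

/* Writer side, modeled on telinit(): build a request and write it whole. */
int demo_send(int fd, int level, int sleeptime)
{
    struct demo_request req;

    memset(&req, 0, sizeof(req));
    req.magic = DEMO_MAGIC;
    req.cmd = DEMO_CMD_RUNLVL;
    req.runlevel = level;
    req.sleeptime = sleeptime;
    return write(fd, &req, sizeof(req)) == (ssize_t)sizeof(req) ? 0 : -1;
}

/*
 * Reader side, modeled on check_init_fifo(): reject short reads and bad
 * magic numbers, then return the requested run level (or -1).
 */
int demo_receive(int fd)
{
    struct demo_request req;
    ssize_t n = read(fd, &req, sizeof(req));

    if (n != (ssize_t)sizeof(req) || req.magic != DEMO_MAGIC)
        return -1;
    if (req.cmd != DEMO_CMD_RUNLVL)
        return -1;
    return req.runlevel;
}

/* Send one request through an anonymous pipe and read it back. */
int demo_roundtrip(int level)
{
    int p[2], got;

    if (pipe(p) < 0)
        return -1;
    got = (demo_send(p[1], level, 5) == 0) ? demo_receive(p[0]) : -1;
    close(p[0]);
    close(p[1]);
    return got;
}
```

As in telinit(), the run level travels as a character, so demo_roundtrip('3') hands back '3' rather than the integer 3.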
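The relationship between init 2 and init 3 is as follows.
[Figure: init 2 writes a request into the /dev/initctl pipe; init 3 listens on the pipe with select in check_init_fifo().]
init 2 has a single core function, telinit(). init 3 runs as a daemon process waiting for external requests, among them the requests sent by init 2. init 2 sends a request through the pipe while init 3 listens on it; as soon as a request arrives it is handled in check_init_fifo().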
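Running init 3
init 3 runs as a daemon process. It is really one and the same process as init 1: init 1 completes the user-space startup of the system, and the init process then goes on running like the server process of a client-server architecture. It does not start new actions on its own initiative; it only reacts when an event it cares about occurs. Splitting the single init process into init 1 and init 3 is done here purely for the sake of a clear explanation.
Main flow
When init 1 is finished, init enters the infinite loop below.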
2340 /*
2341 * The main loop
2342 */
2343 int init_main()
2344 {
...... (the part omitted here is what I classify as init 1)
2456
2457 while(1) { Enter the infinite loop; from now on the init process keeps going round this loop and never leaves it until shutdown
2458
2459 /* See if we need to make the boot transitions. */
2460 boot_transitions();
2461 INITDBG(L_VB, "init_main: waiting…");
2462
2463 /* Check if there are processes to be waited on. */
2464 for(ch = family; ch; ch = ch->next) Walk the processes currently managed by init
2465 if ((ch->flags & RUNNING) && ch->action != BOOT) break; Stop walking as soon as a running process is found
2466
2467 #if CHANGE_WAIT
2468 /* Wait until we get hit by some signal. */
2469 while (ch != NULL && got_signals == 0) {
2470 if (ISMEMBER(got_signals, SIGHUP)) {
2471 /* See if there are processes to be waited on. */
2472 for(ch = family; ch; ch = ch->next)
2473 if (ch->flags & WAITING) break;
2474 }
2475 if (ch != NULL) check_init_fifo();
2476 }
2477 #else /* CHANGE_WAIT */
2478 if (ch != NULL && got_signals == 0) check_init_fifo(); check_init_fifo() is annotated in the "init 1" part.
2479 #endif /* CHANGE_WAIT */ Here init 3 checks whether a request has arrived
2480 from init 2.
2481 /* Check the 'failing' flags */
2482 fail_check();
2483
2484 /* Process any signals. */
2485 process_signals(); While init runs it may receive signals it cares about. The signal handler only records them;
2486 the real handling is done in this function. See the "init 1" part for its annotation
2487 /* See what we need to start up (again) */
2488 start_if_needed(); See the "init 1" part for the annotation of this function
2489 }
2490 /*NOTREACHED*/
2491 }
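The init process waits for the events it cares about, which reach it through the control pipe or as signals. The main ones are:
• It waits with the select system call on one end of the /dev/initctl pipe. As soon as a process writes a request for the init process into the other end, init 3 parses the request and carries out the requested action (see the notes on the check_init_fifo function).
• When a signal is sent to the init process, the signal handler does not process it right away; it only sets a flag in the global variable got_signals recording that the signal occurred. The real processing happens in process_signals(), inside the init 3 loop.
The signal handlers of the init process are installed as follows.
SETSIG(sa, SIGALRM, signal_handler, 0);
SETSIG(sa, SIGHUP, signal_handler, 0);
SETSIG(sa, SIGINT, signal_handler, 0);
SETSIG(sa, SIGPWR, signal_handler, 0);
SETSIG(sa, SIGWINCH, signal_handler, 0);
SETSIG(sa, SIGUSR1, signal_handler, 0);
init 1 installs the handlers for the HUP, PWR, WINCH, ALRM, INT and USR1 signals. When a signal arrives the handler merely sets a bit; the real handling of the signal is done in the function process_signals().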
/*
 * We got a signal (HUP PWR WINCH ALRM INT)
 */
void signal_handler(int sig)
{
ADDSET(got_signals, sig);
}
ADDSET is a macro that simply sets a flag bit.
#define ADDSET(set, val) ((set) |= (1 << (val)))
process_signals() checks which flag bits are set and handles them one after another.
if (ISMEMBER(got_signals, SIGPWR)) { ISMEMBER is a macro, defined as
INITDBG(L_VB, "got SIGPWR"); #define ISMEMBER(set, val) ((set) & (1 << (val)))
/* See what kind of SIGPWR this is. */ It tests whether a given signal has been raised
pwrstat = 0;
if ((fd = open(PWRSTAT, O_RDONLY)) >= 0) {
c = 0;
read(fd, &c, 1); Handling of a power failure
pwrstat = c;
close(fd);
unlink(PWRSTAT);
}
do_power_fail(pwrstat);
DELSET(got_signals, SIGPWR); Clear the flag once the signal has been handled
}
if (ISMEMBER(got_signals, SIGINT)) { When the user presses Ctrl+Alt+Del, the kernel sends SIGINT to the init process
INITDBG(L_VB, "got SIGINT");
/* Tell ctrlaltdel entry to start up */
for(ch = family; ch; ch = ch->next)
if (ch->action == CTRLALTDEL)
ch->flags &= ~XECUTED; Remove the flag that disables execution
DELSET(got_signals, SIGINT);
}
if (ISMEMBER(got_signals, SIGWINCH)) {
INITDBG(L_VB, "got SIGWINCH");
/* Tell kbrequest entry to start up */
for(ch = family; ch; ch = ch->next)
if (ch->action == KBREQUEST)
ch->flags &= ~XECUTED;
DELSET(got_signals, SIGWINCH);
}
if (ISMEMBER(got_signals, SIGALRM)) {
INITDBG(L_VB, "got SIGALRM");
/* The timer went off: check it out */
DELSET(got_signals, SIGALRM);
}
if (ISMEMBER(got_signals, SIGCHLD)) {
INITDBG(L_VB, "got SIGCHLD");
/* First set flag to 0 */
DELSET(got_signals, SIGCHLD);
/* See which child this was */
for(ch = family; ch; ch = ch->next)
if (ch->flags & ZOMBIE) {
INITDBG(L_VB, "Child died, PID= %d", ch->pid);
ch->flags &= ~(RUNNING|ZOMBIE|WAITING);
if (ch->process[0] != '+')
write_utmp_wtmp("", ch->id, ch->pid, DEAD_PROCESS, NULL);
}
}
if (ISMEMBER(got_signals, SIGHUP)) {
INITDBG(L_VB, "got SIGHUP");
#if CHANGE_WAIT
/* Are we waiting for a child? */
for(ch = family; ch; ch = ch->next)
if (ch->flags & WAITING) break;
if (ch == NULL)
#endif
{
/* We need to go into a new runlevel */
oldlevel = runlevel;
#ifdef INITLVL
runlevel = read_level(0);
#endif
if (runlevel == 'U') {
runlevel = oldlevel;
re_exec();
} else {
if (oldlevel != 'S' && runlevel == 'S') console_stty();
if (runlevel == '6' || runlevel == '0' ||
runlevel == '1') console_stty();
read_inittab();
fail_cancel();
setproctitle("init [%c]", runlevel);
DELSET(got_signals, SIGHUP);
}
}
}
if (ISMEMBER(got_signals, SIGUSR1)) {
/*
 * SIGUSR1 means close and reopen /dev/initctl
 */
INITDBG(L_VB, "got SIGUSR1");
close(pipe_fd);
pipe_fd = -1;
DELSET(got_signals, SIGUSR1);
}
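The record-now, handle-later pattern described above can be reduced to a few lines. This is only a sketch: ADDSET and ISMEMBER are copied from the text, while the DELSET definition, the variable name pending and both function names are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

#define ADDSET(set, val)   ((set) |= (1 << (val)))
#define DELSET(set, val)   ((set) &= ~(1 << (val)))  /* assumed definition */
#define ISMEMBER(set, val) ((set) & (1 << (val)))

static volatile sig_atomic_t pending;   /* plays the role of got_signals */

/* The handler does no real work: it only records that the signal arrived. */
static void note_signal(int sig)
{
    ADDSET(pending, sig);
}

/*
 * Hypothetical demo: install the handler, raise SIGUSR1, then handle it
 * the way a main loop would. Returns the number of signals handled, or
 * -1 if something went wrong.
 */
int deferred_signal_demo(void)
{
    struct sigaction sa;
    int handled = 0;

    pending = 0;
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;

    raise(SIGUSR1);                     /* the handler sets the bit */

    if (ISMEMBER(pending, SIGUSR1)) {   /* "process_signals" step */
        handled++;                      /* the real work would go here */
        DELSET(pending, SIGUSR1);
    }
    return ISMEMBER(pending, SIGUSR1) ? -1 : handled;
}
```

Keeping the handler this small avoids calling functions that are not async-signal-safe from signal context, which is presumably why init defers the work to its main loop.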
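• When the power-failure signal from the UPS is received
if ((fd = open(PWRSTAT, O_RDONLY)) >= 0) {
c = 0;
read(fd, &c, 1);
pwrstat = c;
close(fd);
unlink(PWRSTAT);
}
do_power_fail(pwrstat);
The reason is first read from /etc/powerstatus: "F", "L" or "O", meaning power Fail, power Low and power OK (restored) respectively.
The pwrstat argument carries that reason, and the actions that /etc/inittab associates with the corresponding event are enabled accordingly.
The pwrstat argument reflects the state of the power supply:
"O" means power OK
"L" means power Low
"F" means power Fail (failure)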
1757 /*
1758 * Start up powerfail entries.
1759 */
1760 void do_power_fail(int pwrstat)
1761 {
1762 CHILD *ch;
1763
1764 /*
1765 * Tell powerwait & powerfail entries to start up
1766 */
1767 for (ch = family; ch; ch = ch->next) { Walk the family list
1768 if (pwrstat == 'O') { After the power-restored (OK) signal has been received
1769 /*
1770 * The power is OK again.
1771 */
1772 if (ch->action == POWEROKWAIT) XECUTED marks a process as disabled, i.e. not to be run
1773 ch->flags &= ~XECUTED; Clear the disable flag: processes with the POWEROKWAIT action may now run
1774 } else if (pwrstat == 'L') { After the power-low signal has been received
1775 /*
1776 * Low battery, shut down now.
1777 */
1778 if (ch->action == POWERFAILNOW)
1779 ch->flags &= ~XECUTED; Allow processes with the POWERFAILNOW action to run
1780 } else {
1781 /*
1782 * Power is failing, shutdown imminent
1783 */ After the power-fail signal has been received
1784 if (ch->action == POWERFAIL || ch->action == POWERWAIT)
1785 ch->flags &= ~XECUTED; Allow processes with the POWERFAIL and POWERWAIT actions to run
1786 }
1787 }
1788 }
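The effect of do_power_fail() on the flags can be checked on a toy list. The sketch below mirrors its three branches with hypothetical demo_ types; the value chosen for the flag bit is an assumption, and only the set-then-clear behaviour matters.

```c
#include <stddef.h>

enum demo_action {
    DEMO_RESPAWN, DEMO_POWERWAIT, DEMO_POWERFAIL,
    DEMO_POWEROKWAIT, DEMO_POWERFAILNOW
};

#define DEMO_XECUTED 0x1    /* "do not run" marker; the value is assumed */

struct demo_child {
    enum demo_action   action;
    int                flags;
    struct demo_child *next;
};

/* Same three-way decision as do_power_fail(): enable the matching entries. */
void demo_power_event(struct demo_child *list, int pwrstat)
{
    struct demo_child *ch;

    for (ch = list; ch; ch = ch->next) {
        if (pwrstat == 'O') {
            if (ch->action == DEMO_POWEROKWAIT)
                ch->flags &= ~DEMO_XECUTED;
        } else if (pwrstat == 'L') {
            if (ch->action == DEMO_POWERFAILNOW)
                ch->flags &= ~DEMO_XECUTED;
        } else {
            if (ch->action == DEMO_POWERFAIL || ch->action == DEMO_POWERWAIT)
                ch->flags &= ~DEMO_XECUTED;
        }
    }
}

/*
 * Build a list with one powerfail and one powerokwait entry, both
 * disabled, deliver one event and report what got enabled:
 * bit 0 = powerfail entry, bit 1 = powerokwait entry.
 */
int demo_power_result(int pwrstat)
{
    struct demo_child ok   = { DEMO_POWEROKWAIT, DEMO_XECUTED, NULL };
    struct demo_child fail = { DEMO_POWERFAIL,   DEMO_XECUTED, &ok };

    demo_power_event(&fail, pwrstat);
    return ((fail.flags & DEMO_XECUTED) ? 0 : 1) |
           ((ok.flags & DEMO_XECUTED) ? 0 : 2);
}
```

With this list, 'F' enables only the powerfail entry, 'O' only the powerokwait entry, and 'L' neither, because no powerfailnow entry is present.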
The matching lines in the example file are:
# When our UPS tells us power has failed, assume we have a few minutes
# of power left. Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"
So by default both /sbin/shutdown -f -h +2 "Power Failure; System Shutting Down" (action POWERFAIL) and /sbin/shutdown -c "Power Restored; Shutdown Cancelled" (action POWEROKWAIT) are disabled in the list of processes managed by init, that is, they carry the XECUTED flag. When the corresponding event arrives they are enabled. Enabling only permits them to run; it does not run them. They are actually started in start_if_needed() below.
2484 /* Process any signals. */
2485 process_signals(); Handle the signals that were recorded
2486
2487 /* See what we need to start up (again) */
2488 start_if_needed(); The state of nodes in the family list may have changed, so walk the whole list again to see whether
2489 } an action that could not run before may run now
• SIGINT: when the user presses the interrupt key (the Delete key or Ctrl-C), SIGINT is sent to the foreground process group. For init, the kernel sends SIGINT when Ctrl-Alt-Del is pressed.
2263 /* Tell ctrlaltdel entry to start up */
2264 for(ch = family; ch; ch = ch->next)
2265 if (ch->action == CTRLALTDEL)
2266 ch->flags &= ~XECUTED; Allow the Ctrl-Alt-Del handler to run
• SIGWINCH: according to Advanced Programming in the UNIX Environment, this signal is sent when a process changes the size of the terminal window through the ioctl interface.
2272 /* Tell kbrequest entry to start up */
2273 for(ch = family; ch; ch = ch->next)
2274 if (ch->action == KBREQUEST)
2275 ch->flags &= ~XECUTED; Allow processes with the KBREQUEST action to run
• SIGALRM: for the timeout alarm almost nothing is done.
2280 INITDBG(L_VB, "got SIGALRM");
2281 /* The timer went off: check it out */
2282 DELSET(got_signals, SIGALRM);
• SIGCHLD: when a child process has died, an entry is recorded in the utmp and wtmp files.
2290 /* See which child this was */
2291 for(ch = family; ch; ch = ch->next) Walk the family list; for every zombie found, clear the three flags
2292 if (ch->flags & ZOMBIE) {
2293 INITDBG(L_VB, "Child died, PID= %d", ch->pid);
2294 ch->flags &= ~(RUNNING|ZOMBIE|WAITING);
2295 if (ch->process[0] != '+')
2296 write_utmp_wtmp("", ch->id, ch->pid, DEAD_PROCESS, NULL);
2297 }
• SIGHUP: if the user has modified /etc/inittab and wants the change to take effect without rebooting (normally the init process reads this configuration file only during system startup), SIGHUP can be sent so that the init process re-reads inittab and runs according to the new configuration.
2303 #if CHANGE_WAIT
2304 /* Are we waiting for a child? */
2305 for(ch = family; ch; ch = ch->next) If some process is in the state where init must wait for it to finish, inittab cannot
2306 if (ch->flags & WAITING) break; be re-read at this time
2307 if (ch == NULL) If a process must be waited for, ch is not NULL; ch == NULL therefore means that
2308 #endif re-reading the inittab file is safe now
2309 {
2310 /* We need to go into a new runlevel */
2311 oldlevel = runlevel; Save the current run level in oldlevel
2312 #ifdef INITLVL
2313 runlevel = read_level(0); Read the requested run level; the old-style telinit path wrote it to the INITLVL file (/etc/initrunlvl)
2314 #endif
2315 if (runlevel == 'U') { The init manual says: "U or u tell init to re-execute itself
2316 runlevel = oldlevel; (preserving the state). No re-examining of /etc/inittab
2317 re_exec(); file happens". re_exec() re-executes the init binary itself and hands the state of the managed processes over to the new image
2318 } else {
2319 if (oldlevel != 'S' && runlevel == 'S') console_stty();
2320 if (runlevel == '6' || runlevel == '0' ||
2321 runlevel == '1') console_stty();
2322 read_inittab(); As noted in the analysis of init 1, some branches of read_inittab() are not executed at system startup;
2323 fail_cancel(); they are executed in init 3.
2324 setproctitle("init [%c]", runlevel);
• SIGUSR1: when the init process receives SIGUSR1 (a user-defined signal), it closes the /dev/initctl pipe.
close(pipe_fd); After receiving this signal init closes /dev/initctl; check_init_fifo() opens it again on its next call because pipe_fd is -1
pipe_fd = -1;
Helper functions
The fail_check function
1707 /*
1708 * This procedure is called after every signal (SIGHUP, SIGALRM...)
1709 *
1710 * Only clear the 'failing' flag if the process is sleeping
1711 * longer than 5 minutes, or inittab was read again due
1712 * to user interaction.
1713 */
1714 void fail_check(void)
1715 {
1716 CHILD *ch; /* Pointer to child structure */
1717 time_t t; /* System time */
1718 time_t next_alarm = 0; /* When to set next alarm */
1719
1720 time(&t); Get the current system time
1721
1722 for(ch = family; ch; ch = ch->next) { Walk the processes managed by init
1723
1724 if (ch->flags & FAILING) { Look through the whole family list for nodes in the failing state
1725 /* Can we free this sucker? */
1726 if (ch->tm + SLEEPTIME < t) {
1727 ch->flags &= ~FAILING;
1728 ch->count = 0;
1729 ch->tm = 0;
1730 } else {
1731 /* No, we'll look again later */
1732 if (next_alarm == 0 ||
1733 ch->tm + SLEEPTIME > next_alarm)
1734 next_alarm = ch->tm + SLEEPTIME;
1735 }
1736 }
1737 }
1738 if (next_alarm) {
1739 next_alarm -= t;
1740 if (next_alarm < 1) next_alarm = 1;
1741 alarm(next_alarm);
1742 }
1743 }
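The timing rule in fail_check() for a single entry can be written as a small pure function. This is a sketch only: the function name is hypothetical and DEMO_SLEEPTIME is assumed to be 300 seconds, matching the "5 minutes" in the comment above.

```c
#include <time.h>

#define DEMO_SLEEPTIME 300  /* assumed: the "5 minutes" from the comment */

/*
 * Hypothetical helper: given the time an entry was marked as failing and
 * the current time, return 0 if the failing flag may be cleared now, or
 * the number of seconds (at least 1) until it should be checked again.
 */
long demo_retry_delay(time_t failed_at, time_t now)
{
    long left;

    if (failed_at + DEMO_SLEEPTIME < now)
        return 0;                       /* slept long enough: clear it */
    left = (long)(failed_at + DEMO_SLEEPTIME - now);
    return left < 1 ? 1 : left;         /* same floor as next_alarm */
}
```

For an entry that failed at t = 1000, the helper returns 200 at t = 1100, 1 at t = 1300 (the comparison is strict, so the entry is not yet cleared) and 0 at t = 1301.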
The re_exec function
1827 /*
1828 * Attempt to re-exec.
1829 */
1830 void re_exec(void)
1831 {
1832 CHILD *ch;
1833 sigset_t mask, oldset;
1834 pid_t pid;
1835 char *env;
1836 int fd;
1837
1838 if (strchr("S12345",runlevel) == NULL) These two lines correspond to the statement in the telinit manual: "Run level should be one of
1839 return; Ss12345, otherwise request would be silently ignored"
1840
1841 /*
1842 * Reset the alarm, and block all signals.
1843 */
1844 alarm(0); Cancel any pending alarm
1845 sigfillset(&mask);
1846 sigprocmask(SIG_BLOCK, &mask, &oldset);
1847
1848 /*
1849 * construct a pipe fd --> STATE_PIPE and write a signature
1850 */
1851 fd = make_pipe(STATE_PIPE);
1852
1853 /*
1854 * It's a backup day today, so I'm pissed off. Being a BOFH, however,
1855 * does have it's advantages...
1856 */
1857 fail_cancel();
1858 close(pipe_fd);
1859 pipe_fd = -1;
1860 DELSET(got_signals, SIGCHLD);
1861 DELSET(got_signals, SIGHUP);
1862 DELSET(got_signals, SIGUSR1);
1863
1864 /*
1865 * That should be cleaned.
1866 */
1867 for(ch = family; ch; ch = ch->next) Walk the list of processes managed by init
1868 if (ch->flags & ZOMBIE) {
1869 INITDBG(L_VB, "Child died, PID= %d", ch->pid);
1870 ch->flags &= ~(RUNNING|ZOMBIE|WAITING);
1871 if (ch->process[0] != '+')
1872 write_utmp_wtmp("", ch->id, ch->pid, DEAD_PROCESS, NULL);
1873 }
1874
1875 if ((pid = fork()) == 0) {
1876 /*
1877 * Child sends state information to the parent.
1878 */
1879 send_state(fd); A child of init writes the state of the processes currently managed by init into the state pipe
1880 exit(0);
1881 }
1882
1883 /*
1884 * The existing init process execs a new init binary.
1885 */
1886 env = init_buildenv(0);
1887 execl(myname, myname, "--init", NULL, env); The existing init process executes /sbin/init, with an environment built by
1888 init_buildenv() rather than the one inherited from the parent
1889 /*
1890 * We shouldn't be here, something failed.
1891 * Bitch, close the state pipe, unblock signals and return.
1892 */
1893 close(fd);
1894 close(STATE_PIPE);
1895 sigprocmask(SIG_SETMASK, &oldset, NULL);
1896 init_freeenv(env);
1897 initlog(L_CO, "Attempt to re-exec failed");
1898 }
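The state hand-over in re_exec() rests on a simple idea: a forked child writes the state into a pipe while the original process keeps the read end across the exec. The sketch below shows only the pipe-and-fork part, under loud assumptions: the function names are hypothetical, the state is a short string instead of the real list serialization, and no exec is performed.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that writes `state` into a pipe and
 * read it back in the parent. Returns the number of bytes received, or
 * -1 on failure. `out` is always NUL-terminated on success.
 */
int demo_state_handoff(const char *state, char *out, size_t outlen)
{
    int p[2], status;
    pid_t pid;
    ssize_t n;

    if (outlen == 0 || pipe(p) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: the "send_state" side */
        size_t len = strlen(state);

        close(p[0]);
        _exit(write(p[1], state, len) == (ssize_t)len ? 0 : 1);
    }
    close(p[1]);                        /* parent: the receiving side */
    n = read(p[0], out, outlen - 1);
    close(p[0]);
    waitpid(pid, &status, 0);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (int)n;
}

/* Self-check: returns 0 if a short state string survives the trip. */
int demo_state_handoff_ok(void)
{
    char buf[32];

    if (demo_state_handoff("runlevel=3", buf, sizeof(buf)) != 10)
        return 1;
    return strcmp(buf, "runlevel=3") == 0 ? 0 : 2;
}
```

In re_exec() the read end sits on the fixed descriptor STATE_PIPE, so the freshly executed init image can find it without being told a descriptor number.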
Analysis of the read_inittab function as run by init 3
1108 void read_inittab(void)
1109 {
1110 FILE *fp; /* The INITTAB file */
1111 CHILD *ch, *old, *i; /* Pointers to CHILD structure */
1112 CHILD *head = NULL; /* Head of linked list */
1113 #ifdef INITLVL
1114 struct stat st; /* To stat INITLVL */
1115 #endif
1116 sigset_t nmask, omask; /* For blocking SIGCHLD. */
1117 char buf[256]; /* Line buffer */
1118 char err[64]; /* Error message. */
1119 char *id, *rlevel,
1120 *action, *process; /* Fields of a line */
1121 char *p;
1122 int lineNo = 0; /* Line number in INITTAB file */
1123 int actionNo; /* Decoded action field */
1124 int f; /* Counter */
1125 int round; /* round 0 for SIGTERM, 1 for SIGKILL */
1126 int foundOne = 0; /* No killing no sleep */
1127 int talk; /* Talk to the user */
1128 int done = 0; /* Ready yet? */
1130 #if DEBUG
1131 if (newFamily != NULL) {
1132 INITDBG(L_VB, "PANIC newFamily != NULL");
1133 exit(1);
1134 }
1135 INITDBG(L_VB, "Reading inittab");
1136 #endif
1137
1138 /*
1139 * Open INITTAB and real line by line.
1140 */
1141 if ((fp = fopen(INITTAB, "r")) == NULL) Open the file /etc/inittab
1142 initlog(L_VB, "No inittab file found");
1143
1144 while(!done) { Each iteration handles one line of inittab and extends the newFamily list. Note: the newFamily list, not the family list
1145 /*
1146 * Add single user shell entry at the end.
1147 */
1148 if (fp == NULL || fgets(buf, sizeof(buf), fp) == NULL) {
1149 done = 1; All lines have been handled; the loop ends after this pass
1150 /*
1151 * See if we have a single user entry.
1152 */
1153 for(old = newFamily; old; old = old->next) Look through the list built so far for an entry whose run levels include S
1154 if (strpbrk(old->rlevel, "S")) break;
1155 if (old == NULL)
1156 snprintf(buf, sizeof(buf), "~~:S:wait:%s\n", SULOGIN);
1157 else
1158 continue;
1159 }
1160 lineNo++;
1161 /*
1162 * Skip comments and empty lines
1163 */
1164 for(p = buf; *p == ' ' || *p == '\t'; p++) Skip leading whitespace
1165 ;
1166 if (*p == '#' || *p == '\n') continue; Lines starting with "#" are comments and are ignored
1167
1168 /*
1169 * Decode the fields
1170 */
Split the line into its four parts, id:runlevels:action:process.
1171 id = strsep(&p, ":"); The parts of a configuration line are separated by ":", so strsep is used here to extract them one
1172 rlevel = strsep(&p, ":"); by one
1173 action = strsep(&p, ":");
1174 process = strsep(&p, "\n");
1175
The code below reveals limits that the init manual does not state, for example that the command line may not be longer than 127 characters.
1176 /*
1177 * Check if syntax is OK. Be very verbose here, to
1178 * avoid newbie postings on comp.os.linux.setup
1179 */
1180 err[0] = 0;
1181 if (!id || !*id) strcpy(err, "missing id field");
1182 if (!rlevel) strcpy(err, "missing runlevel field");
1183 if (!process) strcpy(err, "missing process field");
1184 if (!action || !*action)
1185 strcpy(err, "missing action field");
1186 if (id && strlen(id) > sizeof(utproto.ut_id))
1187 sprintf(err, "id field too long (max %d characters)",
1188 (int)sizeof(utproto.ut_id));
1189 if (rlevel && strlen(rlevel) > 11)
1190 strcpy(err, "rlevel field too long (max 11 characters)");
1191 if (process && strlen(process) > 127)
1192 strcpy(err, "process field too long");
1193 if (action && strlen(action) > 32)
1194 strcpy(err, "action field too long");
1195 if (err[0] != 0) {
1196 initlog(L_VB, "%s[%d]: %s", INITTAB, lineNo, err);
1197 INITDBG(L_VB, "%s:%s:%s:%s", id, rlevel, action, process);
1198 continue;
1199 }
1200
1201 /*
1202 * Decode the "action" field
1203 */
The action types that init accepts are all recorded in the actions[] array; the string comparison here converts the name into its numeric identifier.
1204 actionNo = -1;
1205 for(f = 0; actions[f].name; f++)
1206 if (strcasecmp(action, actions[f].name) == 0) {
1207 actionNo = actions[f].act;
1208 break;
1209 }
1210 if (actionNo == -1) { An illegal action (one that is not in the actions[] array) causes the line to be ignored
1211 initlog(L_VB, "%s[%d]: %s: unknown action field",
1212 INITTAB, lineNo, action);
1213 continue;
1214 }
1216 /

1217 * See if the id field is unique
1218 /
配置行中的第一部分是所谓 identifier,必须唯一,但命名好像没有任何规定,可任意。已经处理过的配置行都被记录入 CHILD 的链表节点
中,这里在处理当前行时检查一下已有节点中是否有与当前行的 id 相同的,如果有,则不是忽略该行,而是停止继续处理/etc/inittab 文件,
可见 id 的唯一性是至关重要的
1219 for(old = newFamily; old; old = old->next) {
1220 if(strcmp(old->id, id) == 0 && strcmp(id, “~~”)) {
1221 initlog(L_VB, “%s[%d]: duplicate ID field “%s””,
1222 INITTAB, lineNo, id);
1223 break;
1224 }
1225 }
1226 if (old) continue;
1227
1228 /

1229 * Allocate a CHILD structure
1230 /
1231 ch = imalloc(sizeof(CHILD)); 为当前配置行分配一个 CHILD node
1232
1233 /

1234 * And fill it in. 用当前配置行中的信息来填充 CHILD node
1235 /
1236 ch->action = actionNo; action 类型
1237 strncpy(ch->id, id, sizeof(utproto.ut_id) + 1); /
Hack for different libs. /该行的唯一标示符
1238 strncpy(ch->process, process, sizeof(ch->process) - 1); 该行是要执行的命令行
1239 if (rlevel[0]) { 填上 run level
1240 for(f = 0; f < sizeof(rlevel) - 1 && rlevel[f]; f++) {
1241 ch->rlevel[f] = rlevel[f];
1242 if (ch->rlevel[f] == ‘s’) ch->rlevel[f] = ‘S’;
1243 }
1244 strncpy(ch->rlevel, rlevel, sizeof(ch->rlevel) - 1);
1245 } else { 如果没有写 run level,则表示所有 run level 都要执行该行的 process 部分
1246 strcpy(ch->rlevel, “0123456789”);
1247 if (ISPOWER(ch->action))
1248 strcpy(ch->rlevel, “S0123456789”);
1249 }
What follows is the handling of the action.
1250 /*
1251 * We have the fake runlevel '#' for SYSINIT and
1252 * '*' for BOOT and BOOTWAIT.
1253 */
As the comment says, the SYSINIT action is represented by '#' and the BOOT and BOOTWAIT actions by '*', whereas the genuinely legal run levels are 0 to 9 plus 'S'.
"#" and "*" mean that the entry is to be executed whatever the run level. SYSINIT has the highest priority, so its entries should run before the BOOT ones.
1254 if (ch->action == SYSINIT) strcpy(ch->rlevel, "#");
1255 if (ch->action == BOOT || ch->action == BOOTWAIT)
1256 strcpy(ch->rlevel, "*");
1257
1258 /*
1259 * Now add it to the linked list. Special for powerfail.
1260 */
The list built from the configuration lines read from /etc/inittab has newFamily as its head. During system startup the list headed by family is naturally empty. If init is only being run to switch run level and the like, the family list is non-empty: it is the list that init built the last time it read /etc/inittab.
1261 if (ISPOWER(ch->action)) { True if the action is one of POWERWAIT, POWERFAIL, POWEROKWAIT,
1262 POWERFAILNOW or CTRLALTDEL, i.e. it is power-related or triggered by Ctrl+Alt+Del
1263 /*
1264 * Disable by default
1265 */
1266 ch->flags |= XECUTED;
1267
Setting XECUTED marks the entry as not to be run: when startup() finds XECUTED set in the flags of the action for a configuration line, it skips that line. This makes sense, because the actions matched by ISPOWER() are not needed in normal operation; they are to run only when the corresponding event really happens. For example, if the user has never pressed Ctrl+Alt+Del, there is no need at all to run the process given for the CTRLALTDEL action in /etc/inittab. So by default the entry is disabled (by setting the XECUTED flag), and it is enabled only once Ctrl+Alt+Del has been detected.
These actions are also inserted at the front of the newFamily list, so that if they need to run they run first. That is reasonable: these events are serious, so they get higher priority.
1268 /*
1269 * Tricky: insert at the front of the list...
1270 */
1271 old = NULL;
1272 for(i = newFamily; i; i = i->next) {
1273 if (!ISPOWER(i->action)) break;
1274 old = i;
1275 }
1276 /*
1277 * Now add after entry "old"
1278 */
1279 if (old) {
1280 ch->next = i;
1281 old->next = ch;
1282 if (i == NULL) head = ch;
1283 } else {
1284 ch->next = newFamily;
1285 newFamily = ch;
1286 if (ch->next == NULL) head = ch;
1287 }
1288 } else { All other actions are appended at the tail. KBREQUEST is not run by default; according to the init manual the SIGWINCH signal triggers it
1289 /*
1290 * Just add at end of the list
1291 */
1292 if (ch->action == KBREQUEST) ch->flags |= XECUTED;
1293 ch->next = NULL;
1294 if (head)
1295 head->next = ch;
1296 else
1297 newFamily = ch;
1298 head = ch;
1299 }
1300
1301 /*
1302 * Walk through the old list comparing id fields
1303 */
1304 for(old = family; old; old = old->next)
1305 if (strcmp(old->id, ch->id) == 0) {
1306 old->new = ch;
1307 break;
1308 }
This is the end of the processing of one line.
1309 }
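The field-splitting step of the loop above (skip leading blanks, ignore comments, cut the line at ":" three times and at the newline once) can be sketched on its own. Everything named demo_ is hypothetical, and the sketch uses a small strchr-based cutter instead of strsep so that it compiles under strict POSIX settings; it performs only the "missing field" checks, not the length checks.

```c
#include <stddef.h>
#include <string.h>

struct demo_entry {
    char *id, *rlevel, *action, *process;
};

/*
 * Cut the next field off *pp at the first `delim`, in the manner of
 * strsep() with a single delimiter. Returns NULL once the input is used up.
 */
static char *demo_field(char **pp, char delim)
{
    char *start = *pp, *end;

    if (start == NULL)
        return NULL;
    end = strchr(start, delim);
    if (end) {
        *end = '\0';
        *pp = end + 1;
    } else {
        *pp = NULL;
    }
    return start;
}

/*
 * Parse one inittab line in place. Returns 0 for a usable entry, 1 for a
 * comment or blank line, -1 when a field is missing.
 */
int demo_parse_line(char *buf, struct demo_entry *e)
{
    char *p = buf;

    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '#' || *p == '\n' || *p == '\0')
        return 1;
    e->id = demo_field(&p, ':');
    e->rlevel = demo_field(&p, ':');
    e->action = demo_field(&p, ':');
    e->process = demo_field(&p, '\n');
    if (!e->id || !*e->id || !e->rlevel ||
        !e->action || !*e->action || !e->process)
        return -1;
    return 0;
}

/* Self-check on three sample lines; returns 0 when all behave as expected. */
int demo_parse_check(void)
{
    char line[] = "l3:3:wait:/etc/rc.d/rc 3\n";
    char comment[] = "# Trap CTRL-ALT-DELETE\n";
    char bad[] = "id-only\n";
    struct demo_entry e;

    if (demo_parse_line(comment, &e) != 1)
        return 1;
    if (demo_parse_line(bad, &e) != -1)
        return 2;
    if (demo_parse_line(line, &e) != 0)
        return 3;
    if (strcmp(e.id, "l3") || strcmp(e.rlevel, "3") ||
        strcmp(e.action, "wait") || strcmp(e.process, "/etc/rc.d/rc 3"))
        return 4;
    return 0;
}
```

Like read_inittab(), the parser writes NUL bytes into the line buffer, so the fields are pointers into buf and stay valid only as long as the buffer does; init copies them into the CHILD node for that reason.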
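1310 /*
1311 * We're done.
1312 */
1313 if (fp) fclose(fp); Close the /etc/inittab file. For init 1 the work essentially ends here.
1314
Everything below is the processing done by init as a daemon process, i.e. init 3. I think the code as a whole should be organized more clearly: the logic of init as started by the kernel (init 1) and of init running as a daemon (init 3) should be separated rather than tangled together as it is now, which is rather messy.
1315 /*
1316 * Loop through the list of children, and see if they need to
1317 * be killed.
1318 */
1319
1320 INITDBG(L_VB, "Checking for children to kill");
1321 for(round = 0; round < 2; round++) { Two passes, round 0 for SIGTERM and 1 for SIGKILL: the first pass sends SIGTERM to the processes that are to be killed,
1322 talk = 1; the second pass sends SIGKILL
1323 for(ch = family; ch; ch = ch->next) {
1324 ch->flags &= ~KILLME;
1325
When init runs during system startup (init 1), the family list is empty and this inner loop is not entered; then round = 0, talk = 1 and foundOne = 0, and execution should jump to L1393.
In init 3 the family list is of course not empty, so the loop is entered.
The code from L1321 to L1414 is the code executed by init 3; init 1 does not execute it.
This part of the code first kills those of the processes currently managed by init (the ones in the family list) that have to be killed, clearing the ground for the processes to be started from the re-read inittab.
1. Processes with the following attributes must not be killed, i.e. they are skipped:
BOOT
S (Single User Mode)
DEMAND
2. A process is killed as follows: SIGTERM is sent first, init then waits sltime seconds (5 by default, but the user can set it), and finally SIGKILL is sent.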
1326 /*
1327 * Is this line deleted?
1328 */
1329 if (ch->new == NULL) ch->flags |= KILLME;
1330
1331 /*
1332 * If the entry has changed, kill it anyway. Note that
1333 * we do not check ch->process, only the "action" field.
1334 * This way, you can turn an entry "off" immediately, but
1335 * changes in the command line will only become effective
1336 * after the running version has exited.
1337 */
1338 if (ch->new && ch->action != ch->new->action) ch->flags |= KILLME;
1339
1340 /*
1341 * Only BOOT processes may live in all levels
1342 */
1343 if (ch->action != BOOT &&
1344 strchr(ch->rlevel, runlevel) == NULL) {
1345 /*
1346 * Ondemand procedures live always,
1347 * except in single user
1348 */
1349 if (runlevel == 'S' || !(ch->flags & DEMAND))
1350 ch->flags |= KILLME;
1351 }
1352
1353 /*
1354 * Now, if this process may live note so in the new list
1355 */
1356 if ((ch->flags & KILLME) == 0) {
1357 ch->new->flags = ch->flags;
1358 ch->new->pid = ch->pid;
1359 ch->new->exstat = ch->exstat;
1360 continue;
1361 }
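The three KILLME tests in L1329 to L1350 can be condensed into one predicate. The sketch below is an illustration under simplifying assumptions: `enum action` lists only a hypothetical subset of actions, `must_kill` is an invented name, and the `is_demand` parameter stands in for the DEMAND flag.

```c
#include <string.h>

/* Hypothetical subset of the action codes, for illustration only. */
enum action { ACT_RESPAWN, ACT_BOOT, ACT_ONDEMAND };

/* Returns 1 if an entry of the old list must be killed.
 *   rlevels    runlevels listed for the old entry, e.g. "2345"
 *   runlevel   the runlevel init is switching to
 *   is_demand  stands in for (flags & DEMAND)
 *   has_new    the same id still exists in the re-read inittab
 *   new_action action of the matching new entry (ignored if !has_new) */
int must_kill(enum action action, const char *rlevels, char runlevel,
              int is_demand, int has_new, enum action new_action)
{
    if (!has_new)
        return 1;                       /* the line was deleted */
    if (action != new_action)
        return 1;                       /* the action field changed */
    if (action != ACT_BOOT && strchr(rlevels, runlevel) == NULL) {
        /* Not valid in the new runlevel. Ondemand entries survive,
         * except when going to single-user mode. */
        if (runlevel == 'S' || !is_demand)
            return 1;
    }
    return 0;
}
```

For example, a respawn entry for runlevels 2345 survives a switch to runlevel 3 but not to runlevel 1, while an ondemand entry survives any switch except the one to S.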
1362
1363
1364 /*
1365 * Is this process still around?
1366 */
1367 if ((ch->flags & RUNNING) == 0) {
1368 ch->flags &= ~KILLME;
1369 continue;
1370 }
1371 INITDBG(L_VB, "Killing \"%s\"", ch->process);
1372 switch(round) {
1373 case 0: /* Send TERM signal */
First round: send SIGTERM.
1374 if (talk)
1375 initlog(L_CO,
1376 "Sending processes the TERM signal");
1377 kill(-(ch->pid), SIGTERM);
The negative pid sends SIGTERM to every process in the process group whose ID is ch->pid.
1378 foundOne = 1;
1379 break;
1380 case 1: /* Send KILL signal and collect status */
Second round: send SIGKILL.
1381 if (talk)
1382 initlog(L_CO,
1383 "Sending processes the KILL signal");
1384 kill(-(ch->pid), SIGKILL);
1385 break;
1386 }
1387 talk = 0;
1388
1389 }
1390 /*
1391 * See if we have to wait 5 seconds
1392 */
1393 if (foundOne && round == 0) {
For the init run at boot (init 1), round = 0 but foundOne = 0, so this if branch is not entered and
execution continues at L1419. init 3, by contrast, executes the code from L1397 to L1411.
1394 /*
1395 * Yup, but check every second if we still have children.
1396 */
1397 for(f = 0; f < sltime; f++) {
1398 for(ch = family; ch; ch = ch->next) {
1399 if (!(ch->flags & KILLME)) continue;
1400 if ((ch->flags & RUNNING) && !(ch->flags & ZOMBIE))
1401 break;
1402 }
1403 if (ch == NULL) {
1404 /*
1405 * No running children, skip SIGKILL
1406 */
1407 round = 1;
1408 foundOne = 0; /* Skip the sleep below. */
1409 break;
1410 }
1411 do_sleep(1);
Each pass sleeps one second, so the whole loop waits at most sltime seconds.
1412 }
1413 }
1414 }
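The two-round scheme (SIGTERM, a grace period polled once per second, then SIGKILL) can be sketched for a single child process. This is a minimal sketch under stated assumptions, not init's code: `terminate_child` is a hypothetical helper, it signals one pid rather than a whole process group as init does with kill(-(ch->pid), ...), and it reaps the child with waitpid() instead of relying on a SIGCHLD handler.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask `pid` to exit with SIGTERM, poll once per second for up to `grace`
 * seconds, then fall back to SIGKILL. `pid` must be a child of the caller.
 * Returns the signal that ended the child, 0 if it exited normally,
 * or -1 on error. */
int terminate_child(pid_t pid, int grace)
{
    int status = 0, i;
    pid_t r = 0;

    if (kill(pid, SIGTERM) < 0)
        return -1;
    for (i = 0; i < grace; i++) {
        r = waitpid(pid, &status, WNOHANG);
        if (r != 0)
            break;                  /* reaped, or an error occurred */
        sleep(1);
    }
    if (r == 0) {                   /* still not reaped: second round */
        kill(pid, SIGKILL);
        r = waitpid(pid, &status, 0);
    }
    if (r != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A child that ignores SIGTERM is ended by SIGKILL after the grace period, while a child with the default disposition is reported as ended by SIGTERM.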
1415
1416 /*
1417 * Now give all processes the chance to die and collect exit statuses.
1418 */
1419 if (foundOne) do_sleep(1);
In init 1, foundOne = 0, so there is no one-second sleep.
1420 for(ch = family; ch; ch = ch->next)
In init 1 the family list is empty at this point, so the loop is skipped and execution continues at
L1437. init 3 executes the code from L1420 to L1430.
1421 if (ch->flags & KILLME) {
1422 if (!(ch->flags & ZOMBIE))
1423 initlog(L_CO, “Pid %d [id %s] seems to hang”, ch->pid,
1424 ch->id);
1425 else {
1426 INITDBG(L_VB, “Updating utmp for pid %d [id %s]”,
1427 ch->pid, ch->id);
1428 ch->flags &= ~RUNNING;
1429 if (ch->process[0] != ‘+’)
1430 write_utmp_wtmp("", ch->id, ch->pid, DEAD_PROCESS, NULL);
1431 }
1432 }
1433
1434 /*
1435 * Both rounds done; clean up the list.
1436 */
1437 sigemptyset(&nmask);
1438 sigaddset(&nmask, SIGCHLD);
1439 sigprocmask(SIG_BLOCK, &nmask, &omask);
1440 for(ch = family; ch; ch = old) {
In init 1 the family list is still empty, so the loop is not entered and execution continues at L1444.
In init 3 the nodes of the old family list have to be freed first; the list built from the re-read
inittab file is pointed to by newFamily.
1441 old = ch->next;
1442 free(ch);
1443 }
1444 family = newFamily;
newFamily holds the list built from the current /etc/inittab lines that were read at the top of this
function; only now is it assigned to family.
1445 for(ch = family; ch; ch = ch->next) ch->new = NULL;
1446 newFamily = NULL;
1447 sigprocmask(SIG_SETMASK, &omask, NULL);
1448
1449 #ifdef INITLVL
1450 /*
1451 * Dispose of INITLVL file.
1452 */
Dispose of /etc/initrunlvl. How this is done depends on whether it is a regular file or a symbolic
link, which the lstat() call below checks.
1453 if (lstat(INITLVL, &st) >= 0 && S_ISLNK(st.st_mode)) {
1454 /*
1455 * INITLVL is a symbolic link, so just truncate the file.
1456 */
1457 close(open(INITLVL, O_WRONLY|O_TRUNC));
1458 } else {
1459 /*
1460 * Delete INITLVL file.
1461 */
1462 unlink(INITLVL);
1463 }
1464 #endif
1465 #ifdef INITLVL2
1466 /*
1467 * Dispose of INITLVL2 file.
1468 */
Dispose of /var/log/initrunlvl, again in a way that depends on whether it is a regular file or a
symbolic link.
1469 if (lstat(INITLVL2, &st) >= 0 && S_ISLNK(st.st_mode)) {
1470 /*
1471 * INITLVL2 is a symbolic link, so just truncate the file.
1472 */
1473 close(open(INITLVL2, O_WRONLY|O_TRUNC));
1474 } else {
1475 /*
1476 * Delete INITLVL2 file.
1477 */
1478 unlink(INITLVL2);
1479 }
1480 #endif
1481 }
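The "truncate if it is a symbolic link, otherwise unlink" rule used for INITLVL and INITLVL2 can be written as one small helper. This is a sketch with a hypothetical function name, `dispose_file`; unlike the listing it also reports errors to the caller.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* If `path` is a symbolic link, truncate the file it points to and keep
 * the link; otherwise remove `path`. Returns 0 on success, -1 on error. */
int dispose_file(const char *path)
{
    struct stat st;

    /* lstat() examines the link itself, not its target. */
    if (lstat(path, &st) == 0 && S_ISLNK(st.st_mode)) {
        int fd = open(path, O_WRONLY | O_TRUNC);   /* open() follows the link */

        if (fd < 0)
            return -1;
        return close(fd);
    }
    return unlink(path);
}
```

Truncating through the link leaves the link in place and only empties the file behind it.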
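Afterword
This note is part of a series I plan to write on Linux system initialization. The series has three
parts: the analysis of the init process, which is this document; the multi-user startup, that is
booting to the console login prompt; and the GUI startup, that is booting to the X Window login screen
(see note 1). The first thing that happens after the Linux kernel has initialized is that the init
process is started, and all later user-space initialization is started by it. It is therefore the
starting point for studying user-space startup and cannot be skipped.
Note 1: the initialization of the Linux kernel itself, that is the start_kernel function described
earlier and the functions it calls, is not covered here. That topic is simply too large and touches
almost every kernel subsystem.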
Contact
Walter Zhou
mailto:[email protected]
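Appendix
Environment
1. The sysvinit-2.86 package
2. VMware + Redhat 8.0
Notes on the actions in inittab
• respawn: if the process named in the process field does not exist, start it. init does not wait for
it to finish but goes on scanning the remaining entries in inittab. When such a process terminates,
init restarts it; if it is already running, nothing is done.
• wait: start the process named in the process field and wait for it to finish before handling the next
inittab entry.
• once: start the process and go on to the next entry without waiting. When the process terminates it
is not restarted, and if it is still running when a new runlevel is entered, init does not start it
again.
• boot: handled only at system boot. init starts the process and moves on to the next entry without
waiting. When the process terminates it is not restarted.
• bootwait: handled after boot, the first time the system goes from single-user to multi-user mode.
init starts the process and waits for it to finish before handling the next entry. When the process
terminates it is not restarted.
• powerfail: when init receives the power-failure signal (SIGPWR), it runs the given process.
• powerwait: when init receives the power-failure signal (SIGPWR), it runs the given process and waits
for it to finish before checking other entries.
• off: if the given process is running, init sends it SIGTERM as a warning and waits 5 seconds before
forcing it to end with SIGKILL. If the process does not exist, the entry is ignored.
• ondemand: same as respawn, except that it is independent of the numeric runlevels; it is used only
for entries whose rstate field is a, b or c.
• sysinit: the process is run before init accesses the console. Such entries are used only to
initialize certain devices, so that init can ask the user about the runlevel on them. init waits for
the process to finish before continuing.
• initdefault: specifies the default runlevel. This entry is scanned only when init is first invoked.
If the rstate field lists several runlevels, the highest number is the default. If the rstate field is
empty, init treats it as 0123456 and therefore enters level 6, which puts the system into a reboot
loop. If inittab contains no initdefault entry, init asks the user for an initial runlevel at boot.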
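Shutdown analysis
Several actions in inittab, the configuration file of the init process, involve shutdown (the
shutdown command).

# Trap CTRL-ALT-DELETE
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
Runs when the CTRL-ALT-DELETE key combination is pressed.

# When our UPS tells us power has failed, assume we have a few minutes
# of power left. Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"
Runs when the UPS reports a power failure.

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"
Runs when the UPS reports that power has been restored.

In all three cases the init process invokes the shutdown command, only with different arguments. For
example, after CTRL-ALT-DELETE is pressed, "-t3 -r now" means: start the shutdown immediately (now),
wait 3 seconds between the warning and the kill signal (-t3), and then reboot the system (-r). See the
manual page of the command for the details of each option.
Why describe the shutdown command in a text about the init process that starts the system? The reason
is simple: the shutdown command and the init process are closely tied together. The actual halt or
reboot performed by shutdown is carried out through the init process.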
Shutdown procedure
The shutdown command accepts the following options:
/*
 * Show usage message.
 */
void usage(void)
{
    fprintf(stderr,
    "Usage:\t shutdown [-akrhHPfnc] [-t secs] time [warning message]\n"
    "\t\t -a: use /etc/shutdown.allow\n"
    "\t\t -k: don't really shutdown, only warn.\n"
    "\t\t -r: reboot after shutdown.\n"
    "\t\t -h: halt after shutdown.\n"
    "\t\t -P: halt action is to turn off power.\n"
    "\t\t -H: halt action is to just halt.\n"
    "\t\t -f: do a 'fast' reboot (skip fsck).\n"
    "\t\t -F: Force fsck on reboot.\n"
    "\t\t -n: do not go through \"init\" but go down real fast.\n"
    "\t\t -c: cancel a running shutdown.\n"
    "\t\t -t secs: delay between warning and kill signal.\n"
    "\t\t ** the \"time\" argument is mandatory! (try \"now\") **\n");
    exit(1);
}
-a: consult /etc/shutdown.allow to decide whether the shutdown is permitted.
-k: do not really shut down or reboot; only send the warning to the users logged in to the system.
-r: reboot the system.
-h: halt the system.
-P: when halting, power the machine off.
-f: skip fsck (the file system check) on the next boot.
-F: force fsck (the file system check) on the next boot.
-c: cancel a shutdown that is already in progress.
-t secs: number of seconds between sending the warning and killing the running processes.
Some files related to shutdown:
/etc/nologin: if this file exists, logging in to the system is not allowed. To keep users from logging
in, it is enough to create the file; only its existence matters, not its content.
/*
 * Create the /etc/nologin file.
 */
void donologin(int min)
{
    FILE *fp;
    time_t t;
    time(&t);
    t += 60 * min;
    if ((fp = fopen(NOLOGIN, "w")) != NULL) { /* creates /etc/nologin */
        fprintf(fp, "\rThe system is going down on %s\r\n", ctime(&t));
        if (message[0]) fputs(message, fp);
        fclose(fp);
    }
}
/fastboot: create this file to skip the file system check (fsck) on the next boot; its content does
not matter.
while((c = getopt(argc, argv, "HPacqkrhnfFyt:g:i:")) != EOF) { /* parse the shutdown command line */
    switch(c) {
    case 'H':
        halttype = "HALT";
        break;
    case 'P':
        halttype = "POWERDOWN";
        break;
    case 'a': /* Access control. */
        useacl = 1;
        break;
    case 'c': /* Cancel an already running shutdown. */
        cancel = 1;
        break;
    case 'k': /* Don't really shutdown, only warn. */
        dontshut = 1;
        break;
    case 'r': /* Automatic reboot */
        down_level[0] = '6';
        break;
    case 'h': /* Halt after shutdown */
        down_level[0] = '0';
        break;
    case 'f': /* Don't perform fsck after next boot */
        fastboot = 1; /* remember that fsck should be skipped on reboot */
        break;
......
chdir("/");
if (fastboot) close(open(FASTBOOT, O_CREAT | O_RDWR, 0644)); /* flag set: create /fastboot */
/forcefsck: create this file to force the file system check (fsck) on the next boot; its content does
not matter.
if (forcefsck) close(open(FORCEFSCK, O_CREAT | O_RDWR, 0644)); /* forcefsck flag set: create /forcefsck */


/etc/shutdown.allow: when shutdown is called with -a and this file exists, shutdown reads it (one login
name per line) and goes ahead only if root or one of the listed users is currently logged in on a
virtual console. If the file does not exist, the check is skipped. Independently of this file, shutdown
itself must run with root privileges (see the getuid() check at the start of main()).
/* Process the options. */
while((c = getopt(argc, argv, "HPacqkrhnfFyt:g:i:")) != EOF) {
    switch(c) {
    case 'H':
        halttype = "HALT";
        break;
    case 'P':
        halttype = "POWERDOWN";
        break;
    case 'a': /* Access control: -a means consult /etc/shutdown.allow */
        useacl = 1; /* set the flag */
        break;
......
/* Do we need to use the shutdown.allow file ? */
if (useacl && (fp = fopen(SDALLOW, "r")) != NULL) { /* flag set: open /etc/shutdown.allow */
    /* Read /etc/shutdown.allow. */
    i = 0;
    while(fgets(buf, 128, fp)) { /* read one line per iteration */
        if (buf[0] == '#' || buf[0] == '\n') continue; /* lines starting with # are comments */
        if (i > 31) continue; /* at most 32 names: downusers is a 32-element array */
        for(sp = buf; *sp; sp++) if (*sp == '\n') *sp = 0; /* strip the newline */
        downusers[i++] = strdup(buf); /* store the name in downusers */
    }
    if (i < 32) downusers[i] = 0;
    fclose(fp);
    /* Now walk through /var/run/utmp to find logged in users. */
    while(!user_ok && (ut = getutent()) != NULL) {
        /* See if this is a user process on a VC. */
        if (ut->ut_type != USER_PROCESS) continue;
        sprintf(term, "/dev/%.*s", UT_LINESIZE, ut->ut_line);
        if (stat(term, &st) < 0) continue;
#ifdef major /* glibc */
        if (major(st.st_rdev) != 4 ||
            minor(st.st_rdev) > 63) continue;
#else
        if ((st.st_rdev & 0xFFC0) != 0x0400) continue;
#endif
        /* Root is always OK. */
        if (strcmp(ut->ut_user, "root") == 0) { /* root may always shut down */
            user_ok++;
            break;
        }
        /* See if this is an allowed user. */
        for(i = 0; i < 32 && downusers[i]; i++)
            if (!strncmp(downusers[i], ut->ut_user, /* is this logged-in user in downusers? */
                UT_NAMESIZE)) {
                user_ok++; /* mark the shutdown as permitted */
                break;
            }
    }
    endutent();
    /* See if user was allowed. */
    if (!user_ok) {
        if ((fp = fopen(CONSOLE, "w")) != NULL) {
            fprintf(fp, "\rshutdown: no authorized users "
                "logged in.\r\n");
            fclose(fp);
        }
        exit(1);
    }
}
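The parsing rules for /etc/shutdown.allow shown above (one login name per line, '#' starts a comment, blank lines are skipped, at most 32 names) can be exercised with a small standalone reader. This is a sketch, not the shutdown.c code: `read_allow_list`, `MAX_USERS` and `NAME_LEN` are hypothetical names, names are copied into a fixed array instead of being duplicated with strdup(), and reading stops once the array is full.

```c
#include <stdio.h>
#include <string.h>

#define MAX_USERS 32     /* same limit as the downusers array */
#define NAME_LEN  128    /* same size as the line buffer in shutdown.c */

/* Reads up to MAX_USERS login names from `fp`, one per line, skipping
 * blank lines and lines that start with '#'.
 * Returns the number of names stored in `names`. */
int read_allow_list(FILE *fp, char names[MAX_USERS][NAME_LEN])
{
    char buf[NAME_LEN];
    int n = 0;

    while (n < MAX_USERS && fgets(buf, sizeof(buf), fp)) {
        if (buf[0] == '#' || buf[0] == '\n')
            continue;
        buf[strcspn(buf, "\n")] = '\0';    /* strip the trailing newline */
        strcpy(names[n++], buf);
    }
    return n;
}
```

Given a file with a comment line, "alice", a blank line and "bob", the function stores the two names and returns 2.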
Shutdown source code
/*
 * shutdown.c Shut the system down.
 *
 * Usage: shutdown [-krhfnc] time [warning message]
 * -k: don't really shutdown, only warn.
 * -r: reboot after shutdown.
 * -h: halt after shutdown.
 * -f: do a 'fast' reboot (skip fsck).
 * -F: Force fsck on reboot.
 * -n: do not go through init but do it ourselves.
 * -c: cancel an already running shutdown.
 * -t secs: delay between SIGTERM and SIGKILL for init.
 *
 * Author: Miquel van Smoorenburg, [email protected]
 *
 * Version: @(#)shutdown 2.86-1 31-Jul-2004 [email protected]
 *
 * This file is part of the sysvinit suite,
 * Copyright 1991-2004 Miquel van Smoorenburg.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 */
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include “paths.h”
    #include “reboot.h”
    #include “initreq.h”
    char *Version = "@(#) shutdown 2.86-1 31-Jul-2004 [email protected]";
    #define MESSAGELEN 256
    int dontshut = 0; /* Don't shutdown, only warn */
    char down_level[2]; /* What runlevel to go to. */
    int dosync = 1; /* Sync before reboot or halt */
    int fastboot = 0; /* Do a 'fast' reboot */
    int forcefsck = 0; /* Force fsck on reboot */
    char message[MESSAGELEN]; /* Warning message */
    char *sltime = 0; /* Sleep time */
    char newstate[64]; /* What are we gonna do */
    int doself = 0; /* Don't use init */
    int got_alrm = 0;
    char *clean_env[] = {
    "HOME=/",
    "PATH=/bin:/usr/bin:/sbin:/usr/sbin",
    "TERM=dumb",
    NULL,
    };
    /* From "wall.c" */
    extern void wall(char *, int, int);
    /* From "utmp.c" */
    extern void write_wtmp(char *user, char *id, int pid, int type, char *line);
    /
  • Sleep without being interrupted.
    /
    void hardsleep(int secs)
    {
    struct timespec ts, rem;
    ts.tv_sec = secs;
    ts.tv_nsec = 0;
    while(nanosleep(&ts, &rem) < 0 && errno == EINTR)
    ts = rem;
    }
    /
  • Break off an already running shutdown.
    /
    void stopit(int sig)
    {
    unlink(NOLOGIN);
    unlink(FASTBOOT);
    unlink(FORCEFSCK);
    unlink(SDPID);
    printf("\r\nShutdown cancelled.\r\n");
    exit(0);
    }
    /
  • Show usage message.
    */
    void usage(void)
    {
    fprintf(stderr,
    “Usage:\t shutdown [-akrhHPfnc] [-t secs] time [warning message]\n”
    “\t\t -a: use /etc/shutdown.allow\n”
    “\t\t -k: don’t really shutdown, only warn.\n”
    “\t\t -r: reboot after shutdown.\n”
    “\t\t -h: halt after shutdown.\n”
    “\t\t -P: halt action is to turn off power.\n”
    “\t\t -H: halt action is to just halt.\n”
    “\t\t -f: do a ‘fast’ reboot (skip fsck).\n”
    “\t\t -F: Force fsck on reboot.\n”
    “\t\t -n: do not go through “init” but go down real fast.\n”
    “\t\t -c: cancel a running shutdown.\n”
    “\t\t -t secs: delay between warning and kill signal.\n”
    "\t\t ** the “time” argument is mandatory! (try “now”) *\n");
    exit(1);
    }
    void alrm_handler(int sig)
    {
    got_alrm = sig;
    }
    /*
     * Set environment variables in the init process.
     */
    int init_setenv(char *name, char *value)
    {
    struct init_request request;
    struct sigaction sa;
    int fd;
    int nl, vl;
    memset(&request, 0, sizeof(request));
    request.magic = INIT_MAGIC;
    request.cmd = INIT_CMD_SETENV;
    nl = strlen(name);
    vl = value ? strlen(value) : 0;
    if (nl + vl + 3 >= sizeof(request.i.data))
    return -1;
    memcpy(request.i.data, name, nl);
    if (value) {
    request.i.data[nl] = ‘=’;
    memcpy(request.i.data + nl + 1, value, vl);
    }
    /
  • Open the fifo and write the command.
  • Make sure we don’t hang on opening /dev/initctl
    /
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = alrm_handler;
    sigaction(SIGALRM, &sa, NULL);
    got_alrm = 0;
    alarm(3);
    if ((fd = open(INIT_FIFO, O_WRONLY)) >= 0 &&
    write(fd, &request, sizeof(request)) == sizeof(request)) {
    close(fd);
    alarm(0);
    return 0;
    }
    fprintf(stderr, "shutdown: ");
    if (got_alrm) {
    fprintf(stderr, “timeout opening/writing control channel %s\n”,
    INIT_FIFO);
    } else {
    perror(INIT_FIFO);
    }
    return -1;
    }
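The part of init_setenv() that packs the variable into the request buffer can be isolated into a pure function. The sketch below keeps the size check from the listing but uses a plain caller-supplied buffer instead of struct init_request; `build_setenv_payload` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Writes "NAME=VALUE" into `data`, or just "NAME" when value is NULL,
 * as happens for init_setenv("INIT_HALT", NULL) in the listing.
 * The buffer is zero-filled first, so the result is NUL-terminated.
 * Returns 0 on success, -1 if the text does not fit (the same
 * `nl + vl + 3 >= size` test as in the listing). */
int build_setenv_payload(char *data, size_t size,
                         const char *name, const char *value)
{
    size_t nl = strlen(name);
    size_t vl = value ? strlen(value) : 0;

    if (nl + vl + 3 >= size)
        return -1;
    memset(data, 0, size);
    memcpy(data, name, nl);
    if (value) {
        data[nl] = '=';
        memcpy(data + nl + 1, value, vl);
    }
    return 0;
}
```

Calling it with "INIT_HALT" and "POWERDOWN" on a 32-byte buffer produces the string "INIT_HALT=POWERDOWN"; the same call on an 8-byte buffer is rejected with -1.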
    /
  • Tell everyone the system is going down in ‘mins’ minutes.
    /
    void warn(int mins)
    {
    char buf[MESSAGELEN + sizeof(newstate)];
    int len;
    buf[0] = 0;
    strncat(buf, message, sizeof(buf) - 1);
    len = strlen(buf);
    if (mins == 0)
    snprintf(buf + len, sizeof(buf) - len,
    “\rThe system is going down %s NOW!\r\n”,
    newstate);
    else
    snprintf(buf + len, sizeof(buf) - len,
    “\rThe system is going DOWN %s in %d minute%s!\r\n”,
    newstate, mins, mins == 1 ? “” : “s”);
    wall(buf, 1, 0);
    }
    /*
     * Create the /etc/nologin file.
     */
    void donologin(int min)
    {
    FILE *fp;
    time_t t;
    time(&t);
    t += 60 * min;
    if ((fp = fopen(NOLOGIN, “w”)) != NULL) {
    fprintf(fp, “\rThe system is going down on %s\r\n”, ctime(&t));
    if (message[0]) fputs(message, fp);
    fclose(fp);
    }
    }
    /*
     * Spawn an external program.
     */
    int spawn(int noerr, char *prog, ...)
    {
    va_list ap;
    pid_t pid, rc;
    int i;
    char *argv[8];
    i = 0;
    while ((pid = fork()) < 0 && i < 10) {
    perror(“fork”);
    sleep(5);
    i++;
    }
    if (pid < 0) return -1;
    if (pid > 0) {
    while((rc = wait(&i)) != pid)
    if (rc < 0 && errno == ECHILD)
    break;
    return (rc == pid) ? WEXITSTATUS(i) : -1;
    }
    if (noerr) fclose(stderr);
    argv[0] = prog;
    va_start(ap, prog);
    for (i = 1; i < 7 && (argv[i] = va_arg(ap, char *)) != NULL; i++)
    ;
    argv[i] = NULL;
    va_end(ap);
    chdir("/");
    environ = clean_env;
    execvp(argv[0], argv);
    perror(argv[0]);
    exit(1);
    /*NOTREACHED*/
    return 0;
    }
    /
  • Kill all processes, call /etc/init.d/halt (if present)
    */
    void fastdown()
    {
    int do_halt = (down_level[0] == ‘0’);
    int i;
    #if 0
    char cmd[128];
    char *script;
    /
  • Currently, the halt script is either init.d/halt OR rc.d/rc.0,
  • likewise for the reboot script. Test for the presence
  • of either.
    /
    if (do_halt) {
    if (access(HALTSCRIPT1, X_OK) == 0)
    script = HALTSCRIPT1;
    else
    script = HALTSCRIPT2;
    } else {
    if (access(REBOOTSCRIPT1, X_OK) == 0)
    script = REBOOTSCRIPT1;
    else
    script = REBOOTSCRIPT2;
    }
    #endif
    /
    First close all files. /
    for(i = 0; i < 3; i++)
    if (!isatty(i)) {
    close(i);
    open("/dev/null", O_RDWR);
    }
    for(i = 3; i < 20; i++) close(i);
    close(255);
    /
    First idle init. /
    if (kill(1, SIGTSTP) < 0) {
    fprintf(stderr, “shutdown: can’t idle init.\r\n”);
    exit(1);
    }
    /
    Kill all processes. /
    fprintf(stderr, “shutdown: sending all processes the TERM signal…\r\n”);
    kill(-1, SIGTERM);
    sleep(sltime ? atoi(sltime) : 3);
    fprintf(stderr, “shutdown: sending all processes the KILL signal.\r\n”);
    (void) kill(-1, SIGKILL);
    #if 0
    /
    See if we can run /etc/init.d/halt /
    if (access(script, X_OK) == 0) {
    spawn(1, cmd, “fast”, NULL);
    fprintf(stderr, "shutdown: %s returned - falling back "
    “on default routines\r\n”, script);
    }
    #endif
    /
    script failed or not present: do it ourself. /
    sleep(1); /
    Give init the chance to collect zombies. /
    /* Record the fact that we're going down */
    write_wtmp("shutdown", "~~", 0, RUN_LVL, "~~");
    /
    This is for those who have quota installed. /
    spawn(1, “accton”, NULL);
    spawn(1, “quotaoff”, “-a”, NULL);
    sync();
    fprintf(stderr, “shutdown: turning off swap\r\n”);
    spawn(0, “swapoff”, “-a”, NULL);
    fprintf(stderr, “shutdown: unmounting all file systems\r\n”);
    spawn(0, “umount”, “-a”, NULL);
    /
    We’re done, halt or reboot now. /
    if (do_halt) {
    fprintf(stderr, "The system is halted. Press CTRL-ALT-DEL "
    “or turn off power\r\n”);
    init_reboot(BMAGIC_HALT);
    exit(0);
    }
    fprintf(stderr, “Please stand by while rebooting the system.\r\n”);
    init_reboot(BMAGIC_REBOOT);
    exit(0);
    }
    /
  • Go to runlevel 0, 1 or 6.
    */
    void shutdown(char *halttype)
    {
    char *args[8];
    int argp = 0;
    int do_halt = (down_level[0] == ‘0’);
    /
    Warn for the last time /
    warn(0);
    if (dontshut) {
    hardsleep(1);
    stopit(0);
    }
    openlog(“shutdown”, LOG_PID, LOG_USER);
    if (do_halt)
    syslog(LOG_NOTICE, “shutting down for system halt”);
    else
    syslog(LOG_NOTICE, “shutting down for system reboot”);
    closelog();
    /
    See if we have to do it ourself. /
    if (doself) fastdown();
    /
    Create the arguments for init. */
    args[argp++] = INIT;
    if (sltime) {
    args[argp++] = “-t”;
    args[argp++] = sltime;
    }
    args[argp++] = down_level;
    args[argp] = (char *)NULL;
    unlink(SDPID);
    unlink(NOLOGIN);
    /
    Now execute init to change runlevel. /
    sync();
    init_setenv(“INIT_HALT”, halttype);
    execv(INIT, args);
    /
    Oops - failed. /
    fprintf(stderr, “\rshutdown: cannot execute %s\r\n”, INIT);
    unlink(FASTBOOT);
    unlink(FORCEFSCK);
    init_setenv(“INIT_HALT”, NULL);
    openlog(“shutdown”, LOG_PID, LOG_USER);
    syslog(LOG_NOTICE, “shutdown failed”);
    closelog();
    exit(1);
    }
    /
  • returns if a warning is to be sent for wt
    /
    static int needwarning(int wt)
    {
    int ret;
    if (wt < 10)
    ret = 1;
    else if (wt < 60)
    ret = (wt % 15 == 0);
    else if (wt < 180)
    ret = (wt % 30 == 0);
    else
    ret = (wt % 60 == 0);
    return ret;
    }
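The effect of needwarning() on the countdown is easier to see by counting how many warnings a given delay produces. The sketch below repeats needwarning() from the listing and adds a hypothetical helper, `count_warnings`, that mirrors the warning logic of the main loop further down. It does not count the final warn(0) issued by shutdown().

```c
/* Same rule as needwarning() in the listing: every minute below 10,
 * every 15 minutes below 60, every 30 below 180, else every 60. */
static int needwarning(int wt)
{
    int ret;

    if (wt < 10)
        ret = 1;
    else if (wt < 60)
        ret = (wt % 15 == 0);
    else if (wt < 180)
        ret = (wt % 30 == 0);
    else
        ret = (wt % 60 == 0);
    return ret;
}

/* Number of warnings sent while counting down from `wt` minutes. */
int count_warnings(int wt)
{
    /* main() sends one extra notice up front when the delay is short
     * but does not fall on a regular warning slot. */
    int n = (wt < 15 && !needwarning(wt)) ? 1 : 0;

    for (; wt > 0; wt--)
        if (needwarning(wt))
            n++;
    return n;
}
```

For a delay of 20 minutes this gives one warning at the 15-minute mark plus one per minute from 9 down to 1.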
    /
  • Main program.
  • Process the options and do the final countdown.
    */
    int main(int argc, char **argv)
    {
    FILE *fp;
    extern int getopt();
    extern int optind;
    struct sigaction sa;
    struct tm *lt;
    struct stat st;
    struct utmp *ut;
    time_t t;
    uid_t realuid;
    char *halttype;
    char *downusers[32];
    char buf[128];
    char term[UT_LINESIZE + 6];
    char *sp;
    char *when = NULL;
    int c, i, wt;
    int hours, mins;
    int didnolog = 0;
    int cancel = 0;
    int useacl = 0;
    int pid = 0;
    int user_ok = 0;
    /
    We can be installed setuid root (executable for a special group) /
    realuid = getuid();
    setuid(geteuid());
    if (getuid() != 0) {
    fprintf(stderr, “shutdown: you must be root to do that!\n”);
    exit(1);
    }
    strcpy(down_level, “1”);
    halttype = NULL;
    /* Process the options. */
    while((c = getopt(argc, argv, "HPacqkrhnfFyt:g:i:")) != EOF) {
    switch(c) {
    case ‘H’:
    halttype = “HALT”;
    break;
    case ‘P’:
    halttype = “POWERDOWN”;
    break;
    case ‘a’: /
    Access control. /
    useacl = 1;
    break;
    case ‘c’: /
    Cancel an already running shutdown. /
    cancel = 1;
    break;
    case ‘k’: /
    Don’t really shutdown, only warn.
    /
    dontshut = 1;
    break;
    case ‘r’: /
    Automatic reboot /
    down_level[0] = ‘6’;
    break;
    case ‘h’: /
    Halt after shutdown /
    down_level[0] = ‘0’;
    break;
    case ‘f’: /
    Don’t perform fsck after next boot /
    fastboot = 1;
    break;
    case ‘F’: /
    Force fsck after next boot /
    forcefsck = 1;
    break;
    case ‘n’: /
    Don’t switch runlevels. /
    doself = 1;
    break;
    case ‘t’: /
    Delay between TERM and KILL /
    sltime = optarg;
    break;
    case ‘y’: /
    Ignored for sysV compatibility /
    break;
    case ‘g’: /
    sysv style to specify time. /
    when = optarg;
    break;
    case ‘i’: /
    Level to go to. /
    if (!strchr(“0156aAbBcCsS”, optarg[0])) {
    fprintf(stderr,
    “shutdown: `%s’: bad runlevel\n”,
    optarg);
    exit(1);
    }
    down_level[0] = optarg[0];
    break;
    default:
    usage();
    break;
    }
    }
    /
    Do we need to use the shutdown.allow file ? /
    if (useacl && (fp = fopen(SDALLOW, “r”)) != NULL) {
    /* Read /etc/shutdown.allow. */
    i = 0;
    while(fgets(buf, 128, fp)) {
    if (buf[0] == '#' || buf[0] == '\n') continue;
    if (i > 31) continue;
    for(sp = buf; *sp; sp++) if (*sp == '\n') *sp = 0;
    downusers[i++] = strdup(buf);
    }
    if (i < 32) downusers[i] = 0;
    fclose(fp);
    /
    Now walk through /var/run/utmp to find logged in users. /
    while(!user_ok && (ut = getutent()) != NULL) {
    /* See if this is a user process on a VC. */
    if (ut->ut_type != USER_PROCESS) continue;
    sprintf(term, "/dev/%.*s", UT_LINESIZE, ut->ut_line);
    if (stat(term, &st) < 0) continue;
    #ifdef major /
    glibc /
    if (major(st.st_rdev) != 4 ||
    minor(st.st_rdev) > 63) continue;
    #else
    if ((st.st_rdev & 0xFFC0) != 0x0400) continue;
    #endif
    /
    Root is always OK. /
    if (strcmp(ut->ut_user, “root”) == 0) {
    user_ok++;
    break;
    }
    /
    See if this is an allowed user. /
    for(i = 0; i < 32 && downusers[i]; i++)
    if (!strncmp(downusers[i], ut->ut_user,
    UT_NAMESIZE)) {
    user_ok++;
    break;
    }
    }
    endutent();
    /
    See if user was allowed. /
    if (!user_ok) {
    if ((fp = fopen(CONSOLE, “w”)) != NULL) {
    fprintf(fp, "\rshutdown: no authorized users "
    “logged in.\r\n”);
    fclose(fp);
    }
    exit(1);
    }
    }
    /
    Read pid of running shutdown from a file /
    if ((fp = fopen(SDPID, “r”)) != NULL) {
    fscanf(fp, “%d”, &pid);
    fclose(fp);
    }
    /
    Read remaining words, skip time if needed. /
    message[0] = 0;
    for(c = optind + (!cancel && !when); c < argc; c++) {
    if (strlen(message) + strlen(argv[c]) + 4 > MESSAGELEN)
    break;
    strcat(message, argv[c]);
    strcat(message, " ");
    }
    if (message[0]) strcat(message, “\r\n”);
    /
    See if we want to run or cancel. /
    if (cancel) {
    if (pid <= 0) {
    fprintf(stderr, "shutdown: cannot find pid "
    “of running shutdown.\n”);
    exit(1);
    }
    init_setenv(“INIT_HALT”, NULL);
    if (kill(pid, SIGINT) < 0) {
    fprintf(stderr, “shutdown: not running.\n”);
    exit(1);
    }
    if (message[0]) wall(message, 1, 0);
    exit(0);
    }
    /
    Check syntax. /
    if (when == NULL) {
    if (optind == argc) usage();
    when = argv[optind++];
    }
    /
    See if we are already running. /
    if (pid > 0 && kill(pid, 0) == 0) {
    fprintf(stderr, “\rshutdown: already running.\r\n”);
    exit(1);
    }
    /
    Extra check. /
    if (doself && down_level[0] != ‘0’ && down_level[0] != ‘6’) {
    fprintf(stderr,
    “shutdown: can use “-n” for halt or reboot only.\r\n”);
    exit(1);
    }
    /
    Tell users what we’re gonna do. /
    switch(down_level[0]) {
    case ‘0’:
    strcpy(newstate, “for system halt”);
    break;
    case ‘6’:
    strcpy(newstate, “for reboot”);
    break;
    case ‘1’:
    strcpy(newstate, “to maintenance mode”);
    break;
    default:
    sprintf(newstate, “to runlevel %s”, down_level);
    break;
    }
    /
    Create a new PID file. /
    unlink(SDPID);
    umask(022);
    if ((fp = fopen(SDPID, “w”)) != NULL) {
    fprintf(fp, “%d\n”, getpid());
    fclose(fp);
    } else if (errno != EROFS)
    fprintf(stderr, “shutdown: warning: cannot open %s\n”, SDPID);
    /
  • Catch some common signals.
    /
    signal(SIGQUIT, SIG_IGN);
    signal(SIGCHLD, SIG_IGN);
    signal(SIGHUP, SIG_IGN);
    signal(SIGTSTP, SIG_IGN);
    signal(SIGTTIN, SIG_IGN);
    signal(SIGTTOU, SIG_IGN);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = stopit;
    sigaction(SIGINT, &sa, NULL);
    /
    Go to the root directory /
    chdir("/");
    if (fastboot) close(open(FASTBOOT, O_CREAT | O_RDWR, 0644));
    if (forcefsck) close(open(FORCEFSCK, O_CREAT | O_RDWR, 0644));
    /* Alias now and take care of old '+mins' notation. */
    if (!strcmp(when, "now")) strcpy(when, "0");
    if (when[0] == '+') when++;
    /* Decode shutdown time. */
    for (sp = when; *sp; sp++) {
    if (*sp != ':' && (*sp < '0' || *sp > '9'))
    usage();
    }
    if (strchr(when, ‘:’) == NULL) {
    /
    Time in minutes. /
    wt = atoi(when);
    if (wt == 0 && when[0] != ‘0’) usage();
    } else {
    /* Time in hh:mm format. */
    if (sscanf(when, "%d:%2d", &hours, &mins) != 2) usage();
    if (hours > 23 || mins > 59) usage();
    time(&t);
    lt = localtime(&t);
    wt = (60*hours + mins) - (60*lt->tm_hour + lt->tm_min);
    if (wt < 0) wt += 1440;
    }
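The time decoding above ("now", "+m", plain minutes, or "hh:mm" relative to the current local time) can be sketched as a pure function that takes the current time as parameters, which makes it easy to check. `minutes_until` is a hypothetical name; where the listing calls usage() and exits, this sketch returns -1.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Converts a shutdown time argument into minutes from now.
 * Accepts "now", "+m", "m" or "hh:mm"; now_h and now_m give the
 * current local time. Returns -1 for malformed input. */
int minutes_until(const char *when, int now_h, int now_m)
{
    int hours, mins, wt;
    const char *p;

    if (strcmp(when, "now") == 0)
        return 0;
    if (when[0] == '+')
        when++;
    if (*when == '\0')
        return -1;
    for (p = when; *p; p++)
        if (*p != ':' && (*p < '0' || *p > '9'))
            return -1;

    if (strchr(when, ':') == NULL)
        return atoi(when);              /* plain number of minutes */

    if (sscanf(when, "%d:%2d", &hours, &mins) != 2)
        return -1;
    if (hours > 23 || mins > 59)
        return -1;
    wt = (60 * hours + mins) - (60 * now_h + now_m);
    if (wt < 0)
        wt += 1440;                     /* the target time is tomorrow */
    return wt;
}
```

At 23:50, the argument "00:10" lies in the past for the current day, so 1440 minutes are added and the result is 20.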
    /
    Shutdown NOW if time == 0 /
    if (wt == 0) shutdown(halttype);
    /
    Give warnings on regular intervals and finally shutdown. /
    if (wt < 15 && !needwarning(wt)) warn(wt);
    while(wt) {
    if (wt <= 5 && !didnolog) {
    donologin(wt);
    didnolog++;
    }
    if (needwarning(wt)) warn(wt);
    hardsleep(60);
    wt--;
    }
    shutdown(halttype);
    return 0; /* Never happens */
    }
