This doc is mainly about the older initrd technique. Most distros use initramfs instead.
1) References
http://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
(ramfs, rootfs and initramfs)
http://en.wikipedia.org/wiki/Initrd
(initrd and initramfs)
http://kernelnewbies.org/RootFileSystem
(How is Root File System Found?)
http://lxr.linux.no/#linux+v3.5.3/Documentation/early-userspace/README
http://lxr.linux.no/#linux+v3.5.3/Documentation/early-userspace/
(Kernel Doc: Early User Space)
http://free-electrons.com/services/rootfs-support/
(Embedded Linux Root Filesystem Development Support)
http://lwn.net/Articles/210046/
(Embedded Linux: Small Root File System)
http://tldp.org/HOWTO/Bootdisk-HOWTO/buildroot.html
(Building a Root File System)
http://www.oschina.net/question/158589_67643
http://superuser.com/questions/468603/doubts-about-the-linux-root-file-system
(Doubts About The Linux Root File System and Mount Command)
2) Basics
In my understanding, the term 'root file system' is ambiguous: it refers to two different things. One is the absolute minimal file system needed, together with the kernel, to accomplish system setup. It contains some early user space programs which perform hardware detection, module loading, device discovery and so on. This file system is often referred to as 'rootfs'. The other is the 'real' root file system. It may be located on a local disk or on a remote server (for systems that support booting from Ethernet).
Here's a snippet from Wikipedia:
An image of this initial root file system (along with the kernel image) must be stored somewhere accessible by the Linux bootloader or the boot firmware of the computer. This can be:
The bootloader will load the kernel and initial root file system image into memory and then start the kernel, passing in the memory address of the image.
The initial root filesystem is known as the initial ramdisk because the filesystem lives in a disk image created by the kernel in RAM. In a desktop or server system, the initial ramdisk is used to load drivers and initialize an environment so that an external storage system (disk or network attached storage) can be mounted. The switch from the initial root filesystem to the real root filesystem is called a pivot. The pivot causes the real root filesystem to be mounted over the initial root filesystem. When that happens, a new init process from the real root filesystem is launched and takes over the process id of 1. At that point the initial ramdisk is no longer needed and the memory can be freed.
3) How to build rootfs?
4) How is the root file system found?
One of the important kernel boot parameters is "root=", which tells the kernel where to find the root filesystem. For instance,
root=/dev/hda1
This is commonly specified as what looks like a standard Unix pathname (as above). But standard Unix pathnames are interpreted according to currently-mounted filesystems. So how do you interpret the above root pathname, before you've even mounted any filesystems?
It took me a few hours to decipher the answer to this (the following applies at least as of the 2.6.11 kernel sources). First of all, at kernel initialization time, there is an absolutely minimal filesystem registered, called "rootfs". The code that implements this filesystem can be found in fs/ramfs/inode.c, which also happens to contain the code for the "ramfs" filesystem. rootfs is basically identical to ramfs, except for the specification of the MS_NOUSER flag; this is interpreted by the routine graft_tree in fs/namespace.c, and I think it prevents userland processes doing their own mounts of rootfs.
The routine init_mount_tree (found in fs/namespace.c) is called at system startup time to mount an instance of rootfs, and make it the root namespace of the current process (remember that, under Linux, different processes can have different filesystem namespaces). This routine is called at the end of mnt_init (also in fs/namespace.c), as part of the following sequence:
sysfs_init();      /* causes sysfs to register itself--this is needed later for actually finding the root device */
init_rootfs();     /* causes rootfs to register itself */
init_mount_tree(); /* actually creates the initial filesystem namespace, with rootfs mounted at "/" */
mnt_init is called from vfs_caches_init in fs/dcache.c, which in turn is called from start_kernel in init/main.c.
The actual interpretation of the root=path parameter is done in a routine called name_to_dev_t, found in init/do_mounts.c. This tries all the various syntaxes that are supported, one of which is the form "/dev/name", where name is interpreted by doing a temporary mount of the sysfs filesystem (at its usual place, /sys), and then looking for an entry under /sys/block/name (done in the subsidiary routine try_name in the same source file). name_to_dev_t is called from prepare_namespace, which in turn is called from init in init/main.c. This routine is spawned as the first process on the system (pid 1) by a call to kernel_thread in rest_init, which comes at the end of the abovementioned start_kernel.
start_kernel is the very last routine called in the boot sequence after the kernel gets control from the bootloader (in arch/i386/kernel/head.S for the i386 architecture). It never returns, because the very last thing it does after all the initialization is call cpu_idle, which runs an endless loop for soaking up CPU time as long as the CPU doesn't have anything else to do (like run a process or service an interrupt).
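The "/dev/name" lookup described above can be sketched in user space. This is only my simplified illustration, not the kernel's code: the function names mirror try_name and name_to_dev_t but the Python versions are my own, the sysfs root is passed in as a parameter so the sketch can be tried against a fake tree, and I assume the 'dev' file holds "major:minor" text. Other root= syntaxes are ignored.

```python
import os
import re
import tempfile


def try_name(sysfs_root, name, part):
    """Read <sysfs_root>/block/<name>/dev and return (major, minor + part).

    Returns None if the entry does not exist or cannot be parsed.
    """
    path = os.path.join(sysfs_root, "block", name, "dev")
    try:
        with open(path) as f:
            major, minor = (int(x) for x in f.read().strip().split(":"))
    except (OSError, ValueError):
        return None
    return major, minor + part


def name_to_dev(sysfs_root, root_param):
    """Resolve a "/dev/name" style root= value to a (major, minor) pair."""
    if not root_param.startswith("/dev/"):
        return None  # other syntaxes are out of scope for this sketch
    name = root_param[len("/dev/"):]

    # First try the name as given (a whole disk such as "hda").
    dev = try_name(sysfs_root, name, 0)
    if dev is not None:
        return dev

    # Otherwise split off a trailing partition number ("hda1" -> "hda", 1)
    # and add it to the minor number of the whole disk.
    m = re.fullmatch(r"(.*?\D)(\d+)", name)
    if m is None:
        return None
    return try_name(sysfs_root, m.group(1), int(m.group(2)))


def demo():
    """Build a fake sysfs tree containing only 'hda' and resolve /dev/hda1."""
    with tempfile.TemporaryDirectory() as fake_sys:
        os.makedirs(os.path.join(fake_sys, "block", "hda"))
        with open(os.path.join(fake_sys, "block", "hda", "dev"), "w") as f:
            f.write("3:0\n")
        return name_to_dev(fake_sys, "/dev/hda1")
```

On a real system one could call name_to_dev("/sys", "/dev/sda1"), but the result depends on the hardware present, so the demo uses a temporary directory instead.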
5) Other Details
I've checked several Linux distros with the 'mount' command. Some show 'rootfs on / type rootfs' in the output, while others don't. All of them show something like '/dev/sdaX on / type extX'.
Here is an explanation.
Rootfs is never mounted explicitly from user space; it always exists. The 'mount' command may not show it, but if we examine the /proc/self/mount* files (mounts, mountinfo), we'll see it, because these entries, like other /proc entries, are generated by the kernel.
For example:
rootfs / rootfs rw 0 0
none /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
none /proc proc rw,nosuid,nodev,noexec,relatime 0 0
none /dev devtmpfs rw,relatime,size=505560k,nr_inodes=126390,mode=755 0 0
none /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/disk/by-uuid/95909ad3-0954-46b9-8c42-65216ac911b4 / ext4 rw,relatime,errors=remount-ro,barrier=1,data=ordered 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /dev/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /var/run tmpfs rw,nosuid,relatime,mode=755 0 0
none /var/lock tmpfs rw,nosuid,nodev,noexec,relatime 0 0
none /lib/init/rw tmpfs rw,nosuid,relatime,mode=755 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec,relatime 0 0
gvfs-fuse-daemon /home/chenqi/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0
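To pick out the entries stacked on "/" from a listing like the one above, a small parser is enough. This is a minimal sketch of my own: the function name mounts_at and the shortened SAMPLE text are made up for illustration, and it assumes the whitespace-separated "device mountpoint fstype options ..." layout shown in the listing.

```python
SAMPLE = """\
rootfs / rootfs rw 0 0
none /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
/dev/disk/by-uuid/95909ad3-0954-46b9-8c42-65216ac911b4 / ext4 rw,relatime,errors=remount-ro,barrier=1,data=ordered 0 0
"""


def mounts_at(text, mount_point="/"):
    """Return (device, fstype) pairs for entries mounted at mount_point.

    Entries are returned in listing order, so a later entry is mounted
    on top of an earlier one at the same mount point.
    """
    found = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, where, fstype = fields[:3]
        if where == mount_point:
            found.append((device, fstype))
    return found
```

With the sample text, mounts_at(SAMPLE) gives the rootfs entry first and the ext4 entry second. To try it on a live system, pass in the contents of /proc/self/mounts; whether a rootfs line appears there depends on the system.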
The documentation for SUSE Linux provides a good explanation of why Linux is booted with a RAMDisk:
As soon as the Linux kernel has been booted and the root file system (/) mounted, programs can be run and further kernel modules can be integrated to provide additional functions. To mount the root file system, certain conditions must be met. The kernel needs the corresponding drivers to access the device on which the root file system is located (especially SCSI drivers). The kernel must also contain the code needed to read the file system (ext2, reiserfs, romfs, etc.). It is also conceivable that the root file system is already encrypted. In this case, a password is needed to mount the file system.
For the problem of SCSI drivers, a number of different solutions are possible. The kernel could contain all imaginable drivers, but this might be a problem because different drivers could conflict with each other. Also, the kernel would become very large because of this. Another possibility is to provide different kernels, each one containing just one or a few SCSI drivers. This method has the problem that a large number of different kernels are required, a problem then increased by the differently optimized kernels (Athlon optimization, SMP). The idea of loading the SCSI driver as a module leads to the general problem resolved by the concept of an initial ramdisk: running user space programs even before the root file system is mounted.
This prevents a potential chicken-or-egg situation where the root file system cannot be loaded until the device on which it is located can be accessed, but that device can't be accessed until the root file system has been loaded:
The initial ramdisk (also called initdisk or initrd) solves precisely the problems described above. The Linux kernel provides an option of having a small file system loaded to a RAM disk and running programs there before the actual root file system is mounted. The loading of initrd is handled by the boot loader (GRUB, LILO, etc.). Boot loaders only need BIOS routines to load data from the boot medium. If the boot loader is able to load the kernel, it can also load the initial ramdisk. Special drivers are not required.
Of course, a RAMDisk is not strictly necessary for the boot process to take place. For example, you could compile a kernel that contained all necessary hardware drivers and modules to be loaded at startup. But apparently this is too much work for most people, and the RAMDisk proved to be a simpler, more scalable solution.