5.5. Initialization Options
Both components built into the kernel and components loaded as modules can be passed input parameters so that users can fine-tune the functionality implemented by the components, override defaults compiled into them, or change them from one system boot to the next. The kernel provides two kinds of macros to define options:
Module options (macros of the module_param family)
These define options you can provide when you load a module. When a component is built into the kernel, you cannot provide values for these options at kernel boot time. However, with the introduction of the /sys filesystem, you can configure the options via those files at runtime. The /sys interface is relatively new, compared to the /proc interface. The later section "Module Options" goes into a little more detail on these options.
Boot-time kernel options (macros of the __setup family)
These define options you can provide at boot time with a boot loader. They are used mainly by modules that the user can build into the kernel, and kernel components that cannot be compiled as modules. You will see those macros in the section "Boot-Time Kernel Options" in Chapter 7.
It is interesting to note that a module can define an initialization option in both ways: one is effective when the module is built-in and the other is effective when the module is loaded separately. This can be a little confusing, especially because different modules can define parameters of the same name at module load time without any risk of name collision (i.e., the parameters are passed just to the module being loaded), but if you pass those parameters at kernel boot time, you must make sure there is no name collision between the various modules' options.
We will not go into detail on the pros and cons of the two approaches. You can look at the drivers/block/loop.c driver for a clear example using both module_param and __setup.
Kernel modules define their parameters by means of macros such as module_param; see include/linux/moduleparam.h for a list. module_param requires three input parameters, as shown in the following example from drivers/net/sis900.c:
    ...
    module_param(multicast_filter_limit, int, 0444);
    module_param(max_interrupt_work, int, 0444);
    module_param(debug, int, 0444);
    ...
The first input parameter is the name of the parameter to be offered to the user. The second is the type of the parameter (e.g., integer), and the third represents the permissions assigned to the file in /sys to which the parameter will be exported.
This is what you would get when listing the module's directory in /sys:
    [root@localhost src]# ls -la /sys/module/sis900/parameters/
    total 0
    drwxr-xr-x 2 root root    0 Apr 9 18:31 .
    drwxr-xr-x 4 root root    0 Apr 9 18:31 ..
    -r--r--r-- 1 root root    0 Apr 9 18:31 debug
    -r--r--r-- 1 root root 4096 Apr 9 18:31 max_interrupt_work
    -r--r--r-- 1 root root 4096 Apr 9 18:31 multicast_filter_limit
    [root@localhost src]#
Each module is assigned a directory in /sys/module. The subdirectory /sys/module/module/parameters holds a file for each parameter exported by module. The previous snapshot for the sis900 driver (drivers/net/sis900.c) shows three options that are readable by anyone, but not writable (they cannot be changed).
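As a rough user-space illustration of the layout just described, the following C sketch builds the path of a parameter file and reads its value as text. The function names param_path and read_param are hypothetical helpers introduced here, not part of any kernel or library API, and the sketch assumes the /sys/module/module/parameters layout shown in the listing above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "/sys/module/<module>/parameters/<param>".
 * Returns 0 on success, -1 if buf is too small. */
int param_path(char *buf, size_t len, const char *module, const char *param)
{
    int n;

    assert(buf && module && param);
    n = snprintf(buf, len, "/sys/module/%s/parameters/%s", module, param);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read the current value of a parameter as text.
 * Returns 0 on success, -1 if the file is missing or unreadable
 * (for example, when the parameter was exported with permissions 0). */
int read_param(const char *module, const char *param, char *val, size_t len)
{
    char path[256];
    FILE *f;

    if (param_path(path, sizeof(path), module, param) < 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(val, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    val[strcspn(val, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

On a system where the sis900 module is loaded, read_param("sis900", "debug", buf, sizeof(buf)) would be expected to succeed; on any other system it simply returns -1.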
Permissions on /sys files (and on /proc files, incidentally) are defined using the same syntax as common files, so you can specify read, write, and execute permissions for the owner, the group, and everybody else. A value of 400 means, for example, read access for the owner (who is the root user) and no other access for anyone. When a value of 0 is assigned, no one has any permissions and you would not even see the file in /sys.
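To make the octal notation concrete, here is a small user-space C sketch that renders a permission value the way ls displays it. The helper mode_to_string is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: render the low nine permission bits of mode in the
 * "rwxrwxrwx" style used by ls. out needs room for 9 characters plus the
 * string terminator. */
void mode_to_string(unsigned int mode, char out[10])
{
    static const char flags[] = "rwxrwxrwx";
    int i;

    assert(out != NULL);
    for (i = 0; i < 9; i++) {
        /* Bit 8 is owner-read; bit 0 is other-execute. */
        out[i] = (mode & (1u << (8 - i))) ? flags[i] : '-';
    }
    out[9] = '\0';
}
```

With this helper, 0444 (the value used by the sis900 parameters) maps to r--r--r--, and 0400 maps to r--------.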
If the component programmer wants the user to be able to read the values of parameters, she must give at least read permission. She can also provide write permission to allow users to modify values. However, take into account that the module that exports the parameter is not notified about any change to the file, so the module must have a mechanism to detect the change or be able to cope with changes.
For a detailed description of the /sys interface, refer to Linux Device Drivers.
5.7. Initializing the Device Handling Layer: net_dev_init
An important part of initialization for the networking code, including Traffic Control and per-CPU ingress queues, is performed at boot time by net_dev_init, defined in net/core/dev.c:
    static int __init net_dev_init(void)
    {
        ...
    }
    subsys_initcall(net_dev_init);
See Chapter 7 for how the subsys_initcall macros ensure that net_dev_init runs before any NIC device drivers register themselves, and why this is important. You also will see why net_dev_init is tagged with the __init macro.
Let's walk through the main parts of net_dev_init:
Random number generation is a support function that the kernel performs to help randomize some of its own activity. You will see in this book that many networking subsystems use randomly generated values. For instance, they often add a random component to the delay of timers, making it less likely for timers to run simultaneously and load down the CPU with background processing. Randomization can also defend against a Denial of Service (DoS) attack by someone who tries to guess the organization of certain data structures.
The degree to which the kernel's numbers can be considered truly random is called system entropy. It is improved through contributions by kernel components whose activity has a nondeterministic aspect, and networking often falls in this category. Currently, only a few NIC device drivers contribute to system entropy (see earlier discussion on SA_SAMPLE_RANDOM). A patch for kernel 2.4 adds a compile time option that you can use to enable or disable the contribution to system entropy by NICs. Search the Web using the keyword "SA_SAMPLE_NET_RANDOM," and you will find the current version.
5.7.1. Legacy Code
I mentioned in the previous section that the subsys_initcall macros ensure that net_dev_init is executed before any device driver has a chance to register its devices. Before the introduction of this mechanism, the order of execution used to be enforced differently, using the old-fashioned mechanism of a one-time flag.
The global variable dev_boot_phase was used as a Boolean flag to remember whether net_dev_init had to be executed. It was initialized to 1 (i.e., net_dev_init had not been executed yet) and was cleared by net_dev_init. Each time register_netdevice was invoked by a device driver, it checked the value of dev_boot_phase and executed net_dev_init if the flag was set, indicating the function had not yet been executed.
This mechanism is not needed anymore, because register_netdevice cannot be called before net_dev_init if the correct tagging is applied to key device drivers' routines, as described in Chapter 7. However, to detect wrong tagging or buggy code, net_dev_init still clears the value of dev_boot_phase, and register_netdevice uses the macro BUG_ON to make sure it is never called when dev_boot_phase is set.
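The difference between the old and the current use of the flag can be modeled in user space as follows. This is only a sketch: the function names ending in _model, _legacy, and _checked are hypothetical stand-ins, the registration work itself is omitted, and the standard assert macro plays the role that BUG_ON plays in the kernel.

```c
#include <assert.h>

static int dev_boot_phase = 1;   /* 1: initialization has not run yet */
static int init_runs;            /* how many times the init body executed */

/* Stand-in for net_dev_init: does its work once and clears the flag. */
static void net_dev_init_model(void)
{
    init_runs++;
    dev_boot_phase = 0;
}

/* Old scheme: the first registration triggers initialization on demand. */
void register_netdevice_legacy(void)
{
    if (dev_boot_phase)
        net_dev_init_model();
    /* ... device registration would continue here ... */
}

/* Current scheme: initialization must already have happened; the flag is
 * only used to catch wrong tagging or buggy code. */
void register_netdevice_checked(void)
{
    assert(!dev_boot_phase);     /* models BUG_ON(dev_boot_phase) */
    /* ... device registration would continue here ... */
}

/* Accessor used to observe the model from outside. */
int init_run_count(void)
{
    return init_runs;
}
```

In the legacy variant, any number of registrations leads to exactly one run of the initialization body. In the checked variant, a registration that arrives too early aborts instead of silently fixing the ordering.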
There are cases where it makes sense for the kernel to invoke a user-space application to handle events. Two such helpers are particularly important:
/sbin/modprobe
Invoked when the kernel needs to load a module. This helper is part of the module-init-tools package.
/sbin/hotplug
Invoked when the kernel detects that a device has been plugged into or unplugged from the system. Its main job is to load the correct device driver (module) based on the device identifier. Devices are identified by the bus they are plugged into (e.g., PCI) and the associated ID defined by the bus specification.[†] This helper is part of the hotplug package.
[†] See the section "Registering a PCI NIC Device Driver" in Chapter 6 for an example involving PCI.
The kernel provides a function named call_usermodehelper to execute such user-space helpers. The function allows the caller to pass the application a variable number of both arguments in arg[] and environment variables in envp[]. For example, the first argument arg[0] tells call_usermodehelper what user-space helper to launch, and arg[1] can be used to tell the helper itself what configuration script to use (often called the user-space agent). We will see an example in the later section "/sbin/hotplug."
Figure 5-3 shows how two kernel routines, request_module and kobject_hotplug, use call_usermodehelper to run /sbin/modprobe and /sbin/hotplug, respectively. It also shows examples of how arg[] and envp[] are initialized in the two cases. The following subsections go into a little more detail on each of those two user-space helpers.
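The kernel-side implementation is not shown here, but the idea of launching a helper with an argument vector and an environment vector has a direct user-space analogue in fork and execve. The following sketch assumes a POSIX system; run_helper is a hypothetical name, and the code illustrates the calling convention only, not how call_usermodehelper is actually implemented.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical user-space analogue: run the program at path with the given
 * NULL-terminated argv and envp, wait for it, and return its exit status.
 * Returns -1 if the process could not be created or ended abnormally. */
int run_helper(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid;
    int status;

    assert(path && argv && envp);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);
        _exit(127);              /* reached only if execve failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

As in the kernel case, argv[0] names the helper, the remaining arguments are interpreted by the helper itself, and the environment carries the details of the event.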
kmod is the kernel module loader that allows kernel components to request the loading of a module. The kernel provides more than one routine, but here we'll look only at request_module. This function initializes arg[1] with the name of the module to load. /sbin/modprobe uses the configuration file /etc/modprobe.conf to do various things, one of which is to see whether the module name received from the kernel is actually an alias to something else (see Figure 5-3).
Here are two examples of events that would lead the kernel to ask /sbin/modprobe to load a module:
When the administrator uses ifconfig to configure a network card whose device driver has not been loaded yet (say, for device eth0[*]), the kernel sends a request to /sbin/modprobe to load the module whose name is the string "eth0". If /etc/modprobe.conf contains the entry "alias eth0 3c59x", /sbin/modprobe tries loading the module 3c59x.ko.
[*] Note that because the device driver has not been loaded yet, eth0 does not exist yet either.
When the administrator configures Traffic Control on a device with the IPROUTE2 package's tc command, it may refer to a queuing discipline or a classifier that is not in the kernel. In this case, the kernel sends /sbin/modprobe a request to load the relevant module.
For more details on modules and kmod, refer to Linux Device Drivers.
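The alias step in the eth0 example can be sketched in a few lines of C. The function resolve_alias below is hypothetical and handles only simple "alias name module" lines passed in as a string; the real /sbin/modprobe supports a richer configuration syntax that this sketch does not attempt to cover.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up name among "alias <name> <module>" lines in
 * conf, the text of a modprobe.conf-style file. On a match, copy the module
 * name into out and return 1. Otherwise copy name itself and return 0. */
int resolve_alias(const char *conf, const char *name, char *out, size_t len)
{
    const char *line = conf;

    assert(conf && name && out);
    while (*line) {
        char buf[256], alias[64], module[64];
        size_t n = strcspn(line, "\n");
        size_t copy = n < sizeof(buf) - 1 ? n : sizeof(buf) - 1;

        /* Work on one line at a time so a match cannot span lines. */
        memcpy(buf, line, copy);
        buf[copy] = '\0';
        if (sscanf(buf, " alias %63s %63s", alias, module) == 2 &&
            strcmp(alias, name) == 0) {
            snprintf(out, len, "%s", module);
            return 1;
        }
        line += n;
        if (*line == '\n')
            line++;
    }
    snprintf(out, len, "%s", name);
    return 0;
}
```

Given a configuration containing "alias eth0 3c59x", a request for "eth0" resolves to "3c59x", while a name with no alias entry is returned unchanged.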
Hotplug was introduced into the Linux kernel to implement the popular consumer feature known as Plug and Play (PnP). This feature allows the kernel to detect the insertion or removal of hot-pluggable devices and to notify a user-space application, giving the latter enough details to load the associated driver if needed, and to apply the associated configuration if one is present.
Hotplug can actually be used to take care of non-hot-pluggable devices as well, at boot time. The idea is that it does not matter whether a device was hot-plugged on a running system or if it was already plugged in at boot time; the user-space helper is notified in both cases. The user-space application decides whether the event requires any action on its part.
Linux systems, like most Unix systems, execute a set of scripts at boot time to initialize peripherals, including network devices. The syntax, names, and locations of these scripts change with different Linux distributions. (For example, distributions using the System V init model have a directory per run level in /etc/rc.d/, each one with its own configuration file indicating what to start. Other distributions are either based on the BSD model, or follow the BSD model in compatibility mode with System V.) Therefore, notifications for devices already present at boot time may be ignored because the scripts will eventually configure the associated devices.
When you compile the kernel modules, the object files are placed by default in the directory /lib/modules/kernel_version/, where kernel_version is, for instance, 2.6.12. In the same directory you can find two interesting files: modules.pcimap and modules.usbmap. These files contain, respectively, the PCI IDs[*] and USB IDs of the devices supported by the kernel. The same files include, for each device ID, a reference to the associated kernel module. When the user-space helper receives a notification about a hot-pluggable device being plugged, it uses these files to find out the correct device driver.
[*] The section "Example of PCI NIC Driver Registration" in Chapter 6 gives a brief description of a PCI device identifier.
The modules.xxxmap files are populated from ID vectors provided by device drivers. For example, you will see in the section "Example of PCI NIC Driver Registration" in Chapter 6 how the Vortex driver initializes its instance of pci_device_id. Because that driver is written for a PCI device, the contents of that table go into the modules.pcimap file.
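The lookup performed with these files amounts to matching a device's IDs against a table and returning the module name. The sketch below uses a small in-memory table instead of parsing modules.pcimap; the structure, the wildcard constant, the function name, and the sample entries are all assumptions made for illustration, and real tables carry more fields (such as subsystem IDs and class) than the two used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu   /* wildcard, modeled on the kernel's PCI_ANY_ID */

struct pcimap_entry {
    const char *module;      /* module that supports the device */
    uint32_t vendor;
    uint32_t device;
};

/* Illustrative entries only; real ones come from the drivers' ID tables. */
static const struct pcimap_entry pcimap[] = {
    { "3c59x",            0x10b7, 0x9200 },
    { "sis900",           0x1039, 0x0900 },
    { "example_wildcard", 0x1234, ANY_ID },   /* hypothetical driver */
};

/* Return the module for a vendor/device pair, or NULL if none matches. */
const char *find_module(uint32_t vendor, uint32_t device)
{
    size_t i;

    for (i = 0; i < sizeof(pcimap) / sizeof(pcimap[0]); i++) {
        assert(pcimap[i].module != NULL);     /* table sanity check */
        if ((pcimap[i].vendor == ANY_ID || pcimap[i].vendor == vendor) &&
            (pcimap[i].device == ANY_ID || pcimap[i].device == device))
            return pcimap[i].module;
    }
    return NULL;
}
```

The first matching entry wins, so a table built this way should list specific IDs before wildcard entries.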
If you are interested in the latest code, you can find more information at http://linux-hotplug.sourceforge.net.
The default user-space helper for Hotplug is the script[†] /sbin/hotplug, part of the Hotplug package. This package can be configured with the files located in the default directories /etc/hotplug/ and /etc/hotplug.d/.
[†] The administrator can write his own scripts or use the ones provided by the most common Linux distributions.
The kobject_hotplug function is invoked by the kernel to respond to the insertion and removal of a device, among other events. kobject_hotplug initializes arg[0] to /sbin/hotplug and arg[1] to the agent to be used. /sbin/hotplug is a simple script that delegates the processing of the event to another script (the agent) based on arg[1].
The user-space helper agents can be more or less complex based on how fancy you want the auto-configuration to be. The scripts provided with the Hotplug package try to recognize the Linux distribution and adapt the actions to their configuration file's syntax and location.
Let's take networking, the subject of this book, as an example of hotplugging. When an NIC is added to or removed from the system, kobject_hotplug initializes arg[1] to net, leading /sbin/hotplug to execute the net.agent agent.
Unlike the other agents shown in Figure 5-3, net.agent does not represent a medium or bus type. While the net agent is used to configure a device, other agents are used to load the correct modules (device drivers) based on the device identifiers.
net.agent is supposed to apply any configuration associated with the new device, so it needs the kernel to provide at least the device identifier. In the example shown in Figure 5-3, the device identifier is passed by the kernel through the INTERFACE environment variable.
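The following user-space sketch shows one way the argument and environment vectors for a network event could be assembled. Only /sbin/hotplug, the net agent name, and the INTERFACE variable come from the text; the ACTION variable name, the structure, and the function build_net_event are assumptions for illustration, and the kernel passes additional variables that are not modeled here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the vectors handed to the user-space helper. */
struct hotplug_event {
    char action[32];       /* for example "ACTION=add" */
    char interface[32];    /* for example "INTERFACE=eth0" */
    char *argv[3];         /* helper path, agent name, NULL */
    char *envp[3];         /* the two variables above, NULL */
};

/* Fill ev for a network device event. Returns 0 on success, or -1 if
 * action or ifname does not fit in the fixed-size buffers. */
int build_net_event(struct hotplug_event *ev, const char *action,
                    const char *ifname)
{
    int n;

    assert(ev && action && ifname);
    n = snprintf(ev->action, sizeof(ev->action), "ACTION=%s", action);
    if (n < 0 || (size_t)n >= sizeof(ev->action))
        return -1;
    n = snprintf(ev->interface, sizeof(ev->interface), "INTERFACE=%s", ifname);
    if (n < 0 || (size_t)n >= sizeof(ev->interface))
        return -1;

    ev->argv[0] = "/sbin/hotplug";   /* the helper to launch */
    ev->argv[1] = "net";             /* selects the net agent */
    ev->argv[2] = NULL;
    ev->envp[0] = ev->action;
    ev->envp[1] = ev->interface;
    ev->envp[2] = NULL;
    return 0;
}
```

The agent name travels as an argument, whereas the device identifier travels in the environment, which matches the split described above.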
To be able to configure a device, it must first be created and registered with the kernel. This task is normally driven by the associated device driver, which must therefore be loaded first. For instance, adding a PCMCIA Ethernet card causes several calls to /sbin/hotplug; among them:
One leading to the execution of /sbin/modprobe,[*] which will take care of loading the right device driver module. In the case of PCMCIA, the driver is loaded by the pci.agent agent (using the action ADD).
[*] Unlike /sbin/hotplug, which is a shell script, /sbin/modprobe is a binary executable file. If you want to take a look at it, download the source code of the module-init-tools package.
One configuring the new device. This is done by the net.agent agent (again using the action ADD).
A virtual device is an abstraction built on top of one or more real devices. The association between virtual devices and real devices can be many-to-many, as shown by the three models in Figure 5-4. It is also possible to build virtual devices on top of other virtual devices. However, not all combinations are meaningful or are supported by the kernel.
Linux allows you to define different kinds of virtual devices. Here are a few examples:
Bonding
With this feature, a virtual device bundles a group of physical devices and makes them behave as one.
802.1Q
This is an IEEE standard that extends the 802.3/Ethernet header with the so-called VLAN header, allowing for the creation of Virtual LANs.
Bridging
A bridge interface is a virtual representation of a bridge. Details are in Part IV.
Aliasing interfaces
Originally, the main purpose for this feature was to allow a single real Ethernet interface to span several virtual interfaces (eth0:0, eth0:1, etc.), each with its own IP configuration. Now, thanks to improvements to the networking code, there is no need to define a new virtual interface to configure multiple IP addresses on the same NIC. However, there may be cases (notably routing) where having different virtual NICs on the same NIC would make life easier, perhaps allowing simpler configuration. Details are in Chapter 30.
True equalizer (TEQL)
This is a queuing discipline that can be used with Traffic Control. Its implementation requires the creation of a special device. The idea behind TEQL is a bit similar to Bonding.
Tunnel interfaces
The implementation of IP-over-IP tunneling (IPIP) and the Generic Routing Encapsulation (GRE) protocol is based on the creation of a virtual device.
This list is not complete. Also, given the speed with which new features are added to the Linux kernel, you can expect to see new kinds of virtual devices appear.
Bonding, bridging, and 802.1Q devices are examples of the model in Figure 5-4(c). Aliasing interfaces are examples of the model in Figure 5-4(b). The model in Figure 5-4(a) can be seen as a special case of the other two.
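As a concrete view of the VLAN header mentioned for 802.1Q devices, the sketch below checks whether a raw Ethernet frame carries an 802.1Q tag and extracts the VLAN ID. It assumes the standard layout (two 6-byte addresses, then the Ethertype 0x8100, then a 16-bit tag whose low 12 bits are the VLAN ID). The function and constant names are hypothetical, and this is not the kernel's own parsing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_ETHERTYPE 0x8100   /* Ethertype that announces an 802.1Q tag */

/* Return the VLAN ID (0..4095) if frame carries an 802.1Q tag, or -1 if
 * the frame is untagged or too short to hold the tag and inner Ethertype. */
int vlan_id(const uint8_t *frame, size_t len)
{
    uint16_t type, tci;

    assert(frame != NULL);
    if (len < 18)               /* 12 address bytes + 4 tag bytes + 2 type */
        return -1;
    type = (uint16_t)((frame[12] << 8) | frame[13]);
    if (type != VLAN_ETHERTYPE)
        return -1;
    tci = (uint16_t)((frame[14] << 8) | frame[15]);
    return tci & 0x0fff;        /* low 12 bits hold the VLAN ID */
}
```

A virtual 802.1Q device conceptually handles only the frames for which such a check succeeds with its own VLAN ID.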
Virtual devices and real devices interact with the kernel in slightly different ways. For example, they differ with regard to the following points:
Initialization
Most virtual devices are assigned a net_device data structure, as real devices are. Often, most of the function pointers in a virtual device's net_device are initialized to routines implemented as wrappers, more or less complex, around the function pointers used by the associated real devices.
However, not all virtual devices are assigned a net_device instance. Aliasing devices are an example; they are implemented as simple labels on the associated real device (see the section "Old-generation configuration: aliasing interfaces" in Chapter 30).
Configuration
It is common to provide ad hoc user-space tools to configure virtual devices, especially for the high-level fields that apply only to those devices and which could not be configured using standard tools such as ifconfig.
External interface
Each virtual device usually exports a file, or a directory with a few files, to the /proc filesystem. The complexity and detail of the information exported through those files depend on the kind of virtual device and on the design. You will see the ones used by each virtual device listed in the section "Virtual Devices" in their associated chapters (for those devices covered in this book). Files associated with virtual devices are extra files; they do not replace the ones associated with the physical devices. Aliasing devices, which do not have their own net_device instances, are again an exception.
Transmission
When the relationship of virtual device to real device is not one-to-one, the routine used to transmit may need to include, among other tasks, the selection of the real device to use.[*] Because QoS is enforced on a per-device basis, the multiple relationships between virtual devices and associated real devices have implications for the Traffic Control configuration.
[*] See Chapter 11 for more details on packet transmission in general, and dev_queue_xmit in particular.
Reception
Because virtual devices are software objects, they do not need to engage in interactions with real resources on the system, such as registering an IRQ handler or allocating I/O ports and I/O memory. Their traffic comes secondhand from the physical devices that perform those tasks. Packet reception happens differently for different types of virtual devices. For instance, 802.1Q interfaces register an Ethertype and are passed only those packets received by the associated real devices that carry the right protocol ID.[†] In contrast, bridge interfaces receive any packet that arrives from the associated devices (see Chapter 16).
[†] Chapter 13 discusses the demultiplexing of ingress traffic based on the protocol identifier.
External notifications
Notifications from other kernel components about specific events taking place in the kernel[‡] are of interest as much to virtual devices as to real ones. Because virtual devices' logic is implemented on top of real devices, the latter have no knowledge about that logic and therefore are not able to pass on those notifications. For this reason, notifications need to go directly to the virtual devices. Let's use Bonding as an example: if one device in the bundle goes down, the algorithms used to distribute traffic among the bundle's members have to be made aware of that so that they do not select the devices that are no longer available.
[‡] Chapter 4 defines notification chains and explains what kind of notifications they can be used for.
Unlike these software-triggered notifications, hardware-triggered notifications (e.g., PCI power management) cannot reach virtual devices directly because there is no hardware associated with virtual devices.
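The Transmission and External notifications points can be tied together with a small model of a bonding-like device. The sketch below keeps an up/down flag per real device, updates it when a notification arrives, and picks the next usable device in round-robin order at transmit time. All names are hypothetical, and round-robin is only one possible selection policy; this is not the bonding driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLAVES 8

/* Hypothetical model of a virtual device bundling several real devices. */
struct bond_model {
    int up[MAX_SLAVES];   /* 1 if the real device is usable */
    int count;            /* number of real devices in the bundle */
    int next;             /* where the next round-robin search starts */
};

/* Notification handler: record that a real device went up or down. */
void bond_notify(struct bond_model *b, int slave, int is_up)
{
    assert(b && slave >= 0 && slave < b->count);
    b->up[slave] = is_up;
}

/* Transmit path: choose the real device for the next packet, skipping
 * devices that are down. Returns its index, or -1 if none is usable. */
int bond_select(struct bond_model *b)
{
    int i;

    assert(b && b->count <= MAX_SLAVES);
    for (i = 0; i < b->count; i++) {
        int cand = (b->next + i) % b->count;

        if (b->up[cand]) {
            b->next = (cand + 1) % b->count;
            return cand;
        }
    }
    return -1;
}
```

Without bond_notify, the selection routine would keep handing packets to a device that can no longer send them, which is why the notification has to reach the virtual device itself.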