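Chapter 6. Using /proc For Input
6.1 TODO: Write a chapter about sysfs
This is just a placeholder for now. Eventually I'd like to see a (yet to be written) chapter about sysfs here instead. If you are familiar with sysfs and would like to take part in writing this chapter, please contact us (the LKMPG maintainers).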
Chapter 7. Talking To Device Files
7.1 Talking to Device Files (writes and IOCTLs)
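Device files are supposed to represent physical devices. Most physical devices are used for output as well as input, so there has to be some mechanism for device drivers in the kernel to get the output that processes want to send to the device. This is done by opening the device file for output and writing to it, just like writing to a file. In the following example, this is implemented by device_write.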
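This is not always enough. Imagine you had a serial port connected to a modem (even if you have an internal modem, it is still implemented from the CPU's perspective as a serial port connected to a modem, so you don't have to tax your imagination too hard). The natural thing to do would be to use the device file to write things to the modem (either modem commands or data to be sent through the phone line) and to read things from the modem (either responses to commands or the data received through the phone line). However, this leaves open the question of what to do when you need to talk to the serial port itself, for example to set the rate at which data is sent and received.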
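The answer in Unix is to use a special function called ioctl (short for Input Output ConTroL). Every device can have its own ioctl commands, which can be read ioctls (to return information to a process), write ioctls (to send information from a process to the kernel), both or neither. The direction is named from the calling process's point of view, which is also how the _IOR and _IOW macros are named. The ioctl function is called with three parameters:
1. The file descriptor of the appropriate device file
2. The ioctl number
3. A parameter of type long, which you can use to pass anything (typically a pointer to a structure)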
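To see the three parameters in a call that needs no custom driver, here is a small user-space sketch. It assumes a Linux system, where pipes answer the standard FIONREAD request. The helper name pending_bytes is made up for this illustration and is not part of the example module.

```c
#include <sys/ioctl.h>   /* ioctl, FIONREAD */

/*
 * pending_bytes is a hypothetical helper, not part of chardev.c.
 * It asks the kernel how many bytes are waiting to be read on fd.
 * Returns the byte count, or -1 if the ioctl fails.
 */
int pending_bytes(int fd)
{
    int count = 0;

    /*
     * Parameter 1: the file descriptor.
     * Parameter 2: the ioctl number (FIONREAD is a standard one).
     * Parameter 3: a pointer, which the kernel fills in with the result.
     */
    if (ioctl(fd, FIONREAD, &count) < 0)
        return -1;
    return count;
}
```

For instance, after writing five bytes into a freshly created pipe, calling pending_bytes() on the read end should report 5, and calling it on an invalid descriptor should report -1.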
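The ioctl number encodes the major device number, the type of the ioctl, the command, and the type of the parameter. This ioctl number is usually created by a macro call (_IO, _IOR, _IOW or _IOWR, depending on the type) in a header file. This header file should then be included both by the programs which will use ioctl (so they can generate the appropriate ioctls) and by the kernel module (so it can understand them). In the example below, the header file is chardev.h and the program which uses it is ioctl.c.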
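The packing can be inspected from user space with the decoding macros that sit next to _IOR and friends in <linux/ioctl.h>. The sketch below is only an illustration: the DEMO_ names and both helper functions are invented here, and the numbers merely mirror the ones the example header defines. Strictly speaking the first field is a free-form "type" byte, and the example simply reuses its major number for it.

```c
#include <stdio.h>
#include <linux/ioctl.h>   /* _IOR, _IOW, _IOWR and the _IOC_* decoders */

/* Hypothetical numbers for illustration only; they mirror chardev.h. */
#define DEMO_MAGIC        100
#define DEMO_SET_MSG      _IOW(DEMO_MAGIC, 0, char *)
#define DEMO_GET_MSG      _IOR(DEMO_MAGIC, 1, char *)
#define DEMO_GET_NTH_BYTE _IOWR(DEMO_MAGIC, 2, int)

/* Name the transfer direction, as seen from the calling process. */
const char *demo_dir_name(unsigned int cmd)
{
    unsigned int dir = _IOC_DIR(cmd);

    if (dir == (_IOC_READ | _IOC_WRITE))
        return "read/write";
    if (dir == _IOC_READ)
        return "read";
    if (dir == _IOC_WRITE)
        return "write";
    return "none";
}

/* Format the four fields packed into an ioctl number. */
int demo_describe(unsigned int cmd, char *out, size_t out_len)
{
    return snprintf(out, out_len, "type=%u nr=%u size=%u dir=%s",
                    (unsigned int)_IOC_TYPE(cmd),
                    (unsigned int)_IOC_NR(cmd),
                    (unsigned int)_IOC_SIZE(cmd),
                    demo_dir_name(cmd));
}
```

Calling demo_describe() on DEMO_GET_NTH_BYTE shows type 100, command number 2, a size equal to sizeof(int), and both directions set.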
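If you want to use ioctls in your own kernel modules, it is best to receive an official ioctl assignment, so that if you accidentally get somebody else's ioctls, or if they get yours, you'll know something is wrong. For more information, consult the kernel source tree at Documentation/ioctl-number.txt.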
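One way a driver can notice that it was handed somebody else's ioctl is to check the fields of the number before acting on it. The user-space sketch below only imitates such a check. DEMO_MAGIC, DEMO_MAX_NR and demo_check_cmd are invented names, and answering a foreign command with -ENOTTY is a common convention, not something the example module in this chapter does.

```c
#include <errno.h>
#include <linux/ioctl.h>   /* _IOC_TYPE, _IOC_NR */

#define DEMO_MAGIC   100   /* hypothetical type byte of "our" ioctls */
#define DEMO_MAX_NR  2     /* hypothetical highest command number */

/*
 * Return 0 if cmd looks like one of ours, -ENOTTY otherwise.
 * A real handler would run a check like this before its switch.
 */
int demo_check_cmd(unsigned int cmd)
{
    if (_IOC_TYPE(cmd) != DEMO_MAGIC)
        return -ENOTTY;   /* built with somebody else's type byte */
    if (_IOC_NR(cmd) > DEMO_MAX_NR)
        return -ENOTTY;   /* command number we never defined */
    return 0;
}
```

With these assumptions, a command built as _IOR(100, 1, char *) passes, while the same command built with type 101 is rejected.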
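Example 7-1. chardev.c
Note: kernels from 2.6.36 onward no longer have the .ioctl member in struct file_operations, so this listing does not build there unchanged. On those kernels the handler is registered as .unlocked_ioctl, takes no inode argument, and returns long.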
/*
* chardev.c - Create an input/output character device
*/
#include <linux/kernel.h> /* We're doing kernel work */
#include <linux/module.h> /* Specifically, a module */
#include <linux/fs.h>
#include <asm/uaccess.h> /* for get_user and put_user */
#include "chardev.h"
#define SUCCESS 0
#define DEVICE_NAME "char_dev"
#define BUF_LEN 80
/*
 * Is the device open right now? Used to prevent
 * concurrent access into the same device
 */
static int Device_Open = 0;
/*
* The message the device will give when asked
*/
static char Message[BUF_LEN];
/*
* How far did the process reading the message get?
* Useful if the message is larger than the size of the
* buffer we get to fill in device_read.
*/
static char *Message_Ptr;
/*
* This is called whenever a process attempts to open the device file
*/
static int device_open(struct inode *inode, struct file *file)
{
#ifdef DEBUG
    printk(KERN_INFO "device_open(%p)\n", file);
#endif

    /*
     * We don't want to talk to two processes at the same time
     */
    if (Device_Open)
        return -EBUSY;

    Device_Open++;
    /*
     * Initialize the message
     */
    Message_Ptr = Message;
    try_module_get(THIS_MODULE);
    return SUCCESS;
}
static int device_release(struct inode *inode, struct file *file)
{
#ifdef DEBUG
    printk(KERN_INFO "device_release(%p,%p)\n", inode, file);
#endif

    /*
     * We're now ready for our next caller
     */
    Device_Open--;
    module_put(THIS_MODULE);
    return SUCCESS;
}
/*
* This function is called whenever a process which has already opened the
* device file attempts to read from it.
*/
static ssize_t device_read(struct file *file,   /* see include/linux/fs.h */
                           char __user *buffer, /* buffer to be filled with data */
                           size_t length,       /* length of the buffer */
                           loff_t *offset)
{
    /*
     * Number of bytes actually written to the buffer
     */
    int bytes_read = 0;

#ifdef DEBUG
    /* length is a size_t, so it is printed with %zu */
    printk(KERN_INFO "device_read(%p,%p,%zu)\n", file, buffer, length);
#endif

    /*
     * If we're at the end of the message, return 0
     * (which signifies end of file)
     */
    if (*Message_Ptr == 0)
        return 0;

    /*
     * Actually put the data into the buffer
     */
    while (length && *Message_Ptr) {
        /*
         * Because the buffer is in the user data segment,
         * not the kernel data segment, assignment wouldn't
         * work. Instead, we have to use put_user which
         * copies data from the kernel data segment to the
         * user data segment.
         */
        put_user(*(Message_Ptr++), buffer++);
        length--;
        bytes_read++;
    }

#ifdef DEBUG
    printk(KERN_INFO "Read %d bytes, %zu left\n", bytes_read, length);
#endif

    /*
     * Read functions are supposed to return the number
     * of bytes actually inserted into the buffer
     */
    return bytes_read;
}
/*
* This function is called when somebody tries to
* write into our device file.
*/
static ssize_t
device_write(struct file *file,
             const char __user *buffer, size_t length, loff_t *offset)
{
    int i;

#ifdef DEBUG
    /*
     * buffer points into user space, so print its address
     * rather than dereferencing it with %s.
     */
    printk(KERN_INFO "device_write(%p,%p,%zu)\n", file, buffer, length);
#endif

    for (i = 0; i < length && i < BUF_LEN; i++)
        get_user(Message[i], buffer + i);

    Message_Ptr = Message;

    /*
     * Again, return the number of input characters used
     */
    return i;
}
/*
* This function is called whenever a process tries to do an ioctl on our
* device file. We get two extra parameters (additional to the inode and file
* structures, which all device functions get): the number of the ioctl called
* and the parameter given to the ioctl function.
*
* If the ioctl is write or read/write (meaning output is returned to the
* calling process), the ioctl call returns the output of this function.
*
*/
int device_ioctl(struct inode *inode,      /* see include/linux/fs.h */
                 struct file *file,        /* ditto */
                 unsigned int ioctl_num,   /* number and param for ioctl */
                 unsigned long ioctl_param)
{
    int i;
    char *temp;
    char ch;

    /*
     * Switch according to the ioctl called
     */
    switch (ioctl_num) {
    case IOCTL_SET_MSG:
        /*
         * Receive a pointer to a message (in user space) and set that
         * to be the device's message. Get the parameter given to
         * ioctl by the process.
         */
        temp = (char *)ioctl_param;

        /*
         * Find the length of the message
         */
        get_user(ch, temp);
        for (i = 0; ch && i < BUF_LEN; i++, temp++)
            get_user(ch, temp);

        device_write(file, (char *)ioctl_param, i, 0);
        break;

    case IOCTL_GET_MSG:
        /*
         * Give the current message to the calling process -
         * the parameter we got is a pointer, fill it.
         */
        i = device_read(file, (char *)ioctl_param, 99, 0);

        /*
         * Put a zero at the end of the buffer, so it will be
         * properly terminated
         */
        put_user('\0', (char *)ioctl_param + i);
        break;

    case IOCTL_GET_NTH_BYTE:
        /*
         * This ioctl is both input (ioctl_param) and
         * output (the return value of this function).
         * Reject indexes that fall outside the message buffer.
         */
        if (ioctl_param >= BUF_LEN)
            return -EINVAL;
        return Message[ioctl_param];
    }

    return SUCCESS;
}
/* Module Declarations */
/*
* This structure will hold the functions to be called
* when a process does something to the device we
* created. Since a pointer to this structure is kept in
* the devices table, it can't be local to
* init_module. NULL is for unimplemented functions.
*/
struct file_operations Fops = {
    .read = device_read,
    .write = device_write,
    .ioctl = device_ioctl,
    .open = device_open,
    .release = device_release,  /* a.k.a. close */
};
/*
* Initialize the module - Register the character device
*/
int init_module()
{
    int ret_val;

    /*
     * Register the character device (at least try)
     */
    ret_val = register_chrdev(MAJOR_NUM, DEVICE_NAME, &Fops);

    /*
     * Negative values signify an error
     */
    if (ret_val < 0) {
        printk(KERN_ALERT "%s failed with %d\n",
               "Sorry, registering the character device ", ret_val);
        return ret_val;
    }

    printk(KERN_INFO "%s The major device number is %d.\n",
           "Registration is a success", MAJOR_NUM);
    printk(KERN_INFO "If you want to talk to the device driver,\n");
    printk(KERN_INFO "you'll have to create a device file. \n");
    printk(KERN_INFO "We suggest you use:\n");
    printk(KERN_INFO "mknod %s c %d 0\n", DEVICE_FILE_NAME, MAJOR_NUM);
    printk(KERN_INFO "The device file name is important, because\n");
    printk(KERN_INFO "the ioctl program assumes that's the\n");
    printk(KERN_INFO "file you'll use.\n");

    return 0;
}
/*
 * Cleanup - unregister the character device
 */
void cleanup_module()
{
    int ret;

    /*
     * Unregister the device
     */
    ret = unregister_chrdev(MAJOR_NUM, DEVICE_NAME);

    /*
     * If there's an error, report it
     */
    if (ret < 0)
        printk(KERN_ALERT "Error: unregister_chrdev: %d\n", ret);
}
例子7-2.chardev.h
/*
* chardev.h - the header file with the ioctl definitions.
*
* The declarations here have to be in a header file, because
* they need to be known both to the kernel module
* (in chardev.c) and the process calling ioctl (ioctl.c)
*/
#ifndef CHARDEV_H
#define CHARDEV_H
#include <linux/ioctl.h>
/*
* The major device number. We can't rely on dynamic
* registration any more, because ioctls need to know
* it.
*/
#define MAJOR_NUM 100
/*
* Set the message of the device driver
*/
#define IOCTL_SET_MSG _IOW(MAJOR_NUM, 0, char *)
/*
 * _IOW means that we're creating an ioctl command
 * number for passing information from a user process
 * to the kernel module (the process "writes" to the
 * driver).
 *
 * The first argument, MAJOR_NUM, is the major device
 * number we're using.
 *
 * The second argument is the number of the command
 * (there could be several with different meanings).
 *
 * The third argument is the type we want to get from
 * the process to the kernel.
 */
/*
* Get the message of the device driver
*/
#define IOCTL_GET_MSG _IOR(MAJOR_NUM, 1, char *)
/*
* This IOCTL is used for output, to get the message
* of the device driver. However, we still need the
* buffer to place the message in to be input,
* as it is allocated by the process.
*/
/*
* Get the n'th byte of the message
*/
#define IOCTL_GET_NTH_BYTE _IOWR(MAJOR_NUM, 2, int)
/*
* The IOCTL is used for both input and output. It
* receives from the user a number, n, and returns
* Message[n].
*/
/*
* The name of the device file
*/
#define DEVICE_FILE_NAME "char_dev"
#endif
Example 7-3. ioctl.c
/*
* ioctl.c - the process to use ioctl's to control the kernel module
*
* Until now we could have used cat for input and output. But now
* we need to do ioctl's, which require writing our own process.
*/
/*
* device specifics, such as ioctl numbers and the
* major device file.
*/
#include "chardev.h"
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h> /* open */
#include <unistd.h> /* close */
#include <sys/ioctl.h> /* ioctl */
/*
* Functions for the ioctl calls
*/
void ioctl_set_msg(int file_desc, char *message)
{
    int ret_val;

    ret_val = ioctl(file_desc, IOCTL_SET_MSG, message);
    if (ret_val < 0) {
        printf("ioctl_set_msg failed:%d\n", ret_val);
        exit(-1);
    }
}
void ioctl_get_msg(int file_desc)
{
    int ret_val;
    char message[100];

    /*
     * Warning - this is dangerous because we don't tell
     * the kernel how far it's allowed to write, so it
     * might overflow the buffer. In a real production
     * program, we would have used two ioctls - one to tell
     * the kernel the buffer length and another to give
     * it the buffer to fill
     */
    ret_val = ioctl(file_desc, IOCTL_GET_MSG, message);
    if (ret_val < 0) {
        printf("ioctl_get_msg failed:%d\n", ret_val);
        exit(-1);
    }

    printf("get_msg message:%s\n", message);
}
void ioctl_get_nth_byte(int file_desc)
{
    int i;
    int c;  /* int rather than char, so a negative error return is detectable */

    printf("get_nth_byte message:");

    i = 0;
    do {
        c = ioctl(file_desc, IOCTL_GET_NTH_BYTE, i++);
        if (c < 0) {
            printf("ioctl_get_nth_byte failed at the %d'th byte:\n", i);
            exit(-1);
        }
        putchar(c);
    } while (c != 0);
    putchar('\n');
}
/*
* Main - Call the ioctl functions
*/
int main(void)
{
    int file_desc;
    char *msg = "Message passed by ioctl\n";

    file_desc = open(DEVICE_FILE_NAME, 0);
    if (file_desc < 0) {
        printf("Can't open device file: %s\n", DEVICE_FILE_NAME);
        exit(-1);
    }

    ioctl_get_nth_byte(file_desc);
    ioctl_get_msg(file_desc);
    ioctl_set_msg(file_desc, msg);

    close(file_desc);
    return 0;
}
Chapter 8. System Calls
8.1 System Calls
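So far, the only thing we've done was to use well defined kernel mechanisms to register /proc files and device handlers. This is fine if you want to do something the kernel programmers thought you'd want, such as write a device driver. But what if you want to do something unusual, to change the behavior of the system in some way? Then, you're mostly on your own.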
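This is where kernel programming gets dangerous. While writing the example below, I killed the open() system call. This meant I couldn't open any files, I couldn't run any programs, and I couldn't shut down the computer. I had to pull the power switch. Luckily, no files died. To ensure you won't lose any files either, please run sync right before you do the insmod and the rmmod.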
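Forget about /proc files, forget about device files. They're just minor details. The real process to kernel communication mechanism, the one used by all processes, is system calls. When a process requests a service from the kernel (such as opening a file, forking to a new process, or requesting more memory), this is the mechanism used. If you want to change the behaviour of the kernel in interesting ways, this is the place to do it. By the way, if you want to see which system calls a program uses, run strace <arguments>.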
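To make the idea of "requesting a service by number" concrete, the following user-space sketch asks for the process id through the generic syscall() wrapper instead of the usual getpid() library function. It assumes Linux with glibc. raw_getpid is a name invented for this illustration.

```c
#define _GNU_SOURCE
#include <unistd.h>        /* syscall */
#include <sys/syscall.h>   /* SYS_getpid */

/*
 * raw_getpid is a hypothetical helper. It hands the kernel the
 * number of the getpid system call and returns whatever the
 * kernel answers, without going through the getpid() wrapper.
 */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The value returned should be the same one getpid() reports, since both end up in the same kernel service.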
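In general, a process is not supposed to be able to access the kernel. It can't access kernel memory and it can't call kernel functions. The hardware of the CPU enforces this (that's the reason why it's called "protected mode").
System calls are an exception to this general rule. What happens is that the process fills the registers with the appropriate values and then calls a special instruction which jumps to a previously defined location in the kernel (of course, that location is readable by user processes, but it is not writable by them). Under Intel CPUs, this is done by means of interrupt 0x80. The hardware knows that once you jump to this location, you are no longer running in restricted user mode but as the operating system kernel, and therefore you're allowed to do whatever you want.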
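The location in the kernel a process can jump to is called system_call. The procedure at that location checks the system call number, which tells the kernel what service the process requested. Then, it looks at the table of system calls (sys_call_table) to find the address of the kernel function to call. Then it calls the function, and after it returns, does a few system checks and then returns back to the process (or to a different process, if the process time ran out). If you want to read this code, it's in the source file arch/<architecture>/kernel/entry.S, after the line ENTRY(system_call).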
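The lookup step can be imitated in plain C with an array of function pointers indexed by a call number. This is only a toy model of the dispatch described above, not kernel code. Every demo_ name is invented, and the two "services" are placeholders.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_NR_CALLS 2   /* size of the toy table */

typedef long (*demo_call_t)(long arg);

/* Two placeholder "services" a caller can request. */
long demo_double(long arg) { return arg * 2; }
long demo_negate(long arg) { return -arg; }

/* Toy counterpart of sys_call_table: one function pointer per number. */
demo_call_t demo_call_table[DEMO_NR_CALLS] = {
    demo_double,   /* call number 0 */
    demo_negate,   /* call number 1 */
};

/* Toy counterpart of system_call: check the number, then dispatch. */
long demo_system_call(unsigned int nr, long arg)
{
    if (nr >= DEMO_NR_CALLS || demo_call_table[nr] == NULL)
        return -ENOSYS;   /* unknown call number */
    return demo_call_table[nr](arg);
}
```

Requesting call 0 with argument 21 yields 42, call 1 with argument 5 yields -5, and any number past the end of the table is refused.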
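So, if we want to change the way a certain system call works, what we need to do is to write our own function to implement it (usually by adding a bit of our own code, and then calling the original function) and then change the pointer at sys_call_table to point to our function. Because we might be removed later and we don't want to leave the system in an unstable state, it's important for cleanup_module to restore the table to its original state.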
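The source code here is an example of such a kernel module. We want to "spy" on a certain user, and to printk() a message whenever that user opens a file. Towards this end, we replace the system call to open a file with our own function, called our_sys_open. This function checks the uid (user's id) of the current process, and if it's equal to the uid we spy on, it calls printk() to display the name of the file to be opened. Then, either way, it calls the original open() function with the same parameters, to actually open the file.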
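The same wrap, save and restore pattern can be tried safely in user space with a single function pointer standing in for the table slot. This is a sketch under that assumption, not the module itself: all spy_ names are invented, the uid is passed as an ordinary argument instead of being read from the current process, and printf replaces printk.

```c
#include <stdio.h>

typedef int (*spy_open_t)(const char *filename, int uid);

/* Stand-in for the original sys_open: pretend every open succeeds. */
int spy_real_open(const char *filename, int uid)
{
    (void)filename;
    (void)uid;
    return 3;   /* an arbitrary descriptor number */
}

spy_open_t spy_table_entry = spy_real_open;  /* toy table slot for "open" */
spy_open_t spy_original;                     /* saved by spy_install */
int spy_uid;                                 /* the uid being watched */
int spy_hits;                                /* how many opens were reported */

/* Replacement: report if the uid matches, then always call the original. */
int spy_our_open(const char *filename, int uid)
{
    if (uid == spy_uid) {
        printf("Opened file by %d: %s\n", uid, filename);
        spy_hits++;
    }
    return spy_original(filename, uid);
}

/* Remember the old pointer, then put ours in the slot. */
void spy_install(int uid)
{
    spy_uid = uid;
    spy_original = spy_table_entry;
    spy_table_entry = spy_our_open;
}

/* Put the old pointer back. */
void spy_remove(void)
{
    spy_table_entry = spy_original;
}
```

After spy_install(1000), calls made through spy_table_entry still return what spy_real_open returns, but only those made with uid 1000 are reported. spy_remove() leaves the slot pointing at spy_real_open again.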
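The init_module function replaces the appropriate location in sys_call_table and keeps the original pointer in a variable. The cleanup_module function uses that variable to restore everything back to normal. This approach is dangerous, because of the possibility of two kernel modules changing the same system call. Imagine we have two kernel modules, A and B. A's open system call will be A_open and B's will be B_open. Now, when A is inserted into the kernel, the system call is replaced with A_open, which will call the original sys_open when it's done. Next, B is inserted into the kernel, which replaces the system call with B_open, which will call what it thinks is the original system call, A_open, when it's done.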
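Now, if B is removed first, everything will be well: it will simply restore the system call to A_open, which calls the original. However, if A is removed and then B is removed, the system will crash. A's removal will restore the system call to the original, sys_open, cutting B out of the loop. Then, when B is removed, it will restore the system call to what it thinks is the original, A_open, which is no longer in memory. At first glance, it appears we could solve this particular problem by having each module restore the table entry only if it still points to that module's own function (so that B won't change the system call when it's removed), but that will cause an even worse problem. When A is removed, it sees that the system call was changed to B_open, so that it no longer points to A_open, and therefore it won't restore it to sys_open before A is removed from memory. Unfortunately, B_open will still try to call A_open, which is no longer there, so that even without removing B the system would crash.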
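The first failure scenario is easy to replay with a toy model in which one function pointer plays the table slot and each "module" saves whatever it found there. The sketch assumes nothing beyond standard C. All ord_ names are invented, and "loaded" flags stand in for the module code actually being present in memory.

```c
typedef int (*ord_fn)(void);

int ord_sys_open(void) { return 0; }   /* the real system call */
int ord_a_open(void)   { return 1; }   /* module A's replacement */
int ord_b_open(void)   { return 2; }   /* module B's replacement */

ord_fn ord_table = ord_sys_open;  /* the single table slot */
ord_fn ord_a_saved, ord_b_saved;  /* what each module believes is "original" */
int ord_a_loaded, ord_b_loaded;   /* is the module's code still in memory? */

void ord_insert_a(void)
{
    ord_a_saved = ord_table;
    ord_table = ord_a_open;
    ord_a_loaded = 1;
}

void ord_insert_b(void)
{
    ord_b_saved = ord_table;
    ord_table = ord_b_open;
    ord_b_loaded = 1;
}

/* Each removal blindly restores the pointer saved at insertion time. */
void ord_remove_a(void) { ord_table = ord_a_saved; ord_a_loaded = 0; }
void ord_remove_b(void) { ord_table = ord_b_saved; ord_b_loaded = 0; }

/* 1 if the slot points at code of a module that is no longer loaded. */
int ord_slot_is_dangling(void)
{
    return (ord_table == ord_a_open && !ord_a_loaded) ||
           (ord_table == ord_b_open && !ord_b_loaded);
}
```

Inserting A, inserting B, then removing A and finally B leaves ord_table pointing at ord_a_open although A is gone, which is the crash described above. Removing B before A instead ends with the slot back at ord_sys_open.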
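Note that all the related problems make syscall stealing unfeasible for production use. In order to keep people from doing potentially harmful things, sys_call_table is no longer exported. This means that if you want to do something more than a mere dry run of this example, you will have to patch your current kernel in order to have sys_call_table exported. In the example directory you will find a README and the patch. As you can imagine, such modifications are not to be taken lightly. Do not try this on valuable systems (that is, systems that you do not own or that cannot be restored easily). You'll need to get the complete source code of this guide as a tarball in order to get the patch and the README. Depending on your kernel version, you might even need to apply the patch by hand.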
Example 8-1. syscall.c
/*
* syscall.c
*
* System call "stealing" sample.
*/
/*
* Copyright (C) 2001 by Peter Jay Salzman
*/
/*
* The necessary header files
*/
/*
* Standard in kernel modules
*/
#include <linux/kernel.h> /* We're doing kernel work */
#include <linux/module.h> /* Specifically, a module, */
#include <linux/moduleparam.h> /* which will have params */
#include <linux/unistd.h> /* The list of system calls */
/*
* For the current (process) structure, we need
* this to know who the current user is.
*/
#include <linux/sched.h>
#include <asm/uaccess.h>
/*
* The system call table (a table of functions). We
* just define this as external, and the kernel will
* fill it up for us when we are insmod'ed
*
* sys_call_table is no longer exported in 2.6.x kernels.
* If you really want to try this DANGEROUS module you will
* have to apply the supplied patch against your current kernel
* and recompile it.
*/
extern void *sys_call_table[];
/*
* UID we want to spy on - will be filled from the
* command line
*/
static int uid;
module_param(uid, int, 0644);
/*
* A pointer to the original system call. The reason
* we keep this, rather than call the original function
* (sys_open), is because somebody else might have
* replaced the system call before us. Note that this
* is not 100% safe, because if another module
* replaced sys_open before us, then when we're inserted
* we'll call the function in that module - and it
* might be removed before we are.
*
* Another reason for this is that we can't get sys_open.
* It's a static variable, so it is not exported.
*/
asmlinkage int (*original_call) (const char *, int, int);
/*
* The function we'll replace sys_open (the function
* called when you call the open system call) with. To
* find the exact prototype, with the number and type
* of arguments, we find the original function first
* (it's at fs/open.c).
*
* In theory, this means that we're tied to the
* current version of the kernel. In practice, the
* system calls almost never change (it would wreak havoc
* and require programs to be recompiled, since the system
* calls are the interface between the kernel and the
* processes).
*/
asmlinkage int our_sys_open(const char *filename, int flags, int mode)
{
    int i = 0;
    char ch;

    /*
     * Check if this is the user we're spying on
     */
    if (uid == current->uid) {
        /*
         * Report the file, if relevant
         */
        printk("Opened file by %d: ", uid);
        do {
            get_user(ch, filename + i);
            i++;
            printk("%c", ch);
        } while (ch != 0);
        printk("\n");
    }

    /*
     * Call the original sys_open - otherwise, we lose
     * the ability to open files
     */
    return original_call(filename, flags, mode);
}
/*
* Initialize the module - replace the system call
*/
int init_module()
{
    /*
     * Warning - too late for it now, but maybe for
     * next time...
     */
    printk(KERN_ALERT "I'm dangerous. I hope you did a ");
    printk(KERN_ALERT "sync before you insmod'ed me.\n");
    printk(KERN_ALERT "My counterpart, cleanup_module(), is even");
    printk(KERN_ALERT "more dangerous. If\n");
    printk(KERN_ALERT "you value your file system, it will ");
    printk(KERN_ALERT "be \"sync; rmmod\" \n");
    printk(KERN_ALERT "when you remove this module.\n");

    /*
     * Keep a pointer to the original function in
     * original_call, and then replace the system call
     * in the system call table with our_sys_open
     */
    original_call = sys_call_table[__NR_open];
    sys_call_table[__NR_open] = our_sys_open;

    /*
     * To get the address of the function for system
     * call foo, go to sys_call_table[__NR_foo].
     */
    printk(KERN_INFO "Spying on UID:%d\n", uid);

    return 0;
}
/*
 * Cleanup - restore the original system call
 */
void cleanup_module()
{
    /*
     * Return the system call back to normal
     */
    if (sys_call_table[__NR_open] != our_sys_open) {
        printk(KERN_ALERT "Somebody else also played with the ");
        printk(KERN_ALERT "open system call\n");
        printk(KERN_ALERT "The system may be left in ");
        printk(KERN_ALERT "an unstable state.\n");
    }

    sys_call_table[__NR_open] = original_call;
}