Chapter 1 & 2: An Introduction to Device Drivers & Building and Running Modules
1. Concurrency and security -- the two things to keep in mind when writing a module.
2. The kernel stack is small, possibly a single page (4 KB). Don't declare large local variables and don't build deep call chains.
3. Kernel code cannot do floating-point arithmetic.
4. Compiling and building:
obj-m := module.o
module-objs := file1.o file2.o
make -C ~/kernel-2.6 M=`pwd` modules
5. /proc/modules lists all loaded modules; lsmod reads this file too.
Each entry in /proc/modules contains the module name, the amount of memory the module occupies, and its usage count.
6. Module code is linked with vermagic.o. As a result, insmod compares the version information built into the module with that of the running kernel, and the load fails if they don't match.
7. With EXPORT_SYMBOL/EXPORT_SYMBOL_GPL, a module can export symbols for other modules to use. E.g. lp is the module we use, and parport/parport_pc are the underlying modules it stacks on. So symbol export is a way to define interfaces between modules.
8. modprobe searches the /lib/modules directory to find the module, while insmod loads the module file at exactly the path you give it.
9. __init & __exit macros. They mark the init and exit functions. __init tells the kernel that the function is used only at initialization; once the module is loaded, the function is dropped to free its memory. __exit tells the kernel that the function can ONLY be called at module unload time. So, IF YOUR INIT FUNCTION CALLS YOUR CLEANUP FUNCTIONS (e.g. on an error path), DON'T MARK THOSE CLEANUP FUNCTIONS WITH __exit.
10. Module parameters are passed on the insmod command line: insmod xxx a=1 b=2
module_param(howmany, int, S_IRUGO);
module_param_array(name,type,num,perm); // array param is allowed
The permission argument of module_param/module_param_array tells the kernel whether to create an entry for the parameter under /sys/module and, if so, the permissions of that entry (file); a value of 0 means no entry is created. When the entry is written, the value of the parameter as seen by the module changes. BUT THE MODULE IS NOT NOTIFIED OF THE CHANGE; the module has to detect and handle the update itself.
==================================================================================================
Chapter 3: Char drivers
1. Major number -- identifies one particular driver. Minor number -- the kernel doesn't care about it, but it means a lot to the driver. E.g. if a driver implements multiple device files, it can tell them apart by minor number.
2. The MKDEV macro builds a dev_t value (dev_t is an integer type, not a structure). The MAJOR & MINOR macros extract the major & minor numbers from a dev_t.
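The packing behind these macros can be modeled in user space. This is only a sketch: it assumes the 2.6 layout of 12 major bits above 20 minor bits, and every DEMO_/demo_ name is a hypothetical stand-in. A driver should always use the real MKDEV/MAJOR/MINOR macros rather than its own shifts.

```c
#include <stdint.h>

/* Assumed 2.6 layout: 32-bit dev_t, upper 12 bits major, lower 20 bits minor. */
#define DEMO_MINORBITS 20
#define DEMO_MINORMASK ((1U << DEMO_MINORBITS) - 1)

typedef uint32_t demo_dev_t;

/* Stand-ins for MKDEV, MAJOR and MINOR. */
#define DEMO_MKDEV(ma, mi) ((((demo_dev_t)(ma)) << DEMO_MINORBITS) | (demo_dev_t)(mi))
#define DEMO_MAJOR(dev)    ((unsigned int)((dev) >> DEMO_MINORBITS))
#define DEMO_MINOR(dev)    ((unsigned int)((dev) & DEMO_MINORMASK))

/* Returns 1 when a (major, minor) pair survives a round trip through dev_t. */
int demo_round_trip(unsigned int major, unsigned int minor)
{
    demo_dev_t dev = DEMO_MKDEV(major, minor);
    return DEMO_MAJOR(dev) == major && DEMO_MINOR(dev) == minor;
}
```

E.g. DEMO_MKDEV(254, 3) puts 254 in the upper bits and 3 in the lower bits, and DEMO_MAJOR/DEMO_MINOR get them back.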
3. alloc_chrdev_region -- ask the kernel to allocate a free major number and register it.
register_chrdev_region -- register a major number you chose yourself.
unregister_chrdev_region -- free the major number.
/proc/devices -- lists devices and their major numbers. Check this file to find a major number that is free to use.
4. Every open file has a file structure in the kernel ("struct file"). Its f_op member points to a struct file_operations, and the driver implements the function pointers in it. When the open system call is made, the kernel's own open code runs first and then calls the driver's open function through that pointer.
5. f_op has lots of function pointers. Refer to page 70 of the book for details. Sample code:
struct file_operations scull_fops = {
.owner = THIS_MODULE,
.llseek = scull_llseek,
.read = scull_read,
.write = scull_write,
.ioctl = scull_ioctl,
.open = scull_open,
.release = scull_release,
};
6. inode. An inode represents a file, while a "struct file" represents an open file descriptor. A file can be opened many times, which creates many "struct file"s, but there is only one inode per file.
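The difference is visible from user space. The sketch below opens one scratch file twice: both descriptors report the same inode, but each has its own file position, because each open created its own "struct file". The function name and the scratch path template are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Opens one temporary file twice and reports what the two descriptors share.
 * Returns 0 on success and fills the out-parameters; returns -1 on any error.
 */
int demo_two_opens(int *same_inode, off_t *pos_first, off_t *pos_second)
{
    char path[] = "/tmp/ldd_notes_XXXXXX";   /* hypothetical scratch file */
    struct stat st1, st2;
    int rc = -1;

    int fd1 = mkstemp(path);                 /* first open: one struct file */
    if (fd1 < 0)
        return -1;
    int fd2 = open(path, O_RDONLY);          /* second open: another struct file */
    if (fd2 < 0)
        goto out_fd1;

    if (write(fd1, "hello", 5) != 5)         /* advances only fd1's position */
        goto out_fd2;
    if (fstat(fd1, &st1) != 0 || fstat(fd2, &st2) != 0)
        goto out_fd2;

    *same_inode = (st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino);
    *pos_first  = lseek(fd1, 0, SEEK_CUR);   /* position after the write */
    *pos_second = lseek(fd2, 0, SEEK_CUR);   /* untouched by fd1's write */
    rc = 0;

out_fd2:
    close(fd2);
out_fd1:
    close(fd1);
    unlink(path);
    return rc;
}
```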
7. cdev_alloc/cdev_init/cdev_add -- set up and register the char device.
8. kmalloc/kfree. kmalloc is not a good way to allocate a large area of memory.
9. Why should a user-space pointer NOT be dereferenced directly? Here are the reasons:
(1) Depending on which architecture your driver is running on, and how the kernel was configured, the user-space pointer may not be valid while running in kernel mode at all. There may be no mapping for that address, or it could point to some other, random data.
(2) Even if the pointer does mean the same thing in kernel space, user-space memory is paged, and the memory in question might not be resident in RAM when the system call is made. Attempting to reference the user-space memory directly could generate a page fault, which is something that kernel code is not allowed to do. The result would be an “oops,” which would result in the death of the process that made the system call.
(3) The pointer in question has been supplied by a user program, which could be buggy or malicious. If your driver ever blindly dereferences a user-supplied pointer, it provides an open doorway allowing a user-space program to access or overwrite memory anywhere in the system. If you do not wish to be responsible for compromising the security of your users’ systems, you cannot ever dereference a user-space pointer directly.
10. Use copy_to_user & copy_from_user to access user-space pointers. These functions act like memcpy, with two differences: (1) The memory behind the user-space pointer may be swapped out to disk at that moment, so these functions can sleep while the page is brought back in. MAKE SURE your driver code is reentrant and safe to run concurrently. (2) These functions check whether the user-space pointer is valid.
11. Use "strace" to trace system calls -- their parameters and return values.
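Where the copy sits inside a read method can be sketched in user space. This is a simplified model, not driver code: struct demo_dev and demo_read are hypothetical, and memcpy stands in for copy_to_user, so it has none of the pointer checking or sleeping behavior described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device: a flat buffer with a known amount of valid data. */
struct demo_dev {
    const char *data;
    size_t size;
};

/*
 * Models the offset handling of a read method. Returns the number of bytes
 * transferred, or 0 at end of data. memcpy stands in for copy_to_user here;
 * a real driver would return -EFAULT if copy_to_user reported uncopied bytes.
 */
long demo_read(const struct demo_dev *dev, char *buf, size_t count, long *f_pos)
{
    if (*f_pos < 0 || (size_t)*f_pos >= dev->size)
        return 0;                               /* nothing left to read */
    if (count > dev->size - (size_t)*f_pos)
        count = dev->size - (size_t)*f_pos;     /* clamp to available data */

    memcpy(buf, dev->data + *f_pos, count);     /* copy_to_user in a driver */
    *f_pos += (long)count;
    return (long)count;
}
```

A caller that asks for more than is left gets a short count, and the next call returns 0, which user space treats as end of file.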
==================================================================================================
Chapter 4: Debugging Techniques
1. When configuring the kernel, there is a "kernel hacking" menu. All its entries are about kernel debugging, e.g. the magic SysRq key, slab debugging (per the book, allocated memory is filled with 0xa5 and, after kfree, with 0x6b), init debugging, etc.
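2. Debugging by printing:
(1) printk prints the message on the current console, which can be a text terminal, a serial port, or a parallel printer. The condition is that the priority value of the message is lower than console_loglevel. Remember: end every printk message with \n, otherwise the message may not be printed.
(2) If both klogd and syslogd are running, the printk messages are also stored in /var/log/messages. This is independent of console_loglevel; all messages are logged to the file.
(3) If klogd is not running, the messages can still be seen by reading /proc/kmsg, and the dmesg command prints them too.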
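The console rule in (1) is a plain integer comparison. A minimal user-space sketch, assuming the 2.6 convention that a message starts with a "<N>" prefix (N from 0 to 7) and that messages without a prefix get level 4; the demo_ names are hypothetical.

```c
/* Assumed default level the 2.6 kernel gives messages without a "<N>" prefix. */
#define DEMO_DEFAULT_MESSAGE_LOGLEVEL 4

/* Extracts the level from a "<N>" prefix, or falls back to the default. */
int demo_message_level(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return DEMO_DEFAULT_MESSAGE_LOGLEVEL;
}

/* A message reaches the console only if its level is below console_loglevel. */
int demo_reaches_console(const char *msg, int console_loglevel)
{
    return demo_message_level(msg) < console_loglevel;
}
```

Lower numbers mean more urgent messages, so raising console_loglevel lets more messages through.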
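3. How to change console_loglevel? There are several ways:
(1) Call the sys_syslog system call.
(2) Start klogd with the -c option.
(3) Change it from a program. The book doesn't list the program here; it can be downloaded from O'Reilly's FTP site.
(4) Write to the file /proc/sys/kernel/printk. The file holds four values: the current loglevel, the default level for messages that lack an explicit loglevel, the minimum allowed loglevel, and the boot-time default loglevel. Writing a single value to this file changes the current loglevel.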
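Reading the four values back is a one-line parse. The sketch below takes the line as a string so it can be tried without touching /proc; the struct and function names are hypothetical, and the field order is the one given in the book.

```c
#include <stdio.h>

struct demo_printk_levels {
    int console;          /* current console_loglevel */
    int default_message;  /* level for messages without an explicit one */
    int minimum_console;  /* lowest value console_loglevel may take */
    int default_console;  /* boot-time default for console_loglevel */
};

/* Parses the file's single line of four integers. Returns 0 on success. */
int demo_parse_printk(const char *line, struct demo_printk_levels *out)
{
    int n = sscanf(line, "%d %d %d %d", &out->console, &out->default_message,
                   &out->minimum_console, &out->default_console);
    return n == 4 ? 0 : -1;
}
```

In a real tool the line would come from fgets on /proc/sys/kernel/printk.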
4. How printk output ends up in so many places:
The printk function writes messages into a circular buffer that is __LOG_BUF_LEN bytes long: a value from 4 KB to 1 MB chosen while configuring the kernel. The function then wakes any process that is waiting for messages, that is, any process that is sleeping in the syslog system call or that is reading /proc/kmsg.
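What "circular" means in practice: once the buffer is full, new text overwrites the oldest text. A minimal user-space sketch with a deliberately tiny buffer; the names are hypothetical and this is not how the kernel's own log buffer code is written.

```c
#include <stddef.h>

#define DEMO_LOG_BUF_LEN 8   /* tiny on purpose; must be a power of two */

struct demo_log {
    char buf[DEMO_LOG_BUF_LEN];
    size_t head;   /* total bytes ever written */
};

/* Appends bytes; once the buffer is full, the oldest bytes are overwritten. */
void demo_log_write(struct demo_log *log, const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        log->buf[log->head++ & (DEMO_LOG_BUF_LEN - 1)] = s[i];
}

/* Copies the surviving bytes, oldest first, into out; returns their count. */
size_t demo_log_read(const struct demo_log *log, char *out)
{
    size_t n = log->head < DEMO_LOG_BUF_LEN ? log->head : DEMO_LOG_BUF_LEN;
    size_t start = log->head - n;

    for (size_t i = 0; i < n; i++)
        out[i] = log->buf[(start + i) & (DEMO_LOG_BUF_LEN - 1)];
    return n;
}
```

So a reader that falls behind loses the oldest messages, not the newest ones.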
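5. About klogd & syslogd:
If klogd is running, it passes the messages to syslogd, and syslogd writes them to a file. The facility is LOG_KERN, and the priority corresponds to the one used in printk.
If klogd is not running, nobody reads the messages unless you run dmesg or read /proc/kmsg yourself.
If you don't like syslogd, you can configure klogd to write the messages straight to a file. Or don't start klogd either and, as mentioned above, send the messages to a dedicated console, or open a separate terminal and read /proc/kmsg.
6. A driver can implement its own /proc or /sys files to expose driver internals to user space. That is better than using printk heavily.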
7. read_proc/create_proc_read_entry can be used to create a /proc file and answer reads on it. But read_proc only lets us return ONE PAGE of data.
8. The seq_file interface was introduced to get around the ONE PAGE limit of read_proc. Four function pointers should be implemented:
start/stop/next/show
Use create_proc_entry to create the proc file and associate it with the four functions listed above.
Refer to page 109 of the book for more details.
9. ioctl is another debugging channel. No page limit, and more generic than the read_proc & seq_file interfaces.
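How the four callbacks cooperate can be modeled in user space. This is a simplified sketch, not the kernel API: the real callbacks take a struct seq_file and a void * item, and every demo_ name here is hypothetical. It only shows the calling order: start once, then show/next per item, then stop.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_NITEMS 3
static const int demo_items[DEMO_NITEMS] = { 10, 20, 30 };

/* start: map a position to an item, or NULL when past the end. */
static const int *demo_start(long *pos)
{
    return (*pos >= 0 && *pos < DEMO_NITEMS) ? &demo_items[*pos] : NULL;
}

/* next: advance the position and return the following item, or NULL. */
static const int *demo_next(long *pos)
{
    ++*pos;
    return demo_start(pos);
}

/* show: format one item into the output buffer; returns its length. */
static int demo_show(const int *item, char *out, size_t room)
{
    return snprintf(out, room, "%d\n", *item);
}

/* stop: release whatever start acquired; nothing to do in this sketch. */
static void demo_stop(void) { }

/* The loop a seq_file-style core would run on behalf of one read. */
size_t demo_seq_read(char *out, size_t room)
{
    long pos = 0;
    size_t used = 0;

    for (const int *it = demo_start(&pos); it != NULL; it = demo_next(&pos)) {
        int n = demo_show(it, out + used, room - used);
        if (n < 0 || (size_t)n >= room - used)
            break;              /* out of room: stop before a partial item */
        used += (size_t)n;
    }
    demo_stop();
    if (used < room)
        out[used] = '\0';       /* drop any partially written item */
    return used;
}
```

Because output is produced one item at a time, the total is not tied to a single page; a later read would simply resume from the saved position.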
10. Oops message reading. Refer to "Debug Hacks" book.
11. Magic SysRq key. Refer to "Debug Hacks" book.
12. Debugging the kernel with GDB: you can ONLY inspect variables; you CAN'T set breakpoints or change variables.
Select CONFIG_DEBUG_INFO when configuring the kernel.
gdb /usr/src/linux/vmlinux /proc/kcore -- vmlinux is the uncompressed, unstripped kernel image. Like other proc entries, /proc/kcore generates its data when it is read.
How to debug a loadable module: (1) Load the module. (2) Look in /sys/module/<module name>/sections for the section addresses (.text, .bss, .data...). (3) Use the gdb "add-symbol-file" command: add-symbol-file .../scull.ko 0xd0832000 -s .bss 0xd0837100 -s .data 0xd0836be0
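13. kdb is not an official built-in kernel debugger. It is provided by oss.sgi.com and currently supports only IA-32, which makes it of little use. Not even ARM is supported, which is a problem.
14. There are two patches, both called kgdb, for debugging the kernel with gdb. With a patch applied, gdb can debug the kernel, set breakpoints, and modify data. The working mode also differs from the usual one: typically the kernel being debugged runs on its own machine, the developer works on another machine, the two are connected with a serial cable, and debugging is done remotely.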
==================================================================================================
Chapter 5: Concurrency and Race Conditions
1. sema_init -- initialize a semaphore. A semaphore has P and V functions.
2. DECLARE_MUTEX/DECLARE_MUTEX_LOCKED -- declare a static mutex (actually a semaphore with initial value 1; the _LOCKED variant starts at 0).
3. init_MUTEX/init_MUTEX_LOCKED -- initialize a mutex at runtime.
4. down/down_interruptible/down_trylock -- the semaphore P functions. down_interruptible can be interrupted, so check its return value to see whether we got the semaphore or the wait was interrupted.
5. up -- the semaphore V function.
6. init_rwsem/down_read/down_read_trylock/up_read/down_write/down_write_trylock/up_write -- reader/writer semaphore. It allows multiple readers at a time, but while a writer holds it, all other read and write attempts block. Like Windows' SRWLock. It can improve performance.
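The admission rules of a reader/writer semaphore can be written down as a small state model. This is only a single-threaded sketch with hypothetical names: it is not thread-safe and never blocks, it just shows who would be let in. Like down_read_trylock and down_write_trylock, the try functions return nonzero on success, which is the opposite of down_trylock.

```c
/* Hypothetical single-threaded model of a reader/writer semaphore's state. */
struct demo_rwsem {
    int readers;   /* number of readers currently inside */
    int writer;    /* 1 while a writer is inside */
};

/* Like down_read_trylock: returns 1 if read access was granted, 0 if not. */
int demo_read_trylock(struct demo_rwsem *s)
{
    if (s->writer)
        return 0;
    s->readers++;
    return 1;
}

/* Like down_write_trylock: a writer needs the semaphore entirely to itself. */
int demo_write_trylock(struct demo_rwsem *s)
{
    if (s->writer || s->readers > 0)
        return 0;
    s->writer = 1;
    return 1;
}

/* Counterparts of up_read and up_write. */
void demo_up_read(struct demo_rwsem *s)  { s->readers--; }
void demo_up_write(struct demo_rwsem *s) { s->writer = 0; }
```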
7. Completion -- like a pthread condition variable. init_completion/DECLARE_COMPLETION/wait_for_completion/complete/complete_all
8. Spinlock -- doesn't sleep; it keeps retrying in a tight loop. spin_lock_init/spin_lock/spin_unlock
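Once kernel code holds a spinlock, preemption is disabled. The reason, as described earlier: if the thread holding the spinlock were preempted, other threads waiting for it could wait a long time, or forever if the preempted thread never gets to run again. This differs from a semaphore: a thread that can't get a semaphore goes to sleep, whereas a thread that can't get a spinlock polls in a tight loop. That burns a lot of CPU and also makes it much less likely that the thread holding the spinlock gets the CPU.
So the kernel requires that a thread holding a spinlock must not perform any operation that might sleep. But many operations can sleep, e.g. reading from or writing to user space: the user-space memory may have been swapped to disk, and the thread can sleep until the disk I/O finishes. kmalloc can also sleep, because it waits when memory is short.
So code that runs after taking a spinlock has to be written very carefully.
Another common deadlock scenario: our thread takes a spinlock and starts running, and then an interrupt arrives. The interrupt handler also needs this spinlock, finds it unavailable, and spins. Our thread never gets to run, the spinlock is never released, and the system deadlocks. Hence the second rule for spinlocks: disable interrupts while holding the spinlock if an interrupt handler uses the same spinlock.
Third rule: keep the code that runs under the spinlock as short as possible, then release it.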
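The tight loop itself is easy to show in user space with C11 atomics. A minimal sketch with hypothetical names: unlike the kernel's spin_lock, it does not disable preemption or interrupts, so it only illustrates the spinning, not the rules above.

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPIN_LOCK_UNLOCKED { ATOMIC_FLAG_INIT }

/* Returns 1 if the lock was acquired, 0 if someone else holds it. */
int demo_spin_trylock(demo_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-waits: the caller burns CPU until the holder releases the lock. */
void demo_spin_lock(demo_spinlock_t *l)
{
    while (!demo_spin_trylock(l))
        ;   /* spin; the critical section on the other side must be short */
}

/* Releases the lock so that one spinning caller can get in. */
void demo_spin_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

If the holder went to sleep between demo_spin_lock and demo_spin_unlock, every other caller would sit in that while loop for the whole time, which is why sleeping under a spinlock is forbidden.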
9. Atomic operations. atomic_t type/atomic_add/atomic_inc/atomic_dec...
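A user-space analogue of these calls, built on C11 atomics. The demo_ names are hypothetical and only mirror the argument order of the kernel functions; the kernel's atomic_t is implemented per architecture, not like this.

```c
#include <stdatomic.h>

typedef struct {
    atomic_int counter;
} demo_atomic_t;

/* Counterparts of atomic_set, atomic_read, atomic_add and atomic_inc. */
void demo_atomic_set(demo_atomic_t *v, int i) { atomic_store(&v->counter, i); }
int  demo_atomic_read(demo_atomic_t *v)       { return atomic_load(&v->counter); }
void demo_atomic_add(int i, demo_atomic_t *v) { atomic_fetch_add(&v->counter, i); }
void demo_atomic_inc(demo_atomic_t *v)        { atomic_fetch_add(&v->counter, 1); }

/* Like atomic_dec_and_test: decrement, then report whether the result is 0. */
int demo_atomic_dec_and_test(demo_atomic_t *v)
{
    /* atomic_fetch_sub returns the value before the subtraction. */
    return atomic_fetch_sub(&v->counter, 1) == 1;
}
```

The dec-and-test form is the usual way to drop a reference count: only the caller that sees a nonzero return frees the object.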
10. Read-Copy-Update operation.
==================================================================================================
Chapter 6: Advanced Char Driver Operations
1.