The kdb Kernel Debugger

Many readers may be wondering why the kernel does not have any more advanced debugging features built into it. The answer, quite simply, is that Linus does not believe in interactive debuggers. He fears that they lead to poor fixes: ones that patch up symptoms rather than addressing the real cause of a problem. Thus, there are no built-in debuggers.

Other kernel developers, however, see an occasional use for interactive debugging tools. One such tool is the kdb built-in kernel debugger, available as an unofficial patch from oss.sgi.com. To use kdb, you must obtain the patch (be sure to get a version that matches your kernel version), apply it, and rebuild and reinstall the kernel. Note that, as of this writing, kdb works only on IA-32 (x86) systems (though a version for the IA-64 existed for a while in the mainline kernel source before being removed).

Once you are running a kdb-enabled kernel, there are several ways to enter the debugger. Pressing the Pause (or Break) key on the console starts up the debugger. kdb also starts up when a kernel oops happens or when a breakpoint is hit. In any case, you see a message that looks something like this:

    Entering kdb (0xc0347b80) on processor 0 due to Keyboard Entry
    [0]kdb>

Note that just about everything the kernel does stops when kdb is running. Nothing else should be running on a system where you invoke kdb; in particular, networking should be turned off unless, of course, you are debugging a network driver. It is generally a good idea to boot the system in single-user mode if you will be using kdb.

As an example, consider a quick scull debugging session. Assuming that the driver is already loaded, we can tell kdb to set a breakpoint in scull_read as follows:

    [0]kdb> bp scull_read
    Instruction(i) BP #0 at 0xd087c5dc (scull_read) is enabled globally adjust 1
    [0]kdb> go

The bp command tells kdb to stop the next time the kernel enters scull_read. You then type go to continue execution. After putting something into one of the scull devices, we can attempt to read it by running cat under a shell on another terminal, yielding the following:

    Instruction(i) breakpoint #0 at 0xd087c5dc (adjusted)
    0xd087c5dc scull_read:          int3
    Entering kdb (current=0xcf09f890, pid 1575) on processor 0 due to Breakpoint @ 0xd087c5dc
    [0]kdb>

We are now positioned at the beginning of scull_read. To see how we got there, we can get a stack trace:

    [0]kdb> bt
        ESP        EIP        Function (args)
    0xcdbddf74 0xd087c5dc [scull]scull_read
    0xcdbddf78 0xc0150718 vfs_read+0xb8
    0xcdbddfa4 0xc01509c2 sys_read+0x42
    0xcdbddfc4 0xc0103fcf syscall_call+0x7
    [0]kdb>

kdb attempts to print out the arguments to every function in the call trace. It gets confused, however, by optimization tricks used by the compiler. Therefore, it fails to print the arguments to scull_read.
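Each caller in the trace is shown as a symbol plus a hexadecimal offset, meaning the address in the EIP column lies that many bytes past the start of the named function. As a small illustration, the following Python sketch recovers the start address of each caller from the lines of the trace. The helper name function_starts is hypothetical, and the sketch only does the arithmetic on the printed text; it does not talk to kdb.

```python
# Sketch: recover each function's start address from a "symbol+offset" entry.
# The three lines are copied from the bt output shown in the text.

BT_LINES = """\
0xcdbddf78 0xc0150718 vfs_read+0xb8
0xcdbddfa4 0xc01509c2 sys_read+0x42
0xcdbddfc4 0xc0103fcf syscall_call+0x7
"""


def function_starts(text):
    """Map symbol name -> start address (EIP minus the printed offset)."""
    starts = {}
    for line in text.splitlines():
        _esp, eip, location = line.split()[:3]
        symbol, _, offset = location.partition("+")
        starts[symbol] = int(eip, 16) - int(offset, 16)
    return starts


starts = function_starts(BT_LINES)
for symbol, addr in starts.items():
    print(f"{symbol:14} starts at 0x{addr:08x}")
```

The same subtraction is what lets you match an address in a trace against a symbol table or a disassembly listing.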

Time to look at some data. The mds command displays memory contents; we can query the value of the scull_devices pointer with a command such as:

    [0]kdb> mds scull_devices 1
    0xd0880de8 cf36ac00    ....

Here we asked for one (4-byte) word of data starting at the location of scull_devices. The answer tells us that the scull_devices pointer is stored at address 0xd0880de8 and that the device array it points to, and therefore the first device structure, is at 0xcf36ac00. To look at that device structure, we need to use that address:

    [0]kdb> mds cf36ac00
    0xcf36ac00 ce137dbc ....
    0xcf36ac04 00000fa0 ....
    0xcf36ac08 000003e8 ....
    0xcf36ac0c 0000009b ....
    0xcf36ac10 00000000 ....
    0xcf36ac14 00000001 ....
    0xcf36ac18 00000000 ....
    0xcf36ac1c 00000001 ....

The eight lines here correspond to the beginning part of the scull_dev structure. Therefore, we see that the memory for the first device is allocated at 0xce137dbc, the quantum is 4000 (hex fa0), the quantum set size is 1000 (hex 3e8), and there are currently 155 (hex 9b) bytes stored in the device.
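To make the mapping between the dump and the structure concrete, here is a minimal Python sketch that pairs the first four dumped words with the leading fields of scull_dev. The field names (data, quantum, qset, size) follow the scull_dev declaration in the scull sources; the sketch assumes that each of these fields occupies 4 bytes, as on IA-32, and the helper names parse_mds and decode_scull_dev are hypothetical.

```python
# Sketch: map the words printed by "mds cf36ac00" onto the leading
# fields of struct scull_dev (assumes 4-byte pointers, ints and longs).

MDS_OUTPUT = """\
0xcf36ac00 ce137dbc ....
0xcf36ac04 00000fa0 ....
0xcf36ac08 000003e8 ....
0xcf36ac0c 0000009b ....
"""

# Field order assumed from the scull_dev declaration in the scull sources.
FIELDS = ["data", "quantum", "qset", "size"]


def parse_mds(text):
    """Return a list of (address, word) pairs from mds-style output."""
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        pairs.append((int(parts[0], 16), int(parts[1], 16)))
    return pairs


def decode_scull_dev(text):
    """Pair each dumped word with the field that lives at that offset."""
    words = parse_mds(text)
    base = words[0][0]
    decoded = {}
    for name, (addr, value) in zip(FIELDS, words):
        decoded[name] = {"offset": addr - base, "value": value}
    return decoded


dev = decode_scull_dev(MDS_OUTPUT)
for name, info in dev.items():
    print(f"{name:8} +0x{info['offset']:02x}  0x{info['value']:08x} ({info['value']})")
```

Reading the dump this way also shows that the size field sits at offset 0xc from the start of the structure, which is the address to use when modifying it.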

kdb can change data as well. Suppose we wanted to trim some of the data from the device:

    [0]kdb> mm cf36ac0c 0x50
    0xcf36ac0c = 0x50

A subsequent cat on the device will now return less data than before.
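The reason is that the word being overwritten is the size field, which sits at offset 0xc from the start of the structure at 0xcf36ac00 shown in the dump, and a read method does not return data past that size. The following Python sketch models only that bound, under the assumption of the 4-byte field layout seen in the dump; bytes_readable and total_read are hypothetical helpers, not part of scull, and the quantum traversal done by the real driver is left out.

```python
# Sketch: why writing 0x50 to the size field shortens what cat sees.
# Assumes the IA-32 layout shown in the dump: size is the fourth 4-byte field.

DEV_BASE = 0xCF36AC00      # address of the first scull_dev, from "mds scull_devices 1"
SIZE_OFFSET = 3 * 4        # data, quantum and qset come before size

size_addr = DEV_BASE + SIZE_OFFSET


def bytes_readable(dev_size, f_pos, count):
    """Model of a read method's size check: never read past dev_size."""
    if f_pos >= dev_size:
        return 0
    return min(count, dev_size - f_pos)


def total_read(dev_size, chunk=4096):
    """Hypothetical reader loop: keep reading until 0 bytes come back (EOF)."""
    pos = 0
    while True:
        n = bytes_readable(dev_size, pos, chunk)
        if n == 0:
            return pos
        pos += n


before = total_read(0x9B)   # size before the mm command
after = total_read(0x50)    # size after the mm command
print(hex(size_addr), before, after)
```

In this model the reader sees 155 bytes before the change and 80 bytes after it; the stored data itself is untouched, only the recorded length shrinks.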

kdb has a number of other capabilities, including single-stepping (by instructions, not lines of C source code), setting breakpoints on data access, disassembling code, stepping through linked lists, accessing register data, and more. After you have applied the kdb patch, a full set of manual pages can be found in the Documentation/kdb directory in your kernel source tree.
