Chapter 4. Debugging Techniques

Kernel programming brings its own, unique debugging challenges. Kernel code cannot be easily executed under a debugger, nor can it be easily traced, because it is a set of functionalities not related to a specific process. Kernel code errors can also be exceedingly hard to reproduce and can bring down the entire system with them, thus destroying much of the evidence that could be used to track them down.

This chapter introduces techniques you can use to monitor kernel code and trace errors under such trying circumstances.

Debugging Support in the Kernel

In Chapter 2, we recommended that you build and install your own kernel, rather than running the stock kernel that comes with your distribution. One of the strongest reasons for running your own kernel is that the kernel developers have built several debugging features into the kernel itself. These features can create extra output and slow performance, so they tend not to be enabled in production kernels from distributors. As a kernel developer, however, you have different priorities and will gladly accept the (minimal) overhead of the extra kernel debugging support.

Here, we list the configuration options that should be enabled for kernels used for development. Except where specified otherwise, all of these options are found under the "kernel hacking" menu in whatever kernel configuration tool you prefer. Note that some of these options are not supported by all architectures.

CONFIG_DEBUG_KERNEL

This option just makes other debugging options available; it should be turned on but does not, by itself, enable any features.

CONFIG_DEBUG_SLAB

This crucial option turns on several types of checks in the kernel memory allocation functions; with these checks enabled, it is possible to detect a number of memory overrun and missing initialization errors. Each byte of allocated memory is set to 0xa5 before being handed to the caller and then set to 0x6b when it is freed. If you ever see either of those "poison" patterns repeating in output from your driver (or often in an oops listing), you'll know exactly what sort of error to look for. When debugging is enabled, the kernel also places special guard values before and after every allocated memory object; if those values ever get changed, the kernel knows that somebody has overrun a memory allocation, and it complains loudly. Various checks for more obscure errors are enabled as well.
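The poisoning and guard-value idea can be illustrated with a small user-space sketch. This is not the kernel's slab code: the names dbg_alloc, dbg_check, and dbg_free, the guard byte 0x5c, and the memory layout are all assumptions made for illustration. Only the 0xa5 and 0x6b poison bytes come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define POISON_ALLOC 0xa5   /* pattern written when memory is handed out */
#define POISON_FREE  0x6b   /* pattern written when memory is freed */
#define GUARD_BYTE   0x5c   /* hypothetical guard value, chosen for this sketch */
#define GUARD_LEN    8

/* Hypothetical header placed in front of every payload. */
struct dbg_hdr {
    size_t size;
    unsigned char guard[GUARD_LEN];
};

/* Allocate size bytes, poison them, and surround them with guards. */
void *dbg_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(struct dbg_hdr) + size + GUARD_LEN);
    struct dbg_hdr *hdr = (struct dbg_hdr *)raw;

    if (raw == NULL)
        return NULL;
    hdr->size = size;
    memset(hdr->guard, GUARD_BYTE, GUARD_LEN);
    memset(raw + sizeof(*hdr), POISON_ALLOC, size);
    memset(raw + sizeof(*hdr) + size, GUARD_BYTE, GUARD_LEN);
    return raw + sizeof(*hdr);
}

/* Return 1 if both guard areas are intact, 0 if something overran them. */
int dbg_check(const void *p)
{
    const unsigned char *payload = p;
    const struct dbg_hdr *hdr;
    size_t i;

    assert(p != NULL);
    hdr = (const struct dbg_hdr *)(payload - sizeof(struct dbg_hdr));
    for (i = 0; i < GUARD_LEN; i++) {
        if (hdr->guard[i] != GUARD_BYTE ||
            payload[hdr->size + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Check the guards, poison the payload, release it; returns dbg_check(). */
int dbg_free(void *p)
{
    unsigned char *payload = p;
    struct dbg_hdr *hdr;
    int intact;

    assert(p != NULL);
    hdr = (struct dbg_hdr *)(payload - sizeof(struct dbg_hdr));
    intact = dbg_check(p);
    memset(payload, POISON_FREE, hdr->size);
    free(hdr);
    return intact;
}
```

A caller that reads a freshly allocated buffer without initializing it sees a run of 0xa5 bytes, and a caller that writes one byte past the end makes dbg_check return 0. Both symptoms mirror what the kernel option is described as reporting.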

CONFIG_DEBUG_PAGEALLOC

Full pages are removed from the kernel address space when freed. This option can slow things down significantly, but it can also quickly point out certain kinds of memory corruption errors.

CONFIG_DEBUG_SPINLOCK

With this option enabled, the kernel catches operations on uninitialized spinlocks and various other errors (such as unlocking a lock twice).
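The kind of check involved can be sketched in user space with a single-threaded model. This is not the kernel's spinlock implementation and provides no real mutual exclusion. The dbg_lock type, its functions, the error codes, and the magic value are hypothetical names introduced here to show how an "initialized" marker and a state flag make both errors detectable.

```c
#include <assert.h>

/* Illustrative marker stored by dbg_lock_init(); not taken from the kernel. */
#define DBG_LOCK_MAGIC 0x10c4ed01u

enum dbg_lock_err {
    DBG_OK = 0,
    DBG_UNINITIALIZED,   /* lock used before dbg_lock_init() */
    DBG_ALREADY_LOCKED,  /* second acquire without a release */
    DBG_NOT_LOCKED       /* release of a lock that is not held */
};

struct dbg_lock {
    unsigned int magic;  /* equals DBG_LOCK_MAGIC once initialized */
    int locked;          /* 1 while held, 0 otherwise */
};

void dbg_lock_init(struct dbg_lock *l)
{
    assert(l != 0);
    l->magic = DBG_LOCK_MAGIC;
    l->locked = 0;
}

enum dbg_lock_err dbg_lock_acquire(struct dbg_lock *l)
{
    if (l->magic != DBG_LOCK_MAGIC)
        return DBG_UNINITIALIZED;
    if (l->locked)
        return DBG_ALREADY_LOCKED;
    l->locked = 1;
    return DBG_OK;
}

enum dbg_lock_err dbg_lock_release(struct dbg_lock *l)
{
    if (l->magic != DBG_LOCK_MAGIC)
        return DBG_UNINITIALIZED;
    if (!l->locked)
        return DBG_NOT_LOCKED;   /* e.g. unlocking twice */
    l->locked = 0;
    return DBG_OK;
}
```

Releasing the same lock twice returns DBG_NOT_LOCKED on the second call, and any operation on a zero-filled structure returns DBG_UNINITIALIZED. A real debugging build would report such conditions through the kernel log instead of returning a code.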

CONFIG_DEBUG_SPINLOCK_SLEEP

This option enables a check for attempts to sleep while holding a spinlock. In fact, it complains if you call a function that could potentially sleep, even if the call in question would not sleep.

CONFIG_INIT_DEBUG

Items marked with __init (or __initdata) are discarded after system initialization or module load time. This option enables checks for code that attempts to access initialization-time memory after initialization is complete.

CONFIG_DEBUG_INFO

This option causes the kernel to be built with full debugging information included. You'll need that information if you want to debug the kernel with gdb. You may also want to enable CONFIG_FRAME_POINTER if you plan to use gdb.

CONFIG_MAGIC_SYSRQ

This option enables the "magic SysRq" key. We look at this key in Section 4.5.2 later in this chapter.

CONFIG_DEBUG_STACKOVERFLOW
CONFIG_DEBUG_STACK_USAGE

These options can help track down kernel stack overflows. A sure sign of a stack overflow is an oops listing without any sort of reasonable back trace. The first option adds explicit overflow checks to the kernel; the second causes the kernel to monitor stack usage and make some statistics available via the magic SysRq key.

CONFIG_KALLSYMS

This option (under "General setup/Standard features") causes kernel symbol information to be built into the kernel; it is enabled by default. The symbol information is used in debugging contexts; without it, an oops listing can give you a kernel traceback only in hexadecimal, which is not very useful.

CONFIG_IKCONFIG
CONFIG_IKCONFIG_PROC

These options (found in the "General setup" menu) cause the full kernel configuration state to be built into the kernel and to be made available via /proc. Most kernel developers know which configuration they used and do not need these options (which make the kernel bigger). They can be useful, though, if you are trying to debug a problem in a kernel built by somebody else.

CONFIG_ACPI_DEBUG

This option, found under "Power management/ACPI," turns on verbose ACPI (Advanced Configuration and Power Interface) debugging information, which can be useful if you suspect a problem related to ACPI.

CONFIG_DEBUG_DRIVER

This option, found under "Device drivers," turns on debugging information in the driver core, which can be useful for tracking down problems in the low-level support code. We'll look at the driver core in Chapter 14.

CONFIG_SCSI_CONSTANTS

This option, found under "Device drivers/SCSI device support," builds in information for verbose SCSI error messages. If you are working on a SCSI driver, you probably want this option.

CONFIG_INPUT_EVBUG

This option (under "Device drivers/Input device support") turns on verbose logging of input events. If you are working on a driver for an input device, this option may be helpful. Be aware of the security implications of this option, however: it logs everything you type, including your passwords.

CONFIG_PROFILING

This option is found under "Profiling support." Profiling is normally used for system performance tuning, but it can also be useful for tracking down some kernel hangs and related problems.

We will revisit some of the above options as we look at various ways of tracking down kernel problems. But first, we will look at the classic debugging technique: print statements.

Debugging by Printing

The most common debugging technique is monitoring, which in applications programming is done by calling printf at suitable points. When you are debugging kernel code, you can accomplish the same goal with printk.
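As a minimal user-space sketch of this style of monitoring, the helper below formats a trace line that records where execution currently is. The names trace_format and TRACE are hypothetical and exist only for this illustration. In kernel code the same spot would hold a printk call in place of the standard I/O functions used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "file:line: msg\n" into buf. Returns the number of characters
 * the full line needs (as snprintf does), so a caller can detect truncation.
 */
int trace_format(char *buf, size_t len, const char *file, int line,
                 const char *msg)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s:%d: %s\n", file, line, msg);
}

/* Emit a trace line for the current source position on stderr. */
#define TRACE(msg)                                                      \
    do {                                                                \
        char trace_buf_[256];                                           \
        trace_format(trace_buf_, sizeof(trace_buf_),                    \
                     __FILE__, __LINE__, (msg));                        \
        fputs(trace_buf_, stderr);                                      \
    } while (0)
```

Placing TRACE("entering read path") at the top of a function, and again just before each return, is usually enough to see which path a failing call took.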
