How to store and analyze Kernel Crash Dumps



1 Configuring and Analyzing Kernel Crash Dumps

Did you ever want to investigate a kernel crash on your target board but had no log? You had to reproduce it with a trace port connected. Even when you got the oops log and other output, it was not always enough to fix the problem. A crash dump would probably have helped you. This article shows how to set up a Linux system to capture kernel crash dumps. Even if you are no kernel hacker, the last dmesg output of the system can help you locate the problem or get it fixed by someone else.

2 What are Kernel Crash Dumps?

Kernel crash dumps are a way to investigate kernel problems. Even non-experts can use them to collect all the available information about a problem. This allows the issue to be investigated later by handing the crash dump to your Linux distributor or to a Linux kernel expert. It often makes it unnecessary to reproduce the problem, since all the necessary information is already contained in the crash dump. A crash dump is a complete memory image of the system at the time of the crash, comparable to a core dump of a userspace program.


3 How do Kernel Crash Dumps on Linux Work?

On embedded Linux, U-Boot usually serves as the crash kernel and dumps the memory of the crashed kernel directly. When a crash occurs (other than a watchdog timeout), the function crash_kexec is called. crash_kexec in turn calls the platform-specific software reset function machine_crash_swreset.

The system is then brought up again by the software reset. In the U-Boot phase, the reset reason is checked. If it is a software reset triggered by the kernel crash procedure, the contents of main memory are stored to an SD card or other non-volatile storage.


U-Boot runs in a fixed memory area, which is also the usable system memory of the dump system. The kernel should reserve a so-called "memory hole" for it, because U-Boot would otherwise corrupt the previous kernel image.


The benefit of using U-Boot as the crash kernel over other crash dump systems is that the dump is taken from a freshly booted, stable environment. Other crash dump tools such as lkcd, netdump or diskdump have, depending on the kernel problem, not always worked reliably. If the kernel crash was caused by, for example, a network driver problem, there was a high possibility that sending the crash dump over the network was no longer possible.


4 Configuration Details

To get the crash dump, some prerequisites have to be met.

4.1 Linux Kernel Configuration

The most important option is CONFIG_KEXEC=y.
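Whether a given kernel was built with this option can be checked against its configuration file. The following is a minimal sketch only; the helper name has_kexec is hypothetical and the file it reads is assumed to be a plain-text kernel .config.

```shell
#!/bin/sh
# has_kexec CONFIG_FILE
# Returns 0 if the given kernel configuration file enables CONFIG_KEXEC,
# non-zero otherwise. The function name is made up for this example.
has_kexec() {
    grep -q '^CONFIG_KEXEC=y$' "$1"
}
```

A possible call is has_kexec .config && echo "kexec enabled". On a running kernel built with CONFIG_IKCONFIG_PROC, the configuration can first be extracted with zcat /proc/config.gz.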

4.2 Boot Parameters

The memory area to be reserved for the dump kernel depends on the processor architecture. It is configured with the kernel parameter crashkernel=size[@offset]. size denotes the size of the memory hole that is later available to the dump system. This memory is no longer available to the running Linux system. offset sets the physical address in main memory where the memory hole is located. The offset is determined by U-Boot. For ns115, crashkernel=1M@0x8fbf8000 is a usual value.
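To verify what was actually passed at boot, the parameter can be picked out of the kernel command line. This is a rough sketch; the function name parse_crashkernel and the sample command lines are assumptions made for illustration.

```shell
#!/bin/sh
# parse_crashkernel CMDLINE
# Prints "size=... offset=..." for the first crashkernel=size[@offset]
# parameter found in CMDLINE. Returns 1 if the parameter is absent.
# The function name is hypothetical.
parse_crashkernel() {
    for arg in $1; do
        case "$arg" in
            crashkernel=*)
                value=${arg#crashkernel=}      # strip the parameter name
                size=${value%%@*}              # part before the '@'
                case "$value" in
                    *@*) offset=${value#*@} ;; # part after the '@'
                    *)   offset=unset ;;       # no offset was given
                esac
                echo "size=$size offset=$offset"
                return 0
                ;;
        esac
    done
    return 1
}
```

For example, parse_crashkernel "console=ttyS0 crashkernel=1M@0x8fbf8000" prints size=1M offset=0x8fbf8000. On a live system the argument could be "$(cat /proc/cmdline)".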


4.3 Manual Triggering of a Dump

For debugging, or in the case of abnormal system behaviour, triggering a dump manually can be useful. The Non-Maskable Interrupt (NMI) is one way to do that. Almost all server-class hardware offers a way to trigger an NMI manually, either via remote management or via a button on the machine. This button is often labeled "debug". For an NMI to trigger the crash dump, the Linux kernel needs to be configured accordingly. The relevant sysctl settings are kernel.unknown_nmi_panic and kernel.panic_on_unrecovered_nmi, which should both be set to 1. To test the dump configuration, a kernel crash can also be triggered from the running system via the shell:

# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger
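Because the second command crashes the kernel immediately, it can be worth wrapping the two steps in a helper that does nothing destructive unless asked explicitly. The sketch below is only an illustration; the function name trigger_crash and its --really flag are made up here.

```shell
#!/bin/sh
# trigger_crash [--really]
# Without --really, only prints the commands that would be run.
# With --really (as root), enables SysRq and crashes the kernel at once.
# Function name and flag are hypothetical.
trigger_crash() {
    if [ "$1" = "--really" ]; then
        sync                                # flush file system buffers first
        echo 1 > /proc/sys/kernel/sysrq     # enable all SysRq functions
        echo c > /proc/sysrq-trigger        # force the kernel crash
    else
        echo "echo 1 > /proc/sys/kernel/sysrq"
        echo "echo c > /proc/sysrq-trigger"
    fi
}
```

Calling trigger_crash with no argument is a dry run; trigger_crash --really performs the crash.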


5 Analyzing the Dump



Assume the kernel crash dump file is cdump_0.elf, the corresponding vmlinux file is named vmlinux, and the crash dump analysis tool "crash" is installed. Enter the following command in the shell:

$ crash vmlinux cdump_0.elf
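Before starting the analysis, it can help to confirm that the file copied off the storage medium really is an ELF image and not a truncated or empty file. The following is a minimal sketch; the helper name is_elf is an assumption, and it checks only the four ELF magic bytes, not the integrity of the whole dump.

```shell
#!/bin/sh
# is_elf FILE
# Returns 0 if FILE starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
# The function name is hypothetical.
is_elf() {
    # Read the first four bytes and render them as a hex string.
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}
```

A possible use is is_elf cdump_0.elf && crash vmlinux cdump_0.elf.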

crash then prints a summary of the dump and presents its crash> prompt.


6 Commands in crash

crash is mainly a wrapper around gdb. Because of this, most gdb commands can also be used in crash. In addition, it provides various commands and macros that are specially tailored for Linux crash dumps. A selection of particularly useful commands follows:

6.1 help

Perhaps the most important command in crash is help. The help function of crash is very extensive and explains every command in detail. Without additional arguments, all crash commands are listed. help followed by a command name provides information about that command. The complete help is displayed with the command help all.


6.2 log

The log command displays the kernel log ring buffer. The result is similar to the output of dmesg on the running system. Example (shortened):



6.3 ps

ps displays the process list, similar to the corresponding command on the running system. Example:

> 1325 1 0 e68d36e0 RU 0.1 820 460 sh

A process that was running at the time of the crash is marked with a >.


6.4 sys config

Use this command to display the configuration file that the crashed kernel was built with.

crash> sys config | head -10
#
# Automatically generated file; DO NOT EDIT.
# Linux/arm 3.4.0 Kernel Configuration
#
CONFIG_ARM=y
CONFIG_SYS_SUPPORTS_APM_EMULATION=y
CONFIG_GENERIC_GPIO=y
# CONFIG_ARCH_USES_GETTIMEOFFSET is not set
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y

6.5 Others

There are further commands for inspecting variables, printing back traces (bt), walking linked lists (list), reading memory (rd), searching memory (search) and more. Please see the corresponding help.


7 Summary

Crash dumps give the sysadmin as well as the kernel hacker a way to gather useful information in the case of a system crash. The time and effort needed to set them up is relatively low and is quickly repaid by the convenience of postmortem analysis.
