How to Store and Analyze Kernel Crash Dumps
Did you ever want to investigate a kernel crash on your target board but had no log to work with? You had to reproduce the crash with a trace port connected. Even when you did get the oops log and other output, it was not always enough to fix the problem. A crash dump would probably have helped. This article shows how to set up a Linux system to capture kernel crash dumps. Even if you are not a kernel hacker, the last dmesg output of the system can help you locate the problem or get it fixed by someone else.
Kernel crash dumps are a way to investigate kernel problems that even non-experts can use to collect all the available information about a problem. The issue can then be investigated later by providing the crash dump to your Linux distributor or to a Linux kernel expert. Often this makes it unnecessary to reproduce the problem, since all the necessary information is already contained in the crash dump. A crash dump is a complete memory image of the system at the time of the crash, comparable to a core dump of a userspace program.
On embedded Linux, U-Boot usually acts as the crash kernel and dumps the kernel memory directly. When a crash occurs for any reason other than a watchdog timeout, the function crash_kexec is called. crash_kexec in turn calls the platform-specific software reset function machine_crash_swreset.
The system then comes back up through the software reset. During the U-Boot phase, the reset reason is checked. If it was a software reset triggered by the kernel crash procedure, the contents of main memory are stored to an SD card or other non-volatile storage.
U-Boot runs in a fixed memory area, which is also the usable system memory of the dump system. The kernel should therefore reserve a so-called "memory hole" for it, because U-Boot would otherwise corrupt the memory image of the previous kernel.
The benefit of using U-Boot as the crash kernel over other crash dump systems is that the dump is taken from a freshly booted, stable environment. Other crash dump tools such as lkcd, netdump, or diskdump did not always work reliably, depending on the kernel problem. If the kernel crash was caused by, for example, a network driver problem, there was a high probability that sending the crash dump over the network was no longer possible.
To get the crash dump, some prerequisites have to be met.
The most important kernel configuration option is CONFIG_KEXEC=y.
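As a minimal sketch of checking this prerequisite, the following shell function tests whether an option is set to y in a kernel configuration file. The name config_enabled is a hypothetical helper introduced here for illustration, and the path to the configuration file is whatever applies to your build.

```shell
# Hypothetical helper: succeed if OPTION is set to "y" in a kernel config file.
# Usage: config_enabled <config-file> <OPTION>
config_enabled() {
    # Match the whole line so that e.g. CONFIG_KEXEC_FILE=y is not counted.
    grep -q "^$2=y\$" "$1"
}
```

Run from a kernel build directory, this could be used as: config_enabled .config CONFIG_KEXEC && echo "kexec support enabled"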
The memory area to be reserved for the dump kernel depends on the processor architecture. It is configured with the kernel parameter crashkernel=size[@offset]. size denotes the size of the memory hole, which is later available to the dump system. This memory is no longer available to the running Linux system. offset sets the physical address in main memory where the memory hole is located. The offset is determined by U-Boot. For ns115, crashkernel=1M@0x8fbf8000 is a usual value.
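To make the size[@offset] form concrete, here is a small sketch that splits such a parameter into its two parts. The function name parse_crashkernel is hypothetical, and the sketch only handles the simple form described above, not any extended syntax a particular kernel may accept.

```shell
# Split a crashkernel=size[@offset] value into its two parts.
# The function body runs in a subshell so its variables do not leak.
parse_crashkernel() (
    value=${1#crashkernel=}          # drop the parameter name
    size=${value%%@*}                # text before the first "@"
    case $value in
        *@*) offset=${value#*@} ;;   # text after the "@"
        *)   offset= ;;              # no offset given
    esac
    printf 'size=%s offset=%s\n' "$size" "$offset"
)

parse_crashkernel "crashkernel=1M@0x8fbf8000"   # prints: size=1M offset=0x8fbf8000
```

On a running system, the parameter actually in effect can be found in /proc/cmdline.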
For debugging, or in the case of abnormal system behaviour, triggering a dump manually can be useful. The Non-Maskable Interrupt (NMI) is one way to do that. Almost all server-class hardware can trigger an NMI manually, either via remote management or via a button on the machine. This button is often labeled "debug". For an NMI to trigger the crash dump, the Linux kernel needs to be configured accordingly. The relevant sysctl settings are kernel.unknown_nmi_panic and kernel.panic_on_unrecovered_nmi, which should be set to 1. To test the crash dump configuration, a kernel crash can also be triggered from the running system via the shell:
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger
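The two NMI-related sysctl settings mentioned above can be kept in a small sysctl fragment. The following is only a sketch: the function name write_nmi_sysctl and any file name passed to it are assumptions, and the keys are only present on architectures that expose them.

```shell
# Hypothetical helper: write the two NMI settings to a sysctl fragment file.
# Usage: write_nmi_sysctl <output-file>
write_nmi_sysctl() {
    cat > "$1" <<'EOF'
kernel.unknown_nmi_panic = 1
kernel.panic_on_unrecovered_nmi = 1
EOF
}
```

As root, the resulting file could then be loaded with sysctl -p followed by the file name.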
Assume the kernel crash dump file is cdump_0.elf, the corresponding vmlinux file is named vmlinux, and the crash dump analysis tool “crash” is installed. Then enter the following command in the shell:
$ crash vmlinux cdump_0.elf
crash then prints its startup banner and a summary of the crashed system, followed by the crash> prompt.
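Since the dump file in this example is an ELF file, a quick sanity check before starting crash can catch a truncated or mis-copied dump. The sketch below tests only for the four ELF magic bytes; the function name is_elf is hypothetical, and passing this check does not prove the dump is complete.

```shell
# Hypothetical helper: succeed if the file starts with the ELF magic bytes
# 0x7f 'E' 'L' 'F'. Runs in a subshell so "magic" does not leak.
is_elf() (
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
)
```

For example: is_elf cdump_0.elf || echo "cdump_0.elf does not look like an ELF file"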
crash is mainly a wrapper around gdb. Because of this, most gdb commands can also be used in crash. In addition, it provides various commands and macros that are specially tailored for Linux crash dumps. A selection of particularly useful commands follows:
Perhaps the most important command in crash is help. The help function of crash is very extensive and explains every command in detail. Without additional arguments, help lists all crash commands. help followed by a command name provides information about that command. The complete help is displayed with the command help all.
The log command displays the kernel's log ring buffer. The result is similar to the output of dmesg on the running system.
ps displays the process list, similar to the corresponding command on the running system. Example:
> 1325 1 0 e68d36e0 RU 0.1 820 460 sh
A process that was running at the time of the crash is marked with a >.
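If the ps output has been saved to a text file, the marked tasks can be picked out with a small filter. This is a sketch only: the function name running_tasks is hypothetical, and it assumes the first column after the marker is the PID and the last column is the command name, as in the example line above.

```shell
# Hypothetical filter: read crash "ps" output on stdin and print the PID and
# command name of every task marked with ">".
running_tasks() {
    # Strip the marker, then print the first and last remaining fields.
    awk '/^>/ { sub(/^> */, ""); print $1, $NF }'
}

printf '%s\n' '> 1325 1 0 e68d36e0 RU 0.1 820 460 sh' | running_tasks   # prints: 1325 sh
```

Command names that contain spaces would be cut to their last word by this filter.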
Use the sys config command to check the kernel configuration that was used:
crash> sys config | head -10
#
# Automatically generated file; DO NOT EDIT.
# Linux/arm 3.4.0 Kernel Configuration
#
CONFIG_ARM=y
CONFIG_SYS_SUPPORTS_APM_EMULATION=y
CONFIG_GENERIC_GPIO=y
# CONFIG_ARCH_USES_GETTIMEOFFSET is not set
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
There are also commands for variables, backtraces, list, rd, search, and others. See the corresponding help for details.
Crash dumps give the sysadmin as well as the kernel hacker the means to gather useful information in the case of a system crash. The time and effort needed to set them up are relatively low and are quickly repaid by the possibility of convenient postmortem analysis.