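kdump is a tool and service that dumps the contents of memory when the system crashes, deadlocks, or hangs.
When developing system software on Linux, a newly written program will sometimes crash the kernel. We then want to examine whatever traces the crash left behind in order to locate the bug, but because the kernel itself is down, the usual debugging commands and tools no longer work and the only option is to power-cycle the machine. kdump is meant for exactly this situation: it lets us track down the cause of a kernel crash.
kdump is a kexec-based kernel crash dump mechanism. A crash kernel (capture kernel) is kept loaded in reserved memory; when the running kernel crashes, kexec switches to the crash kernel, which keeps running and uses the kdump tools to save the memory of the crashed kernel.
In short, kdump captures Linux kernel crashes by saving the crashed kernel's memory and stack information to a specified directory after the crash.
kexec is a Linux kernel-to-kernel boot loader: it boots a second kernel from the context of the first one.
kdump involves two kernels. The first kernel is the one in normal use for development and production work. When the first kernel crashes, kexec automatically switches to the second kernel, the crash kernel. Installing kdump adds a crashkernel= parameter to the kernel command line in the GRUB configuration, which reserves a region of memory at boot for the second kernel; the second kernel's image and its initrd are loaded into that region. After kdump is installed, the GRUB entry looks like this: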
menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-4c54bbf8-7fd6-482e-9939-aa98847d61d1' {
recordfail
load_video
gfxmode $linux_gfx_mode
insmod gzio
insmod part_msdos
insmod ext2
set root='hd0,msdos1'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 4c54bbf8-7fd6-482e-9939-aa98847d61d1
else
search --no-floppy --fs-uuid --set=root 4c54bbf8-7fd6-482e-9939-aa98847d61d1
fi
linux /boot/vmlinuz-4.2.0-27-generic root=UUID=4c54bbf8-7fd6-482e-9939-aa98847d61d1 ro quiet splash crashkernel=384M-:128M crashkernel=384M-:128M $vt_handoff
initrd /boot/initrd.img-4.2.0-27-generic
}
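The only kdump-specific part of the entry above is the crashkernel= parameter on the linux line. As a minimal sketch (the function name crashkernel_args is made up for illustration, and the parsing is a plain whitespace split, not what GRUB or the kernel does internally), the values can be pulled out of such a line like this:

```python
from typing import List


def crashkernel_args(cmdline: str) -> List[str]:
    """Return the value of every crashkernel= token found on a command line."""
    prefix = "crashkernel="
    values = []
    for token in cmdline.split():
        if token.startswith(prefix):
            values.append(token[len(prefix):])
    return values


# The "linux" line from the GRUB entry shown above.
grub_linux_line = (
    "linux /boot/vmlinuz-4.2.0-27-generic "
    "root=UUID=4c54bbf8-7fd6-482e-9939-aa98847d61d1 ro quiet splash "
    "crashkernel=384M-:128M crashkernel=384M-:128M $vt_handoff"
)
print(crashkernel_args(grub_linux_line))
```

On this machine the parameter appears twice with the same value, so the list contains two identical entries.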
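Test environment: ubuntu-14.04.4-desktop-amd64
Installation: apt-get install linux-crashdump (reboot afterwards so that the crashkernel= parameter takes effect)
Download the dbgsym package from http://ddebs.ubuntu.com/pool/main/l/linux/ ; its version must match the first kernel exactly.
After downloading, install it with dpkg -i packet.ddeb
When installation is complete, use the following command to check that it succeeded: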
root@sis-YangTianT4900d-11:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x2d000000
/var/lib/kdump/vmlinuz: symbolic link to `/boot/vmlinuz-4.2.0-27-generic'
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to `/var/lib/kdump/initrd.img-4.2.0-27-generic'
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.2.0-27-generic root=UUID=4c54bbf8-7fd6-482e-9939-aa98847d61d1 ro quiet splash vt.handoff=7 irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
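The line to look for in this output is "current state: ready to kdump". A small sketch of checking it from a script follows; it assumes the output format shown above (other kdump-tools versions may word it differently), and the helper names kdump_state and is_ready are hypothetical.

```python
def kdump_state(show_output: str) -> str:
    """Return the text after 'current state:' in `kdump-config show` output."""
    for line in show_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "current state":
            return value.strip()
    return ""  # no such line found


def is_ready(show_output: str) -> bool:
    """True if the output reports the state string seen in this article."""
    return kdump_state(show_output) == "ready to kdump"


sample = """DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
current state: ready to kdump
"""
print(is_ready(sample))
```

On a real system the text could be obtained with subprocess.run(["kdump-config", "show"], capture_output=True, text=True).stdout, run as root.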
Use cat /proc/cmdline to check the boot parameters:
root@sis-YangTianT4900d-11:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.2.0-27-generic root=UUID=4c54bbf8-7fd6-482e-9939-aa98847d61d1 ro quiet splash crashkernel=384M-:128M crashkernel=384M-:128M vt.handoff=7
(
crashkernel=384M-2G:64M,2G-:128M
The above value means:
1) if the RAM is smaller than 384M, reserve nothing (this is the "rescue" case)
2) if the RAM size is at least 384M and below 2G, reserve 64M
3) if the RAM size is 2G or more, reserve 128M
The crashkernel=384M-:128M value used on this machine has a single open-ended range: reserve 128M whenever the RAM size is 384M or more.
)
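The range rules can be written out as a short sketch. This is a simplified illustration, not the kernel's parser: it only understands decimal numbers with K, M, or G suffixes, and the function names parse_size and reserved_bytes are invented here.

```python
_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Convert '128M', '2G', '4096' and similar into a byte count."""
    text = text.strip()
    if text and text[-1].upper() in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1].upper()]
    return int(text)


def reserved_bytes(spec: str, ram_bytes: int) -> int:
    """Return how much memory a crashkernel= value reserves for a given RAM size."""
    spec = spec.split("@", 1)[0]           # ignore an optional @offset
    if ":" not in spec:
        return parse_size(spec)            # simple form, e.g. crashkernel=128M
    for entry in spec.split(","):
        rng, size = entry.split(":", 1)
        start, _, end = rng.partition("-")
        low = parse_size(start)
        high = parse_size(end) if end else None   # empty end means "no upper limit"
        # A range matches when start <= RAM < end.
        if ram_bytes >= low and (high is None or ram_bytes < high):
            return parse_size(size)
    return 0                               # no range matched: reserve nothing


MiB, GiB = 1 << 20, 1 << 30
for ram in (256 * MiB, 1 * GiB, 8 * GiB):
    print(ram // MiB, "MiB RAM ->",
          reserved_bytes("384M-2G:64M,2G-:128M", ram) // MiB, "MiB reserved")
```

With the value from this machine, reserved_bytes("384M-:128M", 8 * GiB) gives 128 MiB, which matches the single open-ended range described above.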
Use the crash-triggering SysRq function built into Linux to trigger kdump:
echo c > /proc/sysrq-trigger
Linux now crashes and reboots automatically. After the reboot, check /var/crash:
root@sis-YangTianT4900d-11:/var/crash# tree
.
├── 201910211056
│ ├── dmesg.201910211056
│ └── dump.201910211056
└── kexec_cmd
1 directory, 3 files
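When several crashes have accumulated, the newest dump can be picked out by directory name. The sketch below assumes the layout in the listing above (a 12-digit timestamp directory containing dump.<timestamp>); the function name latest_dump is hypothetical.

```python
import os
import re
from typing import Optional

_STAMP = re.compile(r"^\d{12}$")  # e.g. 201910211056


def latest_dump(crash_dir: str = "/var/crash") -> Optional[str]:
    """Return the path of the most recent dump.<stamp> file, or None."""
    if not os.path.isdir(crash_dir):
        return None
    stamps = sorted(
        name for name in os.listdir(crash_dir)
        if _STAMP.match(name) and os.path.isdir(os.path.join(crash_dir, name))
    )
    # Timestamps sort chronologically as strings, so walk them newest first.
    for stamp in reversed(stamps):
        candidate = os.path.join(crash_dir, stamp, "dump." + stamp)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Calling latest_dump() on the machine above would be expected to return /var/crash/201910211056/dump.201910211056; entries such as kexec_cmd are skipped because they are not timestamp directories.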
The directory named after the timestamp contains two files, dmesg.* and dump.*. Use the crash command to analyze the dump file:
root@sis-YangTianT4900d-11:/var/crash# crash /usr/lib/debug/boot/vmlinux-4.2.0-27-generic 201910211056/dump.201910211056
crash 7.0.3
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/boot/vmlinux-4.2.0-27-generic
DUMPFILE: 201910211056/dump.201910211056 [PARTIAL DUMP]
CPUS: 4
DATE: Thu Jan 1 08:00:00 1970
UPTIME: 00:03:10
LOAD AVERAGE: 0.08, 0.13, 0.06
TASKS: 360
NODENAME: sis-YangTianT4900d-11
RELEASE: 4.2.0-27-generic
VERSION: #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016
MACHINE: x86_64 (2998 Mhz)
MEMORY: 7.9 GB
PANIC: "Oops: 0002 [#1] SMP " (check log for details)
PID: 2016
COMMAND: "bash"
TASK: ffff8802bb518000 [THREAD_INFO: ffff8802d3c00000]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash>
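This brings us to the crash command prompt.
About the crash command line: the first argument is the kernel image with debug symbols (vmlinux) installed by the dbgsym package, and it must match the version of the first kernel, that is, the kernel that crashed; the second argument is the dump file.
Use crash -h to view the usage help.
Commonly used commands include: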
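To make the pairing of the two arguments concrete, here is a small sketch that assembles the command from a kernel release string and a dump path. The debug directory /usr/lib/debug/boot is taken from the command used in this article; the function name crash_argv is invented for illustration.

```python
from typing import List


def crash_argv(release: str, dump_path: str,
               debug_dir: str = "/usr/lib/debug/boot") -> List[str]:
    """Build the crash command line: crash <vmlinux with symbols> <dump file>."""
    vmlinux = debug_dir.rstrip("/") + "/vmlinux-" + release
    return ["crash", vmlinux, dump_path]


print(" ".join(crash_argv("4.2.0-27-generic",
                          "201910211056/dump.201910211056")))
```

If the machine rebooted into the same kernel that crashed, os.uname().release supplies the release string; otherwise it has to be the release of the kernel that produced the dump.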
crash> bt
PID: 2016 TASK: ffff8802bb518000 CPU: 1 COMMAND: "bash"
#0 [ffff8802d3c03ab0] machine_kexec at ffffffff81055832
#1 [ffff8802d3c03b00] crash_kexec at ffffffff810ffc43
#2 [ffff8802d3c03bd0] oops_end at ffffffff81017d8d
#3 [ffff8802d3c03c00] no_context at ffffffff81063a8d
#4 [ffff8802d3c03c70] __bad_area_nosemaphore at ffffffff81063e09
#5 [ffff8802d3c03cc0] bad_area_nosemaphore at ffffffff81063f23
#6 [ffff8802d3c03cd0] __do_page_fault at ffffffff81064182
#7 [ffff8802d3c03d40] do_page_fault at ffffffff81064542
#8 [ffff8802d3c03d60] page_fault at ffffffff817be308
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff814a0006 RSP: ffff8802d3c03e18 RFLAGS: 00010292
RAX: 000000000000000f RBX: ffffffff81cb5a20 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff8802dec8e938 RDI: 0000000000000063
RBP: ffff8802d3c03e18 R8: ffffffff81f0a48c R9: 0000000000000030
R10: ffffffff81ef843c R11: 0000000000000395 R12: 0000000000000063
R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff8802d3c03e20] __handle_sysrq at ffffffff814a0809
#10 [ffff8802d3c03e50] write_sysrq_trigger at ffffffff814a0c63
#11 [ffff8802d3c03e70] proc_reg_write at ffffffff8125181d
#12 [ffff8802d3c03ea0] __vfs_write at ffffffff811eae28
#13 [ffff8802d3c03eb0] vfs_write at ffffffff811eb469
#14 [ffff8802d3c03f00] sys_write at ffffffff811ec1d6
#15 [ffff8802d3c03f50] entry_SYSCALL_64_fastpath at ffffffff817bc3b2
RIP: 00007fe8920a83c0 RSP: 00007ffdd31afb98 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe8920a83c0
RDX: 0000000000000002 RSI: 0000000001180008 RDI: 0000000000000001
RBP: 00007fe89237c400 R8: 000000000000000a R9: 00007fe8929b8740
R10: 00007fe89237a6a0 R11: 0000000000000246 R12: 0000000001309dc8
R13: 0000000000000001 R14: 0000000000000000 R15: 00007ffdd31afb48
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
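The bt output lists frames from the innermost call outward, and the "exception RIP" line names the instruction where the fault happened. Here that is sysrq_handle_crash+22, which is expected since the crash was triggered on purpose through /proc/sysrq-trigger. As a rough sketch, assuming the line format shown above (the function name parse_bt is hypothetical), the frames can be extracted for further processing:

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as: "#8 [ffff8802d3c03d60] page_fault at ffffffff817be308"
_FRAME = re.compile(r"^\s*#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")
# Matches: "[exception RIP: sysrq_handle_crash+22]"
_EXC = re.compile(r"\[exception RIP:\s*(\S+?)\]")


def parse_bt(text: str) -> Tuple[List[Tuple[int, str, int]], Optional[str]]:
    """Return ([(frame number, function, address)], exception RIP or None)."""
    frames = []
    exception_rip = None
    for line in text.splitlines():
        m = _FRAME.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(3), int(m.group(4), 16)))
            continue
        m = _EXC.search(line)
        if m:
            exception_rip = m.group(1)
    return frames, exception_rip


sample_bt = """#8 [ffff8802d3c03d60] page_fault at ffffffff817be308
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff814a0006 RSP: ffff8802d3c03e18 RFLAGS: 00010292
#9 [ffff8802d3c03e20] __handle_sysrq at ffffffff814a0809
"""
frames, rip = parse_bt(sample_bt)
print(rip, [name for _, name, _ in frames])
```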
crash> log
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 4.2.0-27-generic (buildd@lcy01-23) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 (Ubuntu 4.2.0-27.32~14.04.1-generic 4.2.8-ckt1)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.2.0-27-generic root=UUID=4c54bbf8-7fd6-482e-9939-aa98847d61d1 ro quiet splash crashkernel=384M-:128M crashkernel=384M-:128M vt.handoff=7
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[ 0.000000] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040
[ 0.000000] x86/fpu: xstate_offset[4]: 0400, xstate_sizes[4]: 0040
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x08: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x10: 'MPX CSR'
[ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 0x440 bytes, using 'standard' format.
[ 0.000000] x86/fpu: Using 'eager' FPU context switches.
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009c3ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009c400-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000014acefff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000014acf000-0x0000000014acffff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000014ad0000-0x0000000014ad0fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000014ad1000-0x000000001b338fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000001b339000-0x000000001c2e3fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000001c2e4000-0x000000001c36bfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000001c36c000-0x000000001cb45fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000001cb46000-0x000000001cffdfff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000001cffe000-0x000000001cffefff] usable
[ 0.000000] BIOS-e820: [mem 0x000000001cfff000-0x000000001fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x00000002deffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 3.0 present.
[ 0.000000] DMI: LENOVO 90GKCTO1WW/3102, BIOS M16KT52A 09/10/2018
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] e820: last_pfn = 0x2df000 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: write-back
...
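The BIOS-e820 lines in the log describe the physical memory map, which is the RAM that the crashkernel= ranges are evaluated against. As an illustrative sketch (it assumes the log line format shown above, and the function name usable_bytes is made up), the usable regions can be summed like this:

```python
import re

_E820 = re.compile(r"BIOS-e820: \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (.+)$")


def usable_bytes(log_text: str) -> int:
    """Sum the sizes of all e820 regions marked 'usable' in a kernel log."""
    total = 0
    for line in log_text.splitlines():
        m = _E820.search(line)
        if m and m.group(3).strip() == "usable":
            start = int(m.group(1), 16)
            end = int(m.group(2), 16)   # end address is inclusive
            total += end - start + 1
    return total


sample_log = """[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009c3ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009c400-0x000000000009ffff] reserved
"""
print(usable_bytes(sample_log))
```

Feeding it the full output of the log command gives a figure that can be compared with the MEMORY value crash prints at startup.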