Capturing kernel crashes with kdump

kdump is a tool and service that dumps the system's memory state when the system crashes, deadlocks or hangs.

Background

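While developing system software on Linux, a newly written program can crash the kernel. We then want to find the traces the crash left behind to locate the bug, but the kernel is already dead: the commands and tools normally used for debugging no longer work, and the only option is to power-cycle the machine. In this situation kdump can be used to capture the crash so that it can be analyzed after the reboot.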

Tool overview

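kdump is a kexec-based mechanism for dumping the memory of a crashed kernel. A crash kernel (crashKernel) is kept loaded in reserved memory; when the running kernel crashes, kexec switches execution to the crash kernel, and the kdump tools running there save the crashed kernel's memory, including its stacks, as a dump file.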

kdump

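kdump captures Linux kernel crashes: after the kernel has crashed, it saves the crashed kernel's memory and stack information to a specified directory (KDUMP_COREDIR, /var/crash on Ubuntu).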

kexec

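kexec is a kernel-to-kernel boot loader for Linux: it boots a second kernel directly from the context of the first kernel, without going back through the BIOS/firmware and the boot loader.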
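Whether a crash kernel has actually been loaded can be checked from user space: kernels built with kexec support expose /sys/kernel/kexec_crash_loaded (1 once a crash kernel has been loaded with kexec -p) and /sys/kernel/kexec_crash_size (the size of the reserved region in bytes). Below is a minimal Python sketch that interprets these two files; the function names are hypothetical, and the sysfs directory is a parameter only so the parsing can be tried without a real /sys:

```python
from pathlib import Path


def parse_crash_status(loaded_text: str, size_text: str) -> dict:
    """Interpret the contents of kexec_crash_loaded and kexec_crash_size."""
    return {
        # "1" means a crash kernel has been loaded with kexec -p
        "loaded": loaded_text.strip() == "1",
        # size of the crashkernel= reservation, converted from bytes to MiB
        "reserved_mib": int(size_text.strip()) // (1024 * 1024),
    }


def crash_kernel_status(sysfs_root: str = "/sys/kernel") -> dict:
    """Read both sysfs files from sysfs_root and interpret them."""
    root = Path(sysfs_root)
    return parse_crash_status(
        (root / "kexec_crash_loaded").read_text(),
        (root / "kexec_crash_size").read_text(),
    )
```

On an armed machine with 128M reserved, crash_kernel_status() would be expected to report loaded=True and reserved_mib=128.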

How kdump works

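kdump involves two kernels. The first kernel is the one normally running, used for development and production work; the second kernel is the crash kernel (crashKernel). When the first kernel crashes, the kernel's kexec mechanism automatically jumps into the second kernel, which was loaded in advance with kexec -p. The memory for the second kernel is reserved at boot time through the crashkernel= parameter that kdump adds to the kernel command line in GRUB; the second kernel image and its initrd are kept in this reserved region. After kdump is installed, the GRUB menu entry looks like this: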

menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-4c54bbf8-7fd6-482e-9939-aa98847d61d1' {
        recordfail
        load_video
        gfxmode $linux_gfx_mode
        insmod gzio
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  4c54bbf8-7fd6-482e-9939-aa98847d61d1
        else
          search --no-floppy --fs-uuid --set=root 4c54bbf8-7fd6-482e-9939-aa98847d61d1
        fi
        linux   /boot/vmlinuz-4.2.0-27-generic root=UUID=4c54bbf8-7fd6-482e-9939-aa98847d61d1 ro  quiet splash crashkernel=384M-:128M crashkernel=384M-:128M $vt_handoff
        initrd  /boot/initrd.img-4.2.0-27-generic
}
 

Installation and usage

Environment: ubuntu-14.04.4-desktop-amd64

Installation: apt-get install linux-crashdump

Download the dbgsym package (kernel debug symbols) from http://ddebs.ubuntu.com/pool/main/l/linux/. Its version must be exactly the same as that of the first kernel.

Install the downloaded package with dpkg -i packet.ddeb (replace packet.ddeb with the name of the downloaded file).
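Because the dbgsym package must match the first kernel exactly, a small check can compare a downloaded file name against the running kernel release (the string uname -r prints). This sketch assumes the usual Ubuntu naming pattern linux-image-<release>-dbgsym_<version>_<arch>.ddeb; the function name is hypothetical:

```python
import platform


def ddeb_matches_kernel(ddeb_name: str, release: str = "") -> bool:
    """Check that a kernel dbgsym package file name targets the given release.

    Assumes the Ubuntu naming pattern
    linux-image-<release>-dbgsym_<version>_<arch>.ddeb.
    release defaults to the running kernel (the string `uname -r` prints).
    """
    release = release or platform.release()
    prefix = f"linux-image-{release}-dbgsym_"
    return ddeb_name.startswith(prefix) and ddeb_name.endswith(".ddeb")
```

Matching on the full release followed by "-dbgsym_" avoids accepting a package for a neighbouring ABI number by accident.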

When the installation is finished, verify it with the following command:

root@sis-YangTianT4900d-11:~# kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x2d000000
   /var/lib/kdump/vmlinuz: symbolic link to `/boot/vmlinuz-4.2.0-27-generic'
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to `/var/lib/kdump/initrd.img-4.2.0-27-generic'
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.2.0-27-generic root=UUID=4c54bbf8-7fd6-482e-9939-aa98847d61d1 ro quiet splash vt.handoff=7 irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
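The lines that matter in this output are current state (ready to kdump means the crash kernel is loaded), KDUMP_COREDIR (where dumps are written) and KDUMP_SYSCTL: kernel.panic_on_oops=1 turns every oops into a panic, so an oops also ends up in kdump. A hypothetical Python helper that parses the top-level name: value lines of kdump-config show, for example for use in a provisioning script:

```python
def parse_kdump_config(output: str) -> dict:
    """Collect the top-level 'name: value' lines of `kdump-config show`.

    Indented lines (symlink details and the kexec command itself) are
    skipped; headings such as 'kdump initrd:' map to an empty string.
    """
    info = {}
    for line in output.splitlines():
        if not line.strip() or line[0].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


def kdump_ready(output: str) -> bool:
    """True when kdump-config reports that the crash kernel is armed."""
    return parse_kdump_config(output).get("current state") == "ready to kdump"
```

The captured text of kdump-config show (for instance via subprocess.run with capture_output=True, text=True) can be passed straight to kdump_ready.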

Check the kernel boot parameters with cat /proc/cmdline:

root@sis-YangTianT4900d-11:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.2.0-27-generic root=UUID=4c54bbf8-7fd6-482e-9939-aa98847d61d1 ro quiet splash crashkernel=384M-:128M crashkernel=384M-:128M vt.handoff=7
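The command line above contains crashkernel=384M-:128M twice, just like the GRUB entry; since both copies carry the same value, the duplicate does no harm. A small hypothetical helper that pulls the crashkernel= values out of a command-line string such as the contents of /proc/cmdline:

```python
def crashkernel_args(cmdline: str) -> list:
    """Return every crashkernel= value found on a kernel command line.

    A plain whitespace split is enough for command lines like the one above;
    quoted parameter values are not handled.
    """
    prefix = "crashkernel="
    return [tok[len(prefix):] for tok in cmdline.split() if tok.startswith(prefix)]
```

For the command line shown above it returns ['384M-:128M', '384M-:128M'].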

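The crashkernel parameter can take a list of memory ranges, crashkernel=<range1>:<size1>[,<range2>:<size2>,...], where each range is written start-[end] and the reserved size is chosen according to the amount of RAM. For example:

crashkernel=384M-2G:64M,2G-:128M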

The above value means:

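1) If the RAM is smaller than 384M, nothing is reserved (this is the "rescue" case).

2) If the RAM size is between 384M (inclusive) and 2G (exclusive), 64M is reserved.

3) If the RAM size is 2G or larger, 128M is reserved.

The value used on this machine, crashkernel=384M-:128M, therefore reserves 128M whenever the RAM is at least 384M.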
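The range rules above can be written out as a short Python sketch: each entry is start-[end]:size, the start is inclusive, the end is exclusive, an empty end means no upper limit, the first matching range wins, and no match means nothing is reserved. The helper names are hypothetical; the sketch only understands K/M/G suffixes and ignores any @offset suffix.

```python
import re

# Multipliers for the size suffixes this sketch understands.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def to_bytes(size: str) -> int:
    """Convert a size such as '384M' or '2G' to bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", size.strip().upper())
    if m is None:
        raise ValueError(f"unsupported size: {size!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def crashkernel_reservation(spec: str, ram_bytes: int) -> int:
    """Bytes reserved by a range-style spec such as '384M-2G:64M,2G-:128M'."""
    spec = spec.split("@", 1)[0]  # ignore an optional @offset suffix
    for entry in spec.split(","):
        mem_range, size = entry.split(":")
        start, end = mem_range.split("-")
        lower = to_bytes(start)                    # inclusive
        upper = to_bytes(end) if end else None     # exclusive; empty = no limit
        if ram_bytes >= lower and (upper is None or ram_bytes < upper):
            return to_bytes(size)                  # first matching range wins
    return 0                                       # no range matched: reserve nothing
```

For example, crashkernel_reservation("384M-:128M", 8 << 30) returns 134217728, i.e. 128M for an 8 GiB machine.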

 

Testing

Trigger kdump with the crash command built into the kernel's SysRq facility:

echo c > /proc/sysrq-trigger
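Because this command panics the machine immediately, it is worth confirming first that a crash kernel is loaded; otherwise the crash leaves no dump behind. A hedged Python sketch of such a guarded trigger (the function name is hypothetical; run it only as root on a test machine):

```python
from pathlib import Path


def crash_now(loaded_flag: str = "/sys/kernel/kexec_crash_loaded",
              trigger: str = "/proc/sysrq-trigger") -> None:
    """Crash the kernel through SysRq 'c', but only if a crash kernel is loaded.

    WARNING: with the default paths (run as root) this panics the machine
    immediately. The paths are parameters only so the guard can be tried
    against ordinary files.
    """
    if Path(loaded_flag).read_text().strip() != "1":
        # Without a loaded crash kernel the panic would leave no dump behind.
        raise RuntimeError("no crash kernel loaded; run kdump-config show first")
    # Same effect as: echo c > /proc/sysrq-trigger
    Path(trigger).write_text("c")
```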

The kernel crashes, the dump is saved, and the machine reboots automatically. After the reboot, look in /var/crash:

root@sis-YangTianT4900d-11:/var/crash# tree
.
├── 201910211056
│   ├── dmesg.201910211056
│   └── dump.201910211056
└── kexec_cmd

1 directory, 3 files
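After several test crashes /var/crash holds one directory per crash, so it helps to pick the newest one automatically. A minimal sketch, assuming the layout shown above (12-digit YYYYMMDDHHMM directory names holding dmesg.<stamp> and dump.<stamp>); the function name is hypothetical:

```python
import re
from pathlib import Path


def latest_dump(coredir: str = "/var/crash"):
    """Return (dmesg_file, dump_file) from the newest crash directory, or None.

    Assumes the layout shown above: one directory per crash named with a
    12-digit YYYYMMDDHHMM timestamp, holding dmesg.<stamp> and dump.<stamp>.
    Other entries such as kexec_cmd are ignored.
    """
    stamps = [p for p in Path(coredir).iterdir()
              if p.is_dir() and re.fullmatch(r"\d{12}", p.name)]
    if not stamps:
        return None
    # Fixed-width timestamps sort chronologically as plain strings.
    newest = max(stamps, key=lambda p: p.name)
    return newest / f"dmesg.{newest.name}", newest / f"dump.{newest.name}"
```

Comparing the names as strings is enough here because every stamp has the same width, so no date parsing is needed.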


A directory named after the crash time contains two files, dmesg.* and dump.*. Analyze the dump file with the crash command:

root@sis-YangTianT4900d-11:/var/crash# crash  /usr/lib/debug/boot/vmlinux-4.2.0-27-generic 201910211056/dump.201910211056

crash 7.0.3
Copyright (C) 2002-2013  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/boot/vmlinux-4.2.0-27-generic
    DUMPFILE: 201910211056/dump.201910211056  [PARTIAL DUMP]
        CPUS: 4
        DATE: Thu Jan  1 08:00:00 1970
      UPTIME: 00:03:10
LOAD AVERAGE: 0.08, 0.13, 0.06
       TASKS: 360
    NODENAME: sis-YangTianT4900d-11
     RELEASE: 4.2.0-27-generic
     VERSION: #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016
     MACHINE: x86_64  (2998 Mhz)
      MEMORY: 7.9 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 2016
     COMMAND: "bash"
        TASK: ffff8802bb518000  [THREAD_INFO: ffff8802d3c00000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash>

This drops you into the crash command prompt.

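About the crash command: the first argument is the kernel image with debug symbols (vmlinux) installed by the dbgsym package, and it must match the version of the first kernel; the second argument is the dump file produced by the crash.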
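Since the debug vmlinux is found by kernel release, the crash command line can be assembled rather than typed. A hypothetical Python sketch that builds the argument list used above:

```python
import platform
from pathlib import Path


def crash_argv(dump_file: str, release: str = "",
               debug_dir: str = "/usr/lib/debug/boot") -> list:
    """Build the argument list for analysing dump_file with crash.

    The debug vmlinux is chosen by kernel release. release defaults to the
    running kernel (what `uname -r` prints); pass it explicitly when the dump
    came from a different kernel than the one now running.
    """
    release = release or platform.release()
    vmlinux = Path(debug_dir) / f"vmlinux-{release}"
    return ["crash", str(vmlinux), dump_file]
```

Passing the list to subprocess.run from /var/crash starts the same interactive session as the command above. Defaulting to the running kernel is convenient right after the test crash, but after a kernel upgrade the release recorded in the crash header (RELEASE) is the one to pass.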

Run crash -h for usage help.

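Commonly used commands include bt, which shows the kernel stack backtrace of the task that panicked, and log, which shows the kernel message buffer: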

crash> bt
PID: 2016   TASK: ffff8802bb518000  CPU: 1   COMMAND: "bash"
 #0 [ffff8802d3c03ab0] machine_kexec at ffffffff81055832
 #1 [ffff8802d3c03b00] crash_kexec at ffffffff810ffc43
 #2 [ffff8802d3c03bd0] oops_end at ffffffff81017d8d
 #3 [ffff8802d3c03c00] no_context at ffffffff81063a8d
 #4 [ffff8802d3c03c70] __bad_area_nosemaphore at ffffffff81063e09
 #5 [ffff8802d3c03cc0] bad_area_nosemaphore at ffffffff81063f23
 #6 [ffff8802d3c03cd0] __do_page_fault at ffffffff81064182
 #7 [ffff8802d3c03d40] do_page_fault at ffffffff81064542
 #8 [ffff8802d3c03d60] page_fault at ffffffff817be308
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff814a0006  RSP: ffff8802d3c03e18  RFLAGS: 00010292
    RAX: 000000000000000f  RBX: ffffffff81cb5a20  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: ffff8802dec8e938  RDI: 0000000000000063
    RBP: ffff8802d3c03e18   R8: ffffffff81f0a48c   R9: 0000000000000030
    R10: ffffffff81ef843c  R11: 0000000000000395  R12: 0000000000000063
    R13: 0000000000000000  R14: 0000000000000004  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8802d3c03e20] __handle_sysrq at ffffffff814a0809
#10 [ffff8802d3c03e50] write_sysrq_trigger at ffffffff814a0c63
#11 [ffff8802d3c03e70] proc_reg_write at ffffffff8125181d
#12 [ffff8802d3c03ea0] __vfs_write at ffffffff811eae28
#13 [ffff8802d3c03eb0] vfs_write at ffffffff811eb469
#14 [ffff8802d3c03f00] sys_write at ffffffff811ec1d6
#15 [ffff8802d3c03f50] entry_SYSCALL_64_fastpath at ffffffff817bc3b2
    RIP: 00007fe8920a83c0  RSP: 00007ffdd31afb98  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000001  RCX: 00007fe8920a83c0
    RDX: 0000000000000002  RSI: 0000000001180008  RDI: 0000000000000001
    RBP: 00007fe89237c400   R8: 000000000000000a   R9: 00007fe8929b8740
    R10: 00007fe89237a6a0  R11: 0000000000000246  R12: 0000000001309dc8
    R13: 0000000000000001  R14: 0000000000000000  R15: 00007ffdd31afb48
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
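Reading this backtrace from the top: frames #0 to #8 are the kernel's own crash path (page_fault, oops_end, crash_kexec, machine_kexec), the fault itself happened at the exception RIP, sysrq_handle_crash+22, and frames #9 to #15 show how it was reached, namely a write system call on /proc/sysrq-trigger, i.e. the echo c above. When going through many saved dumps, a small hypothetical parser can pull those two pieces out of bt output:

```python
import re

# " #9 [ffff8802d3c03e20] __handle_sysrq at ffffffff814a0809"
FRAME_RE = re.compile(r"#\d+ \[[0-9a-f]+\] (\S+) at [0-9a-f]+")
# "    [exception RIP: sysrq_handle_crash+22]"
EXCEPTION_RE = re.compile(r"\[exception RIP: (\S+?)(?:\+\d+)?\]")


def summarize_bt(bt_text: str) -> dict:
    """Extract frame function names and the faulting function from `bt` output."""
    match = EXCEPTION_RE.search(bt_text)
    return {
        "frames": FRAME_RE.findall(bt_text),  # innermost frame first, as crash prints them
        "exception_rip": match.group(1) if match else None,
    }
```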

crash> log
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.2.0-27-generic (buildd@lcy01-23) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 (Ubuntu 4.2.0-27.32~14.04.1-generic 4.2.8-ckt1)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.2.0-27-generic root=UUID=4c54bbf8-7fd6-482e-9939-aa98847d61d1 ro quiet splash crashkernel=384M-:128M crashkernel=384M-:128M vt.handoff=7
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[    0.000000] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040
[    0.000000] x86/fpu: xstate_offset[4]: 0400, xstate_sizes[4]: 0040
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x08: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x10: 'MPX CSR'
[    0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 0x440 bytes, using 'standard' format.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009c3ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009c400-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000014acefff] usable
[    0.000000] BIOS-e820: [mem 0x0000000014acf000-0x0000000014acffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000014ad0000-0x0000000014ad0fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000014ad1000-0x000000001b338fff] usable
[    0.000000] BIOS-e820: [mem 0x000000001b339000-0x000000001c2e3fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000001c2e4000-0x000000001c36bfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000001c36c000-0x000000001cb45fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000001cb46000-0x000000001cffdfff] reserved
[    0.000000] BIOS-e820: [mem 0x000000001cffe000-0x000000001cffefff] usable
[    0.000000] BIOS-e820: [mem 0x000000001cfff000-0x000000001fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x00000002deffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 3.0 present.
[    0.000000] DMI: LENOVO 90GKCTO1WW/3102, BIOS M16KT52A 09/10/2018
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x2df000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
。。。
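The log output is long (it is cut short above). The same kernel messages are also saved next to the dump as dmesg.<timestamp>, so a simple text filter can jump straight to the crash report. The marker list in this hypothetical sketch is an assumption covering common cases, not an exhaustive list:

```python
import re

# Common first lines of a kernel crash report (illustrative, not exhaustive).
CRASH_MARKERS = re.compile(r"Oops:|BUG:|Kernel panic|Call Trace:")


def crash_lines(log_text: str) -> list:
    """Return the log lines that match one of the crash markers."""
    return [line for line in log_text.splitlines() if CRASH_MARKERS.search(line)]
```

For example, crash_lines(Path("/var/crash/201910211056/dmesg.201910211056").read_text(errors="replace")) (with Path from pathlib) lists the lines around the Oops reported in the crash header.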
