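Analyzing Ubuntu Kernel Kdump Files (by quqi99)

Author: Zhang Hua (张华)  Published: 2014-07-23
Copyright notice: this article may be freely reproduced, provided that the original source, the author information, and this copyright notice are indicated with a hyperlink.

(http://blog.csdn.net/quqi99 )

      When a kernel panic occurs, the Linux kernel prints Oops information. The kernel dump tool kdump saves the current register state, the stack contents, and the complete call trace to a file, which can then be analyzed with crash (a gdb-based tool).
      kexec is a fast-boot mechanism that boots a Linux kernel from the context of an already running kernel without going through the BIOS; it is the key to implementing kdump. The user-space tool kexec-tools first passes the address of the capture kernel to the production kernel, so that when the system crashes the capture kernel can be located and run.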

1, Install linux-crashdump (sudo apt-get install linux-crashdump). This installs three tools: crash, kexec-tools, and makedumpfile. After installation the following grub2 option is generated.

hua@node1:~$ cat /etc/default/grub.d/kexec-tools.cfg 
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=384M-:128M"
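
The crashkernel=384M-:128M value uses the kernel's range:size syntax: on a machine with at least 384M of RAM, 128M is reserved for the capture kernel. The sketch below only illustrates how such a value is selected. The helper names to_mb and crashkernel_size are made up for this example, it is not the kernel's own parser, and it ignores any @offset part.

```shell
#!/bin/sh
# to_mb SIZE: convert a size such as 384M or 2G to megabytes (K, M, G suffixes only).
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *K) echo $(( ${1%K} / 1024 )) ;;
        *)  echo $(( $1 / 1024 / 1024 )) ;;   # plain bytes
    esac
}

# crashkernel_size SPEC RAM_MB: print the size that SPEC reserves on a machine
# with RAM_MB megabytes of RAM; print nothing and return 1 if no range matches.
crashkernel_size() {
    spec=$1
    ram_mb=$2
    old_ifs=$IFS
    IFS=,
    for entry in $spec; do
        range=${entry%%:*}      # e.g. "384M-" or "512M-2G"
        size=${entry#*:}        # e.g. "128M"
        start=${range%%-*}
        end=${range#*-}         # empty means "no upper bound"
        if [ "$ram_mb" -ge "$(to_mb "$start")" ] &&
           { [ -z "$end" ] || [ "$ram_mb" -lt "$(to_mb "$end")" ]; }; then
            IFS=$old_ifs
            echo "${size%%@*}"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# The machine in this article reports 32640MB of System RAM.
crashkernel_size "384M-:128M" 32640
```

With several ranges, for example "512M-2G:64M,2G-:128M", the first range that contains the amount of RAM wins.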

2, Edit the kdump configuration file (/etc/default/kdump-tools): USE_KDUMP=1
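
If you prefer to script this change, something like the following might do. enable_kdump is a hypothetical helper name, and the demo runs against a temporary copy rather than the real file; GNU sed (as shipped on Ubuntu) is assumed for the -i option.

```shell
#!/bin/sh
# enable_kdump FILE: set USE_KDUMP=1 in FILE, appending the line if it is missing.
enable_kdump() {
    conf=${1:-/etc/default/kdump-tools}
    if grep -q '^USE_KDUMP=' "$conf"; then
        sed -i 's/^USE_KDUMP=.*/USE_KDUMP=1/' "$conf"
    else
        echo 'USE_KDUMP=1' >> "$conf"
    fi
}

# Demo on a scratch file so that nothing on the system is modified.
tmp=$(mktemp)
printf 'USE_KDUMP=0\nKDUMP_COREDIR="/var/crash"\n' > "$tmp"
enable_kdump "$tmp"
cat "$tmp"
rm -f "$tmp"
```

On a real machine the function would be run as root against /etc/default/kdump-tools.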

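3, Start kdump with sudo /etc/init.d/kdump-tools start, or reboot the machine. Output like the following indicates success.

hua@node1:~$ cat /var/crash/kexec_cmd   # the capture kernel image is the same as the production one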
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.4.0-9-generic root=UUID=e9127b13-568c-4a69-9bbd-0b00e44f0ad9 ro quiet splash intel_iommu=on pci=assign-busses vt.handoff=7 irqpoll maxcpus=1 nousb" --initrd=/boot/initrd.img-4.4.0-9-generic /boot/vmlinuz-4.4.0-9-generic

hua@node1:~$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-4.4.0-9-generic root=UUID=e9127b13-568c-4a69-9bbd-0b00e44f0ad9 ro quiet splash intel_iommu=on pci=assign-busses crashkernel=384M-:128M vt.handoff=7

hua@node1:~$ sudo dmesg | grep crash
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-9-generic root=UUID=e9127b13-568c-4a69-9bbd-0b00e44f0ad9 ro quiet splash intel_iommu=on pci=assign-busses crashkernel=384M-:128M vt.handoff=7
[    0.000000] Reserving 128MB of memory at 704MB for crashkernel (System RAM: 32640MB)
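
Besides dmesg, the kernel exposes the kdump state under /sys/kernel. The sketch below reads kexec_crash_loaded and kexec_crash_size; kdump_status and its optional ROOT argument are hypothetical, and the prefix exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# kdump_status [ROOT]: report whether a crash (capture) kernel is loaded.
# ROOT is an optional path prefix, empty by default (i.e. the real /sys).
kdump_status() {
    root=${1:-}
    loaded=$(cat "$root/sys/kernel/kexec_crash_loaded" 2>/dev/null)
    size=$(cat "$root/sys/kernel/kexec_crash_size" 2>/dev/null)
    if [ "$loaded" = "1" ]; then
        echo "crash kernel loaded, reserved bytes: ${size:-unknown}"
    else
        echo "crash kernel NOT loaded"
        return 1
    fi
}

kdump_status || true
```

On Ubuntu, kdump-config show from kdump-tools prints a similar summary.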


cat /etc/sysctl.conf

# VMCORE setup
kernel.hung_task_panic = 0
kernel.panic = 0
kernel.panic_on_io_nmi = 0
kernel.panic_on_oops = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.softlockup_panic = 0
kernel.unknown_nmi_panic = 0

 kernel.sysrq=1

kernel.hung_task_timeout_secs = 0

sysctl -w vm.dirty_ratio=5                  # percentage of memory that may hold dirty pages before writers are forced to flush


fs.pipe-user-pages-soft = 16384
fs.xfs.panic_mask = 0
kernel.hardlockup_panic = 0
kernel.hung_task_check_count = 4194304
kernel.hung_task_panic = 0
kernel.hung_task_timeout_secs = 120
kernel.hung_task_warnings = 8
kernel.panic = 0
kernel.panic_on_io_nmi = 0
kernel.panic_on_oops = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_warn = 0
kernel.soft_watchdog = 1
kernel.softlockup_all_cpu_backtrace = 0
kernel.softlockup_panic = 0
kernel.unknown_nmi_panic = 0
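
With the *_panic values at 0 as listed above, a lockup, hung task, or oops does not panic the machine, so kdump is not triggered by those events. If a vmcore is wanted for them (as in the soft lockup example in Appendix 2), the corresponding knobs can be set to 1. The fragment below is only a sketch: which events should panic is a per-site decision, and both the function name print_vmcore_sysctl and the file name in the usage note are made up.

```shell
#!/bin/sh
# print_vmcore_sysctl: emit a sysctl fragment that turns oopses, lockups and
# hung tasks into panics, so that kdump gets a chance to write a vmcore.
# The selection of knobs is illustrative, not a recommendation.
print_vmcore_sysctl() {
    cat <<'EOF'
kernel.sysrq = 1
kernel.panic_on_oops = 1
kernel.softlockup_panic = 1
kernel.hardlockup_panic = 1
kernel.hung_task_panic = 1
EOF
}

print_vmcore_sysctl
```

One way to apply it would be print_vmcore_sysctl | sudo tee /etc/sysctl.d/99-vmcore.conf followed by sudo sysctl --system.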

4, Press Alt+SysRq+c (or run echo c > /proc/sysrq-trigger) to trigger a panic. The crash log can then be found under /var/crash/.

     Ubuntu also ships apport for collecting crash reports; for example, sudo apport-cli -f -P `pidof firefox` collects a report for a running process.


5, A crash log received from someone else is usually base64-encoded (encoded, not encrypted). It can be unpacked with: apport-unpack 00069866-linux-image-3.2.0-23-generic.0.txt ./tmp/


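6, The crash tool needs the kernel debug symbols (dbgsym) in order to work. First check whether the directory /usr/lib/debug/boot exists. If it does not, download the symbols from http://ddebs.ubuntu.com/pool/main/l/linux/, or add the ddebs repository: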
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/ddebs.list


deb http://ddebs.ubuntu.com trusty main restricted universe multiverse
deb http://ddebs.ubuntu.com trusty-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com trusty-proposed main restricted universe multiverse


   sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys ECDCAD72428D7C01
   sudo apt-get update
   sudo apt-get install systemtap
   sudo apt-get install linux-image-$(uname -r)-dbgsym

   sudo apt-get source linux-image-$(uname -r)
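
The three ddebs lines shown above for trusty follow a fixed pattern, so they can be generated for whatever release is in use. ddebs_lines is a hypothetical helper name; the output format is copied from the lines in this article.

```shell
#!/bin/sh
# ddebs_lines CODENAME: print the apt source lines for the release pocket and
# for the -updates and -proposed pockets of the given Ubuntu codename.
ddebs_lines() {
    codename=$1
    for pocket in "" "-updates" "-proposed"; do
        echo "deb http://ddebs.ubuntu.com ${codename}${pocket} main restricted universe multiverse"
    done
}

ddebs_lines trusty
```

To write the file for the running release, one could use ddebs_lines "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/ddebs.list.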

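Alternatively, download the .ddeb files quickly with "axel -n 64" and install them with 'sudo dpkg -i *.ddeb'.

If the symbols are still unavailable, see: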

https://wiki.ubuntu.com/Kernel/Systemtap
http://chrisarges.net/2015/10/02/building-ubuntu-kernels-with-debug-symbols.html


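7, crash /usr/lib/debug/boot/vmlinux-3.5.0-17-generic VmDump

Note: when debugging the live kernel with a 4.4 kernel (sudo crash /usr/lib/debug/boot/vmlinux-4.4.0-9-generic), crash fails with "invalid structure member offset: module_core_size"; see this bug (https://www.redhat.com/archives/crash-utility/2016-January/msg00030.html ). I used (sudo crash --no_modules /usr/lib/debug/boot/vmlinux-4.4.0-9-generic) so that the module initialization error does not abort crash.

The kernel version running on your own debugging machine does not have to match the one the customer runs. It is enough to point the --mod option at debug symbols whose version matches the customer's kernel.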

wget http://ddebs.ubuntu.com/pool/main/l/linux/linux-image-4.13.0-37-generic-dbgsym_4.13.0-37.42_amd64.ddeb -O /home/zhhuabj/debs/4.13.0-37.42/linux-image-4.13.0-37-generic-dbgsym_4.13.0-37.42_amd64.ddeb

strings vmcore.201803282124 |less  #NOTE: the vmcore may need to be decompressed first
dpkg -x linux-image-4.13.0-37-generic-dbgsym_4.13.0-37.42_amd64.ddeb /home/zhhuabj/debs/4.13.0-37.42/
cat run_crash.sh

exec crash --mod /home/zhhuabj/debs/4.13.0-37.42/usr/lib/debug/lib/modules/4.13.0-37-generic /home/zhhuabj/debs/4.13.0-37.42/usr/lib/debug/boot/vmlinux-4.13.0-37-generic vmcore.201803282124

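NOTE: the method above may fail. For example, 4.13.0-37 is the GA kernel of artful, but for xenial it is an HWE kernel. The uname output in the sosreport shows the version "4.13.0-37-generic #42~16.04.1" (the version above was obtained with the strings command, which does not seem to show the more detailed version). A version of this form means the 4.13.0-37 HWE kernel running on xenial, so the correct debug symbols are: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/14432003/+files/linux-image-4.13.0-37-generic-dbgsym_4.13.0-37.42~16.04.1_amd64.ddeb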
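
The relation between the uname strings and the ddeb version in the two file names above can be sketched as follows. dbgsym_version is a hypothetical helper, and the rule is inferred only from the two package names in this article (4.13.0-37.42 and 4.13.0-37.42~16.04.1), so treat it as a starting point rather than a guarantee for every kernel flavour.

```shell
#!/bin/sh
# dbgsym_version RELEASE VERSION: derive the package version of the matching
# dbgsym ddeb.
#   RELEASE is the `uname -r` string, e.g. 4.13.0-37-generic
#   VERSION is the `uname -v` string, e.g. "#42~16.04.1-Ubuntu SMP ..."
dbgsym_version() {
    abi=${1%-*}              # 4.13.0-37-generic -> 4.13.0-37
    upload=${2#\#}           # drop the leading '#'
    upload=${upload%% *}     # keep the first word only
    upload=${upload%-Ubuntu} # 42~16.04.1-Ubuntu -> 42~16.04.1
    echo "${abi}.${upload}"
}

# Values taken from the sosreport described above; prints 4.13.0-37.42~16.04.1
dbgsym_version "4.13.0-37-generic" "#42~16.04.1-Ubuntu SMP"
```

On the crashed system itself the inputs would be "$(uname -r)" and "$(uname -v)"; when only a vmcore is available, they have to come from the sosreport or the kernel log.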


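Appendix 1, How to use the dump file to locate the failing place in the source code

Compiler optimization makes source line numbers unreliable, so the usual approach is to understand the assembly first and use the source code as a reference:

1, Look at the backtrace; the exception RIP is kmem_cache_alloc_trace+0x7c (or check the RIP register)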

crash> bt -l
PID: 47656  TASK: ffff880115fa3000  CPU: 14  COMMAND: "make"
 ...
 #5 [ffff881f136edb30] general_protection at ffffffff817650c8
    /build/buildd/linux-lts-trusty-3.13.0/arch/x86/kernel/entry_64.S: 1514
    [exception RIP: kmem_cache_alloc_trace+0x7c]

2, Convert the symbol kmem_cache_alloc_trace+0x7c to its corresponding address 0xffffffff811af2fc (a kernel virtual address, not a physical one)
crash> dis kmem_cache_alloc_trace+0x7c
0xffffffff811af2fc :       mov    0x0(%r13,%rax,1),%rbx

Assembly can also be inspected with the x/100gi command, so when examining a core dump of a shared library with gdb, x/100gi $rip can be used instead:

(gdb) p $rip
$2 = (void (*)()) 0x7fa84c275470
(gdb) x/2gi $rip
=> 0x7fa84c275470 : mov    0x8(%rbx),%rax
   0x7fa84c275474 : and    $0xfffffffffffffff8,%rax


The readelf command can also map addresses to source lines (from the .debug_line section):
readelf -wL vmlinux
Decoded dump of debug contents of section .debug_line:
[... search for %rip value ...]
skbuff.c                                    2918  0xffffffff81614495
skbuff.c                                    2911  0xffffffff816144b8
skbuff.c                                    2904  0xffffffff816144ba
skbuff.c                                    2825  0xffffffff816144bc

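3, Use the addr2line command to find the corresponding source code
   -e, specifies the executable whose addresses are to be translated.
   -f, shows function names in addition to the file and line number information.
   -i, if the address belongs to an inlined function, also prints the information for all enclosing scopes back to the first non-inlined function.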
addr2line 0xffffffff811af2fc -e /mnt/ddeb-3.13.0-34.60/usr/lib/debug/boot/vmlinux-3.13.0-45-generic -f -i
get_freepointer
/build/buildd/linux-lts-trusty-3.13.0/mm/slub.c:260
get_freepointer_safe
/build/buildd/linux-lts-trusty-3.13.0/mm/slub.c:275
slab_alloc_node
/build/buildd/linux-lts-trusty-3.13.0/mm/slub.c:2416
slab_alloc
/build/buildd/linux-lts-trusty-3.13.0/mm/slub.c:2455
kmem_cache_alloc_trace
/build/buildd/linux-lts-trusty-3.13.0/mm/slub.c:2472

Alternatively, if the source code is installed, the relevant code can be viewed directly with crash's list command.
crash> list *0xffffffff816144ba
0xffffffff816144ba is in skb_segment (/build/buildd/linux-3.13.0/net/core/skbuff.c:2904).
2899                    skb_shinfo(nskb)->tx_flags = skb_shinfo(head_skb)->tx_flags &
2900                            SKBTX_SHARED_FRAG;
2901    
2902                    while (pos < offset + len) {
2903                            if (i >= nfrags) {
2904                                    BUG_ON(skb_headlen(list_skb));
[...]

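Note that running addr2line against a user-space shared library prints the question marks below (because shared libraries are relocated at load time). How to use addr2line with shared libraries is still to be investigated.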
$ addr2line 0x7fa84c275470 -f -i -e /usr/lib/x86_64-linux-gnu/debug/libstdc++.so.6
??
??:0
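
One common approach, not verified by the author of this article, is to subtract the library's load base from the runtime address and pass the result to addr2line. The load base 0x7fa84c1b0000 below is made up for illustration; the real one would come from the first libstdc++ line in /proc/<pid>/maps of the crashed process (or the equivalent mapping list for the core file). lib_offset is a hypothetical helper.

```shell
#!/bin/sh
# lib_offset ADDR BASE: print ADDR - BASE in hex, i.e. the address relative to
# the start of the mapped library.
lib_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# 0x7fa84c275470 is the $rip value from the gdb session above;
# 0x7fa84c1b0000 is an assumed load base, not taken from a real process.
lib_offset 0x7fa84c275470 0x7fa84c1b0000
```

The printed offset would then be used as addr2line -f -i -e /usr/lib/x86_64-linux-gnu/debug/libstdc++.so.6 <offset>, provided the debug file matches the library that was actually loaded.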


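4, Confirm against the source code. Either read the source directly or, for a structure, use the following command to show the member offsets and match them against the offsets used in the assembly.

crash> struct -o transaction_s.t_locked_list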


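Appendix 2, Learning the kernel development workflow from a soft lockup example

1, 'crash> log > log.txt' reveals a lot of useful information

2, The log below shows that the systemd process is waiting for a lock (_raw_spin_lock_irq). Many similar entries show other processes waiting for a lock in the same way, so systemd and those processes are not the culprit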
[ 1043.679592] NMI watchdog: Watchdog detected hard LOCKUP on cpu 17
...
[ 1043.679646] CPU: 17 PID: 86236 Comm: systemd Not tainted 4.13.0-37-generic #42~16.04.1-Ubuntu
...
[ 1043.679664] Call Trace:
[ 1043.679671]  _raw_spin_lock_irq+0x28/0x30
...

3, What actually causes the lockup is sgdisk on CPU 40 (sgdisk -> block_ioctl -> blkdev_ioctl -> blkdev_reread_part -> __blkdev_reread_part -> rescan_partitions -> drop_partitions -> invalidate_partition -> __invalidate_device -> invalidate_bdev -> invalidate_bh_lrus -> on_each_cpu_cond -> on_each_cpu_mask -> smp_call_function_many)

Its smp_call_function_many asks the other CPUs to execute a function, so the possible causes are:

1) The other CPUs are stuck as well, which requires checking them one by one. However, crash shows that the other CPUs are all waiting on a spin lock, and their state cannot be read.

PID: 384217 TASK: ffff8fa582ed9e40 CPU: 47 COMMAND: "ceph-osd"
bt: read error: kernel virtual address: fffffe000081b000 type: "stack contents"

bt: read of stack at fffffe000081b000 failed

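To investigate further, disable KASLR (nokaslr), make the watchdog panic as soon as it detects a lockup (nmi_watchdog=panic) so that the crash is captured immediately, and then generate a fresh crash dump

2) CPU 40 itself is running a long task.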


[ 1062.041799] Kernel panic - not syncing: softlockup: hung tasks
[ 1062.042657] CPU: 40 PID: 383965 Comm: sgdisk Tainted: G             L  4.13.0-37-generic #42~16.04.1-Ubuntu
[ 1062.043524] Hardware name: Lenovo ThinkSystem SR650 -[7X06CTO1WW]-/-[7X06CTO1WW]-, BIOS -[IVE116S-1.20]- 02/08/2018
[ 1062.044396] Call Trace:
[ 1062.045251] 
[ 1062.046093]  dump_stack+0x63/0x8b
[ 1062.046931]  panic+0xe4/0x24d
[ 1062.047752]  watchdog_timer_fn+0x219/0x220
[ 1062.048557]  ? watchdog_park_threads+0x70/0x70
[ 1062.049358]  __hrtimer_run_queues+0xe7/0x230
[ 1062.050149]  hrtimer_interrupt+0xb1/0x200
[ 1062.050941]  smp_trace_apic_timer_interrupt+0x6f/0xa0
[ 1062.051720]  ? __brelse+0x30/0x30
[ 1062.052486]  smp_apic_timer_interrupt+0xe/0x10
[ 1062.053244]  apic_timer_interrupt+0x1af/0x1c0
[ 1062.053984] 

[ 1062.054707] RIP: 0010:smp_call_function_many+0x20b/0x270
[ 1062.055433] RSP: 0018:ffffbaae766efc40 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[ 1062.056159] RAX: 0000000000000003 RBX: 0000000000000130 RCX: 000000000000000b
[ 1062.056877] RDX: ffff8f487dee7b18 RSI: 0000000000000130 RDI: ffff8f486e974280
[ 1062.057588] RBP: ffffbaae766efc78 R08: 0000000000000000 R09: c4ff7effffddea38
[ 1062.058295] R10: fffff1c6f324aac0 R11: ffff8f486e974280 R12: ffffffffb88915b0
[ 1062.058998] R13: 0000000000000000 R14: ffff8f487e123a00 R15: 0000000000000130
[ 1062.059696]  ? __brelse+0x30/0x30
[ 1062.060387]  ? __brelse+0x30/0x30
[ 1062.061067]  on_each_cpu_mask+0x28/0x70
[ 1062.061733]  ? mark_buffer_async_write+0x20/0x20
[ 1062.062398]  on_each_cpu_cond+0xb6/0x160
[ 1062.063053]  ? __brelse+0x30/0x30
[ 1062.063686]  invalidate_bh_lrus+0x29/0x30
[ 1062.064303]  invalidate_bdev+0x3a/0x60
[ 1062.064906]  __invalidate_device+0x4d/0x60
[ 1062.065489]  invalidate_partition+0x31/0x50
[ 1062.066054]  rescan_partitions+0x50/0x330
[ 1062.066602]  ? security_capable+0x4e/0x70
[ 1062.067130]  __blkdev_reread_part+0x65/0x70
[ 1062.067659]  blkdev_reread_part+0x23/0x40
[ 1062.068168]  blkdev_ioctl+0x38f/0x930
[ 1062.068657]  ? mempool_free_slab+0x17/0x20
[ 1062.069130]  ? mempool_free+0x2f/0x90
[ 1062.069579]  block_ioctl+0x3d/0x50
[ 1062.070012]  do_vfs_ioctl+0xa4/0x600
[ 1062.070430]  ? blkdev_fsync+0x35/0x50
[ 1062.070854]  ? vfs_fsync_range+0x4e/0xb0
[ 1062.071281]  SyS_ioctl+0x79/0x90
[ 1062.071712]  entry_SYSCALL_64_fastpath+0x24/0xab
[ 1062.072150] RIP: 0033:0x7f3b43fe0f47
[ 1062.072579] RSP: 002b:00007ffe7affbc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1062.073021] RAX: ffffffffffffffda RBX: 00007ffe7afffaa8 RCX: 00007f3b43fe0f47
[ 1062.073455] RDX: 00007ffe7affbb90 RSI: 000000000000125f RDI: 0000000000000004
[ 1062.073876] RBP: 00007ffe7affba20 R08: 0000555ea8d45a80 R09: 0000000000000001
[ 1062.074281] R10: 0000555ea8d45a80 R11: 0000000000000246 R12: 0000000000000001
[ 1062.074691] R13: 00007ffe7afffaa8 R14: 0000000000000000 R15: 00007ffe7affe8d2
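
In a trace like the one above, frames prefixed with '?' are only candidate return addresses found on the stack, so the call chain quoted in step 3 is easier to read once they are filtered out. The sed sketch below assumes the dmesg layout shown in this article; reliable_frames is a hypothetical helper name.

```shell
#!/bin/sh
# reliable_frames: read dmesg-style text on stdin and print the function names
# of the frames that are not marked with '?', plus the function named on a
# kernel "RIP:" line. Everything else (registers, headers) is dropped.
reliable_frames() {
    sed -n \
        -e '/ ? /d' \
        -e 's|^\[ *[0-9.]*] *\([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]*.*$|\1|p' \
        -e 's|^\[ *[0-9.]*] *RIP: [0-9a-f]*:\([A-Za-z_][A-Za-z0-9_.]*\)+0x.*$|\1|p'
}

# A few lines copied from the trace above.
reliable_frames <<'EOF'
[ 1062.054707] RIP: 0010:smp_call_function_many+0x20b/0x270
[ 1062.059696]  ? __brelse+0x30/0x30
[ 1062.061067]  on_each_cpu_mask+0x28/0x70
[ 1062.061733]  ? mark_buffer_async_write+0x20/0x20
[ 1062.062398]  on_each_cpu_cond+0xb6/0x160
[ 1062.063686]  invalidate_bh_lrus+0x29/0x30
EOF
```

Run against the log.txt saved in step 1 (reliable_frames < log.txt), it prints the frames of every trace in the log, innermost first within each trace.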

4, Search the upstream kernel for invalidate_bh_lrus, and search Google for related information
git log --grep=invalidate_bh_lrus
git grep invalidate_bh_lrus
git blame fs/block_dev.c

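5, To contribute a patch to Ubuntu, subscribe to the mailing list (https://lists.ubuntu.com/mailman/listinfo/kernel-team). That page links to the "kernel-team Archives", where SRU-related threads look like the ones below (https://lists.ubuntu.com/archives/kernel-team/2018-March/thread.html). PATCH 0 is the description of the patch series, followed by PATCH 1 and so on; ACK means accepted, and APPLIED means already merged into the branch. A bot parses these mail subjects and updates the lp bug automatically. So for an SRU, what gets submitted is not a debdiff on the lp bug but a patch sent by email. The patch mail is sent with git-send-email (send it to your own mailbox first as a test), and once that test passes it is sent to the kernel-team mailing list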
[SRU] [A/B] [PATCH 0/2] Fix wrong battery status on Asus laptops   Kai-Heng Feng
[SRU] [A/B] [PATCH 1/2] ACPI / battery: Add quirk for Asus GL502VSK and UX305LA   Kai-Heng Feng
[SRU] [A/B] [PATCH 2/2] ACPI / battery: Add quirk for Asus UX360UA and UX410UAK   Kai-Heng Feng
ACK / APPLIED[B]: [SRU] [A/B] [PATCH 0/2] Fix wrong battery status on Asus laptops   Seth Forshee
ACK: [SRU] [A/B] [PATCH 0/2] Fix wrong battery status on Asus laptops   Colin Ian King
APPLIED[Artful/backlog]: [SRU] [A/B] [PATCH 0/2] Fix wrong battery status on Asus laptops   Kleber Souza


Reference
1, http://my.oschina.net/guol/blog/128030

2, http://www.inaddy.org/mini-howtos/dumps/using-ubuntu-crash-dump-with-kdum
