How to analyze a crash dump (NULL pointer)

As an example, take the simple crash trigger that the system itself provides: echo c > /proc/sysrq-trigger.

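Once you have the crash file, the first things you usually want to see are the error type and the registers and backtrace at the moment of the fault. The command log | tail -200 provides them; it prints the last 200 lines of the kernel log: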

crash> log | tail -200
[2207.597488:0] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[2207.605719:0] pgd = ddf30000
[2207.608588:0] [00000000] *pgd=00000000
[2207.612339:0] Internal error: Oops: 805 [#1] PREEMPT SMP ARM
[2207.617975:0] Modules linked in:
[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)
[2207.627196:0] PC is at sysrq_handle_crash+0x38/0x48
[2207.632059:0] LR is at _raw_spin_unlock_irqrestore+0x20/0x40
[2207.637699:0] pc : [] lr : [] psr: 60000093
[2207.637704:0] sp : e7c61ec8 ip : e7c61e98 fp : e7c61ed4
[2207.649487:0] r10: e7c61f70 r9 : e7c60000 r8 : 00000000
[2207.654865:0] r7 : 60000013 r6 : 00000063 r5 : 00000004 r4 : c079fc74
[2207.661539:0] r3 : 00000000 r2 : 00000001 r1 : 20000093 r0 : 00000001
[2207.668215:0] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[2207.675582:0] Control: 10c53c7d Table: 9ff3004a DAC: 00000015
[2207.681477:0]
[2208.257865:0] Process sh (pid: 2309, stack limit = 0xe7c602f0)

[2208.351387:0] Backtrace:
[2208.354019:0] [] (sysrq_handle_crash+0x0/0x48) from [] (__handle_sysrq+0xac/0x158)
[2208.363293:0] [] (__handle_sysrq+0x0/0x158) from [] (write_sysrq_trigger+0x30/0x38)
[2208.372647:0] r8:b76e780c r7:00000002 r6:e331c5c0 r5:c01ed8e4 r4:00000002
[2208.379373:0] r3:e7c61f70
[2208.382188:0] [] (write_sysrq_trigger+0x0/0x38) from [] (proc_reg_write+0x88/0x9c)
[2208.391455:0] r4:ed98b5e0 r3:e7c61f70
[2208.395221:0] [] (proc_reg_write+0x0/0x9c) from [] (vfs_write+0xb8/0x144)
[2208.403715:0] [] (vfs_write+0x0/0x144) from [] (sys_write+0x44/0x70)
[2208.411772:0] r8:00000002 r7:00000000 r6:00000000 r5:b76e780c r4:e331c5c0
[2208.418695:0] [] (sys_write+0x0/0x70) from [] (ret_fast_syscall+0x0/0x30)
[2208.427185:0] r8:c000e1e8 r7:00000004 r6:00000001 r5:00000002 r4:00000003
[2208.434101:0] Code: 0a000000 e12fff33 e3a03000 e3a02001 (e5c32000)
[2208.440344:0] Enter crash kexec !!
[2208.443747:1] CPU 1 will stop doing anything useful since another CPU has crashed
[2208.451905:0] Loading crashdump kernel...
[2208.455900:0] Software reset on panic!
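The number after "Oops:" in the log above is the ARM fault status value that the kernel passed to its oops handler, so it can be decoded by hand. The following is a minimal Python sketch of that decoding, assuming the ARM short-descriptor fault status layout (status in bits 10 and 3:0, domain in bits 7:4, write flag in bit 11). The function names and the table are illustrative, not part of the crash utility, and the table covers only the common status codes.

```python
import re

# Subset of the ARM short-descriptor fault status codes (FS[4:0]).
FAULT_STATUS = {
    0b00001: "alignment fault",
    0b00101: "translation fault (section)",
    0b00111: "translation fault (page)",
    0b01001: "domain fault (section)",
    0b01011: "domain fault (page)",
    0b01101: "permission fault (section)",
    0b01111: "permission fault (page)",
}


def oops_code(log_line):
    """Extract the hex value printed after 'Oops:' or return None."""
    m = re.search(r"Oops: ([0-9a-fA-F]+)", log_line)
    return int(m.group(1), 16) if m else None


def decode_fsr(fsr):
    """Split a fault status value into status, domain and access type."""
    # FS[4] lives in bit 10, FS[3:0] in bits 3:0.
    fs = (((fsr >> 10) & 1) << 4) | (fsr & 0xF)
    return {
        "status": FAULT_STATUS.get(fs, "other (FS=0b{:05b})".format(fs)),
        "domain": (fsr >> 4) & 0xF,
        "access": "write" if fsr & (1 << 11) else "read",
    }


line = "[2207.612339:0] Internal error: Oops: 805 [#1] PREEMPT SMP ARM"
print(decode_fsr(oops_code(line)))
```

For 805 this reports a section translation fault on a write, which agrees with the rest of the log: the first-level entry for address 0 is empty (*pgd=00000000) and the faulting instruction is a store.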

Find the process that was running when the fault occurred:

crash> ps | grep ">"
> 1904 1384 1 e3355000 RU 2.9 663340 23740 MediaScannerSer
> 2309 1394 0 d9742c00 RU 0.1 820 480 sh


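Because this is a multi-core system, two processes are marked as active, one per CPU. Which one is it?

The log above already tells us: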

[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)
[2208.257865:0] Process sh (pid: 2309, stack limit = 0xe7c602f0)
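The cross-check between the log and the ps output can also be scripted. This is a rough Python sketch, assuming the ps columns are in the order PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM as in the output shown earlier. The helper names are made up for illustration.

```python
import re

# Lines copied from the log and ps output in this article.
LOG = """\
[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)
[2208.257865:0] Process sh (pid: 2309, stack limit = 0xe7c602f0)
"""
PS = """\
> 1904 1384 1 e3355000 RU 2.9 663340 23740 MediaScannerSer
> 2309 1394 0 d9742c00 RU 0.1 820 480 sh
"""


def panic_task_from_log(log_text):
    """Return the CPU, command name and pid named in the oops header."""
    cpu = re.search(r"CPU: (\d+)", log_text)
    proc = re.search(r"Process (\S+) \(pid: (\d+)", log_text)
    return {"cpu": int(cpu.group(1)),
            "comm": proc.group(1),
            "pid": int(proc.group(2))}


def active_tasks(ps_text):
    """Parse the ps lines marked with '>' (the active task on each CPU)."""
    tasks = []
    for line in ps_text.splitlines():
        line = line.strip()
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        tasks.append({"pid": int(fields[0]),
                      "cpu": int(fields[2]),
                      "comm": " ".join(fields[8:])})
    return tasks


def find_panic_task(log_text, ps_text):
    """Pick the active task whose CPU and pid match the oops header."""
    wanted = panic_task_from_log(log_text)
    for task in active_tasks(ps_text):
        if task["cpu"] == wanted["cpu"] and task["pid"] == wanted["pid"]:
            return task
    return None


print(find_panic_task(LOG, PS))
```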


The set command confirms this as well:

crash> set 2309
    PID: 2309
COMMAND: "sh"
   TASK: d9742c00  [THREAD_INFO: e7c60000]
    CPU: 0
  STATE: TASK_RUNNING (PANIC)

crash> set 1904
    PID: 1904
COMMAND: "MediaScannerSer"
   TASK: e3355000  [THREAD_INFO: e0422000]
    CPU: 1
  STATE: TASK_RUNNING (ACTIVE)

Start the analysis from the exact faulting location

[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)
[2207.627196:0] PC is at sysrq_handle_crash+0x38/0x48
[2207.632059:0] LR is at _raw_spin_unlock_irqrestore+0x20/0x40
[2207.637699:0] pc : [] lr : [] psr: 60000093
[2207.637704:0] sp : e7c61ec8 ip : e7c61e98 fp : e7c61ed4
[2207.649487:0] r10: e7c61f70 r9 : e7c60000 r8 : 00000000
[2207.654865:0] r7 : 60000013 r6 : 00000063 r5 : 00000004 r4 : c079fc74
[2207.661539:0] r3 : 00000000 r2 : 00000001 r1 : 20000093 r0 : 00000001
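The psr value in this register dump carries the same information as the "Flags:" line of the full log. As a quick sanity check, the sketch below decodes it in Python, assuming the standard ARM CPSR layout (N, Z, C, V in bits 31 to 28, I in bit 7, F in bit 6, T in bit 5, mode in bits 4:0). The function name and the mode table are illustrative.

```python
# Processor mode names as they appear in the kernel's register dump.
MODES = {
    0x10: "USR_32", 0x11: "FIQ_32", 0x12: "IRQ_32", 0x13: "SVC_32",
    0x17: "ABT_32", 0x1B: "UND_32", 0x1F: "SYS_32",
}


def decode_psr(psr):
    """Decode condition flags, interrupt masks, ISA and mode from a PSR."""
    # A set flag is shown in upper case, a clear flag in lower case.
    flags = "".join(
        name if psr & (1 << bit) else name.lower()
        for name, bit in (("N", 31), ("Z", 30), ("C", 29), ("V", 28))
    )
    return {
        "flags": flags,
        "irqs": "off" if psr & 0x80 else "on",   # I bit set means masked
        "fiqs": "off" if psr & 0x40 else "on",   # F bit set means masked
        "isa": "Thumb" if psr & 0x20 else "ARM",
        "mode": MODES.get(psr & 0x1F, "unknown"),
    }


print(decode_psr(0x60000093))
```

For 60000093 this gives flags nZCv, IRQs off, FIQs on, ARM state and SVC_32 mode, so the fault happened in kernel (supervisor) mode with interrupts disabled.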


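The current PC value is c01ed158. Use dis -r <address> to see the exact faulting instruction together with all the code from the function entry up to that point.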

help dis

-r (reverse) displays all instructions from the start of the
   routine up to and including the designated address.


crash> dis -r c01ed158
0xc01ed120: mov r12, sp
0xc01ed124: push {r11, r12, lr, pc}
0xc01ed128: sub r11, r12, #4
0xc01ed12c: ldr r3, [pc, #44] ; 0xc01ed160
0xc01ed130: mov r2, #1
0xc01ed134: str r2, [r3]
0xc01ed138: dsb sy
0xc01ed13c: ldr r3, [pc, #32] ; 0xc01ed164
0xc01ed140: ldr r3, [r3, #24]
0xc01ed144: cmp r3, #0
0xc01ed148: beq 0xc01ed150
0xc01ed14c: blx r3
0xc01ed150: mov r3, #0
0xc01ed154: mov r2, #1
0xc01ed158: strb r2, [r3]
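The first address in this listing ties in with the "PC is at sysrq_handle_crash+0x38/0x48" line of the log: the part before the slash is the offset into the function and the part after it is the function size. A small Python sketch of that arithmetic follows; the helper names are invented for illustration.

```python
import re


def parse_symbol_offset(text):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    m = re.search(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    if m is None:
        raise ValueError("no symbol+offset/size in: " + text)
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def function_bounds(address, offset, size):
    """Return (start, end) of the enclosing function; end is exclusive."""
    start = address - offset
    return start, start + size


name, offset, size = parse_symbol_offset("PC is at sysrq_handle_crash+0x38/0x48")
start, end = function_bounds(0xC01ED158, offset, size)
print(name, hex(start), hex(end))
# prints: sysrq_handle_crash 0xc01ed120 0xc01ed168
```

The computed start, 0xc01ed120, is the first instruction that dis -r prints, and the two literal-pool addresses referenced by the ldr instructions (0xc01ed160 and 0xc01ed164) fall inside the computed range.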


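The faulting instruction is strb r2, [r3], and at that point r3 is 00000000; storing a byte to address 0 is bound to fault.

Next, find the cause: where does the value in r3 come from? Looking upward we find:

0xc01ed150: mov r3, #0

The code assigns the value explicitly; it does not come from a function argument.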
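The "look upward for the last write to r3" step can be mechanized for straight-line code. The sketch below is a deliberately simplified Python illustration: it scans the listing backward and only recognizes a handful of mnemonics that write their first operand, and it ignores control flow. That simplification happens to be safe here because the beq at 0xc01ed148 targets 0xc01ed150, so both paths execute the mov. The names are hypothetical.

```python
import re

# Mnemonics treated as writing their first operand (an incomplete list,
# chosen for this example only).
WRITES_FIRST_OPERAND = {"mov", "mvn", "ldr", "ldrb", "add", "sub", "and", "orr"}

DISASSEMBLY = """\
0xc01ed13c: ldr r3, [pc, #32] ; 0xc01ed164
0xc01ed140: ldr r3, [r3, #24]
0xc01ed144: cmp r3, #0
0xc01ed148: beq 0xc01ed150
0xc01ed14c: blx r3
0xc01ed150: mov r3, #0
0xc01ed154: mov r2, #1
0xc01ed158: strb r2, [r3]
"""


def parse(disassembly):
    """Turn 'addr: mnemonic operands' lines into (addr, mnemonic, operands)."""
    insns = []
    for line in disassembly.splitlines():
        m = re.match(r"\s*0x([0-9a-f]+):\s+(\w+)\s*(.*)", line)
        if m:
            operands = m.group(3).split(";")[0].strip()  # drop comments
            insns.append((int(m.group(1), 16), m.group(2), operands))
    return insns


def last_definition(insns, fault_addr, reg):
    """Find the nearest earlier instruction that writes reg, if any."""
    for addr, mnemonic, operands in reversed(insns):
        if addr >= fault_addr:
            continue
        dest = operands.split(",")[0].strip()
        if mnemonic in WRITES_FIRST_OPERAND and dest == reg:
            return addr, mnemonic, operands
    return None


addr, mnemonic, operands = last_definition(parse(DISASSEMBLY), 0xC01ED158, "r3")
print(hex(addr), mnemonic, operands)
```

When the last definition turns out to be a load from memory or a move from an argument register (r0 to r3 on function entry), the search has to continue with that source instead of stopping at a constant as it does here.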

Locate the error in the source and fix it

Look up the corresponding source code to find the cause of the problem.

help dis

-l displays source code line number data in addition to the
   disassembly output.

crash> dis -rl c01ed158
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:129
0xc01ed120: mov r12, sp
0xc01ed124: push {r11, r12, lr, pc}
0xc01ed128: sub r11, r12, #4
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:132
0xc01ed12c: ldr r3, [pc, #44] ; 0xc01ed160
0xc01ed130: mov r2, #1
0xc01ed134: str r2, [r3]
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:133
0xc01ed138: dsb sy
/home/wenshuai/code/kernel3.4/linux_kernel/arch/arm/include/asm/outercache.h:114
0xc01ed13c: ldr r3, [pc, #32] ; 0xc01ed164
0xc01ed140: ldr r3, [r3, #24]
0xc01ed144: cmp r3, #0
0xc01ed148: beq 0xc01ed150
/home/wenshuai/code/kernel3.4/linux_kernel/arch/arm/include/asm/outercache.h:115
0xc01ed14c: blx r3
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:134
0xc01ed150: mov r3, #0
0xc01ed154: mov r2, #1
0xc01ed158: strb r2, [r3]
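In dis -rl output, each file:line header applies to the instructions that follow it until the next header. For longer listings it can help to look up the header for one address automatically. The following Python sketch does that under the assumption that headers are absolute paths ending in ":<line>" and instructions start with "0x<addr>:"; the function name is illustrative and the sample is an excerpt of the listing above.

```python
import re

SAMPLE = """\
/home/wenshuai/code/kernel3.4/linux_kernel/arch/arm/include/asm/outercache.h:115
0xc01ed14c: blx r3
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:134
0xc01ed150: mov r3, #0
0xc01ed154: mov r2, #1
0xc01ed158: strb r2, [r3]
"""


def source_line_for(dis_output, address):
    """Return (path, line) of the header covering address, or None."""
    current = None
    for line in dis_output.splitlines():
        line = line.strip()
        loc = re.match(r"(/\S+):(\d+)$", line)
        if loc:
            # A new file:line header; it covers the instructions below it.
            current = (loc.group(1), int(loc.group(2)))
            continue
        insn = re.match(r"0x([0-9a-f]+):", line)
        if insn and int(insn.group(1), 16) == address:
            return current
    return None


path, lineno = source_line_for(SAMPLE, 0xC01ED158)
print(path, lineno)
```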


From the listing above, the fault is at line 134 of linux_kernel/drivers/tty/sysrq.c:

static void sysrq_handle_crash(int key)
{
	char *killer = NULL;

	panic_on_oops = 1;	/* force panic */
	wmb();
	*killer = 1;
}

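This example is of course very simple, and the cause is easy to see. More often the fault is caused by a function argument, for example a member of an input structure that was never assigned.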
