edit by liaoye@2014/9/16
http://blog.csdn.net/paul_liao
make target=ARM
For a 64-bit build:
make target=ARM64
3. Build the external extension libraries
make extensions target=ARM64
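Capturing a ramdump on Spreadtrum
When a kernel panic occurs, the system automatically saves the ramdump to the sysdump folder in the log directory on the T-card, as two files.
The crash utility needs a single dump file, so merge them before parsing: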
cat sysdump.core.0* > dump.bin
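The merge step above can also be scripted. The following is a minimal Python sketch of the same concatenation, assuming the pieces sort correctly by file name (which is what the shell glob in the cat command relies on too). The helper name merge_dump_parts and the demo file names are hypothetical and only serve the illustration.

```python
import glob
import shutil
import tempfile
from pathlib import Path


def merge_dump_parts(directory, pattern="sysdump.core.0*", out_name="dump.bin"):
    """Concatenate the dump pieces in name order, like `cat sysdump.core.0* > dump.bin`."""
    parts = sorted(glob.glob(str(Path(directory) / pattern)))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r} in {directory}")
    out_path = Path(directory) / out_name
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream the data, dumps can be large
    return out_path, [Path(p).name for p in parts]


# Demo on throwaway files (hypothetical names, not real dump pieces).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sysdump.core.01_demo").write_bytes(b"second")
    (Path(tmp) / "sysdump.core.00_demo").write_bytes(b"first-")
    merged_path, merged_parts = merge_dump_parts(tmp)
    demo_bytes = merged_path.read_bytes()

print(merged_parts)
print(demo_bytes)
```

Streaming with shutil.copyfileobj avoids loading a whole piece into memory, which matters when each piece is hundreds of megabytes.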
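Capturing a ramdump on Marvell
When a kernel panic occurs, the system automatically enters EMMD dump mode. If an SD card is detected, the screen shows “EMMD SD DUMP”, the system saves the entire memory to the SD card and then powers off, and you can retrieve RAMDUMP0000.gz from the SD card. Otherwise the screen shows “EMMD USB DUMP”, and you connect the device to a PC over USB and dump the memory with the fastboot tool.
Linux: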
# fastboot-linux-marvell dump dump.bin
Windows:
D:\fastboot_windows>fastboot-windows-marvell.exe dump dump.bin
Capturing a ramdump on MTK
a. Enable the ramdump mechanism
Add the following code:
diff --git a/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c b/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c
index 8b2b93a..2ec509f 100644
--- a/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c
+++ b/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c
@@ -457,6 +457,17 @@ static int __init mrdump_init(void)
     }
     atomic_notifier_chain_register(&panic_notifier_list, &mrdump_panic_blk);
+    /* add this block */
+
+    {
+        mrdump_enable = 1;
+
+        mrdump_plat->hw_enable(mrdump_enable);
+
+        mrdump_cb->machdesc.nr_cpus = NR_CPUS;
+
+        __inner_flush_dcache_all();
+    }
     return 0;
 }
Enable the following config options:
+CONFIG_MTK_AEE_POWERKEY_HANG_DETECT=y
+CONFIG_MTK_AEE_MRDUMP=y
+CONFIG_MTK_MRDUMP=y
+CONFIG_MTK_DBG_DUMP=y
In addition, disable CONFIG_MTK_AEE_IPANIC. When it is enabled, sys_mini_dump is generated and sys_core_dump is not.
Run cat /sys/module/mrdump/parameters/enable to confirm that the change has taken effect.
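The config requirements above are easy to get wrong when several defconfig files are involved, so a quick check can help. The sketch below assumes you have the text of the final kernel .config at hand. The function names parse_config and check_mrdump_config are hypothetical, and the sample config text is made up for the demo.

```python
REQUIRED_ON = [
    "CONFIG_MTK_AEE_POWERKEY_HANG_DETECT",
    "CONFIG_MTK_AEE_MRDUMP",
    "CONFIG_MTK_MRDUMP",
    "CONFIG_MTK_DBG_DUMP",
]
REQUIRED_OFF = ["CONFIG_MTK_AEE_IPANIC"]


def parse_config(text):
    """Return {option: value} for each 'CONFIG_X=value' line; comment lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # also skips '# CONFIG_X is not set'
        key, value = line.split("=", 1)
        options[key] = value
    return options


def check_mrdump_config(text):
    """List every option that does not match the settings described in the article."""
    options = parse_config(text)
    problems = []
    for name in REQUIRED_ON:
        if options.get(name) != "y":
            problems.append(f"{name} should be =y")
    for name in REQUIRED_OFF:
        if options.get(name) in ("y", "m"):
            problems.append(f"{name} should be disabled")
    return problems


# Made-up sample: one required option missing, IPANIC still enabled.
sample = """\
CONFIG_MTK_AEE_POWERKEY_HANG_DETECT=y
CONFIG_MTK_AEE_MRDUMP=y
CONFIG_MTK_MRDUMP=y
# CONFIG_MTK_DBG_DUMP is not set
CONFIG_MTK_AEE_IPANIC=y
"""
problems = check_mrdump_config(sample)
for problem in problems:
    print(problem)
```

An empty result only means the build configuration looks right; the runtime check through /sys/module/mrdump/parameters/enable is still needed.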
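b. Capture the ramdump
After a kernel panic or oops, the device reboots into lk ramdump mode, dumps the RAM to /data/No_Delete.rdmp, and then collects it into the mtklog/aee_exp/db* files. Export them with the GAT tool and extract SYS_COREDUMP.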
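Capturing a ramdump on Qualcomm
After a kernel panic or oops, the device reboots into ramdump mode, and the ramdump can then be exported with the QPST tool. Qualcomm provides the linux ramdump parser and crashscope for basic analysis; more complex analysis requires trace32.
Using the crash utility
The official documentation at http://people.redhat.com/anderson/crash_whitepaper describes the tool in detail. Some common operations are shown below.
1. Start the crash command line: ./crash-arm vmlinux dump.bin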
paul@paul-VirtualBox:~$ ./crash-arm vmlinux dump.bin
crash-arm 7.0.5
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
KERNEL: vmlinux
DUMPFILE: dump.bin
CPUS: 1
DATE: Wed Jan 1 10:26:26 2014
UPTIME: 00:34:14
LOAD AVERAGE: 3.61, 3.59, 3.16
TASKS: 650
NODENAME: localhost
RELEASE: 3.10.33
VERSION: #4 SMP PREEMPT Wed Sep 10 14:44:32 CST 2014
MACHINE: armv7l (unknown Mhz)
MEMORY: 512 MB
PANIC: "c0 4233 (sh) Internal error: Oops: 805 [#1] PREEMPT SMP ARM" (check log for details)
PID: 4233
COMMAND: "sh"
TASK: d37f7b40 [THREAD_INFO: cf512000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash-arm>
crash-arm is the crash binary built earlier, and dump.bin is the captured ramdump. The versions of vmlinux and dump.bin must match, otherwise the dump cannot be parsed.
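A quick sanity check for the version match can be done before starting crash. The sketch below is only a heuristic and rests on two assumptions: the dump is a raw, uncompressed memory image, and the first "Linux version ..." string found in vmlinux is the kernel banner. The function names and the banner strings in the demo are hypothetical; real files would be read in chunks or memory-mapped rather than held in bytes objects.

```python
import re

# Printable ASCII run starting with the kernel banner prefix.
BANNER_RE = re.compile(rb"Linux version [\x20-\x7e]+")


def find_banner(blob):
    """Return the first 'Linux version ...' string in the blob, or None."""
    match = BANNER_RE.search(blob)
    return match.group(0).decode("ascii") if match else None


def same_kernel(vmlinux_blob, dump_blob):
    """True if the banner found in vmlinux also appears verbatim in the dump."""
    banner = find_banner(vmlinux_blob)
    return banner is not None and banner.encode("ascii") in dump_blob


# Synthetic stand-ins for the two files (made-up banner text).
banner_a = b"Linux version 3.10.33 (builder@host) #4 SMP PREEMPT Wed Sep 10 14:44:32 CST 2014"
banner_b = b"Linux version 3.10.33 (builder@host) #51 SMP PREEMPT Thu Sep 11 09:00:00 CST 2014"
fake_vmlinux = b"\x7fELF\x00\x00" + banner_a + b"\n\x00"
matching_dump = b"\x00" * 16 + banner_a + b"\n\x00" * 4
other_dump = b"\x00" * 16 + banner_b + b"\n\x00" * 4

print(find_banner(fake_vmlinux))
print(same_kernel(fake_vmlinux, matching_dump))
print(same_kernel(fake_vmlinux, other_dump))
```

crash performs its own consistency checks when it loads the files, so this only saves a round trip when many dumps and builds are lying around.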
2. Run the log command at the prompt to get the kmsg
crash-arm> log
or
crash-arm> log > kmsg
3. bt prints the call stack. The call stack lets you reconstruct the state at the time of the crash and track down the problem.
crash-arm> bt
PID: 37 TASK: db34a640 CPU: 0 COMMAND: "kworker/u8:1"
#0 [
#1 [
#2 [
#3 [
crash-arm> bt -f
PID: 37 TASK: db34a640 CPU: 0 COMMAND: "kworker/u8:1"
#0 [
[PC: c016ad38 LR: c0143a5c SP: db391ee8 SIZE: 16]
db391ee8: 00000838 c0a5f01c db367080 c0143a5c
#1 [
[PC: c0143a5c LR: c0144138 SP: db391ef8 SIZE: 56]
db391ef8: c2907600 c0a7be74 00000001 00000000
db391f08: 00000000 db367080 db80ec14 db367098
db391f18: db390000 db390000 c0ab39a3 00000001
db391f28: db80ec00 c0144138
#2 [
[PC: c0144138 LR: c0149c94 SP: db391f30 SIZE: 56]
db391f30: c0144000 00000000 00000000 db390000
db391f40: db391f64 db8b3e98 00000000 db367080
db391f50: c0144000 00000000 00000000 00000000
db391f60: 00000000 c0149c94
#3 [
[PC: c0149c94 LR: c010f498 SP: db391f68 SIZE: 72]
db391f68: 04000000 00000000 00000000 db367080
db391f78: 00000000 00000000 db391f80 db391f80
db391f88: 00000000 00000000 db391f90 db391f90
db391f98: db391fac db8b3e98 c0149bf0 00000000
db391fa8: 00000000 c010f498
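PC: program counter, points to the instruction currently being executed;
LR: link register, holds the return address, i.e. the instruction that runs after the current function returns;
SP: stack pointer; on Linux the stack grows from high addresses toward low addresses.
Let's work out what the stack data of frame #1 above means (it was marked in red in the original post). First, disassemble vmlinux to get: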
static void process_one_work(struct worker *worker, struct work_struct *work)
__releases(&pool->lock)
__acquires(&pool->lock)
{
c0143928: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr}
c014392c: e1a05001 mov r5, r1
c0143930: e5913000 ldr r3, [r1]
c0143934: e24dd014 sub sp, sp, #20
c0143938: e1a04000 mov r4, r0
……
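Reading from the end of the frame backwards, the values are lr, fp, sl, r9, r8, r7, r6, r5, r4. Everything before them is data that was put on the stack afterwards, which you can match against the assembly.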
c2907600 c0a7be74 00000001 00000000
00000000 db367080 db80ec14 db367098
db390000 db390000 c0ab39a3 00000001
db80ec00 c0144138
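The frame layout described above can be checked mechanically. This is a small sketch, assuming the frame consists only of the 20 bytes reserved by sub sp, sp, #20 followed by the nine registers saved by the push (20 + 9 * 4 = 56 bytes, which matches the SIZE: 56 reported by bt -f). The helper name split_frame is hypothetical.

```python
# The 14 words of frame #1, lowest address (db391ef8) first, copied from bt -f.
FRAME_WORDS = [
    0xC2907600, 0xC0A7BE74, 0x00000001, 0x00000000,
    0x00000000, 0xDB367080, 0xDB80EC14, 0xDB367098,
    0xDB390000, 0xDB390000, 0xC0AB39A3, 0x00000001,
    0xDB80EC00, 0xC0144138,
]

# Register list of the push instruction; the lowest register lands at the lowest address.
PUSHED = ["r4", "r5", "r6", "r7", "r8", "r9", "sl", "fp", "lr"]
LOCAL_BYTES = 20  # from 'sub sp, sp, #20'


def split_frame(words, pushed, local_bytes):
    """Split a frame into its local area and a {register: saved value} mapping."""
    n_local = local_bytes // 4
    if len(words) != n_local + len(pushed):
        raise ValueError("frame size does not match the assumed layout")
    return words[:n_local], dict(zip(pushed, words[n_local:]))


local_words, saved = split_frame(FRAME_WORDS, PUSHED, LOCAL_BYTES)
for name, value in saved.items():
    print(f"{name:>2} = {value:08x}")
```

The saved lr comes out as c0144138, which is the PC that bt -f reports for frame #2, so the decoded layout is consistent with the backtrace.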
4. The struct command: with the call stack information above you can recover the related data, for example a struct work_struct.
crash-arm> struct work_struct c0a5f02c
struct work_struct {
data = {
counter = 0
},
entry = {
next = 0x0,
prev = 0xc0a5f034
},
func = 0xc0a5f034
}
5. whatis prints a function prototype
crash-arm> whatis try_to_suspend
void try_to_suspend(struct work_struct *);
6. Extract the logcat
Load the external logcat.so extension:
crash-arm> extend logcat.so
crash-arm> logcat
7. help: for more commands, type help or see http://people.redhat.com/anderson/crash_whitepaper
Case study
1. To trigger a kernel panic you can add a NULL pointer dereference, or run echo c > /proc/sysrq-trigger. I made the following change in the code:
+++ kernel/power/autosleep.c
@@ -26,12 +30,16 @@
 static void try_to_suspend(struct work_struct *work)
 {
     unsigned int initial_count, final_count;
+    int *p = 0;
     if (!pm_get_wakeup_count(&initial_count, true))
         goto out;
     mutex_lock(&autosleep_lock);
+    if (work->func != NULL)
+        *p = 6;
+
     if (!pm_save_wakeup_count(initial_count) ||
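When work->func is not NULL (this is only an experiment; work->func is certainly never NULL here), the write through the pointer p, which points to address 0, triggers the panic.
2. Run the log command. The parsed kmsg pinpoints where the panic occurred: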
PC is at try_to_suspend+0x38/0xe0
pc : [
0x38 is the offset into the function, and 0xe0 is the total length of try_to_suspend.
[ 82.566833] c0 37 (kworker/u8:1) Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 82.577697] c0 37 (kworker/u8:1) pgd = c0104000
[ 11.830322] c0 37 (kworker/u8:1) SEH:seh_api_ioctl_handler 6

[ 82.582458] c0 37 (kworker/u8:1) [00000000] *pgd=00000000
[ 82.587860] c0 37 (kworker/u8:1)
[ 82.589965] c0 37 (kworker/u8:1) Internal error: Oops: 805 [#1] PREEMPT SMP ARM
[ 82.597259] c0 37 (kworker/u8:1) Modules linked in: audiostub cidatattydev gs_modem ccinetdev cci_datastub citt y iml_module seh cploaddev msocketk geu galcore(O)
[ 82.610107] c0 37 (kworker/u8:1) CPU: 0 PID: 37 Comm: kworker/u8:1 Tainted: G W O 3.10.33 #51
[ 82.619354] c0 37 (kworker/u8:1) Workqueue: autosleep try_to_suspend
[ 82.623901] c0 37 (kworker/u8:1) task: db34a640 ti: db390000 task.ti: db390000
[ 82.631164] c0 37 (kworker/u8:1) PC is at try_to_suspend+0x38/0xe0
[ 82.637359] c0 37 (kworker/u8:1) LR is at try_to_suspend+0x28/0xe0
[ 82.643585] c0 37 (kworker/u8:1) pc : [
sp : db391ee8 ip : 00000000 fp : 00000000
[ 82.656921] c0 37 (kworker/u8:1) r10: db2a5400 r9 : 00000000 r8 : db390000
[ 82.664001] c0 37 (kworker/u8:1) r7 : db80ec00 r6 : c0ab3d34 r5 : c0a5f01c r4 : c0a5f01c
[ 82.672393] c0 37 (kworker/u8:1) r3 : 00000000 r2 : 00000006 r1 : 200e0013 r0 : c0a5f02c
[ 82.680755] c0 37 (kworker/u8:1) Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
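The symbol+offset/size notation in the "PC is at" line can be turned into absolute addresses once the PC value is known. The sketch below takes the PC c016ad38 from frame #0 of the bt -f output and derives the start and end of try_to_suspend; the function name parse_symbol is hypothetical.

```python
import re

# Matches 'name+0xOFFSET/0xSIZE' as printed in oops messages.
SYMBOL_RE = re.compile(
    r"(?P<name>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_symbol(text):
    """Return (function name, offset, size) from a 'PC is at ...' style line."""
    match = SYMBOL_RE.search(text)
    if match is None:
        raise ValueError(f"no symbol+offset/size found in {text!r}")
    return match["name"], int(match["offset"], 16), int(match["size"], 16)


name, offset, size = parse_symbol("PC is at try_to_suspend+0x38/0xe0")
pc = 0xC016AD38            # PC of frame #0 in the bt -f output
start = pc - offset        # first instruction of the function
end = start + size         # first address past the function

print(f"{name}: {start:08x}..{end:08x}, faulting instruction at +{offset:#x}")
```

Knowing the start address makes it easy to jump to the function in the disassembly and count forward to the faulting instruction.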
3. Disassemble vmlinux
arm-linux-androideabi-objdump -C -S vmlinux > vmlinux-dump
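Looking up address c016ad38 shows that the panic happened while executing the instruction below. From the kmsg we know r3 : 00000000 and r2 : 00000006, and storing to address 0x0 is clearly illegal.
c016ad38: 15832000 strne r2, [r3]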
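To see how the machine word 15832000 maps to strne r2, [r3], the fields of the instruction can be decoded by hand. The sketch below handles only one narrow case of the A32 encoding, an LDR/STR with an immediate offset and no writeback, which is all this instruction needs. The function name decode_single_transfer is hypothetical, and the register values are the ones printed in the kmsg.

```python
CONDITIONS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al"]


def decode_single_transfer(word):
    """Decode an A32 LDR/STR with a 12-bit immediate offset (offset addressing only)."""
    cond = (word >> 28) & 0xF
    if cond == 0xF:
        raise ValueError("unconditional instruction space is not handled")
    if (word >> 26) & 0b11 != 0b01 or (word >> 25) & 1:
        raise ValueError("not an immediate-offset LDR/STR")
    if not (word >> 24) & 1 or (word >> 21) & 1:
        raise ValueError("post-indexed or writeback forms are not handled")
    op = "ldr" if (word >> 20) & 1 else "str"   # L bit
    if (word >> 22) & 1:                        # B bit: byte access
        op += "b"
    imm = word & 0xFFF
    return {
        "cond": CONDITIONS[cond],
        "op": op,
        "rd": (word >> 12) & 0xF,               # register being stored or loaded
        "rn": (word >> 16) & 0xF,               # base address register
        "offset": imm if (word >> 23) & 1 else -imm,  # U bit selects add or subtract
    }


decoded = decode_single_transfer(0x15832000)
regs = {2: 0x00000006, 3: 0x00000000}           # r2 and r3 from the kmsg
target = regs[decoded["rn"]] + decoded["offset"]
value = regs[decoded["rd"]]

print(decoded)
print(f"{decoded['op']}{decoded['cond']} writes {value:#x} to address {target:#010x}")
```

The ne condition is satisfied when the Z flag is clear, and the kmsg line "Flags: NzCv" shows Z in lower case, meaning it was clear, so the store was actually executed.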
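The condition for executing *p = 6 is work->func != NULL. Register R0 holds the argument of try_to_suspend(), a struct work_struct *. To see why R0~R3 are used to pass function arguments, look up the APCS standard (since superseded by AAPCS).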
if (work->func != NULL)
*p = 6;
Running struct work_struct c0a5f02c recovers the struct work_struct as it was at that moment, and it clearly shows that work->func is not NULL.
crash-arm> struct work_struct c0a5f02c
struct work_struct {
data = {
counter = 0
},
entry = {
next = 0x0,
prev = 0xc0a5f034
},
func = 0xc0a5f034
}
The above is only a simple example for learning purposes. Panics encountered in real debugging will certainly not be this simple.
References:
http://blog.csdn.net/keyboardota/article/details/6799054
http://people.redhat.com/anderson/crash_whitepaper