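Android ships with a handy daemon for diagnosing abnormal program exits: debuggerd. It detects a crashing process and writes the state of that process at the time of the crash to a file and to the serial console, so that developers can analyze and debug the problem.
The data collected by debuggerd is saved under /data/tombstones/ (an apt name: a tombstone marks where a process died). At most 10 files are kept; once all 10 exist, the oldest file is overwritten. On the serial console, the same information is emitted as logcat output under the DEBUG tag.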
The output of debuggerd looks roughly like this:
I/DEBUG ( 9114): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG ( 9114): Build fingerprint: 'generic/gs701b/gs701b:4.0.3/IML74K/eng.andy.xia.20120827.120650:user/test-keys'
I/DEBUG ( 9114): pid: 11053, tid: 11065 >>> net.osaris.turbofly <<<
I/DEBUG ( 9114): signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 42771108
I/DEBUG ( 9114): zr 00000000 at 00000000 v0 5596dbd8 v1 41c65990
I/DEBUG ( 9114): a0 42771014 a1 5596dbd8 a2 42771014 a3 5596dbd8
I/DEBUG ( 9114): t0 5596dbd8 t1 53f6b000 t2 00000001 t3 000000a8
I/DEBUG ( 9114): t4 5596dc58 t5 00000080 t6 0000001c t7 000000e4
I/DEBUG ( 9114): s0 540d96f0 s1 41c65990 s2 00000015 s3 00000015
I/DEBUG ( 9114): s4 2c143d40 s5 4cf8bdf4 s6 2b8d0178 s7 2c13cdd8
I/DEBUG ( 9114): t8 0000000f t9 53f70748 k0 000000d8 k1 00000000
I/DEBUG ( 9114): gp 53f93d50 sp 4f7feaf8 s8 4f7feb68 ra 53f7334c
I/DEBUG ( 9114): hi 00000000 lo 01910000 bva 42771108 epc 53f73374
I/DEBUG ( 9114): #00 pc 53f73374 sp 4f7feaf8 /system/lib/egl/libGLESv1_CM_VIVANTE.so
I/DEBUG ( 9114): #01 pc 53f73570 sp 4f7feb28 /system/lib/egl/libGLESv1_CM_VIVANTE.so:glBindTexture+468
I/DEBUG ( 9114): #02 pc 2b76c16c sp 4f7feb58 /system/lib/libdvm.so:dvmPlatformInvoke+220
I/DEBUG ( 9114): #03 pc 41a59c00 sp 4f7feb70 /dev/ashmem/dalvik-LinearAlloc (deleted)
I/DEBUG ( 9114):
I/DEBUG ( 9114): code around pc:
I/DEBUG ( 9114): 53f73354 8fa70018 acf100f4 8e2600f8 8fa50018 ..........&.....
I/DEBUG ( 9114): 53f73364 aca600f8 8fa20018 8c4300f4 8c4400f8 ..........C...D.
I/DEBUG ( 9114): 53f73374 ac8200f4 ac6200f8 8fa30018 8fbf002c ......b.....,...
I/DEBUG ( 9114): 53f73384 00601021 8fb20028 8fb10024 8fb00020 !.`.(...$......
I/DEBUG ( 9114): 53f73394 03e00008 27bd0030 3c1c0002 279c09b4 ....0..'...<...'
I/DEBUG ( 9114):
I/DEBUG ( 9114): code around ra:
I/DEBUG ( 9114): 53f7332c 8fbc0010 04400013 00001821 8f8981b4 ......@.!.......
I/DEBUG ( 9114): 53f7333c 8fa50018 25396d30 0320f809 02002021 ....0m9%...! ..
I/DEBUG ( 9114): 53f7334c 8fa80018 ad120000 8fa70018 acf100f4 ................
I/DEBUG ( 9114): 53f7335c 8e2600f8 8fa50018 aca600f8 8fa20018 ..&.............
I/DEBUG ( 9114): 53f7336c 8c4300f4 8c4400f8 ac8200f4 ac6200f8 ..C...D.......b.
I/DEBUG ( 9114):
I/DEBUG ( 9114): memory map around addr 42771108:
I/DEBUG ( 9114): 4265e000-4266d000 /system/framework/ext.jar
I/DEBUG ( 9114): 4266d000-427da000 /system/framework/ext.odex
I/DEBUG ( 9114): 427da000-431d7000 /system/framework/framework.odex
I/DEBUG ( 9114):
I/DEBUG ( 9114): stack:
I/DEBUG ( 9114): 4f7feab8 00000002
I/DEBUG ( 9114): 4f7feabc 002a3bc0 [heap]
I/DEBUG ( 9114): 4f7feac0 4f7fe0a8
I/DEBUG ( 9114): 4f7feac4 002a52a0 [heap]
I/DEBUG ( 9114): 4f7feac8 00009004
I/DEBUG ( 9114): 4f7feacc 00000000
I/DEBUG ( 9114): 4f7fead0 00000000
I/DEBUG ( 9114): 4f7fead4 00000000
I/DEBUG ( 9114): 4f7fead8 540d96f0
I/DEBUG ( 9114): 4f7feadc 41c65990 /dev/ashmem/dalvik-LinearAlloc (deleted)
I/DEBUG ( 9114): 4f7feae0 53f93d50 /system/lib/egl/libGLESv2_VIVANTE.so
I/DEBUG ( 9114): 4f7feae4 00000015
I/DEBUG ( 9114): 4f7feae8 2c143d40 /dev/ashmem/dalvik-heap (deleted)
I/DEBUG ( 9114): 4f7feaec 540d96f0
I/DEBUG ( 9114): 4f7feaf0 41c65990 /dev/ashmem/dalvik-LinearAlloc (deleted)
I/DEBUG ( 9114): 4f7feaf4 53f7334c /system/lib/egl/libGLESv1_CM_VIVANTE.so
I/DEBUG ( 9114): #00 4f7feaf8 00000000
I/DEBUG ( 9114): 4f7feafc 0026aed0 [heap]
I/DEBUG ( 9114): 4f7feb00 00000de1
I/DEBUG ( 9114): 4f7feb04 4fedbc78 /system/lib/egl/libEGL_VIVANTE.so:veglGetCurrentAPIContext+36
I/DEBUG ( 9114): 4f7feb08 53f93d50 /system/lib/egl/libGLESv2_VIVANTE.so
I/DEBUG ( 9114): 4f7feb0c 4fedbc78 /system/lib/egl/libEGL_VIVANTE.so:veglGetCurrentAPIContext+36
I/DEBUG ( 9114): 4f7feb10 5596dbd8
I/DEBUG ( 9114): 4f7feb14 53f5467c /system/lib/egl/libGLESv1_CM_VIVANTE.so:glBindBuffer+56
I/DEBUG ( 9114): 4f7feb18 00000de1
I/DEBUG ( 9114): 4f7feb1c 00000000
I/DEBUG ( 9114): 4f7feb20 540d8f18
I/DEBUG ( 9114): 4f7feb24 53f73570 /system/lib/egl/libGLESv1_CM_VIVANTE.so:glBindTexture+468
I/DEBUG ( 9114): #01 4f7feb28 53f93d50 /system/lib/egl/libGLESv2_VIVANTE.so
I/DEBUG ( 9114): 4f7feb2c 0026ef70 [heap]
I/DEBUG ( 9114): 4f7feb30 00000001
I/DEBUG ( 9114): 4f7feb34 00000000
I/DEBUG ( 9114): 4f7feb38 53f93d50 /system/lib/egl/libGLESv2_VIVANTE.so
I/DEBUG ( 9114): 4f7feb3c 2c143d40 /dev/ashmem/dalvik-heap (deleted)
I/DEBUG ( 9114): 4f7feb40 4cf8be2c
I/DEBUG ( 9114): 4f7feb44 0026ef70 [heap]
I/DEBUG ( 9114): 4f7feb48 00000001
I/DEBUG ( 9114): 4f7feb4c 00000000
I/DEBUG ( 9114): 4f7feb50 0026ef60 [heap]
I/DEBUG ( 9114): 4f7feb54 2b76c16c /system/lib/libdvm.so:dvmPlatformInvoke+220
I/DEBUG ( 9114): #02 4f7feb58 02320000
I/DEBUG ( 9114): 4f7feb5c 01910000
I/DEBUG ( 9114): 4f7feb60 00000de1
I/DEBUG ( 9114): 4f7feb64 00000015
I/DEBUG ( 9114): 4f7feb68 2b8d6530 /system/lib/libGLESv1_CM.so
I/DEBUG ( 9114): 4f7feb6c 41a59c00 /dev/ashmem/dalvik-LinearAlloc (deleted)
I/DEBUG ( 9114): 4f7feb70 41a59c00 /dev/ashmem/dalvik-LinearAlloc (deleted)
I/DEBUG ( 9114): 4f7feb74 00000001
I/DEBUG ( 9114): 4f7feb78 00000014
I/DEBUG ( 9114): 4f7feb7c 2b7d5328 /system/lib/libdvm.so
I/DEBUG ( 9114): 4f7feb80 2b8d6530 /system/lib/libGLESv1_CM.so
I/DEBUG ( 9114): 4f7feb84 2ab2126c /system/lib/libc.so
I/DEBUG ( 9114): 4f7feb88 00000002
I/DEBUG ( 9114): 4f7feb8c 00000033
I/DEBUG ( 9114): 4f7feb90 4cf8bdf4
I/DEBUG ( 9114): 4f7feb94 42e38a35 /system/framework/framework.odex
I/DEBUG ( 9114): 4f7feb98 2ac78884 /system/lib/libandroid_runtime.so
I/DEBUG ( 9114): 4f7feb9c 0026ef70 [heap]
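From this data we can read off the following information:
Build version (the build fingerprint)
The crashing process and thread (pid, tid and process name)
The cause of the error (signal number, code and fault address)
Register contents
The call stack (backtrace)
Memory dumps around key locations (code around pc and ra, memory map around the fault address)
Stack frame contents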
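As an illustration of how the "cause of the error" line can be picked apart, here is a small C sketch that parses the signal record from one log line. The struct crash_reason type and the parse_signal_line function are hypothetical names introduced for this example; the line format is taken from the sample output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of the "signal ..." line. */
struct crash_reason {
    int signo;                /* e.g. 11 */
    char signame[16];         /* e.g. "SIGSEGV" */
    int code;                 /* e.g. 2 */
    char codename[24];        /* e.g. "SEGV_ACCERR" */
    unsigned long fault_addr; /* e.g. 0x42771108 */
};

/* Returns 0 on success, -1 if the line holds no complete signal record. */
int parse_signal_line(const char *line, struct crash_reason *out)
{
    assert(line != NULL && out != NULL);

    /* Skip the logcat prefix ("I/DEBUG ( 9114): ") by searching for the record. */
    const char *p = strstr(line, "signal ");
    if (p == NULL) {
        return -1;
    }
    int n = sscanf(p, "signal %d (%15[^)]), code %d (%23[^)]), fault addr %lx",
                   &out->signo, out->signame, &out->code, out->codename,
                   &out->fault_addr);
    return n == 5 ? 0 : -1;
}
```

For the sample log, this yields signal 11 (SIGSEGV) with code 2 (SEGV_ACCERR) and fault address 0x42771108, which matches the bva register in the dump.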
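The Linux kernel has its own signal mechanism. When an application crashes, the kernel usually sends a signal to the faulting process to tell it what went wrong, and the process may catch that signal and handle it. For fatal signals, the usual outcome is that the process exits.
Android builds a more useful feature on top of this mechanism: it intercepts these signals and dumps the process state for debugging.
When a new process starts, Android calls debugger_init in it to intercept the signals that indicate a fatal error: SIGILL, SIGABRT, SIGBUS, SIGFPE, SIGSEGV and SIGPIPE (plus SIGSTKFLT where it is defined). The code is in bionic/linker/debugger.c: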
void debugger_init()
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = debugger_signal_handler;
    act.sa_flags = SA_RESTART | SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    sigaction(SIGILL, &act, NULL);
    sigaction(SIGABRT, &act, NULL);
    sigaction(SIGBUS, &act, NULL);
    sigaction(SIGFPE, &act, NULL);
    sigaction(SIGSEGV, &act, NULL);
#if defined(SIGSTKFLT)
    sigaction(SIGSTKFLT, &act, NULL);
#endif
    sigaction(SIGPIPE, &act, NULL);
}
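debugger_init is called from __linker_init, which runs after the program entry point __start. Because this is part of bionic, it applies to every Android program. (Android differs here from a traditional glibc-based Linux system: the glibc interpreter is /lib/ld-linux-xx.so.2, while the Android interpreter is /system/bin/linker.)
The handler for the intercepted signals is: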
/*
 * Catches fatal signals so we can ask debuggerd to ptrace us before we crash.
 */
void debugger_signal_handler(int n, siginfo_t* info, void* unused __attribute__((unused)))
{
    char msgbuf[128];
    unsigned tid;
    int s;
    /*
     * It's possible somebody cleared the SA_SIGINFO flag, which would mean
     * our "info" arg holds an undefined value.
     */
    if (!haveSiginfo(n)) {
        info = NULL;
    }
    logSignalSummary(n, info);
    tid = gettid();
    s = socket_abstract_client(DEBUGGER_SOCKET_NAME, SOCK_STREAM);
    // #define DEBUGGER_SOCKET_NAME "android:debuggerd"
    if (s >= 0) {
        /* debugger knows our pid from the credentials on the
         * local socket but we need to tell it our tid. It
         * is paranoid and will verify that we are giving a tid
         * that's actually in our process
         */
        int ret;
        debugger_msg_t msg;
        msg.action = DEBUGGER_ACTION_CRASH;
        msg.tid = tid;
        RETRY_ON_EINTR(ret, write(s, &msg, sizeof(msg)));
        if (ret == sizeof(msg)) {
            /* if the write failed, there is no point to read on
             * the file descriptor. */
            RETRY_ON_EINTR(ret, read(s, &tid, 1));
            int savedErrno = errno;
            notify_gdb_of_libraries();
            errno = savedErrno;
        }
        if (ret < 0) {
            /* read or write failed -- broken connection? */
            format_buffer(msgbuf, sizeof(msgbuf),
                "Failed while talking to debuggerd: %s", strerror(errno));
            __libc_android_log_write(ANDROID_LOG_FATAL, "libc", msgbuf);
        }
        close(s);
    } else {
        /* socket failed; maybe process ran out of fds */
        format_buffer(msgbuf, sizeof(msgbuf),
            "Unable to open connection to debuggerd: %s", strerror(errno));
        __libc_android_log_write(ANDROID_LOG_FATAL, "libc", msgbuf);
    }
    /* remove our net so we fault for real when we return */
    signal(n, SIG_DFL);
    /*
     * These signals are not re-thrown when we resume. This means that
     * crashing due to (say) SIGPIPE doesn't work the way you'd expect it
     * to. We work around this by throwing them manually. We don't want
     * to do this for *all* signals because it'll screw up the address for
     * faults like SIGSEGV.
     */
    switch (n) {
        case SIGABRT:
        case SIGFPE:
        case SIGPIPE:
#ifdef SIGSTKFLT
        case SIGSTKFLT:
#endif
            (void) tgkill(getpid(), gettid(), n);
            break;
        default: // SIGILL, SIGBUS, SIGSEGV
            break;
    }
}
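As the handler code shows, the crashing process acts as a socket client: it sends a message to the socket named android:debuggerd, and the payload carries the tid, that is, the ID of the faulting thread.
The process then blocks, waiting for the server side of the socket, debuggerd, to handle the event.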
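The "send the tid, then block until the server answers" handshake can be reproduced with a socketpair and a forked child. This is a sketch under stated assumptions rather than the real protocol: struct crash_msg is a hypothetical stand-in for debugger_msg_t (whose layout is not shown here), a socketpair replaces the abstract android:debuggerd socket, and the child's pid is used as its tid because the child is single-threaded.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for debugger_msg_t; the real layout is not shown. */
struct crash_msg {
    int32_t action;
    int32_t tid;
};

/*
 * Run one handshake between a forked "crashing" child and the parent, which
 * plays the role of debuggerd. Returns 0 if the parent received the child's
 * id and the child was released by the one-byte reply, -1 otherwise.
 */
int run_handshake(void)
{
    int sv[2];
    struct crash_msg got;
    int status;
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    pid_t child = fork();
    if (child < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (child == 0) {
        /* Client side: report our id, then block until the server answers. */
        struct crash_msg msg = { .action = 0, .tid = (int32_t) getpid() };
        char ack;
        close(sv[0]);
        if (write(sv[1], &msg, sizeof(msg)) != (ssize_t) sizeof(msg)) {
            _exit(1);
        }
        if (read(sv[1], &ack, 1) != 1) {   /* blocks, like the handler does */
            _exit(2);
        }
        _exit(0);
    }
    assert(child > 0);   /* only the parent reaches this point */

    /* Server side: read the request, then release the client with one byte. */
    close(sv[1]);
    if (read(sv[0], &got, sizeof(got)) == (ssize_t) sizeof(got) &&
        got.tid == (int32_t) child &&
        write(sv[0], "\0", 1) == 1) {
        result = 0;
    }
    close(sv[0]);        /* on failure this also unblocks the child's read */
    if (waitpid(child, &status, 0) != child ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        result = -1;
    }
    return result;
}
```

The point of the blocking read is ordering: the client cannot proceed to its real crash until the server has had a chance to act on the request.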
The debuggerd daemon is the service that actually generates the tombstone when a process dies. Its code is in system/core/debuggerd/debuggerd.c. Its main function is the server side of android:debuggerd:
    s = socket_local_server(DEBUGGER_SOCKET_NAME,
            ANDROID_SOCKET_NAMESPACE_ABSTRACT, SOCK_STREAM);
    if (s < 0) return 1;
    fcntl(s, F_SETFD, FD_CLOEXEC);
    LOG("debuggerd: " __DATE__ " " __TIME__ "\n");
    for (;;) {
        struct sockaddr addr;
        socklen_t alen;
        int fd;
        alen = sizeof(addr);
        XLOG("waiting for connection\n");
        fd = accept(s, &addr, &alen);
        if (fd < 0) {
            XLOG("accept failed: %s\n", strerror(errno));
            continue;
        }
        fcntl(fd, F_SETFD, FD_CLOEXEC);
        handle_request(fd);
    }
    return 0;
}
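When a process hits a fatal signal, the debugger_signal_handler described above sends a message to debuggerd over the socket. Here accept returns the new connection, and handle_request(fd) processes the crash. handle_request first calls read_request(fd, &request) to obtain the identity of the peer on the other end of the socket: its pid, uid and gid. It then reads from the socket the tid sent by debugger_signal_handler. At this point debuggerd knows everything it needs about the process to be debugged.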
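On Linux, the usual way for a server to learn the pid, uid and gid of a local-socket peer is the SO_PEERCRED socket option. The sketch below assumes read_request does something along these lines; peer_credentials is a hypothetical helper name, not a function from debuggerd.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes struct ucred on glibc */
#endif
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill *cr with the pid/uid/gid of the peer of a connected AF_UNIX socket.
 * The kernel supplies these values, so the peer cannot forge them.
 * Returns 0 on success, -1 on failure.
 */
int peer_credentials(int fd, struct ucred *cr)
{
    socklen_t len = sizeof(*cr);

    assert(cr != NULL);
    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, cr, &len) != 0) {
        return -1;
    }
    return len == sizeof(*cr) ? 0 : -1;
}
```

Because the credentials come from the kernel while the tid comes from the client's message, the server still has to check that the reported tid really belongs to the reported pid, as the comment in the handler code points out.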
    for (;;) {
        int signal = wait_for_signal(request.tid, &total_sleep_time_usec);
        if (signal < 0) {
            break;
        }
        switch (signal) {
        case SIGSTOP:
            if (request.action == DEBUGGER_ACTION_DUMP_TOMBSTONE) {
                XLOG("stopped -- dumping to tombstone\n");
                tombstone_path = engrave_tombstone(request.pid, request.tid,
                        signal, true, true, &detach_failed,
                        &total_sleep_time_usec);
            } else if (request.action == DEBUGGER_ACTION_DUMP_BACKTRACE) {
                XLOG("stopped -- dumping to fd\n");
                dump_backtrace(fd, request.pid, request.tid, &detach_failed,
                        &total_sleep_time_usec);
            } else {
                XLOG("stopped -- continuing\n");
                status = ptrace(PTRACE_CONT, request.tid, 0, 0);
                if (status) {
                    LOG("ptrace continue failed: %s\n", strerror(errno));
                }
                continue; /* loop again */
            }
            break;
        case SIGILL:
        case SIGABRT:
        case SIGBUS:
        case SIGFPE:
        case SIGSEGV:
        case SIGPIPE:
#ifdef SIGSTKFLT
        case SIGSTKFLT:
#endif
            {
            XLOG("stopped -- fatal signal\n");
            /*
             * Send a SIGSTOP to the process to make all of
             * the non-signaled threads stop moving. Without
             * this we get a lot of "ptrace detach failed:
             * No such process".
             */
            kill(request.pid, SIGSTOP);
            /* don't dump sibling threads when attaching to GDB because it
             * makes the process less reliable, apparently... */
            tombstone_path = engrave_tombstone(request.pid, request.tid,
                    signal, !attach_gdb, false, &detach_failed,
                    &total_sleep_time_usec);
            break;
            }
        default:
            XLOG("stopped -- unexpected signal\n");
            LOG("process stopped due to unexpected signal %d\n", signal);
            break;
        }
        break;
    }
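debuggerd reads the tid sent by the client. The tid identifies the thread that hit the error, and debuggerd dumps that thread's registers, backtrace and stack for debugging. With ptrace(PTRACE_ATTACH, request.tid, 0, 0), debuggerd attaches to the faulting thread and can then control it. Once attached, debuggerd is treated much like the parent of the traced thread (its stops are reported to debuggerd through wait), and PTRACE_ATTACH sends SIGSTOP to the tracee. Recall that the target's signal handler is blocked in read on the socket, waiting for debuggerd to respond. The write in TEMP_FAILURE_RETRY(write(fd, "\0", 1)) != 1 gives that read its data and ends the wait; after that, a ptrace(PTRACE_CONT, request.tid, 0, 0) lets the traced thread continue running.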
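The attach step described above can be observed with a few lines of C. This is a sketch only: attach_and_observe is a hypothetical helper, it detaches immediately instead of dumping anything, and whether PTRACE_ATTACH is permitted depends on system policy (it may fail with EPERM in restricted environments).

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Attach to `pid`, wait for the stop that PTRACE_ATTACH triggers, then detach
 * so the target keeps running. Returns the signal that stopped the target
 * (normally SIGSTOP), or -1 on failure. If the attach itself fails, errno is
 * the one set by ptrace().
 */
int attach_and_observe(pid_t pid)
{
    int status;

    assert(pid > 0);
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) != 0) {
        return -1;
    }
    /* The tracer is notified through waitpid(), as if it were the parent.
     * __WALL also covers targets that are threads rather than processes. */
    if (waitpid(pid, &status, __WALL) != pid || !WIFSTOPPED(status)) {
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return -1;
    }
    int sig = WSTOPSIG(status);
    /* Detach without delivering a signal; the target resumes. */
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return sig;
}
```

A tracer that wanted to keep control, as debuggerd does, would read registers and memory between the waitpid and the detach, or resume the target with PTRACE_CONT instead of detaching.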
signal = wait_for_signal(request.tid, &total_sleep_time_usec); checks the signal state of the traced thread being waited on, and the switch statement dispatches on the result.
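This is the core of debuggerd: engrave_tombstone generates the debugging information that makes up the tombstone.
The overall workflow of debuggerd is shown in the figure below:
A brief introduction to the MIPS stack frame layout
Problem analysis and how to locate the fault
Useful information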
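Besides producing a tombstone when a process crashes, debuggerd can also help us debug that process. To use this feature:
On the serial console, set the uid of the application to debug; if that program then crashes, gdb can be attached to it.
Android uid rule: apps started by zygote get uids starting at 10000. For example, if ps shows an app running as app_23, run setprop debug.db.uid 10023 to debug that process.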
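The mapping from the name shown by ps to the uid is simple arithmetic, sketched here in C. The helper name app_name_to_uid and the constant FIRST_APP_UID are introduced for this example only.

```c
#include <assert.h>
#include <stdio.h>

#define FIRST_APP_UID 10000   /* first uid given to apps, per the rule above */

/*
 * Convert a ps-style user name such as "app_23" to its numeric uid.
 * Returns -1 if the name does not have the form app_<number>.
 */
int app_name_to_uid(const char *name)
{
    int index;
    char trailing;

    assert(name != NULL);
    /* Exactly one conversion must succeed: the number, with nothing after it. */
    if (sscanf(name, "app_%d%c", &index, &trailing) != 1 || index < 0) {
        return -1;
    }
    return FIRST_APP_UID + index;
}
```

For app_23 this gives 10023, the value passed to setprop debug.db.uid in the example above.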
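Some libraries may carry no symbol information, so the trace printed in the tombstone makes it hard to identify the failing function. Commenting out the relevant setting in that library module's build options and rebuilding yields a library with symbol information.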