引起ANR问题的根本原因,总的来说可以归纳为两类:
- 应用进程自身引起的,例如:
主线程阻塞、挂起、死循环
应用进程的其他线程的CPU占用率高,使得主线程无法抢占到CPU时间片
2.其他进程间接引起的,例如:
当前应用进程进行进程间通信请求其他进程,其他进程的操作长时间没有反馈
其他进程的CPU占用率高,使得当前应用进程无法抢占到CPU时间片
分析ANR问题时,以上述可能的2种原因为线索,通过分析各种日志信息,大多数情况下你就可以很容易找到问题所在了。
从log分析
在BottomBarManager里, 当点击find_in_page button时, sleep 10秒,模拟一个ANR。
用"ActivityManager"作为filter tag, 可以得到下面的信息。(btw, ultraedit有个filter 字符串的功能非常好, 看看linux下有没有类似的功能?)
可以得到ANR发生在哪个进程, reason, 在ANR发生前一段时间和发生后的一段时间, CPU占用率的统计信息.
06-02 11:39:42.260 743-943/? E/ActivityManager﹕ ANR in com.qihoo.browser (com.qihoo.browser/org.chromium.
chrome.browser.ChromeTabbedActivity)
Reason: keyDispatchingTimedOut
//CPU前一分钟、五分钟、十五分钟的CPU平均负载,
//CPU平均负载可以理解为一段时间内正在使用和等待使用CPU的活动进程的平均数量。
Load: 5.16 / 9.69 / 30.66
//请注意ago,表示ANR发生之前的一段时间内的CPU使用率,并不是某一时刻的值
CPU usage from 34388ms to -1ms ago:
4.1% 32614/com.qihoo.browser: 2.5% user + 1.6% kernel / faults: 465 minor 1 major
3% 379/adbd: 0.2% user + 2.8% kernel / faults: 1653 minor
2.6% 743/system_server: 1.8% user + 0.7% kernel / faults: 689 minor
2.5% 2326/com.qihoo.daemon: 2.2% user + 0.2% kernel / faults: 601 minor
2.4% 1009/RX_Thread: 0% user + 2.4% kernel
2% 280/surfaceflinger: 1.2% user + 0.8% kernel / faults: 1361 minor
1.8% 2675/com.lbe.security:service: 1.7% user + 0.1% kernel / faults: 749 minor
1.1% 1768/mpdecision: 0.1% user + 1% kernel
0.9% 29165/kworker/0:2: 0% user + 0.9% kernel
0.7% 23576/com.qihoo.appstore: 0.5% user + 0.1% kernel / faults: 1401 minor
0.5% 28500/kworker/0:0: 0% user + 0.5% kernel
0.4% 1055/com.android.systemui: 0.3% user + 0% kernel / faults: 48 minor
0.4% 6850/kworker/u:1: 0% user + 0.4% kernel
0.3% 23077/com.qihoo.appstore:PluginP01: 0.2% user + 0.1% kernel / faults: 200 minor
0.2% 271/mediaserver: 0.2% user + 0% kernel / faults: 46 minor
0.3% 6883/kworker/u:37: 0% user + 0.3% kernel
0.2% 302/illumination_service: 0% user + 0.2% kernel
0.2% 134/mmcqd/0: 0% user + 0.2% kernel
0.1% 32727/com.qihoo.browser:privileged_process0: 0% user + 0% kernel / faults: 288 minor
0.1% 2885/com.qihoo.browser:sandboxed_process1: 0% user + 0% kernel
0.1% 3378/com.sonyericsson.playnowchina.android: 0.1% user + 0% kernel
0.1% 2027/com.cleanmaster.mguard_cn:service: 0% user + 0.1% kernel / faults: 3 minor
0.1% 1008/TX_Thread: 0% user + 0.1% kernel
0.1% 28341/com.sonyericsson.dlna: 0.1% user + 0% kernel
0% 130/cfinteractive: 0% user + 0% kernel
0% 293/ric: 0% user + 0% kernel / faults: 93 minor
0% 2468/com.qihoo360.mobilesafe.clean: 0% user + 0% kernel / faults: 45 minor
0% 4861/com.tencent.mm: 0% user + 0% kernel / faults: 9 minor
0% 127/irq/294-clearpa: 0% user + 0% kernel
0% 281/zygote: 0% user + 0% kernel / faults: 129 minor
0% 1007/MC_Thread: 0% user + 0% kernel
0% 2648/com.qihoo.appstore_CoreDaemon: 0% user + 0% kernel
0% 29640/com.lbe.security: 0% user + 0% kernel / faults: 1 minor
0% 155/jbd2/mmcblk0p26: 0% user + 0% kernel
0% 170/flush-179:0: 0% user + 0% kernel
0% 269/servicemanager: 0% user + 0% kernel
0% 276/system_monitor: 0% user + 0% kernel
0% 291/qrngd: 0% user + 0% kernel
0% 365/sdcard: 0% user + 0% kernel
0% 1041/wpa_supplicant: 0% user + 0% kernel
0% 1360/com.android.nfc: 0% user + 0% kernel
0% 2958/lbesec.loader: 0% user + 0% kernel / faults: 14 minor
0% 5128/com.tencent.mm:push: 0% user + 0% kernel / faults: 2 minor
0% 6028/com.UCMobile:SearchBackgroundProcess: 0% user + 0% kernel
+0% 3318/com.sonyericsson.textinput.chinese: 0% user + 0% kernel
+0% 3544/sh: 0% user + 0% kernel
+0% 3546/logcat: 0% user + 0% kernel
+0% 3682/migration/1: 0% user + 0% kernel
+0% 3683/kworker/1:0: 0% user + 0% kernel
+0% 3684/ksoftirqd/1: 0% user + 0% kernel
9.7% TOTAL: 4.8% user + 4.2% kernel + 0.3% iowait + 0.3% softirq
//这里是later,表示ANR发生之后
CPU usage from 1656ms to 2187ms later:
8.7% 743/system_server: 0% user + 8.7% kernel / faults: 4 minor
7% 943/InputDispatcher: 0% user + 7% kernel
1.7% 1199/Binder_6: 0% user + 1.7% kernel
5.2% 379/adbd: 0% user + 5.2% kernel / faults: 27 minor
3.5% 379/adbd: 0% user + 3.5% kernel
1.7% 1009/RX_Thread: 0% user + 1.7% kernel
1.7% 1768/mpdecision: 0% user + 1.7% kernel
1.7% 1784/mpdecision: 0% user + 1.7% kernel
1.2% 6883/kworker/u:37: 0% user + 1.2% kernel
1.3% 29165/kworker/0:2: 0% user + 1.3% kernel
2.3% TOTAL: 0.1% user + 0.3% kernel + 1.8% iowait
06-02 12:00:15.753 743-825/? I/ActivityManager﹕ Killing ProcessRecord{4190b0d8 11880:com.qihoo.browser/u0a10192}: user's request
06-02 12:00:15.813 743-1332/? I/ActivityManager﹕ Process com.qihoo.browser (pid 11880) has died.
用"Proces"作为filter tag,可以看到
//向浏览器进程(11880)发送signal3, 也就是SIGNAL_QUIT.
06-02 12:00:10.889 743-943/? I/Process﹕ Sending signal. PID: 11880 SIG: 3
用"dalvikvm"作为filter tag,可以看到
com.qihoo.browser进程处理SIGNAL_QUIT signal, 并把anr信息写入traces.txt文件.
进程的虚拟机实例接收到SIGNAL_QUIT信号后会由"Signal Catcher"线程将进程中各个线程的函数堆栈信息输出到traces.txt文件中.
值得注意的是, 当前运行的其他进程也会把当前的函数堆栈信息输出到traces.txt文件中, 但发生ANR的进程正常情况下会第一个输出,所以一般情况下我们只看traces.txt的开头就行了。
06-02 11:51:31.252 8857-8862/com.qihoo.browser I/dalvikvm﹕ threadid=3: reacting to signal 3
06-02 11:51:31.422 8857-8862/com.qihoo.browser I/dalvikvm﹕ Wrote stack traces to '/data/anr/traces.txt'
06-02 11:51:31.422 743-748/? I/dalvikvm﹕ threadid=3: reacting to signal 3
06-02 11:51:32.243 1275-1280/? I/dalvikvm﹕ Wrote stack traces to '/data/anr/traces.txt'
06-02 11:51:32.253 1282-1287/? I/dalvikvm﹕ threadid=3: reacting to signal 3
06-02 11:51:32.283 1282-1287/? I/dalvikvm﹕ Wrote stack traces to '/data/anr/traces.txt'
06-02 11:51:32.283 1348-1353/? I/dalvikvm﹕ threadid=3: reacting to signal 3
06-02 11:51:32.283 1360-1365/? I/dalvikvm﹕ Wrote stack traces to '/data/anr/traces.txt'
从以上的分析能够知道, 发生ANR时Android为我们提供了两种“利器”:traces文件和CPU使用率。
分析CPU负载和CPU使用率
这部分讲的还不是很清楚, 就先不做总结了, 以后有需要再系统的学习这小节.
从traces.txt分析
只需要看开头,每次发生ANR, 这个文件都会被清空,写入新的内容. 如果想查看以前发生ANR的信息, 可以去查看DB文件.
----- pid 11880 at 2016-06-02 12:00:10 -----
Cmd line: com.qihoo.browser
DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)
"main" prio=5 tid=1 TIMED_WAIT
| group="main" sCount=1 dsCount=0 obj=0x41458b50 self=0x414481f0
| sysTid=11880 nice=-6 sched=0/0 cgrp=apps handle=1075152348
| state=S schedstat=( 8455108583 3379699755 13157 ) utm=664 stm=181 core=0
at java.lang.VMThread.sleep(Native Method)
at java.lang.Thread.sleep(Thread.java:1031)
at java.lang.Thread.sleep(Thread.java:1013)
at com.qihoo.browser.BottomBarManager.handlePopupMenuAction(BottomBarManager.java:1220)
at com.qihoo.browser.BottomBarManager.actionPerformed(BottomBarManager.java:629)
at com.qihoo.browser.view.PopupMenu.onClick(PopupMenu.java:514)
at android.view.View.performClick(View.java:4278)
at android.view.View$PerformClick.run(View.java:17430)
at android.os.Handler.handleCallback(Handler.java:725)
at android.os.Handler.dispatchMessage(Handler.java:92)
at android.os.Looper.loop(Looper.java:213)
at android.app.ActivityThread.main(ActivityThread.java:5092)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:511)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:797)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:564)
at dalvik.system.NativeStart.main(Native Method)
"pool-9-thread-1" prio=5 tid=52 WAIT
| group="main" sCount=1 dsCount=0 obj=0x41e3a6b8 self=0x7ce144e0
| sysTid=12261 nice=0 sched=0/0 cgrp=apps handle=2008331720
| state=S schedstat=( 37597659 95306395 149 ) utm=2 stm=1 core=1
at java.lang.Object.wait(Native Method)
- waiting on <0x41e3a7e0> (a java.lang.VMThread) held by tid=52 (pool-9-thread-1)
at java.lang.Thread.parkFor(Thread.java:1231)
at sun.misc.Unsafe.park(Unsafe.java:323)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:159)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2019)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:413)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1013)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1073)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:573)
at java.lang.Thread.run(Thread.java:856)
//“Signal Catcher”负责接收和处理kernel发送的各种信号,例如SIGNAL_QUIT、SIGNAL_USR1等就是被该线程
//接收到并处理的,traces.txt 文件中的内容就是由该线程负责输出的,可以看到它的状态是RUNNABLE.
"Signal Catcher" daemon prio=5 tid=3 RUNNABLE
| group="system" sCount=0 dsCount=0 obj=0x41735a58 self=0x6256b008
| sysTid=11885 nice=0 sched=0/0 cgrp=apps handle=1074396464
| state=R schedstat=( 41015624 121002197 61 ) utm=1 stm=3 core=0
at dalvik.system.NativeStart.run(Native Method)
//JDWP线程是支持虚拟机调试的线程,不需要关心
"JDWP" daemon prio=5 tid=4 VMWAIT
| group="system" sCount=1 dsCount=0 obj=0x41735b50 self=0x72d47008
| sysTid=11886 nice=0 sched=0/0 cgrp=apps handle=1074396224
| state=S schedstat=( 3051760 9521482 15 ) utm=0 stm=0 core=1
#00 pc 000171ec /system/lib/libc.so (select+20)
#01 pc 00062faf /system/lib/libdvm.so
#02 pc 00065b41 /system/lib/libdvm.so
#03 pc 0005637b /system/lib/libdvm.so
#04 pc 0000e4b8 /system/lib/libc.so (__thread_entry+72)
#05 pc 0000dba4 /system/lib/libc.so (pthread_create+160)
at dalvik.system.NativeStart.run(Native Method)
线程有很多状态,了解这些状态的意义对分析ANR的原因是有帮助的.
DropBox中的log
traces.txt只保留最后一次发生ANR时的信息, android 2.2开始增加了DropBox功能, 保留历史上发生的所有ANR的log.
“/data/system/dropbox”是DB指定的文件存放位置.
日志保存的最长时间, 默认是3天.
root@android:/data/system/dropbox # ls
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
root@android:/data/system/dropbox #
SystemServer在启动时, 会创建并添加DROPBOX_SERVICE.
//SystemServer.java
private void startOtherServices() {
... ...
try {
Slog.i(TAG, "DropBox Service");
ServiceManager.addService(Context.DROPBOX_SERVICE,
new DropBoxManagerService(context, new File("/data/system/dropbox")));
} catch (Throwable e) {
reportWtf("starting DropBoxManagerService", e);
}
... ...
}
framework是如何输出ANR信息到traces.txt文件中去的
ActivityManagerService.java中的appNotResponding()方法.
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
ActivityRecord parent, boolean aboveSystem, final String annotation) {
//firstPids和lastPids两个集合存放那些将会在traces中输出信息的进程的进程号
ArrayList firstPids = new ArrayList(5);
SparseArray lastPids = new SparseArray(20);
//mController是IActivityController接口的实例,是为Monkey测试程序预留的,默认为null
if (mController != null) {
try {
// 0 == continue, -1 = kill process immediately
int res = mController.appEarlyNotResponding(app.processName, app.pid, annotation);
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
}
} catch (RemoteException e) {
mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
long anrTime = SystemClock.uptimeMillis();
if (MONITOR_CPU_USAGE) {
updateCpuStatsNow(); //更新CPU使用率
}
synchronized (this) {
//一些特定条件下会忽略ANR
// PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
if (mShuttingDown) {
Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation);
return;
} else if (app.notResponding) {
Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation);
return;
} else if (app.crashing) {
Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation);
return;
}
// In case we come through here for the same app before completing
// this one, mark as anring now so we will bail out.
app.notResponding = true;
// Log the ANR to the event log.
EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
app.processName, app.info.flags, annotation);
//当前发生ANR的应用进程被第一个添加进firstPids集合中.
//所以会第一个向traces文件中写入信息。反过来说,traces文件中出现的第一个进程正常情况下就是发生ANR的那个进程。
// Dump thread traces as quickly as we can, starting with "interesting" processes.
firstPids.add(app.pid);
int parentPid = app.pid;
if (parent != null && parent.app != null && parent.app.pid > 0) parentPid = parent.app.pid;
if (parentPid != app.pid) firstPids.add(parentPid);
if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
for (int i = mLruProcesses.size() - 1; i >= 0; i--) {
ProcessRecord r = mLruProcesses.get(i);
if (r != null && r.thread != null) {
int pid = r.pid;
if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
if (r.persistent) {
firstPids.add(pid);
} else {
lastPids.put(pid, Boolean.TRUE);
}
}
}
}
}
// Log the ANR to the main log.
StringBuilder info = new StringBuilder();
info.setLength(0);
info.append("ANR in ").append(app.processName);
if (activity != null && activity.shortComponentName != null) {
info.append(" (").append(activity.shortComponentName).append(")");
}
info.append("\n");
info.append("PID: ").append(app.pid).append("\n");
if (annotation != null) {
info.append("Reason: ").append(annotation).append("\n");
}
if (parent != null && parent != activity) {
info.append("Parent: ").append(parent.shortComponentName).append("\n");
}
final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
//dumpStackTraces()是输出ANR信息到traces文件的函数
File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,
NATIVE_STACKS_OF_INTEREST);
String cpuInfo = null;
if (MONITOR_CPU_USAGE) {
updateCpuStatsNow();
synchronized (mProcessCpuTracker) {
cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);
}
//输出ANR发生前一段时间内的CPU使用率
info.append(processCpuTracker.printCurrentLoad());
info.append(cpuInfo);
}
//输出ANR发生后一段时间内的CPU使用率
info.append(processCpuTracker.printCurrentState(anrTime));
//此处, 把ANR的信息输出到main log中.
Slog.e(TAG, info.toString());
if (tracesFile == null) {
// There is no trace file, so dump (only) the alleged culprit's threads to the log
Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
}
//将ANR信息同时输出到DropBox中
addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
cpuInfo, tracesFile, null);
//在Android4.0中可以设置是否不显示ANR提示对话框,如果设置的话就不会显示对话框,并且会杀掉ANR进程
// Unless configured otherwise, swallow ANRs in background processes & kill the process.
boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
synchronized (this) {
mBatteryStatsService.noteProcessAnr(app.processName, app.uid);
if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) {
app.kill("bg anr", true);
return;
}
// Set the app's notResponding state, and look up the errorReportReceiver
makeAppNotRespondingLocked(app,
activity != null ? activity.shortComponentName : null,
annotation != null ? "ANR " + annotation : "ANR",
info.toString());
// 显示ANR提示对话框
// Bring up the infamous App Not Responding dialog
Message msg = Message.obtain();
HashMap map = new HashMap();
msg.what = SHOW_NOT_RESPONDING_MSG;
msg.obj = map;
msg.arg1 = aboveSystem ? 1 : 0;
map.put("app", app);
if (activity != null) {
map.put("activity", activity);
}
mUiHandler.sendMessage(msg);
}
}
public static File dumpStackTraces(boolean clearTraces, ArrayList firstPids,
ProcessCpuTracker processCpuTracker, SparseArray lastPids, String[] nativeProcs) {
//系统属性“dalvik.vm.stack-trace-file”用来配置trace信息输出文件
//之所以trace信息会输出到“/data/anr/traces.txt”文件中,就是系统属性“dalvik.vm.stack-trace-file”设置的
//adb shell下, 可以使用setprop和getprop对系统属性进行设置和读取
//getprop dalvik.vm.stack-trace-file
//setprop dalvik.vm.stack-trace-file /tmp/stack-traces.txt
String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
if (tracesPath == null || tracesPath.length() == 0) {
return null;
}
File tracesFile = new File(tracesPath);
try {
File tracesDir = tracesFile.getParentFile();
if (!tracesDir.exists()) {
tracesDir.mkdirs();
if (!SELinux.restorecon(tracesDir)) {
return null;
}
}
//FileUtils.setPermissions是个很有用的函数,设置文件属性时经常会用到
FileUtils.setPermissions(tracesDir.getPath(), 0775, -1, -1); // drwxrwxr-x
//clearTraces为true,会删除旧文件,创建新文件
if (clearTraces && tracesFile.exists()) tracesFile.delete();
tracesFile.createNewFile();
FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1); // -rw-rw-rw-
} catch (IOException e) {
Slog.w(TAG, "Unable to prepare ANR traces file: " + tracesPath, e);
return null;
}
dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs);
return tracesFile;
}
private static void dumpStackTraces(String tracesPath, ArrayList firstPids,
ProcessCpuTracker processCpuTracker, SparseArray lastPids, String[] nativeProcs) {
// Use a FileObserver to detect when traces finish writing.
// The order of traces is considered important to maintain for legibility.
//使用FileObserver监听SignalCatcher线程是否已经完成写入traces文件的操作
FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) {
@Override
public synchronized void onEvent(int event, String path) {
notify(); //让其他阻塞在observer.wait()这行代码的线程继续往下执行.
}
};
try {
observer.startWatching();
//首先输出firstPids集合中指定的进程,这些也是对ANR问题来说最重要的进程
// First collect all of the stacks of the most important pids.
if (firstPids != null) {
try {
int num = firstPids.size();
for (int i = 0; i < num; i++) {
synchronized (observer) {
//前面提到的SIGNAL_QUIT, 由SignalCatcher线程处理Process.SIGNAL_QUIT信号, 并把进程中的各个线程输出到traces.txt文件中.
//具体代码可以查看/dalvik/vm/目录下的SignalCatcher.cpp::logThreadStacks函数和Thread.cpp:: dvmDumpAllThreadsEx函数。
Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);
observer.wait(200); // Wait for write-close, give up after 200msec
//block here, 直到另一个线程调用了observer.notify() or observer.notifyAll()方法, 再或是等待了200毫秒超时后, 才会继续往下执行代码.
}
}
} catch (InterruptedException e) {
Slog.wtf(TAG, e);
}
}
} finally {
observer.stopWatching();
}
}
Object.java
final void notify()
//Causes a thread which is waiting on this object's monitor (by means of calling one of the wait() methods) to be woken up.
final void notifyAll()
//Causes all threads which are waiting on this object's monitor (by means of calling one of the wait() methods) to be woken up.
final void wait()
//Causes the calling thread to wait until another thread calls the notify() or notifyAll() method of this object.
final void wait(long millis)
//Causes the calling thread to wait until another thread calls the notify() or notifyAll() method
//of this object or until the specified timeout expires.
refer to :
http://rayleeya.iteye.com/blog/1955657