A Brief Analysis of How ANR Works

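Contents

    • 1. Introduction to ANR
    • 2. Common types of ANR
    • 3. How ANR is detected for a Service
      • Summary
    • 4. How ANR is detected for an InputEvent
      • Summary
    • 5. Broadcast ANR
      • Overall broadcast flow
      • ANR detection
      • Summary
    • Locating the problem
      • dumpsys
      • logcat
      • traces.txt
      • Using platforms such as Bugly
    • Acknowledgements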

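I recently ran into an ANR problem that I had to analyze and track down. ANR problems are genuinely painful. Sometimes you have traces.txt in hand and still cannot spot a clue, because ANRs usually come with high CPU and memory usage, which makes them hard to pin down.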

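I spent some time learning how Android ANRs are triggered and how the system detects them. These are my notes for future reference; writing things down beats relying on memory.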

1. Introduction to ANR

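ANR (Application Not Responding) means an app has failed to respond within a certain amount of time.

The root cause is that the UI thread cannot get to its messages for a long time, or it spends too long processing a message.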
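The following minimal Java sketch makes the root cause concrete. It uses a single-threaded ExecutorService as a stand-in for the UI thread's message queue, which is an assumption for illustration only; Android actually uses Looper and MessageQueue. One slow message delays every message queued behind it, and that is exactly the situation the system's ANR timers watch for.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// A single-threaded executor plays the role of the UI thread's message queue.
// (Simplification: Android uses Looper/MessageQueue, not an ExecutorService.)
class BlockedUiThreadDemo {

    /** Returns how many milliseconds a queued "input event" waited behind slow work. */
    static long inputEventWaitMs(long slowWorkMs) {
        ExecutorService uiThread = Executors.newSingleThreadExecutor();
        try {
            // A message whose handling takes too long occupies the only thread...
            uiThread.submit(() -> {
                try {
                    Thread.sleep(slowWorkMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // ...so the next message cannot even start until the slow one returns.
            long enqueuedAt = System.nanoTime();
            Future<Long> inputEvent = uiThread.submit(() -> System.nanoTime() - enqueuedAt);
            return TimeUnit.NANOSECONDS.toMillis(inputEvent.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            uiThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("input event waited about " + inputEventWaitMs(300) + " ms");
    }
}
```

The delay the second task sees is roughly the duration of the first one. On a real device, input dispatch, services and broadcasts each put a deadline on that kind of delay.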

2. Common types of ANR

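There are three main categories, each with its own timeout:

  • InputEvent (input events): 5 s
  • Service: 20 s for foreground services, 200 s for background services
  • Broadcast: 10 s for the foreground queue, 60 s for the background queue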
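The thresholds above can be gathered in one place. The enum below is a hypothetical helper, not an Android API. Its values mirror the AOSP defaults KEY_DISPATCHING_TIMEOUT, SERVICE_TIMEOUT / SERVICE_BACKGROUND_TIMEOUT and BROADCAST_FG_TIMEOUT / BROADCAST_BG_TIMEOUT, where the background broadcast default is 60 s. Vendors or newer releases may use different values.

```java
// Hypothetical summary of the default ANR thresholds (AOSP keeps them as separate constants).
enum AnrTimeout {
    INPUT_DISPATCH(5_000),
    FOREGROUND_SERVICE(20_000),
    BACKGROUND_SERVICE(200_000),      // SERVICE_TIMEOUT * 10
    FOREGROUND_BROADCAST(10_000),
    BACKGROUND_BROADCAST(60_000);

    final long millis;

    AnrTimeout(long millis) {
        this.millis = millis;
    }

    /** Roughly the check the framework makes: has work started at startMs run past the limit? */
    boolean isOverdue(long startMs, long nowMs) {
        return nowMs - startMs > millis;
    }
}
```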

3. How ANR is detected for a Service

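Service ANR detection is built on delayed (timed) messages.

If you have studied how a Service is started, you know that AMS acts as a dispatcher. The class that actually handles starting the Service is ActiveServices.

ActiveServices has a scheduleServiceTimeoutLocked() method. It is called when a service is created, and more generally whenever a service starts executing a request such as create, start or bind: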

// ActiveServices
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}

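Once the service starts executing, a delayed ActivityManagerService.SERVICE_TIMEOUT_MSG message is sent. As serviceTimeout() below shows, the message keeps being re-sent for as long as the service is still executing.

There are two variables in the code above:

mAm is the ActivityManagerService (AMS) instance, which is started by SystemServer.

mHandler is AMS's Handler (an instance of MainHandler). It does not run on the thread that calls into AMS. Instead, it runs on the Looper of a HandlerThread that AMS starts.

proc.execServicesFg tells whether this is a foreground or background service, and that decides the delay: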

// ActiveServices
// ANR timeout for foreground services
static final int SERVICE_TIMEOUT = 20*1000;
// ANR timeout for background services (10x the foreground value, i.e. 200 s)
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
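The snippets in this section only show the timer being armed. The other half of the pattern is disarming it. When the service finishes, the pending message has to be removed, otherwise every service would eventually "time out". In AOSP, ActiveServices.serviceDoneExecutingLocked() removes SERVICE_TIMEOUT_MSG once the process has no executing services left.

The sketch below models both halves. FakeHandler is a hypothetical class with a manual clock; it replaces the real android.os.Handler so the code runs on a plain JVM (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Deterministic stand-in for Handler.sendMessageAtTime()/removeMessages(); hypothetical,
// not the Android API. Time is a plain "uptime" value in milliseconds.
class FakeHandler {
    record Msg(long when, int what, Object obj) {}

    private final PriorityQueue<Msg> queue = new PriorityQueue<>(Comparator.comparingLong(Msg::when));
    final List<Msg> delivered = new ArrayList<>();

    void sendMessageAtTime(int what, Object obj, long when) {
        queue.add(new Msg(when, what, obj));
    }

    void removeMessages(int what, Object obj) {
        queue.removeIf(m -> m.what() == what && m.obj() == obj);
    }

    /** Delivers every message whose time has come. */
    void advanceTo(long now) {
        while (!queue.isEmpty() && queue.peek().when() <= now) {
            delivered.add(queue.poll());
        }
    }
}

class ServiceTimeoutDemo {
    static final int SERVICE_TIMEOUT_MSG = 1;      // arbitrary id for this sketch
    static final long SERVICE_TIMEOUT = 20_000;    // foreground limit, as in ActiveServices

    /** Returns true if the timeout message fires before the service reports it is done. */
    static boolean timeoutFires(long startAt, long doneAt) {
        FakeHandler handler = new FakeHandler();
        Object proc = new Object();                // stands in for the ProcessRecord
        // Arm: the service starts executing, so schedule the check.
        handler.sendMessageAtTime(SERVICE_TIMEOUT_MSG, proc, startAt + SERVICE_TIMEOUT);
        handler.advanceTo(doneAt);                 // time passes while the service runs
        // Disarm: the service finished, so drop the pending check if it has not fired yet.
        handler.removeMessages(SERVICE_TIMEOUT_MSG, proc);
        handler.advanceTo(Long.MAX_VALUE);
        return !handler.delivered.isEmpty();
    }
}
```

In this sketch, a service that finishes in 5 s never triggers the message. One that takes 25 s lets it fire, and in the real framework that is when serviceTimeout() runs.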

Now let's see how AMS handles ActivityManagerService.SERVICE_TIMEOUT_MSG. Its handler calls ActiveServices.serviceTimeout(proc):

// ActiveServices
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        // The process has no executing services to check
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime =  now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            // Walk the services and check whether any has exceeded the ANR timeout
            if (sr.executingStart < maxTime) {
                // Found a service that has hit an ANR
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            // Dump the timed-out service's state into sw
            timeout.dump(pw, "    ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        } else {
            // No service has hit an ANR yet: re-send the delayed ActivityManagerService.SERVICE_TIMEOUT_MSG
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_TIMEOUT_MSG);
            msg.obj = proc;
            mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
                    ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
        }
    }
    if (anrMessage != null) {
        // An ANR occurred: hand it over to AppErrors
        mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
    }
}
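The heart of serviceTimeout() is a small scan. Below is a pure-logic sketch of just that loop. The ProcessRecord and ServiceRecord types are replaced by a list of executingStart timestamps, which is a simplification; the LRU-list check and the dump/report side effects are left out. The sketch returns either the index of an overdue service or the uptime at which the next SERVICE_TIMEOUT_MSG would be scheduled.

```java
import java.util.List;

// Pure-logic port of the scan loop in serviceTimeout(); types simplified for illustration.
class ServiceTimeoutScan {
    static final long SERVICE_TIMEOUT = 20_000;
    static final long SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

    /** Either the index of an overdue service (nextCheckAt == -1), or when to check again. */
    record Result(int timedOutIndex, long nextCheckAt) {}

    static Result scan(List<Long> executingStarts, boolean foreground, long now) {
        long limit = foreground ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
        long maxTime = now - limit;                 // started before this => overdue
        long nextTime = 0;
        for (int i = executingStarts.size() - 1; i >= 0; i--) {
            long start = executingStarts.get(i);
            if (start < maxTime) {
                return new Result(i, -1);           // ANR candidate found; stop scanning
            }
            nextTime = Math.max(nextTime, start);   // remember the latest start seen
        }
        // Nothing overdue: re-arm relative to the latest start, as serviceTimeout() does.
        return new Result(-1, nextTime + limit);
    }
}
```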

Summary

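Service ANR detection relies on AMS's HandlerThread: a delayed message is sent and handled at the scheduled time. If, while handling that message, a Service turns out to have been executing longer than the limit, the Service has hit an ANR. The relevant information is then recorded and handed to AppErrors.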

4. How ANR is detected for an InputEvent

Android input events are dispatched by InputDispatcher in the native layer, and input ANRs are detected there as well.

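Quoting 《ANR机制以及问题分析》 (ANR Mechanism and Problem Analysis):
InputDispatcherThread is a thread; each pass of its loop performs one round of event dispatch.
An input event is a message that has to wait in line to be dispatched. Every Connection maintains two queues:
* outboundQueue: events waiting to be sent to the window. Every new event enters this queue first
* waitQueue: events already sent to the window
Once publishKeyEvent() completes, the event has been dispatched and is moved from outboundQueue to waitQueue.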
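The two-queue bookkeeping can be modeled in a few lines of Java. This is a simplified, hypothetical model: the real InputDispatcher is written in C++, and the exact rules for when the 5 s clock starts are more involved and have changed across Android versions. The model only shows the core idea. An event that sits in waitQueue without a "finished" reply from the app is what eventually counts as unresponsive.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical model of one InputDispatcher Connection (the real code is C++).
class ConnectionModel {
    static final long DISPATCH_TIMEOUT_MS = 5_000;   // default input dispatch timeout

    record Event(int seq, long deliveryTime) {}

    final Deque<Integer> outboundQueue = new ArrayDeque<>(); // not yet sent to the window
    final Deque<Event> waitQueue = new ArrayDeque<>();       // sent, awaiting "finished"

    /** A new input event arrives: it always enters outboundQueue first. */
    void enqueue(int seq) {
        outboundQueue.addLast(seq);
    }

    /** Like a successful publishKeyEvent(): move the head event to waitQueue. */
    void publishNext(long now) {
        Integer seq = outboundQueue.pollFirst();
        if (seq != null) {
            waitQueue.addLast(new Event(seq, now));
        }
    }

    /** The app reported that it finished handling event {@code seq}. */
    void finished(int seq) {
        waitQueue.removeIf(e -> e.seq() == seq);
    }

    /** Unresponsive if the oldest unacknowledged event has waited longer than the timeout. */
    boolean isUnresponsive(long now) {
        Event oldest = waitQueue.peekFirst();
        return oldest != null && now - oldest.deliveryTime() > DISPATCH_TIMEOUT_MS;
    }
}
```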

On the Java side, when the native layer detects an input ANR, it calls InputManagerService.notifyANR():

// Native callback.
private long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    return mWindowManagerCallbacks.notifyANR(
            inputApplicationHandle, inputWindowHandle, reason);
}

Here mWindowManagerCallbacks is the InputMonitor, so its notifyANR() gets called:

// InputMonitor
/* Notifies the window manager about an application that is not responding.
 * Returns a new timeout to continue waiting in nanoseconds, or 0 to abort dispatch.
 * Called by the InputManager.
 */
@Override
public long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    AppWindowToken appWindowToken = null;
    WindowState windowState = null;
    ...
    if (appWindowToken != null && appWindowToken.appToken != null) {
        ...
        // The window belongs to an activity
        final boolean abort = controller != null
                && controller.keyDispatchingTimedOut(reason,
                        (windowState != null) ? windowState.mSession.mPid : -1);
        if (!abort) {
            return appWindowToken.mInputDispatchingTimeoutNanos;
        }
    } else if (windowState != null) {
        try {
            long timeout = ActivityManager.getService().inputDispatchingTimedOut(
                    windowState.mSession.mPid, aboveSystem, reason);
            if (timeout >= 0) {
                return timeout * 1000000L; // nanoseconds
            }
        } catch (RemoteException ex) {
        }
    }
    return 0; 
}

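There are two ANR paths here:

  • keyDispatchingTimedOut(): the window belongs to an activity (an AppWindowToken exists), so ActivityRecord handles it
  • inputDispatchingTimedOut(): there is no activity, only a window (WindowState), so AMS handles it directly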
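Both paths share one return contract, documented in the comment on InputMonitor.notifyANR(): a positive value is how many more nanoseconds the dispatcher should wait, and 0 aborts dispatch. The sketch below restates that contract as two pure functions with hypothetical names. AMS's pid-based inputDispatchingTimedOut() reports milliseconds, which is why the window path converts.

```java
// Restates notifyANR()'s return contract; class and method names are hypothetical.
class NotifyAnrContract {
    static final long NANOS_PER_MILLI = 1_000_000L;

    /** Activity path: keep waiting the app's dispatch timeout unless told to abort. */
    static long activityPath(boolean abort, long inputDispatchingTimeoutNanos) {
        return abort ? 0 : inputDispatchingTimeoutNanos;
    }

    /** Window-only path: AMS returns a timeout in milliseconds, or a negative value. */
    static long windowPath(long amsTimeoutMillis) {
        return amsTimeoutMillis >= 0 ? amsTimeoutMillis * NANOS_PER_MILLI : 0;
    }
}
```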

Let's look at keyDispatchingTimedOut() first:

// ActivityRecord
@Override
public boolean keyDispatchingTimedOut(String reason, int windowPid) {
    ActivityRecord anrActivity;
    ProcessRecord anrApp;
    boolean windowFromSameProcessAsActivity;
    synchronized (service) {
        anrActivity = getWaitingHistoryRecordLocked();
        anrApp = app;
        windowFromSameProcessAsActivity =
                app == null || app.pid == windowPid || windowPid == -1;
    }
    if (windowFromSameProcessAsActivity) {
        return service.inputDispatchingTimedOut(anrApp, anrActivity, this, false, reason);
    } else {
        return service.inputDispatchingTimedOut(windowPid, false /* aboveSystem */, reason) < 0;
    }
}
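The branch in keyDispatchingTimedOut() boils down to a single predicate. Here is a tiny sketch of it, where a null appPid stands in for app == null; the helper names are hypothetical.

```java
// Sketch of the branch in ActivityRecord.keyDispatchingTimedOut(); names are hypothetical.
class KeyDispatchRouting {
    static boolean windowFromSameProcessAsActivity(Integer appPid, int windowPid) {
        return appPid == null || appPid == windowPid || windowPid == -1;
    }

    /** Which AMS.inputDispatchingTimedOut() overload the ActivityRecord delegates to. */
    static String delegateTo(Integer appPid, int windowPid) {
        return windowFromSameProcessAsActivity(appPid, windowPid)
                ? "inputDispatchingTimedOut(ProcessRecord, ...)" // blame the activity's process
                : "inputDispatchingTimedOut(int pid, ...)";      // blame the window's own process
    }
}
```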

Either way, both paths end up in AMS's inputDispatchingTimedOut():

// ActivityManagerService
public boolean inputDispatchingTimedOut(final ProcessRecord proc,
        final ActivityRecord activity, final ActivityRecord parent,
        final boolean aboveSystem, String reason) {
    if (checkCallingPermission(android.Manifest.permission.FILTER_EVENTS)
            != PackageManager.PERMISSION_GRANTED) {
        throw new SecurityException("Requires permission "
                + android.Manifest.permission.FILTER_EVENTS);
    }

    final String annotation;
    if (reason == null) {
        annotation = "Input dispatching timed out";
    } else {
        annotation = "Input dispatching timed out (" + reason + ")";
    }

    if (proc != null) {
        synchronized (this) {
            if (proc.debugging) {
                return false;
            }

            if (proc.instr != null) {
                Bundle info = new Bundle();
                info.putString("shortMsg", "keyDispatchingTimedOut");
                info.putString("longMsg", annotation);
                finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
                return true;
            }
        }
        mHandler.post(new Runnable() {
            @Override
            public void run() {
                mAppErrors.appNotResponding(proc, activity, parent, aboveSystem, annotation);
            }
        });
    }

    return true;
}

In the end, AppErrors handles it.
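The early-return rules in AMS.inputDispatchingTimedOut() can be restated as a pure decision function. In the sketch below, the flags stand in for proc != null, proc.debugging and proc.instr != null. It is an illustrative sketch, not framework code, and it omits the permission check.

```java
// Pure-logic sketch of the decisions in AMS.inputDispatchingTimedOut(ProcessRecord, ...).
class InputTimeoutDecision {
    enum Outcome { NO_PROCESS, IGNORE_DEBUGGING, FINISH_INSTRUMENTATION, REPORT_ANR }

    static String annotation(String reason) {
        return reason == null
                ? "Input dispatching timed out"
                : "Input dispatching timed out (" + reason + ")";
    }

    static Outcome decide(boolean hasProcess, boolean debugging, boolean underInstrumentation) {
        if (!hasProcess) return Outcome.NO_PROCESS;                       // nothing to report
        if (debugging) return Outcome.IGNORE_DEBUGGING;                   // debugger attached
        if (underInstrumentation) return Outcome.FINISH_INSTRUMENTATION;  // test run cancelled
        return Outcome.REPORT_ANR;                                        // posts appNotResponding()
    }

    /** The boolean the real method returns for each outcome (true means abort dispatch). */
    static boolean returnValue(Outcome outcome) {
        return outcome != Outcome.IGNORE_DEBUGGING;
    }
}
```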

Summary

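InputEvent ANR detection lives in the native InputDispatcher. When it detects an ANR, it calls InputManagerService.notifyANR() on the Java side. In the end, AppErrors collects the information, shows the ANR dialog, and so on.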

5. Broadcast ANR

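A Broadcast ANR happens when onReceive() takes too long. The timeout only applies to broadcasts that are delivered serially, meaning ordered broadcasts and any broadcast to receivers declared in the manifest.
Just as ActiveServices does for services, BroadcastQueue is the class that actually manages broadcasts.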

Overall broadcast flow:

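When the app process is created (for example, to run a manifest-registered receiver), AMS calls sendPendingBroadcastsLocked(app). sendPendingBroadcastsLocked() calls processCurBroadcastLocked(), which sends the broadcast to the app process through app.thread.scheduleReceiver() and completes the flow.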

ANR detection

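The timeout is armed with setBroadcastTimeoutLocked(timeoutTime), where timeoutTime = r.receiverTime + mTimeoutPeriod. In AOSP this happens in processNextBroadcast(), when dispatch to a receiver begins: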

final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (! mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}

This sends a BROADCAST_TIMEOUT_MSG to mHandler. The basic mechanism is much the same as Service ANR detection.

This mHandler also processes its messages on the HandlerThread created by AMS, the same thread that handles ActivityManagerService.SERVICE_TIMEOUT_MSG.
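setBroadcastTimeoutLocked() guarantees that a queue has at most one pending BROADCAST_TIMEOUT_MSG. The sketch below isolates that rule. The class is hypothetical: scheduledAt simply records what would be passed to sendMessageAtTime(), and onTimeoutMessageConsumed() stands in for wherever the flag is cleared once the message is handled or cancelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical isolation of the "at most one pending BROADCAST_TIMEOUT_MSG" rule.
class BroadcastTimer {
    boolean pendingTimeoutMessage;                 // mirrors mPendingBroadcastTimeoutMessage
    final List<Long> scheduledAt = new ArrayList<>();

    /** Same shape as setBroadcastTimeoutLocked(): arm only if nothing is pending. */
    void setBroadcastTimeout(long timeoutTime) {
        if (!pendingTimeoutMessage) {
            scheduledAt.add(timeoutTime);          // would be sendMessageAtTime(msg, timeoutTime)
            pendingTimeoutMessage = true;
        }
        // Otherwise a timeout message is already queued, so this call does nothing.
    }

    /** The pending message was handled or cancelled, so a new one may be armed. */
    void onTimeoutMessageConsumed() {
        pendingTimeoutMessage = false;
    }
}
```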

Here is how mHandler handles the BROADCAST_TIMEOUT_MSG message:

// BroadcastQueue.java
public void handleMessage(Message msg) {
    switch (msg.what) {
        case BROADCAST_INTENT_MSG: {
            if (DEBUG_BROADCAST) Slog.v(
                    TAG_BROADCAST, "Received BROADCAST_INTENT_MSG");
            processNextBroadcast(true);
        } break;
        case BROADCAST_TIMEOUT_MSG: {
            synchronized (mService) {
                broadcastTimeoutLocked(true);
            }
        } break;
    }
}

Finally, BroadcastQueue's broadcastTimeoutLocked():

final void broadcastTimeoutLocked(boolean fromMsg) {
    ...
    if (mOrderedBroadcasts.size() == 0) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    BroadcastRecord r = mOrderedBroadcasts.get(0);
    if (fromMsg) {
        ...
        long timeoutTime = r.receiverTime + mTimeoutPeriod;
        // Not timed out yet: send another delayed message
        if (timeoutTime > now) {
            setBroadcastTimeoutLocked(timeoutTime);
            return;
        }
    }
    ...
    // Reaching this point means an ANR occurred: start collecting process and broadcast info
    r.receiverTime = now;
    if (!debugging) {
        r.anrCount++;
    }
    ProcessRecord app = null;
    String anrMessage = null;
    Object curReceiver;
    if (r.nextReceiver > 0) {
        curReceiver = r.receivers.get(r.nextReceiver-1);
        r.delivery[r.nextReceiver-1] = BroadcastRecord.DELIVERY_TIMEOUT;
    } else {
        curReceiver = r.curReceiver;
    }
    Slog.w(TAG, "Receiver during timeout of " + r + " : " + curReceiver);
    logBroadcastReceiverDiscardLocked(r);
    if (curReceiver != null && curReceiver instanceof BroadcastFilter) {
        BroadcastFilter bf = (BroadcastFilter)curReceiver;
        if (bf.receiverList.pid != 0
                && bf.receiverList.pid != ActivityManagerService.MY_PID) {
            synchronized (mService.mPidsSelfLocked) {
                app = mService.mPidsSelfLocked.get(
                        bf.receiverList.pid);
            }
        }
    } else {
        app = r.curApp;
    }
    if (app != null) {
        anrMessage = "Broadcast of " + r.intent.toString();
    }
    if (mPendingBroadcast == r) {
        mPendingBroadcast = null;
    }
    // Finish the current receiver and move on to the next one
    finishReceiverLocked(r, r.resultCode, r.resultData,
            r.resultExtras, r.resultAbort, false);
    scheduleBroadcastsLocked();
    if (!debugging && anrMessage != null) {
        // Report the ANR
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}
private final class AppNotResponding implements Runnable {
    ...
    @Override
    public void run() {
        mService.mAppErrors.appNotResponding(mApp, null, null, false, mAnnotation);
    }
}

Just as with services, AppErrors in AMS ends up collecting the ANR information, showing the dialog, and so on.
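When the timeout message fires, broadcastTimeoutLocked() first checks whether the message is stale. The single pending message may have been armed for an earlier receiver, so the deadline is recomputed from the current receiver's receiverTime. Here is a pure-logic sketch of that fromMsg check, using timestamps only (the class is hypothetical):

```java
// Sketch of the fromMsg check in broadcastTimeoutLocked(); hypothetical class.
class BroadcastTimeoutCheck {
    /** Returns -1 for "ANR now", otherwise the time at which to re-arm the timer. */
    static long onTimeoutMessage(long receiverTime, long timeoutPeriod, long now) {
        long timeoutTime = receiverTime + timeoutPeriod;
        if (timeoutTime > now) {
            return timeoutTime;   // current receiver started later, so its deadline is later
        }
        return -1;                // current receiver has run for at least timeoutPeriod
    }
}
```

For example, suppose a message was armed for a receiver that started at 0 and fires at 10 s, but the current receiver only started at 5 s. The check then re-arms the timer for 15 s instead of reporting an ANR.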

Summary

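Broadcast ANR detection works basically the same way as for services. A Handler sends a delayed message, AMS's HandlerThread checks whether an ANR has occurred, and AppErrors handles the result.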

Locating the problem

Now that we roughly understand how each kind of ANR arises, how do we track one down?

dumpsys

  1. Capture current CPU usage:
    adb shell dumpsys cpuinfo

  2. Capture current memory usage:
    adb shell dumpsys meminfo

  3. Capture the current state of activities and services:
    adb shell dumpsys activity
    adb shell service list

dumpsys gives you a rough picture of the overall system state.

logcat

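Search logcat for the am_anr and ANR keywords (am_anr is written to the events buffer, shown by adb logcat -b events). If you are lucky, you will find the process that hit the ANR.
(Several times I saw the ANR dialog on screen but found no ANR entries in logcat at all.)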

traces.txt

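/data/anr/traces.txt
Generally, the process that hit the ANR appears at the top of the traces log. The log contains information for all of its threads, including their stack traces. (Newer Android versions instead write a separate anr_* file per ANR under /data/anr/; adb bugreport is a convenient way to retrieve them.)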
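When skimming a traces file, the first thing to check is usually the main thread of the ANR process. The small sketch below extracts that thread. It assumes the ART text format, in which a thread block starts with a header such as "main" prio=5 tid=1 Blocked and blocks are separated by blank lines. The format can vary between Android versions, so treat this as a starting point.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: pull the first "main" thread block out of a traces file's text (format assumed).
class TracesMainThread {
    /** Header line of the first "main" thread plus its lines up to the next blank line. */
    static List<String> mainThreadBlock(String traces) {
        List<String> block = new ArrayList<>();
        boolean inMain = false;
        for (String line : traces.split("\\R")) {        // \R matches \n or \r\n
            if (!inMain && line.startsWith("\"main\"")) {
                inMain = true;
            }
            if (inMain) {
                if (line.isBlank()) {
                    break;                               // blocks are separated by blank lines
                }
                block.add(line);
            }
        }
        return block;
    }

    /** Last word of the header, e.g. "Blocked", "Native" or "Sleeping"; null if not found. */
    static String state(List<String> block) {
        if (block.isEmpty()) {
            return null;
        }
        String[] words = block.get(0).trim().split("\\s+");
        return words[words.length - 1];
    }
}
```

A Blocked main thread often points at lock contention, and the trace then names the thread holding the lock. Native or Runnable with app frames on top tends to mean long-running work on the main thread.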

Using platforms such as Bugly

Most ANRs happen because our code is not robust enough. Bugly can help us locate the problem fairly quickly, but it is not a silver bullet.

Acknowledgements

Many thanks to the author of the following article, which gave me an overall understanding of ANR:

https://duanqz.github.io/2015-10-12-ANR-Analysis#service
