ANR

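How ANR Works

What Is an ANR?

An ANR (Application Not Responding) occurs when an application fails to respond to user input or to a system service request within a prescribed time limit.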

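When ANRs Occur

This section uses broadcast timeouts and input event timeouts as examples. The other components (Activity, Service, ContentProvider) are handled in a very similar way.

1. BroadcastReceiver timeout

Before analyzing the broadcast ANR, here is a brief review of broadcasts.
Broadcasts generally fall into two categories: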

- Normal broadcasts (sent with Context.sendBroadcast) are completely asynchronous. All receivers of the broadcast are run in an undefined order, often at the same time. This is more efficient, but means that receivers cannot use the result or abort APIs included here.

- Ordered broadcasts (sent with Context.sendOrderedBroadcast) are delivered to one receiver at a time. As each receiver executes in turn, it can propagate a result to the next receiver, or it can completely abort the broadcast so that it won't be passed to other receivers. The order receivers run in can be controlled with the android:priority attribute of the matching intent-filter; receivers with the same priority will be run in an arbitrary order.

A BroadcastReceiver can be registered in two ways:

You can either dynamically register an instance of this class with Context.registerReceiver() or statically publish an implementation through the receiver tag in your AndroidManifest.xml.

The broadcast ANR flow is analyzed below, as shown in the figure:


![input.png](https://upload-images.jianshu.io/upload_images/11087999-e755aebafe88da10.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

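If AMS (ActivityManagerService) delivers a broadcast to a receiver and does not get the receiver's finishReceiver message back within the time limit, a BroadcastTimeout ANR is triggered. The analysis below starts from broadcastIntentLocked.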

   final int broadcastIntentLocked(ProcessRecord callerApp,
            String callerPackage, Intent intent, String resolvedType,
            IIntentReceiver resultTo, int resultCode, String resultData,
            Bundle resultExtras, String[] requiredPermissions, int appOp, Bundle bOptions,
            boolean ordered, boolean sticky, int callingPid, int callingUid, int userId) {
        ...
        // Figure out who all will receive this broadcast.
        List receivers = null;
        List registeredReceivers = null;
        // Need to resolve the intent to interested receivers...
        if ((intent.getFlags()&Intent.FLAG_RECEIVER_REGISTERED_ONLY)
                 == 0) {
            // Collect statically registered (manifest) receivers.
            receivers = collectReceiverComponents(intent, resolvedType, callingUid, users);
        }
        if (intent.getComponent() == null) {
            if (userId == UserHandle.USER_ALL && callingUid == Process.SHELL_UID) {
                // Query one target user at a time, excluding shell-restricted users
                for (int i = 0; i < users.length; i++) {
                    if (mUserController.hasUserRestriction(
                            UserManager.DISALLOW_DEBUGGING_FEATURES, users[i])) {
                        continue;
                    }
                    List registeredReceiversForUser =
                            mReceiverResolver.queryIntent(intent,
                                    resolvedType, false, users[i]);
                    if (registeredReceivers == null) {
                        registeredReceivers = registeredReceiversForUser;
                    } else if (registeredReceiversForUser != null) {
                        registeredReceivers.addAll(registeredReceiversForUser);
                    }
                }
            } else {
                // Look up dynamically registered receivers.
                registeredReceivers = mReceiverResolver.queryIntent(intent,
                        resolvedType, false, userId);
            }
        }
        ...
        int NR = registeredReceivers != null ? registeredReceivers.size() : 0;
        if (!ordered && NR > 0) {
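            // Send the normal (unordered) broadcast to dynamically registered receivers.
            // If we are not serializing this broadcast, then send the
            // registered receivers separately so they don't wait for the
            // components to be launched.
            // Choose the queue by broadcast type: foreground broadcasts go to the
            // foreground queue, background broadcasts to the background queue.
            final BroadcastQueue queue = broadcastQueueForIntent(intent);
            // Create the broadcast record.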
            BroadcastRecord r = new BroadcastRecord(queue, intent, callerApp,
                    callerPackage, callingPid, callingUid, resolvedType, requiredPermissions,
                    appOp, brOptions, registeredReceivers, resultTo, resultCode, resultData,
                    resultExtras, ordered, sticky, false, userId);
            if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST, "Enqueueing parallel broadcast " + r);
            final boolean replaced = replacePending && queue.replaceParallelBroadcastLocked(r);
            if (!replaced) {
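                // Add the normal broadcast to the (foreground/background) parallel broadcast list.
                queue.enqueueParallelBroadcastLocked(r);
                // Process the broadcasts in the (foreground/background) parallel broadcast list.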
                queue.scheduleBroadcastsLocked();
            }
            registeredReceivers = null;
            NR = 0;
        }

        // Merge into one list.
        // Dynamically registered and statically registered receivers are merged
        // into receivers, sorted by priority (high -> low).
        int ir = 0;
        if (receivers != null) {
            ...
            int NT = receivers != null ? receivers.size() : 0;
            int it = 0;
            ResolveInfo curt = null;
            BroadcastFilter curr = null;
            while (it < NT && ir < NR) {
                if (curt == null) {
                    curt = (ResolveInfo)receivers.get(it);
                }
                if (curr == null) {
                    curr = registeredReceivers.get(ir);
                }
                if (curr.getPriority() >= curt.priority) {
                    // Insert this broadcast record into the final list.
                    receivers.add(it, curr);
                    ir++;
                    curr = null;
                    it++;
                    NT++;
                } else {
                        // Skip to the next ResolveInfo in the final list.
                    it++;
                    curt = null;
                }
            }
        }
        while (ir < NR) {
            if (receivers == null) {
                receivers = new ArrayList();
            }
            receivers.add(registeredReceivers.get(ir));
            ir++;
        }

        if ((receivers != null && receivers.size() > 0)
                || resultTo != null) {
            // Choose the queue by broadcast type: foreground broadcasts go to the
            // foreground queue, background broadcasts to the background queue.
            BroadcastQueue queue = broadcastQueueForIntent(intent);
            BroadcastRecord r = new BroadcastRecord(queue, intent, callerApp,
                    callerPackage, callingPid, callingUid, resolvedType,
                    requiredPermissions, appOp, brOptions, receivers, resultTo, resultCode,
                    resultData, resultExtras, ordered, sticky, false, userId);

            if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST, "Enqueueing ordered broadcast " + r
                    + ": prev had " + queue.mOrderedBroadcasts.size());
            if (DEBUG_BROADCAST) Slog.i(TAG_BROADCAST,
                    "Enqueueing broadcast " + r.intent.getAction());

            boolean replaced = replacePending && queue.replaceOrderedBroadcastLocked(r);
            if (!replaced) {
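                // Add the broadcast to the (foreground/background) ordered broadcast list.
                queue.enqueueOrderedBroadcastLocked(r);
                // Process the broadcasts in the (foreground/background) ordered broadcast list.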
                queue.scheduleBroadcastsLocked();
            }
        }
        ...
    }

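After a series of checks and special-case handling, broadcastIntentLocked dispatches the broadcast according to the broadcast type and the type of each matching receiver.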
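
The "Merge into one list" loop above can be tried in isolation. The following is a minimal sketch, not the framework code: `Entry` is a hypothetical stand-in for ResolveInfo and BroadcastFilter, and the names and priorities are made up. It assumes both input lists are already sorted by descending priority, which is what the loop in the excerpt appears to rely on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReceiverMergeSketch {
    /** Hypothetical stand-in for ResolveInfo / BroadcastFilter: a name and a priority. */
    static final class Entry {
        final String name;
        final int priority;

        Entry(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /**
     * Merges registered receivers into a copy of the manifest list.
     * On a tie the registered receiver goes first, mirroring the
     * `curr.getPriority() >= curt.priority` test in the excerpt.
     */
    static List<Entry> merge(List<Entry> manifest, List<Entry> registered) {
        List<Entry> merged = new ArrayList<>(manifest);
        int it = 0; // cursor into the merged list
        int ir = 0; // cursor into the registered list
        while (it < merged.size() && ir < registered.size()) {
            if (registered.get(ir).priority >= merged.get(it).priority) {
                merged.add(it, registered.get(ir)); // insert ahead of the manifest entry
                ir++;
            }
            // After an insert this lands on the same manifest entry again (it shifted
            // right by one); otherwise it moves on to the next manifest entry.
            it++;
        }
        while (ir < registered.size()) {
            merged.add(registered.get(ir++)); // leftovers have the lowest priority
        }
        return merged;
    }

    /** Runs the merge on made-up data and returns the resulting order as text. */
    static String demo() {
        List<Entry> manifest = Arrays.asList(new Entry("A", 10), new Entry("B", 0));
        List<Entry> registered = Arrays.asList(
                new Entry("X", 10), new Entry("Y", 5), new Entry("Z", -1));
        StringBuilder sb = new StringBuilder();
        for (Entry e : merge(manifest, registered)) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // X,A,Y,B,Z
    }
}
```

Note that in the excerpt this merge only matters for ordered broadcasts: for an unordered broadcast the registered receivers have already been enqueued on the parallel list and NR has been reset to 0.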

Next, the dispatch function scheduleBroadcastsLocked:

android/frameworks/base/services/core/java/com/android/server/am/BroadcastQueue.java

    public void scheduleBroadcastsLocked() {
        if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST, "Schedule broadcasts ["
                + mQueueName + "]: current="
                + mBroadcastsScheduled);

        if (mBroadcastsScheduled) {
            return;
        }
        // Post the message that triggers broadcast processing.
        mHandler.sendMessage(mHandler.obtainMessage(BROADCAST_INTENT_MSG, this));
        mBroadcastsScheduled = true;
    }

scheduleBroadcastsLocked simply posts a BROADCAST_INTENT_MSG message; the handler for that message calls processNextBroadcast to do the dispatching.
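
The mBroadcastsScheduled flag means that several schedule requests collapse into one pending message. The sketch below illustrates that pattern in plain Java. The class, the in-memory queue that stands in for the Handler, and the point where the flag is cleared are all assumptions made for illustration; the excerpt does not show where the real flag is reset.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class CoalescingSchedulerSketch {
    // Stand-in for the Handler's message queue (hypothetical).
    private final Queue<Runnable> messageQueue = new ArrayDeque<>();
    private boolean scheduled; // plays the role of mBroadcastsScheduled
    int runs;                  // how many times the processing step actually ran

    /** Requests a processing pass; repeated requests before the pass runs are dropped. */
    void schedule() {
        if (scheduled) {
            return;
        }
        messageQueue.add(this::process);
        scheduled = true;
    }

    /** Assumption: the flag is cleared when the message is handled. */
    private void process() {
        scheduled = false;
        runs++;
    }

    /** Handles every queued message and returns how many were handled. */
    int drain() {
        int handled = 0;
        Runnable message;
        while ((message = messageQueue.poll()) != null) {
            message.run();
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        CoalescingSchedulerSketch s = new CoalescingSchedulerSketch();
        s.schedule();
        s.schedule();
        s.schedule();
        System.out.println("messages handled: " + s.drain());
    }
}
```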

    final void processNextBroadcast(boolean fromMsg) {
        synchronized(mService) {
            BroadcastRecord r;

            ...
            mService.updateCpuStats();
            ...
            // First, deliver any non-serialized broadcasts right away.
            while (mParallelBroadcasts.size() > 0) {
                r = mParallelBroadcasts.remove(0);
                r.dispatchTime = SystemClock.uptimeMillis();
                r.dispatchClockTime = System.currentTimeMillis();
                final int N = r.receivers.size();
                if (DEBUG_BROADCAST_LIGHT) Slog.v(TAG_BROADCAST, "Processing parallel broadcast ["
                        + mQueueName + "] " + r);
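                for (int i=0; i<N; i++) {
                    Object target = r.receivers.get(i);
                    deliverToRegisteredReceiverLocked(r, (BroadcastFilter)target, false, i);
                }
                ...
            }
            ...
            // Now take care of the next serialized one...
            boolean looped = false;

            do {
                ...
                r = mOrderedBroadcasts.get(0);
                boolean forceReceive = false;
                int numReceivers = (r.receivers != null) ? r.receivers.size() : 0;
                if (mService.mProcessesReady && r.dispatchTime > 0) {
                    // 1) Timeout check for a broadcast whose dispatch started before
                    //    SystemReady and finishes after it. Dispatch before SystemReady
                    //    can be slow and is not checked, so the overall limit is
                    //    2 * mTimeoutPeriod * numReceivers.
                    // 2) dex2oat ran while the broadcast was being dispatched.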
                    long now = SystemClock.uptimeMillis();
                    if ((numReceivers > 0) &&
                            (now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
                        ...
                        broadcastTimeoutLocked(false); // forcibly finish this broadcast
                        forceReceive = true;
                        r.state = BroadcastRecord.IDLE;
                    }
                }
                ...
                if (r.receivers == null || r.nextReceiver >= numReceivers
                        || r.resultAbort || forceReceive) {
                    // No more receivers for this broadcast!  Send the final
                    // result if requested...
                    if (r.resultTo != null) {
                        // The broadcast is finished; if the sender asked for a result, send it back.
                        try {
                            if (DEBUG_BROADCAST) Slog.i(TAG_BROADCAST,
                                    "Finishing broadcast [" + mQueueName + "] "
                                    + r.intent.getAction() + " app=" + r.callerApp);
                            performReceiveLocked(r.callerApp, r.resultTo,
                                new Intent(r.intent), r.resultCode,
                                r.resultData, r.resultExtras, false, false, r.userId);
                            // Set this to null so that the reference
                                // (local and remote) isn't kept in the mBroadcastHistory.
                            r.resultTo = null;
                        } catch (RemoteException e) {
                            r.resultTo = null;
                            Slog.w(TAG, "Failure ["
                                    + mQueueName + "] sending broadcast result of "
                                    + r.intent, e);

                        }
                    }

                    if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST, "Cancelling BROADCAST_TIMEOUT_MSG");
                    // All receivers of this broadcast are done; cancel the pending timeout message.
                    cancelBroadcastTimeoutLocked();

                    if (DEBUG_BROADCAST_LIGHT) Slog.v(TAG_BROADCAST,
                            "Finished with ordered broadcast " + r);

                    // ... and on to the next...
                    addBroadcastToHistoryLocked(r);
                    if (r.intent.getComponent() == null && r.intent.getPackage() == null
                            && (r.intent.getFlags()&Intent.FLAG_RECEIVER_REGISTERED_ONLY) == 0) {
                        // This was an implicit broadcast... let's record it for posterity.
                        mService.addBroadcastStatLocked(r.intent.getAction(), r.callerPackage,
                                r.manifestCount, r.manifestSkipCount, r.finishTime-r.dispatchTime);
                    }
                    // Remove the finished broadcast from the ordered queue.
                    mOrderedBroadcasts.remove(0);
                    r = null;
                    looped = true;
                    continue;
                }
            } while (r == null);

            // Get the next receiver...
            // Fetch the next receiver of this broadcast (there may be several) and deliver to it.
            int recIdx = r.nextReceiver++;
    
            // Keep track of when this receiver started, and make sure there
            // is a timeout message pending to kill it if need be.
            r.receiverTime = SystemClock.uptimeMillis();
            if (recIdx == 0) {
                // First of the broadcast's receivers: record the dispatch time.
                r.dispatchTime = r.receiverTime;
                r.dispatchClockTime = System.currentTimeMillis();
                if (DEBUG_BROADCAST_LIGHT) Slog.v(TAG_BROADCAST, "Processing ordered broadcast ["
                        + mQueueName + "] " + r);
            }
            if (! mPendingBroadcastTimeoutMessage) {
                long timeoutTime = r.receiverTime + mTimeoutPeriod;
                if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST,
                        "Submitting BROADCAST_TIMEOUT_MSG ["
                        + mQueueName + "] for " + r + " at " + timeoutTime);
                // If no broadcast timeout is pending, set one here.
                setBroadcastTimeoutLocked(timeoutTime);
            }
            ...
            final Object nextReceiver = r.receivers.get(recIdx);

            if (nextReceiver instanceof BroadcastFilter) {
                // Simple case: this is a registered receiver who gets
                // a direct call.
                BroadcastFilter filter = (BroadcastFilter)nextReceiver;
                if (DEBUG_BROADCAST)  Slog.v(TAG_BROADCAST,
                        "Delivering ordered ["
                        + mQueueName + "] to registered "
                        + filter + ": " + r);
                // For a dynamically registered receiver, deliver directly.
                deliverToRegisteredReceiverLocked(r, filter, r.ordered, recIdx);
                // Author's note: my understanding is that r.ordered == true here (?)
                if (r.receiver == null || !r.ordered) {
                    // The receiver has already finished, so schedule to
                    // process the next one.
                    if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST, "Quick finishing ["
                            + mQueueName + "]: ordered="
                            + r.ordered + " receiver=" + r.receiver);
                    r.state = BroadcastRecord.IDLE;
                    scheduleBroadcastsLocked();
                } else {
                        if (brOptions != null && brOptions.getTemporaryAppWhitelistDuration() > 0) {
                        scheduleTempWhitelistLocked(filter.owningUid,
                                brOptions.getTemporaryAppWhitelistDuration(), r);
                    }
                }
                return;
            }
            ...
            // Is this receiver's application already running?
            if (app != null && app.thread != null) {
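                // The receiver's host process is already running; deliver the broadcast.
                try {
                    app.addPackage(info.activityInfo.packageName,
                            info.activityInfo.applicationInfo.versionCode, mService.mProcessStats);
                    // The receiver is ultimately run through Binder IPC.
                    processCurBroadcastLocked(r, app);
                    return;
                } catch (RemoteException e) {
                    ...
                }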
            }
            ...
            // Start the receiver's host process.
            if ((r.curApp=mService.startProcessLocked(targetProcess,
                    info.activityInfo.applicationInfo, true,
                    r.intent.getFlags() | Intent.FLAG_FROM_BACKGROUND,
                    "broadcast", r.curComponent,
                    (r.intent.getFlags()&Intent.FLAG_RECEIVER_BOOT_UPGRADE) != 0, false, false))
                            == null) {
                // Ah, this recipient is unavailable.  Finish it if necessary,
                // and mark the broadcast record as ready for the next.
                Slog.w(TAG, "Unable to launch app "
                        + info.activityInfo.applicationInfo.packageName + "/"
                        + info.activityInfo.applicationInfo.uid + " for broadcast "
                        + r.intent + ": process is bad");
                logBroadcastReceiverDiscardLocked(r);
                finishReceiverLocked(r, r.resultCode, r.resultData,
                        r.resultExtras, r.resultAbort, false);
                scheduleBroadcastsLocked();
                r.state = BroadcastRecord.IDLE;
                return;
            }

            mPendingBroadcast = r;
            }
    }
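
processNextBroadcast works with two different deadlines: a per-receiver one (r.receiverTime + mTimeoutPeriod) and a whole-broadcast one (r.dispatchTime + 2 * mTimeoutPeriod * numReceivers). The arithmetic can be checked with a small sketch. The method names are hypothetical, and the 10-second period used in main is only an illustrative value.

```java
final class BroadcastDeadlineSketch {
    /** Deadline for the receiver currently running. */
    static long receiverDeadline(long receiverTime, long timeoutPeriod) {
        return receiverTime + timeoutPeriod;
    }

    /** Deadline for the broadcast as a whole, counted from its first dispatch. */
    static long broadcastDeadline(long dispatchTime, long timeoutPeriod, int numReceivers) {
        return dispatchTime + 2 * timeoutPeriod * numReceivers;
    }

    /** Mirrors the `now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers)` check. */
    static boolean broadcastHung(long now, long dispatchTime, long timeoutPeriod,
                                 int numReceivers) {
        return numReceivers > 0
                && now > broadcastDeadline(dispatchTime, timeoutPeriod, numReceivers);
    }

    public static void main(String[] args) {
        long period = 10_000; // illustrative timeout period in milliseconds
        System.out.println("receiver deadline:  " + receiverDeadline(1_000, period));
        System.out.println("broadcast deadline: " + broadcastDeadline(1_000, period, 3));
    }
}
```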

Timeout message handling: when the timeout expires, broadcastTimeoutLocked is called directly.

    private final class BroadcastHandler extends Handler {
        public BroadcastHandler(Looper looper) {
            super(looper, null, true);
        }

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case BROADCAST_INTENT_MSG: {
                    if (DEBUG_BROADCAST) Slog.v(
                            TAG_BROADCAST, "Received BROADCAST_INTENT_MSG");
                    processNextBroadcast(true);
                } break;
                case BROADCAST_TIMEOUT_MSG: {
                    synchronized (mService) {
                        broadcastTimeoutLocked(true);
                    }
                } break;
            }
        }
    }

    final void broadcastTimeoutLocked(boolean fromMsg) {
        // fromMsg identifies what triggered the timeout handling: true means the
        // timeout message fired, false means the handler was called directly.
        if (fromMsg) {
            mPendingBroadcastTimeoutMessage = false;
        }

        if (mOrderedBroadcasts.size() == 0) {
            return;
        }

        long now = SystemClock.uptimeMillis();
        BroadcastRecord r = mOrderedBroadcasts.get(0);
        if (fromMsg) {
            if (mService.mDidDexOpt) {
                // Delay timeouts until dexopt finishes.
                mService.mDidDexOpt = false;
                long timeoutTime = SystemClock.uptimeMillis() + mTimeoutPeriod;
                setBroadcastTimeoutLocked(timeoutTime);
                return;
            }
            if (!mService.mProcessesReady) {
                // Only process broadcast timeouts if the system is ready. That way
                // PRE_BOOT_COMPLETED broadcasts can't timeout as they are intended
                // to do heavy lifting for system up.
                return;
            }

            long timeoutTime = r.receiverTime + mTimeoutPeriod;
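            // If the current receiver has not timed out yet, re-arm the timeout message.
            // This shows that the timeout applies to each receiver individually: if several
            // receivers only exceed the limit cumulatively, no ANR is raised here.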
            if (timeoutTime > now) {
                // We can  observe premature timeouts because we do not cancel and reset the
                // broadcast timeout message after each receiver finishes.  Instead, we set up
                // an initial timeout then kick it down the road a little further as needed
                // when it expires.
                if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST,
                        "Premature timeout ["
                        + mQueueName + "] @ " + now + ": resetting BROADCAST_TIMEOUT_MSG for "
                        + timeoutTime);
                setBroadcastTimeoutLocked(timeoutTime);
                return;
            }
        }
        ...
        // Trigger the broadcast timeout ANR.
        if (anrMessage != null) {
            // Post the ANR to the handler since we do not want to process ANRs while
            // potentially holding our lock.
            mHandler.post(new AppNotResponding(app, anrMessage));
        }
    }

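broadcastTimeoutLocked uses the fromMsg parameter to decide whether the broadcast has really timed out. Note that the queue does not post a BROADCAST_TIMEOUT_MSG for every delivery and then call cancelBroadcastTimeoutLocked after each receiver finishes. Instead, each time the timeout message fires, it checks whether the current receiver has actually exceeded its limit; if not, it posts BROADCAST_TIMEOUT_MSG again to push the deadline out. This avoids, for example, posting 100 BROADCAST_TIMEOUT_MSG messages and making 100 cancelBroadcastTimeoutLocked calls for 100 ordered deliveries.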
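
The re-arming strategy described above can be simulated with a logical clock. This is a minimal sketch under stated assumptions: the class and its method names are hypothetical, time is passed in explicitly instead of being read from SystemClock, and cancellation after the last receiver is not modeled.

```java
final class LazyTimeoutSketch {
    private final long timeoutPeriod;
    private long receiverTime; // when the current receiver started
    private boolean pending;   // plays the role of mPendingBroadcastTimeoutMessage
    private long alarmAt;      // when the pending timeout message will fire
    int messagesPosted;        // how many timeout messages were posted in total

    LazyTimeoutSketch(long timeoutPeriod) {
        this.timeoutPeriod = timeoutPeriod;
    }

    /** A receiver starts; a timeout message is posted only if none is pending. */
    void startReceiver(long now) {
        receiverTime = now;
        if (!pending) {
            arm(now + timeoutPeriod);
        }
    }

    /** The timeout message fires. Returns true if the current receiver really timed out. */
    boolean onTimeoutMessage(long now) {
        pending = false;
        long deadline = receiverTime + timeoutPeriod;
        if (deadline > now) {
            arm(deadline); // premature: push the alarm out to the real deadline
            return false;
        }
        return true;
    }

    long alarmAt() {
        return alarmAt;
    }

    private void arm(long at) {
        alarmAt = at;
        pending = true;
        messagesPosted++;
    }

    public static void main(String[] args) {
        LazyTimeoutSketch q = new LazyTimeoutSketch(10_000);
        q.startReceiver(0);
        q.startReceiver(4_000);
        q.startReceiver(8_000);
        System.out.println("ANR at 10000? " + q.onTimeoutMessage(10_000));
        System.out.println("next alarm:   " + q.alarmAt());
    }
}
```

In this run three receivers start but only one timeout message is posted up front; the alarm at 10000 is premature and is simply moved to 18000, the deadline of the receiver that started at 8000.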

Broadcast2.png

2. Input event timeout

/frameworks/native/services/inputflinger/InputDispatcher.cpp

int32_t InputDispatcher::findFocusedWindowTargetsLocked(nsecs_t currentTime,
        const EventEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime) {
    int32_t injectionResult;
    std::string reason;

... ...

    // Check whether the window is ready for more input.
    reason = checkWindowReadyForMoreInputLocked(currentTime,
            mFocusedWindowHandle, entry, "focused");
    if (!reason.empty()) {
        injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                mFocusedApplicationHandle, mFocusedWindowHandle, nextWakeupTime, reason.c_str());
        goto Unresponsive;
    }

    // Success!  Output targets.
    injectionResult = INPUT_EVENT_INJECTION_SUCCEEDED;
    addWindowTargetLocked(mFocusedWindowHandle,
            InputTarget::FLAG_FOREGROUND | InputTarget::FLAG_DISPATCH_AS_IS, BitSet32(0),
            inputTargets);

    // Done.
Failed:
Unresponsive:
    nsecs_t timeSpentWaitingForApplication = getTimeSpentWaitingForApplicationLocked(currentTime);
    updateDispatchStatisticsLocked(currentTime, entry,
            injectionResult, timeSpentWaitingForApplication);
#if DEBUG_FOCUS
    ALOGD("findFocusedWindow finished: injectionResult=%d, "
            "timeSpentWaitingForApplication=%0.1fms",
            injectionResult, timeSpentWaitingForApplication / 1000000.0);
#endif
    return injectionResult;
}

checkWindowReadyForMoreInputLocked determines whether the window is ready to accept more events:

std::string InputDispatcher::checkWindowReadyForMoreInputLocked(nsecs_t currentTime,
        const sp<InputWindowHandle>& windowHandle, const EventEntry* eventEntry,
        const char* targetType) {
    // If the window is paused then keep waiting.
    if (windowHandle->getInfo()->paused) {
        return StringPrintf("Waiting because the %s window is paused.", targetType);
    }
    // Look up the InputChannel's index in mConnectionsByFd to obtain the matching Connection object.
    // If the window's connection is not registered then keep waiting.
    ssize_t connectionIndex = getConnectionIndexLocked(windowHandle->getInputChannel());
    if (connectionIndex < 0) {
        return StringPrintf("Waiting because the %s window's input channel is not "
                "registered with the input dispatcher.  The window may be in the process "
                "of being removed.", targetType);
    }

    // If the connection is dead then keep waiting.
    sp<Connection> connection = mConnectionsByFd.valueAt(connectionIndex);
    if (connection->status != Connection::STATUS_NORMAL) {
        return StringPrintf("Waiting because the %s window's input connection is %s."
                "The window may be in the process of being removed.", targetType,
                connection->getStatusLabel());
    }
    // The inputPublisher is blocked: for example, the window responds slowly, the
    // InputChannel fills up, and the publisher can no longer write to it.
    // If the connection is backed up then keep waiting.
    if (connection->inputPublisherBlocked) {
        return StringPrintf("Waiting because the %s window's input channel is full.  "
                "Outbound queue length: %d.  Wait queue length: %d.",
                targetType, connection->outboundQueue.count(), connection->waitQueue.count());
    }
    // Key events are dispatched only when both outboundQueue and waitQueue are empty.
    // Ensure that the dispatch queues aren't too far backed up for this event.
    if (eventEntry->type == EventEntry::TYPE_KEY) {
        // If the event is a key event, then we must wait for all previous events to
        // complete before delivering it because previous events may have the
        // side-effect of transferring focus to a different window and we want to
        // ensure that the following keys are sent to the new window.
        //
        // Suppose the user touches a button in a window then immediately presses "A".
        // If the button causes a pop-up window to appear then we want to ensure that
        // the "A" key is delivered to the new pop-up window.  This is because users
        // often anticipate pending UI changes when typing on a keyboard.
        // To obtain this behavior, we must serialize key events with respect to all
        // prior input events.
        if (!connection->outboundQueue.isEmpty() || !connection->waitQueue.isEmpty()) {
            return StringPrintf("Waiting to send key event because the %s window has not "
                    "finished processing all of the input events that were previously "
                    "delivered to it.  Outbound queue length: %d.  Wait queue length: %d.",
                    targetType, connection->outboundQueue.count(), connection->waitQueue.count());
        }
    } else {
        // Touch events only require that the window has responded within 0.5 s, judged
        // by how long the event at the head of waitQueue has been waiting.
        // Touch events can always be sent to a window immediately because the user intended
        // to touch whatever was visible at the time.  Even if focus changes or a new
        // window appears moments later, the touch event was meant to be delivered to
        // whatever window happened to be on screen at the time.
        //
        // Generic motion events, such as trackball or joystick events are a little trickier.
        // Like key events, generic motion events are delivered to the focused window.
        // Unlike key events, generic motion events don't tend to transfer focus to other
        // windows and it is not important for them to be serialized.  So we prefer to deliver
        // generic motion events as soon as possible to improve efficiency and reduce lag
        // through batching.
        //
        // The one case where we pause input event delivery is when the wait queue is piling
        // up with lots of events because the application is not responding.
        // This condition ensures that ANRs are detected reliably.
        if (!connection->waitQueue.isEmpty()
                && currentTime >= connection->waitQueue.head->deliveryTime
                        + STREAM_AHEAD_EVENT_TIMEOUT) {
            return StringPrintf("Waiting to send non-key event because the %s window has not "
                    "finished processing certain input events that were delivered to it over "
                    "%0.1fms ago.  Wait queue length: %d.  Wait queue head age: %0.1fms.",
                    targetType, STREAM_AHEAD_EVENT_TIMEOUT * 0.000001f,
                    connection->waitQueue.count(),
                    (currentTime - connection->waitQueue.head->deliveryTime) * 0.000001f);
        }
    }
    return "";
}
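
The last part of checkWindowReadyForMoreInputLocked, the key versus non-key rule, can be restated as a small decision function. The following Java sketch is a simplified restatement, not the native code: the class and parameter names are hypothetical, times are in milliseconds, and the earlier checks (paused window, unregistered or dead connection, blocked publisher) are left out.

```java
final class WindowReadinessSketch {
    // 0.5 s, the value the text gives for the non-key check.
    static final long STREAM_AHEAD_EVENT_TIMEOUT_MS = 500;

    /**
     * Returns an empty string if the event may be dispatched now,
     * otherwise a short reason why the dispatcher has to wait.
     */
    static String check(boolean isKeyEvent, int outboundCount, int waitCount,
                        long waitHeadDeliveryTimeMs, long nowMs) {
        if (isKeyEvent) {
            // Key events are serialized behind every earlier event.
            if (outboundCount > 0 || waitCount > 0) {
                return "key event waits for earlier events to finish";
            }
        } else if (waitCount > 0
                && nowMs >= waitHeadDeliveryTimeMs + STREAM_AHEAD_EVENT_TIMEOUT_MS) {
            // Non-key events wait only when the oldest unacknowledged event is too old.
            return "head of wait queue has been unacknowledged for too long";
        }
        return "";
    }

    public static void main(String[] args) {
        // One event delivered at t=0 is still unacknowledged at t=100.
        System.out.println("key:   [" + check(true, 0, 1, 0, 100) + "]");
        System.out.println("touch: [" + check(false, 0, 1, 0, 100) + "]");
    }
}
```

With the same backlog, the key event has to wait while the touch event can still go out, which matches the asymmetry explained in the source comments.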

If checkWindowReadyForMoreInputLocked returns a non-empty reason, handleTargetsNotReadyLocked is called next:

int32_t InputDispatcher::handleTargetsNotReadyLocked(nsecs_t currentTime,
        const EventEntry* entry,
        const sp<InputApplicationHandle>& applicationHandle,
        const sp<InputWindowHandle>& windowHandle,
        nsecs_t* nextWakeupTime, const char* reason) {
    // The system is not up yet.
    if (applicationHandle == NULL && windowHandle == NULL) {
        ... ...
        }
    } else {
        // Record the start of the ANR wait here; mInputTargetWaitCause is initially
        // INPUT_TARGET_WAIT_CAUSE_NONE.
        if (mInputTargetWaitCause != INPUT_TARGET_WAIT_CAUSE_APPLICATION_NOT_READY) {
#if DEBUG_FOCUS
            ALOGD("Waiting for application to become ready for input: %s.  Reason: %s",
                    getApplicationWindowLabelLocked(applicationHandle, windowHandle).c_str(),
                    reason);
#endif
            nsecs_t timeout;
            if (windowHandle != NULL) {
                timeout = windowHandle->getDispatchingTimeout(DEFAULT_INPUT_DISPATCHING_TIMEOUT);
            } else if (applicationHandle != NULL) {
                timeout = applicationHandle->getDispatchingTimeout(
                        DEFAULT_INPUT_DISPATCHING_TIMEOUT);
            } else {
                timeout = DEFAULT_INPUT_DISPATCHING_TIMEOUT;
            }

            mInputTargetWaitCause = INPUT_TARGET_WAIT_CAUSE_APPLICATION_NOT_READY;
            mInputTargetWaitStartTime = currentTime;
            mInputTargetWaitTimeoutTime = currentTime + timeout;
            mInputTargetWaitTimeoutExpired = false;
            mInputTargetWaitApplicationHandle.clear();

            if (windowHandle != NULL) {
                mInputTargetWaitApplicationHandle = windowHandle->inputApplicationHandle;
            }
            if (mInputTargetWaitApplicationHandle == NULL && applicationHandle != NULL) {
                mInputTargetWaitApplicationHandle = applicationHandle;
            }
        }
    }
    
    if (mInputTargetWaitTimeoutExpired) {
        return INPUT_EVENT_INJECTION_TIMED_OUT;
    }
    // If the window is still not ready on a later attempt, check whether the ANR
    // timeout has passed; if so, raise the ANR.
    if (currentTime >= mInputTargetWaitTimeoutTime) {
        onANRLocked(currentTime, applicationHandle, windowHandle,
                entry->eventTime, mInputTargetWaitStartTime, reason);

        // Force poll loop to wake up immediately on next iteration once we get the
        // ANR response back from the policy.
        *nextWakeupTime = LONG_LONG_MIN;
        return INPUT_EVENT_INJECTION_PENDING;
    } else {
        // Force poll loop to wake up when timeout is due.
        if (mInputTargetWaitTimeoutTime < *nextWakeupTime) {
            *nextWakeupTime = mInputTargetWaitTimeoutTime;
        }
        return INPUT_EVENT_INJECTION_PENDING;
    }
}

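The overall flow that leads to an input ANR can be summarized as follows:

After the InputDispatcher thread finds the target window, it first checks whether the window is ready. If it is (for key events both queues are empty; for touch events the window has responded within 0.5 s), the event is written to the InputChannel and delivered to the window, and the dispatcher thread sleeps until feedback arrives and the next loop iteration begins. If the window is not ready, the dispatcher records the start time of the wait. On a later dispatch attempt, if the window is still not ready, it compares the elapsed time with the timeout and raises an ANR once the timeout has expired.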
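
The wait-and-check behaviour summarized above amounts to a small state machine. Here is a minimal Java sketch of it. The class, the enum, and the explicit clock parameter are hypothetical; the 5-second timeout in main is an assumed value for illustration; and clearing the wait state when the window becomes ready is a simplification, since the excerpt does not show where the real dispatcher resets it.

```java
final class InputWaitSketch {
    enum Result { DISPATCHED, PENDING, ANR }

    private final long timeoutMs;
    private boolean waiting;   // true once a wait for the window has started
    private long waitDeadline; // plays the role of mInputTargetWaitTimeoutTime

    InputWaitSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** One dispatch attempt at time nowMs for a window that is or is not ready. */
    Result onDispatchAttempt(long nowMs, boolean windowReady) {
        if (windowReady) {
            waiting = false; // simplification: any successful dispatch ends the wait
            return Result.DISPATCHED;
        }
        if (!waiting) {
            // First failed attempt: remember when the wait started.
            waiting = true;
            waitDeadline = nowMs + timeoutMs;
        }
        return nowMs >= waitDeadline ? Result.ANR : Result.PENDING;
    }

    public static void main(String[] args) {
        InputWaitSketch dispatcher = new InputWaitSketch(5_000);
        System.out.println(dispatcher.onDispatchAttempt(0, false));
        System.out.println(dispatcher.onDispatchAttempt(3_000, false));
        System.out.println(dispatcher.onDispatchAttempt(5_000, false));
    }
}
```

The point the sketch makes is that the timeout is measured from the first failed attempt, not from the moment the event was created.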

input.png

Dumping the ANR Trace

1. How does AMS handle an ANR?

Whichever timeout fired, the flow eventually calls AppErrors.appNotResponding to handle the rest of the ANR processing:
android/frameworks/base/services/core/java/com/android/server/am/AppErrors.java

final void appNotResponding(ProcessRecord app, ActivityRecord activity,
      ActivityRecord parent, boolean aboveSystem, final String annotation) {
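      ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
      SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
      ... ...
             // 1. Put persistent processes, treatLikeActivity processes (the IME), and the
             //    process that hit the ANR into firstPids; put the other running processes
             //    into lastPids.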
              for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
                  ProcessRecord r = mService.mLruProcesses.get(i);
                  if (r != null && r.thread != null) {
                      int pid = r.pid;
                      if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
                          if (r.persistent) {
                              firstPids.add(pid);
                              if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                          } else if (r.treatLikeActivity) {
                              firstPids.add(pid);
                              if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                          } else {
                              lastPids.put(pid, Boolean.TRUE);
                              if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                          }
                      }
                  }
              }
          }
      }
  ...
      
        // 2. Create and write the ANR trace file.
        File tracesFile = ActivityManagerService.dumpStackTraces(
                true, firstPids,
                (isSilentANR) ? null : processCpuTracker,
                (isSilentANR) ? null : lastPids,
                nativePids);
        // 3. Print CPU info.
        String cpuInfo = null;
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            mService.updateCpuStatsNow();
            synchronized (mService.mProcessCpuTracker) {
                cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
            }
            info.append(processCpuTracker.printCurrentLoad());
            info.append(cpuInfo);
        }

        info.append(processCpuTracker.printCurrentState(anrTime));

        Slog.e(TAG, info.toString());
        if (tracesFile == null) {
            // There is no trace file, so dump (only) the alleged culprit's threads to the log
            Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
        }

       ... 
            
            // 4. Show the ANR dialog.
            Message msg = Message.obtain();
            msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
            msg.obj = new AppNotRespondingDialog.Data(app, activity, aboveSystem);

            mService.mUiHandler.sendMessage(msg);
        }
    }

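The part we care about most here is step 2, writing the ANR trace file. An ANR trace file contains thread stacks for many processes, not only the one that stopped responding. Which processes get dumped is decided by the firstPids and lastPids computed above:

  • The process that triggered the ANR is dumped first.
  • Next come the persistent and treatLikeActivity processes, in most-recently-used order.
  • Finally, of the remaining processes, the 5 with the highest CPU usage are dumped.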
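
The ordering rules above can be sketched in plain Java. This is a simplified model, not the AMS implementation: `ProcInfo`, `DumpOrderSketch`, `dumpOrder`, and the `cpuTimeMs` field are hypothetical names introduced for illustration, and the real code also adds a few other pids (for example system_server) that are left out here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified process record; the real ProcessRecord has many more fields.
final class ProcInfo {
    final int pid;
    final boolean persistent;
    final boolean treatLikeActivity;
    final long cpuTimeMs; // assumed: CPU time used during the sampling window

    ProcInfo(int pid, boolean persistent, boolean treatLikeActivity, long cpuTimeMs) {
        this.pid = pid;
        this.persistent = persistent;
        this.treatLikeActivity = treatLikeActivity;
        this.cpuTimeMs = cpuTimeMs;
    }
}

final class DumpOrderSketch {
    static final int MAX_EXTRA_PIDS = 5;

    /**
     * @param anrPid pid of the unresponsive process
     * @param lru    running processes, most recently used first
     * @return pids in the order their stacks would be dumped
     */
    static List<Integer> dumpOrder(int anrPid, List<ProcInfo> lru) {
        List<Integer> order = new ArrayList<>();
        List<ProcInfo> others = new ArrayList<>();

        order.add(anrPid); // the ANR process always comes first
        for (ProcInfo p : lru) {
            if (p.pid == anrPid) {
                continue;
            }
            if (p.persistent || p.treatLikeActivity) {
                order.add(p.pid); // important processes, kept in LRU order
            } else {
                others.add(p);    // candidates for the "top CPU" slots
            }
        }

        // Of the remaining processes, keep only the heaviest CPU users.
        others.sort(Comparator.comparingLong((ProcInfo p) -> p.cpuTimeMs).reversed());
        for (int i = 0; i < Math.min(MAX_EXTRA_PIDS, others.size()); i++) {
            order.add(others.get(i).pid);
        }
        return order;
    }
}
```

Separating "important" processes from the rest before ranking by CPU keeps the most diagnostic stacks at the front of the file, which matters because the dump as a whole runs under a time limit.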

android/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

    public static File dumpStackTraces(ArrayList<Integer> firstPids,
            ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids,
            ArrayList<Integer> nativePids) {
        ArrayList<Integer> extraPids = null;

        Slog.i(TAG, "dumpStackTraces pids=" + lastPids + " nativepids=" + nativePids);

        ... 
        // Create the trace file under /data/anr
        final File tracesDir = new File(ANR_TRACE_DIR);
        // Each set of ANR traces is written to a separate file and dumpstate will process
        // all such files and add them to a captured bug report if they're recent enough.
        maybePruneOldTraces(tracesDir);

        // NOTE: We should consider creating the file in native code atomically once we've
        // gotten rid of the old scheme of dumping and lot of the code that deals with paths
        // can be removed.
        File tracesFile = createAnrDumpFile(tracesDir);
        if (tracesFile == null) {
            return null;
        }
   
        dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids);
        return tracesFile;
    }

    public static void dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
            ArrayList<Integer> nativePids, ArrayList<Integer> extraPids) {

        Slog.i(TAG, "Dumping to " + tracesFile);

        // We don't need any sort of inotify based monitoring when we're dumping traces via
        // tombstoned. Data is piped to an "intercept" FD installed in tombstoned so we're in full
        // control of all writes to the file in question.

        // We must complete all stack dumps within 20 seconds.
        long remainingTime = 20 * 1000;

        // First collect all of the stacks of the most important pids.
        if (firstPids != null) {
            int num = firstPids.size();
            for (int i = 0; i < num; i++) {
                Slog.i(TAG, "Collecting stacks for pid " + firstPids.get(i));
                final long timeTaken = dumpJavaTracesTombstoned(firstPids.get(i), tracesFile,
                                                                remainingTime);

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + firstPids.get(i) +
                           "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with pid " + firstPids.get(i) + " in " + timeTaken + "ms");
                }
            }
        }

      ... 
        Slog.i(TAG, "Done dumping");
    }

The net effect of this method is to send SIGQUIT to each Java/native process through the standard Linux interface sigqueue.
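
The shared 20-second deadline in dumpStackTraces is worth isolating, because it explains why a trace file sometimes lacks the less important processes. Below is a minimal sketch of that budgeting loop; `PidDumper`, `BudgetedDumpSketch`, and `dumpWithBudget` are hypothetical names, and `PidDumper` merely stands in for dumpJavaTracesTombstoned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for dumpJavaTracesTombstoned: dumps one pid within
// timeoutMs and returns how long the dump took, in milliseconds.
interface PidDumper {
    long dump(int pid, long timeoutMs);
}

final class BudgetedDumpSketch {
    /**
     * Dumps pids in order until the shared time budget is used up.
     *
     * @param pids     pids to dump, most important first
     * @param budgetMs total time allowed for all dumps together
     * @return the pids that were attempted before the budget ran out
     */
    static List<Integer> dumpWithBudget(List<Integer> pids, long budgetMs, PidDumper dumper) {
        List<Integer> attempted = new ArrayList<>();
        long remainingMs = budgetMs;
        for (int pid : pids) {
            // Each dump may use at most whatever is left of the budget.
            long takenMs = dumper.dump(pid, remainingMs);
            attempted.add(pid);
            remainingMs -= takenMs;
            if (remainingMs <= 0) {
                break; // deadline exceeded: later pids are skipped entirely
            }
        }
        return attempted;
    }
}
```

With a 20000 ms budget and dumps that each take 8000 ms, the third pid exhausts the budget and a fourth pid is never attempted, which is why the most important pids are placed first.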

2. How does an application process write its information to the trace?

[Figure: catcher.png]

When Zygote forks a child process, the child calls InitNonZygoteOrPostFork, which creates the SignalCatcher thread. This is the thread that waits for the ANR signal and writes the trace.

/art/runtime/runtime.cc

void Runtime::InitNonZygoteOrPostFork(
    JNIEnv* env,
    bool is_system_server,
    NativeBridgeAction action,
    const char* isa,
    bool profile_system_server) {
         ... 
         StartSignalCatcher();
         ... 
}

void Runtime::StartSignalCatcher() {
  if (!is_zygote_) {
    signal_catcher_ = new SignalCatcher(stack_trace_file_, use_tombstoned_traces_);
  }
}

Next, the SignalCatcher constructor:

/art/runtime/signal_catcher.cc

SignalCatcher::SignalCatcher(const std::string& stack_trace_file,
                             bool use_tombstoned_stack_trace_fd)
    : stack_trace_file_(stack_trace_file),
      use_tombstoned_stack_trace_fd_(use_tombstoned_stack_trace_fd),
      lock_("SignalCatcher lock"),
      cond_("SignalCatcher::cond_", lock_),
      thread_(nullptr) {
  ...

  SetHaltFlag(false);

  // Create a raw pthread; its start routine will attach to the runtime.
  CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread");

}

Here pthread_create creates a Linux thread, which starts executing the Run function:

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);

  Runtime* runtime = Runtime::Current();
  // (1) Attach the current thread to the JavaVM so this Linux thread gains Java thread state and can dump stacks
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsAotCompiler()));

  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  // Set up mask with signals we want to handle.
  SignalSet signals;
  // (2) Handle SIGQUIT
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);
  // (3) Loop waiting for a signal; WaitForSignal returns when one arrives
  while (true) {
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
    case SIGQUIT:
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}
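
  • AttachCurrentThread

    See JNI Tips: all threads are Linux threads, scheduled by the Linux kernel. They are usually started from managed code (Java or Kotlin, via Thread.start), but they can also be created elsewhere and then attached to the JavaVM. For example, a thread started with pthread_create can be attached using the JNI functions AttachCurrentThread or AttachCurrentThreadAsDaemon. Until a thread is attached, it has no JNIEnv and cannot make JNI calls.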
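
The wait-and-dispatch structure of Run can be mimicked in plain Java to make the control flow easier to follow. This is only an analogy: a BlockingQueue stands in for the kernel's signal delivery, `SignalLoopSketch`, `post`, `stop`, and the `HALT` sentinel are hypothetical names, and no real POSIX signals are involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class SignalLoopSketch {
    static final int SIGQUIT = 3;   // Linux signal numbers, used here only as labels
    static final int SIGUSR1 = 10;
    private static final int HALT = -1; // hypothetical sentinel, plays the role of ShouldHalt()

    private final BlockingQueue<Integer> pending = new LinkedBlockingQueue<>();
    private final List<String> handled = Collections.synchronizedList(new ArrayList<>());
    private final Thread thread = new Thread(this::run, "Signal Catcher");

    void start() {
        thread.start();
    }

    /** Stands in for the kernel delivering a signal to the process. */
    void post(int signalNumber) {
        pending.add(signalNumber);
    }

    /** Asks the loop to exit after handling earlier signals, then waits for it. */
    void stop() {
        pending.add(HALT);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> handled() {
        return new ArrayList<>(handled);
    }

    private void run() {
        try {
            while (true) {
                int signalNumber = pending.take(); // blocks, like WaitForSignal
                if (signalNumber == HALT) {
                    return;
                }
                switch (signalNumber) {
                    case SIGQUIT:
                        handled.add("dump"); // where HandleSigQuit would run
                        break;
                    case SIGUSR1:
                        handled.add("usr1"); // where HandleSigUsr1 would run
                        break;
                    default:
                        handled.add("unexpected " + signalNumber);
                        break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point of the dedicated thread is that the dump work happens in ordinary thread context rather than inside an asynchronous signal handler, so it is free to take locks and allocate memory.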

When SIGQUIT arrives it is handled in HandleSigQuit, which mainly calls DumpForSigQuit:

/art/runtime/runtime.cc

void Runtime::DumpForSigQuit(std::ostream& os) {
  GetClassLinker()->DumpForSigQuit(os);  // class loader information
  GetInternTable()->DumpForSigQuit(os);
  GetJavaVM()->DumpForSigQuit(os);       // VM information
  GetHeap()->DumpForSigQuit(os);         // heap information: object allocation statistics
  oat_file_manager_->DumpForSigQuit(os);
  if (GetJit() != nullptr) {
    GetJit()->DumpForSigQuit(os);
  } else {
    os << "Running non JIT\n";
  }
  DumpDeoptimizations(os);
  TrackedAllocators::Dump(os);
  os << "\n";

  thread_list_->DumpForSigQuit(os);  // thread call stacks: suspends all threads first, then dumps each stack
  BaseMutex::DumpAll(os);

  // Inform anyone else who is interested in SigQuit.
  {
    ScopedObjectAccess soa(Thread::Current());
    callbacks_->SigQuit();
  }
}
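
The thread-stack portion of this dump has a rough Java-level counterpart in Thread.getAllStackTraces(). The sketch below is not what ART does (ART suspends all threads and walks their stacks natively, and also reports native frames and lock state); `ThreadDumpSketch` and `dumpAllThreads` are hypothetical names, and the output format is only loosely modelled on a trace file.

```java
import java.util.Map;

final class ThreadDumpSketch {
    /** Returns a text dump of the Java stack of every live thread in this process. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header line: thread name, daemon flag, priority, and state.
            out.append('"').append(t.getName()).append('"')
               .append(t.isDaemon() ? " daemon" : "")
               .append(" prio=").append(t.getPriority())
               .append(" state=").append(t.getState())
               .append('\n');
            // One line per Java frame, innermost first.
            for (StackTraceElement frame : e.getValue()) {
                out.append("  at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

An in-process dump like this can help during development, but it only sees Java frames of the calling process, whereas the SIGQUIT path works for any process regardless of whether its main thread is blocked.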

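The dump covers class loaders, the heap, GC, and thread stacks. Besides the thread stacks we usually care about most, it contains a lot of other information. Taking the heap as an example, let's look at how it appears in an ANR file.

/art/runtime/gc/heap.cc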

void Heap::DumpForSigQuit(std::ostream& os) {
  os << "Heap: " << GetPercentFree() << "% free, " << PrettySize(GetBytesAllocated()) << "/"
     << PrettySize(GetTotalMemory()) << "; " << GetObjectsAllocated() << " objects\n";
  DumpGcPerformanceInfo(os);
}

The beginning of an ANR trace file, ending with the Heap line:

----- pid 18841 at 2020-07-01 11:54:25 -----
Cmd line: com.dunkinbrands.otgo
Build fingerprint: 'TCL/Alcatel_5002R/Seoul_ATT:10/QP1A.190711.020/vGZA4-0:user/release-keys'
ABI: 'arm'
Build type: optimized
Zygote loaded classes=9140 post zygote classes=8201
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
#2 dalvik.system.PathClassLoader: [/data/app/com.dunkinbrands.otgo--NP2B6q6I0_yAuykgwOpBA==/base.apk:/data/app/com.dunkinbrands.otgo--NP2B6q6I0_yAuykgwOpBA==/base.apk!classes2.dex], parent #1
#3 dalvik.system.InMemoryDexClassLoader: [/data/user/0/com.dunkinbrands.otgo/[email protected]], parent #2
#4 dalvik.system.InMemoryDexClassLoader: [/data/user/0/com.dunkinbrands.otgo/[email protected]], parent #2
#5 dalvik.system.DexClassLoader: [/data/user/0/com.dunkinbrands.otgo/cache/generated-3F92340A3D311257BAA79F7EDB761B9E6E665EE8.jar], parent #2
#6 dalvik.system.PathClassLoader: [/data/app/com.google.android.gms-7yZdufzTzp3vlSkd2gcjDQ==/base.apk:/data/app/com.google.android.gms-7yZdufzTzp3vlSkd2gcjDQ==/base.apk!classes2.dex:/data/app/com.google.android.gms-7yZdufzTzp3vlSkd2gcjDQ==/base.apk!classes3.dex:/data/app/com.google.android.gms-7yZdufzTzp3vlSkd2gcjDQ==/base.apk!classes4.dex:/data/app/com.google.android.gms-7yZdufzTzp3vlSkd2gcjDQ==/base.apk!classes5.dex:/data/app/com.google.android.gms-7yZdufzTzp3vlSkd2gcjDQ==/base.apk!classes6.dex], parent #1
#7 dalvik.system.PathClassLoader: [/data/app/com.google.android.trichromelibrary_410410680-CHXkMAKnKoz7r9p8q3qhLw==/base.apk], parent #1
#8 dalvik.system.PathClassLoader: [/data/app/com.google.android.webview-3b4RqE2Pf-rc8rUq7I6Mag==/base.apk], parent #1
#9 dalvik.system.PathClassLoader: [/system/framework/org.apache.http.legacy.jar], parent #1
#10 com.google.android.gms.dynamite.zzh: [/data/user_de/0/com.google.android.gms/app_chimera/m/0000000c/DynamiteLoader.apk], parent #0
#11 dalvik.system.DelegateLastClassLoader: [/data/user_de/0/com.google.android.gms/app_chimera/m/00000010/MapsDynamite.apk], parent #2
#12 dalvik.system.DelegateLastClassLoader: [/data/user_de/0/com.google.android.gms/app_chimera/m/0000000f/GoogleCertificates.apk], parent #2
Done dumping class loaders
Intern table: 49095 strong; 2443 weak
JNI: CheckJNI is off; globals=983 (plus 110 weak)
Libraries: /data/app/com.dunkinbrands.otgo--NP2B6q6I0_yAuykgwOpBA==/lib/arm/libag3.so /data/app/com.google.android.gms-7yZdufzTzp3vlSkd2gcjDQ==/lib/arm/libconscrypt_gmscore_jni.so /data/app/com.google.android.trichromelibrary_410410680-CHXkMAKnKoz7r9p8q3qhLw==/base.apk!/lib/armeabi-v7a/libmonochrome.so /data/user/0/com.dunkinbrands.otgo/files/libcrashreport.so /system/lib/libwebviewchromium_plat_support.so libandroid.so libcompiler_rt.so libdcfdecoderjni.so libjavacore.so libjavacrypto.so libjnigraphics.so libmedia_jni.so libopenjdk.so libsoundpool.so libwebviewchromium_loader.so (15)
Heap: 6% free, 38MB/41MB; 657564 objects
... ...

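Summary

This article covered the following:

  • How an ANR arises, using broadcast timeouts and input event timeouts as examples. It briefly described how ordered and unordered broadcasts are implemented; only ordered broadcasts can trigger an ANR. Input event timeouts are detected from how long events have been sitting in waitQueue, that is, how long the application takes to acknowledge an event.
  • What AMS does once the ANR condition is met: logging the ANR reason and CPU info, creating and writing the trace file, showing the dialog, and killing the application process. The key point is that AMS tells application processes to dump their state through a signal, namely SIGQUIT.
  • Each application process creates a SignalCatcher thread when it starts. When that thread receives SIGQUIT, it first suspends all threads and then writes the thread stacks and other information.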
