ANR mechanism and problem analysis

Keywords: Android

https://duanqz.github.io/2015-10-12-ANR-Analysis
1. Overview

ANR (Application Not Responding) has a simple definition, but it embodies many design ideas of the Android system.

First, ANR belongs to the application domain, unlike SNR (System Not Responding). An SNR means that system_server itself has lost the ability to respond, whereas an ANR clearly places the problem in the application. SNR is guarded by the Watchdog mechanism (for details, see the article "Watchdog mechanism and problem analysis"); ANR is guarded by the message-processing mechanism. Android implements a sophisticated mechanism at the system layer to detect ANRs, and its core principle is message scheduling plus timeout handling.

Second, the main body of the ANR mechanism is implemented in the system layer. All ANR-related messages are scheduled by system_server and then sent to the application process, which does the actual processing. At the same time, the system process sets different timeout limits to track how these messages are processed. Once the application mishandles a message, the timeout limit takes effect: the system collects system state, such as CPU/IO usage and process call stacks, and reports to the user that a process is not responding (the ANR dialog).

Third, an ANR is essentially a performance problem. The ANR mechanism actually constrains the application's main thread: it requires the main thread to finish some of the most common operations (starting services, handling broadcasts and handling input) within a limited time. If processing times out, the main thread is considered to have lost the ability to respond to other operations. Time-consuming work on the main thread, such as CPU-intensive computation, heavy IO or complex layouts, reduces the application's responsiveness.
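As a rough illustration of why main-thread work matters, the sketch below models the main thread as one queue that handles messages strictly in order, using virtual time. It is a simplified model, not Android code: the class MainThreadModel, its Msg record and the task costs are hypothetical, and only the limits (5 seconds for input, 10 seconds for a foreground broadcast) come from this article. It shows that a message's response time includes the time it spends waiting behind earlier messages.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not Android code): the main thread handles queued messages
// strictly in order. All times are virtual milliseconds; the queue is assumed
// to be sorted by enqueue time.
class MainThreadModel {
    // name, time it was enqueued, time it takes to handle, allowed response time
    record Msg(String name, long enqueueAt, long costMs, long limitMs) {}

    /** Returns names of messages whose enqueue-to-finish time exceeds their limit. */
    static List<String> findLateMessages(List<Msg> queue) {
        List<String> late = new ArrayList<>();
        long freeAt = 0; // virtual time at which the main thread becomes idle
        for (Msg m : queue) {
            long start = Math.max(freeAt, m.enqueueAt()); // wait behind earlier work
            long finish = start + m.costMs();
            if (finish - m.enqueueAt() > m.limitMs()) {
                late.add(m.name());
            }
            freeAt = finish;
        }
        return late;
    }

    public static void main(String[] args) {
        List<Msg> queue = List.of(
                new Msg("onReceive with heavy IO", 0, 8_000, 10_000), // within its own limit
                new Msg("key event", 1_000, 50, 5_000));              // waits ~7 s behind it
        System.out.println(findLateMessages(queue)); // prints [key event]
    }
}
```

Here the slow onReceive() finishes within its own limit, yet the key event queued behind it misses the 5-second input limit; this is the sense in which one slow main-thread task reduces the responsiveness of everything after it.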

Finally, some ANR problems are hard to analyze. Sometimes message scheduling fails because of effects deep in the system, and the scenario is hard to reproduce. Analyzing such ANRs often means spending a lot of time understanding system behavior that lies beyond the ANR mechanism itself.

2. ANR mechanism

To analyze simple ANR problems, it is enough to understand the final log output. However, for ANRs caused by system problems (such as excessive CPU load or process deadlock), you need to understand the whole ANR mechanism to locate the cause.

The ANR mechanism can be divided into two parts:

  • ANR monitoring. Android has a monitoring mechanism for each ANR type (Broadcast, Service, InputEvent).

  • ANR reporting. After an ANR is detected, the ANR dialog must be shown and logs must be output (process call stacks, CPU usage, etc. at the time of the ANR).

The code of the whole ANR mechanism also spans several layers of Android:

  • App layer: the processing logic of the application's main thread

  • Framework layer: the core of the ANR mechanism

      ◦ frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
      ◦ frameworks/base/services/core/java/com/android/server/am/BroadcastQueue.java
      ◦ frameworks/base/services/core/java/com/android/server/am/ActiveServices.java
      ◦ frameworks/base/services/core/java/com/android/server/input/InputManagerService.java
      ◦ frameworks/base/services/core/java/com/android/server/wm/InputMonitor.java
      ◦ frameworks/base/core/java/android/view/InputChannel.java
      ◦ frameworks/base/core/java/com/android/internal/os/ProcessCpuTracker.java
    
  • Native layer: the input event dispatch mechanism, which is where InputEvent ANRs are detected

      ◦ frameworks/base/services/core/jni/com_android_server_input_InputManagerService.cpp
      ◦ frameworks/native/services/inputflinger/InputDispatcher.cpp
    
  Next, we dig into the source code and analyze how ANRs are monitored and reported.

    2.1 ANR monitoring mechanism

    2.1.1 Service processing timeout

    A Service runs on the application's main thread. If a Service takes more than 20 seconds to execute, an ANR is raised.

    When a Service ANR occurs, first check whether the Service lifecycle callbacks (onCreate(), onStartCommand(), etc.) perform time-consuming work such as complex computation or IO. If the application's code shows no problem, examine the system state at that moment (CPU usage, system service status, etc.) to judge whether the ANR was caused by the system behaving abnormally.

    How is a Service timeout detected? Android uses timed messages, which are processed by the message queue of AMS (the ActivityManager thread of system_server). AMS holds the context of running Services, so it is a natural place for the timeout detection.

    The Service ANR mechanism is relatively simple, and its main body is implemented in ActiveServices. When a Service lifecycle phase begins, bumpServiceExecutingLocked() is called, which in turn calls scheduleServiceTimeoutLocked():

    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        ...
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        // Throw a timing message through AMS.MainHandler
        mAm.mHandler.sendMessageAtTime(msg,
             proc.execServicesFg ? (now+SERVICE_TIMEOUT) : (now+ SERVICE_BACKGROUND_TIMEOUT));
    }

    The method above posts a timed SERVICE_TIMEOUT_MSG message through AMS.MainHandler:

    • If the Service executes in the foreground, the timeout is SERVICE_TIMEOUT (20 seconds)
    • If the Service executes in the background, the timeout is SERVICE_BACKGROUND_TIMEOUT (200 seconds)

    When the Service lifecycle phase ends, serviceDoneExecutingLocked() is called, and it removes the previously posted SERVICE_TIMEOUT_MSG. If SERVICE_TIMEOUT_MSG has not been removed within the timeout period, AMS.MainHandler handles it:

    case SERVICE_TIMEOUT_MSG: {
        // If dexopt was performed (a time-consuming operation), allow another SERVICE_TIMEOUT (20 seconds)
        if (mDidDexOpt) {
            mDidDexOpt = false;
            Message nmsg = mHandler.obtainMessage(SERVICE_TIMEOUT_MSG);
            nmsg.obj = msg.obj;
            mHandler.sendMessageDelayed(nmsg, ActiveServices.SERVICE_TIMEOUT);
            return;
        }
        mServices.serviceTimeout((ProcessRecord)msg.obj);
    } break;

    If no dexopt was performed, ActiveServices.serviceTimeout() is called:

    void serviceTimeout(ProcessRecord proc) {
        ...
        final long maxTime =  now -
                  (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ...
        // Find a Service that has timed out
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
           ...
        }
        ...
        // Report only if the process is still in the list of recently running (LRU) processes; otherwise ignore this ANR
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            anrMessage = "executing service " + timeout.shortName;
        }
        ...
        if (anrMessage != null) {
            mAm.appNotResponding(proc, null, null, false, anrMessage);
        }
    }

    The method above looks for a Service in the process that has timed out and, after some checks, decides to report an ANR by calling AMS.appNotResponding(). At this point the ANR mechanism has finished its monitoring task; what remains is to output the ANR result, which we call the ANR reporting mechanism. Reporting is done by AMS.appNotResponding(), which Broadcast and InputEvent ANRs also end up calling. We will cover it in detail later.
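    The timeout check in serviceTimeout() can be sketched in plain Java as follows. This is a simplified stand-in, assuming hypothetical ServiceRecord and ProcessRecord records and a set of process names in place of mAm.mLruProcesses; only the comparison against now minus SERVICE_TIMEOUT or SERVICE_BACKGROUND_TIMEOUT and the LRU check are taken from the code above.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the check in ActiveServices.serviceTimeout().
// The records are simplified stand-ins, not the AOSP classes.
class ServiceTimeoutCheck {
    record ServiceRecord(String shortName, long executingStart) {}
    record ProcessRecord(String name, boolean execServicesFg, List<ServiceRecord> executingServices) {}

    static final long SERVICE_TIMEOUT = 20_000;             // foreground, per the article
    static final long SERVICE_BACKGROUND_TIMEOUT = 200_000; // background, per the article

    /** Returns the ANR message, or null when no ANR should be reported. */
    static String check(ProcessRecord proc, long now, Set<String> lruProcesses) {
        long maxTime = now - (proc.execServicesFg() ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        // Walk from the end, as the original loop does; any service that started
        // before maxTime has been executing for longer than the allowed time.
        for (int i = proc.executingServices().size() - 1; i >= 0; i--) {
            ServiceRecord sr = proc.executingServices().get(i);
            if (sr.executingStart() < maxTime) {
                timeout = sr;
                break;
            }
        }
        // Ignore the timeout if the process is no longer in the LRU list.
        if (timeout != null && lruProcesses.contains(proc.name())) {
            return "executing service " + timeout.shortName();
        }
        return null;
    }

    public static void main(String[] args) {
        ProcessRecord proc = new ProcessRecord("com.example", true,
                List.of(new ServiceRecord("com.example/.SyncService", 0)));
        // prints: executing service com.example/.SyncService
        System.out.println(check(proc, 21_000, Set.of("com.example")));
    }
}
```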

    So far, we have analyzed the Service ANR mechanism:

    A timed message tracks the Service's execution. If the timed message gets handled, the Service has not finished in time, which means a Service ANR.
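    The whole pattern (post a delayed SERVICE_TIMEOUT_MSG when a lifecycle callback starts, remove it when the callback ends, treat delivery of the message as a timeout, and grant one extension after dexopt) can be modeled with a virtual-time message queue. The sketch below is a hypothetical simulation, not the AMS handler: ServiceTimeoutSim, advanceTo() and anrCount are illustrative names, and removeMessages() is approximated with PriorityQueue.removeIf().

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical virtual-time simulation of the Service timeout pattern.
class ServiceTimeoutSim {
    static final long SERVICE_TIMEOUT = 20_000;             // foreground, per the article
    static final long SERVICE_BACKGROUND_TIMEOUT = 200_000; // background, per the article

    record Timed(long when, String what) {}

    private final PriorityQueue<Timed> queue =
            new PriorityQueue<>(Comparator.comparingLong(Timed::when));
    private long now = 0;
    boolean didDexOpt = false; // stands in for mDidDexOpt
    int anrCount = 0;          // times serviceTimeout() would have been called

    /** Lifecycle phase starts: post the delayed timeout message. */
    void scheduleServiceTimeout(boolean execFg) {
        long delay = execFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
        queue.add(new Timed(now + delay, "SERVICE_TIMEOUT_MSG"));
    }

    /** Lifecycle phase ends: discard the pending timeout message. */
    void serviceDoneExecuting() {
        queue.removeIf(t -> t.what().equals("SERVICE_TIMEOUT_MSG"));
    }

    /** Advances virtual time, delivering every message that becomes due. */
    void advanceTo(long t) {
        while (!queue.isEmpty() && queue.peek().when() <= t) {
            Timed msg = queue.poll();
            now = msg.when();
            if (didDexOpt) {                 // one extra SERVICE_TIMEOUT after dexopt
                didDexOpt = false;
                queue.add(new Timed(now + SERVICE_TIMEOUT, msg.what()));
                continue;
            }
            anrCount++;                      // real code: serviceTimeout() -> appNotResponding()
        }
        now = t;
    }

    public static void main(String[] args) {
        ServiceTimeoutSim sim = new ServiceTimeoutSim();
        sim.scheduleServiceTimeout(true);    // due at 20000
        sim.advanceTo(5_000);
        sim.serviceDoneExecuting();          // finished after 5 s: message removed
        sim.advanceTo(60_000);
        System.out.println(sim.anrCount);    // prints 0
    }
}
```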

    2.1.2 Broadcast processing timeout

    An application can register a broadcast receiver and implement BroadcastReceiver.onReceive() to handle broadcasts. This method usually runs on the main thread, and Android limits its execution time to 10 seconds (for the foreground queue, see below); beyond that, an ANR is triggered.

    onReceive() can also run on another thread: when a receiver is registered with Context.registerReceiver(BroadcastReceiver, IntentFilter, String, Handler), the given Handler schedules onReceive() so that it runs on a non-main thread.

    This raises two questions:

    1. How does Android deliver broadcasts to applications?
    2. How does Android detect broadcast processing timeout?

    Scheduling of broadcast messages

    AMS maintains two broadcast queues (BroadcastQueue):

    • The foreground queue, whose timeout is 10 seconds
    • The background queue, whose timeout is 60 seconds

    There are two queues because they need different timeouts. Every broadcast that is sent enters a queue for scheduling; setting the Intent.FLAG_RECEIVER_FOREGROUND flag when sending a broadcast posts it to the foreground queue. The AMS thread keeps taking broadcasts from the queue and sending them to each receiver (BroadcastReceiver). When a broadcast is to be dispatched, AMS calls BroadcastQueue.scheduleBroadcastsLocked():

    public void scheduleBroadcastsLocked() {
        ...
        if (mBroadcastsScheduled) {
            return;
        }
        mHandler.sendMessage(mHandler.obtainMessage(BROADCAST_INTENT_MSG, this));
        mBroadcastsScheduled = true;
    }

    The method above posts a BROADCAST_INTENT_MSG message to the message queue of the AMS thread, which also shows that broadcasts are actually dispatched by the AMS thread (the ActivityManager thread in the system_server process). Because the method may be called many times, the variable mBroadcastsScheduled records whether a posted BROADCAST_INTENT_MSG is still waiting to be handled by the AMS thread; while it is, there is no need to post another one. Once the message is received, it is handled as follows:

    public void handleMessage(Message msg) {
        switch (msg.what) {
            case BROADCAST_INTENT_MSG: {
                ...
                processNextBroadcast(true);
            } break;
            ...
        }
    }

    This directly calls BroadcastQueue.processNextBroadcast(). A fromMsg argument of true means that the dispatch request comes from a BROADCAST_INTENT_MSG message. BroadcastQueue.processNextBroadcast() is the core function for dispatching broadcasts, and naturally it contains a lot of code. We will analyze it in several parts:

    // processNextBroadcast part 1: process non-serial broadcast messages
    final void  processNextBroadcast(boolean fromMsg) {
        ...
        // 1. Set mBroadcastsScheduled
        if (fromMsg) {
            mBroadcastsScheduled = false;
        }
        // 2. Process "parallel broadcast message"
        while (mParallelBroadcasts.size() > 0) {
            ...
            final int N = r.receivers.size();
            for (int i=0; i<N; i++) {
                Object target = r.receivers.get(i);
                deliverToRegisteredReceiverLocked(r, (BroadcastFilter)target, false);
            }
            addBroadcastToHistoryLocked(r);
        }
        // 3. Handle blocked broadcast messages
        if (mPendingBroadcast != null) {
            ...
            if (!isDead) {
                // isDead indicates whether the process that should receive the pending broadcast has died
                // If it is still alive, return and keep waiting for the next dispatch
                return;
            }
            ...
        }
    // To be continued

    The first part handles non-serial broadcast messages in the following steps:

    1. Set mBroadcastsScheduled. As mentioned earlier, this variable controls the posting of BROADCAST_INTENT_MSG. If this call is a response to a BROADCAST_INTENT_MSG, mBroadcastsScheduled is set to false, meaning that this BROADCAST_INTENT_MSG has been handled and the next one may be posted.

    2. Process "parallel broadcast messages". Broadcast receivers are either "dynamic" or "static": receivers registered through Context.registerReceiver() are dynamic, and receivers declared in AndroidManifest.xml are static. Broadcast messages are either "parallel" or "serial": parallel broadcasts are delivered only to dynamic receivers, while serial broadcasts are delivered to both kinds of receivers, depending on the situation. We will not explore why Android is designed this way and will focus only on how the two kinds of broadcasts are dispatched differently. BroadcastQueue maintains two queues:

      ◦ mParallelBroadcasts: parallel broadcast messages are queued here. A parallel broadcast can be dispatched in one go, that is, delivered to all of its dynamic receivers in a single loop.
      ◦ mOrderedBroadcasts: serial broadcast messages are queued here. A serial broadcast must be delivered to its receivers in turn: after one receiver has finished, a BROADCAST_INTENT_MSG is posted again, and BroadcastQueue.processNextBroadcast() is entered again to dispatch to the next receiver.

    3. Handle blocked broadcast messages. Sometimes a broadcast cannot be dispatched yet, and it is saved in the mPendingBroadcast variable. When a new dispatch round starts, the code checks whether the process that should receive it is still alive: if it is, it keeps waiting; otherwise, the broadcast is abandoned.
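    The scheduling steps above can be modeled in a few dozen lines. The sketch below is a hypothetical single-threaded model, not BroadcastQueue itself: MiniBroadcastQueue, handleIntentMsg(), finishReceiver() and the string log are illustrative, and static receivers, process startup and timeouts are left out. It shows three behaviors described in this section: repeated scheduling requests coalesce into one pending BROADCAST_INTENT_MSG, a parallel broadcast reaches all its receivers in one pass, and an ordered broadcast advances to its next receiver only after the current receiver reports that it has finished.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical single-threaded model of BroadcastQueue scheduling (no Android APIs).
class MiniBroadcastQueue {
    record Broadcast(String action, List<String> receivers) {}

    final Deque<Broadcast> parallel = new ArrayDeque<>(); // like mParallelBroadcasts
    final Deque<Broadcast> ordered = new ArrayDeque<>();  // like mOrderedBroadcasts
    final List<String> delivered = new ArrayList<>();     // log of "receiver<-action"
    int postedMessages = 0;                       // BROADCAST_INTENT_MSG posted so far
    private boolean broadcastsScheduled = false;  // like mBroadcastsScheduled
    private int nextReceiver = 0;                 // index into the head ordered broadcast
    private boolean waitingForFinish = false;     // an ordered receiver is still running

    void scheduleBroadcasts() {
        if (broadcastsScheduled) {
            return;                   // a message is already pending: coalesce
        }
        postedMessages++;
        broadcastsScheduled = true;
    }

    /** Models handling one BROADCAST_INTENT_MSG, i.e. processNextBroadcast(true). */
    void handleIntentMsg() {
        broadcastsScheduled = false;  // a new message may be posted from now on
        while (!parallel.isEmpty()) { // parallel: every receiver in one pass
            Broadcast b = parallel.poll();
            for (String r : b.receivers()) {
                delivered.add(r + "<-" + b.action());
            }
        }
        while (!ordered.isEmpty() && ordered.peekFirst().receivers().isEmpty()) {
            ordered.pollFirst();      // nothing to deliver: drop it
        }
        if (waitingForFinish || ordered.isEmpty()) {
            return;                   // serial: wait for the current receiver
        }
        Broadcast b = ordered.peekFirst();
        delivered.add(b.receivers().get(nextReceiver) + "<-" + b.action());
        waitingForFinish = true;
    }

    /** Models the current ordered receiver reporting completion. */
    void finishReceiver() {
        if (ordered.isEmpty()) {
            return;
        }
        waitingForFinish = false;
        if (++nextReceiver >= ordered.peekFirst().receivers().size()) {
            ordered.pollFirst();      // all receivers done: move to the next broadcast
            nextReceiver = 0;
        }
        scheduleBroadcasts();
    }

    public static void main(String[] args) {
        MiniBroadcastQueue q = new MiniBroadcastQueue();
        q.parallel.add(new Broadcast("A", List.of("d1", "d2")));
        q.ordered.add(new Broadcast("B", List.of("r1", "r2")));
        q.scheduleBroadcasts();
        q.scheduleBroadcasts();       // coalesced into the same pending message
        q.handleIntentMsg();          // d1<-A, d2<-A, r1<-B
        q.finishReceiver();           // r1 finished: schedule the next round
        q.handleIntentMsg();          // r2<-B
        System.out.println(q.delivered + " posted=" + q.postedMessages);
        // prints [d1<-A, d2<-A, r1<-B, r2<-B] posted=2
    }
}
```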

    The next part is the most complex one: it handles "serial broadcast messages". The ANR monitoring mechanism only applies to this kind of broadcast; in other words, "parallel broadcast messages" never cause an ANR.

    // processNextBroadcast section 2: extract "serial broadcast message" from queue
        do {
            r = mOrderedBroadcasts.get(0);
            // 1. The first ANR monitoring mechanism for broadcast messages
            if (mService.mProcessesReady && r.dispatchTime > 0) {
                if ((numReceivers > 0) &&
                    (now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
                    broadcastTimeoutLocked(false); // forcibly finish this broadcast
                    ...
                }
            }
            // 2. Judge whether the broadcast message has been processed
            if (r.receivers == null || r.nextReceiver >= numReceivers ||
                r.resultAbort || forceReceive) {
                ...
                cancelBroadcastTimeoutLocked();
                ...
                mOrderedBroadcasts.remove(0);
                continue;
            }
    
        } while (r == null);

    // To be continued

    This part is a do-while loop. Each iteration takes the first broadcast from the mOrderedBroadcasts queue and processes it. Here the first Broadcast ANR monitoring mechanism finally appears:

    1. Check whether the current time has passed r.dispatchTime + 2 × mTimeoutPeriod × numReceivers:

      ◦ dispatchTime is the time at which dispatching of this broadcast began. A serial broadcast is delivered to its receivers one by one, and the next delivery starts only after the current receiver finishes; dispatchTime is the time the first receiver was dispatched to. It is not set until the broadcast is actually dispatched, so on the first entry into processNextBroadcast() dispatchTime is 0 and the condition is not entered.
      ◦ mTimeoutPeriod is determined by the type of the current BroadcastQueue (10 seconds for foreground, 60 seconds for background) and is set when the BroadcastQueue is initialized. Its original purpose is to limit how long each receiver may take to process the broadcast; here it is used to compute the timeout of the whole broadcast.
      ◦ Suppose a broadcast has two receivers and mTimeoutPeriod is 10 seconds. If the broadcast has still not been fully processed after 2 × 10 × 2 = 40 seconds, broadcastTimeoutLocked() is called. This method decides whether an ANR has occurred; we will analyze it later.

    2. If the broadcast has been fully processed, it is removed from mOrderedBroadcasts and the loop moves on to the next one; otherwise, the loop exits.
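    The first check is simple arithmetic. The helper below is a hypothetical restatement of the condition in the code above; it omits the mService.mProcessesReady test and treats dispatchTime == 0 as "not dispatched yet". With the foreground queue (mTimeoutPeriod = 10 seconds) and two receivers, the whole broadcast gets 2 × 10 × 2 = 40 seconds from dispatchTime.

```java
// Hypothetical helper mirroring the first check in processNextBroadcast():
// the serial broadcast is forcibly finished once
// now > dispatchTime + 2 * mTimeoutPeriod * numReceivers.
class WholeBroadcastTimeout {
    /** Deadline in ms, or Long.MAX_VALUE when the check does not apply yet. */
    static long deadline(long dispatchTime, long timeoutPeriod, int numReceivers) {
        if (dispatchTime <= 0 || numReceivers <= 0) {
            return Long.MAX_VALUE; // not dispatched yet, or no receivers
        }
        return dispatchTime + 2 * timeoutPeriod * numReceivers;
    }

    static boolean shouldForceFinish(long now, long dispatchTime, long timeoutPeriod, int numReceivers) {
        return now > deadline(dispatchTime, timeoutPeriod, numReceivers);
    }

    public static void main(String[] args) {
        // Foreground queue (10 s), two receivers, dispatched at t = 1000 ms
        System.out.println(deadline(1_000, 10_000, 2)); // prints 41000
    }
}
```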

    The main job of the do-while loop above is to take a serial broadcast from the queue and prepare it for dispatch:

    // processNextBroadcast Part 3: second ANR monitoring mechanism for serial broadcast messages
        r.receiverTime = SystemClock.uptimeMillis();
        ...
        if (! mPendingBroadcastTimeoutMessage) {
            long timeoutTime = r.receiverTime + mTimeoutPeriod;
            ...
            setBroadcastTimeoutLocked(timeoutTime);
        }
    // To be continued

    Once a serial broadcast has been taken out and dispatching is about to start, the second ANR detection mechanism appears. The mPendingBroadcastTimeoutMessage variable records whether a timeout message is already pending; if not, BroadcastQueue.setBroadcastTimeoutLocked() is called:

    final void setBroadcastTimeoutLocked(long timeoutTime) {
        if (! mPendingBroadcastTimeoutMessage) {
            Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
            mHandler.sendMessageAtTime(msg, timeoutTime);
            mPendingBroadcastTimeoutMessage = true;
        }
    }

    A timed BROADCAST_TIMEOUT_MSG message tracks the execution of the current broadcast. This timeout monitoring is very similar to the Service ANR mechanism: the message is likewise posted to the AMS thread's message queue. If all receivers finish processing, cancelBroadcastTimeoutLocked() is called to remove the message; otherwise, the message is handled and broadcastTimeoutLocked() is called. Both the first and the second ANR monitoring mechanisms call this method; we leave its analysis for later.

    After the timed message is set, the broadcast is dispatched, starting with the dynamic receivers:

    // processNextBroadcast section 4: dispatch broadcast messages to "dynamic" receivers
        final Object nextReceiver = r.receivers.get(recIdx);
        // The type of dynamic receivers is BroadcastFilter
        if (nextReceiver instanceof BroadcastFilter) {
            BroadcastFilter filter = (BroadcastFilter)nextReceiver;
            deliverToRegisteredReceiverLocked(r, filter, r.ordered);
            ...
            return;
        }
    // To be continued

    The host process of a dynamic receiver is usually already running, so delivering to this kind of receiver is relatively simple: BroadcastQueue.deliverToRegisteredReceiverLocked() does the rest of the work. A static receiver, however, is declared in AndroidManifest.xml, and its host process may not have been started when the broadcast is dispatched, so this case is much more complex.

    // processNextBroadcast section 5: dispatch broadcast messages to "static" receivers
        // The type of static receivers is ResolveInfo
        ResolveInfo info = (ResolveInfo)nextReceiver;
        ...
        // 1. Permission check
        ComponentName component = new ComponentName(
                    info.activityInfo.applicationInfo.packageName,
                    info.activityInfo.name);
        int perm = mService.checkComponentPermission(info.activityInfo.permission,
                    r.callingPid, r.callingUid, info.activityInfo.applicationInfo.uid,
                    info.activityInfo.exported);
        ...
        // 2. Get the process where the receiver is located
        ProcessRecord app = mService.getProcessRecordLocked(targetProcess,
                    info.activityInfo.applicationInfo.uid, false);
        // 3. The process has started
        if (app != null && app.thread != null) {
           ...
           processCurBroadcastLocked(r, app);
           return;
        }
        // 4. The process is not running yet: start it (a null result means startup failed)
        if ((r.curApp=mService.startProcessLocked(targetProcess,
                    info.activityInfo.applicationInfo, true,
                    r.intent.getFlags() | Intent.FLAG_FROM_BACKGROUND,
                    "broadcast", r.curComponent,
                    (r.intent.getFlags()&Intent.FLAG_RECEIVER_BOOT_UPGRADE) != 0, false, false))
                            == null) {
            ...
            scheduleBroadcastsLocked();
            return;
        }
        // 5. The process is starting: park the broadcast until the process comes up
        mPendingBroadcast = r;
        mPendingBroadcastRecvIndex = recIdx;
    }
    // End of processNextBroadcast

    1. A static receiver is represented by ResolveInfo. Package information is obtained through PackageManager to check permissions. The permission checks are extensive and are not listed here.

    2. After a series of complex permission checks, the broadcast can finally be delivered to the target receiver. The process record of the receiver is obtained through AMS.getProcessRecordLocked().

    3. If app.thread != null, the process has been started, and BroadcastQueue.processCurBroadcastLocked() is called for the next step of dispatching.

    4. If the process has not been started, it must be started through AMS.startProcessLocked(). If startProcessLocked() returns null, the process could not be started: the current broadcast is not delivered to this receiver, and BroadcastQueue.scheduleBroadcastsLocked() is called to enter the next scheduling round.

    5. Otherwise, the process is being started. The current broadcast is recorded in mPendingBroadcast, that is, it becomes a blocked broadcast, and it waits until the process is up (in the meantime, each scheduling round checks it as described in part 1).
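    The branches for a static receiver can be summarized as a small decision function. This is a hypothetical sketch: the Outcome enum and the two boolean inputs are illustrative, and the mapping follows the code above, where startProcessLocked() returning null leads to scheduleBroadcastsLocked() and any other result parks the broadcast in mPendingBroadcast.

```java
// Hypothetical decision table for delivering to a "static" receiver.
// The enum and parameters are illustrative, not AOSP types.
class StaticReceiverDispatch {
    enum Outcome { DELIVER_NOW, SKIP_AND_SCHEDULE_NEXT, PARK_AS_PENDING }

    /**
     * @param appThreadAlive process record exists and app.thread != null
     * @param startSucceeded whether startProcessLocked() returned non-null
     *                       (only consulted when the process is not running)
     */
    static Outcome decide(boolean appThreadAlive, boolean startSucceeded) {
        if (appThreadAlive) {
            return Outcome.DELIVER_NOW;             // processCurBroadcastLocked()
        }
        if (!startSucceeded) {
            return Outcome.SKIP_AND_SCHEDULE_NEXT;  // scheduleBroadcastsLocked()
        }
        return Outcome.PARK_AS_PENDING;             // mPendingBroadcast = r
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true)); // prints PARK_AS_PENDING
    }
}
```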

    The huge processNextBroadcast() is finally done. Its job is to schedule broadcast messages, and it is designed to handle the different combinations of broadcasts and receivers.

    Cross process delivery of broadcast messages

    Scheduling is done. Next, let's analyze how a scheduled broadcast reaches the application. In the analysis above, broadcasts are ultimately dispatched by two methods: BroadcastQueue.deliverToRegisteredReceiverLocked() and BroadcastQueue.processCurBroadcastLocked().

    Without going into the logic of these two functions, consider how a broadcast can be passed from the system_server process, where the AMS thread lives, to the application process. Naturally, a cross-process call is needed, and the most common mechanism for that in Android is Binder. Indeed, that is how broadcasts are sent to application processes.

    For a static receiver whose process is already running (app.thread != null), BroadcastQueue.processCurBroadcastLocked() makes a cross-process call through IApplicationThread. The call chain is as follows:

    ActivityThread.ApplicationThread.scheduleReceiver()
    └── ActivityThread.handleReceiver()
        └── BroadcastReceiver.onReceive()
    

    For a dynamic receiver, BroadcastQueue.deliverToRegisteredReceiverLocked() delivers the broadcast to the receiver's IIntentReceiver, which is implemented in the application process by LoadedApk.ReceiverDispatcher.InnerReceiver. The call chain is as follows:

    LoadedApk.ReceiverDispatcher.InnerReceiver.performReceive()
    └── LoadedApk.ReceiverDispatcher.performReceive()
        └── LoadedApk.ReceiverDispatcher.Args.run()
            └── BroadcastReceiver.onReceive()
    

    Eventually BroadcastReceiver.onReceive() is called to carry out the actual handling of the broadcast in the application process. For a serial broadcast, system_server must be notified when handling is finished so that it can deliver the broadcast to the next receiver, which again requires a cross-process call. After the broadcast is handled, that is, after BroadcastReceiver.onReceive() returns, the application process calls BroadcastReceiver.PendingResult.finish(). The subsequent call chain is as follows:

    BroadcastReceiver.PendingResult.finish()
    └── BroadcastReceiver.PendingResult.sendFinished()
        └── IActivityManager.finishReceiver()
            └── ActivityManagerService.finishReceiver()
                └── BroadcastQueue.processNextBroadcast()
    

    Through IActivityManager, the application process calls into the system_server process; the call finally reaches BroadcastQueue.processNextBroadcast() on the AMS thread, which starts the next scheduling round.

    broadcastTimeoutLocked() method

    As mentioned earlier, both ANR mechanisms eventually call BroadcastQueue.broadcastTimeoutLocked(). When the first ANR check fires, fromMsg is false; when the second one fires, fromMsg is true, meaning that the call is handling a BROADCAST_TIMEOUT_MSG message.

    final void broadcastTimeoutLocked(boolean fromMsg) {
        // 1. Set mPendingBroadcastTimeoutMessage
        if (fromMsg) {
            mPendingBroadcastTimeoutMessage = false;
        }
        ...
        // 2. For the second mechanism, check whether the current receiver has really timed out
        BroadcastRecord r = mOrderedBroadcasts.get(0);
        if (fromMsg) {
            long timeoutTime = r.receiverTime + mTimeoutPeriod;
            if (timeoutTime > now) {
                setBroadcastTimeoutLocked(timeoutTime);
                return;
            }
        }
        ...
        // 3. If it has timed out, end the current receiver and start a new round of scheduling
        finishReceiverLocked(r, r.resultCode, r.resultData,
                    r.resultExtras, r.resultAbort, false);
        scheduleBroadcastsLocked();
    
        // 4. Post a message that shows the ANR dialog
        if (anrMessage != null) {
            mHandler.post(new AppNotResponding(app, anrMessage));
        }
    }

    1. mPendingBroadcastTimeoutMessage records whether there is an unhandled BROADCAST_TIMEOUT_MSG. Setting it to false allows another BROADCAST_TIMEOUT_MSG to be posted.

    2. r.receiverTime is updated every time the broadcast is delivered to a receiver. If the current receiver has not timed out yet, another BROADCAST_TIMEOUT_MSG is posted for its deadline. Normally BROADCAST_TIMEOUT_MSG is removed only after all receivers have finished; until then, a BROADCAST_TIMEOUT_MSG keeps being posted as the broadcast is scheduled.

    3. If the receiver has timed out, meaning it has not finished processing, it is forcibly finished and a new round of broadcast scheduling starts.

    4. Finally, a message is posted to show the ANR dialog.
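    The interplay between the single pending BROADCAST_TIMEOUT_MSG and r.receiverTime can be modeled as follows. The sketch is hypothetical (BroadcastTimeoutModel, deliverToReceiver(), onTimeoutMsg() and the timeouts counter are illustrative names) and uses virtual time; it mirrors setBroadcastTimeoutLocked(), cancelBroadcastTimeoutLocked() and the fromMsg branch of broadcastTimeoutLocked(), but not the ANR reporting itself.

```java
// Hypothetical virtual-time model of the second broadcast ANR check.
class BroadcastTimeoutModel {
    final long timeoutPeriod;  // mTimeoutPeriod: 10 s foreground, 60 s background
    long receiverTime;         // r.receiverTime: when the current receiver started
    Long pendingAt = null;     // due time of the pending BROADCAST_TIMEOUT_MSG, or null
    int timeouts = 0;          // receivers found to have timed out

    BroadcastTimeoutModel(long timeoutPeriod) {
        this.timeoutPeriod = timeoutPeriod;
    }

    void setBroadcastTimeout(long at) {
        if (pendingAt == null) { // mPendingBroadcastTimeoutMessage guard: one at a time
            pendingAt = at;
        }
    }

    void cancelBroadcastTimeout() { // all receivers finished
        pendingAt = null;
    }

    void deliverToReceiver(long now) {
        receiverTime = now;
        setBroadcastTimeout(receiverTime + timeoutPeriod); // no-op if one is pending
    }

    /** Models broadcastTimeoutLocked(true) when the message is handled at time now. */
    void onTimeoutMsg(long now) {
        pendingAt = null;
        long timeoutTime = receiverTime + timeoutPeriod;
        if (timeoutTime > now) {
            setBroadcastTimeout(timeoutTime); // current receiver started later: re-arm
            return;
        }
        timeouts++; // real code: finishReceiverLocked(), scheduleBroadcastsLocked(), ANR
    }

    public static void main(String[] args) {
        BroadcastTimeoutModel m = new BroadcastTimeoutModel(10_000);
        m.deliverToReceiver(0);      // message armed for 10000
        m.deliverToReceiver(8_000);  // receiver 1 done at 8 s; message not re-posted
        m.onTimeoutMsg(10_000);      // early for receiver 2: re-armed for 18000
        System.out.println(m.pendingAt + " " + m.timeouts); // prints 18000 0
        m.onTimeoutMsg(18_000);      // receiver 2 still busy: timeout
        System.out.println(m.timeouts);                      // prints 1
    }
}
```

    The design avoids re-posting a message for every receiver: when the single message fires before the current receiver's own deadline, it is simply re-armed at receiverTime + mTimeoutPeriod.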

    So far, we have answered the two questions raised above:

    AMS maintains the broadcast queues (BroadcastQueue), and the AMS thread keeps taking broadcasts from them and scheduling them for dispatch. When a serial broadcast is dispatched, a timed BROADCAST_TIMEOUT_MSG is posted, and AMS removes it once the receivers have finished. If BROADCAST_TIMEOUT_MSG is handled, AMS checks whether processing of the broadcast has timed out and, if so, reports the ANR.

    2.1.3 Input processing timeout

    An application can receive input events (keys, touch screen, trackball, etc.); if an event is not handled within 5 seconds, an ANR is raised.

    As with Broadcast ANR, we pose a few questions about Input ANR:

    1. What does an input event go through before it reaches the application's window?
    2. How is an input event processing timeout detected?

    Input events originate from hardware devices (such as keys or the touch screen). Android's input subsystem detects the various input events, and they are eventually dispatched by InputDispatcher to each window that needs to receive them. How does a window tell InputDispatcher that it wants to handle input events? Android connects InputDispatcher and the window through an InputChannel. An InputChannel is essentially a wrapper around a pair of connected Linux sockets (created with socketpair()). Each window has its own InputChannel, which must be registered with InputDispatcher:

    status_t InputDispatcher::registerInputChannel(const sp<InputChannel>& inputChannel,
            const sp<InputWindowHandle>& inputWindowHandle, bool monitor) {
        ...
        sp<Connection> connection = new Connection(inputChannel, inputWindowHandle, monitor);
        int fd = inputChannel->getFd();
        mConnectionsByFd.add(fd, connection);
        ...
        mLooper->addFd(fd, 0, ALOOPER_EVENT_INPUT, handleReceiveCallback, this);
        ...
        mLooper->wake();
        return OK;
    }

    InputDispatcher treats each registered InputChannel as a Connection, identified by its file descriptor. InputDispatcher runs a message-processing loop, so when a new Connection is added it wakes the loop up to handle it.

    There are many types of input events, such as keys, trackball and touch screen. Android classifies these events, and the windows that handle them are given a target type (targetType): Focused or Touched. A key event is sent to the Focused window; a touch event is sent to the Touched window. Before sending an input event, InputDispatcher goes through the following call chain (key events are shown as an example; the chain for touch events is similar):

    InputDispatcherThread::threadLoop()
    └── InputDispatcher::dispatchOnce()
        └── InputDispatcher::dispatchOnceInnerLocked()
            └── InputDispatcher::dispatchKeyLocked()
                └── InputDispatcher::dispatchEventLocked()
                    └── InputDispatcher::prepareDispatchCycleLocked()
                        └── InputDispatcher::enqueueDispatchEntriesLocked()
                            └── InputDispatcher::startDispatchCycleLocked()
                                └── InputPublisher::publishKeyEvent()
    

    The implementation of each function is not shown here; the key points are:

    • InputDispatcherThread is the thread that dispatches input events
    • Like any message, an input event is queued before it is dispatched. Each Connection maintains two queues:
      • outboundQueue: events waiting to be sent to the window. Every new event enters this queue first
      • waitQueue: events that have already been sent to the window but not yet reported as finished
    • Once publishKeyEvent() completes, the event has been dispatched and is moved from outboundQueue to waitQueue

    After this round of processing, the event has left InputDispatcher, but whether the window has actually handled it is only known when the receiver sends back a "finished" notification. When the InputChannel is registered with InputDispatcher, a callback function, handleReceiveCallback(), is registered at the same time:

    int InputDispatcher::handleReceiveCallback(int fd, int events, void* data) {
        ...
        for (;;) {
            ...
            status = connection->inputPublisher.receiveFinishedSignal(&seq, &handled);
            if (status) {
                break;
            }
            d->finishDispatchCycleLocked(currentTime, connection, seq, handled);
            ...
        }
        ...
        d->unregisterInputChannelLocked(connection->inputChannel, notify);
    }

    When the status received is OK, finishDispatchCycleLocked() is called to complete the processing of the event:

    InputDispatcher::finishDispatchCycleLocked()
    └── InputDispatcher::onDispatchCycleFinishedLocked()
        └── InputDispatcher::doDispatchCycleFinishedLockedInterruptible()
            └── InputDispatcher::startDispatchCycleLocked()
    

    doDispatchCycleFinishedLockedInterruptible() removes the successfully dispatched event from the waitQueue and then calls startDispatchCycleLocked() to start dispatching new events.

    So far, we have answered the first question:

    A normal input event first moves from outboundQueue to waitQueue, which means it has been sent; it is then removed from waitQueue, which means the window has received and processed it. As the hub, InputDispatcher keeps delivering input events, and it cannot stall when one event cannot be handled, otherwise the whole system would break down far too easily. Its strategy is to give up on events that cannot be handled, send a notification (this notification mechanism is the ANR), and continue with the next round of events.
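    The life cycle of an event on one Connection can be modeled with two queues. The sketch below is a hypothetical Java model, not the native InputDispatcher code: InputConnectionModel, Event and onFinishedSignal() are illustrative names, and the model ignores details such as a full channel ending a dispatch cycle early.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical, simplified model of one InputDispatcher Connection and its two queues. */
final class InputConnectionModel {
    static final class Event {
        final int seq;
        final String description;

        Event(int seq, String description) {
            this.seq = seq;
            this.description = description;
        }
    }

    /** Events waiting to be sent to the window; every new event enters here first. */
    final Deque<Event> outboundQueue = new ArrayDeque<>();
    /** Events already sent to the window but not yet reported as finished. */
    final Deque<Event> waitQueue = new ArrayDeque<>();

    void enqueue(Event event) {
        outboundQueue.addLast(event);
    }

    /** Like startDispatchCycleLocked(): publish pending events and move them to waitQueue. */
    void startDispatchCycle() {
        while (!outboundQueue.isEmpty()) {
            Event event = outboundQueue.pollFirst();
            // The real code publishes through the InputChannel here (e.g. publishKeyEvent()).
            waitQueue.addLast(event);
        }
    }

    /**
     * Like handling the "finished" signal: drop the acknowledged event from waitQueue,
     * then continue with the next dispatch cycle. Returns false if no such event waits.
     */
    boolean onFinishedSignal(int seq) {
        for (Iterator<Event> it = waitQueue.iterator(); it.hasNext(); ) {
            if (it.next().seq == seq) {
                it.remove();
                startDispatchCycle();
                return true;
            }
        }
        return false;
    }
}
```

    An event stuck in waitQueue is exactly what the ANR checks look for: it was sent, but the window never acknowledged it.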

    An everyday analogy helps explain this dispatch model:
    Each input event is like a parcel. InputDispatcher is like a parcel transit station, the window is the recipient, and the InputChannel is the courier. Every parcel passes through the transit station, which needs to know the recipient of each parcel and sends it to that recipient through a courier. There are many reasons a parcel may not be delivered in time: the recipient cannot be reached, there are so many parcels that the courier is overwhelmed, the courier is injured and on leave... In such cases the courier tells the transit station that a parcel cannot be delivered in time. After receiving the notice, the transit station keeps sending other parcels while reporting the problem to its superiors.

    With the dispatch model in mind, we can now look at the ANR mechanism. When dispatching events, dispatchKeyLocked() and dispatchMotionLocked() must find the target window, which is where the event finally ends up. While finding the window, InputDispatcher also checks whether an ANR has occurred:

    InputDispatcher::findFocusedWindowTargetsLocked()
    InputDispatcher::findTouchedWindowTargetsLocked()
    └── InputDispatcher::handleTargetsNotReadyLocked()
        └── InputDispatcher::onANRLocked()
            └── InputDispatcher::doNotifyANRLockedInterruptible()
                └── NativeInputManager::notifyANR()
    
    • First, findFocusedWindowTargetsLocked() or findTouchedWindowTargetsLocked() is called to find the window that receives the input event.

      When the window is found, checkWindowReadyForMoreInputLocked() (https://android.googlesource.com/platform/frameworks/native/+/master/services/inputflinger/InputDispatcher.cpp#1633) is called to check whether the window can accept new input events. A series of scenarios can block further dispatch of events:

      • Scenario 1: the window is paused and cannot process input events.
        "Waiting because the [targetType] window is paused."

      • Scenario 2: the window's InputChannel has not been registered with InputDispatcher, so events cannot be dispatched to it.
        "Waiting because the [targetType] window's input channel is not registered with the input dispatcher. The window may be in the process of being removed."

      • Scenario 3: the connection between the window and InputDispatcher has been broken, that is, the InputChannel is not working normally.
        "Waiting because the [targetType] window's input connection is [status]. The window may be in the process of being removed."

      • Scenario 4: the InputChannel is full and cannot take new events.
        "Waiting because the [targetType] window's input channel is full. Outbound queue length: %d. Wait queue length: %d."

      • Scenario 5: a key event (KeyEvent) must wait until the previously delivered events have been processed.
        "Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: %d. Wait queue length: %d."

      • Scenario 6: a touch event (TouchEvent) can normally be dispatched to the current window immediately, because touch events target the window the user currently sees. The one exception is when events are piling up in the window's wait queue because the application is not responding; the TouchEvent must then wait as well.
        "Waiting to send non-key event because the %s window has not finished processing certain input events that were delivered to it over %0.1fms ago. Wait queue length: %d. Wait queue head age: %0.1fms."

    • Then, if any of the above scenarios occurs, the input event has to keep waiting, and handleTargetsNotReadyLocked() is called to check whether the wait has timed out:

    int32_t InputDispatcher::handleTargetsNotReadyLocked(nsecs_t currentTime,
            const EventEntry* entry,
            const sp<InputApplicationHandle>& applicationHandle,
            const sp<InputWindowHandle>& windowHandle,
            nsecs_t* nextWakeupTime, const char* reason) {
        ...
        if (currentTime >= mInputTargetWaitTimeoutTime) {
            onANRLocked(currentTime, applicationHandle, windowHandle,
                entry->eventTime, mInputTargetWaitStartTime, reason);
            *nextWakeupTime = LONG_LONG_MIN;
            return INPUT_EVENT_INJECTION_PENDING;
        }
        ...
    }
    • Finally, if dispatching the current event has timed out, an ANR has been detected. onANRLocked() is called, and nextWakeupTime is set to the minimum value so that the next round of scheduling starts immediately. onANRLocked() saves some ANR state information and calls doNotifyANRLockedInterruptible(), which in turn calls the JNI-layer method NativeInputManager::notifyANR(). That method bridges the Native and Java layers and directly calls InputManagerService.notifyANR() in the Java layer.

    nsecs_t NativeInputManager::notifyANR(
            const sp<InputApplicationHandle>& inputApplicationHandle,
            const sp<InputWindowHandle>& inputWindowHandle,
            const String8& reason) {
        ...
        JNIEnv* env = jniEnv();

        // Convert the application handle, window handle and ANR reason string to Java-layer objects
        jobject inputApplicationHandleObj =
                getInputApplicationHandleObjLocalRef(env, inputApplicationHandle);
        jobject inputWindowHandleObj =
                getInputWindowHandleObjLocalRef(env, inputWindowHandle);
        jstring reasonObj = env->NewStringUTF(reason.string());

        // Call the Java-layer method InputManagerService.notifyANR()
        jlong newTimeout = env->CallLongMethod(mServiceObj,
                gServiceClassInfo.notifyANR, inputApplicationHandleObj, inputWindowHandleObj,
                reasonObj);
        ...
        return newTimeout;
    }
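    The readiness check and the wait-timeout check in the steps above can be condensed into a small Java model. This is a hypothetical sketch for illustration only: TargetWaitModel, its Window fields and the short reason strings are assumptions, only some of the six scenarios are modeled, and after an ANR the model simply resets its timer instead of asking the Java layer for a new timeout.

```java
/** Hypothetical Java model of the "targets not ready" path in InputDispatcher. */
final class TargetWaitModel {
    static final long DISPATCHING_TIMEOUT_MS = 5_000; // default input timeout: 5 seconds

    enum Result { DISPATCHED, PENDING, ANR }

    /** Simplified window state; the fields are illustrative, not InputWindowHandle's. */
    static final class Window {
        boolean paused;
        boolean registered = true;
        boolean connectionBroken;
        int waitQueueLength;
    }

    /** Returns null when the window can take the event, otherwise why it cannot. */
    static String checkWindowReady(Window w, boolean isKeyEvent) {
        if (w.paused) return "window is paused";
        if (!w.registered) return "input channel is not registered";
        if (w.connectionBroken) return "input connection is broken";
        if (isKeyEvent && w.waitQueueLength > 0) {
            return "previous key events not finished"; // keys are delivered one at a time
        }
        return null;
    }

    private long waitStartTime = -1; // -1: not currently waiting
    String lastAnrReason;

    /** One dispatch attempt at time {@code now} in milliseconds. */
    Result tryDispatch(long now, Window w, boolean isKeyEvent) {
        String reason = checkWindowReady(w, isKeyEvent);
        if (reason == null) {
            waitStartTime = -1;
            return Result.DISPATCHED;
        }
        if (waitStartTime < 0) {
            waitStartTime = now; // first time this event had to wait
        }
        if (now >= waitStartTime + DISPATCHING_TIMEOUT_MS) {
            lastAnrReason = reason; // the real code calls onANRLocked() here
            waitStartTime = -1;
            return Result.ANR;
        }
        return Result.PENDING;
    }
}
```

    The key design point carried over from the native code is that the timeout is measured from when the event started waiting, not from when it was created.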

    At this point, ANR handling moves to the Java layer. Once the lower (Native) layer detects that dispatching an input event has timed out, it notifies the upper (Java) layer, which decides whether to abort dispatching of the current input event.

    When an ANR occurs, the first entry point in the Java layer is InputManagerService.notifyANR(), which is called directly by the Native layer. Let's look at the Java-layer call chain of an ANR:

    InputManagerService.notifyANR()
    └── InputMonitor.notifyANR()
        ├── IApplicationToken.keyDispatchingTimedOut()
        │   └── ActivityRecord.keyDispatchingTimedOut()
        │       └── AMS.inputDispatchingTimedOut()
        │           └── AMS.appNotResponding()
        │
        └── AMS.inputDispatchingTimedOut()
            └── AMS.appNotResponding()
    
    • InputManagerService.notifyANR() only defines an entry point for the Native layer and directly calls InputMonitor.notifyANR(). If that method returns 0, the current input event is discarded; if it returns a value greater than 0, that value is how long to keep waiting.
    public long notifyANR(InputApplicationHandle inputApplicationHandle,
          InputWindowHandle inputWindowHandle, String reason) {
        ...
        if (appWindowToken != null && appWindowToken.appToken != null) {
            // appToken is actually the current ActivityRecord.
            // If the Activity with ANR still exists, the event dispatch timeout will be notified directly through ActivityRecord
            boolean abort = appWindowToken.appToken.keyDispatchingTimedOut(reason);
            if (! abort) {
                return appWindowToken.inputDispatchingTimeoutNanos;
            }
        } else if (windowState != null) {
            // If the ANR Activity has been destroyed, the event dispatch timeout will be notified through AMS
            long timeout = ActivityManagerNative.getDefault().inputDispatchingTimedOut(
                            windowState.mSession.mPid, aboveSystem, reason);
            if (timeout >= 0) {
                return timeout;
            }
        }
        return 0; // abort dispatching
    }
    • The method above has two call paths, but both are eventually handled by AMS.inputDispatchingTimedOut(). AMS has overloaded inputDispatchingTimedOut() methods with different parameters; when the call comes through ActivityRecord, more information can be passed in (such as which activity has the ANR).
    @Override
    public long inputDispatchingTimedOut(int pid, final boolean aboveSystem, String reason) {
        // 1. Obtain ProcessRecord according to the process number
        proc = mPidsSelfLocked.get(pid);
        ...
        // 2. Get timeout
        // The timeout in an instrumentation (test) environment is INSTRUMENTATION_KEY_DISPATCHING_TIMEOUT (60 seconds),
        // and in a normal environment it is KEY_DISPATCHING_TIMEOUT (5 seconds)
        timeout = getInputDispatchingTimeoutLocked(proc);
        // Call the overloaded function. If it returns true, dispatching of the current event must be aborted
        if (!inputDispatchingTimedOut(proc, null, null, aboveSystem, reason)) {
            return -1;
        }
        // 3. Return the waiting time. This value will be passed to the Native layer
        return timeout;
    }
    

    public boolean inputDispatchingTimedOut(final ProcessRecord proc,
            final ActivityRecord activity, final ActivityRecord parent,
            final boolean aboveSystem, String reason) {
        ...
        // 1. The ANR process is being debugged, so dispatching does not need to be aborted
        if (proc.debugging) {
            return false;
        }
        // 2. A dexopt operation is in progress; it is time-consuming, so do not abort
        if (mDidDexOpt) {
            // Give more time since we were dexopting.
            mDidDexOpt = false;
            return false;
        }
        // 3. The ANR process is an instrumentation (test) process: abort dispatching,
        //    but do not show the ANR information in the UI
        if (proc.instrumentationClass != null) {
            ...
            finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
            return true;
        }

        // 4. Tell the UI to display the ANR information
        mHandler.post(new Runnable() {
            @Override
            public void run() {
                appNotResponding(proc, activity, parent, aboveSystem, annotation);
            }
        });
        ...
        return true;
    }

    So far, we have answered the second question:

    When InputDispatcher dispatches an input event, it looks for the window that should receive it. If the event cannot be dispatched normally, dispatching may time out (5 seconds by default). When the Native layer detects the timeout, it notifies the Java layer, which, after some processing, tells the Native layer either to keep waiting or to discard the event currently being dispatched.
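    The decision the Java layer hands back to the Native layer can be summarized in a small model. This is a hypothetical sketch, not AMS code: AnrDecisionModel, Proc and anrDialogs are illustrative names, it follows only the ActivityRecord path of InputMonitor.notifyANR() (a "don't abort" answer becomes a positive wait time, "abort" becomes 0), and converting the timeout to nanoseconds by multiplying milliseconds is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the Java-layer decision: keep waiting or abort dispatching. */
final class AnrDecisionModel {
    static final long KEY_DISPATCHING_TIMEOUT_MS = 5_000;
    static final long INSTRUMENTATION_KEY_DISPATCHING_TIMEOUT_MS = 60_000;

    /** Illustrative process flags standing in for ProcessRecord fields. */
    static final class Proc {
        boolean debugging;
        boolean instrumented;
    }

    boolean didDexOpt;                                  // stands in for mDidDexOpt
    final List<String> anrDialogs = new ArrayList<>();  // stands in for appNotResponding()

    static long dispatchingTimeoutMs(Proc proc) {
        return proc.instrumented ? INSTRUMENTATION_KEY_DISPATCHING_TIMEOUT_MS
                : KEY_DISPATCHING_TIMEOUT_MS;
    }

    /** Returns true if dispatching of the current event should be aborted. */
    boolean inputDispatchingTimedOut(Proc proc, String reason) {
        if (proc.debugging) return false;    // being debugged: keep waiting
        if (didDexOpt) {                     // dexopt was running: give more time once
            didDexOpt = false;
            return false;
        }
        if (proc.instrumented) return true;  // test process: abort without a dialog
        anrDialogs.add(reason);              // show the ANR dialog, then abort
        return true;
    }

    /** Value returned to the Native layer: 0 = abort, > 0 = keep waiting this long (ns). */
    long notifyAnr(Proc proc, String reason) {
        if (inputDispatchingTimedOut(proc, reason)) {
            return 0;
        }
        return dispatchingTimeoutMs(proc) * 1_000_000L;
    }
}
```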

    2.1.4 Summary

    There are three kinds of ANR monitoring:

    • Service ANR: a Service's lifecycle methods must complete within 20 seconds in a foreground process and within 200 seconds in a background process. When the Service is started, a delayed SERVICE_TIMEOUT_MSG is posted with the foreground or background delay; if that message gets handled, an ANR has occurred

    • Broadcast ANR: a "serial broadcast message" in the foreground queue must be processed within 10 seconds, and one in the background queue within 60 seconds. Each time a serial broadcast is sent to a receiver, a delayed BROADCAST_TIMEOUT_MSG is posted; if that message gets handled, AMS checks whether processing of the broadcast has timed out, and a timeout means an ANR has occurred

    • Input ANR: an input event must be processed within 5 seconds. When an input event is dispatched, InputDispatcher checks whether the event has to wait; if it does, it checks whether the wait has timed out, and a timeout means an ANR has occurred

    In effect, the ANR monitoring mechanism is a requirement on the application's main thread: it must respond to several kinds of operations within a limited time, otherwise the main thread is considered unresponsive.

    The three monitoring mechanisms also show two different timeout designs:

    Service and Broadcast are both scheduled by AMS. Using Handler and Looper, a TIMEOUT message is posted for the AMS thread to handle, so the whole timeout mechanism is implemented in the Java layer. Input events are scheduled by InputDispatcher: events waiting to be processed sit in a queue, and dispatching includes a check on whether the wait has timed out, so this timeout mechanism is implemented in the Native layer.
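    The limits and the layer responsible for each check can be collected in one table-like Java enum. This is an illustrative sketch: AnrKind and timedOut() are hypothetical names, the values are the default (non-instrumentation) limits quoted in the summary above, and treating the exact boundary as a timeout is an assumption.

```java
/** Hypothetical summary of the three ANR types and their default limits. */
enum AnrKind {
    SERVICE_FOREGROUND(20_000, false),
    SERVICE_BACKGROUND(200_000, false),
    BROADCAST_FOREGROUND(10_000, false),
    BROADCAST_BACKGROUND(60_000, false),
    INPUT(5_000, true);

    final long timeoutMs;
    /** true if the timeout is detected in the Native layer (InputDispatcher), false for AMS. */
    final boolean detectedInNative;

    AnrKind(long timeoutMs, boolean detectedInNative) {
        this.timeoutMs = timeoutMs;
        this.detectedInNative = detectedInNative;
    }

    /** Reaching the limit counts as a timeout (the boundary choice is an assumption). */
    boolean timedOut(long elapsedMs) {
        return elapsedMs >= timeoutMs;
    }
}
```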

    2.2 Reporting mechanism of ANR

    Whatever the type of ANR, AMS.appNotResponding() is eventually called; different paths lead to the same destination. This method reports the ANR to users and developers. The visible result is a dialog box telling the user that a program is not responding, plus a large amount of ANR-related logs that help developers solve the problem.

    Most of us have seen these end results, but fewer know how the logs are produced. Let's look at how the ANR logs are output.

    final void appNotResponding(ProcessRecord app, ActivityRecord activity,
            ActivityRecord parent, boolean aboveSystem, final String annotation) {
        // app: the process in which ANR currently occurs
        // activity: the interface where ANR occurs
        // parent: the upper level interface of the interface where ANR occurs
        // aboveSystem:
        // annotation: reason for ANR
        ...
        // 1. Update CPU usage information. First CPU information sampling of ANR
        updateCpuStatsNow();
        ...
        // 2. Fill the firstPids and lastPids arrays, choosing from the LRU process list (mLruProcesses):
        //    firstPids is used to save the ANR process and its parent process, system_server process and persistent process (such as Phone process)
        //    lastPids is used to save processes other than firstPids
        firstPids.add(app.pid);
        int parentPid = app.pid;
        if (parent != null && parent.app != null && parent.app.pid > 0)
            parentPid = parent.app.pid;
        if (parentPid != app.pid) firstPids.add(parentPid);
        if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
    
        for (int i = mLruProcesses.size() - 1; i >= 0; i--) {
            ProcessRecord r = mLruProcesses.get(i);
            if (r != null && r.thread != null) {
                int pid = r.pid;
                if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
                    if (r.persistent) {
                        firstPids.add(pid);
                    } else {
                        lastPids.put(pid, Boolean.TRUE);
                    }
                }
            }
        }
        ...
        // 3. Print call stack
        File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,
                NATIVE_STACKS_OF_INTEREST);
        ...
        // 4. Update CPU usage information. Second CPU usage sampling of the ANR
        updateCpuStatsNow();
        ...
        // 5. Display the ANR dialog box
        Message msg = Message.obtain();
        HashMap<String, Object> map = new HashMap<String, Object>();
        msg.what = SHOW_NOT_RESPONDING_MSG;
        ...
        mHandler.sendMessage(msg);
    }

    The main logic of this method can be divided into five parts:

    1. Update CPU statistics. This is the first sampling of CPU usage information when ANR occurs. The sampling data will be saved in the mProcessStats variable

    2. Populate the firstPids and lastPids arrays. The current ANR application will be added to firstPids first, so that when printing the function stack, the current process will always be at the front of the trace file

    3. Print the function call stack (StackTrace). The specific implementation is completed by the dumpStackTraces() function

    4. Update CPU statistics. This is the second sampling of CPU usage information when an ANR occurs. The data sampled twice correspond to the CPU usage before and after the ANR occurs

    5. Display the ANR dialog box. A SHOW_NOT_RESPONDING_MSG message is sent; AMS.MainHandler handles it and displays AppNotRespondingDialog
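    The firstPids/lastPids split in step 2 can be reproduced outside AMS. A minimal sketch, assuming a simplified Proc class in place of ProcessRecord; TracePidSelector and its constructor parameters are hypothetical, and lastPids is a set here rather than the SparseArray<Boolean> used in the original.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of how appNotResponding() splits processes into firstPids and lastPids. */
final class TracePidSelector {
    /** Illustrative stand-in for ProcessRecord. */
    static final class Proc {
        final int pid;
        final boolean persistent;
        final boolean hasThread; // the process is running (r.thread != null)

        Proc(int pid, boolean persistent, boolean hasThread) {
            this.pid = pid;
            this.persistent = persistent;
            this.hasThread = hasThread;
        }
    }

    final List<Integer> firstPids = new ArrayList<>();
    final Set<Integer> lastPids = new LinkedHashSet<>();

    /**
     * @param lru       processes ordered like mLruProcesses (least recently used first)
     * @param parentPid pass anrPid when there is no parent activity
     */
    TracePidSelector(int anrPid, int parentPid, int systemServerPid, List<Proc> lru) {
        firstPids.add(anrPid); // the ANR process always comes first
        if (parentPid != anrPid) firstPids.add(parentPid);
        if (systemServerPid != anrPid && systemServerPid != parentPid) {
            firstPids.add(systemServerPid);
        }
        for (int i = lru.size() - 1; i >= 0; i--) { // walk the list from its end
            Proc r = lru.get(i);
            if (!r.hasThread) continue;
            int pid = r.pid;
            if (pid > 0 && pid != anrPid && pid != parentPid && pid != systemServerPid) {
                if (r.persistent) {
                    firstPids.add(pid); // persistent processes (e.g. Phone) are always dumped
                } else {
                    lastPids.add(pid);  // other processes are only candidates
                }
            }
        }
    }
}
```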

    In addition to the main logic, several kinds of logs are output when an ANR occurs:

    • event log: search for the keyword "am_anr" to find the application in which the ANR occurred
    • main log: search for the keyword "ANR in" to find the ANR information; the surrounding log lines include the CPU usage
    • dropbox: search for entries of the "anr" type to find the ANR information
    • traces: the function call stacks of each process at the time of the ANR
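    Searching for these keywords is usually done with grep; the same filter in Java is a short loop. A minimal sketch: AnrLogFilter is a hypothetical helper, it matches only the "am_anr" and "ANR in" keywords named above by plain substring search rather than parsing the log format, and it assumes the log has been saved to a UTF-8 text file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that keeps only log lines containing the ANR keywords. */
final class AnrLogFilter {
    // Keywords for the event log and the main log respectively.
    static final String[] KEYWORDS = {"am_anr", "ANR in"};

    static List<String> anrLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            for (String keyword : KEYWORDS) {
                if (line.contains(keyword)) {
                    hits.add(line);
                    break; // one match is enough for this line
                }
            }
        }
        return hits;
    }

    /** Convenience overload for a log file saved from the device. */
    static List<String> anrLines(Path logFile) throws IOException {
        return anrLines(Files.readAllLines(logFile, StandardCharsets.UTF_8));
    }
}
```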

    When analyzing an ANR problem, we usually start with the CPU usage in the main log and the function call stacks in traces. Therefore, updateCpuStatsNow(), which updates CPU usage information, and dumpStackTraces(), which prints function call stacks, are the key parts of how the system reports an ANR.

    2.2.1 CPU usage

    The implementation of AMS.updateCpuStatsNow() is not listed here. All you need to know is that CPU usage information is updated at most once every 5 seconds: if updateCpuStatsNow() is called again less than 5 seconds after the last update, the CPU usage information is not updated.
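    The throttling rule can be expressed as a small wrapper. A minimal sketch, assuming an injectable clock and an update callback in place of ProcessCpuTracker.update(); ThrottledCpuStats is a hypothetical name, and only the 5-second interval comes from the text.

```java
import java.util.function.LongSupplier;

/** Hypothetical model of the 5-second throttle in AMS.updateCpuStatsNow(). */
final class ThrottledCpuStats {
    static final long MIN_UPDATE_INTERVAL_MS = 5_000;

    private final LongSupplier clockMs; // e.g. SystemClock::uptimeMillis on a device
    private final Runnable update;      // stands in for ProcessCpuTracker.update()
    private boolean hasSample;
    private long lastUpdateMs;

    ThrottledCpuStats(LongSupplier clockMs, Runnable update) {
        this.clockMs = clockMs;
        this.update = update;
    }

    /** Returns true if the CPU statistics were actually refreshed. */
    boolean updateCpuStatsNow() {
        long now = clockMs.getAsLong();
        if (hasSample && now - lastUpdateMs < MIN_UPDATE_INTERVAL_MS) {
            return false; // called again too soon: keep the previous sample
        }
        hasSample = true;
        lastUpdateMs = now;
        update.run();
        return true;
    }
}
```

    Injecting the clock keeps the rule testable without sleeping; on a device the same class could be given a monotonic uptime clock.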

    CPU usage information is maintained by the ProcessCpuTracker class. Every call to ProcessCpuTracker.update() reads files under /proc to refresh the CPU usage information, along the following dimensions:

    • CPU usage time: read from /proc/stat
      • user: CPU time spent by user processes
      • nice: CPU time spent by processes whose priority has been adjusted. Every Linux process has a priority that can be adjusted dynamically; for example, if a process's priority value starts at 10 and is changed to 8 while it runs, the adjustment of -2 is its nice value. Android counts both user and nice time as user
      • sys: CPU time spent in the kernel
      • idle: CPU idle time
      • wait: time the CPU spends waiting for I/O
      • hw irq: time spent handling hardware interrupts. When a peripheral (such as a disk) needs the CPU's attention, it notifies the CPU through a hardware interrupt; the CPU saves its current context and switches to the handler, and that time counts as hardware interrupt time
      • sw irq: time spent handling software interrupts. As with hardware interrupts, when software needs to interrupt the CPU, the context-switching time counts as software interrupt time

    • CPU load: read from /proc/loadavg, which gives the average number of active processes over the last 1, 5 and 15 minutes. CPU load is like the load on a supermarket checkout: if one person is paying and two people are queuing, the load is 3. While the cashier works, people keep paying and joining the queue; counting the load at a fixed interval (for example, every 5 seconds) gives the average load over a period of time.

    • Page faults: shown in each process's CPU usage line. The trailing "faults: xxx minor/major" gives the number of page faults and is omitted when the number is 0. Major refers to a major page fault (MPF): the requested page is not in physical memory, so the kernel has to load it from disk. Minor refers to a minor page fault (MnPF): the page is already in memory (for example, because the disk data was loaded earlier) and only needs to be mapped for the process. The first time a file is read or written there are many MPFs; once it is cached in memory, later accesses cause few MPFs and more MnPFs. This is the result of the kernel's caching, which reduces expensive disk I/O.
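    Because every value in /proc/stat is a running counter, a usage percentage comes from the difference between two samples. A minimal sketch: CpuStatSample is a hypothetical class, not ProcessCpuTracker; it reads only the first seven fields of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq) and, as described above, counts nice time as user time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical parser for the aggregate "cpu" line of /proc/stat (values in clock ticks). */
final class CpuStatSample {
    final long user, nice, sys, idle, iowait, irq, softirq;

    CpuStatSample(long user, long nice, long sys, long idle, long iowait, long irq, long softirq) {
        this.user = user;
        this.nice = nice;
        this.sys = sys;
        this.idle = idle;
        this.iowait = iowait;
        this.irq = irq;
        this.softirq = softirq;
    }

    /** Parses a line such as "cpu  user nice system idle iowait irq softirq ...". */
    static CpuStatSample parse(String cpuLine) {
        String[] f = cpuLine.trim().split("\\s+");
        if (f.length < 8 || !f[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + cpuLine);
        }
        long[] v = new long[7];
        for (int i = 0; i < 7; i++) {
            v[i] = Long.parseLong(f[i + 1]);
        }
        return new CpuStatSample(v[0], v[1], v[2], v[3], v[4], v[5], v[6]);
    }

    /** Reads the first line of /proc/stat (Linux only). */
    static CpuStatSample read() throws IOException {
        return parse(Files.readAllLines(Paths.get("/proc/stat")).get(0));
    }

    long total() {
        return user + nice + sys + idle + iowait + irq + softirq;
    }

    /** Share of the interval spent in user mode, with nice counted as user. */
    static double userPercent(CpuStatSample before, CpuStatSample after) {
        long dt = after.total() - before.total();
        long du = (after.user + after.nice) - (before.user + before.nice);
        return dt == 0 ? 0.0 : 100.0 * du / dt;
    }

    /** Share of the interval spent in the kernel (sys). */
    static double kernelPercent(CpuStatSample before, CpuStatSample after) {
        long dt = after.total() - before.total();
        return dt == 0 ? 0.0 : 100.0 * (after.sys - before.sys) / dt;
    }

    /** Share of the interval spent waiting for I/O. */
    static double iowaitPercent(CpuStatSample before, CpuStatSample after) {
        long dt = after.total() - before.total();
        return dt == 0 ? 0.0 : 100.0 * (after.iowait - before.iowait) / dt;
    }
}
```

    Taking one sample, waiting, and taking another (as dumpStackTraces() does with its 500 ms measurement) gives usage over that interval rather than since boot.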

    2.2.2 Function call stack

    AMS.dumpStackTraces() method is used to print the function call stack of the process. The main logic of this method is as follows:

    private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids,
                ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
        ...
        // 1. Send SIGNAL_QUIT to each process in the firstPids array.
        //    A process prints its function call stack after receiving SIGNAL_QUIT
        int num = firstPids.size();
        for (int i = 0; i < num; i++) {
            synchronized (observer) {
                Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);
                observer.wait(200);  // Wait for write-close, give up after 200msec
            }
        }
        ...
        // 2. Print the function call stacks of Native processes
        int[] pids = Process.getPidsForCommands(nativeProcs);
        if (pids != null) {
            for (int pid : pids) {
                Debug.dumpNativeBacktraceToFile(pid, tracesPath);
            }
        }
        ...
        // 3. Update CPU usage
        processCpuTracker.init();
        System.gc();
        processCpuTracker.update();
        synchronized (processCpuTracker) {
            processCpuTracker.wait(500); // measure over 1/2 second.
        }
        processCpuTracker.update();

        // 4. Send SIGNAL_QUIT to processes in the lastPids array.
        //    Only lastPids processes that are currently working (at most 5) receive
        //    SIGNAL_QUIT and print their function call stacks
        final int N = processCpuTracker.countWorkingStats();
        int numProcs = 0;
        for (int i = 0; i < N && numProcs < 5; i++) {
            ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i);
            if (lastPids.indexOfKey(stats.pid) >= 0) {
                numProcs++;
                synchronized (observer) {
                    Process.sendSignal(stats.pid, Process.SIGNAL_QUIT);
                    observer.wait(200);  // Wait for write-close, give up after 200msec
                }
            }
        }
    }

    This method contains several important pieces of logic (dumping the call stacks of native processes is not discussed further here):

    • Send SIGNAL_QUIT to a process. On receiving this signal, the process prints its function call stacks, by default to the /data/anr/traces.txt file. The system property dalvik.vm.stack-trace-file can be set to specify a different output location.

    • The traces file contains the call stacks of many processes, selected by the firstPids and lastPids arrays. Processes in firstPids are printed first, and the process in which the ANR occurred is the first entry of firstPids. So when we open the traces file, the first thing we see is the application process that hit the ANR.
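    The resulting order of the traces file can be modeled in a few lines of plain Java. This is an illustration under stated assumptions, not AOSP code: `TracesOrderSketch` and `WorkingStat` are hypothetical names, `WorkingStat` stands in for `ProcessCpuTracker.Stats`, the input list is assumed to be sorted by CPU usage (busiest first), and the native processes of step 2 are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the process order in the traces file (not AOSP code).
final class TracesOrderSketch {
    // Stands in for ProcessCpuTracker.Stats: only the fields used here.
    record WorkingStat(int pid, double cpuPercent) {}

    // Step 4 signals at most this many lastPids processes.
    static final int MAX_LAST_PIDS = 5;

    static List<Integer> dumpOrder(List<Integer> firstPids,
                                   List<WorkingStat> statsByCpuDesc,
                                   Set<Integer> lastPids) {
        // Step 1: all firstPids, in order; firstPids.get(0) is the ANR process.
        List<Integer> order = new ArrayList<>(firstPids);
        // (Step 2, native process stacks, is omitted from this model.)
        // Step 4: walk the busiest processes and keep those listed in lastPids.
        int numProcs = 0;
        for (int i = 0; i < statsByCpuDesc.size() && numProcs < MAX_LAST_PIDS; i++) {
            int pid = statsByCpuDesc.get(i).pid();
            if (lastPids.contains(pid)) {
                order.add(pid);
                numProcs++;
            }
        }
        return order;
    }
}
```

    With illustrative pids, `dumpOrder(List.of(29533, 820), stats, Set.of(10007, 11887))` puts 29533 first, then 820, then whichever of 10007 and 11887 appear among the busiest working processes.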

    3. Problem analysis method

    There are three powerful tools for analyzing ANR problems: Logcat, traces and StrictMode. In the article StrictMode Mechanism, we introduced the implementation and purpose of StrictMode. This article does not discuss using StrictMode to solve ANR problems, but readers should be aware of it. In the article Watchdog Mechanism and Problem Analysis, we introduced how to use logcat and traces. As with Watchdog problems, analyzing an ANR problem takes three steps: log acquisition, problem location and scenario restoration.

    3.1 Log acquisition

    As analyzed above, an important function of the ANR reporting mechanism is to output logs. For how to obtain these logs, see Log acquisition.

    3.2 Problem location

    Search the event log for the am_anr keyword to find the process in which the ANR occurred, as in the following log:

    10-16 00:48:27 820 907 I am_anr: [0,29533,com.android.systemui,1082670605,Broadcast of Intent { act=android.intent.action.TIME_TICK flg=0x50000114 (has extras) }]
    

    It means that at 10-16 00:48:27, an ANR occurred in the process with PID 29533, whose process name is com.android.systemui.
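    Pulling these fields out of an am_anr line is easy to automate when many logs need to be triaged. The sketch below assumes the bracketed payload is laid out as [user, pid, process name, flags, reason], which matches the example above; `AmAnrParser` is a hypothetical helper. It also decodes the `flg=` value of a broadcast reason: in AOSP, `Intent.FLAG_RECEIVER_FOREGROUND` is `0x10000000`, and a broadcast carrying it is dispatched on the foreground broadcast queue, whose timeout is 10 seconds instead of the background queue's 60 seconds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for event-log lines such as the am_anr example above.
final class AmAnrParser {
    // Payload assumed to be [user, pid, process name, flags, reason].
    record AmAnr(String time, int user, int pid, String processName, long flags, String reason) {}

    private static final Pattern AM_ANR = Pattern.compile(
            "^(\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\S*\\s+\\d+\\s+\\d+\\s+I\\s+am_anr\\s*:\\s*"
            + "\\[(\\d+),(\\d+),([^,]+),(\\d+),(.*)\\]$");

    // Value of Intent.FLAG_RECEIVER_FOREGROUND in AOSP.
    static final int FLAG_RECEIVER_FOREGROUND = 0x10000000;
    private static final Pattern INTENT_FLAGS = Pattern.compile("flg=0x([0-9a-fA-F]+)");

    /** Returns the parsed line, or null if it is not an am_anr line. */
    static AmAnr parse(String line) {
        Matcher m = AM_ANR.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new AmAnr(m.group(1), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)),
                m.group(4), Long.parseLong(m.group(5)), m.group(6));
    }

    /** True if the reason names an intent whose flags include FLAG_RECEIVER_FOREGROUND. */
    static boolean isForegroundBroadcast(String reason) {
        Matcher m = INTENT_FLAGS.matcher(reason);
        return m.find()
                && (Integer.parseUnsignedInt(m.group(1), 16) & FLAG_RECEIVER_FOREGROUND) != 0;
    }
}
```

    For the systemui line, `flg=0x50000114` has the foreground bit set, which is consistent with a 10-second broadcast timeout.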

    Next, search the system log for the "ANR in" keyword to find the CPU usage around the time of the ANR:

    10-16 00:50:10 820 907 E ActivityManager: ANR in com.android.systemui, time=130090695
    10-16 00:50:10 820 907 E ActivityManager: Reason: Broadcast of Intent { act=android.intent.action.TIME_TICK flg=0x50000114 (has extras) }
    10-16 00:50:10 820 907 E ActivityManager: Load: 30.4 / 22.34 / 19.94
    10-16 00:50:10 820 907 E ActivityManager: Android time :[2015-10-16 00:50:05.76] [130191,266]
    10-16 00:50:10 820 907 E ActivityManager: CPU usage from 6753ms to -4ms ago:
    10-16 00:50:10 820 907 E ActivityManager:   47% 320/netd: 3.1% user + 44% kernel / faults: 14886 minor 3 major
    10-16 00:50:10 820 907 E ActivityManager:   15% 10007/com.sohu.sohuvideo: 2.8% user + 12% kernel / faults: 1144 minor
    10-16 00:50:10 820 907 E ActivityManager:   13% 10654/hif_thread: 0% user + 13% kernel
    10-16 00:50:10 820 907 E ActivityManager:   11% 175/mmcqd/0: 0% user + 11% kernel
    10-16 00:50:10 820 907 E ActivityManager:   5.1% 12165/app_process: 1.6% user + 3.5% kernel / faults: 9703 minor 540 major
    10-16 00:50:10 820 907 E ActivityManager:   3.3% 29533/com.android.systemui: 2.6% user + 0.7% kernel / faults: 8402 minor 343 major
    10-16 00:50:10 820 907 E ActivityManager:   3.2% 820/system_server: 0.8% user + 2.3% kernel / faults: 5120 minor 523 major
    10-16 00:50:10 820 907 E ActivityManager:   2.5% 11817/com.netease.pomelo.push.l.messageservice_V2: 0.7% user + 1.7% kernel / faults: 7728 minor 687 major
    10-16 00:50:10 820 907 E ActivityManager:   1.6% 11887/com.android.email: 0.5% user + 1% kernel / faults: 6259 minor 587 major
    10-16 00:50:10 820 907 E ActivityManager:   1.4% 11854/com.android.settings: 0.7% user + 0.7% kernel / faults: 5404 minor 471 major
    10-16 00:50:10 820 907 E ActivityManager:   1.4% 11869/android.process.acore: 0.7% user + 0.7% kernel / faults: 6131 minor 561 major
    10-16 00:50:10 820 907 E ActivityManager:   1.3% 11860/com.tencent.mobileqq: 0.1% user + 1.1% kernel / faults: 5542 minor 470 major
    ...
    10-16 00:50:10 820 907 E ActivityManager:  +0% 12832/cat: 0% user + 0% kernel
    10-16 00:50:10 820 907 E ActivityManager:  +0% 13211/zygote64: 0% user + 0% kernel
    10-16 00:50:10 820 907 E ActivityManager: 87% TOTAL: 3% user + 18% kernel + 64% iowait + 0.5% softirq
    

    This log is familiar to every Android developer, and it contains a great deal of information:

    • The time when the ANR occurred. In the event log, the ANR occurred at 00:48:27. AMS.appNotResponding() writes the event log first and the system log only afterwards (after the traces have been dumped and CPU usage collected), so the ANR appears in the system log at 00:50:10. To reconstruct the state of the system when the ANR occurred, look at the logs leading up to 00:48:27.

    • The process that prints the ANR log. ANR logs are printed by the AMS thread of the system_server process. Both the event log and the system log show 820 and 907, so the PID of system_server is 820 and the TID of the AMS thread is 907. The ANR monitoring mechanism runs in the AMS thread, so to analyze ANRs affected by the system, you need to know the running state of the system_server process.

    • The process in which the ANR occurred. The "ANR in" keyword shows that the ANR occurred in com.android.systemui; from the event log, we know the PID of this process is 29533.

    • Cause of the ANR. The Reason keyword shows that this ANR was caused by a timeout while processing the TIME_TICK broadcast. The implication is that TIME_TICK was dispatched as a serial broadcast, and that BroadcastReceiver.onReceive() had been running on the main thread of process 29533 for more than 10 seconds.

    • CPU load. The Load keyword shows that the CPU load over the last 1, 5 and 15 minutes was 30.4, 22.34 and 19.94 respectively. The 1-minute value is the most useful reference, because ANR timeouts are basically all within 1 minute. It can be roughly read as: over the last minute, an average of 30.4 tasks were running or waiting to run, which is relatively high. (On Linux, the load average also counts tasks blocked in uninterruptible IO waits.)

    • CPU usage statistics period. The "CPU usage from XX to XX ago" line means the statistics cover a period before the ANR occurred. Similarly, "CPU usage from XX to XX later" covers a period after the ANR occurred.

    • CPU usage of each process. Taking the com.android.systemui process as an example, its entry contains the following information:

        ◦ Overall CPU utilization: 3.3%, of which 2.6% was in user mode and 0.7% in kernel mode.

        ◦ Page faults: the 8402 minor faults were resolved without disk IO (the page was already in memory and only had to be mapped), while the 343 major faults required reading the page from storage. Roughly, minor faults reflect memory access, and major faults reflect IO.

      Both values are relatively high, which shows that before the ANR, the systemui process performed many memory accesses, which in turn caused a large amount of IO.

    • "+" before the CPU utilization. Some processes, such as cat and zygote64, have a "+" before their CPU utilization. This means they were not running during the previous CPU sampling period but were running during the current one. Similarly, a "-" means the process died between the two samples.

    • CPU usage summary. The TOTAL keyword gives the overall CPU usage: 87% in total, of which iowait (time the CPU spent waiting for IO) accounts for 64%, indicating a large amount of IO before the ANR. The major values of app_process, system_server, com.android.systemui and other processes are relatively large, indicating frequent IO in these processes, which adds to the overall iowait time.

    This is a lot of information, and it points to a preliminary conclusion: the CPU spent a large share of its time waiting for IO, so the systemui process could not get enough CPU time, its main thread timed out while processing the broadcast, and the ANR occurred.

    For a rigorous developer, this conclusion is premature, because several questions remain open:

    • The systemui process did get some CPU time (3.3%). Why could BroadcastReceiver.onReceive() still not finish?

    • Why is iowait so high, and why are the major values of multiple processes so high?

    To answer these questions, we need to reconstruct the ANR scenario from other logs.
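    The CPU usage lines used in this analysis follow a regular format, so their numbers can be extracted mechanically and compared across ANRs. A minimal sketch, assuming the line layout shown in the logs above; `CpuUsageLineParser` and its records are hypothetical names, and processes without a `faults:` part are reported with zero faults.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the "CPU usage" lines printed with an ANR.
final class CpuUsageLineParser {
    // sign is '+' (started since the last sample), '-' (died since then) or ' '.
    record ProcUsage(char sign, double total, int pid, String name,
                     double user, double kernel, int minorFaults, int majorFaults) {}

    record Total(double total, double iowait) {}

    // e.g. "3.3% 29533/com.android.systemui: 2.6% user + 0.7% kernel / faults: 8402 minor 343 major"
    private static final Pattern PROC = Pattern.compile(
            "([+-]?)([\\d.]+)% (\\d+)/([^:]+): ([\\d.]+)% user \\+ ([\\d.]+)% kernel"
            + "(?:[^/]*/ faults:(?: (\\d+) minor)?(?: (\\d+) major)?)?");

    // e.g. "87% TOTAL: 3% user + 18% kernel + 64% iowait + 0.5% softirq"
    private static final Pattern TOTAL = Pattern.compile("([\\d.]+)% TOTAL:.*?([\\d.]+)% iowait");

    /** Parses a per-process entry (logcat prefix allowed), or returns null. */
    static ProcUsage parseProcess(String line) {
        Matcher m = PROC.matcher(line);
        if (!m.find()) {
            return null;
        }
        char sign = m.group(1).isEmpty() ? ' ' : m.group(1).charAt(0);
        return new ProcUsage(sign, Double.parseDouble(m.group(2)), Integer.parseInt(m.group(3)),
                m.group(4), Double.parseDouble(m.group(5)), Double.parseDouble(m.group(6)),
                m.group(7) == null ? 0 : Integer.parseInt(m.group(7)),
                m.group(8) == null ? 0 : Integer.parseInt(m.group(8)));
    }

    /** Parses the TOTAL line into overall usage and iowait, or returns null. */
    static Total parseTotal(String line) {
        Matcher m = TOTAL.matcher(line);
        return m.find()
                ? new Total(Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2)))
                : null;
    }
}
```

    Sorting the parsed entries by `majorFaults`, for example, quickly shows which processes contributed most to the iowait in the TOTAL line.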

    3.3 Scenario restoration

    3.3.1 First hypothesis and verification

    Starting from the first question above, let's make a hypothesis: if the systemui process was executing BroadcastReceiver.onReceive(), the main thread's call stack in the traces.txt file should show this method being executed.

    Next, we look in the traces file for the call stacks of the systemui process around the time of the ANR (00:48:27).

    ----- pid 29533 at 2015-10-16 00:48:06 -----
    Cmd line: com.android.systemui
    

    DALVIK THREADS (53):
    "main" prio=5 tid=1 Native
    | group="main" sCount=1 dsCount=0 obj=0x75bd5818 self=0x7f8549a000
    | sysTid=29533 nice=0 cgrp=bg_non_interactive sched=0/0 handle=0x7f894bbe58
    | state=S schedstat=( 288625433917 93454573244 903419 ) utm=20570 stm=8292 core=3 HZ=100
    | stack=0x7fdffda000-0x7fdffdc000 stackSize=8MB
    | held mutexes=
    native: #00 pc 00060b0c /system/lib64/libc.so (__epoll_pwait+8)
    native: #01 pc 0001bb54 /system/lib64/libc.so (epoll_pwait+32)
    native: #02 pc 0001b3d8 /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
    native: #03 pc 0001b75c /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+76)
    native: #04 pc 000d7194 /system/lib64/libandroid_runtime.so (android::NativeMessageQueue::pollOnce(_JNIEnv*, int)+48)
    at android.os.MessageQueue.nativePollOnce(Native method)
    at android.os.MessageQueue.next(MessageQueue.java:148)
    at android.os.Looper.loop(Looper.java:151)
    at android.app.ActivityThread.main(ActivityThread.java:5718)
    at java.lang.reflect.Method.invoke!(Native method)
    at java.lang.reflect.Method.invoke(Method.java:372)
    at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:975)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:770)

    ----- pid 29533 at 2015-10-16 00:48:29 -----
    Cmd line: com.android.systemui

    DALVIK THREADS (54):
    "main" prio=5 tid=1 Blocked
    | group="main" sCount=1 dsCount=0 obj=0x75bd5818 self=0x7f8549a000
    | sysTid=29533 nice=0 cgrp=bg_non_interactive sched=0/0 handle=0x7f894bbe58
    | state=S schedstat=( 289080040422 93461978317 904874 ) utm=20599 stm=8309 core=0 HZ=100
    | stack=0x7fdffda000-0x7fdffdc000 stackSize=8MB
    | held mutexes=
    at com.mediatek.anrappmanager.MessageLogger.println(SourceFile:77)
    - waiting to lock <0x26b337a3> (a com.mediatek.anrappmanager.MessageLogger) held by thread 49
    at android.os.Looper.loop(Looper.java:195)
    at android.app.ActivityThread.main(ActivityThread.java:5718)
    at java.lang.reflect.Method.invoke!(Native method)
    at java.lang.reflect.Method.invoke(Method.java:372)
    at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:975)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:770)
    ...
    "Binder_5" prio=5 tid=49 Native
    | group="main" sCount=1 dsCount=0 obj=0x136760a0 self=0x7f7e453000
    | sysTid=6945 nice=0 cgrp=default sched=0/0 handle=0x7f6e3ce000
    | state=S schedstat=( 5505571091 4567508913 30743 ) utm=264 stm=286 core=4 HZ=100
    | stack=0x7f6b83f000-0x7f6b841000 stackSize=1008KB
    | held mutexes=
    native: #00 pc 00019d14 /system/lib64/libc.so (syscall+28)
    native: #01 pc 0005b5d8 /system/lib64/libaoc.so (???)
    native: #02 pc 002c6f18 /system/lib64/libaoc.so (???)
    native: #03 pc 00032c40 /system/lib64/libaoc.so (???)
    at libcore.io.Posix.getpid(Native method)
    at libcore.io.ForwardingOs.getpid(ForwardingOs.java:83)
    at android.system.Os.getpid(Os.java:176)
    at android.os.Process.myPid(Process.java:754)
    at com.mediatek.anrappmanager.MessageLogger.dump(SourceFile:219)
    - locked <0x26b337a3> (a com.mediatek.anrappmanager.MessageLogger)
    at com.mediatek.anrappmanager.ANRAppManager.dumpMessageHistory(SourceFile:65)
    at android.app.ActivityThread$ApplicationThread.dumpMessageHistory(ActivityThread.java:1302)
    at android.app.ApplicationThreadNative.onTransact(ApplicationThreadNative.java:682)
    at android.os.Binder.execTransact(Binder.java:451)

    We found two call stacks of the systemui process near the ANR time (00:48:27):

    1. Before the ANR (00:48:06), the main thread's call stack is normal: the thread is in the Looper.loop() message loop, waiting in MessageQueue.next() for the next message.

    2. Two seconds after the ANR (00:48:29), the main thread is in the Blocked state, waiting for a lock held by thread 49. Thread 49 is a Binder thread on which anrappmanager is performing a dump operation while holding that lock.
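    Finding these stacks by hand works for one process, but the same search can be scripted: for a given pid, list every traces section, the state of its "main" thread and, if it is waiting on a lock, the thread id named in "held by thread N". The sketch below assumes the ART traces layout shown above (section headers `----- pid N at TIME -----`, thread headers `"name" prio=... tid=N State`); `MainThreadScanner` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that summarizes the "main" thread of one pid in a traces file.
final class MainThreadScanner {
    // lockOwnerTid is N from "held by thread N", or null if main is not waiting on a lock.
    record Snapshot(String time, String state, Integer lockOwnerTid) {}

    private static final Pattern SECTION = Pattern.compile("^----- pid (\\d+) at (.+) -----$");
    private static final Pattern THREAD = Pattern.compile("^\"([^\"]+)\" .*tid=\\d+ (\\w+)$");
    private static final Pattern WAITING =
            Pattern.compile("waiting to lock <\\w+> .* held by thread (\\d+)");

    static List<Snapshot> scan(List<String> lines, int pid) {
        List<Snapshot> result = new ArrayList<>();
        boolean inTargetPid = false;
        boolean inMain = false;
        String time = null;
        for (String raw : lines) {
            String line = raw.trim();
            Matcher section = SECTION.matcher(line);
            if (section.matches()) {                // a new process dump starts
                inTargetPid = Integer.parseInt(section.group(1)) == pid;
                time = section.group(2);
                inMain = false;
                continue;
            }
            if (!inTargetPid) {
                continue;
            }
            Matcher thread = THREAD.matcher(line);
            if (thread.matches()) {                 // a new thread starts
                inMain = thread.group(1).equals("main");
                if (inMain) {
                    result.add(new Snapshot(time, thread.group(2), null));
                }
                continue;
            }
            Matcher waiting = WAITING.matcher(line);
            if (inMain && waiting.find()) {         // remember which thread holds the lock
                Snapshot last = result.remove(result.size() - 1);
                result.add(new Snapshot(last.time(), last.state(),
                        Integer.valueOf(waiting.group(1))));
            }
        }
        return result;
    }
}
```

    The returned tid (49 here) can then be looked up as `tid=49` in the same section, which leads to the Binder_5 thread and its `locked` line.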

    The logs analyzed here were generated on an MTK platform, which is why the class com.mediatek.anrappmanager.MessageLogger appears in the call stacks. It is an MTK extension to AOSP used to print ANR logs.

    We have now found the direct cause of the ANR in the systemui process: systemui was printing traces, a long-running IO operation; its main thread blocked on the lock held by that operation and could not process the TIME_TICK broadcast, so the ANR occurred.

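    To avoid ANR in this scenario, the main thread must not block here. The underlying cause is that MTK extended android.os.Looper.loop() in AOSP with a feature that logs the message queue, and this feature has a design defect that leads to lock waiting: as the two stacks show, MessageLogger.println() on the main thread and MessageLogger.dump() on the Binder thread synchronize on the same MessageLogger object.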
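    The traces suggest the shape of the defect: a per-message logging call on the main thread and a slow dump on a Binder thread share one monitor. The following is a hypothetical reconstruction of that pattern, not MTK's code: all class names are invented and `Thread.sleep()` stands in for the slow IO. It measures how long a `println()` call is blocked while a dump is running, and contrasts it with a variant that copies the history under the lock and does the slow part outside it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical reconstruction of the lock pattern seen in the traces (not MTK's code).
final class LoggerContentionDemo {
    interface MessageLog {
        void println(String msg);                                   // main thread, per message
        void dump(long slowMillis, Runnable onSlowStart) throws InterruptedException; // Binder
    }

    // Flawed: println() and dump() share one monitor, and dump() is slow while holding it.
    static final class SharedLockLog implements MessageLog {
        private final List<String> history = new ArrayList<>();
        @Override public synchronized void println(String msg) { history.add(msg); }
        @Override public synchronized void dump(long slowMillis, Runnable onSlowStart)
                throws InterruptedException {
            onSlowStart.run();
            Thread.sleep(slowMillis);           // stands in for slow IO under the lock
        }
    }

    // Alternative: copy the history under the lock, do the slow part after releasing it.
    static final class SnapshotLog implements MessageLog {
        private final List<String> history = new ArrayList<>();
        @Override public synchronized void println(String msg) { history.add(msg); }
        @Override public void dump(long slowMillis, Runnable onSlowStart)
                throws InterruptedException {
            List<String> copy;
            synchronized (this) {
                copy = new ArrayList<>(history);
            }
            onSlowStart.run();
            Thread.sleep(slowMillis);           // slow IO on `copy`, lock not held
        }
    }

    /** Milliseconds a println() call is blocked while another thread is in dump()'s slow phase. */
    static long printlnBlockedMillis(MessageLog log, long slowMillis) throws InterruptedException {
        CountDownLatch slowPhase = new CountDownLatch(1);
        Thread binder = new Thread(() -> {
            try {
                log.dump(slowMillis, slowPhase::countDown);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Binder_sketch");
        binder.start();
        slowPhase.await();                      // dump() has reached its slow phase
        long start = System.nanoTime();
        log.println("message dispatched");     // what the main thread's loop would log
        long blockedMillis = (System.nanoTime() - start) / 1_000_000;
        binder.join();
        return blockedMillis;
    }
}
```

    With a 300 ms dump, the shared-lock logger blocks `println()` for roughly the whole dump, while the snapshot variant returns almost immediately. On a real main thread, the first case corresponds to the Blocked state seen in the traces.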

    3.3.2 Second hypothesis and verification

    Let us dig further into why systemui was printing traces before its ANR. Starting from the second question above, we make another hypothesis: iowait is high and the major values of many processes are high, possibly because AMS.dumpStackTraces() is being called and many processes are writing their call stacks to the traces file, which drives IO up. If AMS.dumpStackTraces() is being called at that moment, some exception must have occurred in the system: either an ANR or an SNR.

    In the event log, we found another ANR:

    10-16 00:47:58 820 907 I am_anr  : [0,10464,com.android.settings,1086864965,Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 1.)]
    

    At 00:47:58, an ANR occurred in the com.android.settings process, earlier than the systemui ANR (00:48:27). This is the evidence we were looking for: because the settings process hit an ANR first and AMS.dumpStackTraces() was called, many processes started printing traces. As a result, the system-wide iowait was high and the major values of many processes were high, systemui among them. Under the MTK logic, printing the ANR log blocks the main thread, which in turn triggers ANRs in other applications.
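    Cross-checking ANR times like this amounts to collecting am_anr events and looking back from the ANR under investigation. A sketch, assuming logcat timestamps in `MM-dd HH:mm:ss` form (the year is not logged, so 2015 is filled in as an assumption) and a look-back window chosen by the caller; `AnrTimeline` is a hypothetical name.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: find ANRs in other processes shortly before a given ANR.
final class AnrTimeline {
    record Event(LocalDateTime time, String processName, String reason) {}

    // logcat does not print the year; 2015 is an assumption matching this case.
    private static final DateTimeFormatter LOGCAT_TIME = new DateTimeFormatterBuilder()
            .appendPattern("MM-dd HH:mm:ss")
            .parseDefaulting(ChronoField.YEAR, 2015)
            .toFormatter();

    private static final Pattern AM_ANR = Pattern.compile(
            "^(\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\S*\\s+\\d+\\s+\\d+\\s+I\\s+am_anr\\s*:\\s*"
            + "\\[\\d+,\\d+,([^,]+),\\d+,(.*)\\]$");

    static List<Event> parse(List<String> lines) {
        List<Event> events = new ArrayList<>();
        for (String line : lines) {
            Matcher m = AM_ANR.matcher(line.trim());
            if (m.matches()) {
                events.add(new Event(LocalDateTime.parse(m.group(1), LOGCAT_TIME),
                        m.group(2), m.group(3)));
            }
        }
        return events;
    }

    /** ANRs of other processes within `window` before `target`, oldest first. */
    static List<Event> precedingAnrs(List<Event> events, Event target, Duration window) {
        LocalDateTime from = target.time().minus(window);
        return events.stream()
                .filter(e -> !e.processName().equals(target.processName()))
                .filter(e -> e.time().isBefore(target.time()) && !e.time().isBefore(from))
                .sorted(Comparator.comparing(Event::time))
                .toList();
    }
}
```

    Fed the two am_anr lines of this case with a 60-second window, it returns the settings ANR, 29 seconds before the systemui one.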

    In the system log, we found the CPU usage information for the settings ANR:

    10-16 00:48:12 820 907 E ActivityManager: ANR in com.android.settings (com.android.settings/.SubSettings), time=130063718
    10-16 00:48:12 820 907 E ActivityManager: Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 1.)
    10-16 00:48:12 820 907 E ActivityManager: Load: 21.37 / 19.25 / 18.84
    10-16 00:48:12 820 907 E ActivityManager: Android time :[2015-10-16 00:48:12.24] [130077,742]
    10-16 00:48:12 820 907 E ActivityManager: CPU usage from 0ms to 7676ms later:
    10-16 00:48:12 820 907 E ActivityManager:   91% 820/system_server: 16% user + 75% kernel / faults: 13192 minor 167 major
    10-16 00:48:12 820 907 E ActivityManager:   3.2% 175/mmcqd/0: 0% user + 3.2% kernel
    10-16 00:48:12 820 907 E ActivityManager:   2.9% 29533/com.android.systemui: 2.3% user + 0.6% kernel / faults: 1352 minor 10 major
    10-16 00:48:12 820 907 E ActivityManager:   2.2% 1736/com.android.phone: 0.9% user + 1.3% kernel / faults: 1225 minor 1 major
    10-16 00:48:12 820 907 E ActivityManager:   2.2% 10464/com.android.settings: 0.7% user + 1.4% kernel / faults: 2801 minor 105 major
    10-16 00:48:12 820 907 E ActivityManager:   0% 1785/com.meizu.experiencedatasync: 0% user + 0% kernel / faults: 3478 minor 2 major
    10-16 00:48:12 820 907 E ActivityManager:   1.8% 11333/com.meizu.media.video: 1% user + 0.7% kernel / faults: 3843 minor 89 major
    10-16 00:48:12 820 907 E ActivityManager:   1.5% 332/mobile_log_d: 0.5% user + 1% kernel / faults: 94 minor 1 major
    10-16 00:48:12 820 907 E ActivityManager:   1% 11306/com.meizu.media.gallery: 0.7% user + 0.2% kernel / faults: 2204 minor 55 major
    ...
    10-16 00:48:12 820 907 E ActivityManager:  +0% 11397/sh: 0% user + 0% kernel
    10-16 00:48:12 820 907 E ActivityManager:  +0% 11398/app_process: 0% user + 0% kernel
    10-16 00:48:12 820 907 E ActivityManager: 29% TOTAL: 5.1% user + 15% kernel + 9.5% iowait + 0% softirq
    

    We will not repeat the meaning of each field, and focus only on the reason for the ANR:

    Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.
    Outbound queue length: 0. Wait queue length: 1.)

    With the earlier analysis of the Input ANR mechanism, the reason for this ANR is easy to see. "Wait queue length: 1" means that an earlier input event has been delivered to the Settings process, but Settings has not finished processing it. The new KeyEvent has been waiting for more than 5 seconds, so an ANR is generated.
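    The rule behind this reason string can be modeled in a few lines. This is a simplified, hypothetical Java model (AOSP implements input dispatching in C++ in InputDispatcher, with more states than shown here): a window has an outbound queue of events not yet sent and a wait queue of events sent but not yet finished; a new key event is held back while either queue is non-empty, and after 5 seconds of waiting an ANR is reported.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical model of the key-event check for one window's connection.
final class InputAnrSketch {
    static final long DISPATCHING_TIMEOUT_MS = 5_000; // default input dispatching timeout

    final Deque<String> outboundQueue = new ArrayDeque<>(); // queued for the window, not yet sent
    final Deque<String> waitQueue = new ArrayDeque<>();     // sent to the app, not yet finished

    /** The app receives an event; it stays in the wait queue until the app finishes it. */
    void deliver(String event) {
        waitQueue.addLast(event);
    }

    /** The app reports that it finished the oldest delivered event. */
    void finishOldest() {
        waitQueue.pollFirst();
    }

    /**
     * A new key event has been pending since waitStartMs. Returns the ANR reason if it has
     * waited for the timeout while earlier events are still queued or unfinished, else null.
     */
    String checkPendingKey(long waitStartMs, long nowMs) {
        if (outboundQueue.isEmpty() && waitQueue.isEmpty()) {
            return null; // the window is idle, so the key event can be sent right away
        }
        if (nowMs - waitStartMs < DISPATCHING_TIMEOUT_MS) {
            return null; // still waiting, but within the time limit
        }
        return "Input dispatching timed out (Waiting to send key event because the focused window"
                + " has not finished processing all of the input events that were previously"
                + " delivered to it.  Outbound queue length: " + outboundQueue.size()
                + ".  Wait queue length: " + waitQueue.size() + ".)";
    }
}
```

    With one unfinished event in the wait queue and an empty outbound queue, the model produces the same "Outbound queue length: 0.  Wait queue length: 1." reason as the Settings log.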

    The next step would be to find the traces of Settings and analyze why its main thread timed out while processing input events. We stop here.

    4. Summary

    This article analyzed the Android ANR mechanism in depth:

    • ANR monitoring mechanism. Starting from the source code of the three ANR monitoring mechanisms (Service, Broadcast and InputEvent), we analyzed how Android detects each kind of ANR: timeout detection is planted when a Service is started, a Broadcast is sent, or an input event is dispatched.

    • ANR reporting mechanism. We analyzed how Android outputs ANR logs. After an ANR is detected, the two most important outputs are CPU usage and the call stacks of processes; these two types of logs are our main tools for solving ANR problems.

    • ANR problem solving. Through a case study, we interpreted the ANR log in detail and laid out the ideas and methods for analyzing ANR problems.
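    The timeout detection summarized above follows one pattern: plant a timeout when work is handed to the application, defuse it when the application reports completion, and report an ANR if the timeout fires first. For services and broadcasts, AOSP does this with Handler timeout messages in system_server; the sketch below swaps in a `ScheduledExecutorService` so that it runs on a plain JVM. `AnrTimeoutSketch` and the token strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "plant a timeout on dispatch, defuse it on completion".
final class AnrTimeoutSketch implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> pending = new HashMap<>(); // guarded by this
    private final List<String> anrs = new ArrayList<>();                     // guarded by this

    /** E.g. a service is started or a broadcast is handed to the app: plant the timeout. */
    synchronized void dispatch(String token, long timeoutMs) {
        pending.put(token, timer.schedule(() -> onTimeout(token), timeoutMs, TimeUnit.MILLISECONDS));
    }

    /** The app reports completion in time: defuse the timeout. */
    synchronized void finished(String token) {
        ScheduledFuture<?> timeout = pending.remove(token);
        if (timeout != null) {
            timeout.cancel(false);
        }
    }

    /** Runs on the timer thread; reports an ANR only if the work is still pending. */
    private synchronized void onTimeout(String token) {
        if (pending.remove(token) != null) {
            anrs.add("ANR: " + token);
        }
    }

    synchronized List<String> anrs() {
        return List.copyOf(anrs);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

    Because `dispatch()`, `finished()` and `onTimeout()` all synchronize on the same object, a timeout cannot fire between planting and recording it, and a completion that arrives first always defuses it.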

    Finally, a word to all readers: start from the logs to solve ANR problems, understand the principles behind the ANR mechanism, and you will not panic even when facing the hardest ANR problems.


Posted by figuringout on Thu, 18 Nov 2021 21:52:14 -0800