Talk about Service Timeout in ANR

Keywords: Android Java Google

1. Core Source

Key Classes Path (/frameworks/base/)
ActiveServices.java services/core/java/com/android/server/am/ActiveServices.java
ActivityManagerService.java services/core/java/com/android/server/am/ActivityManagerService.java
AppErrors.java services/core/java/com/android/server/am/AppErrors.java


2. ANR Basic Cognition

What is 2.1 ANR?

ANR(Application Not Responding), an application that is unresponsive and simple in definition, covers many of the design ideas of Android systems.

First, ANR falls into the category of applications, which is different from SNR(System Not Respoding), which reflects the problem that system_server has lost its responsiveness and ANR explicitly pins the problem on the application.SNR is guaranteed by Watchdog mechanism and ANR is guaranteed by message processing mechanism. Android implements a set of sophisticated mechanisms at the system level to discover ANR. The core principle is message scheduling and timeout processing.

Secondly, the main body of ANR mechanism is implemented at the system level.All ANR-related messages are dispatched by the system_server process, which is then dispatched to the application process to complete the actual processing of the messages. At the same time, the system process designs different timeout limits to track the processing of the messages.Once the application processes the message incorrectly, the timeout limit works by collecting system states such as CPU/IO usage, process function call stack, and reporting whether the user has a process responded (ANR dialog box).

Then, the ANR problem is essentially a performance problem.The ANR mechanism actually restricts the main thread of the application by requiring the main thread to complete some of the most common operations (start the service, process broadcasts, process inputs) within a limited time frame and, if processing times out, assumes that the main thread has lost its ability to respond to other operations.Time-consuming operations in the main thread, such as intensive CPU operations, large amounts of IO, complex interface layouts, and so on, can reduce the responsiveness of applications.

Finally, some ANR problems are difficult to analyze. Sometimes, due to some influence at the bottom of the system, message scheduling fails and the scenario with problems is difficult to reproduce.This kind of ANR problem often takes a lot of time to understand some behaviors of the system, which is beyond the scope of the ANR mechanism itself.

2.2 ANR mechanism

Analyzing some initial ANR problems requires a simple understanding of the logs of the final output, but for some ANR problems caused by system problems (e.g., excessive CPU load, process jam), an understanding of the entire ANR mechanism is required to identify the cause of the problem.

The ANR mechanism can be divided into two parts:

Monitoring ANR: Android has a monitoring mechanism for different ANR types (Broadcast, Service, InputEvent).

ANR report: After ANR is monitored, ANR dialogs, output logs (stack of process function calls, CPU usage, etc. when ANR occurs) need to be displayed.

2.3 Reason for triggering ANR

As we have said before, an intuitive phenomenon after the appearance of ANR is that the system will display an ANR dialog box.

The reason for the ANR in Google documentation is described as follows:

Applications in Android systems are monitored by two system services, ActivityManagerService and WindowManagerService. The system will show ANR dialogs in the following two situations!

KeyDispatchTimeout: KeyDispatchTimeout: KeyDispatchTimeout keys or touch events do not respond within a specified time.
BroadcastTimeout (10 seconds): BroadcastReceiver cannot process completion for a specific time.
ServiceTimeout (20 seconds): Service cannot process completion within a specified time.


3. Service timeout monitoring mechanism

Service runs on the main thread of the application and causes an ANR if it takes longer than 20 seconds to execute.

When a Service ANR occurs, you can generally first check whether time-consuming operations, such as complex operations, IO operations, are performed in Service's life cycle functions.If the application's code logic is not faulty, you need to examine the current system's state: CPU usage, system service status, and so on, to determine if the ANR process was affected by the system's abnormal operation at that time.

So how does the system detect Service timeouts?Android is achieved by setting a timer message.Timed messages are processed by AMS's message queue, which has context information about how the service runs, so it is reasonable to set up a timeout detection mechanism in AMS.

The Service ANR mechanism is relatively simple and the principal is implemented in ActiveServices.

In the Service startup process, the realStartServiceLocked() method is called after the Service process attach es to the system_server process.

3.1 realStartServiceLocked

// frameworks/base/services/core/java/com/android/server/am/ActiveServices.java

public final class ActiveServices {

    private final void realStartServiceLocked(ServiceRecord r,
            ProcessRecord app, boolean execInFg) throws RemoteException {

        // Send a delay message (SERVICE_TIMEOUT_MSG)
        bumpServiceExecutingLocked(r, execInFg, "create");

        boolean created = false;
        try {
            
            // onCreate() method that ultimately executes the service
            app.thread.scheduleCreateService(r, r.serviceInfo, mAm.
                compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                app.repProcState);

            ... ...
        }
    }
    
}

3.2 bumpServiceExecutingLocked

private final void bumpServiceExecutingLocked(...) {

    scheduleServiceTimeoutLocked(r.app);
    
}

3.3 scheduleServiceTimeoutLocked

void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    // When the SERVICE_TIMEOUT_MSG message has not been remove d after the timeout,
    // Throw a timed message through AMS.MainHandler.
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}

The above method throws a timed message SERVICE_TIMEOUT_MSG through AMS.MainHandler.

3.4 serviceDoneExecutingLocked

Service execution in foreground process, timeout is SERVICE_TIMEOUT (20 seconds)

    // How long we wait for a service to finish executing.
    static final int SERVICE_TIMEOUT = 20*1000;

Service execution in background process, timeout is SERVICE_BACKGROUND_TIMEOUT (200 seconds)

    // How long we wait for a service to finish executing.
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

When the Service's life cycle ends (no ANR), the serviceDoneExecutingLocked() method is called, in which the previously thrown SERVICE_TIMEOUT_MSG message is cleared.

void serviceDoneExecutingLocked(ServiceRecord r, int type, int startId, int res) {
    boolean inDestroying = mDestroyingServices.contains(r);
    if (r != null) {
        ... ...     
        serviceDoneExecutingLocked(r, inDestroying, inDestroying);
    }
}

private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
          boolean finishing) {  
    ... ...
    if (r.executeNesting <= 0) {
        if (r.app != null) {    
            ... ...
            // Clear SERVICE_TIMEOUT_MSG message if there is no service executing in the current service's process
            if (r.app.executingServices.size() == 0) {
                mAm.mHandler.removeMessages(
                             ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);            
                ... ...
            }
    ... ...
}

3.5 handleMessage

What if Remove doesn't drop SERVICE_TIMEOUT_MSG?Next let's look at the processing logic for ANR.

There is a Handler thread called ActivityManager in the system_server process.

If the SERVICE_TIMEOUT_MSG is not cleared within the timeout period, a message SERVICE_TIMEOUT_MSG is sent to the Handler thread.

final class MainHandler extends Handler {
    ... ...

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            ... ...
            
            case SERVICE_TIMEOUT_MSG: {
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;
        ... ...
    }
}

3.6 serviceTimeout

void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;

    synchronized(mAm) {
        ... ...
        
        long nextTime = 0;

        // Find a Service that runs out of time
        for (int i = proc.executingServices.size() - 1; i >= 0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }

        // Determines whether the process executing the Service timeout has recently run the list of processes, and if not, ignores this ANR
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            timeout.dump(pw, "    ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, 
                                           LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        ... ...
    }

    if (anrMessage != null) {
        // appNotResponding is executed when a timeout service exists
        mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
    }
}

The above method finds a Service whose current process has timed out, after some judgment, decides to report ANR and eventually calls the AMS.appNotResponding() method.

At this point, the ANR mechanism has completed the monitoring report task, and the remaining task is the output of ANR results, which we call the ANR reporting mechanism.The reporting mechanism for ANR is accomplished through AMS.appNotResponding(), which is also ultimately called by ANRs of Broadcast and InputEvent types.


4. ANR information collection process

Next let's look at the information collection process for Android ANR!

4.1 appNotResponding

// frameworks/base/services/core/java/com/android/server/am/AppErrors.java

class AppErrors {

    final void appNotResponding(ProcessRecord app, ActivityRecord activity,
            ActivityRecord parent, boolean aboveSystem, final String annotation) {
        ... ...

        long anrTime = SystemClock.uptimeMillis();
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            mService.updateCpuStatsNow();   // Update cpu statistics
        }

        boolean showBackground = Settings.Secure.
                getInt(mContext.getContentResolver(),
                           Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;

        boolean isSilentANR;

        synchronized (mService) {
            if (mService.mShuttingDown) {
                return;
            } else if (app.notResponding) {
                return;
            } else if (app.crashing) {
                return;
            } else if (app.killedByAm) {
                return;
            } else if (app.killed) {
                return;
            }

            // In case we come through here for the same app before completing
            // this one, mark as anring now so we will bail out.
            app.notResponding = true;

            // Record ANR to EventLog
            EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
                    app.processName, app.info.flags, annotation);

            // Add current process to firstPids
            firstPids.add(app.pid);

            // Don't dump other PIDs if it's a background ANR
            isSilentANR = !showBackground 
                                  && !isInterestingForBackgroundTraces(app);
            if (!isSilentANR) {
                int parentPid = app.pid;
                if (parent != null && parent.app != null && parent.app.pid > 0) {
                    parentPid = parent.app.pid;
                }
                if (parentPid != app.pid) firstPids.add(parentPid);

                // Add system_server process to firstPids
                if (MY_PID != app.pid 
                                && MY_PID != parentPid) firstPids.add(MY_PID);

                for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
                    ProcessRecord r = mService.mLruProcesses.get(i);
                    if (r != null && r.thread != null) {
                        int pid = r.pid;
                        if (pid > 0 && pid != app.pid 
                                       && pid != parentPid && pid != MY_PID) {
                            if (r.persistent) {
                                // Add persistent process to firstPids
                                firstPids.add(pid);
                            } else if (r.treatLikeActivity) {
                                firstPids.add(pid);
                            } else {
                                // Additional processes added to lastPids
                                lastPids.put(pid, Boolean.TRUE);
                            }
                        }
                    }
                }
            }
        }

        // Record ANR output to main log
        StringBuilder info = new StringBuilder();
        info.setLength(0);
        info.append("ANR in ").append(app.processName);
        if (activity != null && activity.shortComponentName != null) {
            info.append(" (").append(activity.shortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(app.pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        if (parent != null && parent != activity) {
            info.append("Parent: ").append(parent.shortComponentName).append("\n");
        }

        // Create CPU tracker object 
        ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

        ... ...

        // Output traces Information
        File tracesFile = ActivityManagerService.dumpStackTraces(
                true, firstPids,
                (isSilentANR) ? null : processCpuTracker,
                (isSilentANR) ? null : lastPids,
                nativePids);

        String cpuInfo = null;
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            mService.updateCpuStatsNow();
            synchronized (mService.mProcessCpuTracker) {
                cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
            }
            // Log current CPU load
            info.append(processCpuTracker.printCurrentLoad());
            info.append(cpuInfo);
        }

        // Record Cpu usage from anr time
        info.append(processCpuTracker.printCurrentState(anrTime));

        // Output reason of current ANR, CPU usage, load information
        Slog.e(TAG, info.toString());
        if (tracesFile == null) {
            Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
        }
        ... ...
                        
        // Save traces files and CPU usage information to dropbox, the data/system/dropbox directory
        mService.addErrorToDropBox("anr", app, app.processName,
                          activity, parent, annotation, cpuInfo, tracesFile, null);
        ... ...

        synchronized (mService) {
            mService.mBatteryStatsService.noteProcessAnr(app.processName, app.uid);

            // Background ANR situation, kill directly
            if (isSilentANR) {
                app.kill("bg anr", true);
                return;
            }

            // Set the ANR status of app, disease query error report receiver
            makeAppNotRespondingLocked(app,
                    activity != null ? activity.shortComponentName : null,
                    annotation != null ? "ANR " + annotation : "ANR",
                    info.toString());

            // Pop-up ANR dialog box
            Message msg = Message.obtain();
            msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
            msg.obj = new AppNotRespondingDialog.Data(app, activity, aboveSystem);

            // Send a message to the ui thread containing SHOW_NOT_RESPONDING_MSG
            mService.mUiHandler.sendMessage(msg);
        }
    }
    
}

When an ANR occurs, it is executed in sequence:

1. Output ANR Reason information to EventLog, that is, the closest time when ANR triggers is the am_anr information output from EventLog;
2. Collect and output traces information for each thread in the important process list, which is time consuming;
3, output current CPU usage and CPU load for each process;
4. Save the traces file and CPU usage information to the dropbox, which is the data/system/dropbox directory;
5. Depending on the type of process, decide whether to kill directly in the background or to inform the user with a popup box.

ANR outputs traces information for important processes that include:

1, firstPids queue: the first is ANR process, the second is system_server, the remaining is all persistent processes;
2, Native Queue: refers to the mediaserver, sdcard, and surfaceflinger processes in the/system/bin/directory;
3, lastPids queue: refers to all processes in mLruProcesses that are not part of firstPids.

4.2 dumpStackTraces

Continue to see the dump process for tracing information:

// ActivityManagerService.java

    public static File dumpStackTraces(boolean clearTraces, ... ,nativePids) {
        ... ...

        if (tracesDirProp.isEmpty()) {
            // Default to data/anr/traces.txt
            String globalTracesPath = 
                          SystemProperties.get("dalvik.vm.stack-trace-file", null);

            tracesFile = new File(globalTracesPath);
            try {
                if (clearTraces && tracesFile.exists()) {
                    tracesFile.delete();      // Delete existing traces files
                }

                // This ensures that the contents of the data/anr/traces.txt file are completely new, not appended
                tracesFile.createNewFile();   // Create traces File
                FileUtils.setPermissions(globalTracesPath, 0666, -1, -1);
            } catch (IOException e) {
                Slog.w(TAG, "Unable to prepare ANR traces file: " + tracesFile, e);
                return null;
            }
        } else {
        }

        // Output trace content
        dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids,
                                         extraPids, useTombstonedForJavaTraces);
        return tracesFile;
    }

4.3 dumpStackTraces

// ActivityManagerService.java

    private static void dumpStackTraces(String tracesFile, ...) {

        final DumpStackFileObserver observer;
        if (useTombstonedForJavaTraces) {
            observer = null;
        } else {
            observer = new DumpStackFileObserver(tracesFile);
        }

        // We must complete all stack dumps within 20 seconds.
        long remainingTime = 20 * 1000;
        try {
            if (observer != null) {
                observer.startWatching();
            }

            // First, get the stacks for the first Pids process
            if (firstPids != null) {
                int num = firstPids.size();
                for (int i = 0; i < num; i++) {
                    final long timeTaken;
                    if (useTombstonedForJavaTraces) {
                        timeTaken = dumpJavaTracesTombstoned(firstPids.get(i),
                                                   tracesFile, remainingTime);
                    } else {
                        timeTaken = observer.dumpWithTimeout(firstPids.get(i),
                                                               remainingTime);
                    }
                    ... ... 
                }
            }

            // Next, get the stacks of the native process
            if (nativePids != null) {
                for (int pid : nativePids) {
                    ... ...
                    
                    // Output trace for native process
                    Debug.dumpNativeBacktraceToFileTimeout(
                            pid, tracesFile, (int) (nativeDumpTimeoutMs / 1000));
                            
                    final long timeTaken = SystemClock.elapsedRealtime() - start;
                    ... ...
                }
            }

            // Lastly, dump stacks for all extra PIDs from the CPU tracker.
            if (extraPids != null) {
                ... ...
                }
            }
        } finally {
            if (observer != null) {
                observer.stopWatching();
            }
        }
    }

4.4 Summary

When an ANR is triggered, the system outputs key information:

1. Output am_anr information to EventLog;
2. Obtain important process trace information and save it to/data/anr/traces.txt;
3, ANR reason and CPU usage information, output to main log;
4, save CPU usage and process trace file information to/data/system/dropbox.


5. Summary

When an ANR occurs for the Service, the AMS.appNotResponding() method is finally called.

1. For foreground services, the timeout is SERVICE_TIMEOUT = 20s;

2. For background services, the timeout is SERVICE_BACKGROUND_TIMEOUT = 200s;

3, Service timeout detection mechanism: If no corresponding operation has been performed for more than a certain period of time to trigger the delay message, the ANR will be triggered;

Posted by Scott_J on Sat, 11 May 2019 10:42:13 -0700