Android ANR Trace Explained

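This article summarizes how the Signal Catcher thread dumps information after it receives SIGQUIT (signal 3).

The main goal is to explain what each kind of data in an ANR trace means, so that traces become easier to read.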

Android source code: 6.0

Keywords: block signal, kRunnable, kSuspended, Checkpoint, traces.txt

1. Blocking signals in ART


bool Runtime::Init(RuntimeArgumentMap&& runtime_options_in) {
void Runtime::BlockSignals() {
  SignalSet signals;
  // SIGQUIT is used to dump the runtime's state (including stack traces).
  // SIGUSR1 is used to initiate a GC.


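This effectively blocks three signals: SIGPIPE, SIGQUIT and SIGUSR1. Because a new thread inherits the signal mask of the thread that creates it, threads created afterwards start with these signals blocked as well.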
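As a rough illustration of what blocking these signals amounts to, here is a minimal sketch that uses plain POSIX calls instead of ART's SignalSet wrapper. The helper names BlockDumpSignals and IsSignalBlocked are hypothetical and do not appear in the ART source.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper (not ART code): block SIGPIPE, SIGQUIT and SIGUSR1
// in the calling thread. Threads created afterwards inherit this mask.
bool BlockDumpSignals() {
  sigset_t signals;
  sigemptyset(&signals);
  sigaddset(&signals, SIGPIPE);
  sigaddset(&signals, SIGQUIT);
  sigaddset(&signals, SIGUSR1);
  return pthread_sigmask(SIG_BLOCK, &signals, nullptr) == 0;
}

// Hypothetical helper: report whether a signal is blocked in the calling thread.
bool IsSignalBlocked(int signal_number) {
  sigset_t current;
  sigemptyset(&current);
  // With a null new set, pthread_sigmask only reports the current mask.
  int rc = pthread_sigmask(SIG_BLOCK, nullptr, &current);
  assert(rc == 0);
  (void)rc;
  return sigismember(&current, signal_number) == 1;
}
```

A blocked signal is not lost: it stays pending until some thread unblocks it or collects it with sigwait, which is what the signal catcher thread does later.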


void Runtime::DidForkFromZygote(JNIEnv* env, NativeBridgeAction action, const char* isa) {

This shows that the signal catcher thread is started from DidForkFromZygote(),

which means the zygote process itself has no signal catcher thread. You can check this with ps -t.

void Runtime::StartSignalCatcher() {
  if (!is_zygote_) {
    signal_catcher_ = new SignalCatcher(stack_trace_file_);
SignalCatcher::SignalCatcher(const std::string& stack_trace_file)
    : stack_trace_file_(stack_trace_file),
      lock_("SignalCatcher lock"),
      cond_("SignalCatcher::cond_", lock_),
      thread_(nullptr) {
  // Create a raw pthread; its start routine will attach to the runtime.
  CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread");
  Thread* self = Thread::Current();
  MutexLock mu(self, lock_);
  while (thread_ == nullptr) {

A thread is created to execute the Run method in signal_catcher.cc:

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);
  Runtime* runtime = Runtime::Current();
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
  // Set up mask with signals we want to handle.
  SignalSet signals;
  while (true) {
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      return nullptr;
    switch (signal_number) {
    case SIGQUIT:
    case SIGUSR1:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;

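Note that Run calls signal_catcher->cond_.Broadcast(self); only after the thread has attached.

This guarantees that by the time the SignalCatcher constructor returns, the signal catcher thread is already running and attached to the current VM.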
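The constructor/Run handshake can be modelled with standard C++ primitives. This is only a sketch under assumptions: the class name AttachedWorker is hypothetical, and setting a flag stands in for the real AttachCurrentThread call.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the handshake: the constructor does not return until
// the worker thread has marked itself as attached.
class AttachedWorker {
 public:
  AttachedWorker() {
    worker_ = std::thread([this] { Run(); });
    std::unique_lock<std::mutex> lock(mu_);
    // Mirrors `while (thread_ == nullptr) cond_.Wait(self);` in the constructor.
    cond_.wait(lock, [this] { return attached_; });
  }

  ~AttachedWorker() { worker_.join(); }

  bool attached() {
    std::lock_guard<std::mutex> lock(mu_);
    return attached_;
  }

 private:
  void Run() {
    // Stand-in for attaching to the runtime, then publishing the result.
    std::lock_guard<std::mutex> lock(mu_);
    attached_ = true;
    cond_.notify_all();
  }

  std::mutex mu_;
  std::condition_variable cond_;
  bool attached_ = false;
  std::thread worker_;  // Declared last so the other members exist before it starts.
};
```

Because the flag is only set under the lock and the constructor waits on a predicate, it does not matter whether the worker finishes before or after the constructor starts waiting.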

In addition, SignalCatcher::WaitForSignal calls sigwait and waits for SIGQUIT or SIGUSR1 to arrive,

so the signal catcher thread spends most of its time sleeping and only resumes when one of those two signals is delivered.
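The waiting step can be sketched with sigwait directly. The function names below are hypothetical; the demo queues SIGUSR1 for the calling thread before waiting so that the call returns immediately instead of sleeping.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper: the two signals the signal catcher waits for.
sigset_t DumpSignalSet() {
  sigset_t signals;
  sigemptyset(&signals);
  sigaddset(&signals, SIGQUIT);
  sigaddset(&signals, SIGUSR1);
  return signals;
}

// Sleep in sigwait() until one of the signals in the set is pending,
// then return its number. The signals must already be blocked.
int WaitForDumpSignal(const sigset_t& signals) {
  int signal_number = 0;
  int rc = sigwait(&signals, &signal_number);
  assert(rc == 0);
  (void)rc;
  return signal_number;
}

// Demo: block the signals, make SIGUSR1 pending for this thread, then wait.
int DemoWaitForSigUsr1() {
  sigset_t signals = DumpSignalSet();
  int rc = pthread_sigmask(SIG_BLOCK, &signals, nullptr);
  assert(rc == 0);
  rc = pthread_kill(pthread_self(), SIGUSR1);  // Stays pending because it is blocked.
  assert(rc == 0);
  (void)rc;
  return WaitForDumpSignal(signals);
}
```

In a real process the signal would come from outside, for example from kill -3 <pid>, and the waiting thread would stay asleep until then.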



When SIGQUIT (signal 3) arrives, the signal catcher calls HandleSigQuit to dump information:

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  os << "\n"
      << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";
  // Note: The strings "Build fingerprint:" and "ABI:" are chosen to match the format used by
  // debuggerd. This allows, for example, the stack tool to work.
  std::string fingerprint = runtime->GetFingerprint();
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";
  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";
  // Before Android 5.0, the dump first suspended all threads (SuspendAll) and called ResumeAll after the dump finished;
  // later versions dump threads through checkpoints instead.
  if ((false)) {
    std::string maps;
    if (ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
  os << "----- end " << getpid() << " -----\n";

First it dumps the current process's pid, time, name (Cmd line), fingerprint, ABI and so on. For example, an ANR traces.txt starts like this:

----- pid 9723 at 2017-04-11 17:12:10 -----
Cmd line:
Build fingerprint: '××××07:userdebug/test-keys'
ABI: 'arm64'
Build type: optimized

Next, runtime->DumpForSigQuit(os); dumps the detailed state of the current process;


----- end 10814 -----

Finally, Output(os.str()); writes everything to the ANR trace file, /data/anr/traces.txt.


4. Runtime DumpForSigQuit

void Runtime::DumpForSigQuit(std::ostream& os) {
  os << "\n";

As you can see, quite a lot of information is printed while handling SIGQUIT:


Zygote loaded classes=4530 post zygote classes=849

Intern table: 41808 strong; 360 weak
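GetJavaVM()->DumpForSigQuit(os); dumps information about the current VM: the number of global references, the number of weak global references, and the loaded .so libraries:

JNI: CheckJNI is off; globals=719 (plus 410 weak) // the process aborts if either value exceeds 51200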
Libraries: /system/lib64/ /system/lib64/ /system/lib64/


TrackedAllocators::Dump(os); dumps native memory usage when the kEnableTrackingAllocator switch is on; it is off by default.


thread_list_->DumpForSigQuit(os); is the key dump: the call stack of every thread. For example, the dump of the Signal Catcher thread:

"Signal Catcher" daemon prio=5 tid=3 Runnable
  | group="system" sCount=0 dsCount=0 obj=0x32c050d0 self=0x7f97dd1400
  | sysTid=9729 nice=0 cgrp=default sched=0/0 handle=0x7fa200e450
  | state=R schedstat=( 217991249 1074429 82 ) utm=15 stm=6 core=4 HZ=100
  | stack=0x7fa1f14000-0x7fa1f16000 stackSize=1005KB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 000000000047661c  /system/lib64/ (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+220)
  native: #01 pc 0000000000476618  /system/lib64/ (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+216)
  native: #02 pc 000000000044ae64  /system/lib64/ (_ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+472)
  native: #03 pc 0000000000462584  /system/lib64/ (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+820)
  native: #04 pc 000000000045a864  /system/lib64/ (_ZN3art10ThreadList13RunCheckpointEPNS_7ClosureE+456)
  native: #05 pc 000000000045a474  /system/lib64/ (_ZN3art10ThreadList4DumpERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEb+288)
  native: #06 pc 000000000045a310  /system/lib64/ (_ZN3art10ThreadList14DumpForSigQuitERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+804)
  native: #07 pc 00000000004364bc  /system/lib64/ (_ZN3art7Runtime14DumpForSigQuitERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+344)
  native: #08 pc 000000000043cb74  /system/lib64/ (_ZN3art13SignalCatcher13HandleSigQuitEv+2240)
  native: #09 pc 000000000043b69c  /system/lib64/ (_ZN3art13SignalCatcher3RunEPv+476)
  native: #10 pc 00000000000681a4  /system/lib64/ (_ZL15__pthread_startPv+196)
  native: #11 pc 000000000001db80  /system/lib64/ (__start_thread+16)
  (no managed stack frames)

The rest of this article mainly walks through how the thread backtraces are printed.


5. ThreadList DumpForSigQuit

Here is the ThreadList dump process:

 void ThreadList::DumpForSigQuit(std::ostream& os) {
    ScopedObjectAccess soa(Thread::Current());
    // Only print if we have samples.
    if (suspend_all_historam_.SampleSize() > 0) { // Records how long each SuspendAll took; dumped only if samples exist
      Histogram<uint64_t>::CumulativeData data;
      suspend_all_historam_.PrintConfidenceIntervals(os, 0.99, data);  // Dump time to suspend.
  Dump(os); // Dump thread list
  DumpUnattachedThreads(os); // Dump threads in this process that are not attached to the runtime
void ThreadList::Dump(std::ostream& os) {
    MutexLock mu(Thread::Current(), *Locks::thread_list_lock_);
    os << "DALVIK THREADS (" << list_.size() << "):\n";
  DumpCheckpoint checkpoint(&os); // Set up the checkpoint function
  size_t threads_running_checkpoint = RunCheckpoint(&checkpoint); // Run the checkpoint function to dump the threads
  if (threads_running_checkpoint != 0) {
    checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint); // Wait until every thread has run the checkpoint; the thread count is passed in
class DumpCheckpoint FINAL : public Closure {
  explicit DumpCheckpoint(std::ostream* os)
      : os_(os), barrier_(0), backtrace_map_(BacktraceMap::Create(getpid())) {}
  void Run(Thread* thread) OVERRIDE {
    // Note thread and self may not be equal if thread was already suspended at the point of the
    // request.
    Thread* self = Thread::Current();
    std::ostringstream local_os;
      ScopedObjectAccess soa(self);
      thread->Dump(local_os, backtrace_map_.get()); // The actual thread dump happens here, so every thread is dumped through DumpCheckpoint::Run
    local_os << "\n";
      // Use the logging lock to ensure serialization when writing to the common ostream.
      MutexLock mu(self, *Locks::logging_lock_);
      *os_ << local_os.str();
    barrier_.Pass(self); // After a thread has been dumped in Run, the barrier's count is decremented by one; when it reaches 0, every thread has been dumped
  void WaitForThreadsToRunThroughCheckpoint(size_t threads_running_checkpoint) {
    Thread* self = Thread::Current();
    ScopedThreadStateChange tsc(self, kWaitingForCheckPointsToRun);
    bool timed_out = barrier_.Increment(self, threads_running_checkpoint, kDumpWaitTimeout); // Add the number of threads that must be dumped (passed in by the caller) to the barrier count and wait with a timeout
    if (timed_out) { // A timeout means some thread dumps have not finished; the barrier count should still be greater than 0
      // Avoid a recursive abort.
      LOG((kIsDebugBuild && (gAborting == 0)) ? FATAL : ERROR)
          << "Unexpected time out during dump checkpoint.";
  // The common stream that will accumulate all the dumps.
  std::ostream* const os_;
  // The barrier to be passed through and for the requestor to wait upon.
  Barrier barrier_;
  // A backtrace map, so that all threads use a shared info and don't reacquire/parse separately.
  std::unique_ptr<BacktraceMap> backtrace_map_;

In other words, the thread list is dumped by having DumpCheckpoint run for every thread, which dumps each thread's state and backtrace.
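The Pass/Increment pairing used by DumpCheckpoint can be illustrated with a small counting barrier. This is a simplified sketch, not ART's Barrier class: the names CountingBarrier, IncrementAndWait and DumpWithBarrier are hypothetical, and each worker thread simply calls Pass where a real thread would first write its dump.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counting barrier modelled on the Pass/Increment usage above.
class CountingBarrier {
 public:
  // Called by each thread once its dump is written: count goes down by one.
  void Pass() {
    std::lock_guard<std::mutex> lock(mu_);
    --count_;
    if (count_ == 0) cond_.notify_all();
  }

  // Called by the requester: add the number of expected passes, then wait
  // until the count is back to zero. Returns true if the wait timed out.
  bool IncrementAndWait(int delta, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    count_ += delta;
    return !cond_.wait_for(lock, timeout, [this] { return count_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cond_;
  int count_ = 0;
};

// Demo: thread_count workers pass the barrier while the requester waits.
bool DumpWithBarrier(int thread_count) {
  CountingBarrier barrier;
  std::vector<std::thread> workers;
  for (int i = 0; i < thread_count; ++i) {
    workers.emplace_back([&barrier] { barrier.Pass(); });
  }
  bool timed_out = barrier.IncrementAndWait(thread_count, std::chrono::seconds(5));
  for (auto& worker : workers) worker.join();
  return timed_out;
}
```

The count may briefly go negative if workers pass before the requester increments; the increment then brings it back to zero and the wait returns at once, so the order of the two calls does not matter.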



size_t ThreadList::RunCheckpoint(Closure* checkpoint_function) {
  Thread* self = Thread::Current();
  if (kDebugLocking && gAborting == 0) {
    CHECK_NE(self->GetState(), kRunnable);
  std::vector<Thread*> suspended_count_modified_threads;
  size_t count = 0;
    // Step 1: treat Runnable and Suspended threads differently
    // Call a checkpoint function for each thread, threads which are suspend get their checkpoint
    // manually called. That is: every thread runs the checkpoint function, and for suspended threads we call it on their behalf.
    MutexLock mu(self, *Locks::thread_list_lock_);
    MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
    count = list_.size();
    for (const auto& thread : list_) {
      if (thread != self) {
        while (true) {
          // For a Runnable thread, install checkpoint_function in that thread's checkpoint function list; the thread runs it when it reaches a checkpoint
          if (thread->RequestCheckpoint(checkpoint_function)) {
            // This thread will run its checkpoint some time in the near future.
          } else {
            // We are probably suspended, try to make sure that we stay suspended.
            // The thread switched back to runnable.
            if (thread->GetState() == kRunnable) {
              // Spurious fail, try again.
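            // Suspended threads are collected in a set and handled separately below. To keep the thread's state from changing in the meantime, its suspend count is incremented here,
            // so that even if the original suspend request ends, the suspend count is still nonzero and the thread cannot become Runnable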
            thread->ModifySuspendCount(self, +1, false);
  // Run the checkpoint on ourself while we wait for threads to suspend.
  checkpoint_function->Run(self); // For the Signal Catcher thread itself, Run is called here to dump the thread
  // Run the checkpoint on the suspended threads.
  for (const auto& thread : suspended_count_modified_threads) {
    if (!thread->IsSuspended()) {
      if (ATRACE_ENABLED()) {
        std::ostringstream oss;
        ATRACE_BEGIN((std::string("Waiting for suspension of thread ") + oss.str()).c_str());
      // Busy wait until the thread is suspended.
      const uint64_t start_time = NanoTime();
      do {
      } while (!thread->IsSuspended());
      const uint64_t total_delay = NanoTime() - start_time;
      // Shouldn't need to wait for longer than 1000 microseconds.
      constexpr uint64_t kLongWaitThreshold = MsToNs(1);
      if (UNLIKELY(total_delay > kLongWaitThreshold)) {
        LOG(WARNING) << "Long wait of " << PrettyDuration(total_delay) << " for "
            << *thread << " suspension!";
    // We know for sure that the thread is suspended at this point.
    checkpoint_function->Run(thread); // The suspended threads collected in step 1 cannot run, so we call Run for each of them ourselves, passing the thread to dump
      MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
      thread->ModifySuspendCount(self, -1, false); // Once this thread has been dumped, decrement its suspend count; it no longer has to stay suspended
    // Imitate ResumeAll, threads may be waiting on Thread::resume_cond_ since we raised their
    // suspend count. Now the suspend_count_ is lowered so we must do the broadcast.
    MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
    Thread::resume_cond_->Broadcast(self); // Notify the suspended threads that they may resume
  return count;




enum ThreadState {
  //                                   Thread.State   JDWP state
  kTerminated = 66,                 // TERMINATED     TS_ZOMBIE has returned, but Thread* still around
  kRunnable,                        // RUNNABLE       TS_RUNNING   runnable
  kTimedWaiting,                    // TIMED_WAITING  TS_WAIT      in Object.wait() with a timeout
  kSleeping,                        // TIMED_WAITING  TS_SLEEPING  in Thread.sleep()
  kBlocked,                         // BLOCKED        TS_MONITOR   blocked on a monitor
  kWaiting,                         // WAITING        TS_WAIT      in Object.wait()
  kWaitingForGcToComplete,          // WAITING        TS_WAIT      blocked waiting for GC
  kWaitingForCheckPointsToRun,      // WAITING        TS_WAIT      GC waiting for checkpoints to run
  kWaitingPerformingGc,             // WAITING        TS_WAIT      performing GC
  kWaitingForDebuggerSend,          // WAITING        TS_WAIT      blocked waiting for events to be sent
  kWaitingForDebuggerToAttach,      // WAITING        TS_WAIT      blocked waiting for debugger to attach
  kWaitingInMainDebuggerLoop,       // WAITING        TS_WAIT      blocking/reading/processing debugger events
  kWaitingForDebuggerSuspension,    // WAITING        TS_WAIT      waiting for debugger suspend all
  kWaitingForJniOnLoad,             // WAITING        TS_WAIT      waiting for execution of dlopen and JNI on load code
  kWaitingForSignalCatcherOutput,   // WAITING        TS_WAIT      waiting for signal catcher IO to complete
  kWaitingInMainSignalCatcherLoop,  // WAITING        TS_WAIT      blocking/reading/processing signals
  kWaitingForDeoptimization,        // WAITING        TS_WAIT      waiting for deoptimization suspend all
  kWaitingForMethodTracingStart,    // WAITING        TS_WAIT      waiting for method tracing to start
  kWaitingForVisitObjects,          // WAITING        TS_WAIT      waiting for visiting objects
  kWaitingForGetObjectsAllocated,   // WAITING        TS_WAIT      waiting for getting the number of allocated objects
  kStarting,                        // NEW            TS_WAIT      native thread started, not yet ready to run managed code
  kNative,                          // RUNNABLE       TS_RUNNING   running in a JNI native method
  kSuspended,                       // RUNNABLE       TS_RUNNING   suspended by GC or debugger


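kRunnable, // Running; may allocate on the heap and jump between Java methods

kNative,  // Executing a JNI native method; does not affect Java heap allocation or GC, and makes no Java method jumps

kSuspended, // The thread is in fact waiting inside Runnable for the resume condition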



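If there is a suspend request, the thread waits and stops executing code; it becomes kSuspended and switches back to Runnable only after its suspend count drops to 0.

This is also why GC needs to SuspendAll threads: once they are suspended the heap is effectively locked, with no thread touching the Java heap, so the GC thread can work on it.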



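Any discussion of checkpoints has to mention safepoints.



For example, when thread A, which is executing Java code, reaches a safepoint, it runs the CheckSuspend function. If it finds a checkpoint request for the current thread,

it runs the thread's checkpoint function at that point; if it finds a suspend request, it performs the suspend check and the thread enters the suspended (paused) state.

So an ART checkpoint is best seen as one feature built on top of safepoints.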




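  • Active safepoints: compiled or interpreted code explicitly polls for a safepoint and, when one is required, jumps to the corresponding handler.
    • The ART interpreter places active safepoints at loop backedges (more precisely, at the source of the jump, before it is taken) and at method exits (return / throw exception).
    • The ART Optimizing Compiler places active safepoints at loop backedges (again at the source of the jump) and at method entry.
  • Passive safepoints: every non-inlined call site is a passive safepoint. No extra code has to run there; it is just an ordinary method call.
    • It counts as a safepoint because once the call is made, control passes to the callee, which may enter a safepoint; that safepoint may need to walk the stack frames, so the caller must be at a safepoint as well.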


          // For a Runnable thread, install checkpoint_function in that thread's checkpoint function list; the thread runs it when it reaches a checkpoint
          if (thread->RequestCheckpoint(checkpoint_function)) { 

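For a Runnable thread we have set checkpoint_function and a checkpoint request, so the thread will eventually reach a checkpoint and run the checkpoint function.



The checkpoint function here has been set to DumpCheckpoint::Run(), which performs the thread dump.


At this point, the call sites that dump both suspended and Runnable threads have been covered.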
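The two call paths can be summarized in a small single-threaded model. It is a sketch under heavy assumptions: ModelThread, Safepoint and RunCheckpointModel are hypothetical names, there is no real concurrency, and appending a thread's name to a log stands in for the actual dump.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class State { kRunnable, kSuspended };

// Hypothetical stand-in for art::Thread, reduced to what the model needs.
struct ModelThread {
  std::string name;
  State state;
  std::function<void(ModelThread&)> pending;  // Empty when nothing is requested.

  // Succeeds only for a Runnable thread, which will run the function itself.
  bool RequestCheckpoint(std::function<void(ModelThread&)> fn) {
    if (state != State::kRunnable) return false;
    pending = std::move(fn);
    return true;
  }

  // Called by the thread itself when it reaches a safepoint.
  void Safepoint() {
    if (pending) {
      auto fn = std::move(pending);
      pending = nullptr;
      fn(*this);
    }
  }
};

// Runnable threads get a request; suspended threads are dumped by the requester.
std::size_t RunCheckpointModel(std::vector<ModelThread>& threads,
                               std::vector<std::string>& log) {
  auto dump = [&log](ModelThread& t) { log.push_back(t.name); };
  for (auto& t : threads) {
    if (!t.RequestCheckpoint(dump)) {
      dump(t);  // The thread cannot run, so the requester dumps it directly.
    }
  }
  return threads.size();
}
```

In this model a suspended thread appears in the log as soon as RunCheckpointModel returns, while a Runnable thread only appears after its own Safepoint call, which is why the real requester has to wait on the barrier afterwards.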


6. Dump thread

6.1 First, the thread information dump in detail:

void Thread::Dump(std::ostream& os, BacktraceMap* backtrace_map) const {
  DumpState(os); // Dump the thread's state information
  DumpStack(os, backtrace_map); // Dump the thread's kernel/native/Java stacks

For example, a thread's state information looks like this:

"Signal Catcher" daemon prio=5 tid=3 Runnable
  | group="system" sCount=0 dsCount=0 obj=0x32c050d0 self=0x7f97dd1400
  | sysTid=9729 nice=0 cgrp=default sched=0/0 handle=0x7fa200e450
  | state=R schedstat=( 217991249 1074429 82 ) utm=15 stm=6 core=4 HZ=100
  | stack=0x7fa1f14000-0x7fa1f16000 stackSize=1005KB
  | held mutexes= "mutator lock"(shared held)

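Line 1: "Signal Catcher" is the thread name; daemon marks a daemon thread (omitted otherwise); prio=5 is the priority stored in the Java Thread object; tid=3 is the thread id inside the VM; Runnable is the thread's state in the VM. (If the thread is not attached, the first line reads: "name" prio=num (not attached).)

Line 2: group is the ThreadGroup; sCount is the suspend count; dsCount is the debugger suspend count (less than or equal to sCount); obj is the corresponding java.lang.Thread object; self is the pointer to the native Thread.

Line 3: sysTid is the Linux thread tid; nice is the scheduling priority; cgrp is the cgroup (CPU scheduling group); sched is the scheduling policy and scheduling priority; handle is the pthread_t of the thread.


nice is the thread's scheduling priority (obtained with getpriority), in the range -20 to 19; the smaller the value, the higher the priority. -1 means the priority could not be read.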



cat /proc/self/task/%d/cgroup,





#define SCHED_NORMAL            0

#define SCHED_OTHER             0

#define SCHED_FIFO              1

#define SCHED_RR                2

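  SCHED_FIFO is a real-time, first-in first-out scheduling policy that requires superuser privileges. It applies only to threads with a priority greater than 0, which means a runnable SCHED_FIFO thread always preempts a running SCHED_OTHER thread. SCHED_FIFO is also a simple policy without time slicing: when a thread becomes runnable it is appended to the tail of the queue for its priority (POSIX 1003.1), and it runs once all higher-priority threads have terminated or blocked. Threads of equal priority run in simple first-come, first-served order. Consider the worst case: several threads of the same priority are waiting, but the one that started first never terminates or blocks. The others cannot run unless the current thread calls something like pthread_yield, so take care with how same-priority threads behave under SCHED_FIFO.
  Given these drawbacks, SCHED_RR adds some enhancements to SCHED_FIFO. In essence it is still SCHED_FIFO, but it limits each thread with a maximum run time: once a thread has run for that long, it is switched out and placed at the tail of the queue for its priority. The benefit is that other threads of the same priority still get to run alongside a "selfish" thread. Return value: 0 means the setting succeeded; any other value means it failed.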

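Line 4: state is the Linux thread state; schedstat is the thread's scheduling statistics; utm=15 is the time spent in user mode; stm=6 is the time spent in kernel mode; core=4 is the CPU the thread last ran on; HZ=100 is the system clock frequency.

state=R is the task state. R: running, S: sleeping (TASK_INTERRUPTIBLE), D: disk sleep (TASK_UNINTERRUPTIBLE), T: stopped, t: tracing stop, Z: zombie, X: dead

schedstat: cat /proc/self/task/%d/schedstat

schedstat=( 217991249 1074429 82 ) means: (cumulative time spent running on a CPU, in ns; cumulative time spent waiting on the run queue, in ns; number of timeslices run, that is, how many times the thread was scheduled onto a CPU)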
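To pull the three schedstat numbers apart programmatically, a small parser is enough. This is a sketch, not ART's code: the SchedStat struct and ParseSchedStat function are hypothetical names, and the field meanings follow the kernel's sched-stats documentation (time on CPU in ns, time waiting on the run queue in ns, number of timeslices run).

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical holder for the three fields of /proc/<pid>/task/<tid>/schedstat.
struct SchedStat {
  uint64_t run_ns = 0;      // Time spent running on a CPU, in nanoseconds.
  uint64_t wait_ns = 0;     // Time spent waiting on a run queue, in nanoseconds.
  uint64_t timeslices = 0;  // Number of timeslices run on a CPU.
};

// Parse the text between the parentheses of `schedstat=( ... )`.
// Returns false and leaves *out untouched if three numbers cannot be read.
bool ParseSchedStat(const std::string& text, SchedStat* out) {
  std::istringstream in(text);
  SchedStat parsed;
  if (!(in >> parsed.run_ns >> parsed.wait_ns >> parsed.timeslices)) {
    return false;
  }
  *out = parsed;
  return true;
}
```

For the sample line, the first field works out to roughly 218 ms on the CPU and the second to roughly 1 ms waiting on the run queue.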

state, utm, stm and so on are read from /proc/self/task/%d/stat

 * struct task_cputime - collected CPU time counts

 * @utime:        time spent in user mode, in &cputime_t units

 * @stime:        time spent in kernel mode, in &cputime_t units

 * @sum_exec_runtime:    total time spent on the CPU, in nanoseconds

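utm and stm are measured in jiffies, i.e. the number of clock interrupts.

Frequency is the reciprocal of the period, normally the number of interrupts per second, so with HZ=100 one tick lasts 1/100 s = 0.01 s = 10 ms: one interrupt every 10 ms. In the example above, utm=15 is therefore about 150 ms and stm=6 about 60 ms.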
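The tick arithmetic is easy to wrap in a helper. TicksToMillis is a hypothetical name used only for this sketch; it takes the tick count and the HZ value printed on the same line of the trace.

```cpp
#include <cassert>
#include <cstdint>

// Convert a tick count (utm or stm) to milliseconds for a given HZ value.
// One tick lasts 1000 / hz milliseconds.
uint64_t TicksToMillis(uint64_t ticks, uint64_t hz) {
  assert(hz > 0);
  return ticks * 1000 / hz;
}
```

With the sample values, TicksToMillis(15, 100) gives 150 and TicksToMillis(6, 100) gives 60, so this thread has used about 150 ms in user mode and 60 ms in kernel mode.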


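Line 5: stack=0x7fa1f14000-0x7fa1f16000 stackSize=1005KB

The start and end addresses of the thread's stack, and the stack size.


Line 6: held mutexes= "mutator lock"(shared held)

The names of the VM mutexes the thread currently holds and how each is held: shared held means a shared lock, exclusive held means an exclusive lock.

Every thread releases the "mutator lock" when it completes suspension.

In fact, when all threads are being suspended, completion is detected by acquiring the "mutator lock" exclusively:

if the exclusive lock can be acquired, no other thread holds the "mutator lock" in either exclusive or shared mode, so all threads have finished suspending.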


6.2 Next, the thread backtrace dump:


The kernel stack is read from /proc/self/task/%d/stack, with the addresses stripped;

gemini:/ # cat /proc/10749/task/10749/stack

[<0000000000000000>] __switch_to+0x70/0x7c

[<0000000000000000>] SyS_epoll_wait+0x2ac/0x370

[<0000000000000000>] SyS_epoll_pwait+0xa4/0x118

[<0000000000000000>] el0_svc_naked+0x24/0x28

[<0000000000000000>] 0xffffffffffffffff



Backtrace->Unwind obtains the native backtrace and prints the pc offset and method name:

  native: #00 pc 000000000001beec  /system/lib64/ (syscall+28)

  native: #01 pc 00000000000e6dd4  /system/lib64/ (_ZN3art17ConditionVariable16WaitHoldingLocksEPNS_6ThreadE+160)

  native: #02 pc 000000000031a354  /system/lib64/ (_ZN3art12ProfileSaver3RunEv+296)

  native: #03 pc 000000000031ba6c  /system/lib64/ (_ZN3art12ProfileSaver21RunProfileSaverThreadEPv+100)

  native: #04 pc 00000000000681a4  /system/lib64/ (_ZL15__pthread_startPv+196)

  native: #05 pc 000000000001db80  /system/lib64/ (__start_thread+16)
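The method names in this listing are mangled C++ symbols. As a reading aid only (this is not part of ART's dump path), they can be decoded with abi::__cxa_demangle from <cxxabi.h>, which is available with GCC and Clang. The wrapper name Demangle is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: return the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string Demangle(const std::string& symbol) {
  int status = 0;
  char* demangled = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr) {
    return symbol;
  }
  std::string result(demangled);
  std::free(demangled);  // The buffer is allocated with malloc by the ABI call.
  return result;
}
```

For example, frame #02 decodes to art::ProfileSaver::Run() and frame #03 to art::ProfileSaver::RunProfileSaverThread(void*). The c++filt command-line tool performs the same conversion.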



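The managed (Java) stack is dumped with a StackVisitor.



1. Thread information is dumped through checkpoints.

2. The checkpoint function is invoked differently for kRunnable and kSuspended threads.


4. How to read the thread state information and how the backtrace is obtained.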
