Android Stability - gdb and coredump

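When analyzing Android Native Errors (NE), a coredump of the faulting process makes the job far easier. Capturing a coredump, however, costs a lot of memory and CPU time, and the resulting files are large, so the feature is disabled by default in the builds shipped to users. Even during internal development it is only enabled for specific test items, such as monkey tests aimed at system stability. As a result, stability problems are often not that hard to analyze; the hard part is obtaining useful logs. Once you have a coredump and the symbols that match the firmware, you can analyze the problem with GDB, a powerful debugging tool.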

Coredump files

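A coredump can be thought of as a snapshot of a process's memory and registers at a particular moment, wrapped in an ELF file so that tools such as GDB can analyze it. The kernel supports coredumps by default, but on Android several other important factors determine whether a coredump is actually captured.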
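Because a coredump is an ELF file, a quick sanity check on a captured file is to look at its ELF header. The following is a minimal sketch, assuming the standard ELF layout (a 16-byte e_ident array followed by a 16-bit e_type, where the value 4 means ET_CORE); the function name and the hand-built sample header are illustrative and not taken from the article.

```python
import struct

ET_CORE = 4  # e_type value for core files in the ELF specification


def is_elf_core(header: bytes) -> bool:
    """Return True if the first bytes of a file look like an ELF core file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return False
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type == ET_CORE


# Hypothetical 64-bit little-endian header, built by hand for illustration
sample = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", ET_CORE)
print(is_elf_core(sample))
```

For a real file you would pass the first 18 bytes, for example the result of open(path, "rb").read(18).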

On Linux, the resources each process may use are limited. The limits can be inspected through the file /proc/$PID/limits, for example:


[Figure: process rlimit]

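This node shows that the process may open at most 1024 files and that its core file size is 0, so even if the process receives one of the relevant signals, no coredump can be captured. The rlimit of the process therefore usually has to be changed.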
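The check described above can be sketched in a few lines of Python. The sample text below imitates the layout of /proc/$PID/limits; the soft limits match the values quoted in the text, while the hard limits and the helper names are assumptions made for illustration.

```python
import re

# Hypothetical excerpt of /proc/$PID/limits; the hard limits are assumed values.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max core file size        0                    unlimited            bytes
Max open files            1024                 4096                 files
"""


def parse_limits(text):
    """Map each limit name to its (soft, hard) values as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def core_dump_allowed(limits):
    """A soft core file size of 0 means no core file is written."""
    return limits["Max core file size"][0] != "0"


limits = parse_limits(SAMPLE_LIMITS)
print(limits["Max open files"][0], core_dump_allowed(limits))
```

The sketch splits columns on runs of two or more spaces, which is enough for the layout shown here but is not a full parser for every possible value.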

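/proc/sys/kernel/core_pattern sets the path where coredump files are saved, for example: echo "/data/corefile/core-%e-%p" > /proc/sys/kernel/core_pattern. Here %e expands to the executable name and %p to the PID of the dumped process.
You may also need to run echo 1 > /proc/sys/fs/suid_dumpable.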
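To see what file name a given core_pattern produces, the template expansion can be imitated in Python. This is a minimal sketch that handles only %e, %p and %%; the function name is hypothetical, and the process name and PID are the ones from the system_server example analyzed later in this article.

```python
import re


def expand_core_pattern(pattern, exe, pid):
    """Expand %e, %p and %% in a core_pattern template.

    Any other specifier is left untouched by this sketch.
    """
    values = {"e": exe, "p": str(pid), "%": "%"}
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), pattern)


print(expand_core_pattern("/data/corefile/core-%e-%p", "system_server", 3060))
```

The result, /data/corefile/core-system_server-3060, has the same base name as the core file used in the debugging example.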

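A process only dumps core when it receives certain signals, such as SIGSEGV, SIGABRT, or SIGBUS. Also note that while the coredump of a process is being captured, you must not send SIGKILL to that process: SIGKILL aborts the dump and leaves an incomplete coredump that cannot be analyzed.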
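As a rough illustration, the signals whose default action is "terminate and dump core" can be collected in a small lookup. The list below follows the Linux signal(7) manual page; it describes the default action only, so a process that installs its own handler for one of these signals may behave differently. The helper name is an assumption.

```python
import signal

# Signals whose default action is to terminate the process and dump core,
# as listed in the Linux signal(7) manual page.
CORE_SIGNAL_NAMES = (
    "SIGQUIT", "SIGILL", "SIGABRT", "SIGFPE", "SIGSEGV",
    "SIGBUS", "SIGSYS", "SIGTRAP", "SIGXCPU", "SIGXFSZ",
)
CORE_SIGNALS = {getattr(signal, n) for n in CORE_SIGNAL_NAMES if hasattr(signal, n)}


def dumps_core_by_default(signum):
    """Return True if the default action for signum includes a core dump."""
    return signum in CORE_SIGNALS


print(dumps_core_by_default(signal.SIGSEGV), dumps_core_by_default(signal.SIGKILL))
```

SIGKILL is deliberately absent: it terminates the process without producing a dump.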

GDB
  • GDB online debugging environment

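GDB, the GNU Project Debugger, is a famous debugging tool that most programmers have at least heard of, even if they have never used it. It can debug a live process (online), and it can also debug memory dumps such as coredumps offline. In day-to-day stability work we mainly use it to analyze coredump files offline.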

  • adb shell gdbserver remote:1234 --attach 4321
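    1234 is the port on the phone, and 4321 is the PID of the process you want to debug.
  • adb forward tcp:1234 tcp:1234
    Sets up adb TCP port forwarding. The first tcp:1234 is the port on the PC; the second is the port on the target, that is, the phone.
  • aarch64-linux-android-gdb
    aarch64-linux-android-gdb is the gdb client for ARM64. For a process running as AArch32, choose the corresponding gdb client instead.
  • Run the following commands at the GDB prompt: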
    (gdb) set solib-absolute-prefix out/target/product/general/symbols/
    (gdb) set solib-search-path out/target/product/general/symbols/
    (gdb) target remote :1234

For more information, see "Setting up an Android GDB online debugging environment".
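The steps above always use the same three pieces of information: the PID, the port, and the symbols directory. A small helper can generate the commands so that they stay consistent. This is only a sketch that builds the command strings quoted above; the function name is hypothetical, and nothing is executed.

```python
def remote_debug_commands(pid, port, symbols_dir):
    """Return the gdbserver command, the adb forward command and the gdb commands."""
    device = "adb shell gdbserver remote:%d --attach %d" % (port, pid)
    forward = "adb forward tcp:%d tcp:%d" % (port, port)
    gdb_cmds = [
        "set solib-absolute-prefix %s" % symbols_dir,
        "set solib-search-path %s" % symbols_dir,
        "target remote :%d" % port,
    ]
    return device, forward, gdb_cmds


device, forward, gdb_cmds = remote_debug_commands(
    4321, 1234, "out/target/product/general/symbols/")
print(device)
print(forward)
print("\n".join(gdb_cmds))
```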

  • Offline debugging with GDB + Eclipse

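A craftsman who wants to do good work must first sharpen his tools. NE problems can be analyzed with command-line GDB, and if you know the GDB commands well, the command line is very efficient. You can also build a visual debugging environment with GDB + Eclipse. It is less powerful than the command line, but it is enough for simple problems. The setup steps are as follows:
1. Open ADT, click Run → Debug Configurations, and select C/C++ Postmortem Debugger.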

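2. Click the "+" icon in the upper-left corner to create a new configuration and give it any name, for example "android_gdb". For C/C++ Application, select the executable that corresponds to your coredump. For SurfaceFlinger, for example, choose /symbols/system/bin/surfaceflinger; for processes forked from zygote, choose /symbols/system/bin/app_process64. Set Post Mortem file type to Core file, and click Browse to locate the coredump file.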

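3. Switch to the Debugger tab. For GDB debugger, select the gdb executable for your platform. GDB command file points to a file with the gdb commands you want executed after the coredump is opened. My gdbinit file contains: set solib-search-path /media/xxxx/SSD/tmp/Log/0622/symbols/system/lib64. This sets the shared-library search path so that GDB can load the .so files that carry symbol information.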

4. Click the Debug button, and the full debug view appears.

  • GDB scripts

GDB supports two kinds of scripts: Python scripts and command scripts. In a command script we can define our own commands, in a form like this:

  define commandName 
   statement 
   ...... 
  end

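Here statement can be any valid GDB command. A user-defined command also accepts up to 10 arguments, $arg0, $arg1 ... $arg9, and $argc gives the number of arguments that were passed in. Scripts also provide conditionals such as if/else and while loops. You can type a script directly at the GDB prompt, or write it to a separate file and load it with the source command.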
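For comparison, here is a minimal sketch of the other kind of script, a Python one. It registers a command that echoes its arguments, roughly what a command script would see as $argc and $argN. The command name show_args and the helper describe_args are hypothetical, and the sketch assumes a GDB build with Python support. The gdb module only exists inside GDB, so the import is guarded and the formatting helper can also be run on its own.

```python
try:
    import gdb  # only available inside GDB's embedded Python interpreter
except ImportError:
    gdb = None


def describe_args(argv):
    """Format arguments the way a command script would see $argc and $argN."""
    lines = ["argc = %d" % len(argv)]
    lines += ["arg%d = %s" % (i, a) for i, a in enumerate(argv)]
    return "\n".join(lines)


if gdb is not None:
    class ShowArgs(gdb.Command):
        """show_args ARG...: print the arguments passed to this command."""

        def __init__(self):
            super(ShowArgs, self).__init__("show_args", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            gdb.write(describe_args(gdb.string_to_argv(arg)) + "\n")

    ShowArgs()  # instantiating the class registers the command with GDB
else:
    print(describe_args(["0x795f6fb000", "16"]))
```

Inside GDB the file would be loaded with the source command, after which show_args can be used like any other command.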

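  • Example: debugging a coredump with GDB
    During a monkey test, the screen of one device froze. The logs suggested that the ART virtual machine in the system_server process had timed out in SuspendAll while dumping traces or running GC. We had seen this before and had analyzed it from a coredump then, so this time we again sent signal 11 (kill -11) directly to the system_server process and captured a coredump. Besides the coredump, the symbols files that match the firmware are needed for the analysis.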
[Linux@Linux w]$ls
core-system_server-3060  symbols  symbols.zip
[Linux@Linux w]$aarch64-linux-android-gdb ./symbols/system/bin/app_process64 ./core-system_server-3060 
GNU gdb (GDB) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=aarch64-elf-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./symbols/system/bin/app_process64...done.
[New LWP 3060]
[New LWP 3065]
[New LWP 3173]
[New LWP 3067]
[New LWP 3066]
......
......
[New LWP 3954]
[New LWP 3086]
[New LWP 3128]

warning: Could not load shared library symbols for 194 libraries, e.g. /system/bin/linker64.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000007962221cac in ?? ()
(gdb) set solib-search-path ./symbols/system/lib64/
Reading symbols from /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libcutils.so...done.
Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libcutils.so
Reading symbols from /media/xxxx/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libutils.so...done.
Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libutils.so
Reading symbols from /media/xxxx/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/liblog.so...done.
Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/liblog.so
......
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000795f0c63dc in futex (uaddr=0x795f6fa910, op=0, val=17669, val3=0, timeout=<optimized out>, uaddr2=<optimized out>) at art/runtime/base/mutex-inl.h:45
#2  art::ConditionVariable::WaitHoldingLocks (this=<optimized out>, self=<optimized out>) at art/runtime/base/mutex.cc:848
#3  0x000000795f3272e8 in TransitionFromSuspendedToRunnable (this=<optimized out>) at art/runtime/thread-inl.h:209
#4  ScopedThreadStateChange (self=<optimized out>, new_thread_state=art::kRunnable, this=<optimized out>) at art/runtime/scoped_thread_state_change.h:51
#5  ScopedObjectAccessUnchecked (this=<optimized out>, env=<optimized out>) at art/runtime/scoped_thread_state_change.h:224
#6  ScopedObjectAccess (this=<optimized out>, env=<optimized out>) at art/runtime/scoped_thread_state_change.h:255
#7  art::JNI::NewStringUTF (env=<optimized out>, utf=<optimized out>) at art/runtime/jni_internal.cc:1646
#8  0x0000007961acce64 in NewStringUTF (bytes=<optimized out>, this=0x795f63e180) at libnativehelper/include/nativehelper/jni.h:842
#9  android::android_content_AssetManager_getArrayStringResource (env=0x795f63e180, clazz=<optimized out>, arrayResId=<optimized out>) at frameworks/base/core/jni/android_util_AssetManager.cpp:1977
#10 0x00000000748f498c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

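This device hung while the VM was performing suspend all. According to the code, a hang at this point usually means that some thread did not respond to the suspend flag in time, and a thread that does not respond is usually in the kRunnable state. Note that this refers to the ART thread state, not the Linux R state; the two are different. So the plan is to find out from the coredump which thread is still kRunnable. The art::Thread objects of all Java threads are stored in the list_ member of ThreadList, so printing the contents of list_ will reveal which Java thread is in the kRunnable state.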

  // The actual list of all threads.
  std::list<Thread*> list_ GUARDED_BY(Locks::thread_list_lock_);

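To print list_ we first need to find the ThreadList object. It can be derived from the Runtime global variable, or it can be found in the call-stack context of a thread that references it. We use the second approach here, because ThreadList::SuspendAllInternal has a this parameter, and this leads straight to the ThreadList object. In the VM, this method is only called from the SignalCatcher or HeapTaskDaemon thread: the former dumps traces and the latter runs GC tasks. So we first find the thread IDs of these two threads from the device or the logs, and then inspect their current stacks in gdb. The two thread IDs turned out to be 3065 and 3070.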

(gdb) info threads
  Id   Target Id         Frame 
  180  LWP 3128          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  179  LWP 3086          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  178  LWP 3954          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  177  LWP 3218          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  ......
  127  LWP 3167          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
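  ---Type <return> to continue, or q <return> to quit---

GDB renumbers the threads, so we need to find the GDB thread numbers that correspond to 3065 and 3070. The output also contains the line
"---Type <return> to continue, or q <return> to quit---". This appears because GDB paginates long output by default; set pagination off changes this behavior.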

(gdb) set pagination off
(gdb) info threads
......
9    LWP 3070          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
......
2    LWP 3065          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
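With around 180 threads, looking up the GDB thread number by eye is tedious. If the info threads output is saved to a text file, the mapping can be extracted with a short script. This is a minimal sketch for the core-file output format shown above (a thread number followed by "LWP" and the kernel thread ID); the function name is hypothetical and the sample lines are copied from the output above.

```python
import re

# Lines copied from the "info threads" output above
INFO_THREADS = """\
  180  LWP 3128          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  9    LWP 3070          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  2    LWP 3065          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
"""


def lwp_to_gdb_id(text):
    """Map each kernel thread ID (LWP) to the thread number GDB assigned to it."""
    mapping = {}
    for line in text.splitlines():
        # An optional "*" marks the currently selected thread
        m = re.match(r"\s*\*?\s*(\d+)\s+LWP\s+(\d+)", line)
        if m:
            mapping[int(m.group(2))] = int(m.group(1))
    return mapping


ids = lwp_to_gdb_id(INFO_THREADS)
print(ids[3065], ids[3070])
```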

(gdb) t 2
[Switching to thread 2 (LWP 3065)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41  bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000795f0c63dc in futex (uaddr=0x795f6fa910, op=0, val=17669, val3=0, timeout=<optimized out>, uaddr2=<optimized out>) at art/runtime/base/mutex-inl.h:45
#2  art::ConditionVariable::WaitHoldingLocks (this=<optimized out>, self=<optimized out>) at art/runtime/base/mutex.cc:848
#3  0x000000795f120234 in TransitionFromSuspendedToRunnable (this=<optimized out>) at art/runtime/thread-inl.h:209
#4  ScopedThreadStateChange (new_thread_state=art::kRunnable, this=<optimized out>, self=<optimized out>) at art/runtime/scoped_thread_state_change.h:51
#5  ScopedObjectAccessUnchecked (this=<optimized out>, self=<optimized out>) at art/runtime/scoped_thread_state_change.h:231
#6  ScopedObjectAccess (self=<optimized out>, this=<optimized out>) at art/runtime/scoped_thread_state_change.h:261
#7  art::ClassLinker::DumpForSigQuit (this=<optimized out>, os=...) at art/runtime/class_linker.cc:7752
#8  0x000000795f415950 in art::Runtime::DumpForSigQuit (this=0x795f6ec000, os=...) at art/runtime/runtime.cc:1401
#9  0x000000795f41c27c in art::SignalCatcher::HandleSigQuit (this=<optimized out>) at art/runtime/signal_catcher.cc:145
#10 0x000000795f41ad3c in art::SignalCatcher::Run (arg=<optimized out>) at art/runtime/signal_catcher.cc:214
#11 0x000000796226e0f0 in __pthread_start (arg=<optimized out>) at bionic/libc/bionic/pthread_create.cpp:198
#12 0x0000007962223944 in __start_thread (fn=0x62, arg=0x795f6fa910) at bionic/libc/bionic/clone.cpp:41
#13 0x0000000000000000 in ?? ()

(gdb) t 9
[Switching to thread 9 (LWP 3070)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41  in bionic/libc/arch-arm64/bionic/syscall.S
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000795f43bb2c in futex (val3=0, uaddr=<optimized out>, op=<optimized out>, val=<optimized out>, timeout=<optimized out>, uaddr2=<optimized out>) at art/runtime/base/mutex-inl.h:45
#2  art::ThreadList::SuspendAllInternal (this=<optimized out>, self=<optimized out>, ignore1=<optimized out>, ignore2=<optimized out>, debug_suspend=<optimized out>) at art/runtime/thread_list.cc:586
#3  0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=<optimized out>) at art/runtime/thread_list.cc:476
#4  0x000000795f1c6d4c in art::gc::collector::MarkSweep::RunPhases (this=<optimized out>) at art/runtime/gc/collector/mark_sweep.cc:153
#5  0x000000795f1bf490 in art::gc::collector::GarbageCollector::Run (this=0x795f687500, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=false) at art/runtime/gc/collector/garbage_collector.cc:87
#6  0x000000795f1ef0a4 in art::gc::Heap::CollectGarbageInternal (this=<optimized out>, gc_type=<optimized out>, gc_cause=<optimized out>, clear_soft_references=<optimized out>) at art/runtime/gc/heap.cc:2719
#7  0x000000795f1f65dc in art::gc::Heap::ConcurrentGC (this=0x795f64b700, self=<optimized out>, force_full=true) at art/runtime/gc/heap.cc:3722
#8  0x000000795f1fd668 in art::gc::Heap::ConcurrentGCTask::Run (this=<optimized out>, self=0x0) at art/runtime/gc/heap.cc:3685
#9  0x000000795f21f2c4 in art::gc::TaskProcessor::RunAllTasks (this=<optimized out>, self=<optimized out>) at art/runtime/gc/task_processor.cc:124
#10 0x0000000072739114 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) 

The gdb output above shows that the ThreadList object is at address 0x795f6fb000. From it we can find list_, which holds all the Thread objects.

#3 0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=<optimized out>) at art/runtime/thread_list.cc:476.
(gdb) set print pretty on
(gdb) f 3
#3  0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=<optimized out>) at art/runtime/thread_list.cc:476
476 in art/runtime/thread_list.cc
(gdb) p *this
$2 = {
  static kMaxThreadId = 65535, 
  static kInvalidThreadId = 0, 
  static kMainThreadId = 1, 
  allocated_ids_ = {
    > = {
      static __bits_per_word = 64, 
      __first_ = {18446744073709551615, 18446744073709551615, 38654705663, 0 }
    }, 
    members of std::__1::bitset<65535>: 
    static __n_words = 1024
  }, 
  list_ = {
     >> = {
      __end_ = {
        __prev_ = 0x792df94ee0, 
        __next_ = 0x795f6fa9a0
      }, 
      __size_alloc_ = {
         >, 2>> = {
           >> = {}, 
          members of std::__1::__libcpp_compressed_pair_imp >, 2>: 
          __first_ = 161
        }, }
    }, }, 
  suspend_all_count_ = 1, 
  debug_suspend_all_count_ = 0, 
 ......
}

Because list_ is a very long list, we first define a custom GDB command that prints every Thread object automatically:

(gdb) def dump_all_threads_state
Type commands for definition of "dump_all_threads_state".
End with a line saying just "end".
>    set $current = list_.__end_.__next_
>    while $current != 0
 >        p * $current.__value_
 >        set $current = $current.__next_
 >    end
>end 

(gdb) dump_all_threads_state
$3 = {
  static kStackOverflowImplicitCheckSize = 8192, 
  static kMaxCheckpoints = 3, 
  static kMaxSuspendBarriers = 3, 
  static is_started_ = true, 
  static pthread_key_self_ = -2147483634, 
  static resume_cond_ = 0x795f6fa900, 
  static is_sensitive_thread_hook_ = 0x7961a76f20 , 
  static jit_sensitive_thread_ = 0x0, 
  tls32_ = {
    state_and_flags = {
      as_struct = {
        flags = 1, 
        state = 89
      }, 
      as_atomic_int = {
        > = {
          > = {
            > = {
              __a_ = 5832705
            }, }, }, }, 
      as_int = 5832705
    }, 
    suspend_count = 1, 
    debug_suspend_count = 0, 
    thin_lock_thread_id = 1, 
    tid = 3060, 
    daemon = 0, 
    throwing_OutOfMemoryError = 0, 
    no_thread_suspension = 0, 
    thread_exit_check_count = 0, 
    handling_signal_ = 0, 
    suspended_at_suspend_check = 0, 
    ready_for_debug_invoke = 0, 
    debug_method_entry_ = 0, 
    is_gc_marking = 0, 
    weak_ref_access_enabled = 1, 
    disable_thread_flip_count = 0
  }, 
  tls64_ = {
    trace_clock_base = 0, 
    stats = {
      allocated_objects = 0, 
      allocated_bytes = 0, 
      freed_objects = 0, 
      freed_bytes = 0, 
      gc_for_alloc_count = 0, 
      class_init_count = 2682, 
      class_init_time_ns = 1043034049
    }
  }, 
  tlsPtr_ = {
    card_table = 0x795ad01070 "", 
    exception = 0x0, 
    stack_end = 0x7ffbc34000 "", 
    managed_stack = {
      top_quick_frame_ = 0x7ffc42d940, 
      link_ = 0x7ffc42e050, 
      top_shadow_frame_ = 0x0
    }, 
    suspend_trigger = 0x0, 
    jni_env = 0x795f63e180, 
    tmp_jni_env = 0x0, 
    self = 0x0, 
    opeer = 0x762523e8, 
    jpeer = 0x0, 
    stack_begin = 0x7ffbc32000 "", 
    stack_size = 8388608, 
    stack_trace_sample = 0x0, 
    wait_next = 0x0, 
    monitor_enter_object = 0x0, 
    top_handle_scope = 0x7ffc42d948, 
    class_loader_override = 0x10070a, 
    long_jump_context = 0x795f687c80, 
    instrumentation_stack = 0x795f716e90, 
    debug_invoke_req = 0x0, 
    single_step_control = 0x0, 
    stacked_shadow_frame_record = 0x0, 
    deoptimization_context_stack = 0x0, 
    frame_id_to_shadow_frame = 0x0, 
    name = 0x795f6fa980, 
    pthread_self = 521358133912, 
    last_no_thread_suspension_cause = 0x0, 
    checkpoint_functions = {0x0, 0x0, 0x0}, 
    active_suspend_barriers = {0x0, 0x0, 0x0}, 
    jni_entrypoints = {
      pDlsymLookup = 0x795f0b04d0 
    }, 
    quick_entrypoints = {
      pAllocArray = 0x795f0b4420 , 
      pAllocArrayResolved = 0x795f0b44e0 , 
      pAllocArrayWithAccessCheck = 0x795f0b45a0 , 
      pAllocObject = 0x795f0b9cc0 , 
      pAllocObjectResolved = 0x795f0b41e0 , 
      pAllocObjectInitialized = 0x795f0b42a0 , 
      pAllocObjectWithAccessCheck = 0x795f0b4360 , 
      pCheckAndAllocArray = 0x795f0b4660 , 
      pCheckAndAllocArrayWithAccessCheck = 0x795f0b4720 , 
      pAllocStringFromBytes = 0x795f0b47e0 , 
      pAllocStringFromChars = 0x795f0b48f0 , 
      pAllocStringFromString = 0x795f0b49b0 , 
      pInstanceofNonTrivial = 0x795f516374 , 
      pCheckCast = 0x795f0b17f0 , 
      pInitializeStaticStorage = 0x795f0b1a40 , 
      pInitializeTypeAndVerifyAccess = 0x795f0b1bc0 , 
      pInitializeType = 0x795f0b1b00 , 
      pResolveString = 0x795f0b2e80 , 
      pSet8Instance = 0x795f0b2a00 , 
      pSet8Static = 0x795f0b2700 , 
      pSet16Instance = 0x795f0b2ac0 , 
      pSet16Static = 0x795f0b27c0 , 
      pSet32Instance = 0x795f0b2b80 , 
      pSet32Static = 0x795f0b2880 , 
      pSet64Instance = 0x795f0b2c40 , 
      pSet64Static = 0x795f0b2dc0 , 
      pSetObjInstance = 0x795f0b2d00 , 
      pSetObjStatic = 0x795f0b2940 , 
      pGetByteInstance = 0x795f0b2280 , 
      pGetBooleanInstance = 0x795f0b21c0 , 
      pGetByteStatic = 0x795f0b1d40 , 
      pGetBooleanStatic = 0x795f0b1c80 , 
      pGetShortInstance = 0x795f0b2400 , 
      pGetCharInstance = 0x795f0b2340 , 
      pGetShortStatic = 0x795f0b1ec0 , 
      pGetCharStatic = 0x795f0b1e00 , 
      pGet32Instance = 0x795f0b24c0 , 
      pGet32Static = 0x795f0b1f80 , 
      pGet64Instance = 0x795f0b2580 , 
      pGet64Static = 0x795f0b2040 , 
      pGetObjInstance = 0x795f0b2640 , 
      pGetObjStatic = 0x795f0b2100 , 
      pAputObjectWithNullAndBoundCheck = 0x795f0b1870 , 
      pAputObjectWithBoundCheck = 0x795f0b1880 , 
      pAputObject = 0x795f0b18a0 , 
      pHandleFillArrayData = 0x795f0b1980 , 
      pJniMethodStart = 0x795f523c2c , 
      pJniMethodStartSynchronized = 0x795f523db0 , 
      pJniMethodEnd = 0x795f523dec , 
      pJniMethodEndSynchronized = 0x795f5240d4 , 
      pJniMethodEndWithReference = 0x795f5242f4 , 
      pJniMethodEndWithReferenceSynchronized = 0x795f5243a4 , 
      pQuickGenericJniTrampoline = 0x795f0ba500 , 
      pLockObject = 0x795f0b1430 , 
      pUnlockObject = 0x795f0b1610 , 
      pCmpgDouble = 0x0, 
      pCmpgFloat = 0x0, 
      pCmplDouble = 0x0, 
      pCmplFloat = 0x0, 
      pCos = 0x7961075168 , 
      pSin = 0x7961079e78 , 
      pAcos = 0x796106b978 , 
      pAsin = 0x796106c128 , 
      pAtan = 0x7961074400 , 
      pAtan2 = 0x796106c55c , 
      pCbrt = 0x7961074844 , 
      pCosh = 0x796106cbb4 , 
      pExp = 0x796106cd98 , 
      pExpm1 = 0x7961077cdc , 
      pHypot = 0x796106d688 , 
      pLog = 0x7961071960 , 
      pLog10 = 0x79610712c0 , 
      pNextAfter = 0x79610792c8 , 
      pSinh = 0x79610730f0 , 
      pTan = 0x796107a72c , 
      pTanh = 0x796107aefc , 
      pFmod = 0x796106d204 , 
      pL2d = 0x0, 
      pFmodf = 0x796106d4f4 , 
      pL2f = 0x0, 
      pD2iz = 0x0, 
      pF2iz = 0x0, 
      pIdivmod = 0x0, 
      pD2l = 0x0, 
      pF2l = 0x0, 
      pLdiv = 0x0, 
      pLmod = 0x0, 
      pLmul = 0x0, 
      pShlLong = 0x0, 
      pShrLong = 0x0, 
      pUshrLong = 0x0, 
      pIndexOf = 0x795f0ba930 , 
      pStringCompareTo = 0x795f0baa00 , 
      pMemcpy = 0x79622208c8 , 
      pQuickImtConflictTrampoline = 0x795f0ba290 , 
      pQuickResolutionTrampoline = 0x795f0ba3c0 , 
      pQuickToInterpreterBridge = 0x795f0ba650 , 
      pInvokeDirectTrampolineWithAccessCheck = 0x795f0b0a70 , 
      pInvokeInterfaceTrampolineWithAccessCheck = 0x795f0b0870 , 
      pInvokeStaticTrampolineWithAccessCheck = 0x795f0b0970 , 
      pInvokeSuperTrampolineWithAccessCheck = 0x795f0b0b70 , 
      pInvokeVirtualTrampolineWithAccessCheck = 0x795f0b0c70 , 
      pTestSuspend = 0x795f0ba090 , 
      pDeliverException = 0x795f0b0660 , 
      pThrowArrayBounds = 0x795f0b0760 , 
      pThrowDivZero = 0x795f0b0710 , 
      pThrowNoSuchMethod = 0x795f0b0810 , 
      pThrowNullPointer = 0x795f0b06c0 , 
      pThrowStackOverflow = 0x795f0b07c0 , 
      pDeoptimize = 0x795f0ba8d0 , 
      pA64Load = 0x795f4253b8 , 
      pA64Store = 0x795f4253b8 , 
      pNewEmptyString = 0x70e8c810, 
      pNewStringFromBytes_B = 0x70e8c848, 
      pNewStringFromBytes_BI = 0x70e8c880, 
      pNewStringFromBytes_BII = 0x70e8c8b8, 
      pNewStringFromBytes_BIII = 0x70e8c8f0, 
      pNewStringFromBytes_BIIString = 0x70e8c928, 
      pNewStringFromBytes_BString = 0x70e8c998, 
      pNewStringFromBytes_BIICharset = 0x70e8c960, 
      pNewStringFromBytes_BCharset = 0x70e8c9d0, 
      pNewStringFromChars_C = 0x70e8ca40, 
      pNewStringFromChars_CII = 0x70e8ca78, 
      pNewStringFromChars_IIC = 0x70e8ca08, 
      pNewStringFromCodePoints = 0x70e8cab0, 
      pNewStringFromString = 0x70e8cae8, 
      pNewStringFromStringBuffer = 0x70e8cb20, 
      pNewStringFromStringBuilder = 0x70e8cb58, 
      pReadBarrierJni = 0x795f523c28 *, art::Thread*)>, 
      pReadBarrierMark = 0x795f5235c4 , 
      pReadBarrierSlow = 0x795f5236e8 , 
      pReadBarrierForRootSlow = 0x795f5236f0 *)>
    }, 
    thread_local_objects = 0, 
    thread_local_start = 0x0, 
    thread_local_pos = 0x0, 
    thread_local_end = 0x0, 
    mterp_current_ibase = 0x795f0a0280 , 
    mterp_default_ibase = 0x795f0a0280 , 
    mterp_alt_ibase = 0x795f0a8280 , 
    rosalloc_runs = {0x795f5fcf08 , 0x13754000, 0x149d7000, 0x14e1f000, 0x14595000, 0x1457c000, 0x13e65000, 0x14186000, 0x14828000, 0x12f71000, 0x13f18000, 0x795f5fcf08 , 0x795f5fcf08 , 0x13c29000, 0x795f5fcf08 , 0x795f5fcf08 }, 
    thread_local_alloc_stack_top = 0x795a52b8b8, 
    thread_local_alloc_stack_end = 0x795a52ba00, 
    held_mutexes = {0x0 }, 
    nested_signal_state = 0x795f67f300, 
    flip_function = 0x0, 
    method_verifier = 0x0, 
    thread_local_mark_stack = 0x0
  }, 
  wait_mutex_ = 0x795f719080, 
  wait_cond_ = 0x795f6fa960, 
  wait_monitor_ = 0x0, 
  interrupted_ = false, 
  debug_disallow_read_barrier_ = 0 '\000'
}
...... // output for the remaining N Thread objects omitted
Cannot access memory at address 0xa1
(gdb) 
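The trailing "Cannot access memory at address 0xa1" is consistent with how libc++ appears to lay out std::list. The list is circular, so the last node's __next_ points back to the __end_ sentinel rather than to 0. The while $current != 0 loop therefore runs one step too far and treats the bytes after the sentinel (here the size field, 161 = 0xa1) as a Thread*. The toy model below, with made-up addresses and a hypothetical function name, shows the traversal stopping at the sentinel instead.

```python
def walk_circular_list(nodes, end_addr):
    """Collect values from a circular list, stopping at the sentinel.

    nodes maps a node address to (next_address, value); end_addr is the
    address of the sentinel node, which carries no value of its own.
    """
    values = []
    current = nodes[end_addr][0]
    while current != end_addr:  # stop at the sentinel, not at 0
        next_addr, value = nodes[current]
        values.append(value)
        current = next_addr
    return values


# Hypothetical addresses and tids, for illustration only
END = 0x1000
nodes = {
    END: (0x2000, None),
    0x2000: (0x3000, 3060),
    0x3000: (0x4000, 3065),
    0x4000: (END, 3070),
}
print(walk_circular_list(nodes, END))
```

In the GDB command, the equivalent change would be to compare $current with the address of list_.__end_ in the loop condition. The error is harmless in this session, because every real node has already been printed by the time it occurs.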

Among the N Thread objects printed above, it is easy to spot the thread in the kRunnable state: its tid is 3093, and its state is 67, which is kRunnable.

enum ThreadState {
  //                                   Thread.State   JDWP state
  kTerminated = 66,                 // TERMINATED     TS_ZOMBIE    Thread.run has returned, but Thread* still around
  kRunnable,                        // RUNNABLE       TS_RUNNING   runnable
  kTimedWaiting,                    // TIMED_WAITING  TS_WAIT      in Object.wait() with a timeout
  kSleeping,                        // TIMED_WAITING  TS_SLEEPING  in Thread.sleep()
  kBlocked,                         // BLOCKED        TS_MONITOR   blocked on a monitor
  kWaiting,                         // WAITING        TS_WAIT      in Object.wait()
  kWaitingForGcToComplete,          // WAITING        TS_WAIT      blocked waiting for GC
  kWaitingForCheckPointsToRun,      // WAITING        TS_WAIT      GC waiting for checkpoints to run
  kWaitingPerformingGc,             // WAITING        TS_WAIT      performing GC
  kWaitingForDebuggerSend,          // WAITING        TS_WAIT      blocked waiting for events to be sent
  kWaitingForDebuggerToAttach,      // WAITING        TS_WAIT      blocked waiting for debugger to attach
  kWaitingInMainDebuggerLoop,       // WAITING        TS_WAIT      blocking/reading/processing debugger events
  kWaitingForDebuggerSuspension,    // WAITING        TS_WAIT      waiting for debugger suspend all
  kWaitingForJniOnLoad,             // WAITING        TS_WAIT      waiting for execution of dlopen and JNI on load code
  kWaitingForSignalCatcherOutput,   // WAITING        TS_WAIT      waiting for signal catcher IO to complete
  kWaitingInMainSignalCatcherLoop,  // WAITING        TS_WAIT      blocking/reading/processing signals
  kWaitingForDeoptimization,        // WAITING        TS_WAIT      waiting for deoptimization suspend all
  kWaitingForMethodTracingStart,    // WAITING        TS_WAIT      waiting for method tracing to start
  kWaitingForVisitObjects,          // WAITING        TS_WAIT      waiting for visiting objects
  kWaitingForGetObjectsAllocated,   // WAITING        TS_WAIT      waiting for getting the number of allocated objects
  kWaitingWeakGcRootRead,           // WAITING        TS_WAIT      waiting on the GC to read a weak root
  kWaitingForGcThreadFlip,          // WAITING        TS_WAIT      waiting on the GC thread flip (CC collector) to finish
  kStarting,                        // NEW            TS_WAIT      native thread started, not yet ready to run managed code
  kNative,                          // RUNNABLE       TS_RUNNING   running in a JNI native method
  kSuspended,                       // RUNNABLE       TS_RUNNING   suspended by GC or debugger
};
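The state lookup can also be done mechanically. In both Thread dumps in this article, as_int equals state * 65536 + flags, so the sketch below assumes that flags occupy the low 16 bits and state the high 16 bits (little-endian, as on this ARM64 target). The numeric values follow from the enum above, where kTerminated is 66 and each later enumerator is one higher. The helper names are hypothetical.

```python
# Only the enumerators needed here; values are counted from kTerminated = 66.
THREAD_STATE_NAMES = {66: "kTerminated", 67: "kRunnable", 89: "kNative", 90: "kSuspended"}


def decode_state_and_flags(as_int):
    """Split the 32-bit as_int into (flags, state): low 16 bits and high 16 bits."""
    return as_int & 0xFFFF, (as_int >> 16) & 0xFFFF


def runnable_tids(threads):
    """Return the tids whose decoded state is kRunnable."""
    result = []
    for tid, as_int in threads:
        _, state = decode_state_and_flags(as_int)
        if THREAD_STATE_NAMES.get(state) == "kRunnable":
            result.append(tid)
    return result


# (tid, as_int) pairs taken from the two Thread dumps in this article
threads = [(3060, 5832705), (3093, 4390917)]
for tid, as_int in threads:
    flags, state = decode_state_and_flags(as_int)
    print(tid, flags, THREAD_STATE_NAMES.get(state, state))
print(runnable_tids(threads))
```

For the main thread (tid 3060) this gives flags 1 and state 89 (kNative); for tid 3093 it gives flags 5 and state 67 (kRunnable), matching the as_struct fields GDB printed.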

tls32_ = {
    state_and_flags = {
      as_struct = {
        flags = 5, 
        state = 67
      }, 
      as_atomic_int = {
        > = {
          > = {
            > = {
              __a_ = 4390917
            }, }, }, }, 
      as_int = 4390917
    }, 
    suspend_count = 1, 
    debug_suspend_count = 0, 
    thin_lock_thread_id = 20, 
    tid = 3093, 
    daemon = 0, 
    throwing_OutOfMemoryError = 0, 
    no_thread_suspension = 0, 
    thread_exit_check_count = 0, 
    handling_signal_ = 0, 
    suspended_at_suspend_check = 0, 
    ready_for_debug_invoke = 0, 
    debug_method_entry_ = 0, 
    is_gc_marking = 0, 
    weak_ref_access_enabled = 1, 
    disable_thread_flip_count = 0
  }
(gdb) t 176
[Switching to thread 176 (LWP 3093)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41  bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000796226eb84 in __futex (op=<optimized out>, value=<optimized out>, timeout=0x0, bitset=-1, ftx=<optimized out>) at bionic/libc/private/bionic_futex.h:48
#2  __futex_wait_ex (value=<optimized out>, ftx=<optimized out>, shared=<optimized out>, use_realtime_clock=<optimized out>, abs_timeout=<optimized out>) at bionic/libc/private/bionic_futex.h:70
#3  __pthread_normal_mutex_lock (abs_timeout_or_null=<optimized out>, mutex=<optimized out>, shared=<optimized out>, use_realtime_clock=<optimized out>) at bionic/libc/bionic/pthread_mutex.cpp:327
#4  __pthread_mutex_lock_with_timeout (mutex=<optimized out>, use_realtime_clock=<optimized out>, abs_timeout_or_null=<optimized out>) at bionic/libc/bionic/pthread_mutex.cpp:430
#5  0x0000007961ad0354 in android::android_content_AssetManager_applyStyle (env=0x795127c740, themeToken=520810520368, defStyleAttr=<optimized out>, defStyleRes=16974731, xmlParserToken=1982366608, attrs=0x795f0bdcb4 , outValues=0x795f197c50 , outIndices=0x7942b9dee0, clazz=<optimized out>) at frameworks/base/core/jni/android_util_AssetManager.cpp:1430
#6  0x00000000748f3ecc in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

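Judging from this stack, the thread has already entered a JNI function, so in principle it should be in the kNative state, yet it is kRunnable. That is odd, so let us look at the code that runs on entry to a JNI function: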

extern uint32_t JniMethodStart(Thread* self) {
  JNIEnvExt* env = self->GetJniEnv();
  DCHECK(env != nullptr);
  uint32_t saved_local_ref_cookie = env->local_ref_cookie;
  env->local_ref_cookie = env->locals.GetSegmentState();
  ArtMethod* native_method = *self->GetManagedStack()->GetTopQuickFrame();
  if (!native_method->IsFastNative()) { // if this JNI method is not a fast native method, transition out of kRunnable (to kNative)
    // When not fast JNI we transition out of runnable.
    self->TransitionFromRunnableToSuspended(kNative);
  }
  return saved_local_ref_cookie;
}

So if the native method is a fast native method, its state remains kRunnable. Now look at where the JNI function android_content_AssetManager_applyStyle is registered:

{ "applyStyle","!(JIIJ[I[I[I)Z",(void*) android_content_AssetManager_applyStyle }

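The signature is registered with a leading "!", so this function is indeed a fast native method, and that is why its state is kRunnable. A fast native method is supposed to be a JNI method that returns quickly, so the state transition can be skipped as an optimization. The stack above, however, shows this fast native method waiting for a lock, and once it waits for a lock it may no longer finish quickly. Marking it as fast native therefore does not seem appropriate. The leading "!" should be removed, so that the thread switches to kNative on entering JNI and ART no longer hangs.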
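The reasoning can be condensed into a small model of the branch in JniMethodStart. This is only an illustration of the logic described above, not ART code; the function names are hypothetical, and the signature string is the one from the registration entry.

```python
def is_fast_native(signature):
    """In the registration shown above, a leading '!' marks a fast native method."""
    return signature.startswith("!")


def state_after_jni_entry(signature):
    """Mirror the branch in JniMethodStart: only non-fast methods leave kRunnable."""
    return "kRunnable" if is_fast_native(signature) else "kNative"


registered = "!(JIIJ[I[I[I)Z"
print(state_after_jni_entry(registered))              # as registered
print(state_after_jni_entry(registered.lstrip("!")))  # with the '!' removed
```

With the "!" in place the thread stays kRunnable while it blocks on the mutex, which is the situation SuspendAll cannot get past; without it the thread would be kNative at that point.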
