Analyzing a Crash Caused by Hardware

Symptoms

I recently handled a native crash reported by a customer.

  • Reproduction conditions
    • Phone running Android 6.0
    • The com.qihoo.browser app hits a native crash with low probability

Locating the Problem

  • Relevant logs
    • tombstone
      Revision: '0'
      ABI: 'arm'
      pid: 13055, tid: 13117, name: ss  >>> com.qihoo.browser:loader0 <<<
      signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xffffffc8
          r0 abcbcc40  r1 ffffffc8  r2 00000008  r3 b4d38c00
          r4 afa6b000  r5 b00ff820  r6 00000001  r7 00000000
          r8 afa6b000  r9 b4dac800  sl b4d47420  fp b4d3f384
          ip ab528018  sp a2057718  lr b4c27037  pc ffffffc8  cpsr 800f0010
          d0  0000000000000000  d1  0000000000000000
          d2  0000000000000000  d3  0000000000000000
          d4  0000000000000000  d5  0000000000000000
          d6  0000000000000000  d7  a205739400000000
          d8  0000000000000000  d9  0000000000000000
          d10 0000000000000000  d11 0000000000000000
          d12 0000000000000000  d13 0000000000000000
          d14 0000000000000000  d15 0000000000000000
          d16 0000000000000000  d17 ab535000abf271d4
          d18 abf2700cafa5ae00  d19 0000500000005000
          d20 0000000000000000  d21 0000000000000000
          d22 0000000000000000  d23 0000000000001000
          d24 0000010200000102  d25 0000000000000102
          d26 0000000000001000  d27 0000015e816a8ca5
          d28 0000000000000001  d29 0000000000000001
          d30 0000000000000002  d31 00000000003d0909
          scr 80000010
      
      backtrace:
          #00 pc ffffffc8  
          #01 pc 00338035  /system/lib/libart.so (_ZN3art6ThreadD1Ev+228)
          #02 pc 003478b9  /system/lib/libart.so (_ZN3art10ThreadList10UnregisterEPNS_6ThreadE+232)
          #03 pc 00340411  /system/lib/libart.so (_ZN3art6Thread14CreateCallbackEPv+600)
          #04 pc 0003f87b  /system/lib/libc.so (_ZL15__pthread_startPv+30)
          #05 pc 00019f95  /system/lib/libc.so (__start_thread+6)
      

Preliminary Analysis

  • Resolve the backtrace addresses in libart.so to source lines with addr2line
$ arm-linux-androideabi-addr2line -f -e /tim.zhang/tim_share/bugzilla/201709/749779/symbols/libart.so 00338035 003478b9 00340411
_ZN3art6ThreadD2Ev
/art/runtime/thread.cc:1449 (discriminator 1)
_ZN3art10ThreadList10UnregisterEPNS_6ThreadE
/art/runtime/thread_list.cc:1151 (discriminator 1)
_ZN3art6Thread14CreateCallbackEPv
/art/runtime/thread.cc:282 (discriminator 2)
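
The addresses passed to addr2line are file-relative, not the runtime values held in the registers. As a rough cross-check, the sketch below converts the lr value from the tombstone into a file-relative address. It assumes the relation rel = addr - map_start + load_base, with map_start and load_base taken from the libart.so mapping listed under Root Cause. The helper name to_file_relative is made up for illustration.

```python
# Values copied from the tombstone and the memory map in this article.
LR = 0xB4C27037          # lr register at the time of the crash
MAP_START = 0xB48FC000   # start of the r-x mapping of libart.so
LOAD_BASE = 0xD000       # "load base" printed next to that mapping


def to_file_relative(addr: int, map_start: int, load_base: int) -> int:
    """Convert a runtime address to a file-relative one (assumed relation)."""
    return addr - map_start + load_base


rel_lr = to_file_relative(LR, MAP_START, LOAD_BASE)
# Bit 0 of a Thumb return address only marks the instruction set; clear it.
return_site = rel_lr & ~1

print(hex(rel_lr))       # 0x338037
print(hex(return_site))  # 0x338036
```

0x338036 is the instruction right after the blx r1 at 0x338034 in the disassembly of frame #01, so the faulting call is that blx. The backtrace reports 00338035 for frame #01, two bytes below 0x338037, presumably because the unwinder steps back from the return address into the calling instruction.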

  • Code analysis
    • Source code for frame #01

      Thread::~Thread() {
        ...
        delete wait_mutex_;
        ...
      }
      
    • Disassembly for frame #01

      ... ... 
      33802a:       f8d4 0454       ldr.w   r0, [r4, #1108] ; 0x454
      33802e:       b110            cbz     r0, 338036 <_ZN3art6ThreadD1Ev+0xe6>
      338030:       6803            ldr     r3, [r0, #0] 
      338032:       6919            ldr     r1, [r3, #16]
      338034:       4788            blx     r1  ==> crash
      ... ... 
      
    • Analysis

      • wait_mutex_ = r0 = abcbcc40
        abcbcc40 b4d38c00 0000001d b4d177a0 65722e6b  .........w..k.re
        
        vptr = 0xb4d38c00
      • Inspect the vtable
        b4d38c00 00000000 b49e12bd b49e1391 ffffffc8  ................
        b4d38c10 ffffffc8 00000000 b49e1325 b49e13fd  ........%.......
        
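        The vtable contains several invalid function pointers.
        • Slot 1, the virtual function art::Mutex::IsMutex, has become 0x00000000
        • Slots 4 and 5, the art::Mutex::~Mutex() destructors, have become 0xffffffc8
        The ldr r1, [r3, #16] above loads slot 5, so blx r1 jumps to 0xffffffc8, which matches both pc and the fault address in the tombstone.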
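
The load sequence from the disassembly can be replayed against the dumped memory words. This is only a sketch: MEMORY holds just the three rows shown above, load_word and in_text are hypothetical helpers, and the executable range is the r-x mapping of libart.so from the memory map listed under Root Cause.

```python
# Memory words copied from the dumps above (32-bit values per row).
MEMORY = {
    0xABCBCC40: [0xB4D38C00, 0x0000001D, 0xB4D177A0, 0x65722E6B],
    0xB4D38C00: [0x00000000, 0xB49E12BD, 0xB49E1391, 0xFFFFFFC8],
    0xB4D38C10: [0xFFFFFFC8, 0x00000000, 0xB49E1325, 0xB49E13FD],
}

# r-x mapping of libart.so, taken from the memory map in this article.
TEXT_START, TEXT_END = 0xB48FC000, 0xB4D31FFF


def load_word(addr: int) -> int:
    """Read one 32-bit word from the dumped rows (hypothetical helper)."""
    for base, words in MEMORY.items():
        offset = addr - base
        if 0 <= offset < 4 * len(words) and offset % 4 == 0:
            return words[offset // 4]
    raise KeyError(hex(addr))


def in_text(ptr: int) -> bool:
    """True if the pointer lands inside the executable mapping."""
    return TEXT_START <= ptr <= TEXT_END


r0 = 0xABCBCC40              # wait_mutex_
r3 = load_word(r0)           # ldr r3, [r0, #0]   -> vptr
r1 = load_word(r3 + 16)      # ldr r1, [r3, #16]  -> slot that gets called
print(hex(r3), hex(r1))      # 0xb4d38c00 0xffffffc8

# Classify the first five slots, numbered from 1 as in the text.
slots = [load_word(r3 + 4 * i) for i in range(5)]
bad = [i + 1 for i, ptr in enumerate(slots) if not in_text(ptr)]
print(bad)                   # [1, 4, 5]
```

The slots flagged here are the same ones identified by hand: 1, 4, and 5 do not point into the executable mapping, while slots 2 and 3 do.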

Comparison Experiments

  • Android 6.0
    • wait_mutex_:
    (gdb) x /8xw 0xb7c3c2e0
    0xb7c3c2e0:     0xb4e6ac10      0x0000001d      0xb4e49404      0x0000fe01
    0xb7c3c2f0:     0x00000000      0x00000041      0x00000000      0x00000000
    
    • Inspect the vtable
    (gdb) x /8xw 0xb4e6ac10
    0xb4e6ac10 <_ZTVN3art5MutexE+8>:        0xb4b16965      0xb4b1314d      0xb4b1736d      0xb4b171ed
    0xb4e6ac20 <_ZTVN3art5MutexE+24>:       0xb4b18a45      0x00000000      0x00000000      0x00000000
    
  • Nexus 6P (Android 8.0)
    • Assembly code
      0x00000074da26e1b4 <+224>:   ldr     x0, [x19,#2432]
      0x00000074da26e1b8 <+228>:   cbz     x0, 0x74da26e1c8 
      0x00000074da26e1bc <+232>:   ldr     x8, [x0]
      0x00000074da26e1c0 <+236>:   ldr     x8, [x8,#48]
      0x00000074da26e1c4 <+240>:   blr     x8
      
    • wait_mutex_
      (gdb) x /4xg 0x00000074da521c00
      0x74da521c00:   0x00000074da3dc308      0x0000007400000024
      0x74da521c10:   0x00000074da361191      0x00000074da51f000
      
    • Inspect the vtable
      (gdb) x /32xg 0x00000074da3dc308
      0x74da3dc308 <_ZTVN3art5MutexE+16>:     0x00000074d9ee6068      0x00000074d9ee16bc
      0x74da3dc318 <_ZTVN3art5MutexE+32>:     0x00000074d9ee16bc      0x00000074d9ee3a58
      0x74da3dc328 <_ZTVN3art5MutexE+48>:     0x00000074d9ee3b58      0x00000074d9ee3004
      0x74da3dc338 <_ZTVN3art5MutexE+64>:     0x00000074d9ee323c      0x0000000000000000
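
To compare the two reference devices, the byte offset used in each load instruction can be turned into a vtable slot index. The sketch below does this for both dumps. It assumes the Android 6.0 reference device runs the same ldr r1, [r3, #16] sequence as the crashing device, and slot_index is a hypothetical helper.

```python
def slot_index(byte_offset: int, pointer_size: int) -> int:
    """Zero-based vtable slot addressed by a byte offset from the vptr."""
    assert byte_offset % pointer_size == 0
    return byte_offset // pointer_size


# Healthy vtable words copied from the two gdb sessions above.
VTABLE_ARM32 = [0xB4B16965, 0xB4B1314D, 0xB4B1736D, 0xB4B171ED, 0xB4B18A45]
VTABLE_ARM64 = [
    0x74D9EE6068, 0x74D9EE16BC, 0x74D9EE16BC, 0x74D9EE3A58,
    0x74D9EE3B58, 0x74D9EE3004, 0x74D9EE323C,
]

# arm:   ldr r1, [r3, #16]  with 4-byte pointers (assumed, see above)
# arm64: ldr x8, [x8, #48]  with 8-byte pointers
i32 = slot_index(16, 4)
i64 = slot_index(48, 8)

print(i32, hex(VTABLE_ARM32[i32]))  # 4 0xb4b18a45
print(i64, hex(VTABLE_ARM64[i64]))  # 6 0x74d9ee323c
```

On both reference devices the selected slot holds an ordinary code pointer, unlike the 0xffffffc8 found on the crashing device.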
      

Root Cause

0xb4d38c00 lies in the read-only (r--) part of libart.so's virtual address space:

b48fc000-b4d31fff r-x         0    436000  /system/lib/libart.so (BuildId: 0a4c3feb37d9d2e8f30f11df6163908d) (load base 0xd000)
b4d32000-b4d32fff ---         0      1000  
b4d33000-b4d3cfff r--    436000      a000  /system/lib/libart.so
b4d3d000-b4d3dfff rw-    440000      1000  /system/lib/libart.so

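At this point, memory corruption by software looks unlikely, because the page is mapped read-only, and a hardware fault (for example in memory) is the more likely cause. It is worth checking whether this single unit, or the same hardware batch, shows other memory-related problems.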
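
The address lookup against the memory map can be sketched as follows. The map lines are the ones from the tombstone with the BuildId annotation trimmed; find_mapping and the regular expression are illustrative, not part of any Android tool.

```python
import re

# Map lines from the tombstone in this article (BuildId text trimmed).
MAPS = """\
b48fc000-b4d31fff r-x         0    436000  /system/lib/libart.so
b4d32000-b4d32fff ---         0      1000
b4d33000-b4d3cfff r--    436000      a000  /system/lib/libart.so
b4d3d000-b4d3dfff rw-    440000      1000  /system/lib/libart.so
"""

LINE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+([r-][w-][x-])")


def find_mapping(addr: int, maps_text: str):
    """Return (start, end, perms) of the mapping containing addr, or None."""
    for line in maps_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if start <= addr <= end:  # the ranges shown here are inclusive
            return start, end, m.group(3)
    return None


start, end, perms = find_mapping(0xB4D38C00, MAPS)
print(hex(start), hex(end), perms)  # 0xb4d33000 0xb4d3cfff r--
print("w" in perms)                 # False
```

The vptr target falls in the r-- mapping, which user-space code cannot write to through ordinary stores.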
