Automating Android Native Crash Analysis with Python

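This article covers two topics:

  • 1. How to use Python to automatically symbolize a Native Crash (referred to as NE below).
  • 2. How to analyze an NE.

The NE backtrace we ran into looks like this: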

12-24 14:42:40.307 1047 12174 23304 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x81 in tid 23304 (CAM_METADATA), pid 12174 ([email protected])
12-24 14:42:40.569 1047 23711 23711 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
12-24 14:42:40.570 1047 23711 23711 F DEBUG : Build fingerprint: 'xiaomi/lavender/lavender:9/PKQ1.180904.001/8.12.24:user/release-keys'
12-24 14:42:40.570 1047 23711 23711 F DEBUG : Revision: '0'
12-24 14:42:40.570 1047 23711 23711 F DEBUG : ABI: 'arm'
12-24 14:42:40.570 1047 23711 23711 F DEBUG : pid: 12174, tid: 23304, name: CAM_METADATA >>> /vendor/bin/hw/[email protected] <<<
12-24 14:42:40.570 1047 23711 23711 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x81
12-24 14:42:40.570 1047 23711 23711 F DEBUG : Cause: null pointer dereference
12-24 14:42:40.570 1047 23711 23711 F DEBUG : r0 00000065 r1 00000000 r2 00000001 r3 00000000
12-24 14:42:40.570 1047 23711 23711 F DEBUG : r4 e4ef8c80 r5 e4ef8c80 r6 d9ddb5e8 r7 d9ddb8e0
12-24 14:42:40.570 1047 23711 23711 F DEBUG : r8 f2cb4768 r9 00000000 r10 e4cc1170 r11 0003f5c6
12-24 14:42:40.570 1047 23711 23711 F DEBUG : ip f2cb525c sp d9ddb5e8 lr f2c6d28f pc f2c6d324
12-24 14:42:40.578 1047 23711 23711 F DEBUG : 
12-24 14:42:40.578 1047 23711 23711 F DEBUG : backtrace:
12-24 14:42:40.578 1047 23711 23711 F DEBUG : #00 pc **0010f324** /vendor/lib/hw/camera.sdm660.so (qcamera::MIPreview::doAnalyzeProcess()+184)
12-24 14:42:40.578 1047 23711 23711 F DEBUG : #01 pc **0010f76b** /vendor/lib/hw/camera.sdm660.so (qcamera::MIPreview::analyzeprocess()+58)
12-24 14:42:40.578 1047 23711 23711 F DEBUG : #02 pc **0009ab63** /vendor/lib/hw/camera.sdm660.so (qcamera::QCamera3MetadataChannel::streamCbRoutine(mm_camera_super_buf_t*, qcamera::QCamera3Stream*)+134)
12-24 14:42:40.578 1047 23711 23711 F DEBUG : #03 pc **00095df1** /vendor/lib/hw/camera.sdm660.so (qcamera::QCamera3Stream::dataProcRoutine(void*)+168)
12-24 14:42:40.578 1047 23711 23711 F DEBUG : #04 pc 00071821 /system/lib/libc.so (__pthread_start(void*)+22)
12-24 14:42:40.578 1047 23711 23711 F DEBUG : #05 pc 0001e025 /system/lib/libc.so (__start_thread+24)

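The bold values above are the backtrace addresses. We need addr2line to resolve each address to a function and an exact source line before the problem can be tracked down.
When symbolizing an NE backtrace, two things matter:

  • 1. The pc address, i.e. the bold part above.
  • 2. The .so file. Here, camera.sdm660.so is the library we need to symbolize.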
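To make these two fields concrete, here is a minimal sketch that pulls the frame index, the pc address and the library path out of one backtrace line with a regular expression. The names FRAME_RE and parse_frame are illustrative only and are not part of the tool presented in this article; the pattern assumes the "#NN pc ADDRESS PATH" layout seen in the log above.

```python
import re

# Matches "#NN pc ADDRESS PATH" inside a logcat or tombstone backtrace line.
# The path stops at whitespace or "(" because some lines have no space
# between the library path and the symbol.
FRAME_RE = re.compile(r'#(\d+)\s+pc\s+([0-9a-fA-F]+)\s+([^\s(]+)')


def parse_frame(line):
    """Return (frame_index, pc_address, so_path), or None if not a frame."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


line = ('12-24 14:42:40.578 1047 23711 23711 F DEBUG : #00 pc 0010f324 '
        '/vendor/lib/hw/camera.sdm660.so '
        '(qcamera::MIPreview::doAnalyzeProcess()+184)')
print(parse_frame(line))
# -> (0, '0010f324', '/vendor/lib/hw/camera.sdm660.so')
```

Lines that are not frames, such as the register dump, simply return None, so a whole logcat excerpt can be fed through the function unchanged.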

1. Building a tool that symbolizes NE backtraces automatically

Here is the source of the tool.
native_crash.py

# -*- coding:UTF-8 -*-
# Author : [email protected]
# Date : 25/12/18

import os
import sys

# Set these to your local addr2line binary and to the .so that carries symbols
ADDR2LINE = '~/xiaomi/Developer/android_ndk/android-ndk-r17b/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line'
SYMBOLS = '~/xiaomi/Developer/tools/camera.sdm660.so'

# Entry point: the first argument is the file that holds the backtrace lines
path = sys.argv[1]
addresses = []
with open(path, 'r') as file_object:
    for line in file_object:
        temp_str = line.strip()
        start = temp_str.find('pc ')
        if start == -1:
            continue  # skip lines that are not backtrace frames
        # The pc address is the first token after "pc"
        addresses.append(temp_str[start + 3:].split()[0])
stack = ' '.join(addresses)
print(stack)
# -f prints the function name in addition to file:line for every address
result = os.popen(ADDR2LINE + ' -f -e ' + SYMBOLS + ' ' + stack).read()
print(result)

Save the camera.sdm660.so frames of the NE backtrace above to a local file:
native_exception.txt

12-24 14:42:40.578 1047 23711 23711 F DEBUG : #00 pc 0010f324 /vendor/lib/hw/camera.sdm660.so (qcamera::MIPreview::doAnalyzeProcess()+184)
12-24 14:42:40.578 1047 23711 23711 F DEBUG : #01 pc 0010f76b /vendor/lib/hw/camera.sdm660.so (qcamera::MIPreview::analyzeprocess()+58)
12-24 14:42:40.578 1047 23711 23711 F DEBUG : #02 pc 0009ab63 /vendor/lib/hw/camera.sdm660.so (qcamera::QCamera3MetadataChannel::streamCbRoutine(mm_camera_super_buf_t*, qcamera::QCamera3Stream*)+134)
12-24 14:42:40.578 1047 23711 23711 F DEBUG : #03 pc 00095df1 /vendor/lib/hw/camera.sdm660.so (qcamera::QCamera3Stream::dataProcRoutine(void*)+168)

Run the tool as follows:
python native_crash.py native_exception.txt
The output is:

0010f324 0010f76b 0009ab63 00095df1
_ZN7qcamera9MIPreview16doAnalyzeProcessEv
hardware/qcom/camera/QCamera2/algo/xiaomiAgeGender/previewprocess.cpp:1049
_ZN7qcamera9MIPreview14analyzeprocessEv
hardware/qcom/camera/QCamera2/algo/xiaomiAgeGender/previewprocess.cpp:1145
_ZN7qcamera23QCamera3MetadataChannel15streamCbRoutineEP21mm_camera_super_buf_tPNS_14QCamera3StreamE
hardware/qcom/camera/QCamera2/HAL3/QCamera3Channel.cpp:2803
_ZN7qcamera14QCamera3Stream15dataProcRoutineEPv
hardware/qcom/camera/QCamera2/HAL3/QCamera3Stream.cpp:752
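The output above alternates between a function name and a file:line location. As a small sketch of how that text could be turned into structured frames, the hypothetical helper pair_frames below zips the two kinds of lines together. It assumes addr2line was run with -f and without -i, so that every address produces exactly two lines.

```python
def pair_frames(addr2line_output):
    """Group `addr2line -f` output into (function, location) tuples."""
    lines = [l.strip() for l in addr2line_output.splitlines() if l.strip()]
    # With -f, each address yields two lines: function name, then file:line.
    return list(zip(lines[0::2], lines[1::2]))


sample = """_ZN7qcamera9MIPreview16doAnalyzeProcessEv
hardware/qcom/camera/QCamera2/algo/xiaomiAgeGender/previewprocess.cpp:1049
_ZN7qcamera9MIPreview14analyzeprocessEv
hardware/qcom/camera/QCamera2/algo/xiaomiAgeGender/previewprocess.cpp:1145
"""

for index, (function, location) in enumerate(pair_frames(sample)):
    print('#%02d %s at %s' % (index, function, location))
```

The function names here are mangled C++ symbols. binutils addr2line also accepts -C (--demangle), which should print them in the readable qcamera::MIPreview::doAnalyzeProcess() form instead.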


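1.1 Updated version

I improved the tool locally so that it can symbolize a backtrace that spans several .so files. There is one requirement: every .so to be symbolized must be in the current directory.
Here is the new source:
native_crash.py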

# -*- coding:UTF-8 -*-
# Author : [email protected]
# Date : 25/12/18

import os
import sys

# Set this to the path of your local addr2line binary
ADDR2LINE = '~/xiaomi/Developer/android_ndk/android-ndk-r17b/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line'

# Entry point: the first argument is the file that holds the backtrace lines
path = sys.argv[1]
result = ''
with open(path, 'r') as file_object:
    for line in file_object:
        temp_str = line.strip()
        start = temp_str.find('pc ')
        if start == -1:
            continue  # skip lines that are not backtrace frames
        temp_str = temp_str[start + 3:]
        # Drop the trailing "(symbol+offset)" token
        end = temp_str.rfind(' ')
        temp_str = temp_str[:end]
        # The pc address is the first token
        stack = temp_str[:temp_str.find(' ')]
        # The .so name is the last path component; the file must be in the current directory
        so_name = temp_str[temp_str.rfind('/') + 1:]
        result += os.popen(ADDR2LINE + ' -f -e ' + so_name + ' ' + stack).read()
print(result)

The local native_exception.txt looks like this:

#07 pc 000184d8  /vendor/lib/libmmcamera2_mct.so (mct_list_find_custom_branch+32)
 #08 pc 0001852c  /vendor/lib/libmmcamera2_mct.so (mct_list_find_custom_branch+116)
 #09 pc 0001ae04  /vendor/lib/libmmcamera2_mct.so (mct_pipeline_get_buffer+244)
 #10 pc 0002a99c  /vendor/lib/libmmcamera2_mct.so (mct_module_get_buffer+132)
 #11 pc 0009268c  /vendor/lib/libmmcamera2_imglib_modules.so (module_imgbase_client_fetch_outbuf+424)
 #12 pc 00093488  /vendor/lib/libmmcamera2_imglib_modules.so (module_imgbase_client_get_outputbuf+2108)
 #13 pc 00095c98  /vendor/lib/libmmcamera2_imglib_modules.so (module_imgbase_client_handle_buffer+8988)
 #14 pc 0009f690  /vendor/lib/libmmcamera2_imglib_modules.so (module_imgbase_port_event_func+13612)
 #15 pc 0000c240  /vendor/lib/libmmcamera2_cpp_module.so (cpp_module_send_event_downstream+300)
 #16 pc 00052588  /vendor/lib/libmmcamera2_cpp_module.so (cpp_thread_func+2184)

The command is the same as before:
python native_crash.py native_exception.txt
The result:

jeffmony@jeffmony-OptiPlex-7050:~/xiaomi/Developer/tools$ python native_crash.py native_exception.txt 
mct_list_find_custom_branch
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/mct/tools/mct_list.c:358
mct_list_find_custom_branch
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/mct/tools/mct_list.c:366
mct_list_find_custom
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/mct/tools/mct_list.c:395
mct_module_get_buffer
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/mct/module/mct_module.c:395
module_imgbase_client_fetch_outbuf
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/modules/imglib/modules/base/module_imgbase_client.c:1578
module_imgbase_client_get_outputbuf
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/modules/imglib/modules/base/module_imgbase_client.c:1832
module_imgbase_client_handle_output_buf
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/modules/imglib/modules/base/module_imgbase_client.c:2490
module_imgbase_port_event_func
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/modules/imglib/modules/base/module_imgbase.c:1805
cpp_module_send_event_downstream
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/modules/pproc-new/cpp/cpp_module.c:895
cpp_thread_send_processed_divert
vendor/qcom/proprietary/mm-camera/mm-camera2/media-controller/modules/pproc-new/cpp/cpp_thread.c:1107
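The version above starts one addr2line process per frame. A possible refinement, sketched here under the same assumption that each .so sits in the current directory, is to group the addresses by library first and build one command per library. The helpers group_by_library and build_commands are hypothetical names, and the default 'addr2line' is a placeholder for the NDK binary configured in ADDR2LINE.

```python
import os
import re

FRAME_RE = re.compile(r'#\d+\s+pc\s+([0-9a-fA-F]+)\s+([^\s(]+)')


def group_by_library(lines):
    """Map each .so file name to the pc addresses seen for it, in order."""
    groups = {}
    for line in lines:
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a backtrace frame
        address, path = match.groups()
        groups.setdefault(os.path.basename(path), []).append(address)
    return groups


def build_commands(groups, addr2line='addr2line'):
    """Build one argument list per library, one addr2line run each."""
    return [[addr2line, '-f', '-e', so_name] + addresses
            for so_name, addresses in groups.items()]


frames = [
    '#07 pc 000184d8  /vendor/lib/libmmcamera2_mct.so (mct_list_find_custom_branch+32)',
    ' #08 pc 0001852c  /vendor/lib/libmmcamera2_mct.so (mct_list_find_custom_branch+116)',
    ' #11 pc 0009268c  /vendor/lib/libmmcamera2_imglib_modules.so (module_imgbase_client_fetch_outbuf+424)',
]
for command in build_commands(group_by_library(frames)):
    print(' '.join(command))
```

Each argument list can be handed to subprocess.run(command, capture_output=True, text=True), which avoids building a shell string by concatenation. Note that grouping changes the order of the output when frames from different libraries are interleaved in the backtrace.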


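1.2 Updated version

Another local improvement: the tool now copes with some irregular backtrace lines, such as frames without a symbol or symbols that contain spaces, so it is more tolerant of real-world input.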

# -*- coding:UTF-8 -*-
# Author : [email protected]
# Date : 25/12/18

import os
import re
import sys

# Set this to the path of your local addr2line binary
ADDR2LINE = '~/xiaomi/Developer/android_ndk/android-ndk-r17b/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line'

# Entry point: the first argument is the file that holds the backtrace lines
path = sys.argv[1]
result = ''
with open(path, 'r') as file_object:
    for line in file_object:
        temp_str = line.strip()
        start = temp_str.find('pc ')
        if start == -1:
            continue  # skip lines that are not backtrace frames
        # Collapse runs of spaces so fields are separated by exactly one space
        temp_str = re.sub(' +', ' ', temp_str[start + 3:])
        end = temp_str.find(' ')
        # The pc address is the first field
        stack = temp_str[:end]
        temp_str = temp_str[end + 1:]
        # The library path is the second field; a symbol may or may not follow it
        end = temp_str.find(' ')
        if end != -1:
            temp_str = temp_str[:end]
        # The .so name is the last path component; the file must be in the current directory
        so_name = temp_str[temp_str.rfind('/') + 1:]
        # print(stack, so_name)
        result += os.popen(ADDR2LINE + ' -f -e ' + so_name + ' ' + stack).read()
print(result)

An example of a backtrace it can handle:

   #00 pc 0001ceb2  /system/lib/libc.so (abort+62)
    #01 pc 0007c40f  /vendor/lib/libmibokeh_712.so
    #02 pc 0007c7e3  /vendor/lib/libmibokeh_712.so
    #03 pc 0007c807  /vendor/lib/libmibokeh_712.so
    #04 pc 0005fb25  /vendor/lib/libmibokeh_712.so
    #05 pc 0005f935  /vendor/lib/libmibokeh_712.so
    #06 pc 000499d9  /vendor/lib/libmibokeh_712.so (bokeh::MiBokehImpl::InitCLUT(bokeh::BokehConf const&)+372)
    #07 pc 0004701d  /vendor/lib/libmibokeh_712.so (bokeh::MiBokehImpl::Init(bokeh::BokehConf const&, char const*)+716)
    #08 pc 00049c8d  /vendor/lib/libmibokeh_712.so (bokeh::MiBokehImpl::UpdateSizeIn(int, int, int, int, int, bool)+132)
    #09 pc 00004285  /vendor/lib/camera/components/com.xiaomi.node.misegment.so (MiSegmentMiui::process(MiImageBuffer*, MiImageBuffer*, unsigned int, int, int, long long)+244)
    #10 pc 0000321d  /vendor/lib/camera/components/com.xiaomi.node.misegment.so (ChiMiSegmentNode::ProcessRequest(ChiNodeProcessRequestInfo*)+3052)
    #11 pc 000025cd  /vendor/lib/camera/components/com.xiaomi.node.misegment.so (MiSegmentNodeProcRequest(ChiNodeProcessRequestInfo*)+24)
    #12 pc 00228cf9  /vendor/lib/hw/camera.qcom.so (CamX::ChiNodeWrapper::ExecuteProcessRequest(CamX::ExecuteProcessRequestData*)+1592)
    #13 pc 001ecdd9  /vendor/lib/hw/camera.qcom.so (CamX::Node::ProcessRequest(CamX::NodeProcessRequestData*, unsigned long long)+3440)
    #14 pc 0020d3a9  /vendor/lib/hw/camera.qcom.so (CamX::DeferredRequestQueue::DeferredWorkerCore(CamX::ThreadDataUnit*)+216)
    #15 pc 0020d1e3  /vendor/lib/hw/camera.qcom.so (CamX::DeferredRequestQueue::DeferredWorkerWrapper(void*)+10)
    #16 pc 0018f827  /vendor/lib/hw/camera.qcom.so (CamX::ThreadCore::ProcessJobQueue()+106)
    #17 pc 0018f02d  /vendor/lib/hw/camera.qcom.so (CamX::ThreadCore::DoWork()+156)
    #18 pc 0018ef89  /vendor/lib/hw/camera.qcom.so (CamX::ThreadCore::WorkerThreadBody(void*)+4)
    #19 pc 00063e05  /system/lib/libc.so (__pthread_start(void*)+22)
    #20 pc 0001e0a5  /system/lib/libc.so (__start_thread+22)


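2. Analyzing an NE from start to finish

The NE backtrace has been symbolized above, so the next step is to find the corresponding line in the source. For frame #00 that is previewprocess.cpp:1049.

At this point we only know that this line crashed, not whether the problem is m_InputMainBuf itself or m_InputMainBuf->frame_idx.
The tombstone file helps here.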


Build fingerprint: 'xiaomi/lavender/lavender:9/PKQ1.180904.001/8.12.24:user/release-keys'
Revision: '0'
ABI: 'arm'
pid: 751, tid: 8118, name: CAM_METADATA >>> /vendor/bin/hw/[email protected] <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x88
Cause: null pointer dereference
r0 0000006c r1 00000000 r2 00000001 r3 00000000
r4 e07e3000 r5 e07e3000 r6 caf835e8 r7 caf838e0
r8 e24a7768 r9 00000000 r10 e0785170 r11 0003f5c6
ip e24a825c sp caf835e8 lr e246028f pc e2460324

backtrace:

#00 pc 0010f324 /vendor/lib/hw/camera.sdm660.so (qcamera::MIPreview::doAnalyzeProcess()+184)
#01 pc 0010f76b /vendor/lib/hw/camera.sdm660.so (qcamera::MIPreview::analyzeprocess()+58)
#02 pc 0009ab63 /vendor/lib/hw/camera.sdm660.so (qcamera::QCamera3MetadataChannel::streamCbRoutine(mm_camera_super_buf_t*, qcamera::QCamera3Stream*)+134)
#03 pc 00095df1 /vendor/lib/hw/camera.sdm660.so (qcamera::QCamera3Stream::dataProcRoutine(void*)+168)
#04 pc 00071821 /system/lib/libc.so (__pthread_start(void*)+22)
#05 pc 0001e025 /system/lib/libc.so (__start_thread+24)

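Two values in the tombstone above matter for the rest of the analysis: r0 = 0000006c is the base address being dereferenced, and fault addr 0x88 is the address whose access failed. The offset between the two is worked out below.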
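As a rough sketch, both values can be read out of the tombstone text programmatically and subtracted to get the offset of the failing access. The helper names fault_address and register are made up for this example, and the patterns assume the 32-bit ARM tombstone layout shown above.

```python
import re

tombstone = """signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x88
Cause: null pointer dereference
r0 0000006c r1 00000000 r2 00000001 r3 00000000
"""


def fault_address(text):
    """Return the value after "fault addr" as an integer."""
    return int(re.search(r'fault addr 0x([0-9a-fA-F]+)', text).group(1), 16)


def register(text, name):
    """Return a register printed as "<name> <8 hex digits>" as an integer."""
    pattern = r'\b%s ([0-9a-fA-F]{8})\b' % re.escape(name)
    return int(re.search(pattern, text).group(1), 16)


fault = fault_address(tombstone)
r0 = register(tombstone, 'r0')
print(hex(r0), hex(fault), hex(fault - r0))
# -> 0x6c 0x88 0x1c
```

The logcat crash at the top of this article (r0 00000065, fault addr 0x81) gives the same difference of 0x1c, which suggests that both crashes fail on the same field access.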

Next, use gdb to inspect the place where the bad pointer access may be happening:
jeffmony@jeffmony-OptiPlex-7050:~/下载/out$ ~/xiaomi/android_source/aosp/prebuilts/gdb/linux-x86/bin/gdb target/product/lavender/symbols/vendor/lib/hw/camera.sdm660.so
GNU gdb (GDB) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from target/product/lavender/symbols/vendor/lib/hw/camera.sdm660.so...done.
(gdb) info functions doAnalyze
All functions matching regular expression "doAnalyze":

File hardware/qcom/camera/QCamera2/algo/xiaomiAgeGender/previewprocess.cpp:
void qcamera::MIPreview::doAnalyzeProcess();

Non-debugging symbols:
0x0006c928 qcamera::MIPreview::doAnalyzeProcess()@plt
(gdb) p ((mm_camera_buf_def_t *)(0)).frame_idx
Cannot access memory at address 0x1c

You can also disassemble the local .so and check the memory access in detail:

~/xiaomi/android_source/aosp/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/bin/arm-linux-androideabi-objdump -Sl target/product/lavender/symbols/vendor/lib/hw/camera.sdm660.so > target/product/lavender/symbols/vendor/lib/hw/camera.sdm660.so.asm

Open target/product/lavender/symbols/vendor/lib/hw/camera.sdm660.so.asm and look for the crashing pc, 10f324:

hardware/qcom/camera/QCamera2/algo/xiaomiAgeGender/previewprocess.cpp:1049
10f31a: 6ce8 ldr r0, [r5, #76] ; 0x4c
10f31c: f8df b2d4 ldr.w fp, [pc, #724] ; 10f5f4 <_ZN7qcamera9MIPreview16doAnalyzeProcessEv+0x388>
10f320: f04f 0900 mov.w r9, #0
10f324: f8d0 801c ldr.w r8, [r0, #28]
10f328: 44fb add fp, pc

The #28 in the instruction at 10f324 is 0x1c in hexadecimal.
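For a large dump it can be easier to locate the instruction with a few lines of script than by scrolling. The sketch below uses two hypothetical helpers, find_instruction and load_offset, and assumes each instruction line starts with its address followed by a colon and writes the load operand as "[rN, #imm]", as in the excerpt above. Real objdump output separates columns with tabs, which the code tolerates.

```python
import re

asm_dump = """hardware/qcom/camera/QCamera2/algo/xiaomiAgeGender/previewprocess.cpp:1049
  10f31a: 6ce8 ldr r0, [r5, #76] ; 0x4c
  10f320: f04f 0900 mov.w r9, #0
  10f324: f8d0 801c ldr.w r8, [r0, #28]
  10f328: 44fb add fp, pc
"""


def find_instruction(dump, pc):
    """Return the dump line whose address column equals pc, or None."""
    wanted = '%x:' % pc
    for line in dump.splitlines():
        if line.strip().startswith(wanted):
            return line.strip()
    return None


def load_offset(instruction):
    """Extract the immediate from an operand such as "[r0, #28]"."""
    match = re.search(r'\[\w+, #(\d+)\]', instruction)
    return int(match.group(1)) if match else None


instruction = find_instruction(asm_dump, 0x0010f324)
offset = load_offset(instruction)
print(instruction)
print(hex(offset))
# -> 10f324: f8d0 801c ldr.w r8, [r0, #28]
# -> 0x1c
```

The pc passed in is the relative address from the backtrace (0010f324), not the absolute value of the pc register in the tombstone.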

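gdb reports address 0x1c for m_pInputMainBuf->frame_idx when the base address is 0, so 0x1c is the offset of that field from the pointer. In this crash the base address is 0x6c and the offset is 0x1c; 0x6c + 0x1c = 0x88, which is exactly the fault address recorded in the tombstone.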

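Note, however, that the base address is 0x6c rather than null. The pointer holds a non-null value, so this is not a plain null pointer, yet the address it leads to at that offset cannot be accessed.

The next step is therefore to check whether m_pInputMainBuf is assigned correctly. Following that lead shows that m_pInputMainBuf is never initialized, which is why it points to an address that cannot be accessed.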
