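Getting the Call Stack of Any Thread on iOS

Use case

While an app runs in production it can hit sudden memory spikes, UI stalls, CPU surges, or crashes. To help locate the cause, we need to capture the call stacks of all threads at the moment the problem occurs.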

1. callStackSymbols

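[NSThread callStackSymbols] only returns the call stack of the current thread. It cannot capture the stack of another, specified thread, so it does not meet the requirement.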

 [NSThread callStackSymbols];
 
 0   LXDAppFluecyMonitor                 0x0000000102a30699 -[ViewController tableView:didSelectRowAtIndexPath:] + 89,
1   UIKitCore                           0x0000000116721902 -[UITableView _selectRowAtIndexPath:animated:scrollPosition:notifyDelegate:isCellMultiSelect:deselectPrevious:] + 1962,
2   UIKitCore                           0x000000011672113d -[UITableView _selectRowAtIndexPath:animated:scrollPosition:notifyDelegate:] + 94,
3   UIKitCore                           0x0000000116721bcb -[UITableView _userSelectRowAtPendingSelectionIndexPath:] + 341,
4   UIKitCore                           0x0000000116a322d5 -[_UIAfterCACommitBlock run] + 54,
5   UIKitCore                           0x0000000116a327cd -[_UIAfterCACommitQueue flush] + 190,
6   libdispatch.dylib                   0x000000010c7d6816 _dispatch_call_block_and_release + 12,
7   libdispatch.dylib                   0x000000010c7d7a5b _dispatch_client_callout + 8,
8   libdispatch.dylib                   0x000000010c7e6325 _dispatch_main_queue_drain + 1169,
9   libdispatch.dylib                   0x000000010c7e5e86 _dispatch_main_queue_callback_4CF + 31,
10  CoreFoundation                      0x000000010b5d6261 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9,
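
For comparison, the C-level counterpart of this call is backtrace() from <execinfo.h>, and it has the same limitation: it only walks the stack of the calling thread. The sketch below is a minimal illustration; the helper names capture_current_stack and print_current_stack are hypothetical and not part of the original project.

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace, backtrace_symbols */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: fills `frames` with up to `max` return addresses
 * from the calling thread's stack and returns how many were captured. */
int capture_current_stack(void **frames, int max) {
    assert(frames != NULL && max > 0);
    return backtrace(frames, max);
}

/* Hypothetical helper: prints one line per captured frame. */
void print_current_stack(void) {
    void *frames[64];
    int count = capture_current_stack(frames, 64);
    char **symbols = backtrace_symbols(frames, count);
    if (symbols == NULL) { return; }
    for (int i = 0; i < count; i++) {
        printf("%s\n", symbols[i]);
    }
    free(symbols);  /* the whole array is released with a single free */
}
```

There is no parameter for choosing a thread, which is why the rest of this article reads another thread's registers through the Mach APIs instead.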

2. Mach Thread

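Approach

  1. Get the list of all threads through the kernel API
  2. Iterate over the threads and read each thread's context into a _STRUCT_MCONTEXT
  3. Read the frame pointer from the context, then follow each frame's previous pointer repeatedly to walk the thread's whole call stack
  4. Read the function's return address from each stack frame
  5. Enumerate all loaded images with the _dyld image APIs
  6. Find the image whose LC_SEGMENT (__TEXT) load command covers the function address
  7. Get the ASLR slide, then locate the symbol table entry that corresponds to the function address
  8. Look up the function name in the string table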

Getting the call addresses on the stack

  1. All threads: call the kernel API task_threads to get the thread list of the given task, returned in list

    thread_act_array_t list;
    mach_msg_type_number_t count;
    task_threads(mach_task_self(), &list, &count);
    
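  2. A specific thread: call pthread_from_mach_thread_np to get the pthread_t of each Mach thread. The main thread is matched by its thread id; any other thread is matched by comparing thread names. The excerpt restores originName after a match, which suggests the NSThread was given a temporary name before the loop.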

    for (int idx = 0; idx < count; idx++) {
        pthread_t pt = pthread_from_mach_thread_np(list[idx]);
        if ([nsthread isMainThread] && list[idx] == main_thread_id) { return list[idx]; }
        if (pt) {
            name[0] = '\0';
            pthread_getname_np(pt, name, sizeof(name));
            if (!strcmp(name, [nsthread name].UTF8String)) {
                [nsthread setName: originName];
                return list[idx];
            }
        }
    }
    
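  3. Thread state: call thread_get_state to read the specified thread's context into a _STRUCT_MCONTEXT. Two of its arguments, the state flavor and the state count, depend on the CPU architecture. The register state in _STRUCT_MCONTEXT holds the thread's stack pointer and the frame pointer of its topmost frame, which is enough to walk the thread's whole call stack.
    thread_get_state takes the thread (target_act), the CPU-specific flavor constant, a pointer to _STRUCT_MCONTEXT->__ss (the register state structure) as old_state, and the CPU-specific count (old_stateCnt), and fills in the register state.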

    bool lxd_fillThreadStateIntoMachineContext(thread_t thread, _STRUCT_MCONTEXT * machineContext) {
        mach_msg_type_number_t state_count = LXD_THREAD_STATE_COUNT;
        kern_return_t kr = thread_get_state(thread, LXD_THREAD_STATE, (thread_state_t)&machineContext->__ss, &state_count);
        return (kr == KERN_SUCCESS);
    }
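
The excerpts use LXD_THREAD_STATE, LXD_THREAD_STATE_COUNT, and LXD_FRAME_POINTER without showing their definitions. The sketch below is one plausible way to define them per architecture, which is an assumption rather than the project's actual code. The names on the right-hand side are the flavor constants and register fields from the Mach thread status headers; the LXD_STR macros and lxd_thread_state_flavor_name are hypothetical additions for logging.

```c
#include <assert.h>

/* Assumed definitions: the article does not show these macros. */
#if defined(__arm64__) || defined(__aarch64__)
#define LXD_THREAD_STATE        ARM_THREAD_STATE64
#define LXD_THREAD_STATE_COUNT  ARM_THREAD_STATE64_COUNT
#define LXD_FRAME_POINTER       __fp
#elif defined(__arm__)
#define LXD_THREAD_STATE        ARM_THREAD_STATE
#define LXD_THREAD_STATE_COUNT  ARM_THREAD_STATE_COUNT
#define LXD_FRAME_POINTER       __r[7]
#elif defined(__x86_64__)
#define LXD_THREAD_STATE        x86_THREAD_STATE64
#define LXD_THREAD_STATE_COUNT  x86_THREAD_STATE64_COUNT
#define LXD_FRAME_POINTER       __rbp
#elif defined(__i386__)
#define LXD_THREAD_STATE        x86_THREAD_STATE32
#define LXD_THREAD_STATE_COUNT  x86_THREAD_STATE32_COUNT
#define LXD_FRAME_POINTER       __ebp
#else
#error "unsupported CPU architecture"
#endif

/* Hypothetical helpers: turn the selected macro into text. */
#define LXD_STR_(x) #x
#define LXD_STR(x)  LXD_STR_(x)

/* Returns the expansion of LXD_THREAD_STATE as a string. Without the Mach
 * headers included this is the constant's name; with them it is whatever
 * the constant expands to. */
const char *lxd_thread_state_flavor_name(void) {
    const char *name = LXD_STR(LXD_THREAD_STATE);
    assert(name[0] != '\0');
    return name;
}
```

Keeping the three macros together means the call to thread_get_state and the later read of the frame pointer always agree on the architecture.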
    
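  4. Fill the stack frame struct with vm_read_overwrite

    1. The stack frame struct
      typedef struct StackFrameEntry{
          const struct StackFrameEntry *const previous;  // address of the previous (caller's) frame
          const uintptr_t return_address;  // return address stored in this frame
      } StackFrameEntry;
      
    2. Use the machineContext from the previous step to read the first stack frame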
       lxd_mach_copyMem((void *)machineContext->__ss.LXD_FRAME_POINTER, &frame, sizeof(frame))
      
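      // src: the frame pointer to read from
      // dst: pointer to a StackFrameEntry instance
      // numBytes: size of the StackFrameEntry struct
      kern_return_t lxd_mach_copyMem(const void * src, void * dst, const size_t numBytes) {
          vm_size_t bytesCopied = 0;
          // Copy the frame (previous pointer and return address) at src into dst; vm_read_overwrite reports an invalid address through its return value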
          return vm_read_overwrite(mach_task_self(), (vm_address_t)src, (vm_size_t)numBytes, (vm_address_t)dst, &bytesCopied);
      }
      
      Printing frame in the debugger:
      Printing description of frame:
      (LXDStackFrameEntry) frame = {
        previous = 0x0000000109f6cb68
        return_address = 11598032417672659023
      }
      
    3. Follow frame.previous to the previous frame repeatedly to collect the addresses of all calls on the thread
      // Walk the chain; stop after at most MAX_FRAME_NUMBER frames
      for (; idx < MAX_FRAME_NUMBER; idx++) {
          // Record this frame's return address
          backtraceBuffer[idx] = frame.return_address;
          if (backtraceBuffer[idx] == FAILED_UINT_PTR_ADDRESS ||
              frame.previous == NULL ||
              // Load the previous frame through the current frame's previous pointer
              lxd_mach_copyMem(frame.previous, &frame, sizeof(frame)) != KERN_SUCCESS) {
              break;
          }
      }
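
The frame walk can be tried without a device by building a fake chain of frames. The sketch below is a simplified simulation, not the project's code: DemoFrame mirrors StackFrameEntry without the const members, demo_copy stands in for lxd_mach_copyMem (plain memcpy instead of vm_read_overwrite), and a return address of 0 is assumed to play the role of FAILED_UINT_PTR_ADDRESS. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as StackFrameEntry, writable so it can be reused per step. */
typedef struct DemoFrame {
    const struct DemoFrame *previous;   /* caller's frame */
    uintptr_t return_address;           /* where this call returns to */
} DemoFrame;

/* Stand-in for lxd_mach_copyMem: returns 0 on success, -1 on a NULL source.
 * The real code uses vm_read_overwrite so that bad pointers fail safely. */
static int demo_copy(const void *src, void *dst, size_t n) {
    if (src == NULL) { return -1; }
    memcpy(dst, src, n);
    return 0;
}

/* Walks the chain starting at `first`, storing return addresses in `out`.
 * Returns the number of addresses stored (at most `max`). */
size_t demo_walk_frames(const DemoFrame *first, uintptr_t *out, size_t max) {
    assert(out != NULL);
    DemoFrame frame;
    size_t count = 0;
    if (demo_copy(first, &frame, sizeof(frame)) != 0) { return 0; }
    while (count < max) {
        if (frame.return_address == 0) { break; }   /* assumed end marker */
        out[count++] = frame.return_address;
        if (frame.previous == NULL ||
            demo_copy(frame.previous, &frame, sizeof(frame)) != 0) {
            break;
        }
    }
    return count;
}
```

The walk ends for one of three reasons: the buffer is full, a frame has no previous frame, or the copy fails. On a real thread the third case is what protects the sampler from a corrupted frame pointer.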
      

Getting the function names for the stack addresses

For background on Mach-O, see this article: https://www.coderzhou.com/2019/06/05/fishhook/#Mach-O
Source code reference: https://github.com/bestswifter/BSBacktraceLogger

  1. Create a Dl_info array with the same length as the backtraceBuffer above
    Dl_info symbolicated[backtraceLength];
    
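  2. Iterate over backtraceBuffer and store the symbol information for each address in symbolicated
  3. Find the image that contains the address
    • Iterate over the images; for each one, get the ASLR slide with _dyld_get_image_vmaddr_slide and subtract it from the call address to get the address as recorded in the Mach-O file
    • Iterate over the image's load commands to find LC_SEGMENT (or LC_SEGMENT_64)
    • Check whether the unslid address falls inside that segment
    • Return the image index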
    uint32_t lxd_imageIndexContainingAddress(const uintptr_t address) {
        const uint32_t imageCount = _dyld_image_count();
        const struct mach_header * header = NULL;
        
        for (uint32_t iImg = 0; iImg < imageCount; iImg++) {
            header = _dyld_get_image_header(iImg);
            if (header != NULL) {
                // ASLR: _dyld_get_image_vmaddr_slide returns this image's slide
                uintptr_t addressWSlide = address - (uintptr_t)_dyld_get_image_vmaddr_slide(iImg);
                uintptr_t cmdPtr = lxd_firstCmdAfterHeader(header);
                if (cmdPtr == FAILED_UINT_PTR_ADDRESS) { continue; }
                
                for (uint32_t iCmd = 0; iCmd < header->ncmds; iCmd++) {
                    const struct load_command * loadCmd = (struct load_command *)cmdPtr;
                    if (loadCmd->cmd == LC_SEGMENT) {
                        const struct segment_command * segCmd = (struct segment_command *)cmdPtr;
                        if (addressWSlide >= segCmd->vmaddr &&
                            addressWSlide < segCmd->vmaddr + segCmd->vmsize) {
                            return iImg;
                        }
                    } else if (loadCmd->cmd == LC_SEGMENT_64) {
                        const struct segment_command_64 * segCmd = (struct segment_command_64 *)cmdPtr;
                        if (addressWSlide >= segCmd->vmaddr &&
                            addressWSlide < segCmd->vmaddr + segCmd->vmsize) {
                            
                            char *image_name = (char *)_dyld_get_image_name(iImg);
                            const struct mach_header *mh = _dyld_get_image_header(iImg);
                            intptr_t vmaddr_slide = _dyld_get_image_vmaddr_slide(iImg);
                         
                            printf("Image name %s at address 0x%llx and ASLR slide 0x%lx.\n",
                                   image_name, (mach_vm_address_t)mh, vmaddr_slide);
                            return iImg;
                        }
                    }
                    cmdPtr += loadCmd->cmdsize;
                }
            }
        }
        return UINT_MAX;
    }
    
    image.png

    Viewed in MachOView, the values match the data obtained above


    image.png

    Print the end address of segCmd's virtual memory range to check whether the function's virtual address lies inside the current segment


    image.png
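  4. Compute the base address for the image's file offsets: find the __LINKEDIT segment (SEG_LINKEDIT) in the load commands and return vmaddr - fileoff. In this example the result coincides with the start of the code segment __TEXT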
    uintptr_t lxd_segmentBaseOfImageIndex(const uint32_t idx) {
        const struct mach_header * header = _dyld_get_image_header(idx);
        
        uintptr_t cmdPtr = lxd_firstCmdAfterHeader(header);
        if (cmdPtr == FAILED_UINT_PTR_ADDRESS) { return FAILED_UINT_PTR_ADDRESS; }
        for (uint32_t iCmd = 0; iCmd < header->ncmds; iCmd++) {  // iCmd avoids shadowing the idx parameter
            const struct load_command * loadCmd = (struct load_command *)cmdPtr;
            if (loadCmd->cmd == LC_SEGMENT) {
                const struct segment_command * segCmd = (struct segment_command *)cmdPtr;
                if (strcmp(segCmd->segname, SEG_LINKEDIT) == 0) {
                    return segCmd->vmaddr - segCmd->fileoff;
                }
            } else if (loadCmd->cmd == LC_SEGMENT_64) {
                const struct segment_command_64 * segCmd = (struct segment_command_64 *)cmdPtr;
                if (strcmp(segCmd->segname, SEG_LINKEDIT) == 0) {
                    return segCmd->vmaddr - segCmd->fileoff;
                }
            }
            cmdPtr += loadCmd->cmdsize;
        }
        return FAILED_UINT_PTR_ADDRESS;
    }
    
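  5. Iterate over the load commands to find LC_SYMTAB, which holds the offsets of the symbol table and the string table
    struct symtab_command {
        uint32_t    cmd;        /* LC_SYMTAB */
        uint32_t    cmdsize;    /* sizeof(struct symtab_command) */
        uint32_t    symoff;     /* symbol table offset in the file */
        uint32_t    nsyms;      /* number of symbol table entries */
        uint32_t    stroff;     /* string table offset in the file */
        uint32_t    strsize;    /* string table size in bytes */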
    };
    
    image.png
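  6. Iterate over the symbol table to find the entry that corresponds to the function address, that is, the entry whose n_value is closest to the address without exceeding it
    Struct of a single symbol table entry: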
    struct nlist_64 {
        union {
            uint32_t  n_strx; /* index into the string table */
        } n_un;
        uint8_t n_type;        /* type flag, see below */
        uint8_t n_sect;        /* section number or NO_SECT */
        uint16_t n_desc;       /* see <mach-o/stab.h> */
        uint64_t n_value;      /* value of this symbol (or stab offset) */
    };
    
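  7. From the symbol table entry found in the previous step, take the symbol's offset into the string table and read the name string there
    if (loadCmd->cmd == LC_SYMTAB) {
        // LC_SYMTAB holds the offsets of the symbol table and the string table
        const struct symtab_command * symtabCmd = (struct symtab_command *)cmdPtr;
        // Address of the symbol table in memory (slide included); symoff is the symbol table's file offset
        const LXD_NLIST * symbolTable = (LXD_NLIST *)(segmentBase + symtabCmd->symoff);
        // Address of the string table in memory (slide included); stroff is the string table's file offset
        const uintptr_t stringTable = segmentBase + symtabCmd->stroff;
        // nsyms is the number of symbol table entries
        for (uint32_t iSym = 0; iSym < symtabCmd->nsyms; iSym++) {
            if (symbolTable[iSym].n_value == FAILED_UINT_PTR_ADDRESS) { continue; }
            // Start address of this symbol (unslid)
            uintptr_t symbolBase = symbolTable[iSym].n_value;
            // Distance from the symbol's start to the target address
            uintptr_t currentDistance = addressWithSlide - symbolBase;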
            if ( (addressWithSlide >= symbolBase && currentDistance <= bestDistance) ) {
                bestMatch = symbolTable + iSym;
                bestDistance = currentDistance;
            }
        }
        if (bestMatch != NULL) {
            info->dli_saddr = (void *)(bestMatch->n_value + imageVMAddressSlide);
            // n_un.n_strx is the offset of the symbol's name in the string table, i.e. the function name
            info->dli_sname = (char *)((intptr_t)stringTable + (intptr_t)bestMatch->n_un.n_strx);
            NSLog(@"%s",info->dli_sname);
            if (*info->dli_sname == '_') {
                info->dli_sname++;
            }
            if (info->dli_saddr == info->dli_fbase && bestMatch->n_type == 3) {
                info->dli_sname = NULL;
            }
            break;
        }
    }
    
    Position in the symbol table that corresponds to the call address: 0x00000001000048a0
    image.png

    Viewed in MachOView
    image.png

    The ASLR slide is 0x0000000004f31000
    Address of the function name in the string table: 0x0000000104f40940
    Address with the slide removed: 0x0000000104f40940 - 0x0000000004f31000 = 0x000000010000F940
    image.png

    Viewed in MachOView
    image.png

    Printed output
    image.png
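
The search in steps 6 and 7 picks the symbol that starts closest to the address without starting after it, then reads its name from the string table. The sketch below reproduces that logic on a hand-built table. DemoSymbol is a hypothetical reduction of nlist_64 to the two fields the search needs, and an n_value of 0 is assumed to mean "no address"; none of the demo_ names come from the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol entry: just the two nlist_64 fields used. */
typedef struct DemoSymbol {
    uint32_t n_strx;   /* offset of the name in the string table */
    uint64_t n_value;  /* unslid start address of the symbol */
} DemoSymbol;

/* Returns the name of the symbol with the greatest n_value that is still
 * <= address, or NULL if there is none. A leading '_' is skipped. */
const char *demo_nearest_symbol(const DemoSymbol *symbols, size_t count,
                                const char *string_table, uint64_t address) {
    assert(symbols != NULL && string_table != NULL);
    const DemoSymbol *best = NULL;
    uint64_t best_distance = UINT64_MAX;
    for (size_t i = 0; i < count; i++) {
        if (symbols[i].n_value == 0) { continue; }        /* no address */
        if (address < symbols[i].n_value) { continue; }   /* starts after address */
        uint64_t distance = address - symbols[i].n_value;
        if (distance <= best_distance) {
            best = &symbols[i];
            best_distance = distance;
        }
    }
    if (best == NULL) { return NULL; }
    const char *name = string_table + best->n_strx;
    return (*name == '_') ? name + 1 : name;
}
```

Checking address >= n_value before subtracting keeps the unsigned distance from wrapping around. The scan is linear in the number of symbols, which is acceptable for occasional sampling but worth keeping in mind if many threads are symbolicated at once.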

References
https://www.jianshu.com/p/df5b08330afd
https://www.jianshu.com/p/8b78bbbcaf89
https://blog.csdn.net/jasonblog/article/details/49909209
https://elliotsomething.github.io/2017/06/28/thread%E5%AD%A6%E4%B9%A0/
