With the "black magic" of the runtime, you can hook Objective-C methods.
But what if you need to hook a C function, such as NSLog()?
fishhook, a C library open-sourced by Facebook, is one solution.
fishhook is a very simple library that enables dynamically rebinding symbols in Mach-O binaries running on iOS in the simulator and on device.
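The above is quoted from fishhook's README.md: fishhook is a simple library for dynamically rebinding symbols in Mach-O binaries.
How simple? The whole library is under 200 lines of code, yet it has collected more than 3,000 stars. A masterpiece indeed!
fishhook works by parsing the Mach-O binary, locating the pointer slot of the symbol to be rebound, and replacing the code address that the symbol jumps to, which achieves the hook.
So before digging into how fishhook works, let's first get to know the Mach-O file format.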
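Mach-O files
What is a Mach-O file? Mach-O is short for the Mach Object file format, the standard format for storing programs and libraries on Mac/iOS. The '.o', '.a', and '.dSYM' files we meet during development, the executable binary inside an app bundle, frameworks, and so on all belong to the Mach-O family (they either are Mach-O files or contain them).
Running $ file <path-to-mach-o-file> prints the type of a Mach-O file.
// Type of the executable binary inside the app bundle: Mach-O, 64-bit, executable, arm64 architecture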
Huangjb:Desktop mac$ file TestFishHook
TestFishHook: Mach-O 64-bit executable arm64
The most convenient way to inspect a Mach-O file is to open it in MachOView.app.
MachOView shows that a Mach-O file has three main parts:
- mach header
- load commands
- raw data
The mach header
The mach header structs are defined in mach/loader.h:
// 32-bit mach_header
struct mach_header {
  uint32_t      magic;      /* mach magic number identifier */
  cpu_type_t    cputype;    /* cpu specifier */
  cpu_subtype_t cpusubtype; /* machine specifier */
  uint32_t      filetype;   /* type of file */
  uint32_t      ncmds;      /* number of load commands */
  uint32_t      sizeofcmds; /* the size of all the load commands */
  uint32_t      flags;      /* flags */
};
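***************** Field descriptions *****************
magic: the magic number
Four values are possible: 0xfeedface (MH_MAGIC: 32-bit, same byte order as the host), 0xcefaedfe (MH_CIGAM: 32-bit, byte-swapped relative to the host),
0xfeedfacf (MH_MAGIC_64: 64-bit, same byte order as the host), 0xcffaedfe (MH_CIGAM_64: 64-bit, byte-swapped relative to the host)
cputype: CPU type
cpusubtype: CPU subtype
filetype: Mach-O file type; mach/loader.h defines the value for each type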
#define MH_OBJECT 0x1 /* relocatable object file */
#define MH_EXECUTE 0x2 /* demand paged executable file */
#define MH_FVMLIB 0x3 /* fixed VM shared library file */
#define MH_CORE 0x4 /* core file */
#define MH_PRELOAD 0x5 /* preloaded executable file */
......
ncmds: number of load commands
sizeofcmds: total size of the load commands region
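To make the four magic values concrete, here is a minimal sketch that classifies the first 32-bit word of a file. The constants repeat the values from mach/loader.h so that it builds without the Apple SDK; describe_magic is a hypothetical helper name, not part of any library.

```c
#include <stdint.h>

/* Magic values as defined in mach/loader.h, repeated here so the
 * sketch builds without the Apple SDK. */
#define MH_MAGIC    0xfeedfaceu
#define MH_CIGAM    0xcefaedfeu
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu

/* Hypothetical helper: describe the first 32-bit word of a file,
 * as read in the byte order of the machine running this code. */
const char *describe_magic(uint32_t magic) {
  switch (magic) {
    case MH_MAGIC:    return "32-bit, host byte order";
    case MH_CIGAM:    return "32-bit, byte-swapped";
    case MH_MAGIC_64: return "64-bit, host byte order";
    case MH_CIGAM_64: return "64-bit, byte-swapped";
    default:          return "not a thin Mach-O file";
  }
}
```

A byte-swapped result means every multi-byte header field has to be swapped before use.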
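Load commands
The load commands sit right after the mach header and give dyld the information it needs to load the binary into memory. They tell dyld which file offset to load from, how much data to load, which virtual memory address to load it at, and so on. You could say the load commands are the blueprint of the whole Mach-O file.
Load commands come in many types, and each type carries different data. Every type, however, begins with the two fields cmd and cmdsize, matching this struct: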
struct load_command {
  uint32_t cmd;     /* type of the load command */
  uint32_t cmdsize; /* size of the load command in bytes */
};
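Because every command starts with cmd and cmdsize, a reader can step from one command to the next without understanding each type. The sketch below walks a raw buffer this way. It is only an illustration: load_command_hdr and count_load_commands are hypothetical names, and the buffer stands in for the region that follows a real mach header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct load_command in mach/loader.h. */
struct load_command_hdr {
  uint32_t cmd;
  uint32_t cmdsize;
};

/* Hypothetical helper: count the commands of type `wanted` among
 * `ncmds` load commands stored back to back in `buf`.
 * Returns -1 if a command is truncated or has a bad size. */
int count_load_commands(const unsigned char *buf, size_t len,
                        uint32_t ncmds, uint32_t wanted) {
  size_t off = 0; /* invariant: off <= len */
  int found = 0;
  for (uint32_t i = 0; i < ncmds; i++) {
    struct load_command_hdr lc;
    if (len - off < sizeof lc) return -1;
    memcpy(&lc, buf + off, sizeof lc);
    if (lc.cmdsize < sizeof lc || lc.cmdsize > len - off) return -1;
    if (lc.cmd == wanted) found++;
    off += lc.cmdsize; /* cmdsize covers the header plus the payload */
  }
  return found;
}
```

fishhook's own loops advance the same way, with cur += cur_seg_cmd->cmdsize, but they trust the header because the image was already loaded by dyld.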
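Load command types (the first five entries are segments, each described by an LC_SEGMENT or LC_SEGMENT_64 command):
- __PAGEZERO: null-pointer trap, an inaccessible region at address 0 so that dereferencing NULL crashes
- __TEXT: program code segment
- __DATA: readable and writable program data segment
- __RODATA: read-only program data segment
- __LINKEDIT: link-edit segment, holding the symbol table, string table, and other data used by the dynamic linker
- LC_SYMTAB: symbol table information
- LC_DYSYMTAB: dynamic symbol table information
- LC_LOAD_DYLINKER: path of dyld, normally /usr/lib/dyld; the kernel loads dyld from this path
- LC_UUID: a generated unique identifier
- LC_SOURCE_VERSION: source version information
- LC_MAIN: entry point of the program's main function
- LC_ENCRYPTION_INFO: encryption information; a cryptid of 1 means the binary is encrypted
- LC_LOAD_DYLIB: information about a dynamic library to load
- LC_CODE_SIGNATURE: code signature information
A load command whose cmd is LC_SEGMENT has the following struct: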
struct segment_command { /* for 32-bit architectures */
  uint32_t  cmd;         /* command type */
  uint32_t  cmdsize;     /* command size */
  char      segname[16]; /* segment name */
  uint32_t  vmaddr;      /* virtual memory address */
  uint32_t  vmsize;      /* size in virtual memory */
  uint32_t  fileoff;     /* offset within the Mach-O file */
  uint32_t  filesize;    /* size within the Mach-O file */
  vm_prot_t maxprot;     /* maximum memory protection for the segment's pages */
  vm_prot_t initprot;    /* initial memory protection for the segment's pages */
  uint32_t  nsects;      /* number of sections in the segment */
  uint32_t  flags;       /* flags */
};
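__TEXT, __DATA, and __RODATA each contain a number of sections.
A few sections are relevant to today's topic:
__TEXT: __stubs  stubs used for dynamic library linking
__TEXT: __stub_helper  helper code for those stubs
__DATA: __la_symbol_ptr  lazy symbol pointer table
__DATA: __nl_symbol_ptr  non-lazy symbol pointer table
The struct for a section: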
struct section { /* for 32-bit architectures */
  char     sectname[16]; /* section name */
  char     segname[16];  /* name of the segment this section belongs to */
  uint32_t addr;         /* virtual memory address */
  uint32_t size;         /* section size */
  uint32_t offset;       /* offset within the Mach-O file */
  uint32_t align;        /* alignment, as a power of 2 */
  uint32_t reloff;       /* file offset of the relocation entries */
  uint32_t nreloc;       /* number of relocation entries */
  uint32_t flags;        /* section type and attributes */
  uint32_t reserved1;    /* reserved (for offset or index) */
  uint32_t reserved2;    /* reserved (for count or sizeof) */
};
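The flags field packs the section type into its low byte and attribute bits into the rest, so code that looks for the symbol pointer sections masks before comparing. A minimal sketch, with the constant values repeated from mach/loader.h and is_symbol_pointer_section as a hypothetical helper name:

```c
#include <stdint.h>

/* Values from mach/loader.h, repeated so the sketch builds anywhere. */
#define SECTION_TYPE               0x000000ffu
#define S_NON_LAZY_SYMBOL_POINTERS 0x6u
#define S_LAZY_SYMBOL_POINTERS     0x7u

/* Hypothetical helper: returns 1 when `flags` describes a lazy or
 * non-lazy symbol pointer section, 0 otherwise. */
int is_symbol_pointer_section(uint32_t flags) {
  uint32_t type = flags & SECTION_TYPE; /* drop the attribute bits */
  return type == S_LAZY_SYMBOL_POINTERS ||
         type == S_NON_LAZY_SYMBOL_POINTERS;
}
```

Checking the type rather than the section name is what lets the same test work even if a toolchain names the pointer section differently.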
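How fishhook works
When an app calls a function from an external library, that library lives in shared memory as a dynamic library, and because of ASLR its load address gets a random offset on every launch. The address of the library function therefore cannot be known at compile time.
So, to reach external library functions reliably, Apple uses a technique called PIC (position-independent code): instead of jumping to a fixed address, the code goes through a pointer in the data segment that dyld fills in at run time.
For the non-lazy symbol pointer table __nl_symbol_ptr, dyld binds the symbols immediately when the image is loaded.
For the lazy symbol pointer table __la_symbol_ptr, the build reserves one pointer per external function in the __la_symbol_ptr section of the Mach-O data segment. In the classic lazy-binding scheme this pointer does not yet hold the function's real address; it initially leads into __stub_helper, and dyld later sets it to the address of the external library function.
For a lazily bound external function, the first call made by the running app goes through the chain __stubs -> __la_symbol_ptr -> __stub_helper -> __nl_symbol_ptr to obtain the function's address. dyld then writes that address into the __la_symbol_ptr entry, so later calls jump straight to the library function (which is why __la_symbol_ptr is called the lazy binding section).
__nl_symbol_ptr and __la_symbol_ptr live in the DATA segment of the Mach-O file, which is writable. We can modify the entries in these two sections so that they point at our own functions, which achieves the hook. fishhook is built on exactly this principle.
Note: calls to a program's own C functions are resolved at build time to fixed locations in the __TEXT code segment and never go through these pointer tables, so there is nothing to rewrite. This is why fishhook cannot hook internal C functions!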
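The idea can be modelled in plain C without any Mach-O parsing. In the sketch below a single function pointer plays the role of one __la_symbol_ptr entry, and "rebinding" just means saving the old value and storing a new one. All names (real_add, add_slot, my_add, rebind_add, call_add) are hypothetical and exist only for this illustration.

```c
/* Stand-in for a function that lives in an external library. */
static int real_add(int a, int b) { return a + b; }

/* Stand-in for one __la_symbol_ptr entry: every call goes through it. */
static int (*add_slot)(int, int) = real_add;

/* Receives the original address, like orig_close in fishhook's usage. */
static int (*orig_add)(int, int);

/* Replacement: forwards to the original and marks the result. */
static int my_add(int a, int b) {
  return orig_add(a, b) + 1000;
}

/* Hypothetical rebind of the single slot. The guard keeps a second
 * call from overwriting orig_add with the replacement itself. */
void rebind_add(void) {
  if (add_slot != my_add) {
    orig_add = add_slot;
  }
  add_slot = my_add;
}

/* Callers never name real_add directly; they always use the slot. */
int call_add(int a, int b) { return add_slot(a, b); }
```

A function that is called directly, without a slot in between, has no pointer to swap, which mirrors why internal C functions are out of fishhook's reach.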
fishhook's official diagram
Usage (taken from GitHub):
#import <dlfcn.h>
#import <UIKit/UIKit.h>
#import "AppDelegate.h"
#import "fishhook.h"

// Receives the address of the original system close function
static int (*orig_close)(int);
// Receives the address of the original system open function
static int (*orig_open)(const char *, int, ...);

// Replacement for the system close function
int my_close(int fd) {
  printf("Calling real close(%d)\n", fd);
  return orig_close(fd);
}

// Replacement for the system open function
int my_open(const char *path, int oflag, ...) {
  va_list ap = {0};
  mode_t mode = 0;
  if ((oflag & O_CREAT) != 0) {
    // mode only applies to O_CREAT
    va_start(ap, oflag);
    mode = va_arg(ap, int);
    va_end(ap);
    printf("Calling real open('%s', %d, %d)\n", path, oflag, mode);
    return orig_open(path, oflag, mode);
  } else {
    printf("Calling real open('%s', %d)\n", path, oflag);
    return orig_open(path, oflag, mode);
  }
}

int main(int argc, char * argv[])
{
  @autoreleasepool {
    rebind_symbols((struct rebinding[2]){{"close", my_close, (void *)&orig_close}, {"open", my_open, (void *)&orig_open}}, 2);
    // Open our own binary and print out first 4 bytes (which is the same
    // for all Mach-O binaries on a given architecture)
    int fd = open(argv[0], O_RDONLY);
    uint32_t magic_number = 0;
    read(fd, &magic_number, 4);
    printf("Mach-O Magic Number: %x \n", magic_number);
    close(fd);
    return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
  }
}
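Source code walkthrough:
// Describes one rebinding
struct rebinding {
  const char *name;   // name of the function to rebind
  void *replacement;  // pointer to the replacement function
  void **replaced;    // where to store the address of the original function
};
// A node of the linked list of rebindings
struct rebindings_entry {
  struct rebinding *rebindings;  // pointer to an array of rebinding structs
  size_t rebindings_nel;         // number of functions to rebind
  struct rebindings_entry *next; // next node in the list
};
// Global head of the rebindings linked list
static struct rebindings_entry *_rebindings_head;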
static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
  // Allocate a new list node
  struct rebindings_entry *new_entry = (struct rebindings_entry *) malloc(sizeof(struct rebindings_entry));
  if (!new_entry) {
    return -1;
  }
  new_entry->rebindings = (struct rebinding *) malloc(sizeof(struct rebinding) * nel);
  if (!new_entry->rebindings) {
    free(new_entry);
    return -1;
  }
  // Copy the caller's rebinding structs into the new node's rebindings array
  memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
  // Record how many functions this node rebinds
  new_entry->rebindings_nel = nel;
  // The new node becomes the head, followed by the existing list
  new_entry->next = *rebindings_head;
  *rebindings_head = new_entry;
  return 0;
}
int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
  // Add a new node at the head of the list
  int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
  if (retval < 0) {
    return retval;
  }
  if (!_rebindings_head->next) {
    // First call: register the dyld callback for added images
    // (dyld also invokes it for the images that are already loaded)
    _dyld_register_func_for_add_image(_rebind_symbols_for_image);
  } else {
    // Later calls: apply the rebindings to every image already loaded
    uint32_t c = _dyld_image_count();
    for (uint32_t i = 0; i < c; i++) {
      _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
    }
  }
  return retval;
}
// Callback invoked by dyld
// const struct mach_header *header: address of the image's mach_header
// intptr_t slide: the random ASLR offset (slide) of the image
static void _rebind_symbols_for_image(const struct mach_header *header,
                                      intptr_t slide) {
  rebind_symbols_for_image(_rebindings_head, header, slide);
}
static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  Dl_info info;
  if (dladdr(header, &info) == 0) {
    return;
  }
  segment_command_t *cur_seg_cmd;
  segment_command_t *linkedit_segment = NULL;
  struct symtab_command* symtab_cmd = NULL;
  struct dysymtab_command* dysymtab_cmd = NULL;
  // Start address of the load commands
  uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
  // Walk the load commands to find linkedit_segment, symtab_cmd, and dysymtab_cmd
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    // LC_SEGMENT command
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      // __LINKEDIT segment
      if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
        // Remember the __LINKEDIT segment command
        linkedit_segment = cur_seg_cmd;
      }
    } else if (cur_seg_cmd->cmd == LC_SYMTAB) { // symbol table
      // Remember the symtab command
      symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) { // dynamic symbol table
      // Remember the dysymtab command
      dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
  }
  if (!symtab_cmd || !dysymtab_cmd || !linkedit_segment ||
      !dysymtab_cmd->nindirectsyms) {
    return;
  }
  // Base address to which file offsets are added to reach data stored in
  // __LINKEDIT, with the ASLR slide applied
  uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
  // Locate the symbol table
  nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
  // Locate the string table
  char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);
  // Locate the indirect symbol table
  uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);
  cur = (uintptr_t)header + sizeof(mach_header_t);
  // Walk the load commands a second time
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    // LC_SEGMENT
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      // Skip segments other than __DATA and __DATA_CONST
      if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
          strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
        continue;
      }
      // Walk the sections of the data segment
      for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
        section_t *sect =
          (section_t *)(cur + sizeof(segment_command_t)) + j;
        // __la_symbol_ptr section
        if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
        // __nl_symbol_ptr section
        if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
      }
    }
  }
}
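The linkedit_base expression is easier to follow with numbers. symoff, stroff, and indirectsymoff are offsets from the start of the file, while the data they refer to has been mapped at the slid address of __LINKEDIT. The sketch below repeats the arithmetic with 64-bit integers; file_offset_to_address is a hypothetical helper name and the values in the usage note are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: run-time address of the byte stored at
 * `file_offset`, for data inside a segment that starts at file offset
 * `seg_fileoff` and at unslid virtual address `seg_vmaddr`. */
uint64_t file_offset_to_address(uint64_t slide, uint64_t seg_vmaddr,
                                uint64_t seg_fileoff, uint64_t file_offset) {
  /* Same expression as linkedit_base in fishhook */
  uint64_t base = slide + seg_vmaddr - seg_fileoff;
  return base + file_offset;
}
```

With a slide of 0x4000, a segment at vmaddr 0x100010000 and fileoff 0xC000, and a table at file offset 0xC100, the base is 0x100008000 and the table lands at 0x100014100, that is, 0x100 bytes into the slid segment.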
// fishhook finds the function pointer by symbol name and replaces it
static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
  // Start of this section's entries in the indirect symbol table
  // (reserved1 holds the index of the first entry)
  uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
  // Run-time address of the __la_symbol_ptr or __nl_symbol_ptr section
  void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
  // Work out how many pointers the section holds and visit each one
  for (uint i = 0; i < section->size / sizeof(void *); i++) {
    uint32_t symtab_index = indirect_symbol_indices[i];
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
      continue;
    }
    // Offset of the symbol's name in the string table
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    // The symbol's name
    char *symbol_name = strtab + strtab_offset;
    // A symbol name starts with '_' and ends with '\0', so a usable name
    // needs '_' plus at least one more character
    bool symbol_name_longer_than_1 = symbol_name[0] && symbol_name[1];
    struct rebindings_entry *cur = rebindings;
    while (cur) {
      // Walk the list: install the new implementation, save the old one
      for (uint j = 0; j < cur->rebindings_nel; j++) {
        // Compare names, skipping the leading '_'; strcmp returns 0 on a match
        if (symbol_name_longer_than_1 &&
            strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
          if (cur->rebindings[j].replaced != NULL &&
              indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
            // Save the function pointer that was bound before
            *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
          }
          // Install the replacement function pointer
          indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
          goto symbol_loop;
        }
      }
      cur = cur->next;
    }
  symbol_loop:;
  }
}
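The lookup chain in this function (pointer slot i -> indirect symbol table -> symbol table -> string table -> name) can be tried out on hand-built tables. The sketch below is a simplified model, not fishhook's code: mini_nlist keeps only the string-table offset, rebind_in_section is a hypothetical name, it handles a single rebinding instead of the linked list, and it does not skip INDIRECT_SYMBOL_ABS or INDIRECT_SYMBOL_LOCAL entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for nlist: only the string-table offset is kept. */
struct mini_nlist {
  uint32_t strx;
};

/* Hypothetical helper: rewrite every slot whose symbol is `name`.
 * slots[i] plays the role of indirect_symbol_bindings[i] and
 * indirect_indices[i] is the symbol-table index of that slot.
 * Returns the number of slots rewritten. */
int rebind_in_section(void **slots, const uint32_t *indirect_indices,
                      size_t nslots, const struct mini_nlist *symtab,
                      const char *strtab, const char *name,
                      void *replacement, void **replaced) {
  int hits = 0;
  for (size_t i = 0; i < nslots; i++) {
    const char *sym = strtab + symtab[indirect_indices[i]].strx;
    /* Need the leading '_' plus at least one more character */
    if (sym[0] == '\0' || sym[1] == '\0') continue;
    /* Compare without the leading underscore */
    if (strcmp(sym + 1, name) != 0) continue;
    /* Save the old pointer unless the slot is already hooked */
    if (replaced != NULL && slots[i] != replacement) *replaced = slots[i];
    slots[i] = replacement;
    hits++;
  }
  return hits;
}
```

For example, with the string table "\0_open\0_close", a symbol table holding the offsets 7 and 1, and indirect indices {1, 0}, asking for "close" rewrites only the second slot and stores its previous value through replaced.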