人生无根蒂,飘如陌上尘。 分散逐风转,此已非常身。
— 陶渊明 《杂诗》
mach-o
格式是OS X系统上的可执行文件格式,类似于windows的PE
与linux的ELF
,如果不彻底搞清楚mach-o
的格式与相关知识,去做其他研究,无异于建造空中阁楼。
每个Mach-O文件斗包含一个Mach-O头,然后是载入命令(Load Commands),最后是数据块(Data)。
接下来就对整个Mach-O的格式做出详细的分析。
Mach-O文件的格式如下图所示:
又如下几个部分组成:
Headers的定义可以在开源的内核代码中找到。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
/* * The 32-bit mach header appears at the very beginning of the object file for * 32-bit architectures. */ struct mach_header { uint32_t magic; /* mach magic number identifier */ cpu_type_t cputype; /* cpu specifier */ cpu_subtype_t cpusubtype; /* machine specifier */ uint32_t filetype; /* type of file */ uint32_t ncmds; /* number of load commands */ uint32_t sizeofcmds; /* the size of all the load commands */ uint32_t flags; /* flags */ }; /* Constant for the magic field of the mach_header (32-bit architectures) */ #define MH_MAGIC 0xfeedface /* the mach magic number */ #define MH_CIGAM 0xcefaedfe /* NXSwapInt(MH_MAGIC) */ /* * The 64-bit mach header appears at the very beginning of object files for * 64-bit architectures. */ struct mach_header_64 { uint32_t magic; /* mach magic number identifier */ cpu_type_t cputype; /* cpu specifier */ cpu_subtype_t cpusubtype; /* machine specifier */ uint32_t filetype; /* type of file */ uint32_t ncmds; /* number of load commands */ uint32_t sizeofcmds; /* the size of all the load commands */ uint32_t flags; /* flags */ uint32_t reserved; /* reserved */ }; /* Constant for the magic field of the mach_header_64 (64-bit architectures) */ #define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */ #define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */ |
根据mach_header
与mach_header_64
的定义,很明显可以看出,Headers的主要作用就是帮助系统迅速的定位Mach-O文件的运行环境,文件类型。
使用工具分析一个mach-o文件来具体的看一下Mach-O Headers。
通过otool可以得到Mach header的具体的情况,但是可读性略微有一点差。
1 2 3 4 5 |
➜ bin otool -h git git: Mach header magic cputype cpusubtype caps filetype ncmds sizeofcmds flags 0xfeedfacf 16777223 3 0x80 2 17 1432 0x00200085 |
还有一个工具是MachOview可以看的更清楚一点。
因为Mach-O文件不仅仅用来实现可执行文件,同时还用来实现了其他内容
他的源码定义如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
#define MH_OBJECT 0x1 /* relocatable object file */ #define MH_EXECUTE 0x2 /* demand paged executable file */ #define MH_FVMLIB 0x3 /* fixed VM shared library file */ #define MH_CORE 0x4 /* core file */ #define MH_PRELOAD 0x5 /* preloaded executable file */ #define MH_DYLIB 0x6 /* dynamically bound shared library */ #define MH_DYLINKER 0x7 /* dynamic link editor */ #define MH_BUNDLE 0x8 /* dynamically bound bundle file */ #define MH_DYLIB_STUB 0x9 /* shared library stub for static */ /* linking only, no section contents */ #define MH_DSYM 0xa /* companion file with only debug */ /* sections */ #define MH_KEXT_BUNDLE 0xb /* x86_64 kexts */ |
解释一下一些常用到的文件类型。
File Type | 用处 | 例子 |
---|---|---|
MH_OBJECT | 编译过程中产生的*.obj文件 | gcc -c xxx.c 生成xxx.o文件 |
MH_EXECUTABLE | 可执行二进制文件 | /usr/bin/git |
MH_CORE | CoreDump | 崩溃时的Dump文件 |
MH_DYLIB | 动态库 | /usr/lib/里面的那些库文件 |
MH_DYLINKER | 连接器linker | /usr/lib/dyld文件 |
MH_KEXT_BUNDLE | 内核扩展文件 | 自己开发的简单内核模块 |
Mach-O headers还包含了一些很重要的dyld的加载参数。代码中的定义如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
#define MH_INCRLINK 0x2 /* the object file is the output of an incremental link against a base file and can't be link edited again */ #define MH_DYLDLINK 0x4 /* the object file is input for the dynamic linker and can't be staticly link edited again */ #define MH_BINDATLOAD 0x8 /* the object file's undefined references are bound by the dynamic linker when loaded. */ #define MH_PREBOUND 0x10 /* the file has its dynamic undefined references prebound. */ #define MH_SPLIT_SEGS 0x20 /* the file has its read-only and read-write segments split */ #define MH_LAZY_INIT 0x40 /* the shared library init routine is to be run lazily via catching memory faults to its writeable segments (obsolete) */ #define MH_TWOLEVEL 0x80 /* the image is using two-level name space bindings */ ... //太长,有兴趣可以自己看源码 // EXTERNAL_HEADERS/mach-o/x86_64/loader.h |
同样简单的介绍几个比较重要的。
Flag Type | 含义 |
---|---|
MH_NOUNDEFS | 目标没有未定义的符号,不存在链接依赖 |
MH_DYLDLINK | 该目标文件是dyld的输入文件,无法被再次的静态链接 |
MH_PIE | 允许随机的地址空间 |
MH_ALLOW_STACK_EXECUTION | 栈内存可执行代码,一般是默认关闭的。 |
MH_NO_HEAP_EXECUTION | 堆内存无法执行代码 |
这是load_command的数据结构
1 2 3 4 |
struct load_command { uint32_t cmd; /* type of load command */ uint32_t cmdsize; /* total size of command in bytes */ }; |
Load Commands 直接就跟在Header后面,所有command占用内存的总和在Mach-O Header里面已经给出了。在加载过Header之后就是通过解析LoadCommand来加载接下来的数据了。我简单的看了一下内核中是如何解析macho数据的,抛开内核的实现细节,逻辑其实也十分简单。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 |
static load_return_t parse_machfile( struct vnode *vp, vm_map_t map, thread_t thread, struct mach_header *header, off_t file_offset, off_t macho_size, int depth, int64_t aslr_offset, int64_t dyld_aslr_offset, load_result_t *result ) { [...] //此处省略大量初始化与检测 /* * Loop through each of the load_commands indicated by the * Mach-O header; if an absurd value is provided, we just * run off the end of the reserved section by incrementing * the offset too far, so we are implicitly fail-safe. */ offset = mach_header_sz; ncmds = header->ncmds; while (ncmds--) { /* * Get a pointer to the command. */ lcp = (struct load_command *)(addr + offset); //lcp设为当前要解析的cmd的地址 oldoffset = offset; //oldoffset是从macho文件内存开始的地方偏移到当前command的偏移量 offset += lcp->cmdsize; //重新计算offset,再加上当前command的长度,offset的值为文件内存起始地址到下一个command的偏移量 /* * Perform prevalidation of the struct load_command * before we attempt to use its contents. Invalid * values are ones which result in an overflow, or * which can not possibly be valid commands, or which * straddle or exist past the reserved section at the * start of the image. */ if (oldoffset > offset || lcp->cmdsize < sizeof(struct load_command) || offset > header->sizeofcmds + mach_header_sz) { ret = LOAD_BADMACHO; break; } //做了一个检测,与如何加载进入内存无关 /* * Act on struct load_command's for which kernel * intervention is required. */ switch(lcp->cmd) { case LC_SEGMENT: [...] ret = load_segment(lcp, header->filetype, control, file_offset, macho_size, vp, map, slide, result); break; case LC_SEGMENT_64: [...] ret = load_segment(lcp, header->filetype, control, file_offset, macho_size, vp, map, slide, result); break; case LC_UNIXTHREAD: if (pass != 1) break; ret = load_unixthread( (struct thread_command *) lcp, thread, slide, result); break; case LC_MAIN: if (pass != 1) break; if (depth != 1) break; ret = load_main( (struct entry_point_command *) lcp, thread, slide, result); break; case LC_LOAD_DYLINKER: if (pass != 3) break; if ((depth == 1) && (dlp == 0)) { dlp = (struct dylinker_command *)lcp; dlarchbits = (header->cputype & CPU_ARCH_MASK); } else { ret = LOAD_FAILURE; } break; case LC_UUID: if (pass == 1 && depth == 1) { ret = load_uuid((struct uuid_command *) lcp, (char *)addr + mach_header_sz + header->sizeofcmds, result); } break; case LC_CODE_SIGNATURE: [...] ret = load_code_signature( (struct linkedit_data_command *) lcp, vp, file_offset, macho_size, header->cputype, result); [...] break; #if CONFIG_CODE_DECRYPTION case LC_ENCRYPTION_INFO: case LC_ENCRYPTION_INFO_64: if (pass != 3) break; ret = set_code_unprotect( (struct encryption_info_command *) lcp, addr, map, slide, vp, file_offset, header->cputype, header->cpusubtype); if (ret != LOAD_SUCCESS) { printf("proc %d: set_code_unprotect() error %d " "for file \"%s\"\n", p->p_pid, ret, vp->v_name); /* * Don't let the app run if it's * encrypted but we failed to set up the * decrypter. If the keys are missing it will * return LOAD_DECRYPTFAIL. */ if (ret == LOAD_DECRYPTFAIL) { /* failed to load due to missing FP keys */ proc_lock(p); p->p_lflag |= P_LTERM_DECRYPTFAIL; proc_unlock(p); } psignal(p, SIGKILL); } break; #endif default: /* Other commands are ignored by the kernel */ ret = LOAD_SUCCESS; break; } if (ret != LOAD_SUCCESS) break; } if (ret != LOAD_SUCCESS) break; } [...] //此处略去加载之后的处理代码 } |
这里主要看while循环刚刚进入的时候几行代码,来理解是如何通过load_command的cmd字段来解析Macho文件的数据。
1 2 3 4 5 6 7 8 |
... lcp = (struct load_command *)(addr + offset); //lcp设为当前要解析的cmd的地址 oldoffset = offset; //oldoffset是从macho文件内存开始的地方偏移到当前command的偏移量 offset += lcp->cmdsize; //重新计算offset,再加上当前command的长度,offset的值为文件内存起始地址到下一个command的偏移量 ... |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 |
switch(lcp->cmd) { case LC_SEGMENT: [...] ret = load_segment(lcp, header->filetype, control, file_offset, macho_size, vp, map, slide, result); break; case LC_SEGMENT_64: [...] ret = load_segment(lcp, header->filetype, control, file_offset, macho_size, vp, map, slide, result); break; case LC_UNIXTHREAD: if (pass != 1) break; ret = load_unixthread( (struct thread_command *) lcp, thread, slide, result); break; case LC_MAIN: if (pass != 1) break; if (depth != 1) break; ret = load_main( (struct entry_point_command *) lcp, thread, slide, result); break; case LC_LOAD_DYLINKER: if (pass != 3) break; if ((depth == 1) && (dlp == 0)) { dlp = (struct dylinker_command *)lcp; dlarchbits = (header->cputype & CPU_ARCH_MASK); } else { ret = LOAD_FAILURE; } break; case LC_UUID: if (pass == 1 && depth == 1) { ret = load_uuid((struct uuid_command *) lcp, (char *)addr + mach_header_sz + header->sizeofcmds, result); } break; case LC_CODE_SIGNATURE: [...] ret = load_code_signature( (struct linkedit_data_command *) lcp, vp, file_offset, macho_size, header->cputype, result); [...] break; #if CONFIG_CODE_DECRYPTION case LC_ENCRYPTION_INFO: case LC_ENCRYPTION_INFO_64: if (pass != 3) break; ret = set_code_unprotect( (struct encryption_info_command *) lcp, addr, map, slide, vp, file_offset, header->cputype, header->cpusubtype); if (ret != LOAD_SUCCESS) { printf("proc %d: set_code_unprotect() error %d " "for file \"%s\"\n", p->p_pid, ret, vp->v_name); /* * Don't let the app run if it's * encrypted but we failed to set up the * decrypter. If the keys are missing it will * return LOAD_DECRYPTFAIL. */ if (ret == LOAD_DECRYPTFAIL) { /* failed to load due to missing FP keys */ proc_lock(p); p->p_lflag |= P_LTERM_DECRYPTFAIL; proc_unlock(p); } psignal(p, SIGKILL); } break; #endif default: /* Other commands are ignored by the kernel */ ret = LOAD_SUCCESS; break; } |
从这一段代码可以看出,根据cmd字段的类型不同,使用了不同的函数来加载。简单的列出一张表看一看在内核代码中不同的command类型都有哪些作用。
Command类型 | 处理函数 | 用途 |
---|---|---|
LC_SEGMENT;LC_SEGMENT_64 | load_segment | 将segment中的数据加载并映射到进程的内存空间去 |
LC_LOAD_DYLINKER | load_dylinker | 调用/usr/lib/dyld程序 |
LC_UUID | load_uuid | 加载128-bit的唯一ID |
LC_THREAD | load_thread | 开启一个MACH线程,但是不分配栈空间。 |
LC_UNIXTHREAD | load_unixthread | 开启一个UNIX线程 |
LC_CODE_SIGNATURE | load_code_signature | 进行数字签名 |
LC_ENCRYPTION_INFO | set_code_unprotect | 加密二进制文件 |
加载数据时,主要加载的就是LC_SEGMET活着LC_SEGMENT_64。其他的Segment的用途在上一节已经简单的介绍了,这里不做深究。
LCSEGMENT以及LC_SEGMENT_64的数据结构是这样的。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
struct segment_command { /* for 32-bit architectures */ uint32_t cmd; /* LC_SEGMENT */ uint32_t cmdsize; /* includes sizeof section structs */ char segname[16]; /* segment name */ uint32_t vmaddr; /* memory address of this segment */ uint32_t vmsize; /* memory size of this segment */ uint32_t fileoff; /* file offset of this segment */ uint32_t filesize; /* amount to map from the file */ vm_prot_t maxprot; /* maximum VM protection */ vm_prot_t initprot; /* initial VM protection */ uint32_t nsects; /* number of sections in segment */ uint32_t flags; /* flags */ }; struct segment_command_64 { /* for 64-bit architectures */ uint32_t cmd; /* LC_SEGMENT_64 */ uint32_t cmdsize; /* includes sizeof section_64 structs */ char segname[16]; /* segment name */ uint64_t vmaddr; /* memory address of this segment */ uint64_t vmsize; /* memory size of this segment */ uint64_t fileoff; /* file offset of this segment */ uint64_t filesize; /* amount to map from the file */ vm_prot_t maxprot; /* maximum VM protection */ vm_prot_t initprot; /* initial VM protection */ uint32_t nsects; /* number of sections in segment */ uint32_t flags; /* flags */ }; |
可以看出,这里大部分的数据是用来帮助内核将Segment映射到虚拟内存的。主要要关注的是nsects
字段,标示了Segment中有多少secetion。section是具体有用的数据存放的地方。
Section的数据结构如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
struct section { /* for 32-bit architectures */ char sectname[16]; /* name of this section */ char segname[16]; /* segment this section goes in */ uint32_t addr; /* memory address of this section */ uint32_t size; /* size in bytes of this section */ uint32_t offset; /* file offset of this section */ uint32_t align; /* section alignment (power of 2) */ uint32_t reloff; /* file offset of relocation entries */ uint32_t nreloc; /* number of relocation entries */ uint32_t flags; /* flags (section type and attributes)*/ uint32_t reserved1; /* reserved (for offset or index) */ uint32_t reserved2; /* reserved (for count or sizeof) */ }; struct section_64 { /* for 64-bit architectures */ char sectname[16]; /* name of this section */ char segname[16]; /* segment this section goes in */ uint64_t addr; /* memory address of this section */ uint64_t size; /* size in bytes of this section */ uint32_t offset; /* file offset of this section */ uint32_t align; /* section alignment (power of 2) */ uint32_t reloff; /* file offset of relocation entries */ uint32_t nreloc; /* number of relocation entries */ uint32_t flags; /* flags (section type and attributes)*/ uint32_t reserved1; /* reserved (for offset or index) */ uint32_t reserved2; /* reserved (for count or sizeof) */ uint32_t reserved3; /* reserved */ }; |
除了同样有帮助内存映射的变量外,在了解Mach-O格式的时候,只需要知道不同的Section有着不同的作用就可以了。
Section | 作用 |
---|---|
__text | 代码 |
__cstring | 硬编码的字符串 |
__const | const 关键词修饰过的变量 |
__DATA.__bss | bss段 |
因为section类型已经是最小的分类了,还有更多复杂section段就不一一例举了,遇到没见过的section类型可以自行查找Apple文档。
通过对Mach-O格式的仔细分析,可以更好的理解Mach-O文件的加载过程,为研究dyld或者其他OS X系统下的模块打好基础。
1.mach-o文件加载的全过程(1)
http://dongaxis.github.io/2015/01/01/mac-o%E6%96%87%E4%BB%B6%E5%8A%A0%E8%BD%BD%E7%9A%84%E5%85%A8%E8%BF%87%E7%A8%8B-1/
2.Mach-O 可执行文件
http://objccn.io/issue-6-3/
3.iPhone Mach-O文件格式与代码签名
http://zhiwei.li/text/2012/02/15/iphone-mach-o%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F%E4%B8%8E%E4%BB%A3%E7%A0%81%E7%AD%BE%E5%90%8D/
4.Dynamic Linking of Imported Functions in Mach-O
http://www.codeproject.com/Articles/187181/Dynamic-Linking-of-Imported-Functions-in-Mach-O
5.otool详解Mach-o文件头部
http://www.mc2lab.com/?p=68
原文地址: http://turingh.github.io/2016/03/07/mach-o%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F%E5%88%86%E6%9E%90/