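In an IDE such as Visual Studio or Qt Creator there is usually a button called "Build". When editing is done and it is time to run and test, one click gets the program running, so we rarely think about compiling and linking. In fact, compiling and linking taken together are what is called a build. Behind that single click lies a remarkably complex process.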
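"Object file" has no really fitting Chinese name; the book translates it as 目标文件. This subsection covers the most important things to know about object files: file types, the concept of sections, and the common inspection tools.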
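An object file is the intermediate file produced by compiling source code without linking it (.obj on Windows, .o on Linux). Its content and structure closely resemble those of an executable, so it is normally stored in the same format as executables. Dynamic libraries (.dll on Windows, .so on Linux) are stored in that format too, and static libraries (.lib on Windows, .a on Linux) are archives that bundle object files in that format. Taking ELF as the example, this gives ELF four file types: relocatable file, executable file, shared object file, and core dump file.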
The file command shows the type of each kind of file:
$ file openfile.c
openfile.c: C source, ASCII text, with CRLF line terminators
$ gcc -c openfile.c
$ file openfile.o
openfile.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
$ gcc -o openfile openfile.c
$ file openfile
openfile: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=086b5fbe84778f5683f7ef4dbd710fe2837370db, not stripped
$ file /lib/i386-linux-gnu/ld-2.19.so
/lib/i386-linux-gnu/ld-2.19.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, BuildID[sha1]=12f5bdcd6abd5fa411d5db326afcece6044621c4, stripped
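To describe what an object file looks like, the first and most important concept to understand is the "section" (Segment or Section). For example, compiled machine instructions go into the code section .text, while initialized global variables and local static variables go into the data section .data. Uninitialized global variables and local static variables default to 0; they could all be stored in .data, but to save space in the file they are placed in a separate .bss section, which only records how much room they need. The ELF file header records the file type, the entry address, the target hardware, and the location of the section table that describes every section in the file; the role of the section table is analyzed in detail later.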
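Why are the code section and the data section kept separate?
1) Permissions: once the program is loaded, data and instructions are mapped to two separate regions, which can be set to read-write and read-only respectively.
2) Caching: separating instructions from data improves the program's locality and raises the cache hit rate.
3) Sharing: the separated instructions can be shared among multiple processes, which saves a great deal of memory.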
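An earlier post, 六星经典CSAPP-笔记(7)加载与链接(上), briefly listed in its section "2. Object file inspection tools" the tools commonly used when studying linking, such as nm, objdump, readelf, and ldd. Here I concentrate on the most common objdump and readelf commands.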
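The book uses a short program, SimpleSection.c, as its example. It contains all the usual elements: an external reference to printf(), an internal reference to func1(), an initialized global variable global_init_var, an uninitialized global variable global_uninit_var, a static variable static_var, an uninitialized static variable static_var2, and so on. The book uses it to study the structure of an ELF file in detail.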
/*
 * SimpleSection.c
 *
 * Linux:
 *   gcc -c SimpleSection.c
 *
 * Windows:
 *   cl SimpleSection.c /c /Za
 */
int printf(const char* format, ...);

int global_init_var = 84;
int global_uninit_var;

void func1(int i)
{
    printf("%d\n", i);
}

int main(void)
{
    static int static_var = 85;
    static int static_var2;

    int a = 1;
    int b;

    func1(static_var + static_var2 + a + b);

    return a;
}
Both objdump -f and readelf -h display the file header, but readelf gives more detail:
[root@localhost Temp]# readelf -h SimpleSection.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          284 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         11
  Section header string table index: 8
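/usr/include/elf.h defines the 32-bit and 64-bit versions of the ELF file header structure, and its fields correspond one-to-one with the readelf output above. The fields up to e_version are annotated below with their meanings; the fields after e_entry all concern the program headers and section headers and are studied in detail later: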
/* Type for a 16-bit quantity. */
typedef uint16_t Elf32_Half;
/* Types for signed and unsigned 32-bit quantities. */
typedef uint32_t Elf32_Word;
typedef int32_t Elf32_Sword;
/* Types for signed and unsigned 64-bit quantities. */
typedef uint64_t Elf32_Xword;
typedef int64_t Elf32_Sxword;
/* Type of addresses. */
typedef uint32_t Elf32_Addr;
/* Type of file offsets. */
typedef uint32_t Elf32_Off;
/* Type for section indices, which are 16-bit quantities. */
typedef uint16_t Elf32_Section;
/* Type for version symbol information. */
typedef Elf32_Half Elf32_Versym;
/* Structure definition of the 32-bit ELF file header */
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf32_Half e_type; /* Object file type */
Elf32_Half e_machine; /* Architecture */
Elf32_Word e_version; /* Object file version */
Elf32_Addr e_entry; /* Entry point virtual address */
Elf32_Off e_phoff; /* Program header table file offset */
Elf32_Off e_shoff; /* Section header table file offset */
Elf32_Word e_flags; /* Processor-specific flags */
Elf32_Half e_ehsize; /* ELF header size in bytes */
Elf32_Half e_phentsize; /* Program header table entry size */
Elf32_Half e_phnum; /* Program header table entry count */
Elf32_Half e_shentsize; /* Section header table entry size */
Elf32_Half e_shnum; /* Section header table entry count */
Elf32_Half e_shstrndx; /* Section header string table index */
} Elf32_Ehdr;
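/* e_ident: the 16 bytes 7f 45 4c 46 01 01 01 00 ... 00 consist of the following seven items; the index constants start with 'EI_': */
/* ELF magic number: 0x7f454c46. The four bytes are the DEL control character followed by the ASCII codes of the letters E, L, F. It lets the operating system quickly confirm the file type when loading an executable, and refuse to load the file otherwise */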
#define EI_MAG0 0 /* File identification byte 0 index */
#define ELFMAG0 0x7f /* Magic number byte 0 */
#define EI_MAG1 1 /* File identification byte 1 index */
#define ELFMAG1 'E' /* Magic number byte 1 */
#define EI_MAG2 2 /* File identification byte 2 index */
#define ELFMAG2 'L' /* Magic number byte 2 */
#define EI_MAG3 3 /* File identification byte 3 index */
#define ELFMAG3 'F' /* Magic number byte 3 */
/* Class: 0x01 corresponds to 'ELF32' */
#define EI_CLASS 4 /* File class byte index */
#define ELFCLASSNONE 0 /* Invalid class */
#define ELFCLASS32 1 /* 32-bit objects */
#define ELFCLASS64 2 /* 64-bit objects */
/* Data: 0x01 corresponds to '2's complement, little endian' */
#define EI_DATA 5 /* Data encoding byte index */
#define ELFDATANONE 0 /* Invalid data encoding */
#define ELFDATA2LSB 1 /* 2's complement, little endian */
#define ELFDATA2MSB 2 /* 2's complement, big endian */
/* Version: 0x01 corresponds to '1 (current)' */
#define EI_VERSION 6 /* File version byte index */
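/* OS/ABI: 0x00 corresponds to 'UNIX - System V'. ABI stands for Application Binary Interface, the interface through which the operating system, compiler, and library functions interact with an application */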
#define EI_OSABI 7 /* OS ABI identification */
#define ELFOSABI_NONE 0 /* UNIX System V ABI */
#define ELFOSABI_SYSV 0 /* Alias. */
#define ELFOSABI_HPUX 1 /* HP-UX */
#define ELFOSABI_NETBSD 2 /* NetBSD. */
#define ELFOSABI_LINUX 3 /* Linux. */
...
/* ABI Version: 0x00 corresponds to '0' */
#define EI_ABIVERSION 8 /* ABI version */
/* padding: the bytes from index 9 through 15 are all 0x00 */
#define EI_PAD 9 /* Byte index of padding bytes */
/* e_type gives the ELF file type; 0x01 corresponds to 'REL (Relocatable file)'. The related constants all start with 'ET_' */
#define ET_NONE 0 /* No file type */
#define ET_REL 1 /* Relocatable file */
#define ET_EXEC 2 /* Executable file */
#define ET_DYN 3 /* Shared object file */
#define ET_CORE 4 /* Core file */
...
/* e_machine gives the CPU platform; 0x03 corresponds to 'Intel 80386'. The related constants all start with 'EM_' */
#define EM_NONE 0 /* No machine */
#define EM_M32 1 /* AT&T WE 32100 */
#define EM_SPARC 2 /* SUN SPARC */
#define EM_386 3 /* Intel 80386 */
...
/* e_version gives the ELF version number; 0x01 corresponds to '0x1' */
#define EV_NONE 0 /* Invalid ELF version */
#define EV_CURRENT 1 /* Current version */
#define EV_NUM 2
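The e_ident checks described in these comments can be sketched in a few lines of C. This is only an illustration, not how any particular loader does it: the function names and the enum are made up for this post, and the byte indices 4 and 5 are the EI_CLASS and EI_DATA values listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative result codes; these names are not part of elf.h. */
enum ident_class { IDENT_INVALID = 0, IDENT_ELF32 = 1, IDENT_ELF64 = 2 };

/*
 * Returns the ELF class stored at e_ident[4] (EI_CLASS), or
 * IDENT_INVALID when the first four bytes are not 0x7f 'E' 'L' 'F'.
 */
enum ident_class ident_class_of(const unsigned char ident[16])
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(ident != NULL);
    if (memcmp(ident, magic, sizeof magic) != 0)
        return IDENT_INVALID;
    if (ident[4] == 1)          /* ELFCLASS32 */
        return IDENT_ELF32;
    if (ident[4] == 2)          /* ELFCLASS64 */
        return IDENT_ELF64;
    return IDENT_INVALID;
}

/* Returns 1 when e_ident[5] (EI_DATA) is ELFDATA2LSB, otherwise 0. */
int ident_is_little_endian(const unsigned char ident[16])
{
    assert(ident != NULL);
    return ident[5] == 1;
}
```

Fed the Magic line from the readelf output above, ident_class_of() reports the 32-bit class and ident_is_little_endian() returns 1, matching the Class and Data lines that readelf prints.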
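The section table is the most important structure after the file header. It stores information about every section, such as its name, length, and read/write permissions. The compiler, linker, and loader all rely on the section table to locate sections and read their attributes. Its position is given by the e_shoff value introduced above; in my SimpleSection.o, for example, the section table sits at offset 284 bytes (0x11c).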
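objdump -h lists only six sections (.text, .data, .bss, .rodata, .comment, and .note.GNU-stack) and leaves out auxiliary sections such as the symbol table, the string table, the section name string table, and the relocation table. So what readelf -S shows is the truly complete section table: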
[root@localhost Temp]# readelf -S SimpleSection.o
There are 11 section headers, starting at offset 0x11c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 00005b 00 AX 0 0 4
[ 2] .rel.text REL 00000000 00042c 000028 08 9 1 4
[ 3] .data PROGBITS 00000000 000090 000008 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000098 000004 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 000098 000004 00 A 0 0 1
[ 6] .comment PROGBITS 00000000 00009c 00002e 00 0 0 1
[ 7] .note.GNU-stack PROGBITS 00000000 0000ca 000000 00 0 0 1
[ 8] .shstrtab STRTAB 00000000 0000ca 000051 00 0 0 1
[ 9] .symtab SYMTAB 00000000 0002d4 0000f0 10 10 10 4
[10] .strtab STRTAB 00000000 0003c4 000066 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
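We saw above that the file header printed by readelf -h corresponds one-to-one with Elf32_Ehdr. In the same way, the section table printed by readelf -S corresponds to a structure called Elf32_Shdr, and the meanings of its values have matching constant definitions. Below, alongside the definition of Elf32_Shdr, the .text section serves as the example for explaining what each attribute means:
/* Structure definition of a 32-bit ELF section table descriptor */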
typedef struct
{
Elf32_Word sh_name; /* Section name (string tbl index) */
Elf32_Word sh_type; /* Section type */
Elf32_Word sh_flags; /* Section flags */
Elf32_Addr sh_addr; /* Section virtual addr at execution */
Elf32_Off sh_offset; /* Section file offset */
Elf32_Word sh_size; /* Section size in bytes */
Elf32_Word sh_link; /* Link to another section */
Elf32_Word sh_info; /* Additional section information */
Elf32_Word sh_addralign; /* Section alignment */
Elf32_Word sh_entsize; /* Entry size if section holds table */
} Elf32_Shdr;
/* Legal values for sh_type (section type). */
#define SHT_NULL 0 /* Section header table entry unused */
#define SHT_PROGBITS 1 /* Program data */
#define SHT_SYMTAB 2 /* Symbol table */
#define SHT_STRTAB 3 /* String table */
#define SHT_RELA 4 /* Relocation entries with addends */
#define SHT_HASH 5 /* Symbol hash table */
#define SHT_DYNAMIC 6 /* Dynamic linking information */
#define SHT_NOTE 7 /* Notes */
#define SHT_NOBITS 8 /* Program space with no data (bss) */
#define SHT_REL 9 /* Relocation entries, no addends */
#define SHT_SHLIB 10 /* Reserved */
#define SHT_DYNSYM 11 /* Dynamic linker symbol table */
...
/* Legal values for sh_flags (section flags). */
#define SHF_WRITE (1 << 0) /* Writable */
#define SHF_ALLOC (1 << 1) /* Occupies memory during execution */
#define SHF_EXECINSTR (1 << 2) /* Executable */
...
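The Flg column of readelf -S is just these three bits rendered as letters. Here is a minimal sketch of that mapping, covering only the W, A, and X bits shown above; the function name is my own and the other flag letters are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Renders the low three sh_flags bits as the letters readelf uses:
 * W (SHF_WRITE), A (SHF_ALLOC), X (SHF_EXECINSTR).
 * `out` must have room for at least 4 bytes including the terminator.
 */
void section_flags_to_string(uint32_t sh_flags, char out[4])
{
    size_t n = 0;

    assert(out != NULL);
    if (sh_flags & (1u << 0))
        out[n++] = 'W';
    if (sh_flags & (1u << 1))
        out[n++] = 'A';
    if (sh_flags & (1u << 2))
        out[n++] = 'X';
    out[n] = '\0';
}
```

For example, a flags value of 6 (SHF_ALLOC | SHF_EXECINSTR) gives "AX", which is what the table shows for .text, and 3 gives "WA" as for .data and .bss.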
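With the structure of the section table understood, we can dissect an ELF file completely, with a master butcher's precision. 《程序员的自我修养》 says that a truly remarkable programmer knows every byte of their own program inside out. Here is a chance to become one, so let's begin! We use hexdump -C -v as an aid, printing every byte of the ELF file in hexadecimal and as ASCII (-v prints every line, including identical adjacent lines that would otherwise be collapsed):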
00000000: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010: 01 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00
00000020: 1c 01 00 00 00 00 00 00 34 00 00 00 00 00 28 00
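To read multi-byte fields out of a dump like this by hand, remember that the file is little endian, so the bytes 1c 01 00 00 mean 0x0000011c. The sketch below pulls three header fields out of the raw bytes; the struct and function names are invented for illustration, and the field offsets (0x20, 0x28, 0x2e) come from adding up the member sizes of Elf32_Ehdr shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble little-endian integers from individual bytes. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical holder for the header fields used in this walkthrough. */
struct shdr_table_info {
    uint32_t shoff;      /* e_shoff:     file offset of the section table */
    uint16_t ehsize;     /* e_ehsize:    size of the ELF header itself    */
    uint16_t shentsize;  /* e_shentsize: size of one section descriptor   */
};

/* `hdr` points at file offset 0 and must cover at least 48 bytes. */
struct shdr_table_info locate_section_table(const unsigned char *hdr)
{
    struct shdr_table_info info;

    assert(hdr != NULL);
    info.shoff = read_le32(hdr + 0x20);
    info.ehsize = read_le16(hdr + 0x28);
    info.shentsize = read_le16(hdr + 0x2e);
    return info;
}
```

Applied to the three rows above, this gives shoff = 0x11c, ehsize = 52, and shentsize = 40, the same numbers readelf -h reported.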
The magic number and the other leading bytes were all explained earlier, so I won't go over them again. Focus on three places:
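Now let's see what the section table at 0x11c looks like. The first section is an invalid (null) one, so we skip the first 40 bytes and analyze the descriptor of the .text section, which starts at 0x11c + 0x28 = 0x144: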
00000110 2e 47 4e 55 2d 73 74 61 63 6b 00 00 00 00 00 00
00000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000140 00 00 00 00 1f 00 00 00 01 00 00 00 06 00 00 00
00000150 00 00 00 00 34 00 00 00 5b 00 00 00 00 00 00 00
00000160 00 00 00 00 04 00 00 00 00 00 00 00 1b 00 00 00
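The 40 bytes starting at 0x144 can be decoded field by field, since Elf32_Shdr is simply ten consecutive 32-bit little-endian words. The following is a rough sketch under that assumption; struct shdr32 and parse_shdr32 are names I made up, with the members in the same order as Elf32_Shdr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the ten 32-bit fields of Elf32_Shdr, in file order. */
struct shdr32 {
    uint32_t name, type, flags, addr, offset;
    uint32_t size, link, info, addralign, entsize;
};

/* Decode one 40-byte little-endian section descriptor. */
struct shdr32 parse_shdr32(const unsigned char raw[40])
{
    uint32_t w[10];
    struct shdr32 s;
    size_t i;

    assert(raw != NULL);
    for (i = 0; i < 10; i++) {
        const unsigned char *p = raw + 4 * i;
        w[i] = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
    s.name = w[0];
    s.type = w[1];
    s.flags = w[2];
    s.addr = w[3];
    s.offset = w[4];
    s.size = w[5];
    s.link = w[6];
    s.info = w[7];
    s.addralign = w[8];
    s.entsize = w[9];
    return s;
}
```

For the .text descriptor in the dump this yields name = 0x1f, type = 1 (SHT_PROGBITS), flags = 6 (AX), offset = 0x34, size = 0x5b, and addralign = 4, which lines up with the .text row printed by readelf -S.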
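First, let's see how a section name is found in the .shstrtab section. We know from the readelf output that the index of .shstrtab is 8, so there are 8 section descriptors before it and its descriptor should be at 284 (0x11c) + 8*40 = 604 (0x25c). A linker does not have to search for it either: the e_shstrndx field of the file header ("Section header string table index: 8") gives this index directly. Looking at the .shstrtab descriptor, we find that its offset is 0xca (stored little endian as the bytes ca 00 00 00).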
00000250 00 00 00 00 01 00 00 00 00 00 00 00 11 00 00 00
00000260 03 00 00 00 00 00 00 00 00 00 00 00 ca 00 00 00
00000270 51 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00000280 00 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00
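The address arithmetic used here generalizes to any descriptor: offset = e_shoff + index * e_shentsize. As a small sketch (the function name and the 0 / -1 return convention are my own choices):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Computes the file offset of section descriptor number `index`.
 * Returns 0 and stores the offset in *out on success, or -1 when the
 * index lies outside the table of e_shnum entries.
 */
int section_header_offset(uint32_t e_shoff, uint16_t e_shentsize,
                          uint16_t e_shnum, uint16_t index, uint32_t *out)
{
    assert(out != NULL);
    if (index >= e_shnum)
        return -1;
    *out = e_shoff + (uint32_t)e_shentsize * index;
    return 0;
}
```

With the values from this file (e_shoff = 0x11c, e_shentsize = 40, e_shnum = 11), index 8 gives 0x25c for .shstrtab and index 1 gives 0x144 for .text.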
Because the name offset of the .text section is 0x1f, its name is located at 0xca + 0x1f = 0xe9, where the bytes 2e 74 65 78 74 are exactly the ASCII string .text. Note two points here:
000000c0 34 2e 31 2e 32 2d 35 35 29 00 00 2e 73 79 6d 74 |4.1.2-55)...symt|
000000d0 61 62 00 2e 73 74 72 74 61 62 00 2e 73 68 73 74 |ab..strtab..shst|
000000e0 72 74 61 62 00 2e 72 65 6c 2e 74 65 78 74 00 2e |rtab..rel.text..|
000000f0 64 61 74 61 00 2e 62 73 73 00 2e 72 6f 64 61 74 |data..bss..rodat|
00000100 61 00 2e 63 6f 6d 6d 65 6e 74 00 2e 6e 6f 74 65 |a..comment..note|
00000110 2e 47 4e 55 2d 73 74 61 63 6b 00 00 00 00 00 00 |.GNU-stack......|
…
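Looking up a name is then nothing more than indexing into the string table and reading up to the next zero byte. A minimal sketch of that lookup, with a bounds check added for safety, might look like this; section_name is a hypothetical helper and the table is assumed to have been read into memory already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns a pointer to the NUL-terminated name stored at offset
 * `sh_name` inside a string table of `strtab_size` bytes, or NULL when
 * the offset is out of range or no terminator follows within the table.
 */
const char *section_name(const char *strtab, size_t strtab_size,
                         uint32_t sh_name)
{
    assert(strtab != NULL);
    if (sh_name >= strtab_size)
        return NULL;
    if (memchr(strtab + sh_name, '\0', strtab_size - sh_name) == NULL)
        return NULL;
    return strtab + sh_name;
}
```

With the 0x51 bytes of .shstrtab dumped above, offset 0x1f gives ".text" and offset 0x1b gives ".rel.text": the two names share the same bytes, because .text is simply the tail of .rel.text.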
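Next, let's look at the contents of the .text section. Its offset in the file is 0x34 and its size is 0x5b (the descriptor stores them little endian, as the bytes 34 00 00 00 and 5b 00 00 00), so it ends at 0x34 + 0x5b = 0x8f.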
00000030 0b 00 08 00 55 89 e5 83 ec 08 8b 45 08 89 44 24
00000040 04 c7 04 24 00 00 00 00 e8 fc ff ff ff c9 c3 8d
00000050 4c 24 04 83 e4 f0 ff 71 fc 55 89 e5 51 83 ec 14
00000060 c7 45 f4 01 00 00 00 8b 15 04 00 00 00 a1 00 00
00000070 00 00 8d 04 02 03 45 f4 03 45 f8 89 04 24 e8 fc
00000080 ff ff ff 8b 45 f4 83 c4 14 59 5d 8d 61 fc c3 00
objdump -d confirms that our analysis is correct:
[root@localhost Temp]# objdump -d SimpleSection.o
SimpleSection.o: file format elf32-i386
Disassembly of section .text:
00000000 <func1>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 08 sub $0x8,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
...
0000001b <main>:
1b: 8d 4c 24 04 lea 0x4(%esp),%ecx
1f: 83 e4 f0 and $0xfffffff0,%esp
22: ff 71 fc pushl 0xfffffffc(%ecx)
...
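Analyzing each section the same way we analyzed .text, we can now draw the complete layout of the ELF file, following the stack-diagram convention of high addresses at the top and low addresses at the bottom: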
……………………………………………… 0x00000454
.rel.text
……………………………………………… 0x0000042c
.strtab
……………………………………………… 0x000003c4
.symtab
……………………………………………… 0x000002d4
Section Table
……………………………………………… 0x0000011c
.shstrtab
……………………………………………… 0x000000ca
.note.GNU-stack
……………………………………………… 0x000000ca
.comment
……………………………………………… 0x0000009c
.rodata
……………………………………………… 0x00000098
.bss
……………………………………………… 0x00000098
.data
……………………………………………… 0x00000090
.text
……………………………………………… 0x00000034
ELF File Header
……………………………………………… 0x00000000
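The last section, .rel.text, is at offset 0x42c and has size 0x28, so the ELF file should end at 0x42c + 0x28 = 0x454 = 1108. In other words, SimpleSection.o should be 1108 bytes in size. Checking with ls confirms that this is indeed the case!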
[root@localhost Temp]# ll SimpleSection.o
-rw-r--r-- 1 root root 1108 Jun 2 15:25 SimpleSection.o
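The same size check can be done mechanically: take the largest offset + size over all sections that occupy file space (SHT_NOBITS sections such as .bss do not), and compare it with the end of the section table itself. This is a sketch for this particular layout, not a general ELF size calculator; struct section_extent and elf_file_end are names invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TYPE_NOBITS 8u   /* same value as SHT_NOBITS */

/* The three descriptor fields needed for the size calculation. */
struct section_extent {
    uint32_t type;    /* sh_type   */
    uint32_t offset;  /* sh_offset */
    uint32_t size;    /* sh_size   */
};

/*
 * Returns the offset one past the last byte used by either the section
 * table or any section that takes up space in the file.
 */
uint32_t elf_file_end(const struct section_extent *secs, size_t count,
                      uint32_t e_shoff, uint16_t e_shentsize,
                      uint16_t e_shnum)
{
    uint32_t end = e_shoff + (uint32_t)e_shentsize * e_shnum;
    size_t i;

    assert(secs != NULL || count == 0);
    for (i = 0; i < count; i++) {
        uint32_t section_end;

        if (secs[i].type == TYPE_NOBITS)
            continue;               /* .bss has no bytes in the file */
        section_end = secs[i].offset + secs[i].size;
        if (section_end > end)
            end = section_end;
    }
    return end;
}
```

Filling the array with the eleven rows from the readelf -S output and passing e_shoff = 0x11c, e_shentsize = 40, e_shnum = 11 gives 0x454, that is 1108, the same figure reported for SimpleSection.o.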