1. Introduction
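To let the operating system load a program, the toolchain places a file header in front of the compiled code. The header carries the layout information the loader needs to map the code segment and the data segment of an EXE to the correct memory addresses. Some compilers also emit debugging information such as a symbol table.
A .o file is usually called a relocatable file: it has not been linked, still needs relocation, and cannot be executed. An EXE file is called an executable file: it has been processed by the linker and can be run directly, and the virtual addresses stored in it are final. Where the operating system is free to choose a base address for loading (for example through a segment base), it can place the whole EXE anywhere, but it must still map each segment as the EXE describes so that the relative distances between segments stay the same; otherwise the code will not run correctly.
An EXE file with a header depends on a loader, such as the one behind the execve() system call. In the earliest stage of an operating system there is no loader, and all we can do is jump to an instruction and start executing. That calls for a raw binary file, whose entry point is the first instruction in the file. The objcopy tool converts an EXE file into a raw binary. In this article we study the format of a 64-bit executable and use objdump to disassemble the machine code back into assembly, in order to learn what an EXE contains.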
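The difference between a relocatable file and an executable file is recorded in the ELF header itself, in the 16-bit e_type field at byte offset 16. The following is a minimal sketch of reading it; the helper name elf_kind and the sample bytes are illustrative assumptions, not part of any tool mentioned in this article.

```python
import struct

# e_type values defined by the ELF specification
ELF_TYPES = {
    1: "relocatable (ET_REL)",
    2: "executable (ET_EXEC)",
    3: "shared object or PIE (ET_DYN)",
}

def elf_kind(header: bytes) -> str:
    """Classify a file from its first 18 bytes (hypothetical helper)."""
    if header[:4] != b"\x7fELF":
        return "not an ELF file"
    # header[5] is EI_DATA: 1 means little endian, 2 means big endian
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, "other")

# First 18 bytes of a little-endian 64-bit executable
sample = bytes.fromhex("7f454c46020101000000000000000000" "0200")
print(elf_kind(sample))
```

For a real file, the same function could be fed the result of open(path, "rb").read(18).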
2. max.s: GNU assembly code that finds the maximum value
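# Lines starting with # are comments, here and below
# Data segment
.section .data
data_items:
.long 'H','E','L','L','O','_','W','O','R','L','D','!','!',0 # .long (4 bytes each) makes the byte order (big vs. little endian) visible in the dump
# Code segment
.section .text
# Declare the entry point as globally visible; symbols are local by default
.globl _start
_start:
# In GNU (AT&T) syntax the source operand is on the left and the destination on the right, the opposite of Intel syntax
# Constants take a $ prefix, a symbol without $ is treated as an address, and registers take a % prefix
movl $0, %edi
movl data_items(,%edi,4), %eax # (data_items + 4*edi) → eax
# Copy the first item of data_items into ebx; ebx holds the maximum so far
movl %eax, %ebx # eax → ebx
start_loop:
# A value of 0 marks the end of the data
cmpl $0, %eax
je loop_exit
incl %edi
movl data_items(,%edi,4), %eax # (data_items + 4*edi) → eax
cmpl %ebx, %eax
jle start_loop # eax <= ebx: keep the current maximum
movl %eax, %ebx # eax > ebx: eax → ebx becomes the new maximum
jmp start_loop
loop_exit:
movl $1, %eax # system call 1 (int 0x80 interface): exit(ebx), ends the process
int $0x80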
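3. Building and running
Environment: Ubuntu 15.04
Assemble: gcc -c -o max.o max.s
Link: ld -o max max.o
Run: ./max
After the program finishes, echo $? shows its exit status, which is the maximum value, 95 (the ASCII code of '_').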
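To check the expected exit status without running the binary, the loop in max.s can be modelled in a few lines. This is only a sketch that mirrors the register usage of the assembly; the function name find_max is an illustrative assumption. It also shows how one .long value is laid out in little-endian order.

```python
import struct

# The 13 characters from data_items plus the terminating 0
data_items = [ord(c) for c in "HELLO_WORLD!!"] + [0]

def find_max(items):
    """Mirror the loop in max.s: stop at the 0 sentinel, keep the largest value."""
    index = 0                  # edi
    current = items[index]     # eax
    largest = current          # ebx
    while current != 0:        # cmpl $0, %eax / je loop_exit
        index += 1             # incl %edi
        current = items[index]
        if current > largest:  # jle skips the update when eax <= ebx
            largest = current
    return largest

largest = find_max(data_items)
exit_status = largest & 0xFF   # the shell only reports the low 8 bits
print(exit_status, chr(largest))

# Each .long occupies 4 bytes, least significant byte first
print(struct.pack("<I", ord("H")).hex(" "))
```

The first line printed is the value reported by echo $?, and the second line is the byte pattern that 'H' takes in the data segment.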
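gcc accepts -m32 to build 32-bit code; the code and data segments are then no longer aligned to 0x200000, and the distance between them becomes much shorter. ld must be given the matching option -m elf_i386 to select the 32-bit target.
ld also accepts -Ttext to set the load address of the code segment; for example, -Ttext 0 places it at address 0.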
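4. Format of the EXE file
4.1 Viewing the ELF header and other layout information of max
Command: readelf -a max
-a displays all available ELF information
The output is as follows: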
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 #magic number of the ELF file
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file) #this is an EXE file
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4000b0 #program entry point (a virtual address)
Start of program headers: 64 (bytes into file) #file offset of the program headers
Start of section headers: 656 (bytes into file) #file offset of the section headers
Flags: 0x0
Size of this header: 64 (bytes) #size of the ELF header
Size of program headers: 56 (bytes) #size of one program header
Number of program headers: 2 #number of program headers
Size of section headers: 64 (bytes) #size of one section header
Number of section headers: 6 #number of section headers
Section header string table index: 3
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
#.text (code): address 0x4000b0, file offset 0xb0, size 0x2d
[ 1] .text PROGBITS 00000000004000b0 000000b0
000000000000002d 0000000000000000 AX 0 0 1
#.data (data): address 0x6000dd, file offset 0xdd, size 0x38
[ 2] .data PROGBITS 00000000006000dd 000000dd
0000000000000038 0000000000000000 WA 0 0 1
#section name table (.shstrtab): address 0x0 (not loaded), file offset 0x115, size 0x27
[ 3] .shstrtab STRTAB 0000000000000000 00000115
0000000000000027 0000000000000000 0 0 1
#symbol table (.symtab): address 0x0 (not loaded), file offset 0x140, size 0x108
[ 4] .symtab SYMTAB 0000000000000000 00000140
0000000000000108 0000000000000018 5 7 8
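#string table (.strtab): address 0x0 (not loaded), file offset 0x248; the size is listed as 0x48 here, but the section header bytes at file offset 0x3f0 in 4.3 and the layout in 5.2 give 0x46 (70 bytes)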
[ 5] .strtab STRTAB 0000000000000000 00000248
0000000000000048 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
#The program headers describe where each segment is loaded
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
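#Code segment: readable and executable, virtual address 0x400000, physical address 0x400000, file offset 0,
#length 0xdd, alignment 0x200000
#Covers the ELF header, the program headers, and .text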
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000dd 0x00000000000000dd R E 200000
#Data segment: readable and writable, virtual address 0x6000dd, physical address 0x6000dd, file offset 0xdd, length 0x38, alignment 0x200000
LOAD 0x00000000000000dd 0x00000000006000dd 0x00000000006000dd
0x0000000000000038 0x0000000000000038 RW 200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data
There is no dynamic section in this file.
There are no relocations in this file.
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
#Symbol table: the symbols in the program and their addresses
Symbol table '.symtab' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000004000b0 0 SECTION LOCAL DEFAULT 1
2: 00000000006000dd 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 FILE LOCAL DEFAULT ABS max.o
4: 00000000006000dd 0 NOTYPE LOCAL DEFAULT 2 data_items
5: 00000000004000bf 0 NOTYPE LOCAL DEFAULT 1 start_loop
6: 00000000004000d6 0 NOTYPE LOCAL DEFAULT 1 loop_exit
7: 00000000004000b0 0 NOTYPE GLOBAL DEFAULT 1 _start
8: 0000000000600115 0 NOTYPE GLOBAL DEFAULT 2 __bss_start
9: 0000000000600115 0 NOTYPE GLOBAL DEFAULT 2 _edata
10: 0000000000600118 0 NOTYPE GLOBAL DEFAULT 2 _end
No version information found in this file.
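The ELF Header block at the top of this output can be reproduced by decoding the first 64 bytes of the file by hand. The sketch below assumes the little-endian Elf64_Ehdr layout from the ELF-64 specification and uses the bytes shown by xxd in section 4.3; the names FIELDS and parse_elf64_header are illustrative assumptions.

```python
import struct

# First 64 bytes of max, copied from the xxd dump in section 4.3
ELF_HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "02003e0001000000b000400000000000"
    "40000000000000009002000000000000"
    "00000000400038000200400006000300"
)

# Field order of Elf64_Ehdr
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry",
    "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
    "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx",
)

def parse_elf64_header(raw: bytes) -> dict:
    """Decode a little-endian 64-bit ELF header into a dict (hypothetical helper)."""
    values = struct.unpack("<16sHHIQQQIHHHHHH", raw[:64])
    return dict(zip(FIELDS, values))

header = parse_elf64_header(ELF_HEADER)
print("entry point   :", hex(header["e_entry"]))
print("program hdrs  :", header["e_phnum"], "x", header["e_phentsize"],
      "bytes at offset", header["e_phoff"])
print("section hdrs  :", header["e_shnum"], "x", header["e_shentsize"],
      "bytes at offset", header["e_shoff"])
```

Each printed value can be compared directly with the corresponding readelf line.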
4.2 Disassembling the code
Command: objdump -d max
-d disassembles the executable sections
Output:
file format elf64-x86-64
Disassembly of section .text:
#Per the program headers, the code segment is mapped at 0x400000, which places .text at 0x4000b0
00000000004000b0 <_start>:
4000b0: bf 00 00 00 00 mov $0x0,%edi
#data_items has been replaced by 0x6000dd, the start address of the data segment
4000b5: 67 8b 04 bd dd 00 60 mov 0x6000dd(,%edi,4),%eax
4000bc: 00
4000bd: 89 c3 mov %eax,%ebx
#The labels start_loop and loop_exit have been replaced by addresses as well
00000000004000bf <start_loop>:
4000bf: 83 f8 00 cmp $0x0,%eax
4000c2: 74 12 je 4000d6 <loop_exit>
4000c4: ff c7 inc %edi
4000c6: 67 8b 04 bd dd 00 60 mov 0x6000dd(,%edi,4),%eax
4000cd: 00
4000ce: 39 d8 cmp %ebx,%eax
4000d0: 7e ed jle 4000bf <start_loop>
4000d2: 89 c3 mov %eax,%ebx
4000d4: eb e9 jmp 4000bf <start_loop>
00000000004000d6 <loop_exit>:
4000d6: b8 01 00 00 00 mov $0x1,%eax
4000db: cd 80 int $0x80
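Note that the jump instructions do not store the absolute addresses that objdump prints. Each two-byte short jump holds a signed 8-bit displacement relative to the address of the next instruction, and objdump works out the target for display. A minimal sketch of that arithmetic, using the three jumps from the listing (the function name rel8_target is an illustrative assumption):

```python
def rel8_target(address: int, encoding: bytes) -> int:
    """Target of a 2-byte short jump: next instruction address + signed displacement."""
    displacement = int.from_bytes(encoding[1:2], "little", signed=True)
    return address + len(encoding) + displacement

# (address, encoded bytes) of je, jle and jmp in the disassembly of max
jumps = [
    (0x4000c2, bytes.fromhex("7412")),  # je
    (0x4000d0, bytes.fromhex("7eed")),  # jle
    (0x4000d4, bytes.fromhex("ebe9")),  # jmp
]

for address, encoding in jumps:
    print(hex(address), encoding.hex(), "->", hex(rel8_target(address, encoding)))
```

Because the displacements are relative, the code segment itself contains only one kind of absolute address, the 0x6000dd reference to data_items.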
4.3 Binary contents of max and how they map to the headers
Command: xxd -g 1 max
Dumps the whole file, starting at offset 0 by default
Output:
0000000: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............#ELF header
0000010: 02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00 ..>.......@.....#offset: 0
0000020: 40 00 00 00 00 00 00 00 90 02 00 00 00 00 00 00 @...............
0000030: 00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00 ....@.8...@.....#length: 64 bytes
0000040: 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 ................#program headers
0000050: 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00 ..@.......@.....#offset: 0x40
0000060: dd 00 00 00 00 00 00 00 dd 00 00 00 00 00 00 00 ................#length: 56 bytes x 2
0000070: 00 00 20 00 00 00 00 00 01 00 00 00 06 00 00 00 .. .............
0000080: dd 00 00 00 00 00 00 00 dd 00 60 00 00 00 00 00 ..........`.....
0000090: dd 00 60 00 00 00 00 00 38 00 00 00 00 00 00 00 ..`.....8.......
00000a0: 38 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 8......... .....
00000b0: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83 .....g.....`....#code segment (.text)
00000c0: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8 ..t...g.....`.9.#offset: 0xb0
00000d0: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 48 00 00 ~............H..#length: 0x2d bytes
00000e0: 00 45 00 00 00 4c 00 00 00 4c 00 00 00 4f 00 00 .E...L...L...O..#data segment (.data)
00000f0: 00 5f 00 00 00 57 00 00 00 4f 00 00 00 52 00 00 ._...W...O...R..#offset: 0xdd
0000100: 00 4c 00 00 00 44 00 00 00 21 00 00 00 21 00 00 .L...D...!...!..#length: 0x38 bytes
0000110: 00 00 00 00 00 00 2e 73 79 6d 74 61 62 00 2e 73 .......symtab..s#section name table .shstrtab
0000120: 74 72 74 61 62 00 2e 73 68 73 74 72 74 61 62 00 trtab..shstrtab.#offset: 0x115
0000130: 2e 74 65 78 74 00 2e 64 61 74 61 00 00 00 00 00 .text..data.....#length: 0x27
0000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................#symbol table .symtab
0000150: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00 ................#11 entries x 24 bytes
0000160: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 ..@.............#holds the addresses of the symbols named below
0000170: 00 00 00 00 03 00 02 00 dd 00 60 00 00 00 00 00 ..........`.....#offset: 0x140
0000180: 00 00 00 00 00 00 00 00 01 00 00 00 04 00 f1 ff ................#length: 0x108
0000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00001a0: 07 00 00 00 00 00 02 00 dd 00 60 00 00 00 00 00 ..........`.....#data_items
00001b0: 00 00 00 00 00 00 00 00 12 00 00 00 00 00 01 00 ................#start_loop
00001c0: bf 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 ..@.............
00001d0: 1d 00 00 00 00 00 01 00 d6 00 40 00 00 00 00 00 ..........@.....#loop_exit
00001e0: 00 00 00 00 00 00 00 00 27 00 00 00 10 00 01 00 ........'.......#_start
00001f0: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 ..@.............
0000200: 2e 00 00 00 10 00 02 00 15 01 60 00 00 00 00 00 ..........`.....#__bss_start
0000210: 00 00 00 00 00 00 00 00 3a 00 00 00 10 00 02 00 ........:.......
0000220: 15 01 60 00 00 00 00 00 00 00 00 00 00 00 00 00 ..`.............#_edata
0000230: 41 00 00 00 10 00 02 00 18 01 60 00 00 00 00 00 A.........`.....#_end
0000240: 00 00 00 00 00 00 00 00 00 6d 61 78 2e 6f 00 64 .........max.o.d#string table .strtab
0000250: 61 74 61 5f 69 74 65 6d 73 00 73 74 61 72 74 5f ata_items.start_#offset: 0x248
0000260: 6c 6f 6f 70 00 6c 6f 6f 70 5f 65 78 69 74 00 5f loop.loop_exit._#length: 0x46
0000270: 73 74 61 72 74 00 5f 5f 62 73 73 5f 73 74 61 72 start.__bss_star
0000280: 74 00 5f 65 64 61 74 61 00 5f 65 6e 64 00 00 00 t._edata._end...
0000290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................#section headers
00002a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................#offset: 0x290
00002b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................#64 bytes x 6
00002c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................#null entry
00002d0: 1b 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00 ................
00002e0: b0 00 40 00 00 00 00 00 b0 00 00 00 00 00 00 00 ..@.............#.text
00002f0: 2d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 -...............
0000300: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000310: 21 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 !...............
0000320: dd 00 60 00 00 00 00 00 dd 00 00 00 00 00 00 00 ..`.............
0000330: 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8...............#.data
0000340: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000350: 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................
0000360: 00 00 00 00 00 00 00 00 15 01 00 00 00 00 00 00 ................
0000370: 27 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '...............#.shstrtab
0000380: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000390: 01 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ................
00003a0: 00 00 00 00 00 00 00 00 40 01 00 00 00 00 00 00 ........@.......
00003b0: 08 01 00 00 00 00 00 00 05 00 00 00 07 00 00 00 ................#.symtab
00003c0: 08 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00 ................
00003d0: 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................
00003e0: 00 00 00 00 00 00 00 00 48 02 00 00 00 00 00 00 ........H.......#.strtab
00003f0: 46 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 F...............
0000400: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
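The two program headers at offsets 0x40 to 0xaf of the dump can be decoded in the same spirit. The sketch below assumes the little-endian Elf64_Phdr layout from the ELF-64 specification; the helper names are illustrative assumptions. It also checks the property that lets the loader map the file page by page: the file offset and the virtual address of each segment are equal modulo the alignment.

```python
import struct

# The 112 bytes at file offsets 0x40-0xaf of max (two 56-byte entries)
PROGRAM_HEADERS = bytes.fromhex(
    "01000000050000000000000000000000"
    "00004000000000000000400000000000"
    "dd00000000000000dd00000000000000"
    "00002000000000000100000006000000"
    "dd00000000000000dd00600000000000"
    "dd006000000000003800000000000000"
    "38000000000000000000200000000000"
)

# Field order of Elf64_Phdr
FIELDS = ("p_type", "p_flags", "p_offset", "p_vaddr",
          "p_paddr", "p_filesz", "p_memsz", "p_align")

def parse_program_headers(raw: bytes) -> list:
    """Decode consecutive little-endian Elf64_Phdr entries (hypothetical helper)."""
    return [dict(zip(FIELDS, entry))
            for entry in struct.iter_unpack("<IIQQQQQQ", raw)]

def flags_to_text(p_flags: int) -> str:
    """Render PF_R (4), PF_W (2), PF_X (1) in the style readelf uses."""
    return "".join(letter if p_flags & bit else " "
                   for bit, letter in ((4, "R"), (2, "W"), (1, "E")))

segments = parse_program_headers(PROGRAM_HEADERS)
for seg in segments:
    print(flags_to_text(seg["p_flags"]), hex(seg["p_offset"]),
          hex(seg["p_vaddr"]), hex(seg["p_filesz"]), hex(seg["p_align"]))
    # File offset and virtual address agree modulo the alignment
    assert seg["p_offset"] % seg["p_align"] == seg["p_vaddr"] % seg["p_align"]
```

This is why the data segment starts at 0x6000dd rather than at a round address: its file offset is 0xdd, and the address keeps the same low bits.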
5. Relationship diagrams
5.1 Relationships within the EXE file
Note: the arrows do not necessarily indicate ordering
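One relationship of this kind is the link from headers and symbols to their names. Section headers and symbol table entries do not store names directly; each stores an offset into a string table (.shstrtab for sections, .strtab for symbols), and the name is the NUL-terminated string found there. The sketch below uses the table contents and the name offsets visible in the dump in section 4.3; the function name name_at is an illustrative assumption.

```python
# Contents of the two string tables, as seen in the xxd dump of max
SHSTRTAB = b"\0.symtab\0.strtab\0.shstrtab\0.text\0.data\0"
STRTAB = (b"\0max.o\0data_items\0start_loop\0loop_exit\0"
          b"_start\0__bss_start\0_edata\0_end\0")

def name_at(table: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at the given table offset."""
    end = table.index(b"\0", offset)
    return table[offset:end].decode("ascii")

# sh_name fields of section headers 1..5 (first 4 bytes of each entry)
section_name_offsets = [0x1b, 0x21, 0x11, 0x01, 0x09]
# st_name fields of symbols 3..10 (first 4 bytes of each entry)
symbol_name_offsets = [0x01, 0x07, 0x12, 0x1d, 0x27, 0x2e, 0x3a, 0x41]

print([name_at(SHSTRTAB, off) for off in section_name_offsets])
print([name_at(STRTAB, off) for off in symbol_name_offsets])
```

The table lengths, 39 and 70 bytes, match the 0x27 and 0x46 noted next to the dump.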
5.2 Layout of the file
ELF header : 64B |
program headers : 56B x 2 |
.text : 45B |
.data : 56B |
.shstrtab : 39B |
.symtab : 24B x 11 |
.strtab : 70B |
section headers : 64B x 6 |
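The file offsets seen in sections 4.1 and 4.3 follow from these sizes once alignment padding is taken into account. The sketch below recomputes them; the alignment of 8 for .symtab comes from the readelf output, while the alignment of 8 for the section header table is an assumption chosen to match the observed offset 0x290.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

# (name, size in bytes, alignment); sizes are taken from the list above
PARTS = [
    ("ELF header", 64, 1),
    ("program headers", 56 * 2, 1),
    (".text", 45, 1),
    (".data", 56, 1),
    (".shstrtab", 39, 1),
    (".symtab", 24 * 11, 8),
    (".strtab", 70, 1),
    ("section headers", 64 * 6, 8),  # alignment assumed, see lead-in
]

layout = {}
offset = 0
for name, size, alignment in PARTS:
    offset = align_up(offset, alignment)  # insert padding if needed
    layout[name] = offset
    offset += size
file_size = offset

for name, start in layout.items():
    print(f"{name:16s} starts at {start:#x}")
print("file size:", hex(file_size))
```

The padding explains the 4 zero bytes before .symtab and the 2 zero bytes before the section headers in the dump.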
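6. Converting between EXE and BIN files
6.1 Extracting the code and data segments
To convert an EXE file, with its executable header and debugging information, into a raw binary file, use the following command:
objcopy -O binary -R .note -R .comment max max_copy
This writes max out as a raw binary named max_copy, leaving out the .note and .comment sections.
6.2 Viewing the code segment
Command: xxd -g 1 -l 256 max_copy
Dumps the first 256 bytes
The code segment appears at the start: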
0000000: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83 .....g.....`....
0000010: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8 ..t...g.....`.9.
0000020: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 00 00 00 ~...............
0000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
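6.3 Viewing the data segment
Command: xxd -g 1 -l 256 -s 0x20002d max_copy
-s sets the starting offset, here 0x20002d (= data segment load address 0x6000dd - code segment load address 0x4000b0);
-g 1 groups the hex output one byte at a time, and -l 256 limits the dump to 256 bytes.
The data segment: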
020002d: 48 00 00 00 45 00 00 00 4c 00 00 00 4c 00 00 00 H...E...L...L...
020003d: 4f 00 00 00 5f 00 00 00 57 00 00 00 4f 00 00 00 O..._...W...O...
020004d: 52 00 00 00 4c 00 00 00 44 00 00 00 21 00 00 00 R...L...D...!...
020005d: 21 00 00 00 00 00 00 00 !.......
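As the dumps show, max_copy contains only the code segment and the data segment, with the code segment at the very start of the file; the gap between them is filled with zeros, so the two keep the same distance apart as their load addresses.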
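The layout of max_copy can be modelled as follows. This is a simplified sketch of the observed result, not the actual implementation of objcopy: each loaded section is copied to (load address - lowest load address) in a zero-filled buffer. The function name flatten is an illustrative assumption; the code bytes are copied from the dump in section 4.3.

```python
import struct

def flatten(sections):
    """Build a flat image from (load_address, data) pairs; gaps stay zero-filled."""
    base = min(address for address, _ in sections)
    end = max(address + len(data) for address, data in sections)
    image = bytearray(end - base)
    for address, data in sections:
        start = address - base
        image[start:start + len(data)] = data
    return bytes(image)

# .text bytes of max (45 bytes)
text = bytes.fromhex(
    "bf00000000678b04bddd00600089c383"
    "f8007412ffc7678b04bddd00600039d8"
    "7eed89c3ebe9b801000000cd80"
)
# .data: 13 characters plus the 0 terminator, each as a little-endian .long
data = b"".join(struct.pack("<I", value)
                for value in [ord(c) for c in "HELLO_WORLD!!"] + [0])

image = flatten([(0x4000b0, text), (0x6000dd, data)])
print("image size :", hex(len(image)))
print("data offset:", hex(image.index(data)))
```

Under this model the flat file is a little over 2 MB, almost all of it padding, which is one practical reason to link with a smaller alignment or with -Ttext when producing a raw binary.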