At this point two questions remain. The first concerns #include; the second shows up in the generated assembly:
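        .file   "demo.c"                # source file name
        .section        .rodata         # read-only data section
.LC0:
        .string "Hello, World! "
        .text                           # code section (the read-only string itself lives in .rodata above)
        .globl  main                    # make main a global symbol
        .type   main, @function         # main is a function
main:                                   # start address of the function
        pushq   %rbp                    # save the caller's frame pointer
        movq    %rsp, %rbp              # set up the new stack frame
        movl    $.LC0, %edi             # load the address of the string into edi (first argument)
        call    puts                    # call puts
        movl    $0, %eax                # eax = 0: eax holds the return value of main
        popq    %rbp                    # restore the caller's frame pointer
        ret                             # return to the caller
        .size   main, .-main
        .ident  "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-36)"
        .section        .note.GNU-stack,"",@progbits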
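We never defined the puts function, so where does it come from? To answer that we need to bring in the operating system.
The CPU introduced protected mode, which works together with the operating system to protect the whole machine. An ordinary application that wants to touch the hardware therefore has to go through the operating system. The operating system must provide an entry point that upper layers can call, and it performs the actual hardware access internally. We can compare this with a typical application architecture.
A web page or an app keeps its data on a backend, so it has to talk to the backend to do its job. The backend says: if you need my data, access me according to the rules I define, and I will process the request and hand the data back to you.
That rule takes the form of an interface. For example, /user/add means "add a new user".
When I send a request to this interface, the backend service has to handle receiving the request, returning the response, and every problem that can occur in between. Writing that for every interface is tedious, so a language needs to wrap common, basic functionality. Every language needs an implementation of the HTTP protocol, for instance, so every language ships library functions for it.
The operating system has interfaces like this too. They are called system calls, and in the Linux kernel their implementations are named with a sys_ prefix by convention.
C has standards: ANSI C and GNU C. libc is the general name for the library that provides the ANSI C library functions, and glibc is the GNU implementation of it, which also adds GNU extensions.
When we call such a library function, the function wraps a system call. In this way we use the function library to call the interfaces of the operating system indirectly. These are common interfaces, so nobody has to write them again.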
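These interfaces are declared together in a header file, and we can use them simply by including that header. GCC then parses the source files and compiles them into an assembly file. A computer only understands binary, however, so the assembly file still has to be turned into a binary executable. The executable format differs between operating systems: Windows uses .exe, for example, while a UNIX toolchain writes a.out by default.
First the macros and header files are processed together with the source file, producing a .i file.
Then the preprocessed file is compiled into assembly code, producing a .s file.
Finally the .s file is turned into a binary file: an object file (.o), a shared library (.so) or an executable (a.out).
The assembler produces the final object file. Producing such files requires an agreed format, and that format is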
ELF (Executable and Linking Format)
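But a header only declares the interface, and the implementation lives somewhere else. So how does the program find that implementation?
The two pieces of code are independent files, so each needs to find the address in memory of the other's functions. That requires a mapping from a symbol (the name of the function) to its address, so that a call instruction can find the real address of its target.
This is the job of the linker: it resolves the symbols and fills in the real addresses in the generated file.
Linker: symbol resolution + code relocation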
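We use the readelf command to inspect the ELF format, and the objdump command to disassemble the binary.
Usage: readelf <option(s)> elf-file(s)
Compile and link: gcc test.c
#include <stdio.h>

int main() {
    printf("%s", "hello world!");
    return 1;
}
readelf -a a.out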
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00   // magic number
  Class:                             ELF64                   // file class (64-bit ELF)
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)  // executable file
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400430
  Start of program headers:          64 (bytes into file)
  Start of section headers:          6456 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30

Section Headers:
  [Nr] Name Type Address Offset Size EntSize Flags Link Info Align
  [ 0]  NULL 0000000000000000 00000000 0000000000000000 0000000000000000  0 0 0
  [ 1] .interp PROGBITS 0000000000400238 00000238 000000000000001c 0000000000000000 A 0 0 1
  [ 2] .note.ABI-tag NOTE 0000000000400254 00000254 0000000000000020 0000000000000000 A 0 0 4
  [ 3] .note.gnu.build-i NOTE 0000000000400274 00000274 0000000000000024 0000000000000000 A 0 0 4
  [ 4] .gnu.hash GNU_HASH 0000000000400298 00000298 000000000000001c 0000000000000000 A 5 0 8
  [ 5] .dynsym DYNSYM 00000000004002b8 000002b8 0000000000000060 0000000000000018 A 6 1 8
  [ 6] .dynstr STRTAB 0000000000400318 00000318 000000000000003f 0000000000000000 A 0 0 1
  [ 7] .gnu.version VERSYM 0000000000400358 00000358 0000000000000008 0000000000000002 A 5 0 2
  [ 8] .gnu.version_r VERNEED 0000000000400360 00000360 0000000000000020 0000000000000000 A 6 1 8
  [ 9] .rela.dyn RELA 0000000000400380 00000380 0000000000000018 0000000000000018 A 5 0 8
  [10] .rela.plt RELA 0000000000400398 00000398 0000000000000030 0000000000000018 AI 5 24 8
  [11] .init PROGBITS 00000000004003c8 000003c8 000000000000001a 0000000000000000 AX 0 0 4
  [12] .plt PROGBITS 00000000004003f0 000003f0 0000000000000030 0000000000000010 AX 0 0 16
  [13] .plt.got PROGBITS 0000000000400420 00000420 0000000000000008 0000000000000000 AX 0 0 8
  [14] .text PROGBITS 0000000000400430 00000430 0000000000000182 0000000000000000 AX 0 0 16
  [15] .fini PROGBITS 00000000004005b4 000005b4 0000000000000009 0000000000000000 AX 0 0 4
  [16] .rodata PROGBITS 00000000004005c0 000005c0 0000000000000020 0000000000000000 A 0 0 8
  [17] .eh_frame_hdr PROGBITS 00000000004005e0 000005e0 0000000000000034 0000000000000000 A 0 0 4
  [18] .eh_frame PROGBITS 0000000000400618 00000618 00000000000000f4 0000000000000000 A 0 0 8
  [19] .init_array INIT_ARRAY 0000000000600e10 00000e10 0000000000000008 0000000000000008 WA 0 0 8
  [20] .fini_array FINI_ARRAY 0000000000600e18 00000e18 0000000000000008 0000000000000008 WA 0 0 8
  [21] .jcr PROGBITS 0000000000600e20 00000e20 0000000000000008 0000000000000000 WA 0 0 8
  [22] .dynamic DYNAMIC 0000000000600e28 00000e28 00000000000001d0 0000000000000010 WA 6 0 8
  [23] .got PROGBITS 0000000000600ff8 00000ff8 0000000000000008 0000000000000008 WA 0 0 8
  [24] .got.plt PROGBITS 0000000000601000 00001000 0000000000000028 0000000000000008 WA 0 0 8
  [25] .data PROGBITS 0000000000601028 00001028 0000000000000004 0000000000000000 WA 0 0 1
  [26] .bss NOBITS 000000000060102c 0000102c 0000000000000004 0000000000000000 WA 0 0 1
  [27] .comment PROGBITS 0000000000000000 0000102c 000000000000002d 0000000000000001 MS 0 0 1
  [28] .symtab SYMTAB 0000000000000000 00001060 0000000000000600 0000000000000018  29 47 8
  [29] .strtab STRTAB 0000000000000000 00001660 00000000000001cb 0000000000000000  0 0 1
  [30] .shstrtab STRTAB 0000000000000000 0000182b 000000000000010c 0000000000000000  0 0 1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

There are no section groups in this file.

Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040 0x00000000000001f8 0x00000000000001f8 R E 8 INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238 0x000000000000001c 0x000000000000001c R 1 [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2] LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000 0x000000000000070c 0x000000000000070c R E 200000 LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10 0x000000000000021c 0x0000000000000220 RW 200000 DYNAMIC 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28 0x00000000000001d0 0x00000000000001d0 RW 8 NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254 0x0000000000000044 0x0000000000000044 R 4 GNU_EH_FRAME 0x00000000000005e0 0x00000000004005e0 0x00000000004005e0 0x0000000000000034 0x0000000000000034 R 4 GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 RW 10 GNU_RELRO 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10 0x00000000000001f0 0x00000000000001f0 R 1 Section to Segment mapping: Segment Sections... 
00 01 .interp 02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame 03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss 04 .dynamic 05 .note.ABI-tag .note.gnu.build-id 06 .eh_frame_hdr 07 08 .init_array .fini_array .jcr .dynamic .got Dynamic section at offset 0xe28 contains 24 entries: Tag Type Name/Value 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 0x000000000000000c (INIT) 0x4003c8 0x000000000000000d (FINI) 0x4005b4 0x0000000000000019 (INIT_ARRAY) 0x600e10 0x000000000000001b (INIT_ARRAYSZ) 8 (bytes) 0x000000000000001a (FINI_ARRAY) 0x600e18 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 0x000000006ffffef5 (GNU_HASH) 0x400298 0x0000000000000005 (STRTAB) 0x400318 0x0000000000000006 (SYMTAB) 0x4002b8 0x000000000000000a (STRSZ) 63 (bytes) 0x000000000000000b (SYMENT) 24 (bytes) 0x0000000000000015 (DEBUG) 0x0 0x0000000000000003 (PLTGOT) 0x601000 0x0000000000000002 (PLTRELSZ) 48 (bytes) 0x0000000000000014 (PLTREL) RELA 0x0000000000000017 (JMPREL) 0x400398 0x0000000000000007 (RELA) 0x400380 0x0000000000000008 (RELASZ) 24 (bytes) 0x0000000000000009 (RELAENT) 24 (bytes) 0x000000006ffffffe (VERNEED) 0x400360 0x000000006fffffff (VERNEEDNUM) 1 0x000000006ffffff0 (VERSYM) 0x400358 0x0000000000000000 (NULL) 0x0 Relocation section '.rela.dyn' at offset 0x380 contains 1 entries: Offset Info Type Sym. Value Sym. Name + Addend 000000600ff8 000300000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0 Relocation section '.rela.plt' at offset 0x398 contains 2 entries: Offset Info Type Sym. Value Sym. Name + Addend 000000601018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf@GLIBC_2.2.5 + 0 000000601020 000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0 The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported. 
Symbol table '.dynsym' contains 4 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2) 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2) 3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__ Symbol table '.symtab' contains 64 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000400238 0 SECTION LOCAL DEFAULT 1 2: 0000000000400254 0 SECTION LOCAL DEFAULT 2 3: 0000000000400274 0 SECTION LOCAL DEFAULT 3 4: 0000000000400298 0 SECTION LOCAL DEFAULT 4 5: 00000000004002b8 0 SECTION LOCAL DEFAULT 5 6: 0000000000400318 0 SECTION LOCAL DEFAULT 6 7: 0000000000400358 0 SECTION LOCAL DEFAULT 7 8: 0000000000400360 0 SECTION LOCAL DEFAULT 8 9: 0000000000400380 0 SECTION LOCAL DEFAULT 9 10: 0000000000400398 0 SECTION LOCAL DEFAULT 10 11: 00000000004003c8 0 SECTION LOCAL DEFAULT 11 12: 00000000004003f0 0 SECTION LOCAL DEFAULT 12 13: 0000000000400420 0 SECTION LOCAL DEFAULT 13 14: 0000000000400430 0 SECTION LOCAL DEFAULT 14 15: 00000000004005b4 0 SECTION LOCAL DEFAULT 15 16: 00000000004005c0 0 SECTION LOCAL DEFAULT 16 17: 00000000004005e0 0 SECTION LOCAL DEFAULT 17 18: 0000000000400618 0 SECTION LOCAL DEFAULT 18 19: 0000000000600e10 0 SECTION LOCAL DEFAULT 19 20: 0000000000600e18 0 SECTION LOCAL DEFAULT 20 21: 0000000000600e20 0 SECTION LOCAL DEFAULT 21 22: 0000000000600e28 0 SECTION LOCAL DEFAULT 22 23: 0000000000600ff8 0 SECTION LOCAL DEFAULT 23 24: 0000000000601000 0 SECTION LOCAL DEFAULT 24 25: 0000000000601028 0 SECTION LOCAL DEFAULT 25 26: 000000000060102c 0 SECTION LOCAL DEFAULT 26 27: 0000000000000000 0 SECTION LOCAL DEFAULT 27 28: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c 29: 0000000000600e20 0 OBJECT LOCAL DEFAULT 21 __JCR_LIST__ 30: 0000000000400460 0 FUNC LOCAL DEFAULT 14 deregister_tm_clones 31: 0000000000400490 0 FUNC LOCAL DEFAULT 14 register_tm_clones 32: 
00000000004004d0 0 FUNC LOCAL DEFAULT 14 __do_global_dtors_aux 33: 000000000060102c 1 OBJECT LOCAL DEFAULT 26 completed.6355 34: 0000000000600e18 0 OBJECT LOCAL DEFAULT 20 __do_global_dtors_aux_fin 35: 00000000004004f0 0 FUNC LOCAL DEFAULT 14 frame_dummy 36: 0000000000600e10 0 OBJECT LOCAL DEFAULT 19 __frame_dummy_init_array_ 37: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.c 38: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c 39: 0000000000400708 0 OBJECT LOCAL DEFAULT 18 __FRAME_END__ 40: 0000000000600e20 0 OBJECT LOCAL DEFAULT 21 __JCR_END__ 41: 0000000000000000 0 FILE LOCAL DEFAULT ABS 42: 0000000000600e18 0 NOTYPE LOCAL DEFAULT 19 __init_array_end 43: 0000000000600e28 0 OBJECT LOCAL DEFAULT 22 _DYNAMIC 44: 0000000000600e10 0 NOTYPE LOCAL DEFAULT 19 __init_array_start 45: 00000000004005e0 0 NOTYPE LOCAL DEFAULT 17 __GNU_EH_FRAME_HDR 46: 0000000000601000 0 OBJECT LOCAL DEFAULT 24 _GLOBAL_OFFSET_TABLE_ 47: 00000000004005b0 2 FUNC GLOBAL DEFAULT 14 __libc_csu_fini 48: 0000000000601028 0 NOTYPE WEAK DEFAULT 25 data_start 49: 000000000060102c 0 NOTYPE GLOBAL DEFAULT 25 _edata 50: 00000000004005b4 0 FUNC GLOBAL DEFAULT 15 _fini 51: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@@GLIBC_2.2.5 52: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_ 53: 0000000000601028 0 NOTYPE GLOBAL DEFAULT 25 __data_start 54: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__ 55: 00000000004005c8 0 OBJECT GLOBAL HIDDEN 16 __dso_handle 56: 00000000004005c0 4 OBJECT GLOBAL DEFAULT 16 _IO_stdin_used 57: 0000000000400540 101 FUNC GLOBAL DEFAULT 14 __libc_csu_init 58: 0000000000601030 0 NOTYPE GLOBAL DEFAULT 26 _end 59: 0000000000400430 0 FUNC GLOBAL DEFAULT 14 _start 60: 000000000060102c 0 NOTYPE GLOBAL DEFAULT 26 __bss_start 61: 000000000040051d 31 FUNC GLOBAL DEFAULT 14 main 62: 0000000000601030 0 OBJECT GLOBAL HIDDEN 25 __TMC_END__ 63: 00000000004003c8 0 FUNC GLOBAL DEFAULT 11 _init Version symbols section '.gnu.version' contains 4 entries: Addr: 
0000000000400358 Offset: 0x000358 Link: 5 (.dynsym) 000: 0 (*local*) 2 (GLIBC_2.2.5) 2 (GLIBC_2.2.5) 0 (*local*) Version needs section '.gnu.version_r' contains 1 entries: Addr: 0x0000000000400360 Offset: 0x000360 Link: 6 (.dynstr) 000000: Version: 1 File: libc.so.6 Cnt: 1 0x0010: Name: GLIBC_2.2.5 Flags: none Version: 2 Displaying notes found at file offset 0x00000254 with length 0x00000020: Owner Data size Description GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag) OS: Linux, ABI: 2.6.32 Displaying notes found at file offset 0x00000274 with length 0x00000024: Owner Data size Description GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 4d903e1d996808c09e73c09ebeb0335e806b13db
objdump -d a.out
a.out: file format elf64-x86-64 Disassembly of section .init: 00000000004003c8 <_init>: 4003c8: 48 83 ec 08 sub $0x8,%rsp 4003cc: 48 8b 05 25 0c 20 00 mov 0x200c25(%rip),%rax # 600ff8 <__gmon_start__> 4003d3: 48 85 c0 test %rax,%rax 4003d6: 74 05 je 4003dd <_init+0x15> 4003d8: e8 43 00 00 00 callq 400420 <.plt.got> 4003dd: 48 83 c4 08 add $0x8,%rsp 4003e1: c3 retq Disassembly of section .plt: 00000000004003f0 <.plt>: 4003f0: ff 35 12 0c 20 00 pushq 0x200c12(%rip) # 601008 <_GLOBAL_OFFSET_TABLE_+0x8> 4003f6: ff 25 14 0c 20 00 jmpq *0x200c14(%rip) # 601010 <_GLOBAL_OFFSET_TABLE_+0x10> 4003fc: 0f 1f 40 00 nopl 0x0(%rax) 0000000000400400: 400400: ff 25 12 0c 20 00 jmpq *0x200c12(%rip) # 601018 400406: 68 00 00 00 00 pushq $0x0 40040b: e9 e0 ff ff ff jmpq 4003f0 <.plt> 0000000000400410 <__libc_start_main@plt>: 400410: ff 25 0a 0c 20 00 jmpq *0x200c0a(%rip) # 601020 <__libc_start_main@GLIBC_2.2.5> 400416: 68 01 00 00 00 pushq $0x1 40041b: e9 d0 ff ff ff jmpq 4003f0 <.plt> Disassembly of section .plt.got: 0000000000400420 <.plt.got>: 400420: ff 25 d2 0b 20 00 jmpq *0x200bd2(%rip) # 600ff8 <__gmon_start__> 400426: 66 90 xchg %ax,%ax Disassembly of section .text: 0000000000400430 <_start>: 400430: 31 ed xor %ebp,%ebp 400432: 49 89 d1 mov %rdx,%r9 400435: 5e pop %rsi 400436: 48 89 e2 mov %rsp,%rdx 400439: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp 40043d: 50 push %rax 40043e: 54 push %rsp 40043f: 49 c7 c0 b0 05 40 00 mov $0x4005b0,%r8 400446: 48 c7 c1 40 05 40 00 mov $0x400540,%rcx 40044d: 48 c7 c7 1d 05 40 00 mov $0x40051d,%rdi 400454: e8 b7 ff ff ff callq 400410 <__libc_start_main@plt> 400459: f4 hlt 40045a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 0000000000400460 : 400460: b8 37 10 60 00 mov $0x601037,%eax 400465: 55 push %rbp 400466: 48 2d 30 10 60 00 sub $0x601030,%rax 40046c: 48 83 f8 0e cmp $0xe,%rax 400470: 48 89 e5 mov %rsp,%rbp 400473: 77 02 ja 400477 400475: 5d pop %rbp 400476: c3 retq 400477: b8 00 00 00 00 mov $0x0,%eax 40047c: 48 85 c0 test %rax,%rax 40047f: 74 
f4 je 400475 400481: 5d pop %rbp 400482: bf 30 10 60 00 mov $0x601030,%edi 400487: ff e0 jmpq *%rax 400489: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 0000000000400490 : 400490: b8 30 10 60 00 mov $0x601030,%eax 400495: 55 push %rbp 400496: 48 2d 30 10 60 00 sub $0x601030,%rax 40049c: 48 c1 f8 03 sar $0x3,%rax 4004a0: 48 89 e5 mov %rsp,%rbp 4004a3: 48 89 c2 mov %rax,%rdx 4004a6: 48 c1 ea 3f shr $0x3f,%rdx 4004aa: 48 01 d0 add %rdx,%rax 4004ad: 48 d1 f8 sar %rax 4004b0: 75 02 jne 4004b4 4004b2: 5d pop %rbp 4004b3: c3 retq 4004b4: ba 00 00 00 00 mov $0x0,%edx 4004b9: 48 85 d2 test %rdx,%rdx 4004bc: 74 f4 je 4004b2 4004be: 5d pop %rbp 4004bf: 48 89 c6 mov %rax,%rsi 4004c2: bf 30 10 60 00 mov $0x601030,%edi 4004c7: ff e2 jmpq *%rdx 4004c9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 00000000004004d0 <__do_global_dtors_aux>: 4004d0: 80 3d 55 0b 20 00 00 cmpb $0x0,0x200b55(%rip) # 60102c <_edata> 4004d7: 75 11 jne 4004ea <__do_global_dtors_aux+0x1a> 4004d9: 55 push %rbp 4004da: 48 89 e5 mov %rsp,%rbp 4004dd: e8 7e ff ff ff callq 400460 4004e2: 5d pop %rbp 4004e3: c6 05 42 0b 20 00 01 movb $0x1,0x200b42(%rip) # 60102c <_edata> 4004ea: f3 c3 repz retq 4004ec: 0f 1f 40 00 nopl 0x0(%rax) 00000000004004f0 : 4004f0: 48 83 3d 28 09 20 00 cmpq $0x0,0x200928(%rip) # 600e20 <__JCR_END__> 4004f7: 00 4004f8: 74 1e je 400518 4004fa: b8 00 00 00 00 mov $0x0,%eax 4004ff: 48 85 c0 test %rax,%rax 400502: 74 14 je 400518 400504: 55 push %rbp 400505: bf 20 0e 60 00 mov $0x600e20,%edi 40050a: 48 89 e5 mov %rsp,%rbp 40050d: ff d0 callq *%rax 40050f: 5d pop %rbp 400510: e9 7b ff ff ff jmpq 400490 400515: 0f 1f 00 nopl (%rax) 400518: e9 73 ff ff ff jmpq 400490 000000000040051d : 40051d: 55 push %rbp 40051e: 48 89 e5 mov %rsp,%rbp 400521: be d0 05 40 00 mov $0x4005d0,%esi 400526: bf dd 05 40 00 mov $0x4005dd,%edi 40052b: b8 00 00 00 00 mov $0x0,%eax 400530: e8 cb fe ff ff callq 400400 400535: b8 01 00 00 00 mov $0x1,%eax 40053a: 5d pop %rbp 40053b: c3 retq 40053c: 0f 1f 40 00 nopl 0x0(%rax) 0000000000400540 
<__libc_csu_init>: 400540: 41 57 push %r15 400542: 41 89 ff mov %edi,%r15d 400545: 41 56 push %r14 400547: 49 89 f6 mov %rsi,%r14 40054a: 41 55 push %r13 40054c: 49 89 d5 mov %rdx,%r13 40054f: 41 54 push %r12 400551: 4c 8d 25 b8 08 20 00 lea 0x2008b8(%rip),%r12 # 600e10 <__frame_dummy_init_array_entry> 400558: 55 push %rbp 400559: 48 8d 2d b8 08 20 00 lea 0x2008b8(%rip),%rbp # 600e18 <__init_array_end> 400560: 53 push %rbx 400561: 4c 29 e5 sub %r12,%rbp 400564: 31 db xor %ebx,%ebx 400566: 48 c1 fd 03 sar $0x3,%rbp 40056a: 48 83 ec 08 sub $0x8,%rsp 40056e: e8 55 fe ff ff callq 4003c8 <_init> 400573: 48 85 ed test %rbp,%rbp 400576: 74 1e je 400596 <__libc_csu_init+0x56> 400578: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) 40057f: 00 400580: 4c 89 ea mov %r13,%rdx 400583: 4c 89 f6 mov %r14,%rsi 400586: 44 89 ff mov %r15d,%edi 400589: 41 ff 14 dc callq *(%r12,%rbx,8) 40058d: 48 83 c3 01 add $0x1,%rbx 400591: 48 39 eb cmp %rbp,%rbx 400594: 75 ea jne 400580 <__libc_csu_init+0x40> 400596: 48 83 c4 08 add $0x8,%rsp 40059a: 5b pop %rbx 40059b: 5d pop %rbp 40059c: 41 5c pop %r12 40059e: 41 5d pop %r13 4005a0: 41 5e pop %r14 4005a2: 41 5f pop %r15 4005a4: c3 retq 4005a5: 90 nop 4005a6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 4005ad: 00 00 00 00000000004005b0 <__libc_csu_fini>: 4005b0: f3 c3 repz retq Disassembly of section .fini: 00000000004005b4 <_fini>: 4005b4: 48 83 ec 08 sub $0x8,%rsp 4005b8: 48 83 c4 08 add $0x8,%rsp 4005bc: c3 retq
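Now reference an external function but do not link: gcc -c test.c. Producing an executable would require linking, and because sum is only declared here and never defined, an executable cannot be generated.
#include <stdio.h>

extern int sum(int a, int b);

int main() {
    printf("%d", sum(1, 2));
    printf("%s", "hello world!");
    return 1;
}
Here we use readelf -a test.o to look at the ELF file.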
ELF Header: Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 Class: ELF64 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: REL (Relocatable file) Machine: Advanced Micro Devices X86-64 Version: 0x1 Entry point address: 0x0 Start of program headers: 0 (bytes into file) Start of section headers: 832 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 0 (bytes) Number of program headers: 0 Size of section headers: 64 (bytes) Number of section headers: 13 Section header string table index: 12 Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0 [ 1] .text PROGBITS 0000000000000000 00000040 000000000000003f 0000000000000000 AX 0 0 1 [ 2] .rela.text RELA 0000000000000000 00000230 0000000000000090 0000000000000018 I 10 1 8 [ 3] .data PROGBITS 0000000000000000 0000007f 0000000000000000 0000000000000000 WA 0 0 1 [ 4] .bss NOBITS 0000000000000000 0000007f 0000000000000000 0000000000000000 WA 0 0 1 [ 5] .rodata PROGBITS 0000000000000000 0000007f 0000000000000013 0000000000000000 A 0 0 1 [ 6] .comment PROGBITS 0000000000000000 00000092 000000000000002e 0000000000000001 MS 0 0 1 [ 7] .note.GNU-stack PROGBITS 0000000000000000 000000c0 0000000000000000 0000000000000000 0 0 1 [ 8] .eh_frame PROGBITS 0000000000000000 000000c0 0000000000000038 0000000000000000 A 0 0 8 [ 9] .rela.eh_frame RELA 0000000000000000 000002c0 0000000000000018 0000000000000018 I 10 8 8 [10] .symtab SYMTAB 0000000000000000 000000f8 0000000000000120 0000000000000018 11 9 8 [11] .strtab STRTAB 0000000000000000 00000218 0000000000000018 0000000000000000 0 0 1 [12] .shstrtab STRTAB 0000000000000000 000002d8 0000000000000061 0000000000000000 0 0 1 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings), I (info), L (link order), O (extra OS processing required), G (group), T (TLS), C (compressed), x 
(unknown), o (OS specific), E (exclude), l (large), p (processor specific) There are no section groups in this file. There are no program headers in this file. Relocation section '.rela.text' at offset 0x230 contains 6 entries: Offset Info Type Sym. Value Sym. Name + Addend 00000000000f 000a00000002 R_X86_64_PC32 0000000000000000 sum - 4 000000000016 00050000000a R_X86_64_32 0000000000000000 .rodata + 0 000000000020 000b00000002 R_X86_64_PC32 0000000000000000 printf - 4 000000000025 00050000000a R_X86_64_32 0000000000000000 .rodata + 3 00000000002a 00050000000a R_X86_64_32 0000000000000000 .rodata + 10 000000000034 000b00000002 R_X86_64_PC32 0000000000000000 printf - 4 Relocation section '.rela.eh_frame' at offset 0x2c0 contains 1 entries: Offset Info Type Sym. Value Sym. Name + Addend 000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0 The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported. Symbol table '.symtab' contains 12 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS data.c 2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 3: 0000000000000000 0 SECTION LOCAL DEFAULT 3 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 5: 0000000000000000 0 SECTION LOCAL DEFAULT 5 6: 0000000000000000 0 SECTION LOCAL DEFAULT 7 7: 0000000000000000 0 SECTION LOCAL DEFAULT 8 8: 0000000000000000 0 SECTION LOCAL DEFAULT 6 9: 0000000000000000 63 FUNC GLOBAL DEFAULT 1 main 10: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND sum 11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
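Using objdump -d test.o:
Disassembly of section .text:
// left: the raw instruction bytes; right: the same instructions in symbolic form
0000000000000000 <main>:
   0: 55                      push   %rbp
   1: 48 89 e5                mov    %rsp,%rbp
   4: 48 83 ec 10             sub    $0x10,%rsp
   8: be 02 00 00 00          mov    $0x2,%esi
   d: bf 01 00 00 00          mov    $0x1,%edi
  12: b8 00 00 00 00          mov    $0x0,%eax
  17: e8 00 00 00 00          callq  1c <main+0x1c>
  1c: 89 45 fc                mov    %eax,-0x4(%rbp)
  1f: b8 01 00 00 00          mov    $0x1,%eax
  24: c9                      leaveq
  25: c3                      retq
Compiling generates a symbol table. If we delete the symbol table of one of the object files, the linker can no longer find the mapping, so linking fails.
Running strip sum.o removes the symbol table.
This is static linking: the linker looks up the function name I need in the symbol table of the other file, finds the corresponding address, and replaces my symbol reference with it.
#include <stdio.h>

int sum(int a, int b);

int main() {
    int a = sum(1, 2);
    return 1;
}

int sum(int a, int b) {
    return a + b;
}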
gcc -c test.c
Relocation section '.rela.text' at offset 0x1d8 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000018  000900000002 R_X86_64_PC32     0000000000000000 sum - 4

Relocation section '.rela.eh_frame' at offset 0x1f0 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0
gcc test.c sum.o
Relocation section '.rela.dyn' at offset 0x360 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600ff8  000200000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0

Relocation section '.rela.plt' at offset 0x378 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601018  000100000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
After linking into an executable, the relocation entry for sum is gone: the reference has been resolved.
a.out (executable file):
00000000004004cd <main>:
  4004cd: 55                      push   %rbp
  4004ce: 48 89 e5                mov    %rsp,%rbp
  4004d1: 48 83 ec 10             sub    $0x10,%rsp
  4004d5: be 02 00 00 00          mov    $0x2,%esi
  4004da: bf 01 00 00 00          mov    $0x1,%edi
  4004df: b8 00 00 00 00          mov    $0x0,%eax
  4004e4: e8 0a 00 00 00          callq  4004f3 <sum>      // after linking, the operand points at the real address
  4004e9: 89 45 fc                mov    %eax,-0x4(%rbp)
  4004ec: b8 01 00 00 00          mov    $0x1,%eax
  4004f1: c9                      leaveq
  4004f2: c3                      retq

test.o (relocatable file):
0000000000000000 <main>:
   0: 55                      push   %rbp
   1: 48 89 e5                mov    %rsp,%rbp
   4: 48 83 ec 10             sub    $0x10,%rsp
   8: be 02 00 00 00          mov    $0x2,%esi
   d: bf 01 00 00 00          mov    $0x1,%edi
  12: b8 00 00 00 00          mov    $0x0,%eax
  17: e8 00 00 00 00          callq  1c <main+0x1c>    // the call to sum; the operand is still zero
  1c: 89 45 fc                mov    %eax,-0x4(%rbp)
  1f: b8 01 00 00 00          mov    $0x1,%eax
  24: c9                      leaveq
  25: c3                      retq
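With static linking, addresses are relocated at link time, which requires gathering all the object files into one executable. The result can run directly after being copied to another machine.
If two programs use the same code, static linking leaves redundant copies of that code in memory. Could the two programs share a single copy instead? That is why the dynamic linker exists.
Each program works with virtual addresses, not physical addresses, and the shared code may be mapped at a different virtual address in each program, so the shared code cannot contain fixed absolute addresses. How does it reach its targets, then? It takes the address of the instruction currently executing and adds an offset, which gives the virtual address of the target. The offset differs from one instruction to the next, because it is computed from the position of each instruction, but it does not depend on where the code is loaded.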
gcc -shared -o libmysum.so sum.c
gcc data.o libmysum.so (link data.o against the shared library)
Running ./a.out fails: the shared library cannot be found at run time.
./a.out: error while loading shared libraries: libmysum.so: cannot open shared object file: No such file or directory
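When building the shared library, compile with -fPIC so that the code is position independent. Without it, the link fails: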
/usr/bin/ld: /tmp/cceZUIwy.o: relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
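To let the loader find the library, its directory has to be added to an environment variable.
| Environment variable | Purpose |
| --- | --- |
| LD_LIBRARY_PATH | Paths searched for shared libraries while a program is being loaded and run |
| LIBRARY_PATH | Paths searched for libraries while a program is being linked at build time |
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH (the . stands for the current directory)
Now the program runs.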
ELF Header: Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 Class: ELF64 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: DYN (Shared object file) Machine: Advanced Micro Devices X86-64 Version: 0x1 Entry point address: 0x530 Start of program headers: 64 (bytes into file) Start of section headers: 6128 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 56 (bytes) Number of program headers: 7 Size of section headers: 64 (bytes) Number of section headers: 27 Section header string table index: 26 Relocation section '.rela.dyn' at offset 0x430 contains 8 entries: Offset Info Type Sym. Value Sym. Name + Addend 000000200e28 000000000008 R_X86_64_RELATIVE 5e0 000000200e30 000000000008 R_X86_64_RELATIVE 5a0 000000200e40 000000000008 R_X86_64_RELATIVE 200e40 000000200fd8 000100000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_deregisterTMClone + 0 000000200fe0 000200000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0 000000200fe8 000300000006 R_X86_64_GLOB_DAT 0000000000000000 _Jv_RegisterClasses + 0 000000200ff0 000400000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_registerTMCloneTa + 0 000000200ff8 000500000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize@GLIBC_2.2.5 + 0 The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported. 
Symbol table '.dynsym' contains 12 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab 2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__ 3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses 4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable 5: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2) 6: 0000000000201018 0 NOTYPE GLOBAL DEFAULT 21 _edata 7: 0000000000201020 0 NOTYPE GLOBAL DEFAULT 22 _end 8: 0000000000000615 20 FUNC GLOBAL DEFAULT 11 sum 9: 0000000000201018 0 NOTYPE GLOBAL DEFAULT 22 __bss_start 10: 00000000000004f0 0 FUNC GLOBAL DEFAULT 8 _init 11: 000000000000062c 0 FUNC GLOBAL DEFAULT 12 _fini
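We have now seen all three ELF file types: REL (relocatable file, such as test.o), EXEC (executable file, such as a.out) and DYN (shared object file, such as libmysum.so).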
readelf -a a.out
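Relocation section '.rela.plt' at offset 0x480 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601018  000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf@GLIBC_2.2.5 + 0
000000601020  000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 sum + 0
#include <stdio.h>

extern int global_data;
static int static_data = 33;
int data = 11;

extern int global_sum(int a, int b);

static int static_sum(int a, int b) {
    return a + b + global_data + static_data + data;
}

int sum(int a, int b) {
    return a + b + global_data + static_data + data;
}

int main() {
    printf("%d", global_data);
    printf("%d", static_data);
    printf("%d", data);
    global_sum(1, 2);
    static_sum(1, 2);
    sum(1, 2);
    return 1;
}
We find that the relocation entries used by the dynamic linker differ from those used by the static linker.
The entries for the static linker live in .rela.text; the entries for the dynamic linker live in .rela.dyn.
The static linker can compute an address from the offsets in these entries and hard-code it into the instruction. The dynamic linker cannot, because the operating system maps the shared library at a different address in each program. The code of the library is shared between the programs, so an address written into the code itself could only be correct for one of them, and letting one program patch code that another program is also running would be a safety problem.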
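Let us look at address 200fd0.
We see that the code uses the IP register (the instruction pointer, rip) plus an offset to fetch this data. And who put the data there? The dynamic linker.
Each program is given its own memory. When that memory is set up, a table is prepared in advance; if the program needs to access this data, the dynamic linker is called first and fills the address into the table.
So the access uses an offset relative to rip. The parentheses in an operand such as 0x200c25(%rip) mean a dereference: the value stored at the address rip + offset is loaded.
When we look up that address, we find that it is an entry in the GOT (Global Offset Table).
The dynamic linker stores the real address of the data in the GOT. When execution reaches the instruction, it dereferences the GOT entry to obtain the real address and can then access the data.
That solves the problem of accessing data.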
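Now for functions.
Functions are bound lazily: the address of a function is only resolved the first time the function is called. Its table entry therefore cannot simply be read like a data entry, because at the first call the final address has not been written there yet.
We observe that the calls go through something called the PLT (Procedure Linkage Table) and jump to the corresponding stubs at 640, 650 and 660. Let us look at those addresses.
At 640 we find an entry like this:
It then jumps to
This is the same trick again; this time the table is called .got.plt.
Then 0x0 is pushed onto the stack. What does that mean?
We find that there are three entries here, numbered 0 to 2. The pushed value is the array index of the entry.
Execution then continues to 630. All three stubs jump to this same address, which leads into a function of the dynamic linker. That function looks up the real address of the requested function and writes it into the corresponding slot.
With that, the address of the function is known and the call completes.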
Let us summarize:
GOT: used when accessing data.
PLT: used when calling functions, in the form PLT + GOT.
Let us walk through the flow.
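When accessing a variable: the instruction reads the GOT entry at rip + offset, which the dynamic linker has filled with the real address of the variable, and then accesses the data through that address.
When calling a function: the call jumps to the PLT stub of the function, which jumps through its .got.plt entry. On the first call this leads into the dynamic linker, which writes the real address into .got.plt; later calls jump straight to the function.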
Now let us look at the ELF documentation and compare it against a program.
#include <stdio.h>

int main() {
    printf("%s", "hello,world");
    return 1;
}
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400430                // entry point of the program
  Start of program headers:          64 (bytes into file)
  Start of section headers:          6456 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30

Elf file type is EXEC (Executable file)
Entry point 0x400430
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000070c 0x000000000000070c  R E    200000
  LOAD           0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x000000000000021c 0x0000000000000220  RW     200000
  DYNAMIC        0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
                 0x00000000000001d0 0x00000000000001d0  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x00000000000005e0 0x00000000004005e0 0x00000000004005e0
                 0x0000000000000034 0x0000000000000034  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x00000000000001f0 0x00000000000001f0  R      1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .init_array .fini_array .jcr .dynamic .got
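Comparing against the documentation, we see
that the entry point of the code is
Entry point address: 0x400430 // entry point of the program
LOAD 0x0000000000000000 0x0000000000400000
Why does the program start at 0x0000000000400000? This is the default base address that the linker uses for x86-64 executables that are not position independent.
Then, sorting by address:
So where is the entry function of this code?
The main function is at 40051d, which is clearly not the start address.
Entry point address: 0x400430 // the code starts here
The conclusion: the code at the entry point (_start) is not written by us. It comes from the C runtime startup code that is added at link time, and main is effectively a callback that this startup code passes to __libc_start_main.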
/* This is the canonical entry point, usually the first thing in the text
   segment.  The SVR4/i386 ABI (pages 3-31, 3-32) says that when the entry
   point runs, most registers' values are unspecified, except for:

   %edx         Contains a function pointer to be registered with `atexit'.
                This is how the dynamic linker arranges to have DT_FINI
                functions called for shared libraries that have been loaded
                before this code runs.

   %esp         The stack contains the arguments and environment:
                // these are the arguments passed to main
                0(%esp)                 argc
                4(%esp)                 argv[0]
                ...
                (4*argc)(%esp)          NULL
                (4*(argc+1))(%esp)      envp[0]
                ...
                                        NULL
*/
SVR4/i386 ABI (pages 3-31, 3-32)
This is the ABI document.
When the program starts, the code in .init runs first; after main has finished, the code in .fini is called.
execve("./a.out", ["./a.out"], 0x7fff77b31400 /* 25 vars */) = 0
brk(NULL) = 0x76f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d5535a000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=22025, ...}) = 0
mmap(NULL, 22025, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4d55354000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`&\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2156592, ...}) = 0
mmap(NULL, 3985920, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4d54d6c000
mprotect(0x7f4d54f30000, 2093056, PROT_NONE) = 0
mmap(0x7f4d5512f000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c3000) = 0x7f4d5512f000
mmap(0x7f4d55135000, 16896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4d55135000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d55353000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d55351000
arch_prctl(ARCH_SET_FS, 0x7f4d55351740) = 0
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
mprotect(0x7f4d5512f000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ) = 0
mprotect(0x7f4d5535b000, 4096, PROT_READ) = 0
munmap(0x7f4d55354000, 22025) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d55359000
write(1, "./a.out", 7./a.out) = 7
exit_group(1) = ?
+++ exited with 1 +++
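#include <stdio.h>

int main(int argc, char **argv)
{
    /* Print the program name; this is the write(1, "./a.out", 7) in the trace. */
    printf("%s", argv[0]);
    /* Non-zero on purpose: it shows up as exit_group(1) in the trace. */
    return 1;
}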
Now let's look at the glibc source code.
sysdeps\sh\start.S
csu\libc-start.c
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
                 int argc, char **argv,
#ifdef LIBC_START_MAIN_AUXVEC_ARG
                 ElfW(auxv_t) *auxvec,
#endif
                 // the init function is passed in here
                 __typeof (main) init,
                 void (*fini) (void),
                 void (*rtld_fini) (void), void *stack_end)
{
#ifndef SHARED
  char **ev = &argv[argc + 1];

  __environ = ev;

  /* Store the lowest stack address.  This is done in ld.so if this is
     the code for the DSO.  */
  __libc_stack_end = stack_end;

# ifdef HAVE_AUX_VECTOR
  /* First process the auxiliary vector since we need to find the
     program header to locate an eventually present PT_TLS entry.  */
#  ifndef LIBC_START_MAIN_AUXVEC_ARG
  ElfW(auxv_t) *auxvec;
  {
    char **evp = ev;
    while (*evp++ != NULL)
      ;
    auxvec = (ElfW(auxv_t) *) evp;
  }
#  endif
  _dl_aux_init (auxvec);
  if (GL(dl_phdr) == NULL)
# endif
    {
      /* Starting from binutils-2.23, the linker will define the
         magic symbol __ehdr_start to point to our own ELF header
         if it is visible in a segment that also includes the phdrs.
         So we can set up _dl_phdr and _dl_phnum even without any
         information from auxv.  */

      extern const ElfW(Ehdr) __ehdr_start
# if BUILD_PIE_DEFAULT
        __attribute__ ((visibility ("hidden")));
# else
        __attribute__ ((weak, visibility ("hidden")));
      if (&__ehdr_start != NULL)
# endif
        {
          assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
          GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
          GL(dl_phnum) = __ehdr_start.e_phnum;
        }
    }

  __tunables_init (__environ);

  ARCH_INIT_CPU_FEATURES ();

  /* Do static pie self relocation after tunables and cpu features
     are setup for ifunc resolvers. Before this point relocations
     must be avoided.  */
  _dl_relocate_static_pie ();

  /* Perform IREL{,A} relocations.  */
  ARCH_SETUP_IREL ();

  /* The stack guard goes into the TCB, so initialize it early.  */
  ARCH_SETUP_TLS ();

  /* In some architectures, IREL{,A} relocations happen after TLS setup in
     order to let IFUNC resolvers benefit from TCB information, e.g. powerpc's
     hwcap and platform fields available in the TCB.  */
  ARCH_APPLY_IREL ();

  /* Set up the stack checker's canary.  */
  uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
  THREAD_SET_STACK_GUARD (stack_chk_guard);
# else
  __stack_chk_guard = stack_chk_guard;
# endif

  /* Initialize libpthread if linked in.  */
  if (__pthread_initialize_minimal != NULL)
    __pthread_initialize_minimal ();

  /* Set up the pointer guard value.  */
  uintptr_t pointer_chk_guard = _dl_setup_pointer_guard (_dl_random,
                                                         stack_chk_guard);
# ifdef THREAD_SET_POINTER_GUARD
  THREAD_SET_POINTER_GUARD (pointer_chk_guard);
# else
  __pointer_chk_guard_local = pointer_chk_guard;
# endif

#endif /* !SHARED  */

  /* Register the destructor of the dynamic linker if there is any.  */
  if (__glibc_likely (rtld_fini != NULL))
    __cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);

#ifndef SHARED
  /* Perform early initialization.  In the shared case, this function
     is called from the dynamic loader as early as possible.  */
  __libc_early_init (true);

  /* Call the initializer of the libc.  This is only needed here if
     we are compiling for the static library in which case we haven't
     run the constructors in `_dl_start_user'.  */
  __libc_init_first (argc, argv, __environ);

  /* Register the destructor of the statically-linked program.  */
  __cxa_atexit (call_fini, NULL, NULL);

  /* Some security at this point.  Prevent starting a SUID binary where
     the standard file descriptors are not opened.  We have to do this
     only for statically linked applications since otherwise the dynamic
     loader did the work already.  */
  if (__builtin_expect (__libc_enable_secure, 0))
    __libc_check_standard_fds ();
#endif /* !SHARED */

  /* Call the initializer of the program, if any.  */
#ifdef SHARED
  if (__builtin_expect (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS, 0))
    GLRO(dl_debug_printf) ("\ninitialize program: %s\n\n", argv[0]);

  if (init != NULL)
    /* This is a legacy program which supplied its own init
       routine.  */
    // run the init function
    (*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);
  else
    /* This is a current program.  Use the dynamic segment to find
       constructors.  */
    call_init (argc, argv, __environ);

  /* Auditing checkpoint: we have a new object.  */
  _dl_audit_preinit (GL(dl_ns)[LM_ID_BASE]._ns_loaded);

  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS))
    GLRO(dl_debug_printf) ("\ntransferring control: %s\n\n", argv[0]);
#else /* !SHARED */
  call_init (argc, argv, __environ);

  _dl_debug_initialize (0, LM_ID_BASE);
#endif

  __libc_start_call_main (main, argc, argv MAIN_AUXVEC_PARAM);
}

__libc_start_call_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
                        int argc, char **argv
#ifdef LIBC_START_MAIN_AUXVEC_ARG
                        , ElfW(auxv_t) *auxvec
#endif
                        )
{
  int result;

  /* Memory for the cancellation buffer.  */
  struct pthread_unwind_buf unwind_buf;

  int not_first_call;
  DIAG_PUSH_NEEDS_COMMENT;
#if __GNUC_PREREQ (7, 0)
  /* This call results in a -Wstringop-overflow warning because struct
     pthread_unwind_buf is smaller than jmp_buf.  setjmp and longjmp
     do not use anything beyond the common prefix (they never access
     the saved signal mask), so that is a false positive.  */
  DIAG_IGNORE_NEEDS_COMMENT (11, "-Wstringop-overflow=");
#endif
  not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf);
  DIAG_POP_NEEDS_COMMENT;
  if (__glibc_likely (! not_first_call))
    {
      struct pthread *self = THREAD_SELF;

      /* Store old info.  */
      unwind_buf.priv.data.prev = THREAD_GETMEM (self, cleanup_jmp_buf);
      unwind_buf.priv.data.cleanup = THREAD_GETMEM (self, cleanup);

      /* Store the new cleanup handler info.  */
      THREAD_SETMEM (self, cleanup_jmp_buf, &unwind_buf);

      /* Run the program.  */
      // main is called here
      result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
    }
  else
    {
      /* Remove the thread-local data.  */
      __nptl_deallocate_tsd ();

      /* One less thread.  Decrement the counter.  If it is zero we
         terminate the entire process.  */
      result = 0;
      if (! atomic_decrement_and_test (&__nptl_nthreads))
        /* Not much left to do but to exit the thread, not the process.  */
        while (1)
          INTERNAL_SYSCALL_CALL (exit, 0);
    }

  exit (result);
}
ELF specifies the memory layout of a program; the operating system only has to load the file according to that layout.
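Control is then transferred to the entry address and the first function, _start, begins to run. _start calls __libc_start_main, which lives in the dynamic library (libc). __libc_start_main calls _init, _init in turn calls __gmon_start__, and __libc_start_main is also the function that calls main.
Now let's see how the JVM executes a program.
First, javac compiles the source into an intermediate representation (IR), the bytecode. The IR then has to be turned into machine code. For the IR to run as machine code, something has to play the role of a linker, and that is the JVM's job. The JVM itself is a dynamic library, so it needs a program to load it before it can run, and the java command is that program.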
Let's verify this against the JDK source code.
/*
 * Entry point.
 */
int
JLI_Launch(int argc, char ** argv,              /* main argc, argc */
        int jargc, const char** jargv,          /* java args */
        int appclassc, const char** appclassv,  /* app classpath */
        const char* fullversion,                /* full version defined */
        const char* dotversion,                 /* dot version defined */
        const char* pname,                      /* program name */
        const char* lname,                      /* launcher name */
        jboolean javaargs,                      /* JAVA_ARGS */
        jboolean cpwildcard,                    /* classpath wildcard*/
        jboolean javaw,                         /* windows-only javaw */
        jint ergo                               /* ergonomics class policy */
)
{
    int mode = LM_UNKNOWN;
    char *what = NULL;
    char *cpath = 0;
    char *main_class = NULL;
    int ret;
    InvocationFunctions ifn;
    jlong start, end;
    char jvmpath[MAXPATHLEN];
    char jrepath[MAXPATHLEN];
    char jvmcfg[MAXPATHLEN];

    _fVersion = fullversion;
    _dVersion = dotversion;
    _launcher_name = lname;
    _program_name = pname;
    _is_java_args = javaargs;
    _wc_enabled = cpwildcard;
    _ergo_policy = ergo;

    InitLauncher(javaw);
    DumpState();
    if (JLI_IsTraceLauncher()) {
        int i;
        printf("Command line args:\n");
        for (i = 0; i < argc ; i++) {
            printf("argv[%d] = %s\n", i, argv[i]);
        }
        AddOption("-Dsun.java.launcher.diag=true", NULL);
    }

    /*
     * Make sure the specified version of the JRE is running.
     *
     * There are three things to note about the SelectVersion() routine:
     *  1) If the version running isn't correct, this routine doesn't
     *     return (either the correct version has been exec'd or an error
     *     was issued).
     *  2) Argc and Argv in this scope are *not* altered by this routine.
     *     It is the responsibility of subsequent code to ignore the
     *     arguments handled by this routine.
     *  3) As a side-effect, the variable "main_class" is guaranteed to
     *     be set (if it should ever be set).  This isn't exactly the
     *     poster child for structured programming, but it is a small
     *     price to pay for not processing a jar file operand twice.
     *     (Note: This side effect has been disabled.  See comment on
     *     bugid 5030265 below.)
     */
    SelectVersion(argc, argv, &main_class);

    CreateExecutionEnvironment(&argc, &argv,
                               jrepath, sizeof(jrepath),
                               jvmpath, sizeof(jvmpath),
                               jvmcfg,  sizeof(jvmcfg));

    ifn.CreateJavaVM = 0;
    ifn.GetDefaultJavaVMInitArgs = 0;

    if (JLI_IsTraceLauncher()) {
        start = CounterGet();
    }

    // load the JVM
    if (!LoadJavaVM(jvmpath, &ifn)) {
        return(6);
    }

    if (JLI_IsTraceLauncher()) {
        end = CounterGet();
    }

    JLI_TraceLauncher("%ld micro seconds to LoadJavaVM\n",
             (long)(jint)Counter2Micros(end-start));

    ++argv;
    --argc;

    if (IsJavaArgs()) {
        /* Preprocess wrapper arguments */
        TranslateApplicationArgs(jargc, jargv, &argc, &argv);
        if (!AddApplicationOptions(appclassc, appclassv)) {
            return(1);
        }
    } else {
        /* Set default CLASSPATH */
        cpath = getenv("CLASSPATH");
        if (cpath == NULL) {
            cpath = ".";
        }
        SetClassPath(cpath);
    }

    /* Parse command line options; if the return value of
     * ParseArguments is false, the program should exit.
     */
    if (!ParseArguments(&argc, &argv, &mode, &what, &ret, jrepath))
    {
        return(ret);
    }

    /* Override class path if -jar flag was specified */
    if (mode == LM_JAR) {
        SetClassPath(what);     /* Override class path */
    }

    /* set the -Dsun.java.command pseudo property */
    SetJavaCommandLineProp(what, argc, argv);

    /* Set the -Dsun.java.launcher pseudo property */
    SetJavaLauncherProp();

    /* set the -Dsun.java.launcher.* platform properties */
    SetJavaLauncherPlatformProps();

    return JVMInit(&ifn, threadStackSize, argc, argv, mode, what, ret);
}

int
JVMInit(InvocationFunctions* ifn, jlong threadStackSize,
        int argc, char **argv,
        int mode, char *what, int ret)
{
    ShowSplashScreen();
    return ContinueInNewThread(ifn, threadStackSize, argc, argv, mode, what, ret);
}

int
ContinueInNewThread(InvocationFunctions* ifn, jlong threadStackSize,
                    int argc, char **argv,
                    int mode, char *what, int ret)
{
    /*
     * If user doesn't specify stack size, check if VM has a preference.
     * Note that HotSpot no longer supports JNI_VERSION_1_1 but it will
     * return its default stack size through the init args structure.
     */
    if (threadStackSize == 0) {
      struct JDK1_1InitArgs args1_1;
      memset((void*)&args1_1, 0, sizeof(args1_1));
      args1_1.version = JNI_VERSION_1_1;
      ifn->GetDefaultJavaVMInitArgs(&args1_1);  /* ignore return value */
      if (args1_1.javaStackSize > 0) {
         threadStackSize = args1_1.javaStackSize;
      }
    }

    { /* Create a new thread to create JVM and invoke main method */
      JavaMainArgs args;
      int rslt;

      args.argc = argc;
      args.argv = argv;
      args.mode = mode;
      args.what = what;
      args.ifn = *ifn;

      // JavaMain is the entry function of the new thread
      rslt = ContinueInNewThread0(JavaMain, threadStackSize, (void*)&args);
      /* If the caller has deemed there is an error we
       * simply return that, otherwise we return the value of
       * the callee
       */
      return (ret != 0) ? ret : rslt;
    }
}

jboolean
LoadJavaVM(const char *jvmpath, InvocationFunctions *ifn)
{
    void *libjvm;

    JLI_TraceLauncher("JVM path is %s\n", jvmpath);

    libjvm = dlopen(jvmpath, RTLD_NOW + RTLD_GLOBAL);
    if (libjvm == NULL) {
#if defined(__solaris__) && defined(__sparc) && !defined(_LP64) /* i.e. 32-bit sparc */
      FILE * fp;
      Elf32_Ehdr elf_head;
      int count;
      int location;

      // open the dynamic library file
      fp = fopen(jvmpath, "r");
      if (fp == NULL) {
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
      }

      /* read in elf header */
      count = fread((void*)(&elf_head), sizeof(Elf32_Ehdr), 1, fp);
      fclose(fp);
      if (count < 1) {
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
      }

      /*
       * Check for running a server vm (compiled with -xarch=v8plus)
       * on a stock v8 processor.  In this case, the machine type in
       * the elf header would not be included the architecture list
       * provided by the isalist command, which is turn is gotten from
       * sysinfo.  This case cannot occur on 64-bit hardware and thus
       * does not have to be checked for in binaries with an LP64 data
       * model.
       */
      if (elf_head.e_machine == EM_SPARC32PLUS) {
        char buf[257];  /* recommended buffer size from sysinfo man
                           page */
        long length;
        char* location;

        length = sysinfo(SI_ISALIST, buf, 257);
        if (length > 0) {
            location = JLI_StrStr(buf, "sparcv8plus ");
          if (location == NULL) {
            JLI_ReportErrorMessage(JVM_ERROR3);
            return JNI_FALSE;
          }
        }
      }
#endif
        JLI_ReportErrorMessage(DLL_ERROR1, __LINE__);
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
    }

    // resolve JNI_CreateJavaVM from the library so that it can be called later
    ifn->CreateJavaVM = (CreateJavaVM_t)
        dlsym(libjvm, "JNI_CreateJavaVM");
    if (ifn->CreateJavaVM == NULL) {
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
    }

    ifn->GetDefaultJavaVMInitArgs = (GetDefaultJavaVMInitArgs_t)
        dlsym(libjvm, "JNI_GetDefaultJavaVMInitArgs");
    if (ifn->GetDefaultJavaVMInitArgs == NULL) {
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
    }

    ifn->GetCreatedJavaVMs = (GetCreatedJavaVMs_t)
        dlsym(libjvm, "JNI_GetCreatedJavaVMs");
    if (ifn->GetCreatedJavaVMs == NULL) {
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
    }

    return JNI_TRUE;
}

int JNICALL
JavaMain(void * _args)
{
    JavaMainArgs *args = (JavaMainArgs *)_args;
    int argc = args->argc;
    char **argv = args->argv;
    int mode = args->mode;
    char *what = args->what;
    InvocationFunctions ifn = args->ifn;

    JavaVM *vm = 0;
    JNIEnv *env = 0;
    jclass mainClass = NULL;
    jclass appClass = NULL; // actual application class being launched
    jmethodID mainID;
    jobjectArray mainArgs;
    int ret = 0;
    jlong start, end;

    RegisterThread();

    /* Initialize the virtual machine */
    start = CounterGet();
    if (!InitializeJVM(&vm, &env, &ifn)) {
        JLI_ReportErrorMessage(JVM_ERROR1);
        exit(1);
    }

    if (showSettings != NULL) {
        ShowSettings(env, showSettings);
        CHECK_EXCEPTION_LEAVE(1);
    }

    if (printVersion || showVersion) {
        PrintJavaVersion(env, showVersion);
        CHECK_EXCEPTION_LEAVE(0);
        if (printVersion) {
            LEAVE();
        }
    }

    /* If the user specified neither a class name nor a JAR file */
    if (printXUsage || printUsage || what == 0 || mode == LM_UNKNOWN) {
        PrintUsage(env, printXUsage);
        CHECK_EXCEPTION_LEAVE(1);
        LEAVE();
    }

    FreeKnownVMs();  /* after last possible PrintUsage() */

    if (JLI_IsTraceLauncher()) {
        end = CounterGet();
        JLI_TraceLauncher("%ld micro seconds to InitializeJVM\n",
               (long)(jint)Counter2Micros(end-start));
    }

    /* At this stage, argc/argv have the application's arguments */
    if (JLI_IsTraceLauncher()){
        int i;
        printf("%s is '%s'\n", launchModeNames[mode], what);
        printf("App's argc is %d\n", argc);
        for (i=0; i < argc; i++) {
            printf(" argv[%2d] = '%s'\n", i, argv[i]);
        }
    }

    ret = 1;

    /*
     * Get the application's main class.
     *
     * See bugid 5030265.  The Main-Class name has already been parsed
     * from the manifest, but not parsed properly for UTF-8 support.
     * Hence the code here ignores the value previously extracted and
     * uses the pre-existing code to reextract the value.  This is
     * possibly an end of release cycle expedient.  However, it has
     * also been discovered that passing some character sets through
     * the environment has "strange" behavior on some variants of
     * Windows.  Hence, maybe the manifest parsing code local to the
     * launcher should never be enhanced.
     *
     * Hence, future work should either:
     *     1)   Correct the local parsing code and verify that the
     *          Main-Class attribute gets properly passed through
     *          all environments,
     *     2)   Remove the vestages of maintaining main_class through
     *          the environment (and remove these comments).
     *
     * This method also correctly handles launching existing JavaFX
     * applications that may or may not have a Main-Class manifest entry.
     */
    mainClass = LoadMainClass(env, mode, what);
    CHECK_EXCEPTION_NULL_LEAVE(mainClass);
    /*
     * In some cases when launching an application that needs a helper, e.g., a
     * JavaFX application with no main method, the mainClass will not be the
     * applications own main class but rather a helper class. To keep things
     * consistent in the UI we need to track and report the application main class.
     */
    appClass = GetApplicationClass(env);
    NULL_CHECK_RETURN_VALUE(appClass, -1);
    /*
     * PostJVMInit uses the class name as the application name for GUI purposes,
     * for example, on OSX this sets the application name in the menu bar for
     * both SWT and JavaFX. So we'll pass the actual application class here
     * instead of mainClass as that may be a launcher or helper class instead
     * of the application class.
     */
    PostJVMInit(env, appClass, vm);
    /*
     * The LoadMainClass not only loads the main class, it will also ensure
     * that the main method's signature is correct, therefore further checking
     * is not required. The main method is invoked here so that extraneous java
     * stacks are not in the application stack trace.
     */
    mainID = (*env)->GetStaticMethodID(env, mainClass, "main",
                                       "([Ljava/lang/String;)V");
    CHECK_EXCEPTION_NULL_LEAVE(mainID);

    /* Build platform specific argument array */
    mainArgs = CreateApplicationArgs(env, argv, argc);
    CHECK_EXCEPTION_NULL_LEAVE(mainArgs);

    /* Invoke main method. */
    (*env)->CallStaticVoidMethod(env, mainClass, mainID, mainArgs);

    /*
     * The launcher's exit code (in the absence of calls to
     * System.exit) will be non-zero if main threw an exception.
     */
    ret = (*env)->ExceptionOccurred(env) == NULL ? 0 : 1;
    LEAVE();
}

/*
 * Initializes the Java Virtual Machine. Also frees options array when
 * finished.
 */
static jboolean
InitializeJVM(JavaVM **pvm, JNIEnv **penv, InvocationFunctions *ifn)
{
    JavaVMInitArgs args;
    jint r;

    memset(&args, 0, sizeof(args));
    args.version  = JNI_VERSION_1_2;
    args.nOptions = numOptions;
    args.options  = options;
    args.ignoreUnrecognized = JNI_FALSE;

    if (JLI_IsTraceLauncher()) {
        int i = 0;
        printf("JavaVM args:\n    ");
        printf("version 0x%08lx, ", (long)args.version);
        printf("ignoreUnrecognized is %s, ",
               args.ignoreUnrecognized ? "JNI_TRUE" : "JNI_FALSE");
        printf("nOptions is %ld\n", (long)args.nOptions);
        for (i = 0; i < numOptions; i++)
            printf("    option[%2d] = '%s'\n",
                   i, args.options[i].optionString);
    }

    r = ifn->CreateJavaVM(pvm, (void **)penv, &args);
    JLI_MemFree(options);
    return r == JNI_OK;
}

static jclass helperClass = NULL;
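Addresses:
000000000085d4e0 offset of JNI_CreateJavaVM inside the dynamic library
0x7ffff66ee4e0 address of CreateJavaVM at run time
0x7ffff66ee4e0L - 0x000000000085d4e0L = 0x7ffff5e91000
pmap -x 5018 (pmap shows where the JVM library is actually mapped)
run-time address - offset = base address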
00007ffff5e91000:
readelf --dyn-syms /home/muliao/software/openjdk-jdk8-b118/build/linux-x86_64-normal-server-slowdebug/jdk/lib/amd64/server/libjvm.so | grep JNI_CreateJavaVM
459: 000000000085d4e0 530 FUNC GLOBAL DEFAULT 14 JNI_CreateJavaVM@@SUNWprivate_1.1
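Heap memory allocation:
Suppose one function uses the return value of another, and that return value is the address of something inside the callee. Once the callee returns and its stack frame is released, whatever sits at that address is stale data. A caller that reads through the address may get meaningless values. Such a pointer is called a dangling (wild) pointer. This is where heap allocation is needed.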
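If two functions need to share a piece of data, that data also has to be allocated in heap memory.
Heap memory allocation
Option 1: allocate the way a stack does. When memory is needed, simply push the top address up.
The open question is how to free a block that lies below the current top. Roll the top back, or something else?
Option 2: allocate out of the free space. This leads to memory fragmentation, and a map of the allocations has to be maintained.
At this point the two approaches can be combined.
Large requests are allocated directly out of the free space, and small requests are served by pushing the top address up.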