Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.
Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker, as needed, on behalf of the user.
This part is quite interesting.
The driver first runs the C preprocessor (cpp), which translates the C source file main.c into an ASCII intermediate file main.i:
cpp [other arguments] main.c /tmp/main.i
Next, the driver runs the C compiler (cc1), which translates main.i into an ASCII assembly language file main.s:
cc1 /tmp/main.i main.c -O2 [other arguments] -o /tmp/main.s
Then, the driver runs the assembler (as), which translates main.s into a relocatable object file main.o:
as [other arguments] -o /tmp/main.o /tmp/main.s
The driver goes through the same process to generate swap.o. Finally, it runs the linker program ld, which combines main.o and swap.o, along with the necessary system object files, to create the executable object file p:
ld -o p [system object files and args] /tmp/main.o /tmp/swap.o
Static linkers such as the Unix ld program take as input a collection of relocatable object files and command-line arguments and generate as output a fully linked executable object file that can be loaded and run.
To build the executable, the linker must perform two main tasks:
Symbol resolution. Object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.
Relocation. Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all of the references to those symbols so that they point to this memory location.
Object files come in three forms:
Relocatable object file. Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.
Executable object file. Contains binary code and data in a form that can be copied directly into memory and executed.
Shared object file. A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.
Object file formats vary from system to system.
Modern Unix systems use the Unix Executable and Linkable Format (ELF). Although our discussion will focus on ELF, the basic concepts are similar, regardless of the particular format.
Figure 7.3 shows the format of a typical ELF relocatable object file. The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header, the object file type
On Linux, the readelf command reads information from files in ELF format.
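The 16-byte sequence at the start of the ELF header can be inspected by hand. The following is a minimal sketch, assuming the byte layout from the ELF specification (magic number in bytes 0 to 3, word-size class in byte 4, byte ordering in byte 5). The names parse_ident and IdentInfo are hypothetical helpers introduced for illustration; they are not part of readelf or any system header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Positions inside the 16-byte identification sequence (per the ELF spec). */
enum { IDENT_SIZE = 16, IDX_CLASS = 4, IDX_DATA = 5 };

/* Hypothetical result type: what the identification bytes tell us. */
typedef struct {
    int word_bits;      /* 32 or 64; 0 if the class byte is unrecognized */
    int little_endian;  /* 1 = little-endian, 0 = big-endian, -1 = unrecognized */
} IdentInfo;

/* Returns 0 and fills *out if ident starts with the ELF magic number,
   otherwise returns -1 and leaves *out untouched. */
int parse_ident(const unsigned char ident[IDENT_SIZE], IdentInfo *out)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(ident != NULL && out != NULL);
    if (memcmp(ident, magic, sizeof magic) != 0)
        return -1;

    /* Class byte: 1 means 32-bit objects, 2 means 64-bit objects. */
    out->word_bits = ident[IDX_CLASS] == 1 ? 32
                   : ident[IDX_CLASS] == 2 ? 64 : 0;
    /* Data byte: 1 means little-endian, 2 means big-endian. */
    out->little_endian = ident[IDX_DATA] == 1 ? 1
                       : ident[IDX_DATA] == 2 ? 0 : -1;
    return 0;
}
```

To try it on a real file, read the first 16 bytes with fread and pass the buffer to parse_ident; the result should agree with the Class and Data fields that readelf -h reports.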
typedef struct {
    int name;        /* String table offset */
    int value;       /* Section offset, or VM address */
    int size;        /* Object size in bytes */
    char type:4,     /* Data, func, section, or src file name (4 bits) */
         binding:4;  /* Local or global (4 bits) */
    char reserved;   /* Unused */
    char section;    /* Section header index, ABS, UNDEF, or COMMON */
} Elf_Symbol;
The readelf output for a hello world program:
There are 30 section headers, starting at offset 0x1178:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274
0000000000000024 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000400298 00000298
000000000000001c 0000000000000000 A 5 0 8
[ 5] .dynsym DYNSYM 00000000004002b8 000002b8
0000000000000060 0000000000000018 A 6 1 8
[ 6] .dynstr STRTAB 0000000000400318 00000318
000000000000003d 0000000000000000 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000400356 00000356
0000000000000008 0000000000000002 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000400360 00000360
0000000000000020 0000000000000000 A 6 1 8
[ 9] .rela.dyn RELA 0000000000400380 00000380
0000000000000018 0000000000000018 A 5 0 8
[10] .rela.plt RELA 0000000000400398 00000398
0000000000000048 0000000000000018 A 5 12 8
[11] .init PROGBITS 00000000004003e0 000003e0
000000000000001a 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 0000000000400400 00000400
0000000000000040 0000000000000010 AX 0 0 16
[13] .text PROGBITS 0000000000400440 00000440
00000000000001a4 0000000000000000 AX 0 0 16
[14] .fini PROGBITS 00000000004005e4 000005e4
0000000000000009 0000000000000000 AX 0 0 4
[15] .rodata PROGBITS 00000000004005f0 000005f0
0000000000000011 0000000000000000 A 0 0 4
[16] .eh_frame_hdr PROGBITS 0000000000400604 00000604
0000000000000034 0000000000000000 A 0 0 4
[17] .eh_frame PROGBITS 0000000000400638 00000638
00000000000000d4 0000000000000000 A 0 0 8
[18] .init_array INIT_ARRAY 0000000000600e10 00000e10
0000000000000008 0000000000000000 WA 0 0 8
[19] .fini_array FINI_ARRAY 0000000000600e18 00000e18
0000000000000008 0000000000000000 WA 0 0 8
[20] .jcr PROGBITS 0000000000600e20 00000e20
0000000000000008 0000000000000000 WA 0 0 8
[21] .dynamic DYNAMIC 0000000000600e28 00000e28
00000000000001d0 0000000000000010 WA 6 0 8
[22] .got PROGBITS 0000000000600ff8 00000ff8
0000000000000008 0000000000000008 WA 0 0 8
[23] .got.plt PROGBITS 0000000000601000 00001000
0000000000000030 0000000000000008 WA 0 0 8
[24] .data PROGBITS 0000000000601030 00001030
0000000000000010 0000000000000000 WA 0 0 8
[25] .bss NOBITS 0000000000601040 00001040
0000000000000008 0000000000000000 WA 0 0 4
[26] .comment PROGBITS 0000000000000000 00001040
000000000000002a 0000000000000001 MS 0 0 1
[27] .shstrtab STRTAB 0000000000000000 0000106a
0000000000000108 0000000000000000 0 0 1
[28] .symtab SYMTAB 0000000000000000 000018f8
0000000000000618 0000000000000018 29 45 8
[29] .strtab STRTAB 0000000000000000 00001f10
0000000000000236 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
.data: Initialized global C variables. Local C variables are maintained at run time on the stack, and do not appear in either the .data or .bss sections.
.bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file.
Only these two sections are described here; for the others, see the book or a wiki.
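To make the .data and .bss distinction concrete, here is a small sketch. The variable and function names are made up for illustration, and the section placement in the comments is what a typical compiler does, not a guarantee (some toolchains first place uninitialized globals in COMMON inside the relocatable file).

```c
#include <assert.h>

int initialized_global = 42;  /* has an initializer: typically stored in .data */
int uninitialized_global;     /* no initializer: typically .bss, reads as 0 at startup */

/* Adds x to the uninitialized global and returns the sum of both globals. */
int add_to_globals(int x)
{
    int local = x;            /* lives on the stack (or in a register); in neither section */

    assert(x >= 0);           /* this sketch only accepts non-negative increments */
    uninitialized_global += local;
    return initialized_global + uninitialized_global;
}
```

After compiling this into an object file, readelf -S should list the sizes of .data and .bss, which makes it possible to check where each variable ended up on a given system.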
Each relocatable object module, m, has a symbol table that contains information about the symbols that are defined and referenced by m. In the context of a linker, there are three different kinds of symbols:
Global symbols that are defined by module m and that can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and global variables that are defined without the C static attribute.
Global symbols that are referenced by module m but defined by some other module. Such symbols are called externals and correspond to C functions and variables that are defined in other modules.
Local symbols that are defined and referenced exclusively by module m. Some local linker symbols correspond to C functions and global variables that are defined with the static attribute. These symbols are visible anywhere within module m, but cannot be referenced by other modules. The sections in an object file and the name of the source file that corresponds to module m also get local symbols.
It is important to realize that local linker symbols are not the same as local program variables. The symbol table in .symtab does not contain any symbols that correspond to local nonstatic program variables.
When the compiler encounters a symbol (either a variable or function name) that is not defined in the current module, it assumes that it is defined in some other module, generates a linker symbol table entry, and leaves it for the linker to handle.
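The three kinds of symbols can be seen side by side in one small module. This is an illustrative sketch only; every name in it (shared_counter, private_counter, next_id, record_event) is invented for the example, and printf stands in for a symbol that is defined in another module (the C library).

```c
#include <assert.h>
#include <stdio.h>   /* declares printf, which is defined in another module */

int shared_counter = 0;           /* global symbol: defined here, usable by other modules */

static int private_counter = 0;   /* local linker symbol: static, visible only in this module */

static int next_id(void)          /* local linker symbol: static function */
{
    static int last_id = 0;       /* static local variable: stored in .data/.bss, not on the stack */
    return ++last_id;
}

int record_event(const char *label)   /* global symbol */
{
    int id = next_id();           /* local nonstatic variable: no entry in .symtab */

    assert(label != NULL);
    private_counter++;
    shared_counter++;
    printf("%s -> id %d\n", label, id);   /* reference to an external symbol */
    return id;
}
```

Compiling this to an object file and running readelf -s on it should show record_event and shared_counter with global binding, next_id and private_counter with local binding, and printf as an undefined reference left for the linker.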
Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols.
Rule 1: Multiple strong symbols are not allowed.
Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.
Rule 3: Given multiple weak symbols, choose any of the weak symbols.
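The three rules can be modeled in a few lines. The sketch below is not how ld is implemented; it is a simplified model, with hypothetical names (SymbolDef, resolve_symbol), of the decision the linker makes when it sees several definitions of the same symbol name. For Rule 3 it arbitrarily picks the first weak definition, which is one valid choice among many.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SYM_WEAK, SYM_STRONG } Strength;

/* Hypothetical record: one definition of a given symbol name in some module. */
typedef struct {
    const char *module;   /* module that contains the definition */
    Strength strength;    /* strong or weak */
} SymbolDef;

/* Returns the index of the definition the rules select,
   or -1 if Rule 1 is violated (more than one strong definition). */
int resolve_symbol(const SymbolDef *defs, size_t n)
{
    int chosen = -1;
    size_t strong_count = 0;

    assert(defs != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == SYM_STRONG) {
            if (++strong_count > 1)
                return -1;        /* Rule 1: multiple strong symbols are an error */
            chosen = (int)i;      /* Rule 2: the strong symbol wins over weak ones */
        }
    }
    if (strong_count == 0)
        chosen = 0;               /* Rule 3: all weak, so any will do; take the first */
    return chosen;
}
```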
Only the main rules are listed here. The book explains the concrete demos well and follows them with exercises, so I will not reproduce them all.
In practice, all compilation systems provide a mechanism for packaging related object modules into a single file called a static library.
To see why, consider the alternative of putting all of the standard functions into a single relocatable object module.
A big disadvantage is that every executable file in a system would now contain a complete copy of the collection of standard functions, which would be extremely wasteful of disk space.
Another big disadvantage is that any change to any standard function, no matter how small, would require the library developer to recompile the entire source file, a time-consuming operation that would complicate the development and maintenance of the standard functions.
Figure 7.7 summarizes the activity of the linker. The -static argument tells the compiler driver that the linker should build a fully linked executable object file that can be loaded into memory and run without any further linking at load time.
Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type.
Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses.
When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables that are referenced by the module. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable.
typedef struct {
    int offset;      /* Offset of the reference to relocate */
    int symbol:24,   /* Symbol the reference should point to */
        type:8;      /* Relocation type */
} Elf32_Rel;
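How a linker might use such entries can be sketched as follows. This is a simplified model, not the code of any real linker: it handles only the two basic IA32 relocation types (absolute, R_386_32, and PC-relative, R_386_PC32), assumes the value already stored at the reference is the addend, and works in the host's byte order. RelocEntry, relocate_section, and sym_addrs are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Numeric values match R_386_32 and R_386_PC32 in the IA32 ELF ABI. */
enum { REL_ABS32 = 1, REL_PC32 = 2 };

/* Hypothetical, unpacked version of a relocation entry. */
typedef struct {
    uint32_t offset;   /* offset of the reference within the section */
    uint32_t symbol;   /* index into the sym_addrs array */
    uint8_t  type;     /* REL_ABS32 or REL_PC32 */
} RelocEntry;

/* Patches every reference in one section.
   section      : bytes of the section being relocated
   section_addr : run-time address chosen for the section
   sym_addrs    : run-time address chosen for each symbol */
void relocate_section(uint8_t *section, uint32_t section_addr,
                      const RelocEntry *rels, size_t nrels,
                      const uint32_t *sym_addrs)
{
    assert(section != NULL && rels != NULL && sym_addrs != NULL);
    for (size_t i = 0; i < nrels; i++) {
        uint8_t *refptr = section + rels[i].offset;
        uint32_t addend;
        memcpy(&addend, refptr, sizeof addend);   /* value left by the assembler */

        uint32_t value = sym_addrs[rels[i].symbol] + addend;
        if (rels[i].type == REL_PC32)
            value -= section_addr + rels[i].offset;   /* make it relative to the reference */

        memcpy(refptr, &value, sizeof value);
    }
}
```

As an illustration with made-up addresses: for a PC-relative reference at offset 7 in a section placed at 0x80483b4, with an addend of -4 and a target symbol at 0x80483c8, the patched value is 0x80483c8 - 4 - 0x80483bb = 0x9.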
To run an executable object file p, we can type its name to the Unix shell’s command line:
unix> ./p
How a user-space program actually starts and ends:
When the loader runs, it creates the memory image shown in Figure 7.13. Guided by the segment header table in the executable, it copies chunks of the executable into the code and data segments. Next, the loader jumps to the program’s entry point, which is always the address of the _start symbol. The startup code at the _start address is defined in the object file crt1.o and is the same for all C programs. Figure 7.14 shows the specific sequence of calls in the startup code. After calling initialization routines from the .text and .init sections, the startup code calls the atexit routine, which appends a list of routines that should be called when the application terminates normally. The exit function runs the functions registered by atexit, and then returns control to the operating system by calling _exit. Next, the startup code calls the application’s main routine, which begins executing our C code. After the application returns, the startup code calls the _exit routine, which returns control to the operating system.
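The role of atexit described above can be observed from ordinary C code. The sketch below uses only the standard atexit function; the handler names and the register_cleanup helper are invented for the example. It shows the application side of the mechanism, not the startup code in crt1.o itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int registered = 0;   /* guards against registering the handlers twice */

static void flush_logs(void)  { puts("flush_logs"); }
static void close_files(void) { puts("close_files"); }

/* Registers both handlers with atexit.
   Returns 0 on success, -1 if atexit refused a registration. */
int register_cleanup(void)
{
    assert(!registered);     /* a second call would make each handler run twice */
    if (atexit(flush_logs) != 0 || atexit(close_files) != 0)
        return -1;
    registered = 1;
    return 0;
}
```

If main calls register_cleanup and then returns normally (or calls exit), the C library runs the handlers in reverse order of registration, so close_files runs before flush_logs. Calling _exit directly skips them.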
A key purpose of shared libraries is to allow multiple running processes to share the same library code in memory and thus save precious memory resources.