From http://www.symantec.com/connect/articles/dynamic-linking-linux-and-windows-part-one
by Reji Thomas and Bhasker Reddy
This article discusses the shared libraries concept in both Windows and Linux, and offers a walk-through through various data structures to explain how dynamic linking is done in these operating systems. The paper will be useful for developers interested in the security implications and the relative speed of dynamic linking, and assumes some prior cursory knowledge with dynamic linking.
Part one introduces the concepts for both Linux and Windows, but will focus primarily on Linux. Next time in part two, we'll discuss how it works in Windows and then continue to compare the two environments.
A library is a collection of sub-programs which allow code to be shared and changed in a modular fashion. Executables and libraries make references to each other through a process known as linking which is done by a linker.
In the most basic sense, libraries can be divided into two categories: static libraries and shared libraries.
Static libraries are a collection of object files, and conventionally they end with a ".a" suffix in UNIX variants, and ".lib" in Windows. When a program is linked against a static library, the machine code from the object files for any external functions used by the program is copied from the library into the final executable.
In contrast to static libraries, with shared libraries the library code is not bound to the executable at link time. Depending on when and how the address bindings are done, the linking process can be categorized into prelinking, load time linking, implicit run-time linking and explicit run-time linking.
Position independent code can be copied to any memory location without modification and then executed, unlike relocatable code which requires special processing by a linker to make it suitable for execution at a given location.
Win32 DLLs are not position independent. They need to be relocated during loading, unless the fixed base it was built with happens to be unused in the loading process. Relocations to the same address can be shared, but if different processes have conflicting memory layouts, the loader needs to generate multiple copies of the DLL in memory. When the Windows loader maps a DLL into memory, it opens the file and tries to map the file into memory at its preferred base address. As these mapped pages are touched, the paging system will see whether the pages are already present in memory. If they are, it just remaps the pages to the new process since the relocation has already been done by the loader at the preferred base address. Otherwise the pages are fetched from the disk.
If the preferred address range for the DLL is not available, the loader maps the pages into a free location in the process address space. In this case, it marks the code page as COW (copy-on-write) which previously was marked read+execute, since the linker will have to perform code fix ups at the time of relocation, necessitating the page to be backed up by a paging file.
Linux solves this issue by the use of PIC (Position Independent Code). Shared objects in Linux usually contain PIC which avoid the need to relocate the library at load time. All code pages can be shared amongst all processes using the same library and can be paged to/from the file system. In x86, there is no simple way to address data relative to the current location since all jumps and calls are instruction-pointer relative. Hence, all references to the static globals were directed through a table known as Global Offset Table (GOT).
As this is not an article specifically on the ELF format, we will discuss in brief only those data structures which are relevant to our discussion. For dynamic linking, the ELF linker primarily uses two processor-specific tables, the Global Offset Table (GOT) and the Procedure Linkage Table (PLT).
ELF linkers support PIC Code through the GOT in each shared library. The GOT contains absolute addresses to all of the static data referenced in the program. The address of the GOT is normally stored in a register (EBX) which is a relative address from the code that references it.
Both the executables that use the shared libraries and the shared library itself has a PLT. Similar to how the GOT redirects any position-independent address calculations to absolute locations, the PLT redirects position-independent function calls to absolute locations.
Apart from these two tables, the linker also refers to .dynsym
, which contains all of the file's imported and exported symbols, .dynstr
, which contains name strings for the symbols, .hash which contains the hash table which the runtime linker can use to lookup symbols quickly, and .dynamic
, which is a list of tagged values and pointers.
In the .dynamic
section, the important tag types are:
Symbol Hashtable
nBuckets //no of bucket entries
nChains //no of chain entries bucket[] chain[]
Both the bucket and chain array contain symbol table indexes. For the symbol to be searched, a hash of the symbol is computed and hash%nBuckets is used as an index into the bucket[] array. The bucket element gives an index, symindx, into the chain array as well as to the symbol table. If the symbol table entry does not match, it looks up the next symbol table entry with the same hash value using the index retrieved from Chain [symindx].
In Linux, the dynamic linker ld.so is itself an ELF shared library. At program startup, the system maps the ld.so to a part of the address space and runs its bootstrap code. The main entry point for the loader is defined in dl_main(elf/rtld.c). The linker relocates and resolves references to its own routines which are needed to load everything else.
The dynamic segment (pointed to by the program header) in the ELF file contains a pointer to the file's string table (DT_STRTAB) as well as to the DT_NEEDED entries, each of which contains the offset in the string table for the name of a required library. The dynamic linker creates a scope list for the executable, consisting of libraries to be loaded.
We have two ways to specify objects to preload, either through the environment variable LD_PRELOAD or via the file /etc/ld.so.preload. The latter can be used when security prevents the use of an environment variable. The loader adds the DT_NEEDED entries of the executable as well as the scope, after the preload entries.
For each of the entries in the scope, the linker searches for the file containing the library. Once the file is found, the linker reads the ELF Header to find the program header, which points to the dynamic segment .The linker maps the library to the process address space. From the dynamic segment, it adds the library's symbol table to the chain of symbol tables - and if the libraries has further dependencies, it adds those libraries to the list to be loaded and the process is continued. For clarification, note that in fact it actually creates a struct link_map for each of the library and adds it into a global linked list.
The Linker keeps in memory a linked list of the tables (of type struct link_map, referenced by the dl_loaded parameter in struct rtld_global) in each file. The linker utilizes the hashtable present in the ELF file (referred to by the DT_HASH) to speed symbol lookup.
Once the loader finishes constructing the linked list of tables of all the dependencies, it revisits each library and handles the library's relocation entries, filling in the library's GOT and performing the relocations needed.
The LD_BIND_NOW variable determines the dynamic linking behavior. If its set, the dynamic linker evaluates the PLT entries, which is all entries of type R_386_JMP_SLOT, at the load time itself. Otherwise, the dynamically linker does lazy linking of procedure addresses and hence the addresses are not bound unless the routines are called.
In this section, we will trace how a function defined in the shared library libtest.so gets resolved at runtime. The executable, disassembled using gdb below, was created by linking with a PIC library, libtest.so.
Let's start tracing from the call instruction shown in Figure 1.
The address in the call instruction (0x80483a4) is an entry in the PLT . The first 4 entries in the PLT (whereby the 3rd and 4th entries are reserved) are common to all function calls. The rest of the entries are grouped into blocks of three, one block for each function. This is shown in Figure 2.
Figure 2. First few entries in the PLT.
Figure 3. The GOT as shown on disk.
The instruction consists of a jump to the address in the GOT entry *(GOT + 0x14), which points back to the next entry in the PLT viz 0x80483aa (as was shown in Figure 2).
The next instructions are for resolving the address using the dynamic linker. The jump instruction pushes an offset (0x10). This is an offset into the file's relocation table, which points to the desired symbol in the symbol table, and the address points to the GOT entry (0x804963c).
Figure 4. Relocation entry (RELSZ) of 8 bytes .
As seen in Figure 4, the size of a relocation entry (RELSZ) is 8 bytes. The offset 0x10 gives us the 3rd relocation entry in the .rel.plt table, which is the relocation entry for m. The offset entry inside the table gives the corresponding GOT address which has to be updated.
The code then jumps into the first entry in the PLT, which is the common part.
Figure 5. Breakpoint 1.
The entry in the PLT again does a jump to the address in GOT + 8.The loader at load time has updated the entries in GOT +4 and GOT +8 (which were 0x000000 earlier, as was seen in Figure 3) . Now GOT + 8 (0x 4000bcb0) points to an address which is mapped with ld-2.3.2.0 (the runtime linker), as can be seen below in Figure 6.
Figure 6. Address mapped with the runtime linker.
Once the dynamic linker routine looks up the symbol value using the concatenated run time table, and stores the routines address (0x400177db) into the GOT entry (0x 804963c) as seen in Figure 5, the subsequent calls jump directly to the routine itself.
Let's go back to the GOT entries. As we have seen, the GOT + 8 contains the address of the linker's symbol resolution routine. The GOT + 4 was filled by the loader with the address of a structure struct link_map, defined in include/ link.h. The GOT gets filled by the routine elf_machine_runtime_setup defined in dl-machine.h.
Let's look at this further.
Struct link_map {
ElfW(Addr) l_addr; /* Base address shared object is loaded at. */ char *l_name; /* Absolute file name object was found in. */
ElfW(Dyn) *l_ld; /* Dynamic section of the shared object. */
struct link_map *l_next, *l_prev; /* Chain of loaded objects. */
/* All following members are internal to the dynamic linker. They may change without notice. */
/* Indexed pointers to dynamic section.*/
ElfW(Dyn) *l_info[DT_NUM + DT_THISPROCNUM + DT_VERSIONTAGNUM
+ DT_EXTRANUM + DT_VALNUM + DT_ADDRNUM];
const ElfW(Phdr) *l_phdr; /* Pointer to program header table in core. */
ElfW(Addr) l_entry; /* Entry point location. */
ElfW(Half) l_phnum; /* Number of program header entries. */
ElfW(Half) l_ldnum; /* Number of dynamic segment entries. */
/* Array of DT_NEEDED dependencies and their dependencies, in
dependency order for symbol lookup (with and without duplicates). There is no entry before the dependencies have
been loaded. */
struct r_scope_elem l_searchlist; /* Symbol hash table. */
Elf_Symndx l_nbuckets;
const Elf_Symndx *l_buckets, *l_chain; . .
}
At the time the execution jumps into the symbol resolution routine, we have the address of the link map and the relocation offset in the stack. The relocation offset as discussed above gives the index in the symbol table for the symbol name and the corresponding GOT address, where the resolved address should be written.The symbol resolution address (GOT +8) points to a trampoline ELF_MACHINE_RUNTIME_TRAMPOLINE.
Let's look into the ELF_MACHINE_RUNTIME_TRAMPOLINE defined in dl-machine.h. The code preserves the registers and does a call to the fixup() function:
movl 16(%esp), %edx # Copy args pushed by PLT in register. Note
movl 12(%esp), %eax # that `fixup' takes its parameters in regs.
call fixup # Call resolver.
The fixup function is defined in dl-runtime.c
.
The l_info array inside the struct link_map contains indexed pointers to the dynamic section:
const ElfW(Sym) *const symtab = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
const PLTREL *const reloc = (const void *) (D_PTR (l, l_info[DT_JMPREL]) + reloc_offset);
From the l_info, the code retrieves the pointers to the symbol table and relocation table. It calculates the relocation entry for the symbol by adding the relocation offset to the relocation base:
const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)]; void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
From reloc->r_info, it gets the symbol index which is used to index into the symbol table to get the symbol table info as well as the address to be updated from reloc->r_offset +l->l_addr.
The fixup function calls the _dl_lookup_symbol() for doing the lookup job with the above information.
The dl_lookup_symbol() in turn calls do_lookup() for each entry in the scope array. The scope array contains elements of type struct r_scope_elem for the libraries, which forms part of the global search scope. This structure gets filled at loadtime.
struct r_scope_elem {
struct link_map **r_list; /* Array of maps for the scope. */
unsigned int r_nlist; /* Number of entries in the scope. */
};
The do_lookup is defined to FCT in do-lookup.h. Let's take a look at this now from a logical perspective in plain English, to make it it easier to follow:
do {
Get the symtable address from link_map->l_info
Get the strtable address from link_map->l_info
Search the appropriate hash bucket in the objects symbol table using the hash produced from the symbol name in _dl_lookup_symbol().
Using the index entries in the hash chain, link_map ->l_chain,
Do {
Lookup the symbol table entry using the index.
Compare the symbol name with (strtab + sym->st_name).
If found, return the symbol table entry with the link_map structure;
}
}
Now let's go back to fixup() :
value = link_map->l_addr + sym->st_value; /*link_map ->l_addr points to the base load address */
return elf_machine_fixup_plt (l, result, reloc, rel_addr, value); /* Finally, fix up the plt itself. */
Back in dl-machine.h, it pops the saved registers:
xchgl %eax, (%esp) # Get %eax contents and store function address.
ret $8 # Jump to function address.
If you remember, except for the call from the main function, all the other code paths were through jumps. This unwinds the stack and does a jump to the resolved function address.
In part one of this paper we've discussed the use of dynamic linking for both Linux and Windows, but have focused primarily on the Linux side of things. Next time in part two we'll take a closer look at dynamic linking in Windows in much the same way, including the lazy linking process and the Delay Load Helper, and then look at how we can speed things up in both environments. Stay tuned.