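I have been studying the Linker for a while now, but I am still not entirely clear about it. Its importance hardly needs stating: it plays a crucial role in both hardening (packing) and unpacking. As security enthusiasts we do not need to understand every line of the source, but getting the overall framework straight is well worth it. The next few posts dig into this topic, partly to help anyone who needs it and partly to deepen my own understanding.
Overall framework:
Whenever I study this, I like to start from a diagram:
The key function in it is:
the find_library function under do_dlopen, because the most important work, loading and linking, happens inside it. After that comes initialization. So the process breaks down into the following parts:
Part 1: checking whether the so is already loaded, and the loading process;
Part 2: once the so is loaded into memory, allocating the corresponding structure for it, namely the soinfo structure;
Part 3: the linking process, including a. locating the dynamic section, b. parsing the dynamic section, c. loading the dependent so files, d. relocation;
Part 4: initialization, where CallConstructors is called, following the order .init -> .init_array -> JNI_OnLoad -> Java_com_XXXX;
Main part:
This post covers Part 1: checking whether the so is already loaded, and the loading process.
A lot happens before this point. Skipping all of that, we arrive at the important function do_dlopen: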
soinfo* do_dlopen(const char* name, int flags) {
if ((flags & ~(RTLD_NOW|RTLD_LAZY|RTLD_LOCAL|RTLD_GLOBAL)) != 0) {
DL_ERR("invalid flags to dlopen: %x", flags);
return NULL;
}
set_soinfo_pool_protection(PROT_READ | PROT_WRITE);
soinfo* si = find_library(name);
if (si != NULL) {
si->CallConstructors();
}
set_soinfo_pool_protection(PROT_READ);
return si;
}
do_dlopen calls two important functions: find_library() and CallConstructors(). The latter performs initialization and is covered in Part 4, so next we look at find_library().
static soinfo* find_library(const char* name) {
soinfo* si = find_library_internal(name);
if (si != NULL) {
si->ref_count++;
}
return si;
}
Next, find_library_internal():
static soinfo* find_library_internal(const char* name) {
if (name == NULL) {
return somain;
}
soinfo* si = find_loaded_library(name);
if (si != NULL) {
if (si->flags & FLAG_LINKED) {
return si;
}
DL_ERR("OOPS: recursive link to \"%s\"", si->name);
return NULL;
}
TRACE("[ '%s' has not been loaded yet. Locating...]", name);
si = load_library(name);
if (si == NULL) {
return NULL;
}
// At this point we know that whatever is loaded @ base is a valid ELF
// shared library whose segments are properly mapped in.
TRACE("[ init_library base=0x%08x sz=0x%08x name='%s' ]",
si->base, si->size, si->name);
if (!soinfo_link_image(si)) {
munmap(reinterpret_cast<void*>(si->base), si->size);
soinfo_free(si);
return NULL;
}
return si;
}
It first calls find_loaded_library:
static soinfo *find_loaded_library(const char *name)
{
soinfo *si;
const char *bname;
// TODO: don't use basename only for determining libraries
// http://code.google.com/p/android/issues/detail?id=6670
bname = strrchr(name, '/');
bname = bname ? bname + 1 : name;
for (si = solist; si != NULL; si = si->next) {
if (!strcmp(bname, si->name)) {
return si;
}
}
return NULL;
}
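To make the basename matching concrete, here is a minimal sketch in C++. LoadedLib, base_name and find_loaded are hypothetical stand-ins for soinfo, the strrchr logic and find_loaded_library; they are not linker code. The sketch shows that two different paths with the same file name resolve to the same entry, which appears to be the limitation the TODO comment points at.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-in for soinfo: only the fields the lookup needs.
struct LoadedLib {
    const char* name;  // stored as a basename, e.g. "libfoo.so"
    LoadedLib* next;
};

// Return the part of `path` after the last '/', or `path` itself if there is none.
const char* base_name(const char* path) {
    assert(path != nullptr);
    const char* slash = std::strrchr(path, '/');
    return slash ? slash + 1 : path;
}

// Walk the list and return the first entry whose name equals the basename of `path`.
LoadedLib* find_loaded(LoadedLib* head, const char* path) {
    const char* bname = base_name(path);
    for (LoadedLib* lib = head; lib != nullptr; lib = lib->next) {
        if (std::strcmp(bname, lib->name) == 0) {
            return lib;
        }
    }
    return nullptr;
}
```

With a list that contains libfoo.so, both find_loaded(head, "/data/app/x/libfoo.so") and find_loaded(head, "/vendor/lib/libfoo.so") return the same entry, because only the file name is compared.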
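This mainly checks whether the so has already been loaded. If it has not, load_library is called to load it.
The soinfo_link_image function that follows does the linking and is covered in Part 3.
So next we look at load_library: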
static soinfo* load_library(const char* name) {
// Open the file.
int fd = open_library(name);
if (fd == -1) {
DL_ERR("library \"%s\" not found", name);
return NULL;
}
// Read the ELF header and load the segments.
ElfReader elf_reader(name, fd);
if (!elf_reader.Load()) {
return NULL;
}
const char* bname = strrchr(name, '/');
soinfo* si = soinfo_alloc(bname ? bname + 1 : name);
if (si == NULL) {
return NULL;
}
si->base = elf_reader.load_start();
si->size = elf_reader.load_size();
si->load_bias = elf_reader.load_bias();
si->flags = 0;
si->entry = 0;
si->dynamic = NULL;
si->phnum = elf_reader.phdr_count();
si->phdr = elf_reader.loaded_phdr();
return si;
}
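In this function, the pieces that do the loading are
open_library and
the ElfReader object elf_reader (a class instance, not a function).
soinfo_alloc creates the soinfo structure and is covered in the next post.
First, open_library: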
static int open_library(const char* name) {
TRACE("[ opening %s ]", name);
// If the name contains a slash, we should attempt to open it directly and not search the paths.
if (strchr(name, '/') != NULL) {
int fd = TEMP_FAILURE_RETRY(open(name, O_RDONLY | O_CLOEXEC));
if (fd != -1) {
return fd;
}
// ...but nvidia binary blobs (at least) rely on this behavior, so fall through for now.
}
// Otherwise we try LD_LIBRARY_PATH first, and fall back to the built-in well known paths.
int fd = open_library_on_path(name, gLdPaths);
if (fd == -1) {
fd = open_library_on_path(name, gSoPaths);
}
return fd;
}
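It opens the so file and returns a file descriptor. A name containing a slash is tried directly first; otherwise (or if that fails) the LD_LIBRARY_PATH directories are searched, then the built-in default paths.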
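The lookup order can be sketched as a function that lists the paths that would be tried, in order. This is only an illustration: candidate_paths is a hypothetical helper, and it assumes that each directory is joined to the name with a single '/', since open_library_on_path is not shown in this post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the ordered list of paths that would be attempted for `name`.
// ld_paths models gLdPaths (LD_LIBRARY_PATH); default_paths models gSoPaths.
std::vector<std::string> candidate_paths(const std::string& name,
                                         const std::vector<std::string>& ld_paths,
                                         const std::vector<std::string>& default_paths) {
    assert(!name.empty());
    std::vector<std::string> out;
    // A name containing a slash is tried as-is before any search path.
    if (name.find('/') != std::string::npos) {
        out.push_back(name);
    }
    // If that fails, the code falls through to the search paths.
    for (const std::string& dir : ld_paths) {
        out.push_back(dir + "/" + name);
    }
    for (const std::string& dir : default_paths) {
        out.push_back(dir + "/" + name);
    }
    return out;
}
```

For a bare name such as libfoo.so, only the search-path entries are produced, with the LD_LIBRARY_PATH directories ahead of the built-in ones.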
ElfReader elf_reader(name, fd);
This creates an ElfReader object. Next we look at the Load method of this class:
bool ElfReader::Load() {
return ReadElfHeader() &&
VerifyElfHeader() &&
ReadProgramHeader() &&
ReserveAddressSpace() &&
LoadSegments() &&
FindPhdr();
}
Next we go through these methods one by one.
First, ReadElfHeader():
bool ElfReader::ReadElfHeader() {
ssize_t rc = TEMP_FAILURE_RETRY(read(fd_, &header_, sizeof(header_)));
if (rc < 0) {
DL_ERR("can't read file \"%s\": %s", name_, strerror(errno));
return false;
}
if (rc != sizeof(header_)) {
DL_ERR("\"%s\" is too small to be an ELF executable", name_);
return false;
}
return true;
}
Purpose: it reads the ELF header from the file.
Next, VerifyElfHeader():
bool ElfReader::VerifyElfHeader() {
if (header_.e_ident[EI_MAG0] != ELFMAG0 ||
header_.e_ident[EI_MAG1] != ELFMAG1 ||
header_.e_ident[EI_MAG2] != ELFMAG2 ||
header_.e_ident[EI_MAG3] != ELFMAG3) {
DL_ERR("\"%s\" has bad ELF magic", name_);
return false;
}
if (header_.e_ident[EI_CLASS] != ELFCLASS32) {
DL_ERR("\"%s\" not 32-bit: %d", name_, header_.e_ident[EI_CLASS]);
return false;
}
if (header_.e_ident[EI_DATA] != ELFDATA2LSB) {
DL_ERR("\"%s\" not little-endian: %d", name_, header_.e_ident[EI_DATA]);
return false;
}
if (header_.e_type != ET_DYN) {
DL_ERR("\"%s\" has unexpected e_type: %d", name_, header_.e_type);
return false;
}
if (header_.e_version != EV_CURRENT) {
DL_ERR("\"%s\" has unexpected e_version: %d", name_, header_.e_version);
return false;
}
if (header_.e_machine !=
#ifdef ANDROID_ARM_LINKER
EM_ARM
#elif defined(ANDROID_MIPS_LINKER)
EM_MIPS
#elif defined(ANDROID_X86_LINKER)
EM_386
#endif
) {
DL_ERR("\"%s\" has unexpected e_machine: %d", name_, header_.e_machine);
return false;
}
return true;
}
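Purpose: it validates the ELF header: the magic number, class (32-bit) and byte order in e_ident, plus e_type, e_version and e_machine.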
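The e_ident part of that validation can be tried out on a plain byte buffer. The following is a minimal sketch, not the linker's code: check_ident and IdentCheck are hypothetical names, and the constants are written out by hand from the ELF specification so that the example does not depend on a platform elf.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offsets and values from the ELF specification (e_ident layout).
constexpr std::size_t kEiClass = 4;        // EI_CLASS
constexpr std::size_t kEiData = 5;         // EI_DATA
constexpr std::size_t kEiNident = 16;      // EI_NIDENT
constexpr std::uint8_t kElfClass32 = 1;    // ELFCLASS32
constexpr std::uint8_t kElfData2Lsb = 1;   // ELFDATA2LSB

enum class IdentCheck { Ok, TooSmall, BadMagic, Not32Bit, NotLittleEndian };

// Apply the same order of checks as the e_ident portion of VerifyElfHeader.
IdentCheck check_ident(const std::uint8_t* bytes, std::size_t size) {
    assert(bytes != nullptr);
    if (size < kEiNident) {
        return IdentCheck::TooSmall;
    }
    if (bytes[0] != 0x7f || bytes[1] != 'E' || bytes[2] != 'L' || bytes[3] != 'F') {
        return IdentCheck::BadMagic;
    }
    if (bytes[kEiClass] != kElfClass32) {
        return IdentCheck::Not32Bit;
    }
    if (bytes[kEiData] != kElfData2Lsb) {
        return IdentCheck::NotLittleEndian;
    }
    return IdentCheck::Ok;
}
```

A 64-bit library, for example, carries the value 2 in the EI_CLASS byte and would be rejected at the class check, which matches the "not 32-bit" error path in the function above.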
bool ElfReader::ReadProgramHeader() {
phdr_num_ = header_.e_phnum;
// Like the kernel, we only accept program header tables that
// are smaller than 64KiB.
if (phdr_num_ < 1 || phdr_num_ > 65536/sizeof(Elf32_Phdr)) {
DL_ERR("\"%s\" has invalid e_phnum: %d", name_, phdr_num_);
return false;
}
Elf32_Addr page_min = PAGE_START(header_.e_phoff);
Elf32_Addr page_max = PAGE_END(header_.e_phoff + (phdr_num_ * sizeof(Elf32_Phdr)));
Elf32_Addr page_offset = PAGE_OFFSET(header_.e_phoff);
phdr_size_ = page_max - page_min;
void* mmap_result = mmap(NULL, phdr_size_, PROT_READ, MAP_PRIVATE, fd_, page_min);
if (mmap_result == MAP_FAILED) {
DL_ERR("\"%s\" phdr mmap failed: %s", name_, strerror(errno));
return false;
}
phdr_mmap_ = mmap_result;
phdr_table_ = reinterpret_cast<Elf32_Phdr*>(reinterpret_cast<char*>(mmap_result) + page_offset);
return true;
}
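Purpose: it reads the number of program headers, rounds the file range of the program header table out to page boundaries, and maps that range into memory with mmap.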
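The page arithmetic is easier to follow with numbers. Below is a small sketch of the three page helpers and the resulting mapping parameters. It assumes a 4096-byte page; page_start, page_end, page_offset and phdr_mapping are illustrative names modeled on the PAGE_START, PAGE_END and PAGE_OFFSET macros used above, not the linker's definitions.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

constexpr uint32_t kPageSize = 4096;  // assumption: 4 KiB pages

// Round down to the start of the page containing x.
constexpr uint32_t page_start(uint32_t x) { return x & ~(kPageSize - 1); }
// Offset of x within its page.
constexpr uint32_t page_offset(uint32_t x) { return x & (kPageSize - 1); }
// Round up to the next page boundary (x itself if already aligned).
constexpr uint32_t page_end(uint32_t x) { return page_start(x + (kPageSize - 1)); }

struct PhdrMapping {
    uint32_t map_offset;    // file offset passed to mmap (page aligned)
    uint32_t map_size;      // number of bytes mapped
    uint32_t table_offset;  // where the table starts inside the mapping
};

// Compute what ReadProgramHeader would map for a given header layout.
PhdrMapping phdr_mapping(uint32_t e_phoff, uint32_t phnum, uint32_t phentsize) {
    assert(phnum >= 1);
    const uint32_t page_min = page_start(e_phoff);
    const uint32_t page_max = page_end(e_phoff + phnum * phentsize);
    return {page_min, page_max - page_min, page_offset(e_phoff)};
}
```

For a table at file offset 52 with 8 entries of 32 bytes each, the table ends at byte 308, so one page starting at file offset 0 is mapped and the table begins 52 bytes into that mapping.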
Next, ReserveAddressSpace():
bool ElfReader::ReserveAddressSpace() {
Elf32_Addr min_vaddr;
load_size_ = phdr_table_get_load_size(phdr_table_, phdr_num_, &min_vaddr);
if (load_size_ == 0) {
DL_ERR("\"%s\" has no loadable segments", name_);
return false;
}
uint8_t* addr = reinterpret_cast<uint8_t*>(min_vaddr);
int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;
void* start = mmap(addr, load_size_, PROT_NONE, mmap_flags, -1, 0);
if (start == MAP_FAILED) {
DL_ERR("couldn't reserve %d bytes of address space for \"%s\"", load_size_, name_);
return false;
}
load_start_ = start;
load_bias_ = reinterpret_cast<uint8_t*>(start) - addr;
return true;
}
It first calls phdr_table_get_load_size, which is defined as:
size_t phdr_table_get_load_size(const Elf32_Phdr* phdr_table,
size_t phdr_count,
Elf32_Addr* out_min_vaddr,
Elf32_Addr* out_max_vaddr)
{
Elf32_Addr min_vaddr = 0xFFFFFFFFU;
Elf32_Addr max_vaddr = 0x00000000U;
bool found_pt_load = false;
for (size_t i = 0; i < phdr_count; ++i) {
const Elf32_Phdr* phdr = &phdr_table[i];
if (phdr->p_type != PT_LOAD) {
continue;
}
found_pt_load = true;
if (phdr->p_vaddr < min_vaddr) {
min_vaddr = phdr->p_vaddr;
}
if (phdr->p_vaddr + phdr->p_memsz > max_vaddr) {
max_vaddr = phdr->p_vaddr + phdr->p_memsz;
}
}
if (!found_pt_load) {
min_vaddr = 0x00000000U;
}
min_vaddr = PAGE_START(min_vaddr);
max_vaddr = PAGE_END(max_vaddr);
if (out_min_vaddr != NULL) {
*out_min_vaddr = min_vaddr;
}
if (out_max_vaddr != NULL) {
*out_max_vaddr = max_vaddr;
}
return max_vaddr - min_vaddr;
}
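Purpose: it computes the size of the page-aligned address range that covers all PT_LOAD segments of the so. ReserveAddressSpace then reserves a range of that size with an anonymous PROT_NONE mmap and records load_bias_, the difference between the address actually obtained and min_vaddr.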
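A worked example may help here. The sketch below recomputes the load size and the load bias for a made-up pair of segments. It assumes 4096-byte pages; LoadSegment, load_extent and load_bias are hypothetical names, and the reserved address in the usage note is invented for illustration rather than taken from a real mmap call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using std::uint32_t;

constexpr uint32_t kPageSize = 4096;  // assumption: 4 KiB pages

constexpr uint32_t page_start(uint32_t x) { return x & ~(kPageSize - 1); }
constexpr uint32_t page_end(uint32_t x) { return page_start(x + (kPageSize - 1)); }

// Only the fields of a PT_LOAD program header that matter for the size.
struct LoadSegment {
    uint32_t vaddr;
    uint32_t memsz;
};

struct LoadExtent {
    uint32_t min_vaddr;  // page-aligned lowest address
    uint32_t size;       // page-aligned span covering every segment
};

// Same idea as phdr_table_get_load_size, restricted to PT_LOAD entries.
LoadExtent load_extent(const std::vector<LoadSegment>& loads) {
    if (loads.empty()) {
        return {0, 0};
    }
    uint32_t lo = 0xFFFFFFFFu;
    uint32_t hi = 0;
    for (const LoadSegment& s : loads) {
        assert(s.memsz <= 0xFFFFFFFFu - s.vaddr);  // no 32-bit wrap-around
        if (s.vaddr < lo) lo = s.vaddr;
        if (s.vaddr + s.memsz > hi) hi = s.vaddr + s.memsz;
    }
    lo = page_start(lo);
    hi = page_end(hi);
    return {lo, hi - lo};
}

// Difference between where the range was reserved and the lowest vaddr.
uint32_t load_bias(uint32_t reserved_start, uint32_t min_vaddr) {
    return reserved_start - min_vaddr;
}
```

With segments at vaddr 0x0 (memsz 0x2a40) and 0x3e80 (memsz 0x300), the span is 0x5000 bytes starting at 0. If the kernel happened to place the reservation at 0x40000000, the bias would be 0x40000000 and the second segment would end up at 0x40003e80.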
bool ElfReader::LoadSegments() {
for (size_t i = 0; i < phdr_num_; ++i) {
const Elf32_Phdr* phdr = &phdr_table_[i];
if (phdr->p_type != PT_LOAD) {
continue;
}
// Segment addresses in memory.
Elf32_Addr seg_start = phdr->p_vaddr + load_bias_;
Elf32_Addr seg_end = seg_start + phdr->p_memsz;
Elf32_Addr seg_page_start = PAGE_START(seg_start);
Elf32_Addr seg_page_end = PAGE_END(seg_end);
Elf32_Addr seg_file_end = seg_start + phdr->p_filesz;
// File offsets.
Elf32_Addr file_start = phdr->p_offset;
Elf32_Addr file_end = file_start + phdr->p_filesz;
Elf32_Addr file_page_start = PAGE_START(file_start);
Elf32_Addr file_length = file_end - file_page_start;
if (file_length != 0) {
void* seg_addr = mmap((void*)seg_page_start,
file_length,
PFLAGS_TO_PROT(phdr->p_flags),
MAP_FIXED|MAP_PRIVATE,
fd_,
file_page_start);
if (seg_addr == MAP_FAILED) {
DL_ERR("couldn't map \"%s\" segment %d: %s", name_, i, strerror(errno));
return false;
}
}
// if the segment is writable, and does not end on a page boundary,
// zero-fill it until the page limit.
if ((phdr->p_flags & PF_W) != 0 && PAGE_OFFSET(seg_file_end) > 0) {
memset((void*)seg_file_end, 0, PAGE_SIZE - PAGE_OFFSET(seg_file_end));
}
seg_file_end = PAGE_END(seg_file_end);
// seg_file_end is now the first page address after the file
// content. If seg_end is larger, we need to zero anything
// between them. This is done by using a private anonymous
// map for all extra pages.
if (seg_page_end > seg_file_end) {
void* zeromap = mmap((void*)seg_file_end,
seg_page_end - seg_file_end,
PFLAGS_TO_PROT(phdr->p_flags),
MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
-1,
0);
if (zeromap == MAP_FAILED) {
DL_ERR("couldn't zero fill \"%s\" gap: %s", name_, strerror(errno));
return false;
}
}
}
return true;
}
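Purpose: it walks the program header table, finds the segments of type PT_LOAD, and maps each one into memory with mmap. For a writable segment it zero-fills the rest of the last file-backed page, and it backs any remaining pages up to p_memsz with an anonymous mapping.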
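The address arithmetic in LoadSegments can be checked without mapping anything. The following sketch only computes the three regions for one segment: the file-backed mapping, the tail that gets zeroed, and the anonymous mapping. It assumes 4096-byte pages; SegmentPlan and plan_segment are hypothetical names, and the sample values in the usage note are invented.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

constexpr uint32_t kPageSize = 4096;  // assumption: 4 KiB pages

constexpr uint32_t page_start(uint32_t x) { return x & ~(kPageSize - 1); }
constexpr uint32_t page_offset(uint32_t x) { return x & (kPageSize - 1); }
constexpr uint32_t page_end(uint32_t x) { return page_start(x + (kPageSize - 1)); }

struct SegmentPlan {
    uint32_t file_map_addr;    // address of the file-backed mmap
    uint32_t file_map_offset;  // file offset of that mmap
    uint32_t file_map_length;  // length of that mmap
    uint32_t memset_addr;      // start of the zeroed tail (0 if none)
    uint32_t memset_length;    // bytes zeroed up to the page boundary
    uint32_t anon_map_addr;    // start of the anonymous mapping (0 if none)
    uint32_t anon_map_length;  // length of the anonymous mapping
};

SegmentPlan plan_segment(uint32_t p_vaddr, uint32_t p_offset, uint32_t p_filesz,
                         uint32_t p_memsz, bool writable, uint32_t load_bias) {
    // ELF requires vaddr and file offset to agree modulo the page size.
    assert(page_offset(p_vaddr) == page_offset(p_offset));
    assert(p_filesz <= p_memsz);

    const uint32_t seg_start = p_vaddr + load_bias;
    SegmentPlan plan{};
    plan.file_map_addr = page_start(seg_start);
    plan.file_map_offset = page_start(p_offset);
    plan.file_map_length = (p_offset + p_filesz) - plan.file_map_offset;

    uint32_t seg_file_end = seg_start + p_filesz;
    if (writable && page_offset(seg_file_end) > 0) {
        plan.memset_addr = seg_file_end;
        plan.memset_length = kPageSize - page_offset(seg_file_end);
    }

    seg_file_end = page_end(seg_file_end);
    const uint32_t seg_page_end = page_end(seg_start + p_memsz);
    if (seg_page_end > seg_file_end) {
        plan.anon_map_addr = seg_file_end;
        plan.anon_map_length = seg_page_end - seg_file_end;
    }
    return plan;
}
```

For a writable segment with p_vaddr 0x3e80, p_offset 0x2e80, p_filesz 0x190, p_memsz 0x2000 and a bias of 0x40000000, the file mapping starts at 0x40003000 with length 0x1010, the bytes from 0x40004010 to the end of that page are zeroed, and one anonymous page is mapped at 0x40005000.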
bool ElfReader::FindPhdr() {
const Elf32_Phdr* phdr_limit = phdr_table_ + phdr_num_;
// If there is a PT_PHDR, use it directly.
for (const Elf32_Phdr* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
if (phdr->p_type == PT_PHDR) {
return CheckPhdr(load_bias_ + phdr->p_vaddr);
}
}
// Otherwise, check the first loadable segment. If its file offset
// is 0, it starts with the ELF header, and we can trivially find the
// loaded program header from it.
for (const Elf32_Phdr* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
if (phdr->p_type == PT_LOAD) {
if (phdr->p_offset == 0) {
Elf32_Addr elf_addr = load_bias_ + phdr->p_vaddr;
const Elf32_Ehdr* ehdr = (const Elf32_Ehdr*)(void*)elf_addr;
Elf32_Addr offset = ehdr->e_phoff;
return CheckPhdr((Elf32_Addr)ehdr + offset);
}
break;
}
}
DL_ERR("can't find loaded phdr for \"%s\"", name_);
return false;
}
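Purpose: finally, it locates the program header table inside the loaded so image so that the later linking stage can use it. More on this when we get to linking.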
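The two lookup strategies in FindPhdr can be sketched on a simplified table. This is an illustration under stated assumptions: Phdr and find_loaded_phdr are hypothetical, e_phoff is passed in as a parameter instead of being read from the mapped ELF header, the CheckPhdr validation step is left out, and 0 is used to signal "not found".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using std::uint32_t;

// Program header types from the ELF specification.
constexpr uint32_t kPtLoad = 1;  // PT_LOAD
constexpr uint32_t kPtPhdr = 6;  // PT_PHDR

// Only the fields this lookup needs.
struct Phdr {
    uint32_t type;
    uint32_t offset;
    uint32_t vaddr;
};

// Return the runtime address of the program header table, or 0 if unknown.
uint32_t find_loaded_phdr(const std::vector<Phdr>& table, uint32_t load_bias,
                          uint32_t e_phoff) {
    assert(!table.empty());  // ReadProgramHeader already rejects empty tables
    // Strategy 1: a PT_PHDR entry names the table's address directly.
    for (const Phdr& p : table) {
        if (p.type == kPtPhdr) {
            return load_bias + p.vaddr;
        }
    }
    // Strategy 2: if the first PT_LOAD starts at file offset 0, the ELF
    // header sits at its start and the table is e_phoff bytes after it.
    for (const Phdr& p : table) {
        if (p.type == kPtLoad) {
            if (p.offset == 0) {
                return load_bias + p.vaddr + e_phoff;
            }
            break;
        }
    }
    return 0;
}
```

Only the first PT_LOAD entry is examined in the second strategy, which mirrors the break statement in the function above.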
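Summary:
What load_library does in detail (loading into memory). The listing below, with its extinfo parameter, comes from a newer linker version than the code walked through above; the six steps are the same.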
bool ElfReader::Load(const android_dlextinfo* extinfo) {
return ReadElfHeader() && // read the ELF header
VerifyElfHeader() && // validate the ELF header
ReadProgramHeader() && // read the program header table
ReserveAddressSpace(extinfo) && // reserve the address space
LoadSegments() && // map the segments as the program headers dictate
FindPhdr(); // find the address of the loaded phdr table
}
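Step 1: ElfReader::Load first reads the ELF header of the so;
Step 2: it then validates the ELF header;
Step 3: next it reads the program header table;
Step 4: from the program headers it computes how much memory the so needs and reserves an address range of that size;
Step 5: it walks the program header table, finds the segments of type PT_LOAD, and maps each one into memory with mmap;
Step 6: finally, it locates the program header table inside the loaded so image for the later linking stage to use.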
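Because the six steps are chained with &&, a failure in any step stops the ones after it from running. The sketch below models that behavior with a list of named stages. Stage and run_stages are hypothetical names, and the lambdas in the usage note stand in for the real ElfReader methods.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One named step of the pipeline; `run` reports success or failure.
struct Stage {
    std::string name;
    std::function<bool()> run;
};

// Run the stages in order and stop at the first failure, like a chain of &&.
// Returns the name of the failing stage, or an empty string if all succeed.
// Every stage that was started is appended to `executed`.
std::string run_stages(const std::vector<Stage>& stages,
                       std::vector<std::string>* executed) {
    assert(executed != nullptr);
    for (const Stage& stage : stages) {
        executed->push_back(stage.name);
        if (!stage.run()) {
            return stage.name;
        }
    }
    return "";
}
```

If a stage named VerifyElfHeader returns false, run_stages returns "VerifyElfHeader" and the later stages never appear in the executed list, which is why a file with a bad header never reaches the mmap calls in the later steps.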