ELF Linker学习篇(一)关于ELF文件装载进内存

学习Linker也有一段时日了,但是还是不太清楚,对于Linker的重要性不多说,无论是对于加固还是对于脱壳都有至关重要的作用,我们作为一名安全爱好者,不需要细细了解源码的每句意思,但是大致的框架理清还是很有必要的,接下来几篇围绕这这个进行展开研究,一来帮助需要之人,二来加深自己的理解。

整体框架篇: 

每每研究这个的时候,我都特别喜欢拿一张图来进行看:

ELF Linker学习篇(一)关于ELF文件装载进内存_第1张图片

可以看到这里面的重点函数为:

do_dlopen下面的findlibrary函数,因为最重要的装载、链接就是发生在里面。接下来就是初始化的过程。因此分为以下几个模块:

第一部分:判断so的加载,以及装载过程;

第二部分:装载进内存后,为这些so分配相应的结构也就是soinfo结构;

第三部分:完成链接的过程包括a.定位动态节 b.解析动态节 c.加载依赖so d.重定位等过程;

第四部分:初始化调用CallConstructors函数.init->.init_array->JNI_Onload->java_com_XXXX的过程;


正式篇:

接下来本文就主要讲第一部分判断so的加载,以及装载过程;

前面有一大堆的过程,直接略,来到重要的函数dlopen函数

soinfo* do_dlopen(const char* name, int flags) {
  if ((flags & ~(RTLD_NOW|RTLD_LAZY|RTLD_LOCAL|RTLD_GLOBAL)) != 0) {
    DL_ERR("invalid flags to dlopen: %x", flags);
    return NULL;
  }
  set_soinfo_pool_protection(PROT_READ | PROT_WRITE);
  soinfo* si = find_library(name);
  if (si != NULL) {
    si->CallConstructors();
  }
  set_soinfo_pool_protection(PROT_READ);
  return si;
}
看到这里调用的两个重要的函数:find_library()和CallConstructors()函数,后面是初始化函数,会在第四部分讲。因此接下来看find_library()函数。

static soinfo* find_library(const char* name) {
  soinfo* si = find_library_internal(name);
  if (si != NULL) {
    si->ref_count++;
  }
  return si;
}
接下来看find_library_internal()函数:

static soinfo* find_library_internal(const char* name) {
  if (name == NULL) {
    return somain;
  }

  soinfo* si = find_loaded_library(name);
  if (si != NULL) {
    if (si->flags & FLAG_LINKED) {
      return si;
    }
    DL_ERR("OOPS: recursive link to \"%s\"", si->name);
    return NULL;
  }

  TRACE("[ '%s' has not been loaded yet.  Locating...]", name);
  si = load_library(name);
  if (si == NULL) {
    return NULL;
  }

  // At this point we know that whatever is loaded @ base is a valid ELF
  // shared library whose segments are properly mapped in.
  TRACE("[ init_library base=0x%08x sz=0x%08x name='%s' ]",
        si->base, si->size, si->name);

  if (!soinfo_link_image(si)) {
    munmap(reinterpret_cast(si->base), si->size);
    soinfo_free(si);
    return NULL;
  }

  return si;
}
首先调用 find_loaded_library函数
static soinfo *find_loaded_library(const char *name)
{
    soinfo *si;
    const char *bname;

    // TODO: don't use basename only for determining libraries
    // http://code.google.com/p/android/issues/detail?id=6670
    bname = strrchr(name, '/');
    bname = bname ? bname + 1 : name;

    for (si = solist; si != NULL; si = si->next) {
        if (!strcmp(bname, si->name)) {
            return si;
        }
    }
    return NULL;
}
主要判断so有没有加载,如果没有加载就调用 load_library函数,来加载因此接下来看

这个函数:soinfo_link_image这个函数用来链接,会在第三部分讲。

static soinfo* load_library(const char* name) {
    // Open the file.
    int fd = open_library(name);
    if (fd == -1) {
        DL_ERR("library \"%s\" not found", name);
        return NULL;
    }

    // Read the ELF header and load the segments.
    ElfReader elf_reader(name, fd);
    if (!elf_reader.Load()) {
        return NULL;
    }

    const char* bname = strrchr(name, '/');
    soinfo* si = soinfo_alloc(bname ? bname + 1 : name);
    if (si == NULL) {
        return NULL;
    }
    si->base = elf_reader.load_start();
    si->size = elf_reader.load_size();
    si->load_bias = elf_reader.load_bias();
    si->flags = 0;
    si->entry = 0;
    si->dynamic = NULL;
    si->phnum = elf_reader.phdr_count();
    si->phdr = elf_reader.loaded_phdr();
    return si;
}
在这个函数里面与装载链接相关的就是 open_library、 ElfReader elf_reader函数,

对于soinfo_alloc函数是用来创建soinfo结构的,会在下一节中讲。

首先先来看open_library这个函数:

static int open_library(const char* name) {
  TRACE("[ opening %s ]", name);

  // If the name contains a slash, we should attempt to open it directly and not search the paths.
  if (strchr(name, '/') != NULL) {
    int fd = TEMP_FAILURE_RETRY(open(name, O_RDONLY | O_CLOEXEC));
    if (fd != -1) {
      return fd;
    }
    // ...but nvidia binary blobs (at least) rely on this behavior, so fall through for now.
  }

  // Otherwise we try LD_LIBRARY_PATH first, and fall back to the built-in well known paths.
  int fd = open_library_on_path(name, gLdPaths);
  if (fd == -1) {
    fd = open_library_on_path(name, gSoPaths);
  }
  return fd;
}
就是打开so文件获得文件句柄。
ElfReader elf_reader(name, fd);
这个是创建ElfReader 对象,接下来我们看一下这个类中的load方法:

bool ElfReader::Load() {
  return ReadElfHeader() &&
         VerifyElfHeader() &&
         ReadProgramHeader() &&
         ReserveAddressSpace() &&
         LoadSegments() &&
         FindPhdr();
}
家下来我们一个个的看对应的方法:

先来看ReadElfHeader():

bool ElfReader::ReadElfHeader() {
  ssize_t rc = TEMP_FAILURE_RETRY(read(fd_, &header_, sizeof(header_)));
  if (rc < 0) {
    DL_ERR("can't read file \"%s\": %s", name_, strerror(errno));
    return false;
  }
  if (rc != sizeof(header_)) {
    DL_ERR("\"%s\" is too small to be an ELF executable", name_);
    return false;
  }
  return true;
}
作用:就是读取ELF文件的头部分;

接下来看VerifyElfHeader()这个函数:

bool ElfReader::VerifyElfHeader() {
  if (header_.e_ident[EI_MAG0] != ELFMAG0 ||
      header_.e_ident[EI_MAG1] != ELFMAG1 ||
      header_.e_ident[EI_MAG2] != ELFMAG2 ||
      header_.e_ident[EI_MAG3] != ELFMAG3) {
    DL_ERR("\"%s\" has bad ELF magic", name_);
    return false;
  }

  if (header_.e_ident[EI_CLASS] != ELFCLASS32) {
    DL_ERR("\"%s\" not 32-bit: %d", name_, header_.e_ident[EI_CLASS]);
    return false;
  }
  if (header_.e_ident[EI_DATA] != ELFDATA2LSB) {
    DL_ERR("\"%s\" not little-endian: %d", name_, header_.e_ident[EI_DATA]);
    return false;
  }

  if (header_.e_type != ET_DYN) {
    DL_ERR("\"%s\" has unexpected e_type: %d", name_, header_.e_type);
    return false;
  }

  if (header_.e_version != EV_CURRENT) {
    DL_ERR("\"%s\" has unexpected e_version: %d", name_, header_.e_version);
    return false;
  }

  if (header_.e_machine !=
#ifdef ANDROID_ARM_LINKER
      EM_ARM
#elif defined(ANDROID_MIPS_LINKER)
      EM_MIPS
#elif defined(ANDROID_X86_LINKER)
      EM_386
#endif
  ) {
    DL_ERR("\"%s\" has unexpected e_machine: %d", name_, header_.e_machine);
    return false;
  }

  return true;
}
作用:看到是对e_ident数据结构中的各部分数据进行验证;
接着分析 ReadProgramHeader这个函数,首次看到的是:

bool ElfReader::ReadProgramHeader() {
  phdr_num_ = header_.e_phnum;

  // Like the kernel, we only accept program header tables that
  // are smaller than 64KiB.
  if (phdr_num_ < 1 || phdr_num_ > 65536/sizeof(Elf32_Phdr)) {
    DL_ERR("\"%s\" has invalid e_phnum: %d", name_, phdr_num_);
    return false;
  }

  Elf32_Addr page_min = PAGE_START(header_.e_phoff);
  Elf32_Addr page_max = PAGE_END(header_.e_phoff + (phdr_num_ * sizeof(Elf32_Phdr)));
  Elf32_Addr page_offset = PAGE_OFFSET(header_.e_phoff);

  phdr_size_ = page_max - page_min;

  void* mmap_result = mmap(NULL, phdr_size_, PROT_READ, MAP_PRIVATE, fd_, page_min);
  if (mmap_result == MAP_FAILED) {
    DL_ERR("\"%s\" phdr mmap failed: %s", name_, strerror(errno));
    return false;
  }

  phdr_mmap_ = mmap_result;
  phdr_table_ = reinterpret_cast(reinterpret_cast(mmap_result) + page_offset);
  return true;
}
作用:读取programHeader的数量,然后对齐内存页,用mmap映射进内存;

接着看ReserveAddressSpace()这个函数:

bool ElfReader::ReserveAddressSpace() {
  Elf32_Addr min_vaddr;
  load_size_ = phdr_table_get_load_size(phdr_table_, phdr_num_, &min_vaddr);
  if (load_size_ == 0) {
    DL_ERR("\"%s\" has no loadable segments", name_);
    return false;
  }

  uint8_t* addr = reinterpret_cast(min_vaddr);
  int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;
  void* start = mmap(addr, load_size_, PROT_NONE, mmap_flags, -1, 0);
  if (start == MAP_FAILED) {
    DL_ERR("couldn't reserve %d bytes of address space for \"%s\"", load_size_, name_);
    return false;
  }

  load_start_ = start;
  load_bias_ = reinterpret_cast(start) - addr;
  return true;
}
首先调用 phdr_table_get_load_size这个函数具体为:

 size_t phdr_table_get_load_size(const Elf32_Phdr* phdr_table,
                                size_t phdr_count,
                                Elf32_Addr* out_min_vaddr,
                                Elf32_Addr* out_max_vaddr)
{
    Elf32_Addr min_vaddr = 0xFFFFFFFFU;
    Elf32_Addr max_vaddr = 0x00000000U;

    bool found_pt_load = false;
    for (size_t i = 0; i < phdr_count; ++i) {
        const Elf32_Phdr* phdr = &phdr_table[i];

        if (phdr->p_type != PT_LOAD) {
            continue;
        }
        found_pt_load = true;

        if (phdr->p_vaddr < min_vaddr) {
            min_vaddr = phdr->p_vaddr;
        }

        if (phdr->p_vaddr + phdr->p_memsz > max_vaddr) {
            max_vaddr = phdr->p_vaddr + phdr->p_memsz;
        }
    }
    if (!found_pt_load) {
        min_vaddr = 0x00000000U;
    }

    min_vaddr = PAGE_START(min_vaddr);
    max_vaddr = PAGE_END(max_vaddr);

    if (out_min_vaddr != NULL) {
        *out_min_vaddr = min_vaddr;
    }
    if (out_max_vaddr != NULL) {
        *out_max_vaddr = max_vaddr;
    }
    return max_vaddr - min_vaddr;
}
作用:计算加载So的空间
总的作用就是:根据program header 计算 SO 需要的内存大小并分配相应的空间;
紧接着看 LoadSegments这个函数:

bool ElfReader::LoadSegments() {
  for (size_t i = 0; i < phdr_num_; ++i) {
    const Elf32_Phdr* phdr = &phdr_table_[i];

    if (phdr->p_type != PT_LOAD) {
      continue;
    }

    // Segment addresses in memory.
    Elf32_Addr seg_start = phdr->p_vaddr + load_bias_;
    Elf32_Addr seg_end   = seg_start + phdr->p_memsz;

    Elf32_Addr seg_page_start = PAGE_START(seg_start);
    Elf32_Addr seg_page_end   = PAGE_END(seg_end);

    Elf32_Addr seg_file_end   = seg_start + phdr->p_filesz;

    // File offsets.
    Elf32_Addr file_start = phdr->p_offset;
    Elf32_Addr file_end   = file_start + phdr->p_filesz;

    Elf32_Addr file_page_start = PAGE_START(file_start);
    Elf32_Addr file_length = file_end - file_page_start;

    if (file_length != 0) {
      void* seg_addr = mmap((void*)seg_page_start,
                            file_length,
                            PFLAGS_TO_PROT(phdr->p_flags),
                            MAP_FIXED|MAP_PRIVATE,
                            fd_,
                            file_page_start);
      if (seg_addr == MAP_FAILED) {
        DL_ERR("couldn't map \"%s\" segment %d: %s", name_, i, strerror(errno));
        return false;
      }
    }

    // if the segment is writable, and does not end on a page boundary,
    // zero-fill it until the page limit.
    if ((phdr->p_flags & PF_W) != 0 && PAGE_OFFSET(seg_file_end) > 0) {
      memset((void*)seg_file_end, 0, PAGE_SIZE - PAGE_OFFSET(seg_file_end));
    }

    seg_file_end = PAGE_END(seg_file_end);

    // seg_file_end is now the first page address after the file
    // content. If seg_end is larger, we need to zero anything
    // between them. This is done by using a private anonymous
    // map for all extra pages.
    if (seg_page_end > seg_file_end) {
      void* zeromap = mmap((void*)seg_file_end,
                           seg_page_end - seg_file_end,
                           PFLAGS_TO_PROT(phdr->p_flags),
                           MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
                           -1,
                           0);
      if (zeromap == MAP_FAILED) {
        DL_ERR("couldn't zero fill \"%s\" gap: %s", name_, strerror(errno));
        return false;
      }
    }
  }
  return true;
}
作用:就是根据遍历 program header table,找到类型为 PT_LOAD 的 segment,使用 mmap 将 segment 映射到内存。
最后是FindPhdr 这个函数:

bool ElfReader::FindPhdr() {
  const Elf32_Phdr* phdr_limit = phdr_table_ + phdr_num_;

  // If there is a PT_PHDR, use it directly.
  for (const Elf32_Phdr* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
    if (phdr->p_type == PT_PHDR) {
      return CheckPhdr(load_bias_ + phdr->p_vaddr);
    }
  }

  // Otherwise, check the first loadable segment. If its file offset
  // is 0, it starts with the ELF header, and we can trivially find the
  // loaded program header from it.
  for (const Elf32_Phdr* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
    if (phdr->p_type == PT_LOAD) {
      if (phdr->p_offset == 0) {
        Elf32_Addr  elf_addr = load_bias_ + phdr->p_vaddr;
        const Elf32_Ehdr* ehdr = (const Elf32_Ehdr*)(void*)elf_addr;
        Elf32_Addr  offset = ehdr->e_phoff;
        return CheckPhdr((Elf32_Addr)ehdr + offset);
      }
      break;
    }
  }

  DL_ERR("can't find loaded phdr for \"%s\"", name_);
  return false;
}
作用:最后在装载到内存的 SO 中找到program header,相关的段,方便之后的链接过程使用。等到链接的时候再说。


总结篇:

load_library的具体过程://装载进内存 
bool ElfReader::Load(const Android_dlextinfo* extinfo) {
   return ReadElfHeader() &&             // 读取 elf header
          VerifyElfHeader() &&           // 验证 elf header
          ReadProgramHeader() &&         // 读取 program header
          ReserveAddressSpace(extinfo) &&// 分配空间
          LoadSegments() &&              // 按照 program header 指示装载 segments
          FindPhdr();                    // 找到装载后的 phdr 地址
 }
第一步:ElfReader::Load 方法首先读取 SO 的elf header;
第二步:再对elf header进行验证;
第三步:之后读取program header;
第四步:根据program header 计算 SO 需要的内存大小并分配相应的空间;
第五步: 遍历 program header table,找到类型为 PT_LOAD 的 segment,使用 mmap 将 segment 映射到内存
第六步:最后在装载到内存的 SO 中找到program header,方便之后的链接过程使用。













你可能感兴趣的:(android逆向安全,android,源码阅读,Android,代码保护与逆向)