[Translation] Smashing the Gadgets: Defending Against Return-Oriented Programming with In-Place Code Randomization (Smashing the Gadgets: Hindering Return-Oriented Programming...)

Smashing the Gadgets: Defending Against Return-Oriented Programming with In-Place Code Randomization

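[The translated text in this article is unedited Google Translate output; I have not yet had time to revise it. It contains many obvious errors, so where the translation is unclear, please read it carefully against the English original.]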

Abstract—The wide adoption of non-executable page protections in recent versions of popular operating systems has given rise to attacks that employ return-oriented programming (ROP) to achieve arbitrary code execution without the injection of any code. Existing defenses against ROP exploits either require source code or symbolic debugging information, or impose a significant runtime overhead, which limits their applicability for the protection of third-party applications.

摘要—在最新版本的流行操作系统中,非可执行页面保护的广泛采用引发了采用定向返回编程(ROP)来实现任意代码执行而无需注入任何代码的攻击。 现有的针对ROP漏洞的防御措施要么需要源代码或符号调试信息,要么会带来大量的运行时开销,从而限制了它们在保护第三方应用程序方面的适用性。

In this paper we present in-place code randomization, a practical mitigation technique against ROP attacks that can be applied directly on third-party software. Our method uses various narrow-scope code transformations that can be applied statically, without changing the location of basic blocks, allowing the safe randomization of stripped binaries even with partial disassembly coverage. These transformations effectively eliminate about 10%, and probabilistically break about 80% of the useful instruction sequences found in a large set of PE files. Since no additional code is inserted, in-place code randomization does not incur any measurable runtime overhead, enabling it to be easily used in tandem with existing exploit mitigations such as address space layout randomization. Our evaluation using publicly available ROP exploits and two ROP code generation toolkits demonstrates that our technique prevents the exploitation of the tested vulnerable Windows 7 applications, including Adobe Reader, as well as the automated construction of alternative ROP payloads that aim to circumvent in-place code randomization using solely any remaining unaffected instruction sequences.

在本文中,我们介绍了就地代码随机化,这是一种针对ROP攻击的实用缓解技术,可以直接应用于第三方软件。我们的方法使用了各种窄范围代码转换,这些转换可以静态应用,而无需更改基本块的位置,从而即使在部分拆卸的情况下,也可以安全地对剥离后的二进制文件进行随机分配。这些转换有效消除了大约10%的概率,并概率破坏了在大量PE文件中找到的有用指令序列的大约80%。由于没有插入其他代码,因此就地代码随机化不会招致任何可测量的运行时开销,从而使其可以轻松地与现有的漏洞利用缓解措施(例如地址空间布局随机化)一起使用。我们使用公开的ROP漏洞和两个ROP代码生成工具包进行的评估表明,我们的技术可以防止对经过测试的易受攻击的Windows 7应用程序(包括Adobe Reader)的利用,以及旨在规避就地代码的替代ROP有效载荷的自动构建仅使用任何剩余的不受影响的指令序列进行随机化。

I. INTRODUCTION

一,引言

Attack prevention technologies based on the No execute (NX) memory page protection bit, which prevent the execution of malicious code that has been injected into a process, are now supported by most recent CPUs and operating systems [1]. The wide adoption of these protection mechanisms has given rise to a new exploitation technique, widely known as return-oriented programming (ROP) [2], which allows an attacker to circumvent non-executable page protections without injecting any code. Using return-oriented programming, the attacker can link together small fragments of code, known as gadgets, that already exist in the process image of the vulnerable application. Each gadget ends with an indirect control transfer instruction, which transfers control to the next gadget according to a sequence of gadget addresses injected on the stack or some other memory area. In essence, instead of injecting binary code, the attacker injects just data, which include the addresses of the gadgets to be executed, along with any required data arguments.

现在,最新的CPU和操作系统都支持基于No execute(NX)内存页保护位的攻击预防技术,该技术可以防止执行已注入进程的恶意代码[1]。这些保护机制的广泛采用引发了一种新的利用技术,即众所周知的面向返回的编程(ROP)[2],该技术使攻击者无需注入任何代码即可规避不可执行的页面保护。使用面向返回的程序,攻击者可以将易受攻击的应用程序的进程映像中已经存在的小片段代码(称为小工具)链接在一起。每个小工具都以间接控制转移指令结尾,该指令将控制权转移到下一个小工具,后者根据注入堆栈或某些其他存储区的一系列小工具地址进行。本质上,攻击者不会注入二进制代码,而只会注入数据(包括要执行的小工具的地址)以及任何必需的数据参数。

Several research works have demonstrated the great potential of this technique for bypassing defenses such as read-only memory [3], kernel code integrity protections [4], and non-executable memory implementations in mobile devices [5] and operating systems [6]–[9]. Consequently, it was only a matter of time for ROP to be employed in real-world attacks. Recent exploits against popular applications use ROP code to bypass exploit mitigations even in the latest OS versions, including Windows 7 SP1. ROP exploits are included in the most common exploit packs [10], [11], and are actively used in the wild for mounting drive-by download attacks.

多项研究表明,该技术在绕过防御方面有巨大的潜力,例如只读存储器[3],内核代码完整性保护[4]以及移动设备[5]和操作系统[6]-[ 9]。 因此,将ROP应用于实际攻击只是时间问题。 即使在最新的操作系统版本(包括Windows 7 SP1)中,针对流行应用程序的最新漏洞利用ROP代码也可以绕过漏洞缓解措施。 ROP漏洞利用程序包含在最常见的漏洞利用程序包中[10],[11],并在野外被积极使用,以进行直接驱动下载攻击。

Attackers are able to a priori pick the right code pieces because parts of the code image of the vulnerable application remain static across different installations. Address space layout randomization (ASLR) [1] is meant to prevent this kind of code reuse by randomizing the locations of the executable segments of a running process. However, in both Linux and Windows, parts of the address space do not change due to executables with fixed load addresses [12], or shared libraries incompatible with ASLR [6]. Furthermore, in some exploits, the base address of a DLL can be either calculated dynamically through a leaked pointer [9], [13], or brute-forced [14].

攻击者可以事先选择合适的代码段,因为易受攻击的应用程序的部分代码映像在不同的安装中保持静态。 地址空间布局随机化(ASLR)[1]旨在通过使正在运行的进程的可执行段的位置随机化来防止这种代码重用。 但是,在Linux和Windows中,由于具有固定加载地址的可执行文件[12]或与ASLR不兼容的共享库[6],部分地址空间不会更改。 此外,在某些漏洞利用中,可以通过泄漏的指针[9],[13]或蛮力[14]动态计算DLL的基地址。

Other defenses against code-reuse attacks complementary to ASLR include compiler extensions [15], [16], code randomization [17]–[19], control-flow integrity [20], and runtime solutions [21]–[23]. In practice, though, most of these approaches are almost never applied for the protection of the COTS software currently targeted by ROP attacks, either due to the lack of source code or debugging information, or due to their increased overhead. In particular, from the above techniques, those that operate directly on compiled binaries, e.g., by permuting the order of functions [18], [19] or through binary instrumentation [20], require precise and complete extraction of all code and data in the executable sections of the binary. This is possible only if the corresponding symbolic debugging information is available, which however is typically stripped from production binaries. On the other hand, techniques that do work on stripped binary executables using dynamic binary instrumentation [21]–[23], incur a significant runtime overhead that limits their adoption. At the same time, instruction set randomization (ISR) [24], [25] cannot prevent code-reuse attacks, and current implementations also rely on heavyweight runtime instrumentation or code emulation frameworks.

其他针对ASLR的代码重用攻击的防御措施包括编译器扩展[15],[16],代码随机化[17]-[19],控制流完整性[20]和运行时解决方案[21]-[23]。但是,实际上,由于缺少源代码或调试信息,或者由于增加了开销,这些方法中的大多数几乎从未应用于保护当前受ROP攻击的COTS软件。特别是,从上述技术中,那些直接对已编译的二进制文件进行操作的技术(例如,通过置换功能[18],[19]的顺序或通过二进制工具[20])需要精确而完整地提取其中的所有代码和数据二进制文件的可执行部分。仅当相应的符号调试信息可用时才有可能,但是通常会从生产二进制文件中删除该信息。另一方面,使用动态二进制工具[21]-[23]在剥离的二进制可执行文件上运行的技术会产生大量的运行时开销,从而限制了它们的采用。同时,指令集随机化(ISR)[24],[25]无法防止代码重用攻击,并且当前的实现还依赖于重量级的运行时工具或代码仿真框架。

Starting with the goal of a practical mitigation against the recent spate of ROP attacks, in this paper we present a novel code randomization method that can harden third-party applications against return-oriented programming. Our approach is based on narrow-scope modifications in the code segments of executables using an array of code transformation techniques, to which we collectively refer as in-place code randomization. These transformations are applied statically, in a conservative manner, and modify only the code that can be safely extracted from compiled binaries, without relying on symbolic debugging information. By preserving the length of instructions and basic blocks, these modifications do not break the semantics of the code, and enable the randomization of stripped binaries even without complete disassembly coverage. The goal of this randomization process is to eliminate or probabilistically modify as many of the gadgets that are available in the address space of a vulnerable process as possible. Since ROP code relies on the correct execution of all chained gadgets, altering the outcome of even a few of them will likely render the ROP code ineffective.

从实际缓解最近发生的ROP攻击的目标开始,本文提出了一种新颖的代码随机化方法,该方法可以使第三方应用程序更难于面向返回的编程。我们的方法基于使用一系列代码转换技术对可执行文件的代码段进行窄范围修改,我们将其统称为就地代码随机化。这些转换以保守的方式静态应用,并且仅修改可以从编译的二进制文件安全地提取的代码,而无需依赖于符号调试信息。通过保留指令和基本块的长度,这些修改不会破坏代码的语义,并且即使在没有完全拆卸的情况下,也可以实现剥离二进制文件的随机化。此随机化过程的目标是消除或概率性地修改易受攻击过程的地址空间中可用的所有小工具。由于ROP代码依赖于所有链式小工具的正确执行,因此即使更改其中的几个小工具的结果也可能会使ROP代码无效。

Our evaluation using real-world ROP exploits against widely used applications, such as Adobe Reader, shows the effectiveness and practicality of our approach, as in all cases the randomized versions of the applications rendered the exploits non-functional. When aiming to circumvent the applied code randomization, Q [26] and Mona [27], two automated ROP payload construction tools, were unable to generate functional exploit code by relying solely on any remaining non-randomized gadgets.

我们针对大量使用的应用程序(例如Adobe Reader)使用现实世界中的ROP漏洞进行评估,显示了我们方法的有效性和实用性,因为在所有情况下,应用程序的随机版本都使该功能无法正常使用。 为了规避应用程序代码随机化,Q [26]和Mona [27]这两个自动化的ROP有效载荷构建工具无法仅依靠任何剩余的非随机化小工具来生成功能利用代码。

Although quite effective as a standalone mitigation, in-place code randomization is not meant to be a complete prevention solution, as it offers probabilistic protection and thus cannot deliver any protection guarantees. However, it can be applied in tandem with existing randomization techniques to increase process diversification. This is facilitated by the practically zero overhead of the applied transformations, and the ease with which they can be applied on existing third-party executables.

尽管就算是独立缓解措施也很有效,但是就地代码随机化并不意味着它是一个完整的预防方案,因为它提供了概率保护,因此无法提供任何保护保证。 但是,它可以与现有的随机化技术一起应用,以增加过程的多样化。 应用转换的开销几乎为零,并且可以轻松地将其应用到现有的第三方可执行文件中,这有助于实现这一点。

Our work makes the following main contributions:

我们的工作有以下主要贡献:

  • We present in-place code randomization, a novel and practical approach for hardening third-party software against ROP attacks. We describe in detail various narrow-scope code transformations that do not change the semantics of existing code, and which can be safely applied on compiled binaries without symbolic debugging information.
  • We have implemented in-place code randomization for x86 PE executables, and have experimentally verified the safety of the applied code transformations with extensive runtime code coverage tests using third-party executables.
  • We provide a detailed analysis of how in-place code randomization affects available gadgets using a large set of 5,235 PE files. On average, the applied transformations effectively eliminate about 10%, and probabilistically break about 80% of the gadgets in the tested files.
  • We evaluate our approach using publicly available ROP exploits and generic ROP payloads, as well as two ROP payload construction toolkits. In all cases, the randomized versions of the executables break the malicious ROP code, and prevent the automated construction of alternative payloads using the remaining unaffected gadgets.
  • 我们提出了就地代码随机化的方法,这是一种新颖的实用方法,可以针对ROP攻击加强第三方软件。我们详细描述了各种窄域代码转换,这些转换不会更改现有代码的语义,并且可以在没有符号调试信息的情况下安全地应用于已编译的二进制文件。
  • 我们已经为x86 PE可执行文件实现了就地代码随机化,并通过使用第三方可执行文件的广泛运行时代码覆盖率测试,通过实验验证了所应用代码转换的安全性。
  • 我们使用大量5,235个PE文件提供了就地代码随机化如何影响可用小工具的详细分析。平均而言,应用的转换有效消除了大约10%的概率,并概率破坏了测试文件中约80%的小工具。
  • 我们使用公开的ROP漏洞利用和通用的ROP有效负载以及两个ROP有效负载构建工具包来评估我们的方法。在所有情况下,可执行文件的随机版本都会破坏恶意的ROP代码,并阻止使用其余未受影响的小工具自动构建替代有效负载。

II. BACKGROUND

二。 背景

The introduction of non-executable memory page protections led to the development of the return-to-libc exploitation technique [28]. Using this method, a memory corruption vulnerability can be exploited by transferring control to code that already exists in the address space of the vulnerable process. By jumping to the beginning of a library function such as system(), the attacker can for example spawn a shell without the need to inject any code. Frequently though, especially for remote exploitation, calling a single function is not enough. In these cases, multiple return-to-libc calls can be “chained” together by first returning to a short instruction sequence such as pop reg; pop reg; ret [29], [30]. When arguments need to be passed through registers, a few short instruction sequences ending with a ret instruction can be chained directly to set the proper registers with the desired arguments, before calling the library function [31].

不可执行的内存页面保护的引入导致了归还libc开发技术的发展[28]。 使用此方法,可以通过将控制权转移到易受攻击的进程的地址空间中已经存在的代码来利用内存损坏漏洞。 通过跳转到诸如system()之类的库函数的开头,攻击者可以例如生成一个shell,而无需注入任何代码。 但是,经常(尤其是对于远程利用而言)仅调用一个函数是不够的。 在这些情况下,可以通过首先返回短指令序列(例如pop reg)将多个return-to-libc调用“链接”在一起。 流行音乐 退 [29],[30]。 当参数需要通过寄存器传递时,可以在调用库函数[31]之前直接链接一些以ret指令结尾的简短指令序列,以使用所需的参数设置适当的寄存器。

In the above code-reuse techniques, the executed code consists of one or a few short instruction sequences followed by a large block of code belonging to a library function. Hovav Shacham demonstrated that using only a carefully selected set of short instruction sequences ending with a ret instruction, known as gadgets, it is possible to achieve arbitrary computation, obviating the need for calling library functions [2]. This powerful technique, dubbed return-oriented programming, in essence gives the attacker the same level of flexibility offered by arbitrary code injection without injecting any code at all: the injected payload comprises just a sequence of gadget addresses intermixed with any necessary data arguments.

在上述代码重用技术中,执行的代码由一个或几个短指令序列组成,后跟属于库函数的一大段代码。 Hovav Shacham证明,仅使用经过仔细选择的一组以ret指令结尾的短指令序列(称为小工具),就可以实现任意计算,从而无需调用库函数[2]。 这种强大的技术被称为面向返回的编程,本质上为攻击者提供了由任意代码注入提供的相同级别的灵活性,而无需在所有注入的有效负载上注入任何代码,而仅由一系列小工具地址与任何必要的数据参数混合而成。

In a typical ROP exploit, the attacker needs to control both the program counter and the stack pointer: the former for executing the first gadget, and the latter for allowing its ret instruction to transfer control to subsequent gadgets. Depending on the vulnerability, if the ROP payload is injected in a memory area other than the stack, then the stack pointer must first be adjusted to the beginning of the payload through a stack pivot [6], [32]. In a follow-up work [33], Checkoway et al. demonstrated that the gadgets used in a ROP exploit need not necessarily end with a ret instruction, but can end with any other indirect control transfer instruction.

在典型的ROP攻击中,攻击者需要控制程序计数器和堆栈指针:前者用于执行第一个小工具,后者用于允许其ret指令将控制权转移到后续小工具。 根据漏洞的不同,如果将ROP有效负载注入到堆栈之外的其他存储区域中,则必须首先通过堆栈枢纽[6],[32]将堆栈指针调整为有效负载的开头。 在后续工作中[33],Checkoway等人。 证明了ROP漏洞中使用的小工具不一定必须以ret指令结尾,而是以任何其他间接控制转移指令结尾。

The ROP code used in recent exploits against Windows applications is mostly based on gadgets ending with ret instructions, which conveniently manipulate both the program counter and the stack pointer, although a couple of gadgets ending with call or jmp are also used for calling library functions. In all publicly available Windows exploits so far, attackers do not have to rely on a fully ROP-based implementation for the whole malicious code that needs to be executed. Instead, ROP code is used only as a first stage for bypassing DEP [1]. Typically, once control flow has been hijacked, the ROP code allocates a memory area with write and execute permissions by calling a library function like VirtualAlloc, copies into it some plain shellcode included in the attack vector, and finally jumps to the copied shellcode which now has execute permission [32].

在最近针对Windows应用程序的攻击中使用的ROP代码主要基于以ret指令结尾的小工具,该工具可以方便地操纵程序计数器和堆栈指针,尽管也有几个以call或jmp结尾的小工具也用于调用库函数。 到目前为止,在所有公开的Windows利用中,攻击者不必依赖完全基于ROP的实现来执行需要执行的全部恶意代码。 相反,ROP代码仅用作绕过DEP [1]的第一阶段。 通常,一旦控制流被劫持,ROP代码将通过调用诸如VirtualAlloc之类的库函数来分配具有写和执行权限的内存区域,将攻击向量中包含的一些简单的shellcode复制到其中,最后跳转到复制的shellcode,现在 具有执行权限[32]。

III. APPROACH

三, 方法

Our approach is based on the randomization of the code sections of binary executable files that are part of third-party applications, using an array of binary code transformation techniques. The objective of this randomization process is to break the code semantics of the gadgets that are present in the executable memory segments of a running process, without affecting the semantics of the actual program code.

我们的方法基于使用一系列二进制代码转换技术对作为第三方应用程序一部分的二进制可执行文件的代码部分进行随机化。 该随机化过程的目的是在不影响实际程序代码的语义的情况下打破存在于正在运行的进程的可执行内存段中的小工具的代码语义。

The execution of a gadget has a certain set of consequences to the CPU and memory state of the exploited process. The attacker chooses how to link the different gadgets together based on which registers, flags, or memory locations each gadget modifies, and in what way. Consequently, the execution of a subsequent gadget depends on the outcome of all previously executed gadgets. Even if the execution of a single gadget has a different outcome than the one anticipated by the attacker, then this will affect the execution of all subsequent gadgets, and it is likely that the logic of the malicious return-oriented code will be severely impacted.

小工具的执行会对被利用进程的CPU和内存状态产生一定的影响。 攻击者根据每个小工具修改的寄存器,标志或内存位置以及以何种方式,选择如何将不同的小工具链接在一起。 因此,后续小工具的执行取决于所有先前执行的小工具的结果。 即使单个小工具的执行结果与攻击者预期的结果不同,这也将影响所有后续小工具的执行,并且很可能会严重影响面向恶意返回代码的逻辑。
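The dependence between chained gadgets can be illustrated with a toy model. This is only a sketch under loud assumptions: gadgets are modeled as Python functions over a register dictionary rather than machine code, and the addresses, gadget names, and the "randomized" variant that reads a different register are all hypothetical, chosen purely for illustration.

```python
# Toy model (hypothetical): a "gadget" is a function over a register dict,
# and the payload is a list of gadget addresses intermixed with data words.

def pop_eax(regs, stack):
    regs["eax"] = stack.pop(0)          # consumes one data word from the payload

def pop_ebx(regs, stack):
    regs["ebx"] = stack.pop(0)

def add_eax_ebx(regs, stack):
    regs["eax"] = (regs["eax"] + regs["ebx"]) & 0xFFFFFFFF

def add_eax_ecx(regs, stack):           # altered gadget: reads a different register
    regs["eax"] = (regs["eax"] + regs["ecx"]) & 0xFFFFFFFF

def run_chain(gadgets, payload):
    regs = {"eax": 0, "ebx": 0, "ecx": 0}
    stack = list(payload)
    while stack:
        addr = stack.pop(0)             # models `ret` fetching the next gadget address
        gadgets[addr](regs, stack)
    return regs

ORIGINAL = {0x1000: pop_eax, 0x2000: pop_ebx, 0x3000: add_eax_ebx}
RANDOMIZED = dict(ORIGINAL)
RANDOMIZED[0x3000] = add_eax_ecx        # same address, different outcome

PAYLOAD = [0x1000, 40, 0x2000, 2, 0x3000]

print(run_chain(ORIGINAL, PAYLOAD)["eax"])    # value the attacker expects
print(run_chain(RANDOMIZED, PAYLOAD)["eax"])  # value after one gadget changed
```

The payload is identical in both runs; only the behavior found at one address differs, yet the final register state no longer matches what the rest of a longer chain would rely on.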

A. Why In-Place?

A.为什么要In-Place?

The concept of software diversification [34] is the basis for a wide range of protections against the exploitation of memory corruption vulnerabilities. Besides address space layout randomization [1], many techniques focus on the internal randomization of the code segments of executables, and can be combined with ASLR to increase process diversity [17]. Metamorphic transformations [35] can shift gadgets from their original offsets and alter many of their instructions, rendering them unusable. Another simpler and probably more effective approach is to rearrange existing blocks of code either at the function level [18], [19], [36], [37], or with finer granularity, at the basic block level [38], [39]. If all blocks of code are reordered so that none resides at its original location, then all the offsets of the gadgets that the attacker would assume to be present in the code sections of the process will now correspond to completely different code.

软件多元化的概念[34]是针对各种保护措施的基础,这些保护措施可防止利用内存损坏漏洞。 除了地址空间布局随机化[1]以外,许多技术还专注于可执行代码段的内部随机化,并且可以与ASLR结合使用以增加处理多样性[17]。 变形变换[35]可以将小工具从其原始偏移中移出并更改其许多指令,从而使它们无法使用。 另一种更简单且可能更有效的方法是在功能级别[18],[19],[36],[37]或在基础块级别[38],[[ 39]。 如果对所有代码块进行了重新排序,使得没有人驻留在其原始位置,那么攻击者会假定其存在于该过程的代码部分中的小工具的所有偏移现在将对应于完全不同的代码。

These transformations require a precise view of all the code and data objects contained in the executable sections of a PE file, including their cross-references, as existing code needs to be shifted or moved. Due to computed jumps and intermixed data [40], complete disassembly coverage is possible only if the binary contains relocation and symbolic debugging information (e.g., PDB files) [19], [41], [42]. Unfortunately, debugging information is typically stripped from release builds for compactness and intellectual property protection.

这些转换需要精确地查看PE文件的可执行部分中包含的所有代码和数据对象,包括它们的交叉引用,因为现有代码需要移位或移动。 由于计算出的跳转和混合数据[40],只有二进制文件包含重定位和符号调试信息(例如PDB文件)[19],[41],[42]时,才有可能实现完整的拆卸范围。 不幸的是,调试信息通常从发行版本中剥离出来,以实现紧凑性和知识产权保护。

For Windows software, in particular, PE files (both DLL and EXE) usually do retain relocation information even if no debugging information has been retained [43]. The loader needs this information in case a DLL must be loaded at an address other than its preferred base address, e.g., because another library has already been mapped to that location, or for ASLR. In contrast to Linux shared libraries and PIC executables, which contain position-independent code, Windows binaries contain absolute addresses, e.g., as immediate instruction operands or initialized data pointers, that are valid only if the executable has been loaded at its preferred base address. The .reloc section of PE files contains a list of offsets relative to each PE section that correspond to all absolute addresses to which a delta value needs to be added in case the actual load address is different [44].

特别是对于Windows软件,即使没有保留任何调试信息,PE文件(DLL和EXE)通常也会保留重定位信息[43]。 万一必须将DLL加载到其首选基址以外的地址,例如因为另一个库已经映射到该位置,则加载程序需要此信息。 或针对ASLR。 与包含位置无关代码的Linux共享库和PIC可执行文件相比,Windows二进制文件包含绝对地址,例如作为立即指令操作数或初始化的数据指针,它们仅在可执行文件已加载到其首选基址时才有效。 PE文件的.reloc节包含相对于每个PE节的偏移量列表,这些偏移量对应于所有绝对地址,如果实际加载地址不同,则需要在其中添加增量值[44]。
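The fix-up that the loader performs with this information can be sketched as follows. This is a simplified illustration, not the actual PE loader logic: it assumes the relocation entries have already been flattened into a plain list of byte offsets of 32-bit absolute addresses, whereas the real .reloc section groups entries into per-page blocks. The function name and the sample bytes are made up for the example.

```python
import struct

def apply_relocations(image: bytearray, reloc_offsets, preferred_base, actual_base):
    """Add the load delta to every 32-bit absolute address listed in reloc_offsets."""
    delta = (actual_base - preferred_base) & 0xFFFFFFFF
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)
    return image

# A1 imm32 = mov eax, [0x10002000]; C3 = ret. The absolute address starts at offset 1.
code = bytearray.fromhex("a100200010c3")
patched = apply_relocations(code, [1], preferred_base=0x10000000, actual_base=0x20000000)
print(patched.hex())
```

Only the four address bytes change; opcodes and instruction lengths stay the same, which is why relocation data reveals where absolute addresses are but not whether the surrounding bytes are code or data.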

Relocation information alone, however, does not suffice for extracting a complete view of the code within the executable sections of a PE file [38], [41]. Without the symbolic debugging information contained in PDB files, although the location of objects that are reached only via indirect jumps can be extracted from relocation information, their actual type (code or data) still remains unknown. In some cases, the actual type of these objects could be inferred using heuristics based on constant propagation, but such methods are usually prone to misidentifications of data as code and vice versa. Even a slight shift or size increase of a single object within a PE section will incur cascading shifts to its following objects. Typically, an unidentified object that actually contains code will include PC-relative branches to other code objects. In the absence of the debugging information contained in PDB files, moving such an unidentified code block (or any of its relatively referenced objects) without fixing the displacements of all its relative branch instructions that reference other objects will result in incorrect code.

然而,仅重定位信息不足以在PE文件的可执行部分中提取代码的完整视图[38],[41]。没有PDB文件中包含的符号调试信息,尽管只能从重定位信息中提取仅通过间接跳转到达的对象的位置,但它们的实际类型(代码或数据)仍然未知。在某些情况下,可以使用基于恒定传播的启发式方法来推断这些对象的实际类型,但是这种方法通常容易将数据错误地标识为代码,反之亦然。即使PE部分中单个对象的轻微移动或大小增加,也会导致其后续对象的级联移动。通常,实际包含代码的未识别对象将包括PC相对于其他代码对象的分支。在PDB文件中没有调试信息的情况下,在不固定引用所有其他对象的所有相对分支指令的位移的情况下,移动此类未标识的代码块(或其任何相对引用的对象)将导致错误的代码。
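Why an unnoticed shift breaks PC-relative branches can be shown with a few bytes. The sketch below assumes only the standard x86 encoding of jmp rel32 (opcode E9) and call rel32 (opcode E8), where the target is the address of the next instruction plus a signed 32-bit displacement; the helper name and the byte sequence are illustrative.

```python
import struct

def rel32_target(code: bytes, insn_offset: int) -> int:
    """Offset targeted by an E8 (call rel32) or E9 (jmp rel32) at insn_offset."""
    if code[insn_offset] not in (0xE8, 0xE9):
        raise ValueError("not a rel32 call/jmp")
    (disp,) = struct.unpack_from("<i", code, insn_offset + 1)
    return insn_offset + 5 + disp       # next-instruction offset + displacement

# jmp +1 ; nop ; ret  -> the jump skips the nop and lands on the ret at offset 6
code = bytes.fromhex("e901000000" "90" "c3")
target = rel32_target(code, 0)
print(target, hex(code[target]))

# Growing the code between the branch and its target by a single byte,
# without fixing the displacement, makes the branch land on the wrong byte.
shifted = code[:5] + b"\x90" + code[5:]
print(rel32_target(shifted, 0), hex(shifted[rel32_target(shifted, 0)]))
```

If the branch lives in a block that was never recognized as code, nothing tells the rewriter that its displacement needs fixing, which motivates transformations that leave every object at its original offset and size.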

Given the above constraints, we choose to use only binary code transformations that do not alter the size and location of code and data objects within the executable, allowing the randomization of third-party PE files without symbolic debugging information. Although this restriction does not allow us to apply extensive code transformations like basic block reordering or metamorphism, we can still achieve partial code randomization using narrow-scope modifications that can be safely applied even without complete disassembly coverage. This can be achieved through slight, in-place code modifications to the correctly identified parts of the code, that do not change the overall structure of basic blocks or functions, but which are enough to alter the outcome of short instruction sequences that can be used as gadgets.

鉴于上述限制,我们选择仅使用不更改可执行文件中代码和数据对象的大小和位置的二进制代码转换,从而允许在没有符号调试信息的情况下对第三方PE文件进行随机化。 尽管此限制不允许我们应用诸如基本块重新排序或变质之类的广泛代码转换,但我们仍然可以使用窄范围修改来实现部分代码随机化,即使没有完整的拆卸范围,也可以安全地应用。 这可以通过对正确识别的代码部分进行少量的就地代码修改来实现,这些修改不会改变基本块或功能的整体结构,但足以改变可以使用的简短指令序列的结果 作为小玩意。

B. Code Extraction and Modification

B.代码提取和修改

Although completely accurate disassembly of stripped x86 binaries is not possible, state-of-the-art disassemblers achieve decent coverage for code generated by the most commonly used compilers, using a combination of different disassembly algorithms [40], the identification of specific code constructs [45], and simple data flow analysis [46]. For our prototype implementation, we use IDA Pro [47] to extract the code and identify the functions of PE executables. IDA Pro is effective in the identification of function boundaries, even for functions with non-contiguous code and extensive use of basic block sharing [48], and also takes advantage of the relocation information present in Windows DLLs.

尽管不可能完全精确地反汇编剥离的x86二进制文件,但是最先进的反汇编程序结合使用不同的反汇编算法[40],可以识别最常用的编译器生成的代码,[40]可以识别特定的代码构造 [45]和简单的数据流分析[46]。 对于我们的原型实现,我们使用IDA Pro [47]提取代码并确定PE可执行文件的功能。 IDA Pro可以有效地识别功能边界,即使对于具有非连续代码和广泛使用基本块共享的功能[48],也可以利用Windows DLL中存在的重定位信息。

Typically, however, without the symbolic information of PDB files, a fraction of the functions in a PE executable are not identified, and parts of code remain undiscovered. Our code transformations are applied conservatively, only on parts of the code that we can be confident have been accurately disassembled. For instance, IDA Pro speculatively disassembles code blocks that are reached only through computed jumps, taking advantage of the relocation information contained in PE files. However, we do not enable such heuristic code extraction methods in order to avoid any disastrous modifications due to potentially misidentified code. In practice, for the code generated by most compilers, relocation information also ensures that the correctly identified basic blocks have no entry point other than their first instruction. Similarly, some transformations that rely on the proper identification of functions are applied only on the code of correctly recognized functions. Our implementation is separate from the actual code extraction framework used, which means that IDA Pro can be replaced or assisted by alternative code extraction approaches [41], [49], [50], providing better disassembly coverage.

但是,通常,在没有PDB文件的符号信息的情况下,PE可执行文件中的部分功能无法识别,并且部分代码仍未被发现。我们的代码转换会保守地应用,仅应用于我们可以确信已经正确反汇编的部分代码。例如,IDA Pro利用PE文件中包含的重定位信息推测性地分解仅通过计算的跳转才能到达的代码块。但是,我们不会启用这种启发式代码提取方法,以避免由于可能错误识别代码而造成的任何灾难性修改。实际上,对于大多数编译器生成的代码,重定位信息还可以确保正确识别的基本块除其第一条指令外没有其他入口点。同样,某些依赖于正确识别功能的转换仅应用于正确识别的功能的代码。我们的实现与使用的实际代码提取框架是分开的,这意味着IDA Pro可以用替代的代码提取方法[41],[49],[50]代替或协助,以提供更好的拆卸范围。

After code extraction, disassembled instructions are first converted to our own internal representation, which holds additional information such as any implicitly used registers, and the registers and flags read or written by the instruction. For correctness, we also track the use of general purpose registers even in floating point, MMX, and SSE instructions. Although these types of instructions have their own set of registers, they do use general purpose registers for memory references (e.g., as the fmul instruction in Fig. 1). We then proceed and apply the in-place code transformations discussed in the following section. These are applied only on the parts of the executable segments that contain (intended or unintended [2]) instruction sequences that can be used as gadgets. As a result of some of the transformations, instructions may be moved from their original locations within the same basic block. In these cases, for instructions that contain an absolute address in some of their operands, the corresponding entries in the .reloc sections of the randomized PE file are updated with the new offsets where these absolute addresses are now located.

提取代码后,首先将反汇编后的指令转换为我们自己的内部表示形式,该内部表示形式包含其他信息,例如任何隐式使用的寄存器以及该指令读取或写入的寄存器和标志。为了正确起见,即使在浮点,MMX和SSE指令中,我们也跟踪通用寄存器的使用。尽管这些类型的指令具有它们自己的寄存器组,但是它们的确将通用寄存器用于存储器引用(例如,如图1中的fmul指令)。然后,我们继续并应用下一节中讨论的就地代码转换。这些仅适用于包含(意料之中或意料之外[2])可用作小工具的指令序列的可执行段部分。作为某些转换的结果,指令可以从它们在同一基本块内的原始位置移出。在这些情况下,对于某些操作数中包含绝对地址的指令,将使用这些绝对地址现在所在的新偏移量来更新随机PE文件.reloc节中的相应条目。
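The relocation bookkeeping described above can be sketched as a small layout routine. This is an assumption-laden illustration rather than the prototype's code: each instruction is represented as a hypothetical pair of its encoding and the position of a 32-bit absolute address inside it (or None), and relocation entries are modeled as plain offsets.

```python
from typing import List, Optional, Tuple

# (encoding, offset of a 32-bit absolute address inside the encoding, or None)
Insn = Tuple[bytes, Optional[int]]

def layout(block_start: int, insns: List[Insn]):
    """Emit a basic block and collect the offsets that need relocation entries."""
    code = bytearray()
    relocs = []
    for encoding, abs_off in insns:
        if abs_off is not None:
            relocs.append(block_start + len(code) + abs_off)
        code += encoding
    return bytes(code), relocs

mov_abs = (bytes.fromhex("a100200010"), 1)   # mov eax, [0x10002000]
xor_ebx = (bytes.fromhex("31db"), None)      # xor ebx, ebx (independent of the mov)

orig_code, orig_relocs = layout(0x400, [mov_abs, xor_ebx])
new_code, new_relocs = layout(0x400, [xor_ebx, mov_abs])
print([hex(r) for r in orig_relocs], [hex(r) for r in new_relocs])
```

The block keeps its size and start offset, so nothing outside it moves; only the relocation entry for the absolute address follows the instruction to its new position.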

Our prototype implementation processes each PE file individually, and generates multiple randomized copies that can then replace the original. Given the complexity of the analysis required for generating a set of randomized instances of an input file (on the order of a few minutes on average for the PEs used in our tests), this allows the off-line generation of a pool of randomized PE files for a given application. Note that for most of the tested Windows applications, only some of the DLLs need to be randomized, as the rest are usually ASLR enabled (although they can also be randomized for increased protection). In a production deployment, a system service or a modified loader can then pick a different randomized version of the required PEs each time the application is launched, operating in the same way as tools like EMET [51].

我们的原型实现单独处理每个PE文件,并生成多个随机副本,然后可以替换原始副本。 考虑到生成输入文件的随机实例集所需的分析复杂性(对于我们的测试中使用的PE,平均大约需要几分钟的时间),因此可以离线生成随机PE池 给定应用程序的文件。 请注意,对于大多数经过测试的Windows应用程序,只需要对某些DLL进行随机化,因为其余的DLL通常启用了ASLR(尽管也可以将它们随机化以增强保护)。 在生产部署中,系统服务或经过修改的加载器可以在每次启动应用程序时按照所需的PE选择不同的随机版本,遵循与EMET [51]之类的工具相同的操作方式。

IV. IN-PLACE CODE TRANSFORMATIONS

IV。 插入代码转换

In this section we present in detail the different code transformations used for in-place code randomization. Although some of the transformations such as instruction reordering and register reassignment are also used by compilers and polymorphic code engines for code optimization [52] and obfuscation [35], applying them at the binary level—without having access to the higher-level structural and semantic information available in these settings—poses significant challenges.

在本节中,我们详细介绍用于就地代码随机化的不同代码转换。 尽管编译器和多态代码引擎也使用了某些转换(例如指令重排序和寄存器重分配)来进行代码优化[52]和混淆处理[35],但将它们应用于二进制级别,而无需访问更高级别的结构和 这些设置中可用的语义信息提出了重大挑战。

A. Atomic Instruction Substitution

A.原子指令替代

One of the basic concepts of code obfuscation and metamorphism [35] is that the exact same computation can be achieved using a countless number of different instruction combinations. When applied for code randomization, substituting the instructions of a gadget with a functionally-equivalent—but different—sequence of instructions would not affect any ROP code that uses that gadget, since its outcome would be the same. However, by modifying the instructions of the original program code, this transformation in essence modifies certain bytes in the code image of the program, and consequently, can drastically alter the structure of non-intended instruction sequences that overlap with the substituted instructions.

代码混淆和变质的基本概念之一[35]是使用无数不同的指令组合可以实现完全相同的计算。 当应用于代码随机化时,用功能上等效但不同的指令序列代替小工具的指令不会影响使用该小工具的任何ROP代码,因为其结果将是相同的。 但是,通过修改原始程序代码的指令,此转换本质上修改了程序代码映像中的某些字节,因此,可以彻底更改与替换指令重叠的非预期指令序列的结构。

Many of the gadgets used in ROP code consist of unaligned instructions that have not been emitted by the compiler, but which happen to be present in the code image of the process due to the density and variable-length nature of the x86 instruction set. In the example of Fig. 1(a), the actual code generated by the compiler consists of the instructions mov; cmp; lea; starting at byte B0. [The code of all examples throughout the paper comes from icucnv36.dll, included in Adobe Reader v9.3.4. This DLL was used for the ROP code of a DEP-bypass exploit for CVE-2010-2883 [53] (see Table II).] However, when disassembling from the next byte, a useful non-intended gadget ending with ret is found.

ROP代码中使用的许多小工具由未对齐的指令组成,这些指令未由编译器发出,但由于x86指令集的密度和可变长度性质,它们恰好出现在进程的代码映像中。 在图1(a)的示例中,由编译器生成的实际代码由mov指令组成; cmp; lea; 从字节B0开始。 [本文中所有示例的代码都来自icucnv36.dll,该文件包含在Adobe Reader v9.3.4中。 此DLL用于CVE-2010-2883的DEP-bypass漏洞的ROP代码[53](请参阅表II)。但是,当从下一个字节反汇编时,发现有用的以ret结尾的非预期小工具。
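A first step toward spotting such non-intended sequences is to look for ret bytes that do not sit on an instruction boundary of the compiler-emitted code. The sketch below is illustrative only: the byte sequence is made up (it is not the code of Fig. 1), the instruction boundaries are supplied by hand, and a real gadget finder would additionally decode backwards from each candidate with a disassembler to confirm that the preceding bytes form valid instructions.

```python
RET = 0xC3  # single-byte near return

def unintended_ret_offsets(code: bytes, insn_starts):
    """Offsets of 0xC3 bytes that are not the start of an intended instruction."""
    starts = set(insn_starts)
    return [off for off, b in enumerate(code) if b == RET and off not in starts]

# Intended code: mov eax, 1 ; add eax, ebx ; pop ebp ; ret
#   offset 0: B8 01 00 00 00
#   offset 5: 03 C3            <- the ModR/M byte of this add is 0xC3
#   offset 7: 5D
#   offset 8: C3
code = bytes.fromhex("b801000000" "03c3" "5d" "c3")
print(unintended_ret_offsets(code, insn_starts=[0, 5, 7, 8]))

# Decoding from offset 4 instead of 5 reads 00 03 (add byte [ebx], al)
# followed by C3 (ret): a gadget the compiler never emitted.
```

The intended ret at offset 8 is not reported; the byte at offset 6 is, because it only becomes a ret when execution enters the middle of the add instruction.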

[Fig. 1 (fig-1.png): image not available in this copy. Panels (a) and (b) are described in the surrounding text.]

Compiled code is highly optimized, and thus the replacement of even a single instruction in the original program code usually requires either a longer instruction, or a combination of more than one instruction, for achieving the same purpose. Given that our aim is to randomize the code of stripped binaries, even a slight increase in the size of a basic block is not possible, which makes the most commonly used instruction substitution techniques unsuitable for our purpose.

编译后的代码经过高度优化,因此,即使要替换原始程序代码中的单个指令,通常也需要更长的指令或多个指令的组合来实现相同的目的。 鉴于我们的目标是使剥离二进制文件的代码随机化,因此即使基本块大小的微小增加也是不可能的,这使得最常用的指令替换技术不适合我们的目的。

In certain cases though, it is possible to replace an instruction with a single, functionally-equivalent instruction of the same length, thanks to the flexibility offered by the extensive x86 instruction set. Besides obvious candidates based on replacing addition with subtraction of the negated operand and vice versa, there are also some instructions that come in different forms, with different opcodes, depending on the supported operand types. For example, add r/m32,r32 stores the result of the addition in a register or memory operand (r/m32), while add r32,r/m32 stores the result in a register (r32). Although these two forms have different opcodes, the two instructions are equivalent when both operands happen to be registers. Many arithmetic and logical instructions have such dual equivalent forms, while in some cases there can be up to five equivalent instructions (e.g., the five instructions "test r/m8,r8", "or r/m8,r8", "or r8,r/m8", "and r/m8,r8", and "and r8,r/m8" affect the flags of the EFLAGS register in the same way when both operands are the same register). In our prototype implementation we use the sets of equivalent instructions used in Hydan [54], a tool for hiding information in x86 executables, with the addition of one more set that includes the equivalent versions of the xchg instruction.

但是,在某些情况下,得益于庞大的x86指令集所提供的灵活性,可以用一条长度相同、功能等效的指令替换原指令。除了用减去相反数代替加法(或反过来)这类明显的候选之外,还有一些指令根据所支持的操作数类型以不同的形式出现,并具有不同的操作码。例如,add r/m32,r32将加法的结果存储在寄存器或内存操作数(r/m32)中,而add r32,r/m32将结果存储在寄存器(r32)中。尽管这两种形式具有不同的操作码,但是当两个操作数都恰好是寄存器时,两条指令是等效的。许多算术和逻辑指令具有这种成对的等效形式,而在某些情况下最多可以有五条等效指令(例如,当两个操作数是同一寄存器时,test r/m8,r8;or r/m8,r8;or r8,r/m8;and r/m8,r8;and r8,r/m8这五条指令以相同的方式影响EFLAGS寄存器的标志)。在我们的原型实现中,我们使用Hydan [54](一种在x86可执行文件中隐藏信息的工具)所使用的等效指令集,并另外添加了一组xchg指令的等效版本。

As shown in Fig. 1(b), both operands of the cmp instruction are registers, and thus it can be replaced by its equivalent form, which has different opcode and ModR/M bytes [55]. Although the actual program code does not change, the ret instruction that was “included” in the original cmp instruction has now disappeared, rendering the gadget unusable. In this case, the transformation completely eliminates the gadget, and thus will be applied in all instances of the randomized binary. In contrast, when a substitution does not affect the gadget’s final indirect jump, then it is applied probabilistically.

如图1(b)所示,cmp指令的两个操作数都是寄存器,因此可以用等效形式替换,该等效形式具有不同的操作码和ModR / M字节[55]。 尽管实际的程序代码没有更改,但是原来包含在原始cmp指令中的ret指令现在已消失,从而使小工具无法使用。 在这种情况下,转换会完全消除小工具,因此将应用于随机二进制的所有实例。 相反,当替换不影响小工具的最终间接跳转时,则可以概率地应用它。
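The substitution described above can be sketched in Python for the register-to-register forms of the classic two-operand ALU instructions (add, or, adc, sbb, and, sub, xor, cmp). The sketch relies on two properties of the x86 encoding: bit 1 of these opcodes selects between the `op r/m,r` and `op r,r/m` forms, and with a register-direct ModR/M byte (mod = 11) swapping the reg and r/m fields names the same two operands. The function name `swap_direction` is hypothetical, and this is an illustration under those assumptions rather than the paper's implementation; prefixes, memory operands, and the other Hydan equivalence sets are not handled.

```python
def swap_direction(insn: bytes) -> bytes:
    """Return the equivalent same-length encoding of a 2-byte reg,reg ALU instruction."""
    if len(insn) != 2:
        raise ValueError("expected exactly an opcode byte and a ModR/M byte")
    opcode, modrm = insn
    # Opcodes 00-3F whose low three bits are 0..3 are the r/m,r and r,r/m ALU forms.
    if opcode >= 0x40 or (opcode & 0x07) > 3:
        raise ValueError("not a two-operand ALU opcode with a direction bit")
    if modrm >> 6 != 0b11:
        raise ValueError("only register-direct operands are interchangeable")
    reg, rm = (modrm >> 3) & 7, modrm & 7
    # Flip the direction bit and swap the two register fields.
    return bytes([opcode ^ 0x02, 0xC0 | (rm << 3) | reg])


# cmp al, bl encoded with opcode 3A: its ModR/M byte is C3, the opcode of ret.
cmp_with_ret_byte = bytes.fromhex("3ac3")
cmp_alternative = swap_direction(cmp_with_ret_byte)
```

Here `cmp_alternative` is 38 D8, which encodes the same comparison without any C3 byte. The opposite can also happen: add eax, ebx encoded as 01 D8 becomes 03 C3, so a real tool would need to check that a substitution does not introduce a new ret byte.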

B. Instruction Reordering

B.指令重新排序

In certain cases, it is possible to reorder the instructions of small self-contained code fragments without affecting the correct operation of the program. This transformation can significantly impact the structure of non-intended gadgets, but can also break the attacker’s assumptions about gadgets that are part of the actual machine code.

在某些情况下,可以对小的独立代码片段的指令重新排序,而不会影响程序的正确操作。 这种转换可能会严重影响非预期小工具的结构,但也会破坏攻击者对实际机器代码中包含的小工具的假设。

  1. Intra Basic Block Reordering: The actual instruction scheduling chosen during the code generation phase of a compiler depends on many factors, including the cost of instructions in cycles, and the applied code optimization techniques [52]. Consequently, the code of a basic block is often just one among several possible instruction orderings that are all equivalent in terms of correctness. Based on this observation, we can partially modify the code within a basic block by reordering some of its instructions according to an alternative instruction scheduling.

1)内部基本块重排序:在编译器的代码生成阶段选择的实际指令调度取决于许多因素,包括循环中指令的成本以及所应用的代码优化技术[52]。 因此,基本块的代码通常只是几种可能的指令顺序之一,这些指令顺序在正确性方面都是等效的。 基于这种观察,我们可以通过根据替代指令调度对一些指令进行重新排序来部分修改基本块中的代码。

The basis for deriving an alternative instruction scheduling is to determine the ordering relationships among the instructions, which must always be satisfied to maintain code correctness. The dependence graph of a basic block represents the instruction interdependencies that constrain the possible instruction schedules [56]. Since a basic block contains straightline code, its dependence graph is a directed acyclic graph with machine instructions as vertices, and dependencies between instructions as edges. We apply dependence analysis on the code of disassembled basic blocks to build their dependence graph using an adaptation of a standard dependence DAG construction algorithm [56, Fig. 9.6] for machine code. Applying dependence analysis directly on machine code requires a careful treatment of the dependencies between x86 instructions. Compared to the analysis of code expressed in an intermediate representation form, this includes the identification of data dependencies not only between register and memory operands, but also between CPU flags and implicitly used registers and memory locations.

得出备用指令调度的基础是确定指令之间的排序关系,必须始终满足这些关系才能保持代码正确性。基本块的依存关系图表示约束可能的指令调度的指令相互依存关系[56]。由于基本块包含直线代码,因此其依存关系图是有向无环图,以机器指令为顶点,指令之间的依存关系为边。我们使用对机器代码的标准依赖性DAG构造算法[56,图9.6]的改编,对分解后的基本块的代码进行依赖性分析,以构建其依赖性图。直接在机器代码上应用依赖性分析需要对x86指令之间的依赖性进行仔细的处理。与以中间表示形式表示的代码分析相比,这不仅包括对寄存器和内存操作数之间,而且还包括CPU标志与隐式使用的寄存器和内存位置之间的数据依赖性的标识。

For each instruction i, we derive the sets use[i] and def[i] with the registers used and defined by the instruction. Besides register operands and registers used as part of effective address computations, this includes any implicitly used registers. For example, the use and def sets for pop eax are {esp} and {eax, esp}, while for rep stosb they are {ecx, eax, edi} and {ecx, edi}, respectively. [stosb (Store Byte to String) copies the least significant byte from the eax register to the memory location pointed by the edi register and increments edi’s value by one. The rep prefix repeats this instruction until ecx’s value reaches zero, while decreasing it after each repetition.] We initially assume that all instructions in the basic block depend on each other, and then check each pair for read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW) dependencies. For example, $i_1$ and $i_2$ have a RAW dependency if any of the following conditions is true: i) def[$i_1$] ∩ use[$i_2$] ≠ ∅, ii) the destination operand of $i_1$ and the source operand of $i_2$ are both a memory location, iii) $i_1$ writes at least one flag read by $i_2$.

对于每条指令i,我们推导出集合use[i]和def[i],分别包含该指令使用和定义的寄存器。除了寄存器操作数以及参与有效地址计算的寄存器之外,这还包括所有隐式使用的寄存器。例如,pop eax的use和def集合分别是{esp}和{eax, esp},而rep stosb的use和def集合分别是{ecx, eax, edi}和{ecx, edi}。[stosb(Store Byte to String)将eax寄存器的最低有效字节复制到edi寄存器所指向的内存位置,并将edi的值加1。rep前缀重复执行该指令直到ecx的值变为零,并在每次重复后将ecx递减。]我们最初假定基本块中的所有指令都相互依赖,然后检查每对指令之间的写后读(RAW)、读后写(WAR)和写后写(WAW)依赖。例如,如果满足以下任一条件,则$i_1$和$i_2$具有RAW依赖:i)def[$i_1$] ∩ use[$i_2$] ≠ ∅,ii)$i_1$的目标操作数和$i_2$的源操作数都是内存位置,iii)$i_1$至少写入一个被$i_2$读取的标志。

Note that condition ii) is quite conservative, given that $i_2$ will actually depend on $i_1$ only if $i_2$ reads the same memory location written by $i_1$. However, unless both memory operands use absolute addresses, it is hard to determine statically if the two effective addresses point to the same memory location. In our future work, we plan to use simple data flow analysis to relax this condition. Besides instructions with memory operands, this condition should also be checked for instructions with implicitly accessed memory locations, e.g., push and pop. The conditions for WAR and WAW dependencies are analogous. If no conflict is found between two instructions, then there is no constraint in their execution order.

请注意,条件ii)相当保守,因为只有当$i_2$读取的恰好是$i_1$写入的同一内存位置时,$i_2$才真正依赖于$i_1$。但是,除非两个内存操作数都使用绝对地址,否则很难静态确定两个有效地址是否指向同一内存位置。在未来的工作中,我们计划使用简单的数据流分析来放宽这一条件。除了带有内存操作数的指令之外,对于隐式访问内存位置的指令(例如push和pop),也应检查这一条件。WAR和WAW依赖的条件与此类似。如果两条指令之间没有发现冲突,则它们的执行顺序不受任何约束。
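The pairwise dependency test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Insn` record, its field names, and the hand-written use/def sets are assumptions made for the example, and memory is treated conservatively exactly as in condition ii) (any write followed by any read counts as a conflict).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical per-instruction summary; a real tool derives it from the decoder."""
    text: str
    use: frozenset = frozenset()            # registers read (including implicit ones)
    define: frozenset = frozenset()         # registers written (including implicit ones)
    reads_mem: bool = False
    writes_mem: bool = False
    flags_read: frozenset = frozenset()
    flags_written: frozenset = frozenset()


def depends(first: Insn, second: Insn) -> bool:
    """True if `second` must stay after `first` (RAW, WAR, or WAW conflict)."""
    raw = ((first.define & second.use)
           or (first.writes_mem and second.reads_mem)
           or (first.flags_written & second.flags_read))
    war = ((first.use & second.define)
           or (first.reads_mem and second.writes_mem)
           or (first.flags_read & second.flags_written))
    waw = ((first.define & second.define)
           or (first.writes_mem and second.writes_mem)
           or (first.flags_written & second.flags_written))
    return bool(raw or war or waw)


pop_eax = Insn("pop eax", use=frozenset({"esp"}),
               define=frozenset({"eax", "esp"}), reads_mem=True)
mov_ebx_ecx = Insn("mov ebx, ecx", use=frozenset({"ecx"}), define=frozenset({"ebx"}))
add_eax_ebx = Insn("add eax, ebx", use=frozenset({"eax", "ebx"}),
                   define=frozenset({"eax"}),
                   flags_written=frozenset({"cf", "pf", "af", "zf", "sf", "of"}))
```

With these summaries, `pop eax` and `mov ebx, ecx` are independent and may be swapped, while `add eax, ebx` must stay after both of them because it reads eax and ebx.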

Figure 2(a) shows the code of a basic block that contains a non-intended gadget, and Fig. 3 its corresponding dependence DAG. Instructions not connected via a direct edge are independent, and have no constraint in their relative execution order. Given the dependence DAG of a basic block, the possible orderings of its instructions correspond to the different topological sorting arrangements of the graph [57]. Fig. 2(b) shows one of the possible alternative orderings of the original code. The locations of all but one of the instructions and the values of all but one of the bytes have changed, eliminating the non-intended gadget contained in the original code. Although a new gadget has appeared a few bytes further into the block, (ending again with a ret instruction at byte C3 ), an attacker cannot depend on it since alternative orderings will shift it to other locations, and some of its internal instructions will always change (e.g., in this example, the useful pop ecx is gone). In fact, the ret instruction can be eliminated altogether using atomic instruction substitution.

图2(a)显示了包含非预期小工具的基本块的代码,图3显示了其对应的依赖项DAG。未通过直接边连接的指令是独立的,并且在它们的相对执行顺序上没有约束。给定基本块的依赖DAG,其指令的可能排序对应于图形的不同拓扑排序安排[57]。图2(b)显示了原始代码的一种可能的替代顺序。除了一条指令之外的所有指令的位置以及字节之一以外的所有字节的值均已更改,从而消除了原始代码中包含的非预期小工具。尽管一个新的小工具已出现在该块的后面几个字节处(再次以字节C3处的ret指令结束),但是攻击者无法依赖它,因为其他顺序会将其转移到其他位置,并且其某些内部指令将始终更改(例如,在此示例中,有用的pop ecx消失了)。实际上,可以使用原子指令替换完全消除ret指令。
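To illustrate how alternative orderings follow from the dependence DAG, the sketch below enumerates every topological ordering of a small graph. The function name, the edge representation, and the four-instruction example are hypothetical; the number of orderings can grow very quickly with block size, so a practical tool would more likely sample one ordering than list them all.

```python
def topological_orders(n, edges):
    """Yield every ordering of instructions 0..n-1 that respects `edges`.

    `edges` is a set of (before, after) pairs taken from the dependence DAG.
    """
    preds = {v: {a for a, b in edges if b == v} for v in range(n)}

    def extend(prefix, placed):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for v in range(n):
            # An instruction can be scheduled once all its predecessors are placed.
            if v not in placed and preds[v] <= placed:
                yield from extend(prefix + [v], placed | {v})

    yield from extend([], frozenset())


# Hypothetical block: instruction 2 depends on 0 and 1; instruction 3 is independent.
orders = list(topological_orders(4, {(0, 2), (1, 2)}))
```

For this example there are eight valid schedules, one of which is the original order (0, 1, 2, 3).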

[Figure 2 (fig-2.png)]

[Figure 3 (fig-3.png)]

An underlying assumption we make here is that basic block boundaries will not change at runtime. If a computed control transfer instruction targets a basic block instruction other than its first, then reordering may break the semantics of the code. Although this may seem restrictive, we note that throughout our evaluation we did not encounter any such case. For compiler-generated code, IDA Pro is able to compute all jump targets even for computed jumps based on the PE relocation information. In the most conservative case, users may choose to disable instruction reordering and still benefit from the randomization of the other techniques—Section V includes results for each technique individually.

我们在此做出的基本假设是,基本块边界在运行时不会改变。 如果计算出的控制传递指令的目标不是第一条指令,那么重新排序可能会破坏代码的语义。 尽管这似乎是限制性的,但我们注意到在整个评估过程中,我们没有遇到任何此类情况。 对于编译器生成的代码,IDA Pro甚至可以基于PE重定位信息为已计算的跳转计算所有跳转目标。 在最保守的情况下,用户可以选择禁用指令重新排序,并且仍然可以从其他技术的随机化中受益—第V节分别包括每种技术的结果。

  2. Reordering of Register Preservation Code: The calling convention followed by the majority of compilers for Windows on x86 architectures, similarly to Linux, specifies that the ebx , esi , edi , and ebp registers are callee-saved [58]. The remaining general purpose registers, known as scratch or volatile registers, are free for use by the callee without restrictions. Typically, a function that needs to use more than the available scratch registers, preserves any non-volatile registers before modifying them by storing their values on the stack. This is usually done at the function prologue through a series of push instructions, as in the example of Fig. 4(a), which shows the very first and last instructions of a function. At the function epilogue, a corresponding series of pop instructions restores the saved values from the stack, right before returning to the caller. Sequences that contain pop instructions followed by ret are among the most widely used gadgets found in ROP exploits, since they allow the attacker to load registers with values that are supplied as part of the injected payload [59]. The order of the pop instructions is crucial for initializing each register with the appropriate value.

2)寄存器保存代码的重新排序:与Linux类似,在x86架构上Windows的大多数编译器遵循的调用约定指定ebx,esi,edi和ebp寄存器是被调用者保存的[58]。其余的通用寄存器(称为暂存器或易失性寄存器)可以不受限制地供被调用方免费使用。通常,需要使用比可用暂存寄存器更多的功能的函数会在将所有非易失性寄存器存储在堆栈中之前对其进行保存,以保留它们。如图4(a)的示例所示,这通常是在功能序言中通过一系列推入指令完成的,图4(a)显示了功能的最开始和最后一条指令。在函数结尾处,相应的一系列弹出指令将在返回调用者之前立即从堆栈中恢复保存的值。包含弹出指令后跟ret的序列是ROP攻击中使用最广泛的小工具之一,因为它们使攻击者可以加载作为注入的有效载荷一部分提供的值的寄存器[59]。弹出指令的顺序对于用适当的值初始化每个寄存器至关重要。

[Figure 4 (fig-4.png)]

As seen in the function prologue, the compiler stores the values of the callee-saved registers in arbitrary order, and sometimes the relevant push instructions are interleaved with instructions that use previously-preserved registers. At the function epilogue, the saved values are popped from the stack in reverse order, so that they end up in the proper register. Consequently, as long as the saved values are restored in the right order, their actual order on the stack is irrelevant. Based on this observation, we can randomize the order of the push and pop instructions of register preservation code by maintaining the first-in-last-out order of the stored values, as shown in Fig. 4(b). In this example, there are six possible orderings of the three pop instructions, which means that any assumption that the attacker may make about which registers will hold the two supplied values, will be correct with a probability of one in six (or one in three, if only one register needs to be initialized). In case only two registers are preserved, there are two possible orderings, allowing the gadget to operate correctly half of the time.

从函数序言中可以看出,编译器以任意顺序保存被调用方保存的寄存器的值,有时相关的push指令还会与使用先前已保存寄存器的指令交错出现。在函数结尾处,保存的值以相反的顺序从堆栈中pop出来,以便回到正确的寄存器中。因此,只要保存的值以正确的顺序恢复,它们在堆栈中的实际顺序就无关紧要。基于这一观察,我们可以在保持存储值先进后出顺序的前提下,随机化寄存器保存代码中push和pop指令的顺序,如图4(b)所示。在此示例中,三条pop指令有六种可能的顺序,这意味着攻击者对哪些寄存器将保存所提供的两个值所做的任何假设,正确的概率只有六分之一(如果只需要初始化一个寄存器,则为三分之一)。如果仅保存了两个寄存器,则有两种可能的顺序,小工具有一半的概率能正确运行。
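The counting argument above can be checked with a short sketch. The register names and the helper `preservation_orders` are hypothetical (they are not taken from Fig. 4), and the probabilities assume that every ordering is chosen with equal likelihood.

```python
from fractions import Fraction
from itertools import permutations


def preservation_orders(saved_regs):
    """Yield (push_order, pop_order) pairs; the pops always mirror the pushes."""
    for push_order in permutations(saved_regs):
        yield push_order, tuple(reversed(push_order))


orders = list(preservation_orders(["ebx", "esi", "edi"]))

# The attacker assumes the pop order of the original binary.
assumed_pops = orders[0][1]

# Chance that every pop loads the register the attacker expected.
p_all = Fraction(sum(pops == assumed_pops for _, pops in orders), len(orders))

# Chance that just the first pop loads the expected register.
p_one = Fraction(sum(pops[0] == assumed_pops[0] for _, pops in orders), len(orders))
```

For three preserved registers this gives six orderings, with `p_all` equal to one in six and `p_one` equal to one in three, matching the figures in the text.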

This transformation is applied conservatively, only to functions with accurately disassembled prologue and epilogue code. To make sure that we properly match the push and pop instructions that preserve a given register, we monitor the stack pointer delta throughout the whole function, as shown in the second column of Fig. 4(a). If the deltas at the prologue and epilogue do not match, e.g., due to call sites with unknown calling conventions throughout the function, or indirect manipulation of the stack pointer, then no randomization is applied. As shown in Fig. 4(b), any non-preservation instructions in the function prologue are reordered along with the push instructions by maintaining any interdependencies, as discussed in the previous section. For functions with multiple exit points, the preservation code at all epilogues should match the function’s prologue. Note that there can be multiple push and pop pairs for the same register, in case the register is preserved only throughout some of the execution paths of a function.

这种转换是保守地应用的,仅适用于具有正确反序的序言和结尾代码的函数。为了确保我们正确匹配保留给定寄存器的推入和弹出指令,我们在整个函数中监视堆栈指针增量,如图4(a)的第二列所示。如果在序言和结尾处的增量不匹配,例如,由于整个函数中的调用约定具有未知的调用约定,或者由于堆栈指针的间接操作,则不应用随机化。如图4(b)所示,通过保持任何相互依赖性,函数序言中的所有非保留指令都与推入指令一起重新排序,如上一节所述。对于具有多个退出点的函数,所有结尾处的保存代码应与函数的序言相匹配。请注意,如果仅在函数的某些执行路径中保留寄存器,则同一寄存器可以有多个推入和弹出对。

C. Register Reassignment

C.寄存器重分配

Although the program points at which a certain variable should be stored in a register or spilled into memory are chosen by the compiler using sophisticated allocation algorithms, the actual name of the general purpose register that will hold a particular variable is mostly an arbitrary choice. Based on this observation, we can reassign the names of the register operands in the existing code according to a different—but equivalent—register assignment, without affecting the semantics of the original code. When considering each gadget as an autonomous code sequence, this transformation can alter the outcome of many gadgets, which will now read or modify different registers than those assumed by the attacker.

尽管编译器使用复杂的分配算法选择了将某个变量存储在寄存器中或溢出到内存中的程序点,但是将保存特定变量的通用寄存器的实际名称大部分是任意选择。 基于此观察,我们可以根据不同但等效的寄存器分配,在现有代码中重新分配寄存器操作数的名称,而不会影响原始代码的语义。 当将每个小工具视为自治代码序列时,此转换可以更改许多小工具的结果,这些小工具现在将读取或修改与攻击者假定的寄存器不同的寄存器。

Due to the much higher cost of memory accesses compared to register accesses, compilers strive to map as many variables as possible to the available registers. Consequently, at any point in a large program, multiple registers are usually in use, or live at the same time. Given the control flow graph (CFG) of a compiled program, a register r is live at a program point p iff there is a path from p to a use of r that does not go through a definition of r . The live range of r is defined as the set of program points where r is live, and can be represented as a subgraph of the CFG [60]. Since the same register can hold different variables at different points in the program, a register can have multiple disjoint live regions in the same CFG.

由于与寄存器访问相比,存储器访问的成本要高得多,因此编译器努力将尽可能多的变量映射到可用寄存器。 因此,在大型程序中的任何时候,通常都在使用多个寄存器,或者同时使用多个寄存器。 给定已编译程序的控制流程图(CFG),如果存在从p到使用r的路径,而没有经过r的定义,则寄存器r驻留在程序点p上。 r的有效范围定义为r所在的程序点集,并且可以表示为CFG的子图[60]。 由于同一个寄存器可以在程序中的不同点保留不同的变量,因此一个寄存器可以在同一CFG中具有多个不相交的活动区域。

For each correctly identified function, we compute the live ranges of all registers used in its body by performing liveness analysis [52] directly on the machine code. Given the CFG of the function and the sets use[i] and def[i] for each instruction i, we derive the sets in[i] and out[i] with the registers that are live-in and live-out at each instruction. For this purpose, we use a modified version of a standard live-variable analysis algorithm [52, Fig. 9.16] that computes the in and out sets at the instruction level, instead of the basic block level. The algorithm computes the two sets by iteratively reaching a fixed point for the following data-flow equations: $\text{in}[i] = \text{use}[i] \cup (\text{out}[i] - \text{def}[i])$ and $\text{out}[i] = \bigcup_{s \in \text{succ}[i]} \text{in}[s]$, where succ[i] is the set of all possible successors of instruction i.

对于每个被正确识别的函数,我们直接在机器代码上执行活跃性分析[52],计算函数体中使用的所有寄存器的活跃范围。给定函数的CFG以及每条指令i的集合use[i]和def[i],我们推导出集合in[i]和out[i],分别包含在每条指令处入口活跃(live-in)和出口活跃(live-out)的寄存器。为此,我们使用标准活跃变量分析算法[52,图9.16]的修改版本,该版本在指令级别而不是基本块级别计算in和out集合。该算法通过迭代求解以下数据流方程的不动点来计算这两个集合:$\text{in}[i] = \text{use}[i] \cup (\text{out}[i] - \text{def}[i])$和$\text{out}[i] = \bigcup_{s \in \text{succ}[i]} \text{in}[s]$,其中succ[i]是指令i的所有可能后继的集合。
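The two data-flow equations can be iterated to a fixed point with a few lines of Python. This is a minimal sketch at the instruction level: the function name `liveness`, the list-based CFG encoding, and the three-instruction example are assumptions for illustration. Unlike the paper's conservative treatment, the sketch treats nothing as live after the last instruction.

```python
def liveness(use, define, succ):
    """Compute live-in and live-out register sets for each instruction.

    use[i], define[i]: sets of registers read / written by instruction i.
    succ[i]: list of indices of the possible successors of instruction i.
    """
    n = len(use)
    live_in = [set() for _ in range(n)]
    live_out = [set() for _ in range(n)]
    changed = True
    while changed:
        changed = False
        # Visiting instructions backwards usually converges in fewer passes.
        for i in reversed(range(n)):
            out_i = set().union(*(live_in[s] for s in succ[i]))
            in_i = use[i] | (out_i - define[i])
            if out_i != live_out[i] or in_i != live_in[i]:
                live_out[i], live_in[i] = out_i, in_i
                changed = True
    return live_in, live_out


# Hypothetical straight-line fragment:
#   0: mov edi, [ebp+8]    1: mov eax, [edi]    2: call eax
use = [{"ebp"}, {"edi"}, {"eax"}]
define = [{"edi"}, {"eax"}, set()]
succ = [[1], [2], []]
live_in, live_out = liveness(use, define, succ)
```

In this fragment edi is live only between instructions 0 and 1, and eax only between instructions 1 and 2, which is the kind of self-contained region the reassignment step looks for.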

[Figure 5 (fig-5.png)]

Figure 5 shows part of the CFG of a function and the corresponding live ranges for eax and edi. Initially, we assume that all registers are live, since some of them may hold values that have been set by the caller. In this example, edi is live when entering the function, and the push instruction at line 2 stores (uses) its current value on the stack. The following mov instruction initializes (defines) edi, ending its previous live range ($d_0$). Note that although a live range is a sub-graph of the CFG, we illustrate and refer to the different live ranges as linear regions for the sake of convenience.

图5显示了函数的CFG的一部分以及eax和edi的相应活跃范围。最初,我们假定所有寄存器都是活跃的,因为其中一些寄存器可能保存着调用者设置的值。在此示例中,edi在进入函数时是活跃的,第2行的push指令将其当前值存储到堆栈中(即使用它)。随后的mov指令初始化(定义)edi,从而结束其先前的活跃范围($d_0$)。请注意,尽管活跃范围是CFG的子图,但为方便起见,我们将不同的活跃范围画成线性区域并以此称呼。

The next definition of edi is at line 15, which means that the last use of its previous value at line 11 also ends its previous live region $d_1$. Region $d_1$ is a self-contained region, within which we can be confident that edi holds the same variable. The eax register also has a self-contained live region ($a_0$) that runs in parallel with $d_1$. Conceptually, the two live ranges can be extended to share the same boundaries. Therefore, the two registers can be swapped across all the instructions located within the boundaries of the two regions, without altering the semantics of the code.

edi的下一个定义在第15行,这意味着第11行对其先前值的最后一次使用也结束了它之前的活跃区域$d_1$。区域$d_1$是一个自包含的区域,在其中我们可以确信edi保存的是同一个变量。eax寄存器也有一个与$d_1$并行的自包含活跃区域($a_0$)。从概念上讲,这两个活跃范围可以扩展为共享相同的边界。因此,可以在位于这两个区域边界内的所有指令中交换这两个寄存器,而不会改变代码的语义。

The call eax instruction at line 12 can be conveniently used by an attacker for calling a library function or another gadget. By reassigning eax and edi across their parallel live regions, any ROP code that would depend on eax for transferring control to the next piece of code, will now jump to an incorrect memory location, and probably crash. For code fragments with just two parallel live regions, an attacker can guess the right register half of the time. In many cases though, there are three or more general purpose registers with parallel live regions, or other available registers that are live before or after another register’s live region, allowing for a higher number of possible register assignments.

攻击者可以方便地使用第12行的call eax指令来调用库函数或其他小工具。 通过在它们的并行活动区域中重新分配eax和edi,任何依赖eax将控制权转移到下一段代码的ROP代码现在都将跳转到错误的内存位置,并且可能崩溃。 对于只有两个并行活动区域的代码片段,攻击者可以一半时间猜测正确的寄存器。 但是,在许多情况下,存在三个或更多具有并行有效区域的通用寄存器,或者在另一个寄存器的有效区域之前或之后有效的其他可用寄存器,从而允许进行更多可能的寄存器分配。

The registers used in the original code can be reassigned by modifying the ModR/M and sometimes the SIB byte of the relevant instructions. As in previous code transformations, besides altering the operands of instructions in the existing code, these modifications can also affect overlapping instructions that may be part of non-intended gadgets. Note that implicitly used registers in certain instructions cannot be replaced. For example, the one-byte “move data from string to string” instruction ( movs ) always uses esi and edi as its source and destination operands, and there is no other one-byte instruction for achieving the same operation using a different set of registers [55]. Consequently, if such an instruction is part of the live region of one of its implicitly used registers, then this register cannot be reassigned throughout that region. For the same reason, we exclude esp from liveness analysis. Finally, although calling conventions are followed for most of the functions, this is not always the case, as compilers are free to use any custom calling convention for private or static functions. Most of these cases are conservatively covered through a bottom-up call analysis that discovers custom register arguments and return value registers.

原始代码中使用的寄存器可通过修改相关指令的ModR / M有时是SIB字节来重新分配。与以前的代码转换一样,除了更改现有代码中的指令操作数外,这些修改还可能影响重叠的指令,这些指令可能是非预期小工具的一部分。请注意,某些指令中隐式使用的寄存器无法替换。例如,一字节的“将数据从字符串移动到字符串”指令(movs)始终使用esi和edi作为其源操作数和目标操作数,并且没有其他的一字节指令来使用一组不同的参数来实现相同的操作。寄存器[55]。因此,如果该指令是其隐式使用的寄存器之一的活动区域的一部分,则无法在该区域中重新分配该寄存器。出于同样的原因,我们从活动性分析中排除了esp。最后,尽管大多数函数都遵循调用约定,但并非总是如此,因为编译器可以自由地将任何自定义调用约定用于私有或静态函数。通过自底向上的调用分析可以保守地涵盖大多数情况,这些分析会发现自定义寄存器参数和返回值寄存器。
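As an illustration of reassignment at the byte level, the sketch below rewrites the two register fields of a register-direct ModR/M byte. The function name and the mapping format are hypothetical. It only applies to instructions whose reg field really names a register (the `/r` forms in the Intel manual); for opcodes that use that field as an opcode extension, such as FF /2 for an indirect call, only the r/m field may be touched. Memory operands and SIB bytes are left out.

```python
# Register numbers used in the ModR/M reg and r/m fields for 32-bit operands.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def reassign_modrm(modrm: int, mapping: dict) -> int:
    """Apply a register renaming to a register-direct (mod = 11) ModR/M byte."""
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("memory operands (and SIB bytes) are not handled here")
    return (mod << 6) | (mapping.get(reg, reg) << 3) | mapping.get(rm, rm)


# Swap eax and edi, as in the running example of the text.
swap = {REG32.index("eax"): REG32.index("edi"),
        REG32.index("edi"): REG32.index("eax")}

# 8B C7 is mov eax, edi; after the swap it becomes 8B F8, i.e. mov edi, eax.
new_modrm = reassign_modrm(0xC7, swap)
```

Applying the same mapping twice restores the original byte, so the renaming is consistent as long as it is applied to every instruction inside the two live regions.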

First, all the external function definitions found in the import table of the DLL are marked as level-0 functions. IDA Pro can effectively distinguish between different calling conventions that these external functions may follow, and reports their declaration in the C language. Thus, in most cases, the register arguments and the return value register (if any) for each of the level-0 functions are known. For any call instruction to a level-0 function, its register arguments are added to call ’s set of implicitly read registers, and its return value registers are added to call ’s set of implicitly written registers.

首先,在DLL的导入表中找到的所有外部函数定义都标记为0级函数。 IDA Pro可以有效地区分这些外部函数可能遵循的不同调用约定,并以C语言报告其声明。 因此,在大多数情况下,每个0级功能的寄存器参数和返回值寄存器(如果有)都是已知的。 对于任何对第0级功能的调用指令,其寄存器参数将添加到调用的隐式读取寄存器集中,其返回值寄存器将添加到调用的隐式写入寄存器集中。

In the next phase, level-1 functions are identified as the set of functions that call only level-0 functions or no other function. Any registers read by a level-1 function, without prior writing them, are marked as its register arguments. Similarly, any registers written and not read before a return instruction are marked as return value registers. Again, the sets of implicitly read and written registers of all the call instructions to level-1 functions are updated accordingly. Similarly, level-2 functions are the ones that call level-1 or level-0 functions, or no other function, and so on. The same process is repeated until no more function levels can be computed. The intuition behind this approach is that private functions, which may use non-standard calling conventions, are called by other functions in the same DLL and, in most cases, not through computed call instructions.

在下一阶段,将级别1功能标识为仅调用级别0功能或不调用其他功能的功能集。 一级函数读取的所有寄存器(未事先写入它们)都标记为其寄存器参数。 同样,任何在返回指令之前写入和未读取的寄存器都被标记为返回值寄存器。 同样,对一级功能的所有调用指令的隐式读写寄存器组也相应地更新。 同样,第2级功能是调用第1级或第0级功能的功能,或者没有其他功能,依此类推。 重复相同的过程,直到无法再计算功能级别为止。 这种方法背后的直觉是,可以使用非标准调用约定的私有函数由同一DLL中的其他函数调用,并且在大多数情况下,不是通过计算的调用指令来调用的。
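The level assignment described above amounts to a bottom-up traversal of the call graph. The sketch below shows only that part; the function and variable names are hypothetical, the call graph is a toy example, and the per-level derivation of register arguments and return value registers is omitted.

```python
def function_levels(calls, imports):
    """Assign a level to every function whose callees all have lower levels.

    calls: dict mapping each local function to the functions it calls directly.
    imports: externally defined functions, which form level 0.
    """
    level = {f: 0 for f in imports}
    pending = set(calls) - set(imports)
    current = 1
    while pending:
        # A function is ready once every callee already has a level.
        ready = {f for f in pending if all(c in level for c in calls[f])}
        if not ready:
            break  # no more levels can be computed (e.g. recursion)
        for f in ready:
            level[f] = current
        pending -= ready
        current += 1
    return level


calls = {
    "leaf": [],
    "helper": ["memcpy"],
    "mid": ["helper", "leaf"],
    "top": ["mid", "memcpy"],
    "rec": ["rec"],
}
levels = function_levels(calls, imports={"memcpy"})
```

In this toy graph `leaf` and `helper` land on level 1, `mid` on level 2, and `top` on level 3, while the self-recursive `rec` never receives a level and would have to be handled conservatively.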

V. EXPERIMENTAL EVALUATION

V.实验评估

A. Randomization Analysis

A.随机化分析

  1. Coverage: A crucial aspect for the effectiveness of in-place code randomization is the randomization coverage in terms of what percentage of the gadgets found in an executable can be safely randomized. A gadget may remain intact for one of the following reasons: i) it is part of data embedded in a code segment, ii) it is part of code that could not be disassembled, or iii) it is not affected by any of our transformations. In this section, we explore the randomization coverage of our prototype implementation using a large data set of 5,235 PE files (both DLL and EXE), detailed in Table I.

1)覆盖率:就地代码随机化的有效性而言,至关重要的一个方面是随机覆盖率,即可执行文件中发现的小工具所占的百分比可以安全地随机化。 小工具可能由于以下原因之一保持完整:i)它是嵌入在代码段中的数据的一部分,ii)它是无法反汇编的代码的一部分,或iii)它不受我们任何转换的影响 。 在本节中,我们使用5235个PE文件(DLL和EXE)的大数据集探索原型实现的随机性范围,详细信息如表I所示。

We consider as a gadget [2] any intended or unintended instruction sequence that ends with an indirect control transfer instruction, and which does not contain i) a privileged or invalid instruction (can occur in non-intended instruction sequences), and ii) a control transfer instruction other than its final one, with the exception of indirect call (can be used in the middle of a gadget for calling a library function). We assume a maximum gadget length of five instructions, which is typical for existing ROP code implementations [2], [33]. For larger gadgets, it is possible that the modified part of the gadget may be irrelevant for the purpose of the attacker. For example, if only the first instruction of the gadget inc eax; pop ebx; ret; is randomized, this will not affect any ROP code that either does not rely on the value of eax at that point, or uses the shorter gadget pop ebx; ret; directly. For this reason, we consider all different subsequences with length between two and five instructions as separate gadgets.

我们将以间接控制转移指令结尾的任何预期或非预期指令序列视为小工具[2],前提是该序列不包含i)特权指令或无效指令(可能出现在非预期的指令序列中),以及ii)除最后一条指令之外的控制转移指令,但间接call除外(可以在小工具中间用于调用库函数)。我们假设小工具的最大长度为五条指令,这对于现有的ROP代码实现是典型的[2],[33]。对于较长的小工具,被修改的部分可能与攻击者的目的无关。例如,如果小工具inc eax; pop ebx; ret;中只有第一条指令被随机化,那么对于不依赖此时eax值的ROP代码,或直接使用较短小工具pop ebx; ret;的ROP代码,都不会产生任何影响。因此,我们将长度在二至五条指令之间的所有不同子序列视为单独的小工具。
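The counting rule for gadgets can be sketched as follows. The sketch assumes that the "subsequences" in question are the suffixes that end at the same final control transfer instruction, which is one reading of the text rather than something it states explicitly; the function name and defaults are hypothetical.

```python
def sub_gadgets(insns, min_len=2, max_len=5):
    """Return every suffix of `insns` with between min_len and max_len instructions.

    `insns` is a decoded instruction sequence whose last element is the
    indirect control transfer that terminates the gadget.
    """
    longest = min(max_len, len(insns))
    return [tuple(insns[-n:]) for n in range(min_len, longest + 1)]


gadgets = sub_gadgets(["inc eax", "pop ebx", "ret"])
```

For the example from the text this yields two separate gadgets, `pop ebx; ret` and `inc eax; pop ebx; ret`, so randomizing only `inc eax` affects the second one but not the first.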

[Figure 6 (fig-6.png)]

[Figure 7 (fig-7.png)]

Figure 6 shows the percentage of modifiable gadgets out of all gadgets found in the executable sections of each PE file (solid line), as a cumulative fraction of all PE files in the data set. In about 85% of the PE files, more than 70% of the gadgets can be randomized by our code transformations. Many of the unmodified gadgets are located in parts of code that have not been extracted by IDA Pro, and which consequently will never be affected by our transformations. When considering only the gadgets that are contained within the disassembled code regions on which code randomization can be applied, the percentage of affected gadgets slightly increases (dashed line). Given that we do not take into account code blocks that have been identified by IDA Pro using speculative methods, this shows that the use of a more sophisticated code extraction mechanism will increase the number of gadgets that can be modified. Figure 7 shows the total percentage of gadgets modified by each code transformation technique for the same data set. Note that a gadget can be modified by more than one technique. Overall, the total percentage of modifiable gadgets across all PE files is about 76.9%, as shown in Table I.

图6显示了在每个PE文件的可执行部分(实线)中找到的所有小工具中可修改小工具的百分比,作为数据集中所有PE文件的累积分数。在大约85%的PE文件中,可以通过我们的代码转换将超过70%的小工具随机化。许多未修改的小工具位于IDA Pro尚未提取的代码部分中,因此,它们永远不会受到我们的转换的影响。当仅考虑可在其上应用代码随机化的反汇编代码区域中包含的小工具时,受影响的小工具的百分比会略有增加(虚线)。鉴于我们没有考虑IDA Pro使用推测性方法识别的代码块,这表明使用更复杂的代码提取机制将增加可修改的小工具的数量。图7显示了每种代码转换技术对同一数据集修改的小工具的总百分比。请注意,可以通过多种技术修改小工具。总体而言,所有PE文件中可修改小工具的总百分比约为76.9%,如表I所示。

[Table I (table-1.png)]

  2. Impact: We identify two qualitatively different ways in which a code transformation can impact a gadget. As discussed in Sec. IV-A, a gadget can be eliminated if any of the applied transformations completely removes its final control transfer instruction. If the final control transfer instruction remains intact, a gadget can then be broken if at least one of its internal instructions is altered, and the CPU and memory state after its execution is different than the original, i.e., the outcome of its computation is not the same. As shown in Table I, in the average case, about 9.5% of all gadgets contained in a PE file can be rendered completely unusable. For a vulnerable application, this already removes about one in ten of the available gadgets for the construction of ROP code. Although the rest of the modifiable gadgets (67.4%) are not eliminated, they can be “broken” by probabilistically modifying one or more of their instructions.

2)影响:我们确定了代码转换影响小工具的两种性质不同的方式。如第IV-A节所述,如果所应用的任何转换完全移除了小工具最终的控制转移指令,则该小工具可以被消除(eliminated)。如果最终的控制转移指令保持不变,但至少有一条内部指令被更改,并且执行后的CPU和内存状态与原始状态不同,即其计算结果不同,则该小工具可以被破坏(broken)。如表I所示,平均而言,PE文件中包含的所有小工具中约有9.5%可以变得完全不可用。对于易受攻击的应用程序,这已经去掉了可用于构建ROP代码的小工具中的约十分之一。其余的可修改小工具(67.4%)虽然没有被消除,但可以通过概率性地修改其中一条或多条指令而被“破坏”。

In case some of the instructions in a broken gadget can never be altered, it is quite possible that part of its functionality will remain unaffected, and thus an attacker could still use it by relying only on its unmodifiable instructions. Especially for larger gadget sizes, if the possible modifications are clustered only around a certain part of the gadget, e.g., its first instructions, then an attacker could predictably rely on the rest of the gadget. We explore this issue by measuring the number of broken gadgets in which an instruction at a given position can be altered.

万一损坏的小工具中的某些指令无法更改,很有可能其部分功能将保持不受影响,因此攻击者仍可以仅依靠其不可修改的指令来使用它。 特别是对于较大的小工具,如果可能的修改仅集中在小工具的特定部分(例如其第一条指令)周围,则攻击者可以预见地依赖小工具的其余部分。 我们通过测量可更改给定位置的指令的损坏小工具的数量来探索此问题。

[Figure 8 (fig-8.png)]

Figure 8 shows the impact of code randomization on a broken gadget’s instructions, according to their location within the gadget. Each group of bars corresponds to a different gadget length, and in each group, the leftmost bar corresponds to the leftmost instruction of the gadget. For all sizes, the probability that an instruction at a given position will be affected is quite evenly distributed and remains above 60%, with the exception of the final (control transfer) instruction. This is expected, since most of the transformations cannot affect the final instruction of intended gadgets (e.g., ret). As we observe, the locations of the modified instructions in broken gadgets are almost equally unpredictable.

图8显示了代码随机化对损坏的小工具说明的影响,具体取决于它们在小工具中的位置。 每组条形对应于不同的小工具长度,并且在每组中,最左边的条形对应于小工具的最左指令。 对于所有大小,给定位置的一条指令将受到影响的概率分布得相当均匀,并且除最终(控制转移)指令外,其保持在60%以上。 这是预料之中的,因为大多数转换都不会影响预期的小工具的最终指令(例如ret)。 正如我们所观察到的,修改后的指令在损坏的小工具中的位置几乎同样不可预测。

  3. Entropy: Some of the code transformations can perturb a given instruction within a gadget only in a limited number of ways, while others can generate a larger number of permutations. For example, for instructions with only two equivalent forms, atomic instruction substitution can modify a particular location in a gadget only in one way, allowing for two possible states. On the other hand, intra basic block instruction reordering usually results in a large number of possible permutations, especially for larger basic blocks that contain many instructions with no interdependencies.

3)熵:某些代码转换只能以有限的几种方式干扰小工具中的给定指令,而其他一些则可以生成大量的排列。 例如,对于仅具有两种等效形式的指令,原子指令替换只能以一种方式修改小工具中的特定位置,并允许两种可能的状态。 另一方面,内部基本块指令的重新排序通常会导致大量可能的排列,特别是对于包含许多没有相互依赖性的指令的较大基本块。

Usually though, a broken gadget can be modified at multiple locations, and the same location can be altered in multiple ways by more than one code transformation. Consequently, the number of possible randomized states in which a broken gadget can exist, or its randomization entropy, corresponds to the product of the number of permutations that each of the different transformations can generate for that gadget. In the worst case, a broken gadget can exist in two possible states: its original form, or its alternative after modification. For example, there are only two possible orderings for the pop instructions in an intended gadget of the form pop reg; pop reg; ret; given that no other transformation can alter it.

不过,通常情况下,损坏的小工具可以在多个位置被修改,并且同一位置可以被多种代码转换以多种方式更改。因此,损坏的小工具可能存在的随机化状态的数量,即其随机化熵,等于各种转换能够为该小工具生成的排列数的乘积。在最坏的情况下,损坏的小工具只能以两种可能的状态存在:其原始形式,或修改后的替代形式。例如,在没有其他转换可以改变它的情况下,形如pop reg; pop reg; ret;的预期小工具中的pop指令只有两种可能的顺序。
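The product rule for the number of randomized states is easy to make concrete. In the sketch below the function name and the second example (a gadget touched by three transformations with 2, 6, and 2 variants) are hypothetical; expressing the result in bits is added only for convenience and is not a measure used in the text.

```python
from math import log2, prod


def randomization_states(variants_per_transformation):
    """Number of states of a broken gadget: the product of the per-transformation counts."""
    return prod(variants_per_transformation)


# Worst case from the text: pop reg; pop reg; ret, touched only by reordering.
worst_case = randomization_states([2])

# Hypothetical gadget: substitution (2 forms), reordering (6 orders), reassignment (2 choices).
combined = randomization_states([2, 6, 2])
combined_bits = log2(combined)
```

The worst case has two states, while the hypothetical combined case already has 24, which shows how quickly the state count grows once several transformations apply to the same gadget.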

[Figure 9 (fig-9.png)]

Figure 9 shows the number of possible randomized versions of each gadget (including its original), as a cumulative fraction of all broken gadgets. As seen in the lower left corner, a small fraction of the gadgets, about 12%, can be modified only in one way, and thus can exist in two possible states. However, the randomization entropy increases exponentially, and the upper 80% of the gadgets have four or more randomized states. As more of the different transformations are applied on the same gadget, the randomization entropy increases to thousands of possible modified states.

图9显示了每个小工具(包括其原始版本)的可能随机版本的数量,以所有损坏小工具的累积分数表示。 从左下角可以看到,约12%的一小部分小工具只能以一种方式进行修改,因此可以以两种可能的状态存在。 但是,随机化熵呈指数增长,并且上层80%的小工具具有四个或更多的随机状态。 随着更多不同的转换被应用于同一个小工具,随机化熵会增加到成千上万个可能的修改状态。

Although for a small number of gadgets an attacker can have a 50% chance of guessing the actual behavior of a gadget, ROP code relies on a chain of many different gadgets to achieve its purpose (11–18 unique gadgets in the exploits we tested). If even one of the gadgets behaves in an unexpected way, the ROP code will fail. Given that code randomization typically breaks (or even eliminates) several of the gadgets used in a ROP exploit, the number of possible randomized states that can prevent the correct execution of the ROP code is usually very high, as demonstrated in Sec. V-C.

尽管对于少量的小工具,攻击者有50%的机会猜中小工具的实际行为,但ROP代码依赖于由许多不同小工具组成的链来实现其目的(在我们测试的漏洞利用程序中,有11-18个独特的小工具)。 只要其中一个小工具的行为不符合预期,ROP代码就会失败。 鉴于代码随机化通常会破坏(甚至消除)ROP漏洞利用中使用的多个小工具,可以阻止ROP代码正确执行的可能随机状态的数量通常非常高,如第V-C节所示。

B. Correctness and Performance

B. 正确性和性能

One of the basic principles of our approach is that the different in-place code randomization techniques should be applied cautiously, without breaking the semantics of the program. A straightforward way to verify the correctness of our code transformations is to apply them on existing code and compare the outcome before and after modification. Simply running a randomized version of a third-party application and verifying that it behaves in the expected way can provide a first indication. However, using this approach, it is hard to exercise a significant part of the code, and potentially incorrect modifications may go unnoticed.

我们方法的基本原理之一是应谨慎应用不同的就地代码随机化技术,而不会破坏程序的语义。 验证我们代码转换正确性的一种直接方法是将它们应用于现有代码并比较修改前后的结果。 简单地运行随机版本的第三方应用程序并验证其是否以预期的方式运行可以提供第一个指示。 但是,使用这种方法很难执行大部分代码,并且可能会忽略潜在的错误修改。

For this purpose, we used the test suite of Wine [61], a compatibility layer that allows Windows applications to run on Unix-like operating systems. Wine provides alternative implementations of the DLLs that comprise the Windows API, and comes with an extensive test suite that covers the implementations of most functions exported by the core Windows DLLs. Each function is executed multiple times using various inputs that test different conditions, and the outcome of each execution is compared against a known, expected result. We ported the test code for about one third of the 109 DLLs included in the test suite of Wine v1.2.2, and used it directly on the actual DLLs from a Windows 7 installation. Using multiple randomized versions of each tested DLL, we verified that in all runs, all tests completed successfully.

为此,我们使用了Wine [61]的测试套件;Wine是一个兼容层,允许Windows应用程序在类Unix操作系统上运行。 Wine提供了构成Windows API的DLL的替代实现,并附带了一个覆盖面很广的测试套件,该套件涵盖了由核心Windows DLL导出的大多数函数的实现。 每个函数都会使用测试不同条件的各种输入执行多次,并将每次执行的结果与已知的预期结果进行比较。 我们移植了Wine v1.2.2测试套件中包含的109个DLL中约三分之一的测试代码,并将其直接用于Windows 7安装中的实际DLL。 使用每个被测DLL的多个随机化版本,我们验证了在所有运行中所有测试均成功完成。
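The testing idea (run each exported function on many inputs and compare against known expected results, for several randomized builds) can be sketched as below. Everything here is a hypothetical stand-in: the two `clamp_*` functions play the role of an original and a randomized build of one function, and the harness is not the Wine test suite.

```python
def run_conformance_tests(implementation, test_cases):
    """Return the (args, expected, actual) triples that did not match."""
    failures = []
    for args, expected in test_cases:
        actual = implementation(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


# Stand-in for the original build of an exported function.
def clamp_original(value, low, high):
    return max(low, min(value, high))


# Stand-in for a randomized build: operations reordered, same semantics.
def clamp_randomized(value, low, high):
    return min(high, max(value, low))


# Each entry pairs the call arguments with the known, expected result.
TEST_CASES = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]

for variant in (clamp_original, clamp_randomized):
    print(variant.__name__, run_conformance_tests(variant, TEST_CASES))
```

An empty failure list for every variant corresponds to the outcome reported in the text, where all tests completed successfully in all runs.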

We took advantage of the extensive and diverse code execution coverage of this experiment to also evaluate the impact of in-place code randomization on the runtime performance of the modified code. Among the different code transformations, instruction reordering is the only one that could potentially introduce some non-negligible overhead, given that sometimes the chosen ordering may be sub-optimal. We measured the overall CPU user time for the completion of all tests by taking the average time across multiple runs, using both the original and the randomized versions of the DLLs. In all cases, there was no observable difference in the two times, within measurement error.

我们利用了本实验广泛而多样的代码执行覆盖范围,还评估了就地代码随机化对修改后代码的运行时性能的影响。 在不同的代码转换中,指令重排序是唯一可能带来一些不可忽略的开销的转换,因为有时选择的顺序可能不是最优的。 我们分别使用原始版本和随机化版本的DLL,取多次运行的平均时间来测量完成所有测试所需的总体CPU用户时间。 在所有情况下,在测量误差范围内,两个时间之间都没有可观察到的差异。
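One simple way to make "no observable difference within measurement error" concrete is to compare the gap between the two averages with the run-to-run spread. The sketch below assumes a threshold of `k` sample standard deviations, which is our own illustrative choice; the paper does not say how measurement error was quantified, and the timings are made up.

```python
from statistics import mean, stdev


def differs_beyond_noise(baseline_times, randomized_times, k=2.0):
    """True if the gap between the means exceeds k times the larger stdev."""
    gap = abs(mean(baseline_times) - mean(randomized_times))
    noise = max(stdev(baseline_times), stdev(randomized_times))
    return gap > k * noise


# Hypothetical CPU user times (seconds) over four runs of the test suite.
original_runs = [10.0, 10.2, 9.9, 10.1]
randomized_runs = [10.1, 10.0, 10.2, 9.9]

print(differs_beyond_noise(original_runs, randomized_runs))  # False
```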

C. Effectiveness

C. 有效性

1) ROP Exploits: We evaluated the effectiveness of in-place code randomization using publicly available ROP exploits against vulnerable Windows applications [53], [62], [63], as well as generic ROP payloads based on commonly used DLLs [64], [65]. These seven different ROP code implementations, listed in Table II, bypass Windows DEP and execute a second-stage shellcode, as described in Sec. II, and work even in the latest version of Windows, with DEP and ASLR enabled. The ROP code used in the three exploits is implemented with gadgets from one or a few DLLs that do not support ASLR, as shown in the second column of Table II. The number of unique gadgets used in each case varies between 10–18, and typically a large part of the gadgets is repeatedly executed at many points throughout the ROP code. When replacing the original non-ASLR DLLs of each application with randomized versions, in all cases the exploits were rendered unsuccessful. Similarly, we used a custom application to test the generic ROP payloads and verified that the ROP code did not succeed when the corresponding DLL was randomized.

1)ROP漏洞利用:我们使用针对易受攻击的Windows应用程序的公开ROP漏洞利用程序[53],[62],[63],以及基于常用DLL的通用ROP有效负载[64],[65],评估了就地代码随机化的有效性。 表II中列出的这七个不同的ROP代码实现绕过Windows DEP并执行第二阶段的shellcode(如第II节所述),并且即使在启用了DEP和ASLR的最新版本Windows中也可以工作。 如表II的第二列所示,这三个漏洞利用中的ROP代码是通过一个或几个不支持ASLR的DLL中的小工具实现的。 每种情况下使用的唯一小工具的数量在10到18之间,并且通常大部分小工具会在整个ROP代码的许多位置重复执行。 当用随机化版本替换每个应用程序的原始非ASLR DLL时,所有漏洞利用均未成功。 同样,我们使用一个自定义应用程序来测试通用ROP有效负载,并验证了在相应DLL被随机化后ROP代码无法成功。

The ROP code of the exploit against Acrobat Reader uses just 11 unique gadgets, all coming from a single non-ASLR DLL (icucnv36.dll). From these gadgets, in-place code randomization can alter six of them: one gadget is completely eliminated, while the other five broken gadgets have 2, 2, 3, 4, and 6 possible states, respectively, resulting in a total of 287 randomized states (in addition to the always eliminated gadget, which also alone breaks the ROP code). Even if we assume that no elimination were possible, the exploit would still succeed only in one out of the 288 (0.35%) possible instances (including the original) of the given gadget set. Considering that this is a client-side exploit, in which the attacker will probably have only one or a few opportunities for tricking the user to open the malicious PDF file, the achieved randomization entropy is quite high, always assuming that none of the gadgets could have been eliminated. As shown in Table II, the number of possible randomized states in the rest of the cases is several orders of magnitude higher. This is mostly due to the larger number of broken gadgets, as well as due to a few broken gadgets with tens of possible modified states, which both increase the number of states exponentially.

针对Acrobat Reader的漏洞利用的ROP代码仅使用11个独特的小工具,全部来自单个非ASLR DLL(icucnv36.dll)。 在这些小工具中,就地代码随机化可以更改其中的六个:一个小工具被完全消除,而其他五个损坏的小工具分别具有2、2、3、4和6个可能的状态,总共产生287个随机状态(此外还有那个总是被消除的小工具,仅它本身就足以破坏ROP代码)。 即使我们假定无法消除任何小工具,该漏洞利用也只能在给定小工具集的288个可能实例(包括原始实例)中的一个(0.35%)上成功。 考虑到这是一种客户端攻击,攻击者可能只有一次或几次机会诱骗用户打开恶意PDF文件,因此所实现的随机化熵相当高(始终假设没有任何小工具能被消除)。 如表II所示,其余情况下可能的随机状态数要高几个数量级。 这主要是由于损坏的小工具数量更多,以及少数损坏的小工具具有数十个可能的修改状态,这两者都会使状态数呈指数增长。
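The figures quoted for the Acrobat Reader exploit follow from multiplying the per-gadget state counts. The sketch below only reproduces that arithmetic; the function name is ours, and it assumes, as the text does for this estimate, that every instance is equally likely and that the eliminated gadget is ignored.

```python
from fractions import Fraction
from math import prod


def chain_statistics(states_per_broken_gadget):
    """Return (total instances, randomized instances, chance of the original)."""
    total = prod(states_per_broken_gadget)  # includes the original form
    return total, total - 1, Fraction(1, total)


# State counts of the five broken gadgets reported in the text.
total, randomized, chance = chain_statistics([2, 2, 3, 4, 6])

print(total)                    # 288
print(randomized)               # 287
print(f"{float(chance):.2%}")   # 0.35%
```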

[Table II: image not available (table-2.png)]

Next, we explored whether the affected gadgets could be directly replaced with unmodifiable gadgets in order to reliably circumvent our technique. Out of the six affected gadgets in the Adobe Reader exploit, only four can be directly replaced, meaning that the exploit cannot be trivially modified to bypass randomization. Furthermore, two of the gadgets have only one replacement each, and both replacements are found in code regions that are not discovered by IDA Pro—both could be randomized using a more precise code extraction method. For the rest of the ROP payloads, there are at least three irreplaceable gadgets in each case.

接下来,我们探究了是否可以用无法修改的小工具直接替换受影响的小工具,以便可靠地规避我们的技术。 在Adobe Reader漏洞利用程序中的六个受影响的小工具中,只能直接替换四个,这意味着无法轻易修改漏洞利用程序以绕过随机化。 此外,两个小工具每个只有一个替换,并且两个替换都在IDA Pro未发现的代码区域中找到-都可以使用更精确的代码提取方法将它们随机化。 对于其余的ROP有效负载,每种情况下至少有三个不可替代的小工具。
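The replacement check can be pictured as a lookup of each affected gadget among the gadgets that randomization cannot touch. The sketch below is only an approximation under a strong assumption: two gadgets count as interchangeable when their instruction sequences are textually identical, whereas the paper does not spell out its equivalence criterion. All addresses and gadget bodies are made up.

```python
def find_direct_replacements(affected, unmodifiable):
    """Map each affected gadget address to identical unmodifiable gadgets."""
    by_body = {}
    for address, body in unmodifiable.items():
        by_body.setdefault(body, []).append(address)
    return {address: sorted(by_body.get(body, []))
            for address, body in affected.items()}


# Hypothetical gadget listings: address -> instruction sequence.
affected = {
    0x1000: ("pop eax", "ret"),
    0x1010: ("xchg eax, esp", "ret"),
}
unmodifiable = {
    0x2000: ("pop eax", "ret"),
    0x2040: ("pop eax", "ret"),
    0x2080: ("add esp, 8", "ret"),
}

replacements = find_direct_replacements(affected, unmodifiable)
irreplaceable = [hex(addr) for addr, found in replacements.items() if not found]
print(irreplaceable)  # ['0x1010']
```

A payload is only trivially repairable if the irreplaceable list is empty, which was not the case for any of the tested payloads.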

We should note that the relatively small number of gadgets used in most of these ROP payloads is a worst-case scenario for our technique, which however not only is able to prevent these exploits, but also does not allow the attacker to directly replace all the affected gadgets. Indeed, besides the more complex ROP payloads used in the Integard and Mplayer exploits, the rest of the payloads use API functions that are already imported by a non-ASLR DLL, and simply call them directly using hard-coded addresses. This type of API invocation is much simpler and requires fewer gadgets [26] compared to ROP code like the one used in the Integard and Mplayer exploits (16 and 18 unique gadgets, respectively), which first dynamically locates a pointer to kernel32.dll (always ASLR-enabled in Windows 7) and then gets a handle to VirtualProtect.

我们应该注意的是,大多数这些ROP有效负载中使用的小工具数量相对较少,这对我们的技术来说是最坏的情况;尽管如此,我们的技术不仅能够防止这些漏洞利用,而且不允许攻击者直接替换所有受影响的小工具。 实际上,除了Integard和Mplayer漏洞利用中使用的更复杂的ROP有效负载外,其余有效负载都使用非ASLR DLL已导入的API函数,并使用硬编码地址直接调用它们。 与Integard和Mplayer漏洞利用中使用的那类ROP代码(分别使用16和18个唯一的小工具)相比,这种类型的API调用要简单得多,需要的小工具也更少[26];前者首先动态定位指向kernel32.dll的指针(在Windows 7中始终启用ASLR),然后获取VirtualProtect的句柄。

2) Automated ROP Payload Generation: The fact that some of the randomized gadgets are not directly replaceable does not necessarily mean that the same outcome cannot be achieved using solely unmodifiable gadgets. To assess whether an attacker could construct a ROP payload resistant to in-place code randomization based on gadgets that cannot be randomized, we used Q [26] and Mona [27], two automated ROP code construction tools.

2)自动生成ROP有效负载:某些随机小工具不能直接替换的事实并不一定意味着仅使用不可修改的小工具无法实现相同的结果。 为了评估攻击者是否可以基于无法随机化的小工具构建抵御就地代码随机化的ROP有效负载,我们使用了Q [26]和Mona [27]这两种自动化的ROP代码构建工具。

Q is a general-purpose ROP compiler that uses semantic program verification techniques to identify the functionality of gadgets, and provides a custom language, named QooL, for writing input programs. Its current implementation only supports simple QooL programs that call a single function or system call, while passing a single custom argument. In case the function to be called belongs to an ASLR-enabled DLL, Q can compute a handle to it through the import table of a non-ASLR DLL [12], when applicable. We should note that although Q currently compiles only basic QooL programs that call a single API function, this does not limit our evaluation, but on the contrary, stresses our technique even more. The simpler the programs, the fewer the gadgets used, which makes it easier for Q to generate ROP code even when our technique limits the number of available gadgets.

Q是通用的ROP编译器,它使用语义程序验证技术来识别小工具的功能,并提供一种名为QooL的自定义语言来编写输入程序。 它的当前实现仅支持简单的QooL程序,这些程序调用单个函数或系统调用,同时传递单个自定义参数。 如果要调用的函数属于启用了ASLR的DLL,则在适用时,Q可以通过非ASLR DLL的导入表计算该函数的句柄[12]。 我们应该注意,尽管Q目前仅编译调用单个API函数的基本QooL程序,但这并不限制我们的评估,相反,它对我们的技术提出了更严格的考验。 程序越简单,使用的小工具就越少,这使得即使我们的技术限制了可用小工具的数量,Q也更容易生成ROP代码。

Mona is a plug-in for Immunity Debugger [66] that automates the process of building Windows ROP payloads for bypassing DEP. Given a set of non-ASLR DLLs, Mona searches for available gadgets, categorizes them according to their functionality, and then attempts to automatically generate four alternative ROP payloads for giving execute permission to the embedded shellcode and then invoking it, based on the VirtualProtect, VirtualAlloc, NtSetInformationProcess, and SetProcessDEPPolicy API functions (the latter two are not supported in Windows 7).

Mona是Immunity Debugger [66]的插件,可自动完成构建用于绕过DEP的Windows ROP有效负载的过程。 给定一组非ASLR DLL,Mona会搜索可用的小工具,根据其功能对其进行分类,然后尝试基于VirtualProtect、VirtualAlloc、NtSetInformationProcess和SetProcessDEPPolicy API函数(Windows 7不支持后两个函数)自动生成四个替代ROP有效负载,用于为嵌入的shellcode赋予执行权限并随后调用它。

Considering the functionality of the ROP payloads generated by the two tools, Mona generates slightly more complex payloads, but its gadget composition engine is less sophisticated compared to Q’s. Q generates payloads that compute a function address, construct its single argument, and call it. Payloads generated by Mona also call a single memory allocation API function (which though requires the construction of several arguments), copy the shellcode to the newly allocated area, and transfer control to it. Note that the complexity of the ROP code used in the tested exploits is even higher, since they rely on up to four different API functions [53], or “walk up” the stack to discover pointers to non-imported functions from ASLR-enabled DLLs [62], [63].

考虑到这两个工具生成的ROP有效负载的功能,Mona生成的有效负载稍微复杂一些,但与Q相比,其小工具组合引擎不那么复杂。 Q生成的有效负载会计算函数地址,构造其单个参数并进行调用。 由Mona生成的有效负载同样调用单个内存分配API函数(尽管它需要构造多个参数),将shellcode复制到新分配的区域,然后将控制权转移到该区域。 请注意,经过测试的漏洞利用程序中使用的ROP代码的复杂度甚至更高,因为它们依赖于多达四个不同的API函数[53],或者沿堆栈“向上遍历”以发现指向启用了ASLR的DLL中未导入函数的指针[62],[63]。

[Table III: image not available (table-3.png)]

Table III shows the results of running Q and Mona on the same set of applications and DLLs used in the previous section (for applications, all non-ASLR DLLs are analyzed collectively), for two different cases: when all gadgets are available to the ROP compiler, and when only the non-randomized gadgets are available. The second case aims to build a payload that will be functional even when code randomization is applied. Although both Q and Mona were able to create payloads when applied on the original DLLs in almost all cases, they failed to construct any payload using only non-randomized gadgets in all cases.

表III显示了在上一节中使用的同一组应用程序和DLL上运行Q和Mona的结果(对于应用程序,所有非ASLR DLL一并分析),分为两种不同情况:所有小工具都可供ROP编译器使用时,以及只有未随机化的小工具可用时。 第二种情况旨在构建即使在应用代码随机化后仍能起作用的有效负载。 尽管在几乎所有情况下,Q和Mona在应用于原始DLL时都能创建有效负载,但在所有情况下,它们都无法仅使用未随机化的小工具构造出任何有效负载。

Although our technique was able to prevent two different tools from automatically constructing reliable ROP code, this favorable outcome does not exclude the possibility that a functional payload could still be constructed based solely on non-randomized gadgets, e.g., in a manual way or using an even more sophisticated ROP compiler. However, it clearly demonstrates that in-place code randomization significantly raises the bar for attackers, and makes the construction of reliable ROP code much harder, even in an automated way.

尽管我们的技术能够防止两个不同的工具自动构建可靠的ROP代码,但这种良好的结果并不排除仍然可以仅基于未随机化的小工具构造出可用有效负载的可能性(例如,以手动方式或使用更复杂的ROP编译器)。 但是,它清楚地表明,就地代码随机化显著提高了攻击者的门槛,并使可靠的ROP代码的构建变得更加困难,即使采用自动化方式也是如此。

This is reflected in the reduction in the number of available (non-randomized) gadgets after code randomization. Both tools operate in two phases: gadget discovery and code compilation. During the first phase, they search for useful gadgets and categorize them according to their functionality. Tables IV and V show the number of useful gadgets as reported by Q and Mona, respectively, that are available before and after randomization. As shown by the percentage of the remaining gadgets (last column), many gadget types have very few available gadgets or are eliminated completely, which makes the construction of reliable ROP code much harder.

这反映在代码随机化之后,可用(非随机)小工具数量的减少。 两种工具都分两个阶段运行:小工具发现和代码编译。 在第一阶段,他们搜索有用的小工具,并根据其功能对其进行分类。 表IV和表V分别显示了由Q和Mona报告的在随机化之前和之后可用的有用小工具的数量。 如剩余小工具的百分比所示(最后一列),许多小工具类型具有很少的可用小工具或被完全淘汰,这使可靠的ROP代码的构建变得更加困难。
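The "remaining gadgets" column described above is a per-type ratio of the gadget counts after and before randomization. The sketch below shows that computation on made-up numbers; the gadget type names and counts are hypothetical and are not the values from Tables IV and V.

```python
def remaining_percentage(before, after):
    """Percentage of gadgets of each type still available after randomization."""
    report = {}
    for gadget_type, count_before in before.items():
        count_after = after.get(gadget_type, 0)  # missing type: none survive
        report[gadget_type] = (100.0 * count_after / count_before
                               if count_before else 0.0)
    return report


# Hypothetical gadget counts per type, before and after randomization.
before = {"load_const": 40, "move_reg": 25, "store_mem": 8}
after = {"load_const": 6, "move_reg": 2}

for gadget_type, pct in remaining_percentage(before, after).items():
    print(f"{gadget_type}: {pct:.1f}% remaining")
```

A type that drops to 0% cannot be used at all, so a compiler that needs it has no fallback among the non-randomized gadgets.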

VI. DISCUSSION

VI. 讨论

In-place code randomization may not always randomize a significant part of the executable address space, and it is hard to give a definitive answer on whether the remaining unmodifiable gadgets would be sufficient for constructing useful ROP code. This depends on the code in the non-ASLR address space of the particular vulnerable process, as well as on the actual operations that need to be achieved using ROP code. Note that Turing-completeness is irrelevant for practical exploitation [26], and none of the gadget sets used in the tested ROP payloads is Turing-complete. For this reason, we emphasize that in-place code randomization should be used as a mitigation technique, in the same fashion as application armoring tools like EMET [51], and not as a complete prevention solution.

就地代码随机化可能并不总是能将可执行地址空间的大部分随机化,并且很难就剩余的不可修改的小工具是否足以构造有用的ROP代码给出明确的答案。 这取决于特定易受攻击进程的非ASLR地址空间中的代码,以及需要使用ROP代码实现的实际操作。 注意,图灵完备性与实际漏洞利用无关[26],并且在测试的ROP有效负载中使用的小工具集都不是图灵完备的。 因此,我们强调就地代码随机化应作为缓解技术,以与EMET [51]等应用程序防护工具相同的方式使用,而不应作为完整的预防解决方案。

As previous studies [2], [5], [26] have shown, though, the feasibility of building a ROP payload is proportional to the size of the non-ASLR code base, and inversely proportional to the complexity of the desired functionality. Our experimental evaluation shows that in all cases, the space of the remaining useful gadgets after randomization is sufficiently small to prevent the automated generation of a ROP payload. At the same time, the tested ROP payloads are far from the complexity of a full-blown ROP-based implementation of the operations required for carrying out an attack, such as dumping a malicious executable on disk and executing it. Currently, this functionality is handled by the embedded shellcode, which in essence allows us to view these ROP payloads as sophisticated versions of return-to-libc. We should stress that the randomization coverage of our prototype implementation is a lower bound for what would be possible using a more sophisticated code extraction method [41], [49]. In our future work, we also plan to relax some of the conservative assumptions that we have made in instruction reordering and register reassignment, using data flow analysis based on constant propagation.

但是,如先前的研究[2],[5],[26]所示,构建ROP有效负载的可行性与非ASLR代码库的大小成正比,而与所需功能的复杂度成反比。 我们的实验评估表明,在所有情况下,随机化后剩余的有用小工具的空间都足够小,可以防止自动生成ROP有效负载。 同时,经过测试的ROP有效负载的复杂度远低于完全基于ROP实现攻击所需操作(例如,将恶意可执行文件转储到磁盘上并执行它)的复杂度。 当前,此功能由嵌入的shellcode处理,从本质上讲,这使我们可以将这些ROP有效负载视为return-to-libc的复杂版本。 我们应该强调,原型实现的随机化覆盖率是使用更复杂的代码提取方法[41],[49]所能达到的覆盖率的下限。 在未来的工作中,我们还计划使用基于常量传播的数据流分析,放宽我们在指令重排序和寄存器重分配中所做的一些保守假设。

Given its practically zero overhead and direct applicability on third-party executables, in-place code randomization can be readily combined with existing techniques to improve diversity and reduce overheads. For instance, compiler-level techniques against ROP attacks [15], [16] increase significantly the size of the generated code, and also affect the runtime overhead. Incorporating code randomization for eliminating some of the gadgets could offer savings in code expansion and runtime overheads. Our technique is also applicable in conjunction with randomization methods based on code block reordering [17]–[19], to further increase randomization entropy. In-place code randomization at the binary level is not applicable for software that performs self-checksumming or other runtime code integrity checks. Although not encountered in the tested applications, some third-party programs may use such checks for hindering reverse engineering. Similarly, packed executables cannot be modified directly. However, in most third-party applications, only the setup executable used for software distribution is packed, and after installation all extracted PE files are available for randomization.

鉴于其实际的开销为零,并且直接适用于第三方可执行文件,因此就地代码随机化可以轻松地与现有技术结合使用,以改善多样性并减少开销。例如,针对ROP攻击的编译器级技术[15],[16]显着增加了所生成代码的大小,并且还影响了运行时开销。合并代码随机化以消除某些小工具可以节省代码扩展和运行时开销。我们的技术还可与基于代码块重排序的随机化方法结合使用[17] – [19],以进一步增加随机化熵。二进制级别的就地代码随机化不适用于执行自校验和或其他运行时代码完整性检查的软件。尽管在经过测试的应用程序中未遇到,但是某些第三方程序可能会使用此类检查来阻碍反向工程。同样,打包的可执行文件不能直接修改。但是,在大多数第三方应用程序中,仅打包用于软件分发的安装可执行文件,并且在安装后,所有提取的PE文件都可用于随机化。

VII. RELATED WORK

VII. 相关工作

Almost a decade after the introduction of the return-to-libc technique [28], the wide adoption of non-executable memory page protections in popular OSes sparked a new interest in more advanced forms of code-reuse attacks. The introduction of return-oriented programming [2] and its advancements [3]–[6], [8], [26], [33], [67]–[69] led to its adoption in real-world attacks [10], [11]. ROP exploits are facilitated by the lack of complete address space layout randomization in both Linux [12], and Windows [6], which otherwise would prevent or at least hinder [14] these attacks.

引入“返回libc”技术[28]近十年后,流行的操作系统中广泛采用了不可执行的内存页面保护,这引起了人们对更高级形式的代码重用攻击的新兴趣。 面向返回编程[2]的引入及其后续进展[3]-[6],[8],[26],[33],[67]-[69]使其在现实世界的攻击中得到采用[10],[11]。 Linux [12]和Windows [6]中都缺乏完整的地址空间布局随机化,这为ROP漏洞利用提供了便利;否则,完整的随机化将阻止或至少阻碍[14]这些攻击。

Besides address space randomization, process diversity can also be increased by randomizing the code of each executable segment, e.g., by permuting the order of functions or basic blocks [17]–[19]. However, these techniques are applicable only if the source code or the symbolic debugging information of the application to be protected is available. Our approach is inspired by these works, and attempts to bring the benefits of code randomization on COTS software, for which usually no source code or debugging information is available.

除了地址空间随机化之外,还可以通过随机化每个可执行段的代码来增加进程多样性,例如,通过置换函数或基本块的顺序[17]-[19]。 但是,只有在要保护的应用程序的源代码或符号调试信息可用时,这些技术才适用。 我们的方法受到这些工作的启发,并尝试将代码随机化的好处带给COTS软件,而这类软件通常没有可用的源代码或调试信息。

Return-oriented code disrupts the normal control flow of a process by diverting its execution to (potentially unintended) code fragments, most of which otherwise would never be targets of control transfer instructions. Enforcing the integrity of control transfers [20] can effectively protect against code-reuse attacks. Compile-time techniques also prevent the construction of ROP code by generating machine code that does not contain unintended instruction sequences ending with indirect control transfer instructions, and by safeguarding any indirect branches in the actual code using canaries or additional indirection [15], [16]. In contrast to the above approaches, although in-place code randomization does not completely preclude the possibility that working ROP code can be constructed, it can be applied directly on third-party software without access to source code or debugging information.

面向返回的代码通过将执行转移到(可能是非预期的)代码片段来破坏进程的正常控制流,而这些代码片段中的大多数本来绝不会成为控制转移指令的目标。 强制保证控制转移的完整性[20]可以有效地防止代码重用攻击。 编译时技术也可以防止ROP代码的构造,其方法是生成不包含以间接控制转移指令结尾的非预期指令序列的机器代码,并使用金丝雀或额外的间接层来保护实际代码中的任何间接分支[15],[16]。 与上述方法相比,尽管就地代码随机化并不能完全排除构造出有效ROP代码的可能性,但它可以直接应用于第三方软件,而无需访问源代码或调试信息。

Another line of defenses is based on runtime solutions that monitor either the frequency of ret instructions [22], [23], or the integrity of the stack [21]. Besides the fact that these techniques are ineffective against ROP code that uses indirect control transfer instructions other than ret, their increased runtime overhead limits their adoption.

另一道防线是基于运行时解决方案,该解决方案监视ret指令的频率[22],[23]或堆栈的完整性[21]。 除了这些技术对使用非ret的间接控制传输指令的ROP代码无效之外,它们增加的运行时开销还限制了它们的采用。

[Table IV: image not available (table-4.png)]

VIII. CONCLUSION

VIII. 结论

The increasing number of exploits against Windows applications that rely on return-oriented programming to bypass exploit mitigations such as DEP and ASLR, necessitates the deployment of additional protection mechanisms that can harden imminently vulnerable third-party applications against these threats. Towards this goal, we have presented in-place code randomization, a technique that offers probabilistic protection against ROP attacks, by randomizing the code of third-party applications using various narrow-scope code transformations.

越来越多针对Windows应用程序的漏洞利用依赖面向返回的编程来绕过DEP和ASLR等漏洞利用缓解措施,因此有必要部署额外的保护机制,以加固面临迫切威胁的易受攻击的第三方应用程序。 为了实现这一目标,我们提出了就地代码随机化技术,该技术通过使用各种窄范围代码转换对第三方应用程序的代码进行随机化,从而提供针对ROP攻击的概率性保护。

Our approach is practical: it can be applied directly on third-party executables without relying on debugging information, and does not introduce any runtime overhead. At the same time, it is effective: our experimental evaluation using in-the-wild ROP exploits and two automated ROP code construction toolkits shows that in-place code randomization can thwart ROP attacks against widely used applications, including Adobe Reader on Windows 7, and can prevent the automated generation of ROP code resistant to randomization. Our prototype implementation is publicly available, and as part of our future work, we plan to improve its randomization coverage using more advanced data flow analysis methods, and extend it to support ELF and 64-bit executables.

我们的方法很实用:它可以直接应用于第三方可执行文件,而无需依赖调试信息,并且不会引入任何运行时开销。 同时,它是有效的:我们使用真实环境中出现的ROP漏洞利用和两个自动化的ROP代码构建工具包进行的实验评估表明,就地代码随机化可以阻止针对广泛使用的应用程序(包括Windows 7上的Adobe Reader)的ROP攻击,并可以防止自动生成能够抵抗随机化的ROP代码。 我们的原型实现是公开可用的,作为我们未来工作的一部分,我们计划使用更高级的数据流分析方法来提高其随机化覆盖率,并将其扩展以支持ELF和64位可执行文件。

AVAILABILITY

可用性

Our prototype implementation is publicly available at http://nsl.cs.columbia.edu/projects/orp

我们的原型实现可在http://nsl.cs.columbia.edu/projects/orp上公开获得。

ACKNOWLEDGEMENTS

致谢

We are grateful to the authors of Q for making it available to us, and especially to Edward Schwartz for his assistance. We also thank Úlfar Erlingsson and Periklis Akritidis for their valuable feedback on earlier versions of this paper. This work was supported by DARPA and the US Air Force through Contracts DARPA-FA8750-10-2-0253 and AFRL-FA8650-10-C-7024, respectively, and by the FP7-PEOPLE-2009-IOF project MALCODE, funded by the European Commission under Grant Agreement No. 254116. Any opinions, findings, conclusions, or recommendations expressed herein are those of the authors, and do not necessarily reflect those of the US Government, DARPA, or the Air Force.

我们感谢Q的作者将其提供给我们,尤其感谢Edward Schwartz的协助。 我们也感谢Úlfar Erlingsson和Periklis Akritidis对本文早期版本的宝贵反馈。 这项工作由DARPA和美国空军分别通过合同DARPA-FA8750-10-2-0253和AFRL-FA8650-10-C-7024提供支持,并由FP7-PEOPLE-2009-IOF项目MALCODE提供支持,该项目由欧盟委员会根据第254116号拨款协议资助。 此处表达的任何观点、发现、结论或建议均为作者的观点,不一定反映美国政府、DARPA或空军的观点。

REFERENCES

参考资料

[1] M. Miller, T. Burrell, and M. Howard, “Mitigating software vulnerabilities,” Jul. 2011, http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=26788.

[2] H. Shacham, “The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86),” in Proceedings of the 14th ACM conference on Computer and Communications Security (CCS), 2007.

[3] S. Checkoway, A. J. Feldman, B. Kantor, J. A. Halderman, E. W. Felten, and H. Shacham, “Can DREs provide long-lasting security? the case of return-oriented programming and the AVC advantage,” in Proceedings of the 2009 conference on Electronic Voting Technology/Workshop on Trustworthy Elections (EVT/WOTE), 2009.

[4] R. Hund, T. Holz, and F. C. Freiling, “Return-oriented rootkits: bypassing kernel code integrity protection mechanisms,” in Proceedings of the 18th USENIX Security Symposium, 2009.

[5] T. Dullien, T. Kornau, and R.-P. Weinmann, “A framework for automated architecture-independent gadget search,” in Proceedings of the 4th USENIX Workshop on Offensive Technologies (WOOT), 2010.

[6] D. A. D. Zovi, “Practical return-oriented programming.” SOURCE Boston, 2010.

[7] P. Solé, “Hanging on a ROPe,” http://www.immunitysec.com/downloads/DEPLIB20 ekoparty.pdf.

[8] D. A. D. Zovi, “Mac OS X return-oriented exploitation.” RECON, 2010.

[9] P. Vreugdenhil, “Pwn2Own 2010 Windows 7 Internet Explorer 8 exploit,” http://vreugdenhilresearch.nl/Pwn2Ownl2010-Windows7-InternetExplorer8.pdf.

[10] K. Baumgartner, “The ROP pack,” in Proceedings of the 20th Virus Bulletin International Conference (VB), 2010.

[11] M. Parkour, “An overview of exploit packs (update 9) April 5 2011,” http://contagiodump.blogspot.com/2010/06/overview-of-exploit-packs-update.html.

[12] G. Fresi Roglia, L. Martignoni, R. Paleari, and D. Bruschi, “Surgically returning to randomized lib(c),” in Proceedings of the 25th Annual Computer Security Applications Conference (ACSAC), 2009.

[13] H. Li, “Understanding and exploiting Flash ActionScript vulnerabilities.” CanSecWest, 2011.

[14] H. Shacham, M. Page, B. Pfaff, E.-J. Goh, N. Modadugu, and D. Boneh, “On the effectiveness of address-space randomization,” in Proceedings of the 11th ACM conference on Computer and Communications Security (CCS), 2004.

[15] J. Li, Z. Wang, X. Jiang, M. Grace, and S. Bahram, “Defeating return-oriented rootkits with “return-less” kernels,” in Proceedings of the 5th European conference on Computer Systems (EuroSys), 2010.

[16] K. Onarlioglu, L. Bilge, A. Lanzi, D. Balzarotti, and E. Kirda, “G-Free: defeating return-oriented programming through gadget-less binaries,” in Proceedings of the 26th Annual Computer Security Applications Conference (ACSAC), 2010.

[17] S. Forrest, A. Somayaji, and D. Ackley, “Building diverse computer systems,” in Proceedings of the 6th Workshop on Hot Topics in Operating Systems (HotOS-VI), 1997.

[18] S. Bhatkar, R. Sekar, and D. C. DuVarney, “Efficient techniques for comprehensive protection from memory error exploits,” in Proceedings of the 14th USENIX Security Symposium, August 2005.

[19] C. Kil, J. Jun, C. Bookholt, J. Xu, and P. Ning, “Address space layout permutation (ASLP): Towards fine-grained randomization of commodity software,” in Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC), 2006.

[20] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow integrity,” in Proceedings of the 12th ACM conference on Computer and Communications Security (CCS), 2005.

[21] L. Davi, A.-R. Sadeghi, and M. Winandy, “ROPdefender: A practical protection tool to protect against return-oriented programming,” in Proceedings of the 6th Symposium on Information, Computer and Communications Security (ASIACCS), 2011.

[22] P. Chen, H. Xiao, X. Shen, X. Yin, B. Mao, and L. Xie, “DROP: Detecting return-oriented programming malicious code,” in Proceedings of the 5th International Conference on Information Systems Security (ICISS), 2009.

[23] L. Davi, A.-R. Sadeghi, and M. Winandy, “Dynamic integrity measurement and attestation: towards defense against return-oriented programming attacks,” in Proceedings of the 2009 ACM workshop on Scalable Trusted Computing (STC), 2009.

[24] G. S. Kc, A. D. Keromytis, and V. Prevelakis, “Countering code-injection attacks with instruction-set randomization,” in Proceedings of the 10th ACM conference on Computer and Communications Security (CCS), 2003.

[25] E. G. Barrantes, D. H. Ackley, T. S. Palmer, D. Stefanovic, and D. D. Zovi, “Randomized instruction set emulation to disrupt binary code injection attacks,” in Proceedings of the 10th ACM conference on Computer and Communications Security (CCS), 2003.

[26] E. J. Schwartz, T. Avgerinos, and D. Brumley, “Q: Exploit hardening made easy,” in Proceedings of the 20th USENIX Security Symposium, 2011.

[27] Corelan Team, “Mona,” http://redmine.corelan.be/projects/mona.

[28] S. Designer, “Getting around non-executable stack (and fix),” http://seclists.org/bugtraq/1997/Aug/63.

[29] T. Newsham, “Non-exec stack,” 2000, http://seclists.org/bugtraq/2000/May/90.

[30] Nergal, “The advanced return-into-lib(c) exploits: PaX case study,” Phrack, vol. 11, no. 58, Dec. 2001.

[31] S. Krahmer, “x86-64 buffer overflow exploits and the borrowed code chunks exploitation technique,” http://www.suse.de/~krahmer/no-nx.pdf.

[32] Ú. Erlingsson, “Low-level software security: Attack and defenses,” Microsoft Research, Tech. Rep. MSR-TR-07-153, 2007, http://research.microsoft.com/pubs/64363/tr-2007-153.pdf.

[33] S. Checkoway, L. Davi, A. Dmitrienko, A.-R. Sadeghi, H. Shacham, and M. Winandy, “Return-oriented programming without returns,” in Proceedings of the 17th ACM conference on Computer and Communications Security (CCS), 2010.

[34] F. B. Cohen, “Operating system protection through program evolution,” Computers and Security, vol. 12, pp. 565–584, Oct. 1993.

[35] P. Ször, The Art of Computer Virus Research and Defense. Addison-Wesley Professional, February 2005.

[36] E. Bhatkar, D. C. Duvarney, and R. Sekar, “Address obfuscation: an efficient approach to combat a broad range of memory error exploits,” in Proceedings of the 12th USENIX Security Symposium, 2003.

[37] “/ORDER (put functions in order),” http://msdn.microsoft.com/en-us/library/00kh39zz.aspx.

[38] “Syzygy - profile guided, post-link executable reordering,” http://code.google.com/p/sawbuck/wiki/SyzygyDesign.

[39] “Profile-guided optimizations,” http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx.

[40] C. Kruegel, W. Robertson, F. Valeur, and G. Vigna, “Static disassembly of obfuscated binaries,” in Proceedings of the 13th USENIX Security Symposium, 2004.

[41] M. Smithson, K. Anand, A. Kotha, K. Elwazeer, N. Giles, and R. Barua, “Binary rewriting without relocation information,” University of Maryland, Tech. Rep., 2010, http://www.ece.umd.edu/~barua/without-relocation-technical-report10.pdf.

[42] P. Saxena, R. Sekar, and V. Puranik, “Efficient fine-grained binary instrumentation with applications to taint-tracking,” in Proceedings of the 6th annual IEEE/ACM international symposium on Code Generation and Optimization (CGO), 2008.

[43] Skape, “Locreate: An anagram for relocate,” Uninformed, vol. 6, 2007.

[44] M. Pietrek, “An in-depth look into the Win32 portable executable file format, part 2,” http://msdn.microsoft.com/en-us/magazine/cc301808.aspx.

[45] I. Guilfanov, “Jump tables,” http://www.hexblog.com/?p=68.

[46] I. Guilfanov, “Decompilers and beyond,” Black Hat USA, 2008.

[47] Hex-Rays, “IDA Pro Disassembler,” http://www.hex-rays.com/idapro/.

[48] X. Hu, T.-c. Chiueh, and K. G. Shin, “Large-scale malware indexing using function-call graphs,” in Proceedings of the 16th ACM conference on Computer and Communications Security (CCS), 2009.

[49] S. Nanda, W. Li, L.-C. Lam, and T.-c. Chiueh, “Bird: Binary interpretation using runtime disassembly,” in Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2006.

[50] L. C. Harris and B. P. Miller, “Practical analysis of stripped binary code,” SIGARCH Comput. Archit. News, vol. 33, pp. 63–68, December 2005.

[51] Microsoft, “Enhanced Mitigation Experience Toolkit v2.1,” http://www.microsoft.com/download/en/details.aspx?id=1677.

[52] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools (2nd Edition). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2006.

[53] “Adobe CoolType SING Table “uniqueName” Stack Buffer Overflow,” http://www.exploit-db.com/exploits/16619/.

[54] R. El-Khalil and A. D. Keromytis, “Hydan: Hiding information in program binaries,” in Proceedings of the International Conference on Information and Communications Security, (ICICS), 2004.

[55] Intel 64 and IA-32 Architectures Software Developer’s Manual, ser. Volume 2 (2A & 2B): Instruction Set Reference, A-Z, 2011, http://www.intel.com/Assets/PDF/manual/325383.pdf.

[56] S. S. Muchnick, Advanced compiler design and implementation. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997.

[57] Y. L. Varol and D. Rotem, “An algorithm to generate all topological sorting arrangements,” Comput. J., vol. 24, no. 1, pp. 83–84, 1981.

[58] A. Fog, “Calling conventions for different C++ compilers and operating systems,” http://agner.org/optimize/calling_conventions.pdf.

[59] Skape and Skywing, “Bypassing Windows hardware-enforced DEP,” Uninformed, vol. 2, Sep. 2005.

[60] F. Bouchez, “A study of spilling and coalescing in register allocation as two separate phases,” Ph.D. dissertation, École normale supérieure de Lyon, April 2009.

[61] “Wine,” http://www.winehq.org.

[62] “Integard Pro 2.2.0.9026 (Win7 ROP-Code Metasploit Module),” http://www.exploit-db.com/exploits/15016/.

[63] “MPlayer (r33064 Lite) Buffer Overflow + ROP exploit,” http://www.exploit-db.com/exploits/17124/.

[64] “White Phosphorus Exploit Pack,” http://www.whitephosphorus.org/.

[65] Corelan Team, “Corelan ROPdb,” https://www.corelan.be/index.php/security/corelan-ropdb/.

[66] “Immunity Debugger,” http://www.immunityinc.com/products-immdbg.shtml.

[67] E. Buchanan, R. Roemer, H. Shacham, and S. Savage, “When good instructions go bad: generalizing return-oriented programming to RISC,” in Proceedings of the 15th ACM conference on Computer and Communications Security (CCS), 2008.

[68] T. Bletsch, X. Jiang, V. Freeh, and Z. Liang, “Jump-oriented programming: A new class of code-reuse attack,” in Proceedings of the 6th Symposium on Information, Computer and Communications Security (ASIACCS), 2011.

[69] P. Solé, “Defeating DEP, the Immunity Debugger way,” http://www.immunitysec.com/downloads/DEPLIB.pdf.
