Translation of an IoT Security Article

ARM Assembly Basics Tutorial (Introduction to ARM Assembly)

Original article: https://azeria-labs.com/writing-arm-assembly-part-1/

Welcome to this tutorial series on ARM assembly basics. It prepares you for the follow-up tutorial series on ARM exploit development: before we can dive into creating ARM shellcode and building ROP chains, we need to cover some ARM assembly basics first.

The ARM Assembly Basics tutorial series covers the following topics step by step:
Part 1: Introduction to ARM Assembly
Part 2: Data Types and Registers
Part 3: ARM Instruction Set
Part 4: Memory Instructions: Loading and Storing Data
Part 5: Load and Store Multiple
Part 6: Conditional Execution and Branching
Part 7: Stack and Functions

To follow along with the examples, you will need an ARM-based lab environment. If you don't have an ARM device (like a Raspberry Pi), you can set up your own lab environment in a virtual machine using QEMU and the Raspberry Pi distro by following this tutorial. If you are not familiar with basic debugging with GDB, you can get the basics in this tutorial. This series focuses on 32-bit ARM, and the examples are compiled on an ARMv6.

Why Learn ARM?

This tutorial is for people who want to learn the basics of ARM assembly, especially those of you who are interested in exploit writing on the ARM platform. You might have already noticed that ARM processors are everywhere around you. When I look around my house, I can count far more devices with an ARM processor than with an Intel processor: phones, routers, and, not to forget, the IoT devices whose sales seem to be exploding these days. In short, the ARM processor has become one of the most widespread CPU cores in the world. Like PCs, IoT devices are susceptible to improper input validation abuse such as buffer overflows, and given the widespread use of ARM-based devices and the potential for misuse, attacks on these devices have become much more common.

Yet there are more experts specialized in x86 security research than in ARM, although ARM assembly language is perhaps the easiest assembly language in widespread use. So why aren't more people focusing on ARM? Perhaps because there are more learning resources covering exploitation on Intel than on ARM. Just think of the great tutorials on Intel x86 exploit writing by Fuzzy Security (https://www.fuzzysecurity.com/tutorials/expDev/1.html) or the Corelan Team (https://www.corelan.be/index.php/2009/07/19/exploit-writing-tutorial-part-1-stack-based-overflows/). Guides like these help people interested in this specific area gain practical knowledge and the inspiration to learn beyond what those tutorials cover. If you are interested in x86 exploit writing, the Corelan and Fuzzysec tutorials are your perfect starting point. In this tutorial series, we will focus on assembly basics and exploit writing on ARM.

ARM vs. Intel Processors

There are many differences between Intel and ARM, but the main one is the instruction set. Intel is a CISC (Complex Instruction Set Computing) processor: it has a larger, more feature-rich instruction set and allows many complex instructions to access memory. It therefore has more operations and addressing modes, but fewer registers than ARM. CISC processors are mainly used in ordinary PCs, workstations, and servers.

ARM is a RISC (Reduced Instruction Set Computing) processor and therefore has a simplified instruction set (100 instructions or fewer) and more general-purpose registers than CISC. Unlike Intel, ARM uses instructions that operate only on registers and uses a load/store memory model for memory access, which means that only load/store instructions can access memory. As a result, incrementing a 32-bit value at a particular memory address on ARM requires three types of instructions (load, increment, and store): first load the value at that address into a register, then increment it within the register, and finally store it from the register back to memory.
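To make the load/store rule concrete, here is a minimal Python sketch of a toy machine, assuming a simplified model with sixteen 32-bit registers and a small byte-addressable memory. The class ToyLoadStoreCPU and its methods ldr, add_imm and str_ are hypothetical illustrations, not a real emulator API. Only ldr and str_ touch memory, so incrementing a word takes the same three steps as the ARM sequence ldr r0, [r1] / add r0, r0, #1 / str r0, [r1].

```python
import struct


class ToyLoadStoreCPU:
    """Hypothetical toy CPU: arithmetic works on registers, only ldr/str_ touch memory."""

    def __init__(self, memory_size: int = 64) -> None:
        self.memory = bytearray(memory_size)  # byte-addressable RAM
        self.regs = [0] * 16                  # R0..R15, 32 bits each

    def ldr(self, rd: int, rn: int) -> None:
        """Like LDR Rd, [Rn]: load the 32-bit little-endian word at address Rn into Rd."""
        (self.regs[rd],) = struct.unpack_from("<I", self.memory, self.regs[rn])

    def add_imm(self, rd: int, rn: int, imm: int) -> None:
        """Like ADD Rd, Rn, #imm: register-only arithmetic that wraps at 32 bits."""
        self.regs[rd] = (self.regs[rn] + imm) & 0xFFFFFFFF

    def str_(self, rd: int, rn: int) -> None:
        """Like STR Rd, [Rn]: store Rd as a 32-bit little-endian word at address Rn."""
        struct.pack_into("<I", self.memory, self.regs[rn], self.regs[rd])


cpu = ToyLoadStoreCPU()
struct.pack_into("<I", cpu.memory, 16, 41)  # the 32-bit value stored at address 16
cpu.regs[1] = 16                            # R1 holds the address

cpu.ldr(0, 1)         # ldr r0, [r1]      memory -> register
cpu.add_imm(0, 0, 1)  # add r0, r0, #1    increment inside the register
cpu.str_(0, 1)        # str r0, [r1]      register -> memory

print(struct.unpack_from("<I", cpu.memory, 16)[0])  # 42
```

On x86, by contrast, a single instruction with a memory operand (for example inc dword [addr] in NASM syntax) performs the whole read-modify-write.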

The reduced instruction set has its advantages and disadvantages. One advantage is that instructions can be executed more quickly, potentially allowing for greater speed (RISC systems shorten execution time by reducing the clock cycles per instruction). The downside is that fewer instructions put a greater emphasis on writing software efficiently with the limited instructions available, a burden that in practice falls largely on the compiler. Also important to note is that ARM has two modes, ARM mode and Thumb mode. Thumb instructions can be either 2 or 4 bytes (more on that in Part 3: ARM Instruction Set, https://azeria-labs.com/arm-instruction-set-part-3/).
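Because Thumb code mixes 2-byte and 4-byte instructions, a disassembler has to work out each instruction's size from its first halfword. The sketch below assumes the Thumb-2 rule from the ARM Architecture Reference Manual (ARMv6T2 and later): if bits [15:11] of the first halfword are 0b11101, 0b11110 or 0b11111, the instruction is 32 bits wide; otherwise it is 16 bits. The function name thumb_instruction_size is hypothetical.

```python
# Thumb-2 rule: these values of bits [15:11] mark the first halfword of a 32-bit instruction
THUMB32_PREFIXES = {0b11101, 0b11110, 0b11111}


def thumb_instruction_size(first_halfword: int) -> int:
    """Return the size in bytes (2 or 4) of the Thumb instruction starting with first_halfword."""
    top_five_bits = (first_halfword >> 11) & 0b11111
    return 4 if top_five_bits in THUMB32_PREFIXES else 2


examples = [
    ("movs r0, #1", 0x2001),             # 16-bit encoding
    ("bx lr", 0x4770),                   # 16-bit encoding
    ("bl <label>, first half", 0xF000),  # first halfword of a 32-bit BL
]
for text, halfword in examples:
    print(f"{text:<24} 0x{halfword:04X} -> {thumb_instruction_size(halfword)} bytes")
```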

More differences between ARM and x86:

  1. In ARM, most instructions can be used for conditional execution.
  2. The Intel x86 and x86-64 series of processors use the little-endian format.
  3. The ARM architecture was little-endian before version 3. Since then, ARM processors have been bi-endian and feature a setting that allows for switchable endianness.
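Endianness only changes the order in which the bytes of a multi-byte value are laid out in memory. A minimal sketch using Python's standard struct module, taking the 32-bit ARM instruction word 0xE1A02001 (the encoding of MOV R2, R1) as the value:

```python
import struct
import sys

word = 0xE1A02001  # 32-bit ARM encoding of MOV R2, R1

little = struct.pack("<I", word)  # least significant byte first (little-endian)
big = struct.pack(">I", word)     # most significant byte first (big-endian)

print("little-endian:", little.hex(" "))  # 01 20 a0 e1
print("big-endian:   ", big.hex(" "))     # e1 a0 20 01
print("this host:    ", sys.byteorder)    # typically "little" on x86 and on ARM Linux systems
```

A bi-endian ARM core stores one layout or the other depending on its endianness setting; the 32-bit value itself is the same.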

There are not only differences between Intel and ARM, but also between different ARM versions themselves. This tutorial series tries to stay as generic as possible so that you get a general understanding of how ARM works. Once you understand the fundamentals, it's easy to learn the nuances of your chosen target ARM version. The examples in this tutorial were created on a 32-bit ARMv6 (Raspberry Pi 1), so the explanations relate to this exact version.

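The naming of the different ARM versions can also be confusing:

[Figure: ARM processor family names versus ARM architecture versions]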

Writing ARM Assembly

Before we can start diving into ARM exploit development, we first need to understand the basics of assembly language programming, which requires a little background knowledge before you can start to appreciate it. But why do we even need ARM assembly? Isn't it enough to write our exploits in a "normal" programming or scripting language? It is not, if we want to be able to do reverse engineering and understand the program flow of ARM binaries, build our own ARM shellcode, craft ARM ROP chains, and debug ARM applications.

You don't need to know every little detail of assembly language to do reverse engineering and exploit development, but some of it is required to understand the bigger picture. The fundamentals are covered in this tutorial series. If you want to learn more, you can visit the links listed at the end of this chapter.

So what exactly is assembly language? Assembly language is a thin syntax layer on top of machine code, which is made up of instructions encoded in binary representations that our computer understands. So why don't we just write machine code instead? Well, that would be a pain in the ass. For this reason, we write assembly, ARM assembly in our case, which is much easier for humans to understand. Our computer can't run assembly code itself, because it needs machine code. The tool we will use to assemble assembly code into machine code is the GNU Assembler from the GNU Binutils project, called as, which works with source files that have the *.s extension.

Once you have written your assembly file with the *.s extension, you assemble it with as and link it with ld:

$ as program.s -o program.o
$ ld program.o -o program
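After linking, you can check that program really is a 32-bit ARM executable; readelf -h program or file program will tell you. The same check can be sketched in Python by reading the ELF header fields defined by the ELF specification (EI_CLASS, EI_DATA and e_machine, where EM_ARM is 40). The functions describe_elf and describe_elf_file are hypothetical helpers, not part of Binutils.

```python
import struct

# A few e_machine values from the ELF specification
MACHINES = {3: "x86 (EM_386)", 40: "32-bit ARM (EM_ARM)", 62: "x86-64 (EM_X86_64)", 183: "AArch64 (EM_AARCH64)"}


def describe_elf(header: bytes) -> str:
    """Summarise word size, byte order and target machine from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    word_size = {1: "32-bit", 2: "64-bit"}[header[4]]  # EI_CLASS
    fmt = {1: "<H", 2: ">H"}[header[5]]                 # EI_DATA: 1 = little-endian, 2 = big-endian
    (machine,) = struct.unpack_from(fmt, header, 18)    # e_machine sits at offset 18
    order = "little-endian" if header[5] == 1 else "big-endian"
    return f"{word_size} {order} ELF for {MACHINES.get(machine, f'machine {machine}')}"


def describe_elf_file(path: str) -> str:
    """Read the start of the ELF header of the file at path and describe it."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))


# Synthetic header for demonstration: 32-bit, little-endian, ET_EXEC (2), EM_ARM (40)
demo = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 40)
print(describe_elf(demo))
```

On the Raspberry Pi, describe_elf_file("program") should report a 32-bit little-endian ELF for 32-bit ARM.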

Let's start at the very bottom and work our way up to assembly language. At the lowest level, we have electrical signals on our circuit. Signals are formed by switching the electrical voltage to one of two levels, say 0 volts ('off') or 5 volts ('on'). Because we can't easily tell just by looking what voltage the circuit is at, we write patterns of on/off voltages using visual representations, the digits 0 and 1. These digits not only represent the absence or presence of a signal, but are also the digits of the binary system. We then group sequences of 0s and 1s to form a machine code instruction, which is the smallest working unit of a computer processor. Here is an example of a machine language instruction:

1110 0001 1010 0000 0010 0000 0000 0001
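This pattern is a real instruction: split into the fields of the 32-bit ARM (A32) data-processing format, it decodes to MOV R2, R1. The hypothetical helpers decode_data_processing and to_mnemonic below are a minimal sketch that only recognises an unconditional, unshifted MOV Rd, Rm; a real disassembler handles far more cases.

```python
bits = "1110 0001 1010 0000 0010 0000 0000 0001"
word = int(bits.replace(" ", ""), 2)


def decode_data_processing(word: int) -> dict:
    """Split a 32-bit A32 data-processing instruction (register operand form) into its fields."""
    return {
        "cond": (word >> 28) & 0xF,    # 0b1110 = AL, "always execute"
        "op": (word >> 26) & 0x3,      # 0b00 = data-processing class
        "I": (word >> 25) & 0x1,       # 0 = second operand is a register
        "opcode": (word >> 21) & 0xF,  # 0b1101 = MOV
        "S": (word >> 20) & 0x1,       # 0 = condition flags are not updated
        "Rn": (word >> 16) & 0xF,      # first operand register, unused by MOV
        "Rd": (word >> 12) & 0xF,      # destination register
        "shift": (word >> 4) & 0xFF,   # 0 = Rm is used without a shift
        "Rm": word & 0xF,              # source register
    }


def to_mnemonic(f: dict) -> str:
    """Render the fields as text; only the unconditional, unshifted MOV Rd, Rm form is supported."""
    if (f["cond"], f["op"], f["I"], f["opcode"], f["S"], f["shift"]) != (0b1110, 0, 0, 0b1101, 0, 0):
        raise ValueError("this sketch only understands MOV Rd, Rm")
    return f"MOV R{f['Rd']}, R{f['Rm']}"


fields = decode_data_processing(word)
print(hex(word))            # 0xe1a02001
print(to_mnemonic(fields))  # MOV R2, R1
```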

So far so good, but we can't remember what each of these patterns of 0s and 1s means. For this reason, we use so-called mnemonics, abbreviations that help us remember these binary patterns, where each machine code instruction is given a name. These mnemonics often consist of three letters, but this is not obligatory. We can write a program using these mnemonics as instructions. Such a program is called an assembly language program, and the set of mnemonics used to represent a computer's machine code is called the assembly language of that computer. Therefore, assembly language is the lowest level used by humans to program a computer. The operands of an instruction come after the mnemonic. Here is an example, which copies the value of register R1 into register R2:

MOV R2, R1

Now that we know that an assembly program is made up of textual information called mnemonics, we need to convert it into machine code. As mentioned above, for ARM assembly the GNU Binutils project supplies us with a tool called as. The process of using an assembler like as to convert (ARM) assembly language into (ARM) machine code is called assembling.
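A real assembler like as handles the whole instruction set plus labels, directives and relocations, but the core of assembling is a mechanical text-to-bits translation. The hypothetical assemble_mov below is a minimal sketch that covers only the single form MOV Rd, Rm (unconditional, no shift, flags not updated) and emits the 4 bytes in little-endian order, assuming a little-endian ARM target:

```python
import re
import struct

# cond=AL (0b1110), data-processing class, I=0, opcode=MOV (0b1101), S=0, all register fields 0
MOV_REG_TEMPLATE = 0xE1A00000


def assemble_mov(line: str) -> bytes:
    """Assemble 'MOV Rd, Rm' into 4 bytes of little-endian A32 machine code."""
    match = re.fullmatch(r"\s*MOV\s+R(\d+)\s*,\s*R(\d+)\s*", line, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported instruction: {line!r}")
    rd, rm = (int(g) for g in match.groups())
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("32-bit ARM only has registers R0-R15")
    word = MOV_REG_TEMPLATE | (rd << 12) | rm  # Rd goes in bits 15:12, Rm in bits 3:0
    return struct.pack("<I", word)


print(assemble_mov("MOV R2, R1").hex())  # 0120a0e1
```

Assembling a file containing mov r2, r1 with as and inspecting it with objdump -d should show the same instruction word, e1a02001.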

In summary, we learned that computers understand (respond to) the presence or absence of voltages (signals) and that we can represent multiple signals as a sequence of 0s and 1s (bits). We can use machine code (sequences of signals) to make the computer respond in some well-defined way. Because we can't remember what all these sequences mean, we give them abbreviations (mnemonics) and use them to represent instructions. This set of mnemonics is the assembly language of the computer, and we use a program called an assembler to convert code from its mnemonic representation into computer-readable machine code, much as a compiler does for high-level languages.
