CHAPTER 3 Machine-Level Representation of Programs

The gcc C compiler generates its output in the form of assembly code, a textual representation of the machine code giving the individual instructions in the program. gcc then invokes both an assembler and a linker to generate the executable machine code from the assembly code.

Our presentation is based on two related machine languages: Intel IA32 (“Intel Architecture 32-bit”), the dominant language of most computers today, and x86-64, its extension to run on 64-bit machines.

IA32 ---- Intel processors
x86-64 ---- the extension of IA32 to 64 bits, originally developed by Advanced Micro Devices (AMD), whose processors are capable of running the exact same machine-level programs as Intel processors

Whereas a 32-bit machine can only make use of around 4 gigabytes (2^32 bytes) of random-access memory, current 64-bit machines can use up to 256 terabytes (2^48 bytes). Most of the microprocessors in recent server and desktop machines, as well as in many laptops, support either 32-bit or 64-bit operation. However, most of the operating systems running on these machines support only 32-bit applications, and so the capabilities of the hardware are not fully utilized.

Main topics of this chapter:

The representation and manipulation of data and the implementation of control in IA32.
How if, while, and switch statements are implemented.
The implementation of procedures, including how the program maintains a run-time stack to support the passing of data and control between procedures, as well as storage for local variables.
How data structures such as arrays, structures, and unions are implemented at the machine level.
The problems of out-of-bounds memory references and the vulnerability of systems to buffer overflow attacks.
Some tips on using the gdb debugger for examining the run-time behavior of a machine-level program.

Supplementary material:

Another Web Aside gives a brief presentation of ways to incorporate assembly code into C programs.

The more recent “SSE” instructions were developed to support multimedia applications, but in their more recent versions (version 2 and later), and with more recent versions of gcc, SSE has become the preferred method for mapping floating point onto both IA32 and x86-64 machines.

3.1 A Historical Perspective

Linux uses what is referred to as flat addressing, where the entire memory space is viewed by the programmer as a large array of bytes.

A number of formats and instructions have been added to x86 for manipulating vectors of small integers and floating-point numbers. These features were added to allow improved performance on multimedia applications, such as image processing, audio and video encoding and decoding, and three-dimensional computer graphics.

3.2 Program Encodings

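Compiling code on an IA32 machine from a Unix command line:

unix> gcc -O1 -o p p1.c p2.c
//Option -O1 instructs the compiler to apply level-one optimizations
unix> gcc -O1 -S code.c
//Option -S stops after generating the assembly file code.s
unix> gcc -O1 -c code.c
//Option -c compiles and assembles, producing the object file code.o
unix> objdump -d code.o
//Option -d disassembles the machine code in code.o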

3.2.1 Machine-Level Code

Two abstractions of computer systems are especially important for machine-level programming.

First, the format and behavior of a machine-level program is defined by the instruction set architecture, or “ISA,” defining the processor state, the format of the instructions, and the effect each of these instructions will have on the state.
Second, the memory addresses used by a machine-level program are virtual addresses, providing a memory model that appears to be a very large byte array.

IA32 machine code makes visible parts of the processor state that are normally hidden from the C programmer:

The program counter (commonly referred to as the “PC,” and called %eip in IA32) indicates the address in memory of the next instruction to be executed.
The integer register file contains eight named locations storing 32-bit values.
The condition code registers hold status information about the most recently executed arithmetic or logical instruction.
A set of floating-point registers store floating-point data.

3.2.2 Code Examples

027 Code Examples.png

Several features about machine code and its disassembled representation are worth noting:

IA32 instructions are variable-length rather than fixed-length: they can range in length from 1 to 15 bytes. The instruction encoding is designed so that commonly used instructions and those with fewer operands require a smaller number of bytes than do less common ones or ones with more operands.
The instruction format is designed in such a way that from a given starting position, there is a unique decoding of the bytes into machine instructions. For example, only the instruction pushl %ebp can start with byte value 55.
The disassembler determines the assembly code based purely on the byte sequences in the machine-code file. It does not require access to the source or assembly-code versions of the program.

028 Code Example 2.png

The linked executable is much larger than the unlinked object file. The file prog has grown to 9,123 bytes, since it contains not just the code for our two procedures but also information used to start and terminate the program as well as to interact with the operating system.

Differences between the disassembly of the linked executable and that of the unlinked object file:

One important difference is that the addresses listed along the left are different—the linker has shifted the location of this code to a different range of addresses.
A second difference is that the linker has determined the location for storing global variable accum.

3.3 Data Formats

020 Sizes of C data types in IA32.png

As the table indicates, most assembly-code instructions generated by gcc have a single-character suffix denoting the size of the operand. For example, the data movement instruction has three variants: movb(move byte), movw(move word), and movl (move double word).

Note that the assembly code uses the suffix ‘l’ to denote both a 4-byte integer as well as an 8-byte double-precision floating-point number. This causes no ambiguity, since floating point involves an entirely different set of instructions and registers.

3.4 Accessing Information

021 IA32 integer registers.png

022 IA32 registers.png

For the most part, the first six registers can be considered general-purpose registers with no restrictions placed on their use. The final two registers (%ebp and %esp) contain pointers to important places in the program stack.

The low-order 2 bytes of the first four registers can be independently read or written by the byte operation instructions. This capability exists for backward compatibility, so that older code continues to run correctly.

3.4.1 Operand Specifiers

Most instructions have one or more operands, specifying the source values to reference in performing an operation and the destination location into which to place the result.

Operands fall into three types:

The first type, immediate, is for constant values (e.g., $-577 or $0x1F). Any value that fits into a 32-bit word can be used.
The second type, register, denotes the contents of one of the registers. We use the notation E_a to denote an arbitrary register a, and indicate its value with the reference R[E_a], viewing the set of registers as an array R indexed by register identifiers.
The third type of operand is a memory reference, in which we access some memory location according to a computed address, often called the effective address. We use the notation M[Addr] to denote a reference to the byte values stored in memory starting at address Addr. There are many different addressing modes allowing different forms of memory references, including absolute, indirect (the familiar pointer dereference), base + displacement, indexed, and scaled indexed.

023 Operand forms.png

This general form is often seen when referencing elements of arrays.
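
The general operand form can be modeled in a few lines of C. The following is only a sketch: the function name effective_address and its parameters are hypothetical, and it assumes the usual rule that the most general form Imm(Eb,Ei,s) denotes the address Imm + R[Eb] + R[Ei] * s, with 32-bit wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the general IA32 operand form Imm(Eb,Ei,s).
   base and index stand for the register values R[Eb] and R[Ei]. */
uint32_t effective_address(uint32_t imm, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    /* The scale factor must be 1, 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Unsigned arithmetic wraps modulo 2^32, like a 32-bit address. */
    return imm + base + index * scale;
}
```

For an int array starting at address 0x1000, element 3 would be reached with base 0x1000, index 3, and scale 4, which is why this form shows up in array code.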

024 Practice Problem3.1.png

Solution: refer to the Form column of Figure 3.3 to find which form each operand matches, then use the formula in the Operand value column to compute its value.

3.4.2 Data Movement Instructions

025 Data movement instructions.png

The instructions in the mov class copy their source values to their destinations. In the operand pair S, D, the source S designates a value and the destination D designates a location. The source operand designates a value that is immediate, stored in a register, or stored in memory. The destination operand designates a location that is either a register or a memory address.

IA32 imposes the restriction that a move instruction cannot have both operands refer to memory locations. Copying a value from one memory location to another requires two instructions—the first to load the source value into a register, and the second to write this register value to the destination.

example:

026 Data movement example.png

Both the movs and the movz instruction classes serve to copy a smaller amount of source data to a larger data location, filling in the upper bits by either sign extension (movs) or by zero extension (movz). With sign extension, the upper bits of the destination are filled in with copies of the most significant bit of the source value.
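
The difference between the two kinds of extension can be sketched in C. The helpers sign_extend and zero_extend below are hypothetical names, not instructions; they model widening a value of a given bit width to 32 bits, so width 8 corresponds to the byte-to-double-word case.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of movs-style widening: copy the top bit of the narrow value
   into all upper bits of the 32-bit result. */
uint32_t sign_extend(uint32_t value, unsigned width)
{
    assert(width >= 1 && width <= 32);
    if (width == 32)
        return value;
    uint32_t mask = ((uint32_t)1 << width) - 1;   /* low `width` bits */
    uint32_t sign = (uint32_t)1 << (width - 1);   /* most significant source bit */
    value &= mask;
    return (value & sign) ? (value | ~mask) : value;
}

/* Sketch of movz-style widening: fill the upper bits with zeros. */
uint32_t zero_extend(uint32_t value, unsigned width)
{
    assert(width >= 1 && width <= 32);
    if (width == 32)
        return value;
    return value & (((uint32_t)1 << width) - 1);
}
```

With the byte 0x80, sign extension yields 0xFFFFFF80 while zero extension yields 0x00000080.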

The final two data movement operations are used to push data onto and pop data from the program stack. With IA32, the program stack is stored in some region of memory. The stack pointer %esp holds the address of the top stack element.

030 Illustration of stack operation.png

The behavior of the instruction pushl %ebp is equivalent to the following pair of instructions:

subl $4,%esp     //Decrement stack pointer
movl %ebp,(%esp) //Store %ebp on stack

The instruction popl %eax is equivalent to the following pair of instructions:

movl (%esp),%eax //Read %eax from stack
addl $4,%esp     //Increment stack pointer

Since the stack is contained in the same memory as the program code and other forms of program data, programs can access arbitrary positions within the stack using the standard memory addressing methods. For example, assuming the topmost element of the stack is a double word, the instruction movl 4(%esp),%edx will copy the second double word from the stack to register %edx.
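
The push, pop, and offset-read behavior described above can be simulated with a toy memory in C. This is a sketch under stated assumptions: the type machine_t, the 64-byte memory, and the function names are all hypothetical, and the stack grows toward lower addresses as on IA32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEM_SIZE = 64 };          /* hypothetical tiny memory */

typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t esp;                /* address of the top stack element */
} machine_t;

void machine_init(machine_t *m)
{
    memset(m->mem, 0, sizeof m->mem);
    m->esp = MEM_SIZE;           /* empty stack: just past the highest address */
}

/* pushl: decrement the stack pointer by 4, then store the value there. */
void pushl(machine_t *m, uint32_t value)
{
    assert(m->esp >= 4);
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &value, 4);
}

/* popl: read the value at the stack pointer, then increment it by 4. */
uint32_t popl(machine_t *m)
{
    uint32_t value;
    assert(m->esp + 4 <= MEM_SIZE);
    memcpy(&value, &m->mem[m->esp], 4);
    m->esp += 4;
    return value;
}

/* Models movl disp(%esp),dst: read a double word at an offset from the top. */
uint32_t peek(const machine_t *m, uint32_t disp)
{
    uint32_t value;
    assert(m->esp + disp + 4 <= MEM_SIZE);
    memcpy(&value, &m->mem[m->esp + disp], 4);
    return value;
}
```

After pushing two values, peek(&m, 4) returns the first one pushed, which corresponds to reading the second double word from the top of the stack.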

031 Problem 3.2.png

032 Problem 3.3.png

3.4.3 Data Movement Example

033 Problem 3.4.png

034 Problem 3.5.png

3.5 Arithmetic and Logical Operations

3.5.1 Load Effective Address

The load effective address instruction leal copies the effective address of S to the destination.

027 Integer arithmetic operations.png

035 Problem 3.6.png

3.5.2 Unary and Binary Operations

Binary Operations: The first operand can be either an immediate value, a register, or a memory location. The second can be either a register or a memory location. As with the movl instruction, however, the two operands cannot both be memory locations.

036 Problem 3.7.png

3.5.3 Shift Operations

The shift amount k is encoded as a single byte, since only shift amounts between 0 and 31 are possible (only the low-order 5 bits of the shift amount are considered). The shift amount is given either as an immediate or in the single-byte register element %cl. (These instructions are unusual in only allowing this specific register as operand.)

The destination operand of a shift operation can be either a register or a memory location.
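
The masking of the shift amount and the two kinds of right shift can be sketched in C. The function names are hypothetical. Because shifting a 32-bit value by 32 or more is undefined in C, the sketch masks the count explicitly, and it builds the arithmetic right shift by hand instead of relying on the implementation-defined behavior of >> on negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Only the low-order 5 bits of the shift amount are considered. */
static unsigned shift_count(uint8_t k)
{
    unsigned n = k & 31u;
    assert(n < 32u);             /* invariant after masking */
    return n;
}

/* Left shift: vacated low-order bits are filled with zeros. */
uint32_t shift_left(uint32_t x, uint8_t k)
{
    return x << shift_count(k);
}

/* Logical right shift: vacated high-order bits are filled with zeros. */
uint32_t shift_right_logical(uint32_t x, uint8_t k)
{
    return x >> shift_count(k);
}

/* Arithmetic right shift: vacated high-order bits copy the sign bit. */
uint32_t shift_right_arith(uint32_t x, uint8_t k)
{
    unsigned n = shift_count(k);
    uint32_t r = x >> n;
    if ((x & 0x80000000u) && n > 0)
        r |= ~(0xFFFFFFFFu >> n);    /* set the top n bits */
    return r;
}
```

A shift amount of 33 behaves like a shift by 1, since 33 and 1 agree in their low-order 5 bits.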

037 Problem 3.8.png

3.5.4 Discussion

038 Problem 3.10.png

3.5.5 Special Arithmetic Operations

IA32 also provides two different “one-operand” multiply instructions to compute the full 64-bit product of two 32-bit values: one for unsigned (mull), and one for two’s-complement (imull) multiplication. For both of these, one argument must be in register %eax, and the other is given as the instruction source operand. The product is then stored in registers %edx (high-order 32 bits) and %eax (low-order 32 bits).
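
The way the 64-bit product is split across two registers can be sketched in C with 64-bit arithmetic. The struct product_t and the C functions named after the instructions are hypothetical stand-ins; the fields edx and eax only mirror where the hardware places the two halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register pair holding a 64-bit product. */
typedef struct {
    uint32_t edx;   /* high-order 32 bits */
    uint32_t eax;   /* low-order 32 bits */
} product_t;

/* Sketch of the unsigned full multiply. */
product_t mull(uint32_t eax, uint32_t src)
{
    uint64_t p = (uint64_t)eax * src;
    product_t r = { (uint32_t)(p >> 32), (uint32_t)p };
    /* The two halves together reproduce the full product. */
    assert((((uint64_t)r.edx << 32) | r.eax) == p);
    return r;
}

/* Sketch of the two's-complement full multiply. */
product_t imull(int32_t eax, int32_t src)
{
    int64_t p = (int64_t)eax * src;
    uint64_t bits = (uint64_t)p;     /* same bit pattern, unsigned view */
    product_t r = { (uint32_t)(bits >> 32), (uint32_t)bits };
    return r;
}
```

The same operand bits can give different high-order halves under the two instructions: 0xFFFFFFFF is a large unsigned value for mull but -1 for imull.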

029 special arithmetic operations.png

039 Problem 3.12.png

3.6 Control

Machine code provides two basic low-level mechanisms for implementing conditional behavior: it tests data values and then either alters the control flow or the data flow based on the result of these tests.

Conditional control transfers (data-dependent control flow)
Conditional data transfers

3.6.1 Condition Codes

The CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.

CF: Carry Flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
ZF: Zero Flag. The most recent operation yielded zero.
SF: Sign Flag. The most recent operation yielded a negative value.
OF: Overflow Flag. The most recent operation caused a two’s-complement overflow—either negative or positive.

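Positive and negative overflow:

First, adding a positive number to a negative number can never overflow: the magnitude of the result is no larger than the larger of the two operand magnitudes, so if both operands are representable, the result is too.

Second, positive overflow occurs when two large positive numbers are added and the sign bit of the result becomes 1, e.g., 0110 (6) + 0011 (3) = 1001 (assuming 4-bit arithmetic). Negative overflow occurs when two negative numbers of large magnitude are added and the sign bit of the result becomes 0, e.g., 1011 (-5) + 1011 (-5) = 10110 -> 0110 overflows, whereas 1111 (-1) + 1111 (-1) = 11110 -> 1110 does not.

Therefore:

Positive overflow: there is a carry out of the highest bit (into the sign bit) but no carry out of the sign bit.
Negative overflow: there is a carry out of the sign bit but no carry out of the highest bit. If the sign bit and the highest bit both produce a carry, or neither does, there is no overflow.

Note that “the highest bit” here means the most significant bit other than the sign bit, i.e., the bit immediately below the sign bit.

The negative-overflow examples above illustrate this rule.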

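How does the processor set the condition code registers for a given operation? What is the criterion?

For example, suppose an ADD instruction performs the equivalent of the C assignment t=a+b. The condition codes are then set according to the following expressions: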

042 ADD指令后设置条件码寄存器的判断基准.png

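For unsigned operations, overflow shows up as a carry out of the most significant bit, which is recorded in CF.

OF: OF indicates signed (two’s-complement) overflow. Two conditions must hold: the two addends have the same sign, and the sign of the result differs from theirs.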
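
The flag rules for an addition t=a+b can be written out in C. This is a sketch, not the hardware definition: the type flags_t and the function add_flags are hypothetical, and the operands are handled as 32-bit unsigned bit patterns so that the addition itself is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four condition codes. */
typedef struct {
    int cf;   /* carry out of the most significant bit (unsigned overflow) */
    int zf;   /* result is zero */
    int sf;   /* result is negative when viewed as two's complement */
    int of;   /* two's-complement overflow */
} flags_t;

flags_t add_flags(uint32_t a, uint32_t b)
{
    uint32_t t = a + b;              /* wraps modulo 2^32 */
    flags_t f;
    f.cf = t < a;                    /* the sum wrapped around */
    f.zf = t == 0;
    f.sf = (int)((t >> 31) & 1);     /* sign bit of the result */
    /* Overflow: a and b have the same sign, and t has the other sign. */
    f.of = (int)((((a ^ t) & (b ^ t)) >> 31) & 1);
    assert(!(f.zf && f.sf));         /* a zero result is never negative */
    return f;
}
```

Adding 1 to 0x7FFFFFFF sets OF but not CF (signed overflow only), while adding 1 to 0xFFFFFFFF sets CF but not OF (unsigned overflow only).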

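CMP S1,S2 computes S2-S1 and sets the condition codes according to the result, without updating any register. TEST S1,S2 computes S1&S2 and sets the condition codes in the same way.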

040 Comparison and test instructions.png

TEST: Typically, the same operand is repeated (e.g., testl %eax,%eax to see whether %eax is negative, zero, or positive), or one of the operands is a mask indicating which bits should be tested.

3.6.2 Accessing the Condition Codes

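Condition codes are usually not read directly. Instead, a SET instruction sets a single byte to 0 or 1 depending on some combination of the condition codes, as shown in the figure below.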

041 The SET instructions.png

An example:

A typical instruction sequence to compute the C expression a < b:

043 a小于b.png

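Here the comparison is based on %edx (a) - %eax (b), and the SET instruction stores the 0-or-1 result in %al. movzbl then clears the high-order 3 bytes of %eax.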

044 Problem 3.13.png

045 Problem 3.14.png

3.6.3 Jump Instructions and Their Encodings

046 The jump instructions.png

How are the targets of jump instructions encoded?

Understanding how the targets of jump instructions are encoded will become important when we study linking in Chapter 7.

There are several different encodings for jumps, but some of the most commonly used ones are PC relative.

PC-relative encoding: the instruction encodes the difference between the address of the target instruction and the address of the instruction immediately following the jump. These offsets can be encoded using 1, 2, or 4 bytes.
Absolute encoding: a second encoding method is to give an “absolute” address, using 4 bytes to directly specify the target.

The assembler and linker select the appropriate encodings of the jump destinations.

047 PC relative encoding of jump target address .png

By using a PC-relative encoding of the jump targets, the instructions can be compactly encoded (requiring just 2 bytes), and the object code can be shifted to different positions in memory without alteration (that is, the code is position-independent).
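
The PC-relative computation can be sketched in C for the 2-byte form of a jump (one opcode byte plus one signed offset byte). The function jump_target is a hypothetical helper, and the addresses in the usage note are illustrative values, not taken from a particular listing.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: target = address of the instruction after the jump
   + the signed 1-byte offset stored in the instruction. */
uint32_t jump_target(uint32_t jump_addr, uint8_t offset_byte)
{
    /* The 2-byte instruction must fit in the 32-bit address space. */
    assert(jump_addr <= UINT32_MAX - 2u);
    uint32_t next = jump_addr + 2u;          /* opcode byte + offset byte */
    /* Interpret the offset byte as a two's-complement value. */
    int32_t disp = (offset_byte & 0x80) ? (int32_t)offset_byte - 256
                                        : (int32_t)offset_byte;
    return next + (uint32_t)disp;            /* wraps modulo 2^32 */
}
```

A jump at address 0x8 with offset byte 0x0d targets 0xa + 0xd = 0x17, and a jump at 0x15 with offset byte 0xf8 (that is, -8) targets 0x17 - 8 = 0xf. If the whole code is moved to start at a different base address, the offset bytes stay the same and the targets move with the code.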

048 Problem 3.15.png

3.6.4 Translating Conditional Branches

049 Translating Conditional Branches.png

050 Problem 3.16.png

051 Problem 3.17.png

052 Problem 3.18.png

3.6.5 Loops

C provides several looping constructs—namely, do-while, while, and for. Most compilers generate loop code based on the do-while form of a loop, even though this form is relatively uncommon in actual programs. Other loops are transformed into do-while form and then compiled into machine code.
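
The transformation of a while loop into do-while form can be illustrated in C with goto statements. This is a sketch of the general idea: the factorial example and the function names are hypothetical, and real compiler output may differ. The while loop becomes an initial test that can skip the loop entirely, followed by a do-while body with the test at the bottom.

```c
#include <assert.h>

/* Factorial written with an ordinary while loop. */
int fact_while(int n)
{
    assert(n >= 0 && n <= 12);   /* 12! is the largest factorial that fits in 32 bits */
    int result = 1;
    while (n > 1) {
        result *= n;
        n = n - 1;
    }
    return result;
}

/* The same loop in guarded do-while form, using goto to mimic jumps. */
int fact_while_goto(int n)
{
    assert(n >= 0 && n <= 12);
    int result = 1;
    if (!(n > 1))                /* initial test: skip the loop if it fails */
        goto done;
loop:
    result *= n;                 /* loop body */
    n = n - 1;
    if (n > 1)                   /* test at the bottom, as in do-while */
        goto loop;
done:
    return result;
}
```

Both functions return the same value for every valid input; only the placement of the test differs.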

Do-While Loops

053 Do-While Loops.png

054 Problem 3.19.png

055 Problem 3.20.png

While Loops

056 while loops.png

057 assembly code for while.png

058 Problem 3.21.png

059 Problem 3.22.png

For Loops

060 for loops template.png

061 Problem 3.23.png

062 Problem 3.24.png

3.6.6 Conditional Move Instructions

Conditional control transfers have a drawback. The processor executes instructions in a pipeline, fetching the next instruction while it performs the arithmetic for the previous one, so it must know the sequence of instructions to execute ahead of time. On a conditional jump the processor predicts the branch; the prediction is usually right, but on a misprediction the processor must discard the work it has done on the instructions following the jump and refill the pipeline, which degrades performance.

As we will see in Chapters 4 and 5, processors achieve high performance through pipelining, where an instruction is processed via a sequence of stages, each performing one small portion of the required operations (e.g., fetching the instruction from memory, determining the instruction type, reading from memory, performing an arithmetic operation, writing to memory, and updating the program counter). This approach achieves high performance by overlapping the steps of the successive instructions, such as fetching one instruction while performing the arithmetic operations for a previous instruction. To do this requires being able to determine the sequence of instructions to be executed well ahead of time in order to keep the pipeline full of instructions to be executed. When the machine encounters a conditional jump (referred to as a “branch”), it often cannot determine yet whether or not the jump will be followed. Processors employ sophisticated branch prediction logic to try to guess whether or not each jump instruction will be followed. As long as it can guess reliably (modern microprocessor designs try to achieve success rates on the order of 90%), the instruction pipeline will be kept full of instructions. Mispredicting a jump, on the other hand, requires that the processor discard much of the work it has already done on future instructions and then begin filling the pipeline with instructions starting at the correct location. As we will see, such a misprediction can incur a serious penalty, say, 20–40 clock cycles of wasted effort, causing a serious degradation of program performance.

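An alternate strategy is through a conditional transfer of data. This approach computes both outcomes of a conditional operation (that is, the values of both branches are computed up front), and then selects one based on whether or not the condition holds. The advantage is that no work done after the jump ever has to be discarded; the cost is one extra computation. Conditional moves therefore apply only in limited situations: the compiler must weigh the wasted computation against the performance penalty of branch misprediction, and in practice it cannot judge this well. As a result, the compiler uses conditional moves only when both expressions are very cheap to compute, and sometimes chooses conditional control transfers even when the cost of misprediction would be higher. This strategy makes sense only in restricted cases, but it can then be implemented by a simple conditional move instruction that is better matched to the performance characteristics of modern processors.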

063 Conditinal assignment.png

064 Problem 3.25.png

065 The conditional move instructions.png

For IA32, the source and destination values can be 16 or 32 bits long. Single-byte conditional moves are not supported.

Unlike conditional jumps, the processor can execute conditional move instructions without having to predict the outcome of the test. The processor simply reads the source value (possibly from memory), checks the condition code, and then either updates the destination register or keeps it the same.
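
The two strategies can be contrasted at the C level. The sketch below uses hypothetical function names; the second function mirrors the structure of conditional-move code by computing both values first and then selecting one. Whether a compiler actually emits a conditional move for either version depends on the compiler and its options.

```c
#include <assert.h>
#include <limits.h>

/* Conditional control transfer: only one of the two differences is computed. */
int absdiff_branch(int x, int y)
{
    if (x < y)
        return y - x;
    else
        return x - y;
}

/* Conditional data transfer: compute both outcomes, then select one. */
int absdiff_select(int x, int y)
{
    /* Both differences are computed unconditionally,
       so both must be free of overflow. */
    long long d = (long long)x - (long long)y;
    assert(d >= -INT_MAX && d <= INT_MAX);

    int tval = y - x;        /* value if the test holds */
    int rval = x - y;        /* value if it does not */
    int test = x < y;
    if (test)
        rval = tval;         /* the step a conditional move would perform */
    return rval;
}
```

Both subtractions here are cheap and free of side effects, which is the situation in which computing both outcomes is reasonable.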

066 conditional data transfers.png

Not all conditional expressions can be compiled using conditional moves. If one of those two expressions could possibly generate an error condition or a side effect, this could lead to invalid behavior.

invalid example 1:

067 invalid example 1.png

invalid example 2:

068 invalid example 2.png

Using conditional moves also does not always improve code efficiency. For example, if either the then-expr or the else-expr evaluation requires a significant computation, then this effort is wasted when the corresponding condition does not hold. Compilers must take into account the relative performance of wasted computation versus the potential for performance penalty due to branch misprediction. In truth, they do not really have enough information to make this decision reliably.

Overall, then, we see that conditional data transfers offer an alternative strategy to conditional control transfers for implementing conditional operations. They can only be used in restricted cases, but these cases are fairly common and provide a much better match to the operation of modern processors.

069 Problem 3.26.png

070 Problem 3.27.png

3.6.7 Switch Statements

A switch statement provides a multiway branching capability based on the value of an integer index, and it allows an efficient implementation using a data structure called a jump table. A jump table is an array where entry i is the address of a code segment implementing the action the program should take when the switch index equals i. gcc selects the method of translating a switch statement based on the number of cases and the sparsity of the case values. Jump tables are used when there are a number of cases (e.g., four or more) and they span a small range of values.

071 switch statement.png

These locations are defined by labels in the code, and indicated in the entries in jt by code pointers, consisting of the labels prefixed by ‘&&.’ (Recall that the operator & creates a pointer for a data value. In making this extension, the authors of gcc created a new operator && to create a pointer for a code location.)

Why declare index as unsigned?

Answer: The compiler further simplifies the branching possibilities by treating index as an unsigned value, making use of the fact that negative numbers in a two’s-complement representation map to large positive numbers in an unsigned representation. It can therefore test whether index is outside of the range 0–6 by testing whether it is greater than 6.

The key step in executing a switch statement is to access a code location through the jump table. In our assembly-code version, this occurs on line 6, where the jmp instruction’s operand is prefixed with ‘*’, indicating an indirect jump, and the operand specifies a memory location indexed by register %eax, which holds the value of index. (We will see in Section 3.8 how array references are translated into machine code.)

Examining all of this code requires careful study, but the key point is to see that the use of a jump table allows a very efficient way to implement a multiway branch.
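
The jump-table idea and the unsigned range check can be approximated in standard C with an array of function pointers. This is only an analogue: it does not use gcc's && label extension, it does not model fall-through between cases, and the case values, handler functions, and the name switch_sketch are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_t)(int x);

/* Hypothetical actions for case values 100, 102, and 103. */
static int case_100(int x)     { return x * 13; }
static int case_102(int x)     { return x + 10; }
static int case_103(int x)     { return x + 11; }
static int case_default(int x) { (void)x; return 0; }

/* Entry i holds the code to run when n - 100 equals i.
   The missing case 101 maps to the default action. */
static const handler_t jump_table[] = {
    case_100, case_default, case_102, case_103
};

int switch_sketch(int x, int n)
{
    /* Treating the index as unsigned folds "n < 100" into the same test:
       a negative difference becomes a large positive number. */
    unsigned index = (unsigned)n - 100u;
    size_t entries = sizeof jump_table / sizeof jump_table[0];
    if (index >= entries)
        return case_default(x);

    handler_t target = jump_table[index];   /* one indexed lookup, no chain of tests */
    assert(target != NULL);
    return target(x);                        /* indirect transfer through the table */
}
```

However many cases the table holds, dispatch costs one range check and one table lookup.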

072 Problem 3.28.png

073 Problem 3.29.png

3.7 Procedures

3.7.1 Stack Frame Structure

The portion of the stack allocated for a single procedure call is called a stack frame.

074 Stack frame structure.png

The stack pointer can move while the procedure is executing, and hence most information is accessed relative to the frame pointer.

Suppose procedure P (the caller) calls procedure Q (the callee). The arguments to Q are contained within the stack frame for P. In addition, when P calls Q, the return address within P where the program should resume execution when it returns from Q is pushed onto the stack, forming the end of P’s stack frame. The stack frame for Q starts with the saved value of the frame pointer (a copy of register %ebp), followed by copies of any other saved register values.

3.7.2 Transferring Control

075 instructions supporting procedure calls and returns.png

The effect of a call instruction is to push a return address on the stack and jump to the start of the called procedure. The return address is the address of the instruction immediately following the call in the program, so that execution will resume at this location when the called procedure returns.

The ret instruction pops an address off the stack and jumps to this location. The proper use of this instruction is to have prepared the stack so that the stack pointer points to the place where the preceding call instruction stored its return address.

076 Illustration of call and ret functions.png

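The leave instruction can be used to prepare the stack for returning. It is equivalent to the following code sequence:

movl %ebp,%esp //Set stack pointer to beginning of frame
popl %ebp      //Restore saved %ebp and set stack pointer to end of caller's frame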

077 Problem 3.30.png

3.7.3 Register Usage Conventions

The set of program registers acts as a single resource shared by all of the procedures. Although only one procedure can be active at a given time, we must make sure that when one procedure (the caller) calls another (the callee), the callee does not overwrite some register value that the caller planned to use later. For this reason, IA32 adopts a uniform set of conventions for register usage that must be respected by all procedures, including those in program libraries.

By convention, registers %eax, %edx, and %ecx are classified as caller-save registers. When procedure Q is called by P, it can overwrite these registers without destroying any data required by P. On the other hand, registers %ebx, %esi, and %edi are classified as callee-save registers. This means that Q must save the values of any of these registers on the stack before overwriting them, and restore them before returning, because P (or some higher-level procedure) may need these values for its future computations. In addition, registers %ebp and %esp must be maintained according to the conventions described here.

021 IA32 integer registers.png

078 Problem 3.31.png

3.7.4 Procedure Example

079 Procedure Example.png

080 Problem 3.32.png

081 Problem 3.33.png

3.7.5 Recursive Procedures

082 Recursive Procedures.png

083 Problem 3.34.png

3.8 Array Allocation and Access

3.8.1 Basic Principles

For data type T and integer constant N, the declaration

T A[N];

has two effects. First, it allocates a contiguous region of L * N bytes in memory, where L is the size (in bytes) of data type T. Let us denote the starting location as x_A. Second, it introduces an identifier A that can be used as a pointer to the beginning of the array. The value of this pointer will be x_A.

084 Problem 3.35.png

3.8.2 Pointer Arithmetic

That is, if p is a pointer to data of type T, and the value of p is x_p, then the expression p+i has value x_p + L * i, where L is the size of data type T.

The array subscripting operation can be applied to both arrays and pointers. The array reference A[i] is identical to the expression *(A+i).
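
Both facts can be checked directly in C. The helper names below are hypothetical; the first shows that subscripting and pointer arithmetic reach the same element, and the second measures how many bytes the address advances when i is added to a pointer to int.

```c
#include <assert.h>
#include <stddef.h>

/* Read element i of an n-element array through pointer arithmetic. */
int element_via_pointer(const int *a, size_t n, size_t i)
{
    assert(i < n);
    return *(a + i);             /* identical to a[i] */
}

/* Byte distance between a and a + i: scaled by the element size L. */
size_t byte_distance(const int *a, size_t n, size_t i)
{
    assert(i <= n);              /* a + n (one past the end) is still a valid pointer */
    return (size_t)((const char *)(a + i) - (const char *)a);
}
```

For an int array, byte_distance(a, n, i) equals i * sizeof(int), matching the scaling by L in the rule for p+i.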

085 Pointer Arithmetic.png

085 Problem 3.36.png

3.8.3 Nested Arrays

The general principles of array allocation and referencing hold even when we create arrays of arrays. For example, the declaration

int A[5][3];

is equivalent to the declaration

typedef int row3_t[3];
row3_t A[5];

Data type row3_t is defined to be an array of three integers. Array A contains five such elements, each requiring 12 bytes to store the three integers. The total array size is then 4 * 5 * 3 = 60 bytes.

In general, for an array declared as

T D[R][C];

array element D[i][j] is at memory address

&D[i][j] = x_D + L(C * i + j),

where L is the size of data type T in bytes.
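
The address formula can be checked against what a C compiler actually does for a 5 x 3 array of int. The names ROWS, COLS, element_offset, and measured_offset are hypothetical; the first function evaluates L(C * i + j), and the second measures the real offset of D[i][j] from the start of the array.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 5, COLS = 3 };

/* Offset predicted by the formula: L * (C * i + j), with L = sizeof(int). */
size_t element_offset(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return sizeof(int) * (COLS * i + j);
}

/* Offset measured from the addresses the compiler computes. */
size_t measured_offset(int d[ROWS][COLS], size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return (size_t)((char *)&d[i][j] - (char *)&d[0][0]);
}
```

The two functions agree for every valid pair (i, j), because rows are laid out one after another in memory (row-major order).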

086 Problem 3.37.png

3.8.4 Fixed-Size Arrays

3.8.5 Variable-Size Arrays

3.9 Heterogeneous Data Structures

3.9.1 Structures

087 struct example.png

088 Problem 3.39.png

3.9.2 Unions

Unions provide a way to circumvent the type system of C, allowing a single object to be referenced according to multiple types.

Rather than having the different fields reference different blocks of memory, they all reference the same block. The overall size of a union equals the maximum size of any of its fields.
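
Because all fields of a union share one block of memory, writing one field and reading another reinterprets the same bytes. The sketch below uses hypothetical names and assumes that float is a 32-bit IEEE 754 single-precision value, which holds on IA32 and x86-64 but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>

/* Both fields occupy the same 4 bytes. */
typedef union {
    float    f;
    uint32_t u;
} float_bits_t;

/* Return the bit pattern of f, without any numeric conversion. */
uint32_t float_to_bits(float f)
{
    float_bits_t v;
    assert(sizeof v.f == sizeof v.u);   /* assumption: float is 32 bits */
    v.f = f;                            /* write through one field */
    return v.u;                         /* read the same bytes through the other */
}
```

A cast such as (uint32_t)1.0f converts the value and gives 1, whereas float_to_bits(1.0f) exposes the encoding 0x3F800000.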

089 union.png

Unions can be useful in several contexts. However, they can also lead to nasty bugs, since they bypass the safety provided by the C type system. One application is when we know in advance that the use of two different fields in a data structure will be mutually exclusive. Then, declaring these two fields as part of a union rather than a structure will reduce the total space allocated.

090 union example 1.png

091 union example 2.png

092 Problem 3.40.png

3.9.3 Data Alignment

Many computer systems place restrictions on the allowable addresses for the primitive data types, requiring that the address for some type of object must be a multiple of some value K (typically 2, 4, or 8). Such alignment restrictions simplify the design of the hardware forming the interface between the processor and the memory system.

The IA32 hardware will work correctly regardless of the alignment of data. However, Intel recommends that data be aligned to improve memory system performance. Linux follows an alignment policy where 2-byte data types (e.g., short) must have an address that is a multiple of 2, while any larger data types (e.g., int, int *, float, and double) must have an address that is a multiple of 4. Note that this requirement means that the least significant bit of the address of an object of type short must equal zero. Similarly, any object of type int, or any pointer, must be at an address having the low-order 2 bits equal to zero.

Alignment is enforced by making sure that every data type is organized and allocated in such a way that every object within the type satisfies its alignment restrictions.

Library routines that allocate memory, such as malloc, must be designed so that they return a pointer that satisfies the worst-case alignment restriction for the machine it is running on, typically 4 or 8. For code involving structures, the compiler may need to insert gaps in the field allocation to ensure that each structure element satisfies its alignment requirement. The structure then has some required alignment for its starting address.

.align 4
//This ensures that the data following it will start with an address that is a multiple of 4.
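
The gaps that the compiler inserts can be observed with offsetof. The struct mixed and the helper names below are hypothetical, and the sketch uses the C11 operator _Alignof to ask the compiler for each type's alignment instead of assuming a particular platform's values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A char followed by an int forces a gap so that i is aligned. */
struct mixed {
    char c;
    int  i;
    char d;
};

/* True if address p is a multiple of k, i.e., its low-order bits are zero. */
int is_aligned(const void *p, size_t k)
{
    assert(k != 0 && (k & (k - 1)) == 0);   /* alignments are powers of two */
    return ((uintptr_t)p & (k - 1)) == 0;
}

/* Number of padding bytes between field c and field i. */
size_t gap_before_i(void)
{
    return offsetof(struct mixed, i) - sizeof(char);
}
```

On a platform where int requires 4-byte alignment, gap_before_i() is 3, and sizeof(struct mixed) is rounded up to a multiple of 4 so that every element of an array of such structures stays aligned.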

093 Data Alignment.png

094 Problem 3.41.png

095 Problem 3.42.png

3.10 Putting It Together: Understanding Pointers

Here we highlight some key principles of pointers and their mapping into machine code.

Every pointer has an associated type. This type indicates what kind of object the pointer points to.
Every pointer has a value. This value is an address of some object of the designated type. The special NULL (0) value indicates that the pointer does not point anywhere.
Pointers are created with the & operator.
Pointers are dereferenced with the * operator. The result is a value having the type associated with the pointer.
Arrays and pointers are closely related.
Casting from one type of pointer to another changes its type but not its value.
Pointers can also point to functions. This provides a powerful capability for storing and passing references to code, which can be invoked in some other part of the program.
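
The last principle, pointers to functions, can be shown with a short C sketch. The functions square, negate, and apply are hypothetical examples; the point is that a pointer to code can be stored in a variable, passed as an argument, and invoked later.

```c
#include <assert.h>
#include <stddef.h>

int square(int x) { return x * x; }
int negate(int x) { return -x; }

/* Invoke whatever function fp points to. */
int apply(int (*fp)(int), int x)
{
    assert(fp != NULL);   /* NULL does not point to any function */
    return fp(x);         /* call through the pointer */
}
```

The value of such a pointer is the address of the first instruction of the function's machine code, so apply(square, 5) and apply(negate, 5) run different code through the same call site.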

3.11 Life in the Real World: Using the gdb Debugger

It is very helpful to first run objdump to get a disassembled version of the program.

027 Code Examples.png

028 Code Example 2.png

We start gdb with the following command line:

unix> gdb prog

096 GDB commands.png

Rather than using the command-line interface to gdb, many programmers prefer using ddd, an extension to gdb that provides a graphical user interface.

097 higher levels of optimization.png

3.12 Out-of-Bounds Memory References and Buffer Overflow

099 Problem 3.43.png

A more pernicious use of buffer overflow is to get a program to perform a function that it would otherwise be unwilling to do. This is one of the most common methods to attack the security of a system over a computer network. Typically, the program is fed with a string that contains the byte encoding of some executable code, called the exploit code, plus some extra bytes that overwrite the return address with a pointer to the exploit code. The effect of executing the ret instruction is then to jump to the exploit code.

In one form of attack, the exploit code then uses a system call to start up a shell program, providing the attacker with a range of operating system functions. In another form, the exploit code performs some otherwise unauthorized task, repairs the damage to the stack, and then executes ret a second time, causing an (apparently) normal return to the caller.

3.12.1 Thwarting Buffer Overflow Attacks

The techniques we have outlined (randomization, stack protection, and limiting which portions of memory can hold executable code) are three of the most common mechanisms used to minimize the vulnerability of programs to buffer overflow attacks. Unfortunately, there are still ways to attack computers [81, 94], and so worms and viruses continue to compromise the integrity of many machines.

3.13 x86-64: Extending IA32 to 64 Bits

A shift is underway to a 64-bit version of the Intel instruction set. Originally developed by Advanced Micro Devices (AMD) and named x86-64, it is now supported by most processors from AMD (who now call it AMD64) and by Intel, who refer to it as Intel64. Most people still refer to it as “x86-64,” and we follow this convention. (Some vendors have shortened this to simply “x64”.)

For example, procedure parameters are now passed via registers rather than on the stack, greatly reducing the number of memory read and write operations.

3.13.1 History and Motivation for x86-64

For applications that involve manipulating large data sets, such as scientific computing, databases, and data mining, the 32-bit word size makes life difficult for programmers. They must write code using out-of-core algorithms, where the data reside on disk and are explicitly read into memory for processing.

In this text, we use “IA32” to refer to the combination of hardware and gcc code found in traditional 32-bit versions of Linux running on Intel-based machines. We use “x86-64” to refer to the hardware and code combination running on the newer 64-bit machines from AMD and Intel. In the worlds of Linux and gcc, these two platforms are referred to as “i386” and “x86_64,” respectively.

3.13.2 An Overview of x86-64

The main features include:

Pointers and long integers are 64 bits long. Integer arithmetic operations support 8, 16, 32, and 64-bit data types.
The set of general-purpose registers is expanded from 8 to 16.
Much of the program state is held in registers rather than on the stack. Integer and pointer procedure arguments (up to 6) are passed via registers. Some procedures do not need to access the stack at all.
Conditional operations are implemented using conditional move instructions when possible, yielding better performance than traditional branching code.
Floating-point operations are implemented using the register-oriented instruction set introduced with SSE version 2, rather than the stack-based approach supported by IA32.

Data Types

100 x86_64 data types.png

Assembly-Code Example

101 x86_64 code example.png

3.13.3 Accessing Information

102 x86_64 registers.png

103 x86_64 data movement instructions.png

104 x86_64 special arithmetic operations.png

3.13.4 Control

105 x86_64 64-bits comparison and test instructions.png

Procedures

106 x86_64 Registers for passing function arguments.png

3.13.5 Data Structures

One difference is that x86-64 follows a more stringent set of alignment requirements. For any scalar data type requiring K bytes, its starting address must be a multiple of K. Thus, data types long and double as well as pointers, must be aligned on 8-byte boundaries. In addition, data type long double uses a 16-byte alignment (and size allocation), even though the actual representation requires only 10 bytes. These alignment conditions are imposed to improve memory system performance—the memory interface is designed in most processors to read or write aligned blocks that are 8 or 16 bytes long.

3.13.6 Concluding Observations about x86-64

The formulation of both the x86-64 hardware and the programming conventions changed the processor from one that relied heavily on the stack to hold program state to one where the most heavily used part of the state is held in the much faster and expanded register set.

The biggest drawback in transforming applications from 32 bits to 64 bits is that the pointer variables double in size, and since many data structures contain pointers, this means that the overall memory requirement can nearly double.

3.14 Machine-Level Representations of Floating-Point Programs

We call this combination of storage model, instructions, and conventions the floating-point architecture for a machine.

method of storing floating-point data
additional instructions to operate on floating-point values
instructions to convert between floating-point and integer values
instructions to perform comparisons between floating-point values
conventions on how to pass floating-point values as function arguments and to return them as function results

107 SSE floating-point architecture.png

3.15 Summary

By contrast, Java is implemented in an entirely different fashion. The object code of Java is a special binary representation known as Java byte code. This code can be viewed as a machine-level program for a virtual machine. As its name suggests, this machine is not implemented directly in hardware. Instead, software interpreters process the byte code, simulating the behavior of the virtual machine.