Chapter 6: Intermediate-Code Generation
In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target code.
This chapter deals with intermediate representations, static type checking, and intermediate-code generation.
6.1 Variants of Syntax Trees
Nodes in a syntax tree represent constructs in the source program; the children of a node represent the meaningful components of a construct.
A directed acyclic graph (hereafter called a DAG) for an expression identifies the common subexpressions (subexpressions that occur more than once) of the expression.
A DAG can be constructed using the same techniques that construct syntax trees.
6.1.1 Directed Acyclic Graphs for Expressions
The difference between a DAG and a syntax tree is that a node N in a DAG has more than one parent if N represents a common subexpression; in a syntax tree, the tree for the common subexpression would be replicated as many times as the subexpression appears in the original expression.
The functions that create syntax-tree nodes will construct a DAG instead if, before creating a new node, they first check whether an identical node already exists. If a previously created identical node exists, the existing node is returned.
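The check-before-create rule can be sketched in a few lines of Python. This is only an illustration: the class `DagBuilder` and its methods `leaf` and `node` are hypothetical names, and nodes are modeled as plain tuples. The expression is a + a * (b - c) + (b - c) * d, in which a and b - c each occur twice.

```python
class DagBuilder:
    """Hypothetical helper that reuses identical nodes instead of copying them."""

    def __init__(self):
        self.nodes = {}  # node signature -> the one node built for it

    def _intern(self, signature):
        # setdefault returns the stored node when an equal one already exists;
        # otherwise it stores the new node and returns it.
        return self.nodes.setdefault(signature, signature)

    def leaf(self, name):
        return self._intern(("id", name))

    def node(self, op, left, right):
        return self._intern((op, left, right))


# Build a + a * (b - c) + (b - c) * d
builder = DagBuilder()
a, b, c, d = (builder.leaf(name) for name in "abcd")
first_diff = builder.node("-", b, c)
second_diff = builder.node("-", b, c)   # the existing node is returned
root = builder.node(
    "+",
    builder.node("+", a, builder.node("*", a, first_diff)),
    builder.node("*", second_diff, d),
)
```

Here `first_diff is second_diff` holds, so the node for b - c has two parents, and the builder ends up holding nine distinct nodes rather than the thirteen a syntax tree would need.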
6.1.2 The Value-Number Method for Constructing DAGs
Often, the nodes of a syntax tree or DAG are stored in an array of records.
Each row of the array represents one record, and therefore one node.
In each record, the first field is an operation code, indicating the label of the node.
Leaves have one additional field, which holds the lexical value (either a symbol-table pointer or a constant), and interior nodes have two additional fields indicating the left and right children.
In this array, we refer to nodes by giving the integer index of the record for that node within the array. This integer historically has been called the value number for the node or for the expression represented by the node.
Searching the entire array every time we are asked to locate one node is expensive, especially if the array holds expressions from an entire program. A more efficient approach is to use a hash table, in which the nodes are put into "buckets", each of which typically will have only a few nodes.
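A minimal sketch of the array-plus-hash-table idea follows, assuming Python's built-in `dict` stands in for the hash table. The class name `ValueNumberTable` and the method `lookup_or_add` are made up for illustration, and value numbers here start at 0 because they are list indices.

```python
class ValueNumberTable:
    """Array of records plus a hash index (illustrative names only)."""

    def __init__(self):
        self.records = []  # records[i] is the node whose value number is i
        self.index = {}    # hash table: record -> value number

    def lookup_or_add(self, op, left, right=None):
        record = (op, left, right)
        if record in self.index:          # expected constant-time lookup
            return self.index[record]
        self.records.append(record)
        self.index[record] = len(self.records) - 1
        return self.index[record]


# Nodes for i = i + 10
table = ValueNumberTable()
vn_i = table.lookup_or_add("id", "i")        # leaf: one extra field
vn_10 = table.lookup_or_add("num", 10)       # leaf: one extra field
vn_sum = table.lookup_or_add("+", vn_i, vn_10)
vn_again = table.lookup_or_add("+", vn_i, vn_10)
vn_assign = table.lookup_or_add("=", vn_i, vn_sum)
```

The second request for `+` with the same children returns the value number already assigned, so no duplicate record is appended.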
6.2 Three-Address Code
In three-address code, there is at most one operator on the right side of an instruction.
This unraveling of multi-operator arithmetic expressions and of nested flow-of-control statements makes three-address code desirable for target-code generation and optimization.
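To make the unraveling concrete, here is a small sketch that flattens a nested expression into instructions with one operator each. The function `gen_tac`, the tuple encoding of expressions, and the temporary names `t1`, `t2`, ... are assumptions chosen for this example.

```python
import itertools


def gen_tac(expr, code, temps):
    """Append instructions for expr to code; return the address of its value."""
    if not isinstance(expr, tuple):   # a name or constant is already an address
        return str(expr)
    op, left, right = expr
    left_addr = gen_tac(left, code, temps)
    right_addr = gen_tac(right, code, temps)
    temp = f"t{next(temps)}"          # fresh compiler-generated temporary
    code.append(f"{temp} = {left_addr} {op} {right_addr}")
    return temp


# x + y * z, written as a nested tuple (operator, left, right)
code = []
result = gen_tac(("+", "x", ("*", "y", "z")), code, itertools.count(1))
# code is now ["t1 = y * z", "t2 = x + t1"] and result is "t2"
```

Each generated instruction has exactly one operator on its right side, and the temporaries carry intermediate values between instructions.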
6.2.1 Addresses and Instructions
Three-address code can be implemented using records with fields for the addresses.
An address can be one of the following: a name, a constant, or a compiler-generated temporary.
Here is a list of the nine common three-address instruction forms:
1. Assignment instructions of the form x = y op z
2. Assignments of the form x = op y
3. Copy instructions of the form x = y
4. An unconditional jump goto L
5. Conditional jumps of the form if x goto L and ifFalse x goto L
6. Conditional jumps such as if x relop y goto L
7. Procedure calls and returns
8. Indexed copy instructions of the form x = y [i] and x [i] = y
9. Address and pointer assignments of the form x = &y, x = *y, and *x = y
6.2.2 Quadruples
In a compiler, these three-address instructions can be implemented as objects or as records with fields for the operator and the operands. Three such representations are called "quadruples", "triples", and "indirect triples".
A quadruple (or just "quad") has four fields, which we call op, arg1, arg2, and result. The op field contains an internal code for the operator.
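As an illustration, the assignment a = b * - c + b * - c could be stored as the following quadruples. The `Quad` record and the `render` helper are hypothetical, and using the string "minus" as the internal code for unary minus is just a convention for this sketch.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: str


quads = [
    Quad("minus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]


def render(q):
    """Print a quadruple back in three-address notation."""
    if q.op == "=":                       # copy instruction: result = arg1
        return f"{q.result} = {q.arg1}"
    if q.arg2 is None:                    # unary operator
        return f"{q.result} = {q.op} {q.arg1}"
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"


listing = [render(q) for q in quads]
```

Unary operators and copies leave arg2 unused, which is why that field is optional in this sketch.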
6.2.3 Triples
A triple has only three fields, which we call op, arg1, and arg2.
Using triples, we refer to the result of an operation x op y by its position, rather than by an explicit temporary name.
Indirect triples consist of a listing of pointers to triples, rather than a listing of triples themselves.
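The sketch below shows triples for a = b * - c + b * - c together with a separate instruction list, which is the idea behind indirect triples. Two conventions are assumed purely for this example: an integer argument is a reference to the triple at that position, and a string argument is a name. The small evaluator `run` is hypothetical and exists only to show that reordering the instruction list leaves the triples themselves untouched.

```python
triples = [
    ("minus", "c", None),   # (0)
    ("*", "b", 0),          # (1) refers to the result of (0)
    ("minus", "c", None),   # (2)
    ("*", "b", 2),          # (3) refers to the result of (2)
    ("+", 1, 3),            # (4)
    ("=", "a", 4),          # (5) a = result of (4)
]


def run(triples, order, env):
    """Execute triples in the given order; env maps names to numbers."""
    values = {}  # triple position -> computed value

    def value_of(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for pos in order:
        op, arg1, arg2 = triples[pos]
        if op == "minus":
            values[pos] = -value_of(arg1)
        elif op == "*":
            values[pos] = value_of(arg1) * value_of(arg2)
        elif op == "+":
            values[pos] = value_of(arg1) + value_of(arg2)
        elif op == "=":
            env[arg1] = value_of(arg2)
    return env


original_order = [0, 1, 2, 3, 4, 5]    # instruction list pointing at triples
reordered = [2, 3, 0, 1, 4, 5]         # only the list changes, not the triples

first = run(triples, original_order, {"b": 3, "c": 2})
second = run(triples, reordered, {"b": 3, "c": 2})
```

With plain triples, moving an instruction changes its position and therefore every reference to it. With an instruction list, an optimizer can reorder the list while the positional references inside the triples stay valid.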
6.2.4 Static Single-Assignment Form
Two distinctive aspects distinguish SSA from three-address code. The first is that all assignments in SSA are to variables with distinct names; hence the term static single-assignment.
The second is that, when the same variable x is defined on two different control-flow paths, SSA uses a notational convention called the φ-function to combine the two definitions of x.
The φ-function returns the value of its argument that corresponds to the control-flow path that was taken to get to the assignment statement containing the φ-function.
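A φ-function is a notational device rather than an ordinary run-time function, but its meaning can be simulated. The sketch below models the SSA form of `if (flag) x = -1; else x = 1; y = x * a;`. The names `phi`, `run_ssa`, and `came_from` are invented for the illustration, and recording the taken path in a variable is an assumption of this simulation, not something SSA itself prescribes.

```python
def phi(came_from, operands, env):
    """Pick the operand that belongs to the control-flow path actually taken."""
    return env[operands[came_from]]


def run_ssa(flag, a):
    env = {}
    if flag:
        env["x1"] = -1          # x1 = -1
        came_from = "then"
    else:
        env["x2"] = 1           # x2 = 1
        came_from = "else"
    # x3 = phi(x1, x2)
    env["x3"] = phi(came_from, {"then": "x1", "else": "x2"}, env)
    env["y1"] = env["x3"] * a   # y1 = x3 * a
    return env


taken_then = run_ssa(True, 5)
taken_else = run_ssa(False, 5)
```

Every name is assigned exactly once, and the φ-function is the only place where the two definitions of x meet.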
6.3 Types and Declarations
The applications of types can be grouped under checking and translation:
Type checking. Type checking uses logical rules to reason about the behavior of a program at run time. Specifically, it ensures that the types of the operands match the type expected by an operator.
Translation Applications. From the type of a name, a compiler can determine the storage that will be needed for that name at run time. Type information is also needed to calculate the address denoted by an array reference, to insert explicit type conversions, and to choose the right version of an arithmetic operator, among other things.
The actual storage for a procedure call or an object is allocated at run time, when the procedure is called or the object is created.
As we examine local declarations at compile time, we can, however, lay out relative addresses, where the relative address of a name or a component of a data structure is an offset from the start of a data area.
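Laying out relative addresses amounts to keeping a running offset while scanning the declarations. The sketch below assumes widths of 4 bytes for int and 8 bytes for float and ignores alignment; the function names `width` and `layout` and the tuple encoding of array types are illustrative only.

```python
BASIC_WIDTHS = {"int": 4, "float": 8}   # assumed widths in bytes


def width(type_expr):
    """Storage needed for a type; arrays are ("array", count, element_type)."""
    if isinstance(type_expr, str):
        return BASIC_WIDTHS[type_expr]
    _, count, element = type_expr
    return count * width(element)


def layout(declarations):
    """Assign each name an offset from the start of the data area."""
    offset = 0
    offsets = {}
    for name, type_expr in declarations:
        offsets[name] = offset
        offset += width(type_expr)
    return offsets, offset


offsets, total = layout([
    ("i", "int"),
    ("x", "float"),
    ("a", ("array", 3, "int")),
    ("j", "int"),
])
# offsets: i at 0, x at 4, a at 12, j at 24; total size 28
```

The offsets are fixed at compile time, while the base address of the data area is known only when the storage is allocated at run time.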
6.3.1 Type Expressions
A type expression is either a basic type or is formed by applying an operator called a type constructor to a type expression.
We shall use the following definition of type expressions:
A basic type is a type expression.
A type name is a type expression.
A type expression can be formed by applying the array type constructor to a number and a type expression.
A record is a data structure with named fields. A type expression can be formed by applying the record type constructor to the field names and their types.
A type expression can be formed by using the type constructor -> for function types.
If s and t are type expressions, then their Cartesian product s × t is a type expression.
Type expressions may contain variables whose values are type expressions.
A convenient way to represent a type expression is to use a graph. The value-number method can be adapted to construct a DAG for a type expression, with interior nodes for type constructors and leaves for basic types, type names, and type variables.
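One possible way to hold type expressions as a DAG is sketched below: type constructors become interior nodes and identical subexpressions are shared. The class `TypeTable`, its methods, and the `render` helper are hypothetical names; the product constructor is written with the letter x in the rendered text.

```python
class TypeTable:
    """Builds shared nodes for type expressions (illustrative only)."""

    def __init__(self):
        self.nodes = {}  # node signature -> the single node built for it

    def _intern(self, signature):
        return self.nodes.setdefault(signature, signature)

    def basic(self, name):
        return self._intern(("basic", name))

    def array(self, count, element):
        return self._intern(("array", count, element))

    def product(self, s, t):
        return self._intern(("x", s, t))

    def function(self, s, t):
        return self._intern(("->", s, t))


def render(t):
    """Write a type expression node back as text."""
    kind = t[0]
    if kind == "basic":
        return t[1]
    if kind == "array":
        return f"array({t[1]}, {render(t[2])})"
    return f"{render(t[1])} {kind} {render(t[2])}"   # "x" or "->"


types = TypeTable()
integer = types.basic("integer")
matrix = types.array(2, types.array(3, integer))             # int[2][3]
binary_op = types.function(types.product(integer, integer), integer)
```

In this sketch the leaf for integer is built once and reused by every constructor that mentions it, so the table holds five nodes in total.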