《Compilers : principles, techniques, and tools》读书笔记(3)

Chapter 6: Intermediate-Code Generation

In  the  analysis-synthesis  model of a compiler,  the  front  end  analyzes  a  source program and  creates  an  intermediate  representation,  from which the  back end generates  target code.在编译器的分析-综合模型中,前端分析源程序并生成一种中间表示,后端根据这种中间表示生成目标代码。

This chapter deals with  intermediate  representations,  static  type  checking, and  intermediate  code  generation.主要讲解中间表示、静态检查和中间代码生成

6.1 Variants of Syntax Trees

Nodes in a syntax tree represent constructs in  the  source  program; the children of a  node  represent  the  meaningful  components  of a  construct.语法树中的节点代表源程序中的结构,节点的孩子代表这个结构的有意义的组件。

A  directed acyclic graph  ( hereafter called a DAG) for  an expression  identifies the  common subexpressions  ( sub  expressions  that  occur more than once )  of the  expression.表达式的简单有向图DAG

DAG  can  be  constructed  by  using  the  same techniques  that construct syntax trees.可以像构建语法树一栏来构建DAG

6.1.1 Directed Acyclic Graphs for Expressions表达式的简单有向图

The difference between DAG and syntax tree is that a node N in a DAG has  more than one parent if N represents a com­mon subexpression; in a syntax  tree, the tree for the common sub expression would be replicated as many times as the sub expression appears in the original expression.DAG中的节点可以有多个父节点

It  will  construct  a  DAG  if, before creating  a  new  node, these functions first check  whether an identical node already  exists.  If a  previously created identical  node  exists,  the  existing  node is  returned.创建DAG节点时,先检查同样的节点是否已经存在。

6.1.2 The Value-Number Method for Constructing DAG

Often, the nodes of a  syntax tree or DAG are  stored in an array  of records.编程实现语法树或者DAG时,使用数组。

Each  row  of  the array represents  one  record, and  therefore one node. 数组中的一个元素代表一个节点。

In each record, the first field is an operation code ,  indicating  the label of  the  node. 数组中元素的第一个域是操作码,对应节点的标签。

leaves  have one  additional  field ,  which  holds  the lexical  value  (either  a  symbol-table  pointer  or  a  constant,  in  this  case) ,  and interior nodes have two  additional fields indicating  the  left  and right  children.叶子节点有一个额外的域,装着词法值(符号表中的一项或者一个常量);内部节点有2个额外的域,对应左右子节点。

In this array,  we  refer  to  nodes  by  giving  the  integer  index  of  the  record for  that  node  within  the  array.  This  integer historically  has  been  called  the value  number  for the  node  or for the  expression  represented  by  the node.节点或者节点对应的表达式有一个value number。

searching  the  entire  array every  time  we are asked to locate one  node  is expensive,  especially if  the array holds  expressions from an  entire program.  A more efficient  approach is to use  a hash  table,  in  which the nodes  are put  into  "buckets"  each of which  typically will  have only  a few  nodes.使用哈希表提升效率。

6.2 Three-Address Code

there  is  at  most  one  operator  on the  right  side  of an instruction.指令的右边最多有一个操作符。

This unraveling of multi-operator  arithmetic  expressions  and  of  nested flow-of-control  statements makes three-address code  desirable  for target-code  generation  and optimization.便于目标代码生成和优化

6.2.1 Addresses and Instructions

three-address code  can  be  implemented  using  records  with  fields for  the  addresses.可以通过带有地址域的记录实现

An address can be one of  the following: name, constant, compiler-generated  temporary.地址可以是标识符、常量或者编译器生成的临时变量。

Here is  a list  of  the  common  three-address instruction forms常见的9种三地址指令:

1.  Assignment  instructions  of  the  form  x =  y  op  Z

2.  Assignments of  the form  x =  op y

3.  Copy instructions of  the form  x =  y

4.  An  unconditional  jump got o  L

5.  Conditional  jumps  of  the  form  if  x goto L and  if  Fal se  x goto L

6.  Conditional jumps such  as if  x  relop  y  goto  L

7.  Procedure calls and returns

8.  Indexed copy instructions of  the form  x = y [i]  and x [i]  =  y

9.  Address and pointer assignments of the form x = &y, x = *y,  and *x  =  y

6.2. 2  Quadruples四元组

In a compiler, these three-address instructions can be implemented as objects or as records with fields for the operator and the operands. Three such representations are called "quadruples", "triples", and "indirect triples".三地址指令可以用对象或者带有操作符和操作数域的记录实现。

A  quadruple  (or just  "quaff")  has  four  fields ,  which we call  op,  arg1, arg2, and result.  The op  field contains an internal code for the operator.

6.2.3  Triples

A  triple  has  only  three  fields,  which  we  call  op,  arg1,  and  arg2.只有3个域

Using triples,  we  refer  to  the  result  of  an  operation  x op y  by  its  position,  rather than  by  an  explicit  temporary name.用指令的位置来标识指令的结果

Indirect triples consist of a  listing  of  pointers to  triples, rather  than a  listing of  triples themselves.

6.2.4  Static  Single-Assignment  Form

Two  distinctive  aspects  distinguish  SSA from  three-address  code.  The first is  that  all  assignments  in  SSA are to vari­ables  with distinct  names;  hence  the  term  static  single-assigment.区别之一

SSA  uses  a notational convention called the ¢-function  to combine the two definitions of x.区别之二

the ¢-function returns the value of  its  argument that corresponds to the  control-flow path that  was taken to get  to the assignment­ statement  containing the ¢-f unction.

6.3 Types and Declarations

The applications of types can be grouped under checking and translation:

Type checking uses logical rules to reason about the behavior of a program at run time. Specifically, it ensures that the types of the operands match the type expected by an operator.类型检查使用逻辑规则来推理程序运行时的行为。特别地,它确保操作数的类型与操作符期望的类型匹配。

Translation Applications. From the type of a name, a compiler can determine the storage that will be needed for that name at run time. Type information is also needed to calculate the address denoted by an array reference, to insert explicit type conversions, and to choose the right version of an arithmetic operator, among other things.根据标识符的类型,编译器可以确定那个标识符在运行时所需要的存储空间。计算数组引用的地址时,插入显式的类型转换时,以及为算术操作符选择正确的类型时,也需要标识符的类型信息。

The actual storage for a procedure call or an object is allocated at run time, when the procedure is called or the object is created.真正的内存分配发生在运行时的函数被调用以及对象被创建。

As we examine local declarations at compile time, we can, however, lay out relative addresses, where the relative address of a name or a component of a data structure is an offset from the start of a data area.因为在编译期我们会检测局部声明,我们可以使用相对地址来布局内存。在相对布局下,一个标识符或者结构体的一个组件的地址是相对于数据区开始处的一个偏移量。

6.3.1 Type Expressions

a type expression is either a basic type or is formed by applying an operator called a type constructor to a type expression一个类型表达式是一个基本类型或者由一个叫做类型构造器的操作符应用到一个类型表达式上组成。

the following  definition of  type expressions:

A basic type  is  a type  expression.

A type  name is a type expression.

A type  expression can  be formed  by applying the  array  type constructor to  a number and a type expression.

A  record  is  a  data  structure  with  named  fields.

A type  expression can  be  formed  by  using  the  type  constructor  -> for  func­tion  types .

If s and t are type expressions, then their Cartesian product s x t is  a type  expression.

Type expressions may contain variables whose values are type expressions.

A convenient way to represent a type expression is to use a graph. The value-number method can be adapted to construct a dag for a type expression, with interior nodes for type constructors and leaves for basic types, type names, and type variables.可以使用值数法来构建一个类型表达式的DAG,内部节点表示类型构造器,叶子节点表示基本类型、类型名、和类型变量。

6.4 Translation of Expressions

6.5 Type Checking

6.6 Control Flow

6.7 Backpatching

6.8 Switch-Statements

6.9 Intermediate Code for Procedures

你可能感兴趣的:(《Compilers : principles, techniques, and tools》读书笔记(3))