Reading Notes on "Compilers: Principles, Techniques, and Tools" (4)

Chapter 7: Run-Time Environments

The compiler creates and manages a run-time environment in which it assumes its target programs are being executed.

The run-time environment deals with a variety of issues, such as the layout and allocation of storage locations for the objects named in the source program, the mechanisms used by the target program to access variables, the linkages between procedures, the mechanisms for passing parameters, and the interfaces to the operating system, input/output devices, and other programs.

The two themes of this chapter are the allocation of storage locations and access to variables and data. Memory management is discussed in some detail, including stack allocation, heap management, and garbage collection.

7.1 Storage Organization

The run-time representation of an object program in the logical address space consists of data and program (code) areas.

On many machines, instructions to add integers may expect integers to be aligned, that is, placed at an address divisible by 4. Data layout must therefore respect such alignment constraints, and the space left unused between objects for this reason is called padding.
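A minimal Python sketch of how a compiler might assign field offsets under such alignment rules. It assumes, as a simplification, that each field's alignment equals its size and that sizes are powers of two; the helper names align_up and layout and the sample fields are hypothetical.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (assumed a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout(fields: list[tuple[str, int]]) -> tuple[dict[str, int], int]:
    """Assign offsets to (name, size) fields, assuming alignment == size."""
    offsets: dict[str, int] = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        offset = align_up(offset, size)   # skip padding bytes if needed
        offsets[name] = offset
        offset += size
        max_align = max(max_align, size)
    # Pad the total so consecutive objects (e.g. array elements) stay aligned.
    return offsets, align_up(offset, max_align)


# A 1-byte char followed by a 4-byte int: the int lands at offset 4, not 1.
offsets, total = layout([("c", 1), ("i", 4), ("d", 2)])
print(offsets, total)  # {'c': 0, 'i': 4, 'd': 8} 12
```

Bytes 1 to 3 and 10 to 11 are padding; reordering fields by decreasing size is one way a layout could reduce it.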

The size of the generated target code is fixed at compile time, so the compiler can place the executable target code in a statically determined area, Code, usually in the low end of memory. Similarly, the size of some program data objects, such as global constants, and of data generated by the compiler, such as information to support garbage collection, may be known at compile time; these data objects can be placed in another statically determined area called Static.

To maximize the utilization of space at run time, the other two areas, Stack and Heap, are placed at opposite ends of the remainder of the address space. Their sizes can change as the program executes, and they grow toward each other as needed.

The stack is used to store data structures called activation records, which are generated during procedure calls.

In practice, the stack grows toward lower addresses and the heap toward higher addresses.

An activation record is used to store information about the status of the machine, such as the value of the program counter and machine registers, when a procedure call occurs.

When control returns from the call, the activation of the calling procedure can be restarted after restoring the values of the relevant registers and setting the program counter to the point immediately after the call.

Data objects whose lifetimes are contained in that of an activation (i.e., local data) can be allocated on the stack along with other information associated with the activation.

Many programming languages allow the programmer to allocate and deallocate data under program control. For example, C has the functions malloc and free, which can be used to obtain and give back arbitrary chunks of storage. The heap is used to manage this kind of long-lived data.

7.1.1 Static Versus Dynamic Storage Allocation

We say that a storage-allocation decision is static if it can be made by the compiler looking only at the text of the program, not at what the program does when it executes. Conversely, a decision is dynamic if it can be decided only while the program is running.

Many compilers use some combination of the following two strategies for dynamic storage allocation:

Stack storage: Names local to a procedure are allocated space on a stack.

Heap storage: Data that may outlive the call to the procedure that created it is usually allocated on a "heap" of reusable storage.

To support heap management, "garbage collection" enables the run-time system to detect useless data elements and reuse their storage, even if the programmer does not return their space explicitly.

7.2 Stack Allocation of Space

Almost all compilers for languages that use procedures, functions, or methods as units of user-defined actions manage at least part of their run-time memory as a stack.

Each time a procedure is called, space for its local variables is pushed onto a stack, and when the procedure terminates, that space is popped off the stack.

7.2.1 Activation Trees

Stack allocation would not be feasible if procedure calls, or activations of procedures, did not nest in time: if an activation of p calls q, that activation of q must end before the activation of p ends.

We therefore can represent the activations of procedures during the running of an entire program by a tree, called an activation tree.

Each node corresponds to one activation, and the root is the activation of the "main" procedure that initiates execution of the program.

At a node for an activation of procedure p, the children correspond to activations of the procedures called by this activation of p, shown from left to right in the order in which they are called.

Notice that one child must finish before the activation to its right can begin.

The use of a run-time stack is enabled by several useful relationships between the activation tree and the behavior of the program:

1. The sequence of procedure calls corresponds to a preorder traversal of the activation tree.

2. The sequence of returns corresponds to a postorder traversal of the activation tree.

3. Suppose control lies within a particular activation of some procedure, corresponding to a node N of the activation tree. Then the activations that are currently open (live) are those corresponding to N and its ancestors. The order in which these activations were called is the order in which they appear along the path to N, starting at the root, and they will return in the reverse of that order.
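These three relationships can be checked with a small Python simulation. The tree below (main calling A and B, B calling C and D) is hypothetical, and Python's own recursion stands in for the control stack; the live list mirrors the activations that are open at each moment.

```python
from dataclasses import dataclass, field


@dataclass
class Activation:
    name: str
    children: list["Activation"] = field(default_factory=list)  # callees, left to right


def run(node: Activation, events: list[str], live: list[str],
        snapshots: dict[str, list[str]]) -> None:
    """Simulate one activation: enter, run each child to completion in order, return."""
    events.append("call " + node.name)       # calls happen in preorder
    live.append(node.name)
    snapshots[node.name] = list(live)        # live activations = path from the root
    for child in node.children:
        run(child, events, live, snapshots)  # a child finishes before its right sibling starts
    live.pop()
    events.append("return " + node.name)     # returns happen in postorder


tree = Activation("main", [Activation("A"),
                           Activation("B", [Activation("C"), Activation("D")])])
events: list[str] = []
snapshots: dict[str, list[str]] = {}
run(tree, events, [], snapshots)
calls = [e.split()[1] for e in events if e.startswith("call")]
returns = [e.split()[1] for e in events if e.startswith("return")]
print(calls)           # ['main', 'A', 'B', 'C', 'D']
print(returns)         # ['A', 'C', 'D', 'B', 'main']
print(snapshots["D"])  # ['main', 'B', 'D']
```

While D runs, only main, B, and D are live, and they will return in the order D, B, main.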

7.2.2 Activation Records

Procedure calls and returns are usually managed by a run-time stack called the control stack.

Each live activation has an activation record (sometimes called a frame) on the control stack, with the root of the activation tree at the bottom. The entire sequence of activation records on the stack corresponds to the path in the activation tree to the activation where control currently resides.

That latter activation, the one where control currently resides, has its record at the top of the stack.

We shall conventionally draw control stacks with the bottom of the stack higher than the top, so the elements in an activation record that appear lowest on the page are actually closest to the top of the stack.

Here is a list of the kinds of data that might appear in an activation record:

1. Temporary values, such as those arising from the evaluation of expressions, in cases where they cannot be held in registers.

2. Local data belonging to the procedure whose activation record this is.

3. A saved machine status: This information typically includes the return address and the contents of registers that were used by the calling procedure and that must be restored when the return occurs.

4. An "access link", used to locate data needed by the called procedure but found elsewhere, e.g., in another activation record.

5. A control link, pointing to the activation record of the caller.

6. Space for the return value of the called function, if any. Not all called procedures return a value, and if one does, we may prefer to place that value in a register for efficiency.

7. The actual parameters used by the calling procedure. Commonly, these values are not placed in the activation record but rather in registers, when possible, for greater efficiency.

In these diagrams, the top of the stack is at the bottom of the picture.

When a procedure is recursive, it is normal to have several of its activation records on the stack at the same time.
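To make recursion concrete, here is a Python sketch that mirrors a recursive factorial onto an explicit control stack of simplified activation records. The ActivationRecord fields are a hypothetical subset of the list above; Python's real call stack still does the actual work, so this only models the bookkeeping.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActivationRecord:
    """A simplified frame holding a few of the fields listed above."""
    procedure: str
    params: dict[str, int]
    control_link: Optional["ActivationRecord"]        # the caller's record
    local_data: dict[str, int] = field(default_factory=dict)
    return_value: Optional[int] = None


control_stack: list[ActivationRecord] = []
max_fact_frames = 0


def fact(n: int) -> int:
    global max_fact_frames
    caller = control_stack[-1] if control_stack else None
    control_stack.append(ActivationRecord("fact", {"n": n}, caller))  # push on call
    max_fact_frames = max(max_fact_frames,
                          sum(r.procedure == "fact" for r in control_stack))
    record = control_stack[-1]
    record.return_value = 1 if n <= 1 else n * fact(n - 1)
    control_stack.pop()                                               # pop on return
    return record.return_value


print(fact(4), max_fact_frames, len(control_stack))  # 24 4 0
```

At the deepest point, four records for fact (n = 4, 3, 2, 1) sit on the stack at once, each with its own copy of n.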

7.2.3 Calling Sequences

Procedure calls are implemented by what are known as calling sequences, which consist of code that allocates an activation record on the stack and enters information into its fields.

The code in a calling sequence is often divided between the calling procedure (the "caller") and the procedure it calls (the "callee"). When designing calling sequences and the layout of activation records, the following principles are helpful:

1. Values communicated between caller and callee are generally placed at the beginning of the callee's activation record, so they are as close as possible to the caller's activation record.

2. Fixed-length items, typically the control link, the access link, and the machine-status fields, are generally placed in the middle.

3. Items whose size may not be known early enough are placed at the end of the activation record.

4. We must locate the top-of-stack pointer judiciously.

A register top_sp points to the end of the machine-status field in the current top activation record. This position within the callee's activation record is known to the caller, so the caller can be made responsible for setting top_sp before control is passed to the callee.

The calling sequence and its division between caller and callee are as follows:

1. The caller evaluates the actual parameters.

2. The caller stores a return address and the old value of top_sp into the callee's activation record. The caller then increments top_sp to the position shown in Fig. 7.7. That is, top_sp is moved past the caller's local data and temporaries and the callee's parameters and status fields.

3. The callee saves the register values and other status information.

4. The callee initializes its local data and begins execution.

A suitable, corresponding return sequence is:

1. The callee places the return value next to the parameters.

2. Using information in the machine-status field, the callee restores top_sp and other registers, and then branches to the return address that the caller placed in the status field.

3. Although top_sp has been decremented, the caller knows where the return value is, relative to the current value of top_sp; the caller therefore may use that value.
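The calling and return sequences above can be traced on a toy word-addressed memory in Python. Everything concrete here is an assumption for illustration: the 64-word memory, the starting top_sp of 4, the 2 words of caller locals, the simplified record layout (parameters, return-value slot, saved top_sp, return address, then locals), and the callee itself, which sums the squares of its arguments.

```python
memory: list[int] = [0] * 64   # hypothetical word-addressed stack region
top_sp = 4                     # assumed: caller's machine-status field ends at word 4
CALLER_LOCALS = 2              # assumed: caller's local data + temporaries take 2 words


def call_sum_squares(args: list[int], return_address: int) -> tuple[int, int]:
    """Run the calling and return sequences for a callee that sums squares of its args.

    Callee record layout (word offsets from its start `base`):
      params[0..k-1], return-value slot, saved old top_sp, return address | locals...
    """
    global top_sp
    k = len(args)
    base = top_sp + CALLER_LOCALS            # callee's record begins after caller's locals

    # Caller, step 1: evaluate the actual parameters into the callee's record.
    for i, a in enumerate(args):
        memory[base + i] = a
    # Caller, step 2: save old top_sp and the return address, then move top_sp
    # past the callee's parameters, return-value slot, and status fields.
    memory[base + k + 1] = top_sp
    memory[base + k + 2] = return_address
    top_sp = base + k + 3

    # Callee, steps 3-4: nothing to save in this toy; initialize a local and run.
    # The callee finds its parameters at fixed offsets below top_sp.
    memory[top_sp] = 0
    for i in range(k):
        memory[top_sp] += memory[top_sp - 3 - k + i] ** 2

    # Return, step 1: place the return value next to the parameters.
    memory[top_sp - 3] = memory[top_sp]
    # Return, step 2: restore top_sp, then "branch" to the saved return address.
    resume_at = memory[top_sp - 1]
    top_sp = memory[top_sp - 2]
    # Return, step 3: the caller locates the value relative to its own top_sp.
    return memory[top_sp + CALLER_LOCALS + k], resume_at


print(call_sum_squares([1, 2, 3], return_address=100), top_sp)  # (14, 100) 4
```

The callee never needs absolute addresses: parameters, status fields, and locals are all at offsets from top_sp, which is why the caller setting top_sp is enough to hand over control.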

7.2.4 Variable-Length Data on the Stack

In modern languages, objects whose size cannot be determined at compile time are allocated space in the heap. However, it is also possible to allocate objects, arrays, or other structures of unknown size on the stack.

The stack can be used for an object only if it is local to a procedure and becomes inaccessible when the procedure returns.

Access to the data on the stack is through two pointers, top and top_sp. Here, top marks the actual top of stack; it points to the position at which the next activation record will begin. The second, top_sp, is used to find local, fixed-length fields of the top activation record.
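A Python sketch of this arrangement, assuming a simple list of words as the stack (its length plays the role of top). The record holds fixed-size status words, then one fixed-size pointer slot per array at known offsets from top_sp, then the arrays themselves, whose sizes are known only at run time. The function names activate and deactivate are hypothetical.

```python
stack: list[int] = []   # hypothetical word stack; len(stack) plays the role of `top`


def activate(status_words: int, array_sizes: list[int]) -> tuple[int, int]:
    """Push a record: status fields, one pointer per array, then the arrays.

    Returns (record_start, top_sp).
    """
    record_start = len(stack)
    stack.extend([0] * status_words)          # control link, saved status, ...
    top_sp = len(stack)                        # end of the machine-status field
    stack.extend([0] * len(array_sizes))       # one fixed-size pointer slot per array
    for i, n in enumerate(array_sizes):
        stack[top_sp + i] = len(stack)         # pointer to where array i begins
        stack.extend([0] * n)                  # array storage; n known only at run time
    return record_start, top_sp


def deactivate(record_start: int) -> None:
    del stack[record_start:]                   # popping the record also frees its arrays


start, top_sp = activate(status_words=2, array_sizes=[3, 5])
print(top_sp, stack[top_sp], stack[top_sp + 1], len(stack))  # 2 4 7 12
deactivate(start)
print(len(stack))  # 0
```

Because the pointer slots sit at fixed offsets from top_sp, code that uses the arrays can be compiled without knowing their sizes; only top depends on them.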

7.3 Access to Nonlocal Data on the Stack

Access becomes more complicated in languages where procedures can be declared inside other procedures.

7.3.1 Data Access Without Nested Procedures

For languages that do not allow nested procedure declarations, allocation of storage for variables and access to those variables is simple:

1. Global variables are allocated static storage. The locations of these variables remain fixed and are known at compile time. So to access any variable that is not local to the currently executing procedure, we simply use the statically determined address.

2. Any other name must be local to the activation at the top of the stack. We may access these variables through the top_sp pointer of the stack.

7.3.2 Issues With Nested Procedures

Finding the declaration that applies to a nonlocal name x in a nested procedure p is a static decision; it can be done by an extension of the static-scope rule for blocks.

Suppose x is declared in the enclosing procedure q. Finding the relevant activation of q from an activation of p is a dynamic decision; it requires additional run-time information about activations. One possible solution to this problem is to use "access links".

7.3.3 A Language With Nested Procedure Declarations

The book introduces the language ML, in which function definitions can be nested.

ML is a functional language, meaning that variables, once declared and initialized, are not changed. There are only a few exceptions, such as the array, whose elements can be changed by special function calls.

Variables are defined, and their values initialized, by: val <name> = <expression>

Functions are defined by: fun <name> (<arguments>) = <body>

Function bodies use let-statements: let <list of definitions> in <statements> end

7.3.4 Nesting Depth

Procedures that are not nested within any other procedure have nesting depth 1.

If a procedure p is defined immediately within a procedure at nesting depth i, then p has nesting depth i + 1.
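The rule is a simple recursion over the nesting structure. In this Python sketch the program is represented (as an assumption) by nested dictionaries mapping each procedure to the procedures defined immediately inside it; the procedure names are illustrative.

```python
from typing import Optional


def nesting_depths(defs: dict, depth: int = 1, out: Optional[dict] = None) -> dict:
    """defs maps each procedure name to the dict of procedures defined immediately inside it."""
    if out is None:
        out = {}
    for name, inner in defs.items():
        out[name] = depth                        # outermost procedures get depth 1
        nesting_depths(inner, depth + 1, out)    # defined immediately within -> depth + 1
    return out


program = {"sort": {"readArray": {}, "exchange": {}, "quicksort": {"partition": {}}}}
print(nesting_depths(program))
# {'sort': 1, 'readArray': 2, 'exchange': 2, 'quicksort': 2, 'partition': 3}
```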

7.3.5 Access Links

A direct implementation of the normal static-scope rule for nested functions is obtained by adding a pointer called the access link to each activation record. If procedure p is nested immediately within procedure q in the source code, then the access link in any activation of p points to the most recent activation of q.

Access links form a chain from the activation record at the top of the stack to a sequence of activations at progressively lower nesting depths.

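7.3.6 Manipulating Access Links

What should happen when a procedure q calls procedure p explicitly? There are three cases:

1. Procedure p is at a higher nesting depth than q. Then p must be defined immediately within q (otherwise the call would not be within the scope of p), so the access link from p must point to the activation record of q.

2. The call is recursive, that is, p = q. Then the access link for the new activation record is the same as that of the activation record below it.

3. The nesting depth n_p of p is less than the nesting depth n_q of q. Then q must be nested within some procedure r in which p is defined immediately. The activation record of r is found by following the chain of access links from q's record n_q - n_p + 1 times, and p's access link points to that record.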
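A Python sketch of these three cases, plus the access-link walk used to reach a nonlocal variable. Frames are plain objects rather than stack memory, and the call sequence at the end (sort, a recursive quicksort, partition, exchange) is a hypothetical trace; new_frame and follow are illustrative names. Cases 2 and 3 share one formula, since with n_p = n_q following one link simply copies the caller's access link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    proc: str
    depth: int                        # nesting depth of proc
    access_link: Optional["Frame"]    # record of the immediately enclosing procedure


def follow(frame: Frame, hops: int) -> Frame:
    """Walk `hops` access links starting from `frame`."""
    for _ in range(hops):
        assert frame.access_link is not None, "chain ended too early"
        frame = frame.access_link
    return frame


def new_frame(caller: Optional[Frame], proc: str, depth: int) -> Frame:
    """Create the callee's record and set its access link per the three cases."""
    if caller is None or depth == 1:       # outermost procedures enclose nothing
        return Frame(proc, depth, None)
    assert depth <= caller.depth + 1, "callee would be out of scope"
    if depth == caller.depth + 1:          # case 1: p defined immediately within q
        link = caller
    else:                                  # cases 2 and 3: n_p <= n_q
        link = follow(caller, caller.depth - depth + 1)
    return Frame(proc, depth, link)


sort = new_frame(None, "sort", 1)
qs1 = new_frame(sort, "quicksort", 2)     # case 1
qs2 = new_frame(qs1, "quicksort", 2)      # case 2 (recursive call)
part = new_frame(qs2, "partition", 3)     # case 1
exch = new_frame(part, "exchange", 2)     # case 3
print([f.access_link.proc for f in (qs1, qs2, part, exch)])
# ['sort', 'sort', 'quicksort', 'sort']

# partition (depth 3) reaching a variable of sort (depth 1): follow 3 - 1 = 2 links.
print(follow(part, part.depth - 1).proc)  # sort
```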

7.3.7 Access Links for Procedure Parameters

When procedures are used as parameters, the caller needs to pass, along with the name of the procedure-parameter, the proper access link for that parameter.
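A Python sketch of why the link must travel with the parameter. The scenario is hypothetical: procedures b and c are both nested in a, d is nested in c, and c calls b(d). When b later calls its parameter, b's own access link leads only to a, which cannot locate c's variable y; so the parameter carries d's code together with a link to c's record. The names ProcParam and call_param are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Frame:
    proc: str
    access_link: Optional["Frame"]
    data: dict[str, object] = field(default_factory=dict)


@dataclass
class ProcParam:
    """A procedure-valued argument: its code plus the access link it must run with."""
    code: Callable[[Frame], int]
    access_link: Frame


def d_code(frame: Frame) -> int:
    # d is nested inside c, so its access link leads to c's record, where y lives.
    assert frame.access_link is not None
    return frame.access_link.data["y"] + frame.data["z"]


def call_param(f: ProcParam, name: str, args: dict[str, object]) -> int:
    """Call a procedure parameter: the new record's access link comes from the parameter."""
    return f.code(Frame(name, f.access_link, dict(args)))


a = Frame("a", None, {"x": 0})
c = Frame("c", a, {"y": 10})                     # c is nested in a
b = Frame("b", a, {"f": ProcParam(d_code, c)})   # c called b(d), passing d with a link to c
print(call_param(b.data["f"], "d", {"z": 5}))    # 15
```

This pairing of code with an environment pointer is essentially what is elsewhere called a closure.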

7.3.8  Displays

One problem with the access-link approach to nonlocal data is that if the nesting depth gets large, we may have to follow long chains of links to reach the data we need. A more efficient implementation uses an auxiliary array d, called the display, which consists of one pointer for each nesting depth.

We arrange that, at all times, d[i] is a pointer to the highest activation record on the stack for any procedure at nesting depth i.

The advantage of using a display is that if procedure p is executing and needs to access element x belonging to some procedure q, we need to look only in d[i], where i is the nesting depth of q; we follow the pointer d[i] to the activation record for q, wherein x is found at a known offset. No chain of access links has to be traversed.

In order to maintain the display correctly, we need to save previous values of display entries in new activation records.
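A Python sketch of display maintenance: entering a procedure at depth i saves the old d[i] in the new record and installs the new record; returning restores the saved entry. The bound MAX_DEPTH, the functions enter and leave, and the call trace are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    proc: str
    depth: int
    saved: Optional["Frame"]   # previous value of d[depth], kept in the new record


MAX_DEPTH = 3                                         # assumed bound on nesting depth
d: list[Optional[Frame]] = [None] * (MAX_DEPTH + 1)   # d[0] unused; depths start at 1


def enter(proc: str, depth: int) -> Frame:
    frame = Frame(proc, depth, d[depth])   # save the old d[depth] in the new record
    d[depth] = frame                       # the new record is now highest at this depth
    return frame


def leave(frame: Frame) -> None:
    d[frame.depth] = frame.saved           # restore the display entry on return


s = enter("sort", 1)
q1 = enter("quicksort", 2)
q2 = enter("quicksort", 2)
p = enter("partition", 3)
# partition reaches sort's data through d[1] in one step, with no chain walk.
print(d[1].proc, d[2] is q2, d[3].proc)  # sort True partition
e = enter("exchange", 2)
print(d[2].proc)     # exchange
leave(e)
print(d[2] is q2)    # True
```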

7.4 Heap Management

The heap is the portion of the store that is used for data that lives indefinitely, or until the program explicitly deletes it.

The memory manager is the subsystem that allocates and deallocates space within the heap; it serves as an interface between application programs and the operating system.

7.4.1 The Memory Manager

The memory manager keeps track of all the free space in heap storage at all times.

It performs two basic functions: allocation and deallocation.

Memory management would be simpler if all allocation requests were the same size and storage were released predictably; in general, however, data elements of different sizes are allocated, and there is no good way to predict the lifetimes of all allocated objects.

7.4.2 The Memory Hierarchy of a Computer

A memory hierarchy consists of a series of storage elements, with the smaller, faster ones "closer" to the processor and the larger, slower ones further away.

7.4.3 Locality in Programs

A program has temporal locality if the memory locations it accesses are likely to be accessed again within a short period of time.

A program has spatial locality if memory locations close to the location accessed are likely also to be accessed within a short period of time.

Optimization Using the Memory Hierarchy

One effective technique to improve the spatial locality of instructions is to have the compiler place basic blocks (sequences of instructions that are always executed sequentially) that are likely to follow each other contiguously, on the same page or even the same cache line if possible.

We can also improve the temporal and spatial locality of data accesses in a program by changing the data layout or the order of the computation.
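One common instance of changing the order of computation is loop interchange. This Python sketch only computes which addresses are touched; it assumes a row-major array (as in C) of 3 x 4 words and does not measure real cache behavior.

```python
def addresses(n_rows: int, n_cols: int, row_by_row: bool) -> list[int]:
    """Word addresses touched when visiting every element of a row-major n_rows x n_cols array."""
    if row_by_row:   # for i: for j -> consecutive accesses are adjacent in memory
        return [i * n_cols + j for i in range(n_rows) for j in range(n_cols)]
    # for j: for i -> consecutive accesses are n_cols words apart
    return [i * n_cols + j for j in range(n_cols) for i in range(n_rows)]


def strides(addrs: list[int]) -> set[int]:
    """Distinct distances between consecutive accesses."""
    return {b - a for a, b in zip(addrs, addrs[1:])}


print(sorted(strides(addresses(3, 4, True))))    # [1]
print(sorted(strides(addresses(3, 4, False))))   # [-7, 4]
```

Both orders visit the same elements, but only the row-by-row order moves through memory with stride 1, so neighboring accesses tend to fall in the same cache line.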

7.4.4 Reducing Fragmentation

At the beginning of program execution, the heap is one contiguous unit of free space. As the program allocates and deallocates memory, this space is broken up into free and used chunks of memory, and the free chunks need not reside in a contiguous area of the heap. The free chunks are called holes.

Best-Fit and Next-Fit Object Placement

The best-fit algorithm places each object in the smallest hole that is large enough; it tends to spare the large holes to satisfy subsequent, larger requests. An alternative, called first-fit, where an object is placed in the first (lowest-address) hole in which it fits, takes less time to place objects but has been found inferior to best-fit in overall performance.

The next-fit strategy tries to allocate the object in the chunk that has last been split, whenever enough space for the new object remains in that chunk. Next-fit also tends to improve the speed of the allocation operation.
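A Python sketch of the three placement policies over an address-sorted list of (start, size) holes. The hole sizes and requests are hypothetical, and the next-fit fallback (scanning onward from the last-split chunk and wrapping around when it no longer fits) is an assumed variant, since the text only describes the preferred case.

```python
from typing import Optional

Hole = tuple[int, int]   # (start address, size); the list is kept sorted by address


def first_fit(holes: list[Hole], size: int) -> Optional[int]:
    """Index of the first (lowest-address) hole that is large enough."""
    for i, (_, hole_size) in enumerate(holes):
        if hole_size >= size:
            return i
    return None


def best_fit(holes: list[Hole], size: int) -> Optional[int]:
    """Index of the smallest hole that is large enough (ties go to the lower address)."""
    fits = [(hole_size, i) for i, (_, hole_size) in enumerate(holes) if hole_size >= size]
    return min(fits)[1] if fits else None


class NextFit:
    """Prefer the chunk split most recently; otherwise scan onward from it, wrapping around."""

    def __init__(self) -> None:
        self.last = 0

    def choose(self, holes: list[Hole], size: int) -> Optional[int]:
        n = len(holes)
        for k in range(n):
            i = (self.last + k) % n
            if holes[i][1] >= size:
                self.last = i
                return i
        return None


def allocate(holes: list[Hole], index: int, size: int) -> int:
    """Carve `size` units off the front of holes[index] and return the object's address."""
    start, hole_size = holes[index]
    if hole_size == size:
        holes.pop(index)                         # the hole is used up entirely
    else:
        holes[index] = (start + size, hole_size - size)
    return start


# Best-fit spares the large hole; first-fit splits it and then cannot serve 45 units.
bf = [(0, 50), (100, 20), (200, 30)]
ff = list(bf)
for req in (18, 45):
    i, j = best_fit(bf, req), first_fit(ff, req)
    print(req, None if i is None else allocate(bf, i, req),
          None if j is None else allocate(ff, j, req))
# 18 100 0
# 45 0 None

# Next-fit keeps allocating near the chunk it split last.
nf, holes = NextFit(), [(0, 50), (100, 20), (200, 30)]
print([allocate(holes, nf.choose(holes, r), r) for r in (40, 15, 5)])  # [0, 100, 115]
```

In the last line, first-fit would have placed the 5-unit object at address 40, back in the first chunk; next-fit stays in the chunk it just split.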

Managing and Coalescing Free Space

7.4.5 Manual Deallocation Requests

Ideally, any storage that will no longer be accessed should be deleted. Conversely, any storage that may be referenced must not be deleted.

Problems with Manual Deallocation

The common mistakes take two forms: failing ever to delete data that cannot be referenced is called a memory-leak error, and referencing deleted data is a dangling-pointer-dereference error.

Programming Conventions and Tools

Object ownership: Associate an owner with each object at all times. The owner is responsible for either deleting the object or passing the object to another owner.

Reference counting: Associate a count with each dynamically allocated object. Whenever a reference to the object is created, we increment the reference count; whenever a reference is removed, we decrement the reference count. When the count goes to zero, the object can no longer be referenced and can therefore be deleted.

Region-based allocation: When objects are created to be used only within some step of a computation, we can allocate all such objects in the same region. We then delete the entire region once that computation step completes.
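A Python sketch of the last two conventions. Python manages its own memory, so "deletion" here is only recorded in the hypothetical freed list; RefCounted and Region are illustrative classes, not a real library API.

```python
freed: list[str] = []   # records deletion order, for demonstration


class RefCounted:
    """A hypothetical object managed by explicit reference counting."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.count = 0
        self.fields: list["RefCounted"] = []   # references this object holds

    def incref(self) -> "RefCounted":
        self.count += 1                         # a new reference was created
        return self

    def decref(self) -> None:
        self.count -= 1                         # a reference was removed
        if self.count == 0:                     # unreachable: safe to delete
            freed.append(self.name)
            for child in self.fields:           # its outgoing references disappear too
                child.decref()
            self.fields.clear()


class Region:
    """Region-based allocation: objects for one computation step are freed together."""

    def __init__(self) -> None:
        self.objects: list[object] = []

    def alloc(self, obj: object) -> object:
        self.objects.append(obj)
        return obj

    def delete_all(self) -> int:
        n = len(self.objects)
        self.objects.clear()                    # one bulk release, no per-object frees
        return n


root = RefCounted("root").incref()   # referenced by a variable
leaf = RefCounted("leaf")
root.fields.append(leaf.incref())    # root -> leaf
root.decref()                        # the variable goes away
print(freed)  # ['root', 'leaf']

step = Region()
for i in range(3):
    step.alloc([i])                  # temporaries for one computation step
print(step.delete_all(), len(step.objects))  # 3 0
```

Deleting root drops the only reference to leaf, so its count also reaches zero and it is deleted in turn.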

7.5 Introduction to Garbage Collection



7.6 Introduction to Trace-Based Collection


7.7 Short-Pause Garbage Collection


7.8 Advanced Topics in Garbage Collection
