Translation of the RPC Paper (Implementing Remote Procedure Calls)

For a commentary on the paper, see: https://blog.csdn.net/bingxuesiyang/article/details/119670092?spm=1001.2014.3001.5501

Overview:

Remote procedure calls (RPC) appear to be a useful paradigm for providing communication across a network between programs written in a high-level language. This paper describes a package providing a remote procedure call facility, the options that face the designer of such a package, and the decisions we made. We describe the overall structure of our RPC mechanism, our facilities for binding RPC clients, the transport level communication protocol, and some performance measurements. We include descriptions of some optimizations used to achieve high performance and to minimize the load on server machines that have many clients.

CR Categories and Subject Descriptors: C.2.2 [Computer-Communication Networks]: Network Protocols--protocol architecture; C.2.4 [Computer-Communication Networks]: Distributed Systems--distributed applications, network operating systems; D.4.4 [Operating Systems]: Communications Management--message sending, network communication; D.4.7 [Operating Systems]: Organization and Design--distributed systems

General Terms: Design, Experimentation, Performance, Security

Additional Keywords and Phrases: Remote procedure calls, transport layer protocols, distributed naming and binding, inter-process communication, performance of communication protocols.

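Remote procedure calls (RPC) appear to be a useful paradigm for communication across a network between programs written in a high-level language. This paper describes a package that provides a remote procedure call facility, the options facing the designer of such a package, and the decisions we made. We describe the overall structure of our RPC mechanism, the facilities for binding RPC clients, the transport-layer communication protocol, and some performance measurements. We also present some optimizations used to achieve high performance and to minimize the load on server machines that have many clients. Categories: C.2.2 [Computer-Communication Networks]: Network Protocols--protocol architecture; C.2.4 [Computer-Communication Networks]: Distributed Systems--distributed applications, network operating systems; D.4.4 [Operating Systems]: Communications Management--message sending, network communication. Keywords: remote procedure calls, transport layer protocols, distributed naming and binding, inter-process communication, performance of communication protocols.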

1. INTRODUCTION

1.1 Background

   The idea of remote procedure calls (hereinafter called RPC) is quite simple. It is based on the observation that procedure calls are a well-known and well-understood mechanism for transfer of control and data within a program running on a single computer. Therefore, it is proposed that this same mechanism be extended to provide for transfer of control and data across a communication network. When a remote procedure is invoked, the calling environment is suspended, the parameters are passed across the network to the environment where the procedure is to execute (which we will refer to as the callee), and the desired procedure is executed there. When the procedure finishes and produces its results, the results are passed back to the calling environment, where execution resumes as if returning from a simple single-machine call. While the calling environment is suspended, other processes on that machine may (possibly) still execute (depending on the details of the parallelism of that environment and the RPC implementation).

   There are many attractive aspects to this idea. One is clean and simple semantics: these should make it easier to build distributed computations, and to get them right. Another is efficiency: procedure calls seem simple enough for the communication to be quite rapid. A third is generality: in single-machine computations, procedures are often the most important mechanism for communication between parts of the algorithm.

  The idea of RPC has been around for many years. It has been discussed in the public literature many times since at least as far back as 1976 [15]. Nelson's doctoral dissertation [13] is an extensive examination of the design possibilities for an RPC system and has references to much of the previous work on RPC. However, full-scale implementations of RPC have been rarer than paper designs. Notable recent efforts include Courier in the Xerox NS family of protocols [4], and current work at MIT [10].

   This paper results from the construction of an RPC facility for the Cedar project. We felt, because of earlier work (particularly Nelson's thesis and associated experiments), that we understood the choices the designer of an RPC facility must make. Our task was to make the choices in light of our particular aims and environment. In practice, we found that several areas were inadequately understood, and we produced a system whose design has several novel aspects. Major issues facing the designer of an RPC facility include: the precise semantics of a call in the presence of machine and communication failures; the semantics of address-containing arguments in the (possible) absence of a shared address space; integration of remote calls into existing (or future) programming systems; binding (how a caller determines the location and identity of the callee); suitable protocols for transfer of data and control between caller and callee; and how to provide data integrity and security (if desired) in an open communication network. In building our RPC package we addressed each of these issues, but it is not possible to describe all of them in suitable depth in a single paper. This paper includes a discussion of the issues and our major decisions about them, and describes the overall structure of our solution. We also describe in some detail our binding mechanism and our transport level communication protocol. We plan to produce subsequent papers describing our facilities for encryption-based security, and providing more information about the manufacture of the stub modules (which are responsible for the interpretation of arguments and results of RPC calls) and our experiences with practical use of this facility.

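1. Introduction

1.1 Background

The idea of remote procedure calls (hereafter RPC) is very simple. It rests on the observation that procedure calls are a well-known and well-understood mechanism for transferring control and data within a program running on a single computer. We therefore propose extending the same mechanism to transfer control and data across a communication network. When a remote procedure is invoked, the calling environment is suspended, the parameters are passed across the network to the environment where the procedure is to execute (which we call the callee), and the desired procedure is executed there. When the procedure finishes and produces its results, the results are passed back to the calling environment, where execution resumes as if returning from a simple single-machine call. While the calling environment is suspended, other processes on that machine may still execute (depending on the parallelism of that environment and the details of the RPC implementation).

The idea has many attractive aspects. One is clean and simple semantics, which should make distributed computations easier to build and to get right. Another is efficiency: procedure calls seem simple enough for the communication to be quite fast. A third is generality: in single-machine computations, procedure calls are often the most important mechanism for communication between parts of an algorithm.

The idea of RPC has been around for many years. It has been discussed many times in the public literature since at least 1976. Nelson's doctoral dissertation [13] examines the design possibilities for an RPC system extensively and refers to much of the earlier work on RPC. However, full-scale implementations of RPC have been rarer than paper designs. Notable recent efforts include Courier in the Xerox NS family of protocols [4] and current work at MIT [10].

This paper results from building an RPC facility for the Cedar project. Because of earlier work (particularly Nelson's thesis and the associated experiments), we believed we understood the choices the designer of an RPC facility must make. Our task was to make those choices in light of our particular aims and environment. In practice, we found that several areas were not adequately understood, and we produced a system whose design has several novel aspects.

The major issues facing the designer of an RPC facility include: the precise semantics of a call in the presence of machine and communication failures; the semantics of address-containing arguments in the (possible) absence of a shared address space; integration of remote calls into existing (or future) programming systems; binding (how a caller determines the location and identity of the callee); suitable protocols for transferring data and control between caller and callee; and how to provide data integrity and security (if desired) in an open communication network. In building our RPC package we addressed each of these issues, but it is not possible to describe all of them in suitable depth in a single paper. This paper discusses the issues and our major decisions about them, and describes the overall structure of our solution. We also describe our binding mechanism and our transport-level communication protocol in some detail. We plan to describe our encryption-based security facilities in subsequent papers, and to provide more information about the manufacture of the stub modules (which are responsible for interpreting the arguments and results of RPC calls) and about our experience with practical use of the facility.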

1.2 Environment

  The remote-procedure-call package we have built was developed primarily for use within the Cedar programming environment, communicating across the Xerox research internetwork. In building such a package, some characteristics of the environment inevitably have an impact on the design, so the environment is summarized here. 

  Cedar [6] is a large project concerned with developing a programming environment that is powerful and convenient for the building of experimental programs and systems. There is an emphasis on uniform, highly interactive user interfaces, and ease of construction and debugging of programs. Cedar is designed to be used on single-user workstations, although it is also used for the construction of servers (shared computers providing common services, accessible through the communication network).

  Most of the computers used for Cedar are Dorados [8]. The Dorado is a very powerful machine (e.g., a simple Algol-style call and return takes less than 10 microseconds). It is equipped with a 24-bit virtual address space (of 16-bit words) and an 80-megabyte disk. Think of a Dorado as having the power of an IBM 370/168 processor, dedicated to a single user.

  Communication between these computers is typically by means of a 3-megabit-per-second Ethernet [11]. (Some computers are on a 10-megabit-per-second Ethernet [7].) Most of the computers running Cedar are on the same Ethernet, but some are on different Ethernets elsewhere in our research internetwork. The internetwork consists of a large number of 3-megabit and 10-megabit Ethernets (presently about 160) connected by leased telephone and satellite links (at data rates of between 4800 and 56000 bps). We envisage that our RPC communication will follow the pattern we have experienced with other protocols: most communication is on the local Ethernet (so the much lower data rates of the internet links are not an inconvenience to our users), and the Ethernets are not overloaded (we very rarely see offered loads above 40 percent of the capacity of an Ethernet, and 10 percent is typical).

   The PUP family of protocols [3] provides uniform access to any computer on this internetwork. Previous PUP protocols include simple unreliable (but high-probability) datagram service, and reliable flow-controlled byte streams. Between two computers on the same Ethernet, the lower level raw Ethernet packet format is available.

  Essentially all programming is in high-level languages. The dominant language is Mesa [12] (as modified for the purposes of Cedar), although Smalltalk and InterLisp are also used. There is no assembly language for Dorados.

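1.2 Environment

The remote procedure call package we built was developed mainly for use within the Cedar programming environment, communicating across the Xerox research internetwork. In building such a package, some characteristics of the environment inevitably affect the design, so the environment is summarized here.

Cedar [6] is a large project devoted to developing a powerful and convenient programming environment for building experimental programs and systems. It emphasizes uniform, highly interactive user interfaces and ease of constructing and debugging programs. Cedar is designed to be used on single-user workstations, although it is also used to build servers (shared computers that provide common services through the communication network).

Most of the computers used for Cedar are Dorados [8]. The Dorado is a very powerful machine (for example, a simple Algol-style call and return takes less than 10 microseconds). It has a 24-bit virtual address space (of 16-bit words) and an 80-megabyte disk. Think of a Dorado as having the power of an IBM 370/168 processor, dedicated to a single user.

Communication between these computers is typically over a 3-megabit-per-second Ethernet [11]. (Some computers are on a 10-megabit-per-second Ethernet [7].) Most of the computers running Cedar are on the same Ethernet, but some are on different Ethernets elsewhere in our research internetwork. The internetwork consists of a large number of 3-megabit and 10-megabit Ethernets (presently about 160) connected by leased telephone and satellite links (at data rates between 4800 and 56000 bits per second). We expect our RPC communication to follow the pattern we have seen with other protocols: most communication is on the local Ethernet (so the much lower data rates of the internet links do not inconvenience our users), and the Ethernets are not overloaded (we very rarely see loads above 40 percent of an Ethernet's capacity, and 10 percent is typical).

The PUP family of protocols [3] provides uniform access to any computer on this internetwork. Earlier PUP protocols include a simple unreliable (but high-probability) datagram service and reliable flow-controlled byte streams. Between two computers on the same Ethernet, the lower-level raw Ethernet packet format is available.

Essentially all programming is done in high-level languages. The dominant language is Mesa [12] (as modified for the purposes of Cedar), although Smalltalk and InterLisp are also used. There is no assembly language for Dorados.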

 1.3 Aims

   The primary purpose of our RPC project was to make distributed computation easy. Previously, it was observed within our research community that the construction of communicating programs was a difficult task, undertaken only by members of a select group of communication experts. Even researchers with substantial systems experience found it difficult to acquire the specialized expertise required to build distributed systems with existing tools. This seemed undesirable. We have available to us a very large, very powerful communication network, numerous powerful computers, and an environment that makes building programs relatively easy. The existing communication mechanisms appeared to be a major factor constraining further development of distributed computing. Our hope is that by providing communication with almost as much ease as local procedure calls, people will be encouraged to build and experiment with distributed applications. RPC will, we hope, remove unnecessary difficulties, leaving only the fundamental difficulties of building distributed systems: timing, independent failure of components, and the coexistence of independent execution environments.

  We had two secondary aims that we hoped would support our purpose. We wanted to make RPC communication highly efficient (within, say, a factor of five beyond the necessary transmission times of the network). This seems important, lest communication become so expensive that application designers strenuously avoid it. The applications that might otherwise get developed would be distorted by their desire to avoid communicating. Additionally, we felt that it was important to make the semantics of the RPC package as powerful as possible, without loss of simplicity or efficiency. Otherwise, the gains of a single unified communication paradigm would be lost by requiring application programmers to build extra mechanisms on top of the RPC package. An important issue in design is resolving the tension between powerful semantics and efficiency.

  Our final major aim was to provide secure communication with RPC. None of the previously implemented protocols had any provision for protecting the data in transit on our networks. This was true even to the extent that passwords were transmitted as clear-text. Our belief was that research on the protocols and mechanisms for secure communication across an open network had reached a stage where it was reasonable and desirable for us to include this protection in our package. In addition, very few (if any) distributed systems had previously provided secure end-to-end communication, and it had never been applied to RPC, so the design might provide useful research insights.

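1.3 Aims

The primary purpose of our RPC project was to make distributed computation easy. Previously, it was observed within our research community that building communicating programs was a difficult task, undertaken only by a select group of communication experts. Even researchers with substantial systems experience found it hard to acquire the specialized expertise needed to build distributed systems with the existing tools. This seemed undesirable. We have a very large, very powerful communication network, numerous powerful computers, and an environment that makes building programs relatively easy. The existing communication mechanisms appeared to be a major factor limiting the further development of distributed computing. We hope that by making communication almost as easy as local procedure calls, people will be encouraged to build and experiment with distributed applications. RPC will, we hope, remove the unnecessary difficulties and leave only the fundamental difficulties of building distributed systems: timing, independent failure of components, and the coexistence of independent execution environments.

We had two secondary aims that we hoped would support this purpose. We wanted to make RPC communication highly efficient (within, say, a factor of five of the necessary network transmission times). This seems important, lest communication become so expensive that application designers strenuously avoid it; the applications that might otherwise be developed would be distorted by the desire to avoid communicating. In addition, we felt it was important to make the semantics of the RPC package as powerful as possible without losing simplicity or efficiency. Otherwise, the benefits of a single unified communication paradigm would be lost, because application programmers would have to build extra mechanisms on top of the RPC package. An important issue in the design is resolving the tension between powerful semantics and efficiency.

Our final major aim was to provide secure communication with RPC. None of the previously implemented protocols had any provision for protecting data in transit on our networks; even passwords were transmitted in clear text. We believed that research on protocols and mechanisms for secure communication across an open network had reached a stage where it was reasonable and desirable to include this protection in our package. In addition, very few (if any) distributed systems had previously provided secure end-to-end communication, and it had never been applied to RPC, so the design might provide useful research insights.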

1.4 Fundamental Decisions

   It is not an immediate consequence of our aims that we should use procedure calls as the paradigm for expressing control and data transfers. For example, message passing might be a plausible alternative. It is our belief that a choice between these alternatives would not make a major difference in the problems faced by this design, nor in the solutions adopted. The problems of reliable and efficient transmission of a message and of its possible reply are quite similar to the problems encountered for remote procedure calls. The problems of passing arguments and results, and of network security, are essentially unchanged. The overriding consideration that made us choose procedure calls was that they were the major control and data transfer mechanism imbedded in our major language, Mesa.

   One might also consider using a more parallel paradigm for our communication, such as some form of remote fork. Since our language already includes a construct for forking parallel computations, we could have chosen this as the point at which to add communication semantics. Again, this would not have changed the major design problems significantly.

   We discarded the possibility of emulating some form of shared address space among the computers. Previous work has shown that with sufficient care moderate efficiency can be achieved in doing this [14]. We do not know whether an approach employing shared addresses is feasible, but two potentially major difficulties spring to mind: first, whether the representation of remote addresses can be integrated into our programming languages (and possibly the underlying machine architecture) without undue upheaval; second, whether acceptable efficiency can be achieved. For example, a host in the PUP internet is represented by a 16-bit address, so a naive implementation of a shared address space would extend the width of language addresses by 16 bits. On the other hand, it is possible that careful use of the address-mapping mechanisms of our virtual memory hardware could allow shared address space without changing the address width. Even on our 10 megabit Ethernets, the minimum average round trip time for a packet exchange is 120 microseconds [7], so the most likely way to approach this would be to use some form of paging system. In summary, a shared address space between participants in RPC might be feasible, but since we were not willing to undertake that research our subsequent design assumes the absence of shared addresses. Our intuition is that with our hardware the cost of a shared address space would exceed the additional benefits.

   A principle that we used several times in making design choices is that the semantics of remote procedure calls should be as close as possible to those of local (single-machine) procedure calls. This principle seems attractive as a way of ensuring that the RPC facility is easy to use, particularly for programmers familiar with single-machine use of our languages and packages. Violation of this principle seemed likely to lead us into the complexities that have made previous communication packages and protocols difficult to use. This principle has occasionally caused us to deviate from designs that would seem attractive to those more experienced in distributed computing. For example, we chose to have no time-out mechanism limiting the duration of a remote call (in the absence of machine or communication failures), whereas most communication packages consider this a worthwhile feature. Our argument is that local procedure calls have no time-out mechanism, and our languages include mechanisms to abort an activity as part of the parallel processing mechanism. Designing a new time-out arrangement just for RPC would needlessly complicate the programmer's world. Similarly, we chose the binding semantics described below (based closely on the existing Cedar mechanisms) in preference to the ones presented in Nelson's thesis [13].

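1.4 Fundamental Decisions

It does not follow directly from our aims that we should use procedure calls as the paradigm for expressing control and data transfers. Message passing, for example, might be a plausible alternative. We believe that the choice between these alternatives would make no major difference to the problems this design faces, nor to the solutions adopted. The problems of transmitting a message and its possible reply reliably and efficiently are very similar to those encountered for remote procedure calls. The problems of passing arguments and results, and of network security, are essentially unchanged. The overriding consideration that made us choose procedure calls was that they are the main control and data transfer mechanism embedded in our main language, Mesa.

One might also consider a more parallel paradigm for communication, such as some form of remote fork. Since our language already includes a construct for forking parallel computations, we could have chosen this as the point at which to add communication semantics. Again, this would not have changed the major design problems significantly.

We discarded the possibility of emulating some form of shared address space among the computers. Previous work has shown that, with sufficient care, moderate efficiency can be achieved in doing this [14]. We do not know whether an approach using shared addresses is feasible, but two potentially major difficulties come to mind: first, whether the representation of remote addresses can be integrated into our programming languages (and possibly the underlying machine architecture) without undue upheaval; second, whether acceptable efficiency can be achieved. For example, a host in the PUP internet is represented by a 16-bit address, so a naive implementation of a shared address space would extend the width of language addresses by 16 bits. On the other hand, careful use of the address-mapping mechanisms of our virtual memory hardware might allow a shared address space without changing the address width. Even on our 10-megabit Ethernets, the minimum average round-trip time for a packet exchange is 120 microseconds [7], so the most likely approach would be some form of paging system. In summary, a shared address space between participants in RPC might be feasible, but since we were not willing to undertake that research, our subsequent design assumes the absence of shared addresses. Our intuition is that, with our hardware, the cost of a shared address space would exceed the additional benefits.

A principle we used several times in making design choices is that the semantics of remote procedure calls should be as close as possible to those of local (single-machine) procedure calls. This principle seems attractive as a way of ensuring that the RPC facility is easy to use, particularly for programmers familiar with single-machine use of our languages and packages. Violating it seemed likely to lead us into the complexities that made previous communication packages and protocols hard to use. The principle has occasionally led us away from designs that would seem attractive to those more experienced in distributed computing. For example, we chose to have no time-out mechanism limiting the duration of a remote call (in the absence of machine or communication failures), whereas most communication packages consider this a worthwhile feature. Our argument is that local procedure calls have no time-out mechanism, and our languages include mechanisms to abort an activity as part of the parallel processing mechanism. Designing a new time-out arrangement just for RPC would needlessly complicate the programmer's world. Similarly, we chose the binding semantics described below (based closely on the existing Cedar mechanisms) in preference to those presented in Nelson's thesis [13].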

   

 1.5 Structure

The program structure we use for RPC is similar to that proposed in Nelson's thesis. It is based on the concept of stubs. When making a remote call, five pieces of program are involved: the user, the user-stub, the RPC communications package (known as RPCRuntime), the server-stub, and the server. Their relationship is shown in Figure 1. The user, the user-stub, and one instance of RPCRuntime execute in the caller machine; the server, the server-stub and another instance of RPCRuntime execute in the callee machine. When the user wishes to make a remote call, it actually makes a perfectly normal local call which invokes a corresponding procedure in the user-stub. The user-stub is responsible for placing a specification of the target procedure and the arguments into one or more packets and asking the RPCRuntime to transmit these reliably to the callee machine. On receipt of these packets, the RPCRuntime in the callee machine passes them to the server-stub. The server-stub unpacks them and again makes a perfectly normal local call, which invokes the appropriate procedure in the server. Meanwhile, the calling process in the caller machine is suspended awaiting a result packet. When the call in the server completes, it returns to the server-stub and the results are passed back to the suspended process in the caller machine. There they are unpacked and the user-stub returns them to the user. RPCRuntime is responsible for retransmissions, acknowledgments, packet routing, and encryption. Apart from the effects of multimachine binding and of machine or communication failures, the call happens just as if the user had invoked the procedure in the server directly. Indeed, if the user and server code were brought into a single machine and bound directly together without the stubs, the program would still work.

   RPCRuntime is a standard part of the Cedar system. The user and server are written as part of the distributed application. But the user-stub and server-stub are automatically generated, by a program called Lupine. This generation is specified by use of Mesa interface modules. These are the basis of the Mesa (and Cedar) separate compilation and binding mechanism [9]. An interface module is mainly a list of procedure names, together with the types of their arguments and results. This is sufficient information for the caller and callee to independently perform compile-time type checking and to generate appropriate calling sequences. A program module that implements procedures in an interface is said to export that interface. A program module calling procedures from an interface is said to import that interface. When writing a distributed application, a programmer first writes an interface module. Then he can write the user code that imports that interface and the server code that exports the interface. He also presents the interface to Lupine, which generates the user-stub (that exports the interface) and the server-stub (that imports the interface). When binding the programs on the caller machine, the user is bound to the user-stub. On the callee machine, the server-stub is bound to the server.

   Thus, the programmer does not need to build detailed communication-related code. After designing the interface, he need only write the user and server code. Lupine is responsible for generating the code for packing and unpacking arguments and results (and other details of parameter/result semantics), and for dispatching to the correct procedure for an incoming call in the server-stub. RPCRuntime is responsible for packet-level communications. The programmer must avoid specifying arguments or results that are incompatible with the lack of shared address space. (Lupine checks this avoidance.) The programmer must also take steps to invoke the intermachine binding described in Section 2, and to handle reported machine or communication failures.
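
The division of labour among the five pieces can be illustrated with a minimal Python sketch. This is not the Cedar/Mesa implementation: the class names `MathServer`, `UserStub` and `ServerStub`, the JSON packet format, and the single in-process `RPCRuntime` object (standing in for the two runtime instances and the network between them) are all assumptions made for illustration. In the real system the two stubs would be generated by Lupine rather than written by hand.

```python
import json


class RPCRuntime:
    """Stand-in for the transport: hands a packet to the exported dispatcher."""

    def __init__(self):
        self.dispatchers = {}  # interface name -> server-stub dispatcher

    def export(self, interface, dispatcher):
        self.dispatchers[interface] = dispatcher

    def transmit(self, interface, packet):
        # A real runtime would add retransmission, acknowledgments and routing.
        return self.dispatchers[interface](packet)


class MathServer:
    """The server: ordinary procedures with no communication code."""

    def add(self, a, b):
        return a + b


class ServerStub:
    def __init__(self, server):
        self.server = server

    def dispatch(self, packet):
        call = json.loads(packet.decode())            # unpack the call packet
        procedure = getattr(self.server, call["proc"])
        result = procedure(*call["args"])             # normal local call
        return json.dumps({"result": result}).encode()


class UserStub:
    def __init__(self, runtime, interface):
        self.runtime = runtime
        self.interface = interface

    def add(self, a, b):
        packet = json.dumps({"proc": "add", "args": [a, b]}).encode()
        reply = self.runtime.transmit(self.interface, packet)
        return json.loads(reply.decode())["result"]   # unpack the result packet


runtime = RPCRuntime()
runtime.export("Math", ServerStub(MathServer()).dispatch)
remote_math = UserStub(runtime, "Math")
print(remote_math.add(2, 3))  # the user makes what looks like a local call
```

Note that `MathServer` and the calling code contain nothing about packets; replacing `remote_math` with a `MathServer()` instance would leave the caller unchanged, which mirrors the observation that the program still works when bound together without the stubs.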


Figure 1

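1.5 Structure

The program structure we use for RPC is similar to the one proposed in Nelson's thesis. It is based on the concept of stubs. A remote call involves five pieces of program: the user, the user-stub, the RPC communications package (known as RPCRuntime), the server-stub, and the server. Their relationship is shown in Figure 1. The user, the user-stub, and one instance of RPCRuntime execute in the caller machine; the server, the server-stub, and another instance of RPCRuntime execute in the callee machine. When the user wishes to make a remote call, it actually makes a perfectly normal local call, which invokes the corresponding procedure in the user-stub. The user-stub places a specification of the target procedure and the arguments into one or more packets and asks the RPCRuntime to transmit them reliably to the callee machine. On receiving these packets, the RPCRuntime in the callee machine passes them to the server-stub. The server-stub unpacks them and again makes a perfectly normal local call, which invokes the appropriate procedure in the server. Meanwhile, the calling process in the caller machine is suspended, awaiting a result packet. When the call in the server completes, it returns to the server-stub, and the results are passed back to the suspended process in the caller machine. There they are unpacked, and the user-stub returns them to the user. RPCRuntime is responsible for retransmissions, acknowledgments, packet routing, and encryption. Apart from the effects of multimachine binding and of machine or communication failures, the call happens just as if the user had invoked the procedure in the server directly.

Indeed, if the user and server code were brought into a single machine and bound directly together without the stubs, the program would still work.

RPCRuntime is a standard part of the Cedar system. The user and server are written as part of the distributed application. The user-stub and server-stub, however, are generated automatically by a program called Lupine. This generation is specified by Mesa interface modules, which are the basis of the Mesa (and Cedar) separate compilation and binding mechanism [9]. An interface module is mainly a list of procedure names, together with the types of their arguments and results. This is enough information for the caller and callee to perform compile-time type checking independently and to generate appropriate calling sequences. A program module that implements the procedures in an interface is said to export that interface. A program module that calls procedures from an interface is said to import that interface. When writing a distributed application, a programmer first writes an interface module. He can then write the user code that imports the interface and the server code that exports it. He also presents the interface to Lupine, which generates the user-stub (which exports the interface) and the server-stub (which imports the interface). When the programs are bound on the caller machine, the user is bound to the user-stub. On the callee machine, the server-stub is bound to the server.

Thus, the programmer does not need to build detailed communication-related code. After designing the interface, he need only write the user and server code. Lupine generates the code for packing and unpacking arguments and results (and other details of parameter/result semantics), and for dispatching an incoming call in the server-stub to the correct procedure. RPCRuntime is responsible for packet-level communication. The programmer must avoid specifying arguments or results that are incompatible with the lack of a shared address space. (Lupine checks this.) The programmer must also take steps to invoke the intermachine binding described in Section 2, and to handle reported machine or communication failures.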

2. BINDING

There are two aspects to binding which we consider in turn. First, how does a client of the binding mechanism specify what he wants to be bound to? Second, how does a caller determine the machine address of the callee and specify to the callee the procedure to be invoked? The first is primarily a question of naming and the second a question of location.

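2. Binding

We consider two aspects of binding in turn. First, how does a client of the binding mechanism specify what he wants to be bound to?

Second, how does a caller determine the machine address of the callee and specify to the callee the procedure to be invoked? The first is primarily a question of naming, and the second a question of location.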

2.1 Naming

   The binding operation offered by our RPC package is to bind an importer of an interface to an exporter of an interface. After binding, calls made by the importer invoke procedures implemented by the (remote) exporter. There are two parts to the name of an interface: the type and the instance. The type is intended to specify, at some level of abstraction, which interface the caller expects the callee to implement. The instance is intended to specify which particular implementor of an abstract interface is desired. For example, the type of an interface might correspond to the abstraction of "mail server," and the instance would correspond to some particular mail server selected from many. A reasonable default for the type of an interface might be a name derived from the name of the Mesa interface module. Fundamentally, the semantics of an interface name are not dictated by the RPC package--they are an agreement between the exporter and the importer, not fully enforceable by the RPC package. However, the means by which an importer uses the interface name to locate an exporter are dictated by the RPC package, and these we now describe.

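2.1 Naming

The binding operation offered by our RPC package binds an importer of an interface to an exporter of an interface. After binding, calls made by the importer invoke procedures implemented by the (remote) exporter. The name of an interface has two parts: the type and the instance. The type is intended to specify, at some level of abstraction, which interface the caller expects the callee to implement. The instance is intended to specify which particular implementor of an abstract interface is wanted. For example, the type of an interface might correspond to the abstraction "mail server", and the instance to one particular mail server chosen from many. A reasonable default for the type of an interface might be a name derived from the name of the Mesa interface module. Fundamentally, the semantics of an interface name are not dictated by the RPC package--they are an agreement between the exporter and the importer, and the RPC package cannot fully enforce them. However, the means by which an importer uses the interface name to locate an exporter are dictated by the RPC package, and we describe them now.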

2.2 Locating an Appropriate Exporter

    We use the Grapevine distributed database [1] for our RPC binding. The major attraction of using Grapevine is that it is widely and reliably available. Grapevine is distributed across multiple servers strategically located in our internet topology, and is configured to maintain at least three copies of each database entry. Since the Grapevine servers themselves are highly reliable and the data is replicated, it is extremely rare for us to be unable to look up a database entry. There are alternatives to using such a database, but we find them unsatisfactory. For example, we could include in our application programs the network addresses of the machine with which they wish to communicate: this would bind to a particular machine much too early for most applications. Alternatively, we could use some form of broadcast protocol to locate the desired machine: this would sometimes be acceptable, but as a general mechanism would cause too much interference with innocent bystanders, and would not be convenient for binding to machines not on the same local network.

    Grapevine's database consists of a set of entries, each keyed by a character string known as a Grapevine RName. There are two varieties of entries: individuals and groups. Grapevine keeps several items of information for each database entry, but the RPC package is concerned with only two: for each individual there is a connect-site, which is a network address, and for each group there is a member-list, which is a list of RNames. The RPC package maintains two entries in the Grapevine database for each interface name: one for each type and one for each instance; so the type and instance are both Grapevine RNames. The database entry for the instance is a Grapevine individual whose connect-site is a network address, specifically, the network address of the machine on which that instance was last exported. The database entry for the type is a Grapevine group whose members are the Grapevine RNames of the instances of that type which have been exported. For example, if the remote interface with type FileAccess.Alpine and instance Ebbets.Alpine has been exported by a server running at network address 3#22#, and the remote interface with type FileAccess.Alpine and instance Luther.Alpine has been exported by a server running at network address 3#276#, then the members of the Grapevine group FileAccess.Alpine would include Ebbets.Alpine and Luther.Alpine. The Grapevine individual Ebbets.Alpine would have 3#22# as its connect-site and Luther.Alpine would have 3#276#.
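
The two kinds of entry the RPC package relies on can be modelled in a few lines of Python. This is a sketch only: the `Individual` and `Group` classes, the plain dictionary standing in for the replicated Grapevine database, and the helper `addresses_for_type` are hypothetical names, and the data is the FileAccess.Alpine example from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    connect_site: str                             # a network address


@dataclass
class Group:
    members: list = field(default_factory=list)   # a list of RNames


# Entries are keyed by RName; the type is a group, each instance an individual.
grapevine = {
    "FileAccess.Alpine": Group(members=["Ebbets.Alpine", "Luther.Alpine"]),
    "Ebbets.Alpine": Individual(connect_site="3#22#"),
    "Luther.Alpine": Individual(connect_site="3#276#"),
}


def addresses_for_type(db, type_rname):
    """Map each exported instance of a type to its connect-site."""
    return {name: db[name].connect_site for name in db[type_rname].members}


print(addresses_for_type(grapevine, "FileAccess.Alpine"))
```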

   When an exporter wishes to make his interface available to remote clients, the server code calls the server-stub which in turn calls a procedure, ExportInterface, in the RPCRuntime. ExportInterface is given the interface name (type and instance) together with a procedure (known as the dispatcher) implemented in the server-stub which will handle incoming calls for the interface. ExportInterface calls Grapevine and ensures that the instance is one of the members of the Grapevine group which is the type, and that the connect-site of (the Grapevine individual which is) the instance is the network address of the exporting machine. This may involve updating the database. As an optimization, the database is not updated if it already contains the correct information--this is usually true: typically an interface of this name has previously been exported, and typically from the same network address. For example, to export the interface with type FileAccess.Alpine and instance Ebbets.Alpine from network address 3#22#, the RPCRuntime would ensure that Ebbets.Alpine in the Grapevine database has connect-site 3#22# and that Ebbets.Alpine is a member of FileAccess.Alpine. The RPCRuntime then records information about this export in a table maintained on the exporting machine. For each currently exported interface, this table contains the interface name, the dispatcher procedure from the server-stub, and a 32-bit value that serves as a permanently unique (machine-relative) identifier of the export. This table is implemented as an array indexed by a small integer. The identifier is guaranteed to be permanently unique by the use of successive values of a 32-bit counter; on start-up this counter is initialized to a one-second real time clock, and the counter is constrained subsequently to be less than the current value of that clock. This constrains the rate of calls on ExportInterface in a single machine to an average rate of less than one per second, averaged over the time since the exporting machine was restarted. The burst rate of such calls can exceed one per second (see Figure 2).
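
The export bookkeeping and the clock-bounded counter can be sketched as follows. The `Exporter` class, its method names, the dictionary-based database, and the choice to raise an error when the counter catches up with the clock are all assumptions for illustration; the paper states only that the counter starts at the clock value and must stay below it. One reading of why this gives permanent uniqueness: every identifier issued before a restart is below the clock at that time, and the restarted counter begins at the (later) clock value.

```python
import time


class Exporter:
    """Hypothetical model of the exporting machine's RPCRuntime state."""

    def __init__(self, address, database, clock=time.time):
        self.address = address
        self.database = database      # {"groups": {...}, "sites": {...}}
        self.clock = clock            # returns seconds; injectable for the demo
        self.counter = int(clock())   # initialised from the one-second clock
        self.table = []               # index -> (interface name, dispatcher, id)

    def _next_id(self):
        # Each issued identifier must be below the current clock value.
        if self.counter >= int(self.clock()):
            raise RuntimeError("export rate limit reached; retry later")
        uid = self.counter
        self.counter += 1
        return uid

    def export_interface(self, type_name, instance, dispatcher):
        uid = self._next_id()
        members = self.database["groups"].setdefault(type_name, [])
        if instance not in members:                       # update only if needed
            members.append(instance)
        if self.database["sites"].get(instance) != self.address:
            self.database["sites"][instance] = self.address
        self.table.append(((type_name, instance), dispatcher, uid))
        return len(self.table) - 1, uid                   # (table index, unique id)


fake_now = [1000]                                         # fake clock for the demo
database = {"groups": {}, "sites": {}}
exporter = Exporter("3#22#", database, clock=lambda: fake_now[0])
fake_now[0] = 1002                                        # two seconds after start-up
first = exporter.export_interface("FileAccess.Alpine", "Ebbets.Alpine", print)
second = exporter.export_interface("FileAccess.Alpine", "Ebbets.Alpine", print)
print(first, second)
```

In this demo two exports succeed back to back because the clock has run two seconds ahead of the counter, which illustrates how a burst can exceed one export per second while the long-run average cannot.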

   When an importer wishes to bind to an exporter, the user code calls its user-stub which in turn calls a procedure, ImportInterface, in the RPCRuntime, giving it the desired interface type and instance. The RPCRuntime determines the network address of the exporter (if there is one) by asking Grapevine for the network address which is the connect-site of the interface instance. The RPCRuntime then makes a remote procedure call to the RPCRuntime package on that machine asking for the binding information associated with this interface type and instance. If the specified machine is not currently exporting that interface this fact is returned to the importing machine and the binding fails. If the specified machine is currently exporting that interface, then the table of current exports maintained by its RPCRuntime yields the corresponding unique identifier; the identifier and the table index are returned to the importing machine and the binding succeeds. The exporter network address, identifier, and table index are remembered by the user-stub for use in remote calls.

    Subsequently, when that user-stub is making a call on the imported remote interface, the call packet it manufactures contains the unique identifier and table index of the desired interface, and the entry point number of the desired procedure relative to the interface. When the RPCRuntime on the callee machine receives a new call packet it uses the index to look up its table of current exports (efficiently), verifies that the unique identifier in the packet matches that in the table, and passes the call packet to the dispatcher procedure specified in the table.
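
The binding lookup and the per-call check on the callee machine can be sketched like this. The `CalleeRuntime` class, the tuple used as a call packet, and `file_dispatcher` with its two toy procedures are hypothetical; only the idea (index into the export table, compare unique identifiers, hand the call to the dispatcher) comes from the text.

```python
class CalleeRuntime:
    """Hypothetical callee-side table of current exports."""

    def __init__(self):
        self.exports = []  # index -> (interface name, dispatcher, unique id)

    def export(self, name, dispatcher, uid):
        self.exports.append((name, dispatcher, uid))

    def lookup_binding(self, name):
        # Answers an importer's binding request.
        for index, (exported_name, _, uid) in enumerate(self.exports):
            if exported_name == name:
                return index, uid
        return None  # not currently exported: the binding fails

    def receive_call(self, packet):
        index, uid, entry_point, args = packet
        # Direct array lookup, then verify the identifier carried in the packet.
        if not 0 <= index < len(self.exports) or self.exports[index][2] != uid:
            raise LookupError("stale or unknown binding")
        dispatcher = self.exports[index][1]
        return dispatcher(entry_point, args)


def file_dispatcher(entry_point, args):
    # Entry point numbers are relative to the interface.
    procedures = [
        lambda name: "contents of " + name,   # entry point 0
        lambda name, data: len(data),         # entry point 1
    ]
    return procedures[entry_point](*args)


callee = CalleeRuntime()
interface = ("FileAccess.Alpine", "Ebbets.Alpine")
callee.export(interface, file_dispatcher, uid=1000)

index, uid = callee.lookup_binding(interface)      # remembered by the user-stub
result = callee.receive_call((index, uid, 1, ("notes.txt", b"hello")))
print(result)
```

A packet carrying an identifier from an earlier export (for example, from before the server restarted) fails the comparison and is rejected rather than being dispatched to whatever now occupies that table slot.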

   There are several variants of this binding scheme available to our clients. If the importer calling ImportInterface specifies only the interface type but no instance, the RPCRuntime obtains from Grapevine the members of the Grapevine group named by the type. The RPCRuntime then obtains the network address for each of those Grapevine individuals, and tries the addresses in turn to find some instance that will accept the binding request: this is done efficiently, and in an order which tends to locate the closest (most responsive) running exporter. This allows an importer to become bound to the closest running instance of a replicated service, where the importer does not care which instance. Of course, an importer is free to enumerate the instances himself, by enumerating the members of the group named by the type.
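A type-only import can be sketched as a loop over the group members. The function `import_by_type` and its parameters are hypothetical. The sketch assumes the members are already ordered by expected responsiveness, since the text does not describe how that order is obtained.

```python
def import_by_type(members, connect_sites, try_bind):
    """Bind to the first instance of a type that accepts the binding request.

    members       -- instance names in the group named by the type
    connect_sites -- mapping from instance name to network address
    try_bind      -- callable(address, instance) returning binding info or None
    """
    for instance in members:
        address = connect_sites.get(instance)
        if address is None:
            continue  # no connect-site recorded for this individual
        binding = try_bind(address, instance)
        if binding is not None:
            return instance, address, binding
    return None  # no running exporter was found
```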

    The instance may be a network address constant instead of a Grapevine name. This would allow the importer to bind to the exporter without any interaction with Grapevine, at the cost of including an explicit address in the application programs.

[Figure 2]

 2.2 查询对应的导出接口

     我们使用Grapevine分布式数据库[1]来进行RPC绑定。使用Grapevine的主要吸引力在于它广泛且可靠。Grapevine分布在我们的internet拓扑结构中的多个服务器上,并被配置为维护每个数据库条目的至少三个副本。由于Grapevine服务器本身是高度可靠的,而且数据是复制的,所以我们很少会无法查找数据库条目。除了使用这样的数据库,还有其他选择,但我们认为它们不令人满意。例如,我们可以在应用程序中包含它们希望与之通信的机器的网络地址:对于大多数应用程序来说,这将过早地绑定到特定的机器。或者,我们可以使用某种形式的广播协议来定位所需的机器:这有时是可以接受的,但作为一种通用机制,会对无辜的旁观者造成太多干扰,并且不方便绑定到不在同一本地网络上的机器。

     Grapevine的数据库由一组条目组成,每个条目由一个名为Grapevine RName的字符串作为关键字。有两种类型的条目:individ(个体类型)和groups(组类型)。Grapevine为每个数据库条目保留了几项信息,但是RPC包只关心两项:对于每个个体,有一个连接站点,这是一个网络地址,对于每个组,有一个成员列表,这是一个rname列表。RPC包在Grapevine数据库中为每个接口名维护两个条目:一个用于每个类型,一个用于每个实例;所以类型和实例都是Grapevine rname。该实例的数据库条目是一个Grapevine个体,其连接站点是一个网络地址,具体来说,就是最后导出该实例所在机器的网络地址类型的数据库条目,Grapevine组类型的成员是已经被导出的Grapevine RNames类型实例,。

   例如,如果类型为FileAccess.Alpine、实例为Ebbets.Alpine的远程接口已由运行在网络地址3#22#的服务器导出,且类型为FileAccess.Alpine、实例为Luther.Alpine的远程接口已由运行在网络地址3#276#的服务器导出,那么Grapevine组FileAccess.Alpine的成员将包括Ebbets.Alpine和Luther.Alpine。Grapevine个体Ebbets.Alpine的连接站点为3#22#,Luther.Alpine的连接站点为3#276#。

   当导出器希望使其接口对远程客户端可用时,服务器代码调用服务器存根,而服务器存根又在RPCRuntime中调用一个过程ExportInterface。ExportInterface被赋予接口名称(类型和实例)以及在服务器存根中实现的过程(称为分派器),该过程将处理接口的传入调用。ExportInterface调用Grapevine,并确保该实例是该类型的Grapevine组的成员之一,并且该实例的连接站点(该Grapevine个体)是导出机器的网络地址。这可能涉及到更新数据库。作为一种优化,如果数据库已经包含正确的信息,则不会更新它;这通常是正确的:通常之前已经导出了同名的接口,而且通常是从相同的网络地址导出的。例如,要从网络地址3#22#导出类型为FileAccess.Alpine、实例为Ebbets.Alpine的接口,RPCRuntime将确保Grapevine数据库中Ebbets.Alpine的连接站点为3#22#,并且Ebbets.Alpine是FileAccess.Alpine的成员。然后,RPCRuntime将有关此导出的信息记录在导出机器上维护的表中。对于当前导出的每个接口,该表包含接口名称、来自服务器存根的调度程序过程和一个32位值,该值作为导出的永久唯一(机器相对)标识符。该表以一个小整数索引的数组实现。使用32位计数器的连续值保证标识符永久唯一;在启动时,该计数器被初始化为一个1秒的实时时钟,随后计数器被限制为小于该时钟的当前值。这将在一台机器上对ExportInterface的调用速率限制为小于每秒一个的平均速率,这是自导出机器重新启动以来的平均速率。这种调用的突发率可能超过每秒一个(参见图2)。

   当导入器希望绑定到导出器时,用户代码调用它的用户存根,然后调用RPCRuntime中的过程importinterface,给它所需的接口类型和实例。RPCRuntime通过向Grapevine请求接口实例的连接站点的网络地址来确定导出器的网络地址(如果有的话)。然后,RPCRuntime对该机器上的RPCRuntimepackage进行远程过程调用,请求与该接口类型和实例关联的绑定信息。如果指定的机器当前没有导出该接口,则将此事实返回到导入机器,绑定失败。如果指定的机器当前正在导出该接口,那么由其RPCRuntime维护的当前导出表将产生相应的惟一标识符;将标识符和表索引返回给导入机器并绑定成功。导出器网络地址、标识符和表索引由用户存根记住,以便在远程调用中使用。

     随后,当用户存根对导入的远程接口进行调用时,它制造的调用包包含所需接口的惟一标识符和表索引,以及所需过程相对于该接口的入口点编号。当RPCRuntime被机器上收到一个新的呼叫包它使用索引来查找表当前的出口(有效),验证数据包的惟一标识符匹配表中,并将调用包传递到调度过程中指定的表。

    我们的客户端可以使用这种绑定方案的几种变体。如果调用importinterface的导入器只指定接口类型而没有指定实例,则RPCRuntime从Grapevine获得以该类型命名的Grape- vine组的成员。然后RPCRuntime获取每个Grapevine个体的网络地址,并依次尝试这些地址,以找到接受绑定请求的实例:这是非常有效的并且按照一个倾向于定位最近(响应最快)的运行出口商的顺序。这允许导入器绑定到最近的正在运行的复制服务实例,而导入器并不关心哪个实例。当然,导入器可以通过枚举由类型命名的组的成员来自由枚举实例本身。

     该实例可以是一个网络地址常量而不是一个Grapevine名称。这将允许导入器绑定到导出器,而不需要与Grapevine进行任何交互,代价是在应用程序中包含一个显式地址。

2.3 Discussion

 There are some important effects of this scheme. Notice that importing an interface has no effect on the data structures in the exporting machine; this is advantageous when building servers that may have hundreds of users, and avoids problems regarding what the server should do about this information in relation to subsequent importer crashes. Also, use of the unique identifier scheme means that bindings are implicitly broken if the exporter crashes and restarts (since the currency of the identifier is checked on each call). We believe that this implicit unbinding is the correct semantics: otherwise a user will not be notified of a crash happening between calls. Finally, note that this scheme allows calls to be made only on procedures that have been explicitly exported through the RPC mechanism. An alternate, slightly more efficient scheme would be to issue importers with the exporter's internal representation of the server-stub dispatcher procedure; this we considered undesirable since it would allow unchecked access to almost any procedure in the server machine and, therefore, would make it impossible to enforce any protection or security schemes.

    The access controls that restrict updates to the Grapevine database have the effect of restricting the set of users who will be able to export particular interface names. These are the desired semantics: it should not be possible, for example, for a random user to claim that his workstation is a mail server and to thereby be able to intercept my message traffic. In the case of a replicated service, this access control effect is critical. A client of a replicated service may not know a priori the names of the instances of the service. If the client wishes to use two-way authentication to get the assurance that the service is genuine, and if we wish to avoid using a single password for identifying every instance of the service, then the client must be able to securely obtain the list of names of the instances of the service. We can achieve this security by employing a secure protocol when the client interacts with Grapevine as the interface is being imported. Thus Grapevine's access controls provide the client's assurance that an instance of the service is genuine (authorized).

   We have allowed several choices for binding time. The most flexible is where the importer specifies only the type of the interface and not its instance: here the decision about the interface instance is made dynamically. Next (and most common) is where the interface instance is an RName, delaying the choice of a particular exporting machine. Most restrictive is the facility to specify a network address as an instance, thus binding it to a particular machine at compile time. We also provide facilities allowing an importer to dynamically instantiate interfaces and to import them. A detailed description of how this is done would be too complicated for this paper, but in summary it allows an importer to bind his program to several exporting machines, even when the importer cannot know statically how many machines he wishes to bind to. This has proved to be useful in some open-ended multimachine algorithms, such as implementing the manager of a distributed atomic transaction. We have not allowed binding at a finer grain than an entire interface. This was not an option we considered, in light of the inutility of this mechanism in the packages and systems we have observed.

 2.3一些讨论:

    这个方案有一些重要的效果。注意,导入接口对导出机器中的数据结构没有影响;当构建可能有数百个用户的服务器时,这是非常有利的,并且避免了服务器应该如何处理与随后导入程序崩溃相关的信息的问题。此外,使用惟一标识符模式意味着,如果导出程序崩溃和重新启动,绑定将隐式中断(因为标识符的货币在每次调用时都要检查)。我们相信这种隐式解绑定是正确的语义:否则用户将不会收到调用之间发生崩溃的通知。最后,请注意,该模式只允许对通过RPC机制显式导出的过程进行调用。另一种稍微更有效的方案是向导入者发送服务器存根分发程序的导出者的内部表示;我们认为这是不可取的,因为它将允许对服务器机器中的几乎所有过程进行未经检查的访问,因此,将不可能执行任何保护或安全方案。

    限制对Grapevine数据库的更新的访问控制具有限制能够导出特定接口名称的用户集的效果。这些是需要的语义:例如,不应该让一个随机用户声称他的工作站是邮件服务器,从而能够拦截我的消息流。对于复制的服务,这种访问控制效果是至关重要的。复制服务的客户端可能事先不知道服务实例的名称。如果客户希望使用两个身份验证来保证服务是真诚的,如果我们希望避免使用一个密码识别服务的每一个实例,那么客户端必须能够安全地获取服务的实例的名称列表。我们可以通过在导入接口时客户端与Grapevine交互时使用安全协议来实现这种安全性。因此,Grapevine的访问控制为客户端提供了服务实例是真实的(授权的)保证。

     我们允许几种绑定时间的选择。最灵活的是导入器只指定接口的类型而不指定它的实例:在这里,关于接口实例的决定是动态做出的。接下来(也是最常见的)是接口实例是一个RName,这将延迟选择特定的导出机器。最严格的限制是将网络地址指定为实例,从而在编译时将其绑定到特定的机器。我们还提供了允许导入器动态实例化接口并导入它们的工具。详细说明这是如何做到的对该论文来说内容过多。但总而言之,它允许导入器将其程序绑定到多个导出机器,即使导入器无法静态地知道他希望绑定到多少台机器。这在一些开放的多机算法中被证明是有用的,例如实现分布式原子事务的管理器。我们不允许在比整个界面更细的晶粒上结合。这不是我们考虑的选项,因为在我们观察到的包和系统中这种机制没有用处。

    

3. PACKET-LEVEL TRANSPORT PROTOCOL

3.1 Requirements

The semantics of RPCs can be achieved without designing a specialized packet-level protocol. For example, we could have built our package using the PUP byte stream protocol (or the Xerox NS sequenced packet protocol) as our transport layer. Some of our previous experiments [13] were made using PUP byte streams, and the Xerox NS "Courier" RPC protocol [4] uses the NS sequenced packet protocol. Grapevine protocols are essentially similar to remote procedure calls, and use PUP byte streams. Our measurements [13] and experience with each of these implementations convinced us that this approach was unsatisfactory. The particular nature of RPC communication means that there are substantial performance gains available if one designs and implements a transport protocol specially for RPC. Our experiments indicated that a performance gain of a factor of ten might be possible.

    An intermediate stance might be tenable: we have never tried the experiment of using an existing transport protocol and building an implementation of it specialized for RPC. However, the request-response nature of communication with RPC is sufficiently unlike the large data transfers for which byte streams are usually employed that we do not believe this intermediate position to be tenable.

   One aim we emphasized in our protocol design was minimizing the elapsed real-time between initiating a call and getting results. With protocols for bulk data transfer this is not important: most of the time is spent actually transferring the data. We also strove to minimize the load imposed on a server by substantial numbers of users. When performing bulk data transfers, it is acceptable to adopt schemes that lead to a large cost for setting up and taking down connections, and that require maintenance of substantial state information during a connection. These are acceptable because the costs are likely to be small relative to the data transfer itself. This, we believe, is untrue for RPC. We envisage our machines being able to serve substantial numbers of clients, and it would be unacceptable to require either a large amount of state information or expensive connection handshaking.

   It is this level of the RPC package that defines the semantics and the guarantees we give for calls. We guarantee that if the call returns to the user then the procedure in the server has been invoked precisely once. Otherwise, an exception is reported to the user and the procedure will have been invoked either once or not at all--the user is not told which. If an exception is reported, the user does not know whether the server has crashed or whether there is a problem in the communication network. Provided the RPCRuntime on the server machine is still responding, there is no upper bound on how long we will wait for results; that is, we will abort a call if there is a communication breakdown or a crash but not if the server code deadlocks or loops. This is identical to the semantics of local procedure calls.

 3.包层面的传输协议

 3.1必要要求

   rpc的语义可以在不设计专门的包级协议的情况下实现。例如,我们可以使用PUP字节流协议(或Xerox NS序列包协议)作为传输层来构建包。我们之前的一些实验[13]使用PUP字节流,而施乐的NS“Courier”RPC协议[4]使用NS序列包协议。Grapevine协议本质上类似于远程过程调用,并且使用PUP字节流。我们的测量[13]和这些实现的经验使我们确信这种方法是不令人满意的。RPC通信的特殊性质意味着,如果设计和实现一个专门用于RPC的传输协议,就可以获得大量的性能增益。我们的实验表明,性能增益的十倍是可能的。

   中间立场可能是站得住的:我们从来没有尝试过使用现有的传输协议并为RPC专门构建它的实现。然而,使用RPC进行通信的请求-响应特性与通常使用字节流的大型数据传输完全不同,因此我们认为这种中间位置是站不住脚的。

    我们在协议设计中强调的一个目标是,尽量减少发起调用和获得结果之间的实时时间消耗。对于用于批量数据传输的协议,这并不重要:大部分时间都花在实际传输数据上。我们还努力将大量用户对服务器施加的负载降到最低。在执行批量数据传输时,可以采用一些方案,这些方案在建立和关闭连接时开销很大,并且在连接期间需要维护大量的状态信息。这是可以接受的,因为成本相对于数据传输本身来说可能很小。我们相信,这对于RPC是不真实的。我们设想我们的机器能够服务大量的客户端,而需要大量的状态信息或昂贵的连接握手都是不可接受的。

    正是这一层的RPC包定义了我们为调用提供的语义和保证。我们保证,如果调用返回给用户,则服务器中的过程被精确地调用了一次。否则,将向用户报告一个异常,并且过程将被调用一次或根本没有调用——用户不会被告知是哪一个。如果报告异常,则用户不知道服务器是否崩溃,或者通信网络是否存在问题。服务器机器上的RPCRuntime是仍旧在响应,我们等待结果的时间没有上限;也就是说,如果出现通信故障或崩溃,我们将中止调用,但如果服务器代码死锁或循环,则不会。这与本地过程调用的语义相同。

3.2 Simple Calls

We have tried to make the per call communication particularly efficient for the situation where all of the arguments will fit in a single packet buffer, as will all of the results, and where frequent calls are being made. To make a call, the caller sends a call packet containing a call identifier (discussed below), data specifying the desired procedure (as described in connection with binding), and the arguments. When the callee machine receives this packet the appropriate procedure is invoked. When the procedure returns, a result packet containing the same call identifier, and the results, is sent back to the caller.

   The machine that transmits a packet is responsible for retransmitting it until an acknowledgment is received, in order to compensate for lost packets. However, the result of a call is sufficient acknowledgment that the call packet was received, and a call packet is sufficient to acknowledge the result packet of the previous call made by that process. Thus in a situation where the duration of a call and the interval between calls are each less than the transmission interval, we transmit precisely two packets per call (one in each direction). If the call lasts longer or there is a longer interval between calls, up to two additional packets may be sent (the retransmission and an explicit acknowledgment packet); we believe this to be acceptable because in those situations it is clear that communication costs are no longer the limiting factor on performance.

    The call identifier serves two purposes. It allows the caller to determine that the result packet is truly the result of his current call (not, for example, a much delayed result of some previous call), and it allows the callee to eliminate duplicate call packets (caused by retransmissions, for example). The call identifier consists of the calling machine identifier (which is permanent and globally unique), a machine-relative identifier of the calling process, and a sequence number. We term the pair [machine identifier, process] an activity. The important property of an activity is that each activity has at most one outstanding remote call at any time--it will not initiate a new call until it has received the results of the preceding call. The call sequence number must be monotonic for each activity (but not necessarily sequential). The RPCRuntime on a callee machine maintains a table giving the sequence number of the last call invoked by each calling activity. When a call packet is received, its call identifier is looked up in this table. The call packet can be discarded as a duplicate (possibly after acknowledgment) unless its sequence number is greater than that given in this table. Figure 3 shows the packets transmitted in simple calls.
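The duplicate-elimination rule can be sketched in a few lines of Python. `CallId` and `DuplicateFilter` are illustrative names; the only behavior taken from the text is that a call packet is a duplicate unless its sequence number is greater than the last one recorded for the same activity.

```python
from typing import Dict, NamedTuple, Tuple


class CallId(NamedTuple):
    machine: int    # permanent, globally unique machine identifier
    process: int    # machine-relative identifier of the calling process
    sequence: int   # monotonic per activity, not necessarily sequential


class DuplicateFilter:
    """Callee-side table: last sequence number invoked by each activity."""

    def __init__(self) -> None:
        self._last_seq: Dict[Tuple[int, int], int] = {}

    def accept(self, call_id: CallId) -> bool:
        """Return True for a new call, False for a duplicate to discard."""
        activity = (call_id.machine, call_id.process)
        last = self._last_seq.get(activity)
        if last is not None and call_id.sequence <= last:
            return False
        self._last_seq[activity] = call_id.sequence
        return True
```

A call packet from an activity not yet in the table is accepted, and gaps in the sequence are allowed; only a number that fails to increase marks a packet as a duplicate.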

    It is interesting to compare this arrangement with connection establishment, maintenance and termination in more heavyweight transport protocols. In our protocol, we think of a connection as the shared state information between an activity on a calling machine and the RPCRuntime package on the server machine accepting calls from that activity. We require no special connection establishment protocol (compared with the two-packet handshake required in many other protocols); receipt of a call packet from a previously unknown activity is sufficient to create the connection implicitly. When the connection is active (when there is a call being handled, or when the last result packet of the call has not yet been acknowledged), both ends maintain significant amounts of state information. However, when the connection is idle the only state information in the server machine is the entry in its table of sequence numbers. A caller has minimal state information when a connection is idle: a single machine-wide counter is sufficient. When initiating a new call, its sequence number is just the next value of this counter. This is why sequence numbers in the calls from an activity are required only to be monotonic, not sequential. When a connection is idle, no process in either machine is concerned with the connection. No communications (such as "pinging" packet exchanges) are required to maintain idle connections. We have no explicit connection termination protocol. If a connection is idle, the server machine may discard its state information after an interval, when there is no longer any danger of receiving retransmitted call packets (say, after five minutes), and it can do so without interacting with the caller machine. This scheme provides the guarantees of traditional connection-oriented protocols without the costs. Note, however, that we rely on the unique identifier we introduced when doing remote binding. 
Without this identifier we would be unable to detect duplicates if a server crashed and then restarted while a caller was still retransmitting a call packet (not very likely, but just plausible). We are also assuming that the call sequence number from an activity does not repeat even if the calling machine is restarted (otherwise a call from the restarted machine might be eliminated as a duplicate). In practice, we achieve this as a side effect of a 32-bit conversation identifier which we use in connection with secure calls. For nonsecure calls, a conversation identifier may be thought of as a permanently unique identifier which distinguishes incarnations of a calling machine. The conversation identifier is passed with the call sequence number on every call. We generate conversation identifiers based on a 32-bit clock maintained by every machine (initialized from network time servers when a machine restarts).
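The idle-connection behavior can be sketched by adding a timestamp and a conversation identifier to the sequence-number table. `IdleConnectionTable` is a hypothetical name, and keying each entry by the conversation identifier is an assumption of this sketch, not something the text specifies. The 300-second default follows the "say, after five minutes" figure above.

```python
class IdleConnectionTable:
    """Server-side state for idle connections: one small entry per activity."""

    def __init__(self, idle_timeout=300.0):
        self.idle_timeout = idle_timeout
        # (conversation, machine, process) -> (last_sequence, last_used_time)
        self._entries = {}

    def accept(self, conversation, machine, process, sequence, now):
        """Implicitly create or update a connection; False means duplicate."""
        key = (conversation, machine, process)
        entry = self._entries.get(key)
        if entry is not None and sequence <= entry[0]:
            return False
        self._entries[key] = (sequence, now)
        return True

    def discard_idle(self, now):
        """Drop entries idle for at least idle_timeout; return how many."""
        stale = [key for key, (_, used) in self._entries.items()
                 if now - used >= self.idle_timeout]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

No packet exchange is needed in either direction: the first call packet creates the entry, and `discard_idle` removes it without contacting the caller. A restarted caller arrives with a new conversation identifier, so its calls are not mistaken for duplicates even if its sequence numbers start again from a low value.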

   From experience with previous systems, we anticipate that this light-weight connection management will be important in building large and busy distributed systems.

[Figure 3: The packets transmitted in simple calls]

 3.2 简单调用

   我们已经尝试使每个呼叫通信特别有效的情况下,所有的参数将适合在一个单一的包缓冲区,作为所有的结果,并在频繁的呼叫正在进行。要进行调用,调用方发送一个包含调用标识符(将在下面讨论)、指定所需过程的数据(与绑定相关的描述)和参数的调用包。当被调用机器收到此数据包时,将调用适当的过程。当过程返回时,将包含相同调用标识符的结果包和结果发送给调用者。

    传送数据包的机器负责重新传送数据包,直到收到确认信息,以补偿丢失的数据包。然而,调用的结果是对调用包已接收的充分确认,而调用包也足以确认由该进程发出的前一个调用的结果包。因此,在一个调用的持续时间和调用之间的间隔都小于传输间隔的情况下,我们精确地每次调用传输两个包(每个方向一个)。如果调用持续时间较长或调用之间的间隔较长,最多可以发送两个额外的包(重传和一个明确的确认包);我们认为这是可以接受的,因为在这些情况下,通信成本显然不再是性能的限制因素。

   调用标识符有两个用途。它允许调用者确定结果包确实是他当前调用的结果(而不是之前某个调用的延迟很久的结果),它还允许被调用者消除重复的调用包(例如,由重传输引起的)。调用标识符由调用机器标识符(它是永久的和全局唯一的)、调用进程的机器相对标识符和序列号组成。我们称这对[机器标识符,进程]为活动。活动的重要属性是每个活动在任何时候最多有一个未完成的远程调用——它不会发起一个新的调用,直到它收到了前一个调用的结果。对于每个活动,调用序号必须是单调的(但不一定是连续的)。被调用机器上的RPCRuntime维护一个表,该表给出每个调用活动所调用的最后一次调用的序列号。当接收到一个呼叫包时,将在该表中查找它的呼叫标识符。调用包可以被丢弃为一个副本(可能在确认之后),除非它的序号大于表中给出的序号。图3显示了在简单调用中传输的数据包。

   将这种安排与更重量级的传输协议中的连接建立、维护和终止进行比较是很有趣的。在我们的协议中,我们将连接视为调用机器上的活动和服务器机器上接受该活动调用的RPCRuntime包之间的共享状态信息。我们不需要特殊的连接建立协议。(与许多其他协议中要求的双包握手相比)接收来自先前未知活动的调用包就足以隐式地创建连接。当连接处于活动状态时(当有一个调用正在被处理时,或者当调用的最后一个结果包还没有被确认时),两端都会维护大量的状态信息。然而,当连接处于空闲状态时,服务器机器中唯一的状态信息是其序列号表中的条目。当连接空闲时,调用者只有最小的状态信息:单个机器范围的计数器就足够了。当发起一个新的调用时,它的序号就是这个计数器的下一个值。这就是为什么来自活动的调用中的序列号只需要是单调的,而不是连续的。当连接空闲时,任何一台机器上的进程都与该连接无关。不需要通信(如“ping”包交换)来维持空闲连接。我们没有明确的连接终止协议。如果连接是空闲的,服务器机器可能会在一段时间后丢弃它的状态信息,此时不再有接收重新传输的调用包的危险(比如,五分钟后),而且它可以在不与调用机器交互的情况下这样做。该方案提供了传统的面向连接协议的保证,且没有成本。但是请注意,我们依赖于在进行远程绑定时引入的惟一标识符。如果没有这个标识符,我们将无法检测重复,如果服务器崩溃,然后重新启动,而调用者仍在发送一个调用包(不太可能,但只是合理的)。我们还假设一个活动的调用序列号不会重复,即使调用机器重新启动(否则,从重新启动的机器发出的调用可能会作为重复而被消除)。在实践中,这是我们在连接安全调用时使用的32位会话标识符的副作用。对于非安全呼叫,会话标识符可以被认为是一个永久唯一的标识符,用来区分呼叫机器的不同版本。在每次呼叫中,通话标识符与呼叫序列号一起传递。我们根据每台机器维护的32位时钟生成会话标识符(当机器重新启动时从网络时间服务器初始化)。

    根据以往系统的经验,我们预计这种轻量级连接管理在构建大型、繁忙的分布式系统中将非常重要。

   

 3.3 Complicated Calls

    As mentioned above, the transmitter of a packet is responsible for retransmitting it until it is acknowledged. In doing so, the packet is modified to request an explicit acknowledgment. This handles lost packets, long duration calls, and long gaps between calls. When the caller is satisfied with its acknowledgments, the caller process waits for the result packet. While waiting, however, the caller periodically sends a probe packet to the callee, which the callee is expected to acknowledge. This allows the caller to notice if the callee has crashed or if there is some serious communication failure, and to notify the user of an exception. Provided these probes continue to be acknowledged the caller will wait indefinitely, happy in the knowledge that the callee is (or claims to be) working on the call. In our implementation the first of these probes is issued after a delay of slightly more than the approximate round-trip time between the machines. The interval between probes increases gradually, until, after about 10 minutes, the probes are being sent once every five minutes. Each probe is subject to retransmission strategies similar to those used for other packets of the call. So if there is a communication failure, the caller will be told about it fairly soon, relative to the total time the caller has been waiting for the result of the call. Note that this will only detect failures in the communication levels: it will not detect if the callee has deadlocked while working on the call. This is in keeping with our principle of making RPC semantics similar to local procedure call semantics. We have language facilities available for watching a process and aborting it if this seems appropriate; these facilities are just as suitable for a process waiting on a remote call.
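The probe timing can be illustrated with a small schedule generator. The function name `probe_times`, the 10% margin over the round-trip time, and the doubling `growth` factor are assumptions chosen for illustration. The text only says that the first probe follows a delay slightly longer than the round-trip time and that the interval grows gradually to one probe every five minutes.

```python
def probe_times(round_trip, horizon, growth=2.0, max_interval=300.0):
    """Return the times (in seconds of waiting) at which probes are sent.

    round_trip   -- approximate round-trip time between the machines
    horizon      -- how long the caller has been waiting for the result
    growth       -- assumed factor by which the interval increases
    max_interval -- steady-state interval (five minutes in the text)
    """
    times = []
    interval = round_trip * 1.1  # slightly more than the round-trip time
    t = interval
    while t <= horizon:
        times.append(t)
        interval = min(interval * growth, max_interval)
        t += interval
    return times
```

Because the interval grows with the time already spent waiting, a communication failure is noticed after a delay that stays small relative to the total wait, which matches the "fairly soon, relative to the total time" remark above.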

   A possible alternative strategy for retransmissions and acknowledgments is to have the recipient of a packet spontaneously generate an acknowledgment if he doesn't generate the next packet significantly sooner than the expected retransmission interval. This would save the retransmission of a packet when dealing with long duration calls or large gaps between calls. We decided that saving this packet was not a large enough gain to merit the extra cost of detecting that the spontaneous acknowledgment was needed. In our implementation this extra cost would be in the form of maintaining an additional data structure to enable an extra process in the server to generate the spontaneous acknowledgment when appropriate, plus the computational cost of the extra process deciding when to generate the acknowledgment. In particular, it would be difficult to avoid incurring extra cost when the acknowledgment is not needed. There is no analogous extra cost to the caller, since the caller necessarily has a retransmission algorithm in case the call packet is lost.

   If the arguments (or results) are too large to fit in a single packet, they are sent in multiple packets with each but the last requesting explicit acknowledgment. Thus when transmitting a large call argument, packets are sent alternately by the caller and callee, with the caller sending data packets and the callee responding with acknowledgments. This allows the implementation to use only one packet buffer at each end for the call, and avoids the necessity of including the buffering and flow control strategies found in normal bulk data transfer protocols. To permit duplicate elimination, these multiple data packets within a call each have a call-relative sequence number. Figure 4 shows the packet sequences for complicated calls.
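The fragmentation rule can be sketched as follows. The dictionary layout and the names `fragment` and `packets_on_wire` are hypothetical; the sketch only shows that every data packet except the last requests an explicit acknowledgment, and counts the packets that cross the network in the loss-free case.

```python
def fragment(data: bytes, packet_size: int):
    """Split arguments (or results) into packets with call-relative numbers."""
    chunks = [data[i:i + packet_size]
              for i in range(0, len(data), packet_size)] or [b""]
    last = len(chunks) - 1
    return [
        # Every packet but the last asks for an explicit acknowledgment.
        {"seq": n, "payload": chunk, "please_ack": n < last}
        for n, chunk in enumerate(chunks)
    ]


def packets_on_wire(packets):
    """Data packets plus one acknowledgment for each packet that asked."""
    return len(packets) + sum(1 for p in packets if p["please_ack"])
```

For n data packets this gives 2n - 1 packets in one direction of the call, since the last data packet is acknowledged implicitly by whatever packet follows it. Only one packet is in flight at a time, so a single buffer at each end suffices.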

[Figure 4: The packet sequences for complicated calls]

 3.3复杂调用:

     如上所述,数据包的发送方负责重新发送它,直到它被确认。在这样做时,数据包被修改以请求一个明确的确认。这可以处理丢失的数据包、长时间调用和调用之间的长间隔。当调用方对其确认满意时,调用方进程将等待结果包。然而,在等待时,调用者定期地向被调用者发送一个探测包,被调用者希望对该探测包进行确认。这允许调用者注意到被调用者是否崩溃或是否存在一些严重的通信故障,并将异常通知用户。如果继续确认这些探测,调用者将无限期地等待,因为知道被调用者正在(或声称正在)处理调用而感到高兴。在我们的实现中,第一个探测是在比机器之间的大约往返时间稍长一点的延迟之后发出的。探针之间的间隔逐渐增加,直到大约10分钟后,探针每5分钟发送一次。每一个探测都受到重传任务策略的影响,与调用的其他数据包相似。因此,如果出现通信故障,相对于调用者一直在等待调用结果的总时间,调用者很快就会被告知。注意,这只会检测通信级别的故障:它不会检测调用方在处理调用时是否已死锁。这与我们使RPC语义类似于本地过程调用语义的原则是一致的。我们有可用的语言工具来监视进程并在适当的时候终止它;这些工具同样适用于等待远程调用的进程。

    对于重传和确认,一种可能的替代策略是,如果数据包的接收方没有明显早于预期的重传间隔生成下一个数据包,则由接收方自发地生成一个确认。这将在处理长时间调用或调用之间的大间隙时节省一次数据包重传。我们认为,节省这个数据包的收益不够大,不值得为检测何时需要自发确认而付出额外的成本。在我们的实现中,这种额外成本表现为:需要维护一个额外的数据结构,使服务器中的一个额外进程能够在适当的时候生成自发确认,外加该额外进程决定何时生成确认的计算成本。特别是在不需要确认的情况下,很难避免产生额外的成本。调用方没有类似的额外成本,因为调用方必须有一个重传算法,以防调用包丢失。

    如果参数(或结果)太大而无法放入单个数据包,则将它们以多个数据包的形式发送,除最后一个数据包外,每个数据包都请求明确的确认。因此,当发送较大的调用参数时,调用方和被调用方交替发送数据包,调用方发送数据包,而被调用方以确认进行响应。这允许实现在调用的每一端只使用一个包缓冲区,并避免了在普通批量数据传输协议中发现的缓冲和流控制策略的必要性。为了允许重复消除,一个呼叫中的多个数据包每个都有一个呼叫相关的序列号。图4显示了复杂调用的包序列。

    

    As described in Section 3.1, this protocol concentrates on handling simple calls on local networks. If the call requires more than one packet for its arguments or results, our protocol sends more packets than are logically required. We believe this is acceptable; there is still a need for protocols designed for efficient transfer of bulk data, and we have not tried to incorporate both RPC and bulk data in a single protocol. For transferring a large amount of data in one direction, our protocol sends up to twice as many packets as a good bulk data protocol would send (since we acknowledge each packet). This would be particularly inappropriate across long haul networks with large delays and high data rates. However, if the communication activity can reasonably be represented as procedure calls, then our protocol has desirable characteristics even across such long haul networks. It is sometimes practical to use RPC for bulk data transfer across such networks, by multiplexing the data between several processes each of which is making single packet calls--the penalty then is just the extra acknowledgment per packet, and in some situations this is acceptable. The dominant advantage of requiring one acknowledgment for each argument packet (except the last one) is that it simplifies and optimizes the implementation. It would be possible to use our protocol for simple calls, and to switch automatically to a more conventional protocol for complicated ones. We have not explored this possibility.

     如3.1节所述,该协议主要处理本地网络上的简单调用。如果调用的参数或结果需要一个以上的包,我们的协议就会发送超出逻辑要求的包。我们认为这是可以接受的;仍然需要为高效传输批量数据而设计的协议,而且我们还没有尝试将RPC和批量数据合并到一个协议中。对于在一个方向上传输大量数据,我们的协议发送的包数是良好的批量数据协议发送的包数的两倍(因为我们确认每个包)。这在具有大延迟和高数据速率的长途网络中尤其不合适。然而,如果通信活动可以被合理地表示为过程调用,那么我们的协议就具有理想的特性,即使在这样的长途网络中。有时使用RPC在这样的网络上进行批量数据传输是可行的,方法是在多个进程之间多路传输数据,每个进程都在进行单个包调用——因此代价只是每个包额外的确认,在某些情况下这是可以接受的。对每个参数包(最后一个除外)都要求一个确认的主要优点是它简化并优化了实现。 使用我们的协议应对简单的调用需求,并自动切换到一个对复杂的调用更适用的协议。我们还没有探讨过这种可能性。

3.4 Exception Handling

 The Mesa language provides quite elaborate facilities for a procedure to notify exceptions to its caller. These exceptions, called signals, may be thought of as dynamically bound procedure activations: when an exception is raised, the Mesa runtime system dynamically scans the call stack to determine if there is a catch phrase for the exception. If so, the body of the catch phrase is executed, with arguments given when the exception was raised. The catch phrase may return (with results) causing execution to resume where the exception was raised, or the catch phrase may terminate with a jump out into a lexically enclosing context. In the case of such termination, the dynamically newer procedure activations on the call stack are unwound (in most-recent-first order).

   Our RPC package faithfully emulates this mechanism. There are facilities in the protocol to allow the process on the server machine handling a call to transmit an exception packet in place of a result packet. This packet is handled by the RPCRuntime on the caller machine approximately as if it were a call packet, but instead of invoking a new call it raises an exception in the appropriate process. If there is an appropriate catch phrase, it is executed. If the catch phrase returns, the results are passed back to the callee machine, and events proceed normally. If the catch phrase terminates by a jump then the callee machine is so notified, which then unwinds the appropriate procedure activations. Thus we have again emulated the semantics of local calls. This is not quite true: in fact we permit the callee machine to communicate only those exceptions which are defined in the Mesa interface which the callee exported. This simplifies our implementation (in translating the exception names from the callee's machine environment to the caller's), and provides some protection and debugging assistance. The programming convention in single machine programs is that if a package wants to communicate an exception to its caller then the exception should be defined in the package's interface; other exceptions should be handled by a debugger. We have maintained and enforced this convention for RPC exceptions.
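The restriction to exceptions declared in the exported interface can be sketched in Python. This is only a partial analogy: Python exceptions cannot resume at the point where they were raised, so the sketch covers the terminating case and the filtering rule, not Mesa's resumable signals. `RemoteSignal`, `serve_call`, `complete_call` and the packet dictionaries are hypothetical.

```python
class RemoteSignal(Exception):
    """Raised in the caller for an exception declared in the interface."""

    def __init__(self, name, signal_args):
        super().__init__(name)
        self.name = name
        self.signal_args = signal_args


def serve_call(procedure, args, exported_signals):
    """Server side: build a result packet or, if allowed, an exception packet."""
    try:
        return {"kind": "result", "value": procedure(*args)}
    except Exception as exc:
        name = type(exc).__name__
        if name not in exported_signals:
            raise  # not in the interface: left to local handling (a debugger)
        return {"kind": "exception", "name": name, "args": exc.args}


def complete_call(packet):
    """Caller side: return the result or raise the transmitted exception."""
    if packet["kind"] == "exception":
        raise RemoteSignal(packet["name"], packet["args"])
    return packet["value"]
```

Transmitting only the exception name and arguments mirrors the point about translating exception names between the callee's environment and the caller's.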

    In addition to exceptions raised by the callee, the RPCRuntime may raise a call failed exception if there is some communication difficulty. This is the primary way in which our clients note the difference between local and remote calls.

 3.4异常处理

Mesa语言为过程提供了非常复杂的工具来通知调用者异常。这些异常(称为信号)可以被视为动态绑定的过程激活:当引发异常时,Mesa运行时系统动态地扫描调用堆栈,以确定是否有针对该异常的catch短语。如果有,则执行catch短语的主体,并使用引发异常时给出的参数。catch短语可能返回(带有结果),导致在引发异常的地方继续执行,或者catch短语可能以跳转到词法封闭的上下文而终止。在这种终止的情况下,调用堆栈上动态更新的过程激活被解除(以最近优先顺序)。

    我们的RPC包忠实地模拟了这种机制。协议中有一些设施允许服务器机器上处理调用的进程传输一个异常包来代替结果包。这个包由调用者机器上的RPCRuntime处理,就好像它是一个调用包一样,但是它不是调用一个新的调用,而是在适当的进程中引发一个异常。如果有适当的catch短语,就执行它。如果catch短语返回,则将结果传递回被调用机器,事件将正常进行。如果catch短语通过跳转终止,则通知被调用机器,然后展开适当的过程激活。因此,我们再次模拟了本地调用的语义。这并不完全正确:事实上,我们只允许被调用机器通信被调用机器导出的Mesa接口中定义的异常。这简化了我们的实现(将异常名称从被调用方的机器环境转换为调用方的机器环境),并提供了一些保护和调试帮助。单机程序中的编程约定是,如果一个包想要将异常传递给它的调用者,那么该异常应该在包的接口中定义;其他异常应由调试器处理。我们为RPC异常维护并强制了这个约定。

   除了被调用方引发的异常之外,如果存在一些通信困难,RPCRuntime还可能引发调用失败异常。这是我们的客户注意本地调用和远程调用之间区别的主要方式。

 3.5 Use of Processes

In Mesa and Cedar, parallel processes are available as a built-in language feature. Process creation and changing the processor state on a process swap are considered inexpensive. For example, forking a new process costs about as much as ten (local) procedure calls. A process swap involves swapping an evaluation stack and one register, and invalidating some cached information. However, on the scale of a remote procedure call, process creation and process swaps can amount to a significant cost. This was shown by some of Nelson's experiments [13]. Therefore we took care to keep this cost low when building this package and designing our protocol.

   The first step in reducing cost is maintaining in each machine a stock of idle server processes willing to handle incoming packets. This means that a call can be handled without incurring the cost of process creation, and without the cost of initializing some of the state of the server process. When a server process is entirely finished with a call, it reverts to its idle state instead of dying. Of course, excess idle server processes kill themselves if they were created in response to a transient peak in the number of RPC calls.

   Each packet contains a process identifier for both source and destination. In packets from the caller machine, the source process identifier is the calling process. In packets from the callee machine, the source process identifier is the server process handling the call. During a call, when a process transmits a packet it sets the destination process identifier in the packet from the source process identifier in the preceding packet of the call. If a process is waiting for the next packet in a call, the process notes this fact in a (simple) data structure shared with our Ethernet interrupt handler. When the interrupt handler receives an RPC packet, it looks at the destination process identifier. If the corresponding process on this machine is at this time waiting for an RPC packet, then the incoming packet is dispatched directly to that process. Otherwise, the packet is dispatched to an idle server process (which then decides whether the packet is part of a current call requiring an acknowledgment, the start of a new call that this server process should handle, or a duplicate that may be discarded). This means that in most cases an incoming packet is given to the process that wants it with one process swap. (Of course, these arrangements are resilient to being given an incorrect process identifier.) When a calling activity initiates a new call, it attempts to use as its destination the identifier of the process that handled the previous call from that activity. This is beneficial, since that process is probably waiting for an acknowledgment of the results of the previous call, and the new call packet will be sufficient acknowledgment. Only a slight performance degradation will result from the caller using a wrong destination process, so a caller maintains only a single destination process for each calling process.
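The routing decision made by the interrupt handler can be sketched as follows. The packet dictionaries and the names `make_reply` and `dispatch_packet` are illustrative. Returning `None` when no idle server process is available is an assumption, since the text does not cover that case.

```python
def make_reply(previous, source, payload):
    """Build the next packet of a call: its destination process is the
    source process of the preceding packet."""
    return {"source": source, "dest": previous["source"], "payload": payload}


def dispatch_packet(packet, waiting, idle_servers):
    """Choose the process that receives an incoming RPC packet.

    waiting      -- set of process ids currently waiting for an RPC packet
    idle_servers -- list of idle server process ids
    """
    dest = packet.get("dest")
    if dest in waiting:
        waiting.discard(dest)      # hand the packet straight to that process
        return dest
    if idle_servers:
        return idle_servers.pop()  # an idle server decides what the packet is
    return None                    # assumption: nothing is available right now
```

A wrong or stale destination identifier simply falls through to the idle-server path, which is why a bad guess costs only a little performance.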

   In summary, the normal sequence of events is as follows: A process wishing to make a call manufactures the first packet of the call, guesses a plausible value for the destination process identifier and sets the source to be itself. It then presents the packet to the Ethernet output device and waits for an incoming packet. In the callee machine, the interrupt handler receives the packet and notifies an appropriate server process. The server process handles the packet, then manufactures the response packet. The destination process identifier in this packet will be that of the process waiting in the caller machine. When the response packet arrives in the caller machine, the interrupt handler there passes it directly to the calling process. The calling process now knows the process identifier of the server process, and can use this in subsequent packets of the call, or when initiating a later call.
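Seen from the caller's side, the sequence amounts to keeping one remembered destination per calling activity. The sketch below assumes a dictionary-shaped packet and a hypothetical `UNKNOWN` value for the first guess; neither comes from the paper.

```python
class CallerActivity:
    UNKNOWN = 0  # hypothetical "no hint yet" identifier

    def __init__(self, pid):
        self.pid = pid
        self.server_hint = self.UNKNOWN   # the single destination remembered per caller

    def first_packet(self, payload):
        """Manufacture the first packet of a call with a guessed destination."""
        return {"src": self.pid, "dst": self.server_hint, "payload": payload}

    def on_response(self, packet):
        """The response's source is the server process; remember it for later calls."""
        self.server_hint = packet["src"]


caller = CallerActivity(pid=7)
request = caller.first_packet(b"args")                     # destination is only a guess
response = {"src": 42, "dst": request["src"], "payload": b"results"}
caller.on_response(response)
print(caller.first_packet(b"next call")["dst"])            # now aimed at process 42
```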

   The effect of this scheme is that in simple calls no processes are created, and there are typically only four process swaps in each call. Inherently, the minimum possible number of process swaps is two (unless we busy-wait)--we incurred the extra two because incoming packets are handled by an interrupt handler instead of being dispatched to the correct process directly by the device microcode (because we decided not to write specialized microcode).

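 3.5 Use of Processes

    In Mesa and Cedar, parallel processes are a built-in language feature. Creating a process, and changing the processor state on a process swap, are considered inexpensive. For example, forking a new process costs about as much as ten (local) procedure calls. A process swap involves swapping an evaluation stack and one register, and invalidating some cached information. On the scale of a remote procedure call, however, process creation and process swaps can amount to a significant cost, as some of Nelson's experiments showed. We therefore took care to keep this cost low when building the package and designing our protocol.

    The first step in reducing cost is to keep, on each machine, a stock of idle server processes willing to handle incoming packets. A call can then be handled without the cost of creating a process and without the cost of initializing some of the server process's state. When a server process has completely finished a call, it returns to the idle state instead of dying. Of course, surplus idle server processes that were created in response to a transient peak in the number of RPC calls kill themselves.

    Each packet contains a process identifier for both source and destination. In packets from the caller machine, the source process identifier is the calling process. In packets from the callee machine, the source process identifier is the server process handling the call. During a call, a process that transmits a packet sets the destination process identifier from the source process identifier of the preceding packet of the call. A process that is waiting for the next packet of a call records this fact in a (simple) data structure shared with our Ethernet interrupt handler. When the interrupt handler receives an RPC packet, it looks at the destination process identifier. If the corresponding process on this machine is currently waiting for an RPC packet, the incoming packet is dispatched directly to that process. Otherwise, the packet is dispatched to an idle server process, which then decides whether the packet is part of a current call and requires an acknowledgment, is the start of a new call that this server process should handle, or is a duplicate that may be discarded. In most cases, therefore, an incoming packet reaches the process that wants it with a single process swap. (Of course, these arrangements are resilient to being given an incorrect process identifier.) When a calling activity initiates a new call, it tries to use as its destination the identifier of the process that handled the previous call from that activity. This is beneficial because that process is probably waiting for an acknowledgment of the results of the previous call, and the new call packet is sufficient acknowledgment. A caller that uses a wrong destination process suffers only a slight performance degradation, so a caller maintains only a single destination process for each calling process.

    In summary, the normal sequence of events is as follows. A process wishing to make a call manufactures the first packet of the call, guesses a plausible value for the destination process identifier, and sets the source to be itself. It then presents the packet to the Ethernet output device and waits for an incoming packet. On the callee machine, the interrupt handler receives the packet and notifies an appropriate server process. The server process handles the packet and then manufactures the response packet. The destination process identifier in this packet is that of the process waiting on the caller machine. When the response packet arrives at the caller machine, the interrupt handler there passes it directly to the calling process. The calling process now knows the process identifier of the server process and can use it in subsequent packets of the call, or when initiating a later call.

   The effect of this scheme is that simple calls create no processes, and each call typically involves only four process swaps. Inherently, the minimum possible number of process swaps is two (unless we busy-wait); we incur the extra two because incoming packets are handled by an interrupt handler instead of being dispatched directly to the correct process by the device microcode (we decided not to write specialized microcode).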

3.6 Other Optimizations

 The above discussion shows some optimizations we have adopted: we use subsequent packets for implicit acknowledgment of previous packets, we attempt to minimize the costs of maintaining our connections, we avoid costs of establishing and terminating connections, and we reduce the number of process switches involved in a call. Some other detailed optimizations also have significant payoff.

   When transmitting and receiving RPC packets we bypass the software layers that correspond to the normal layers of a protocol hierarchy. (Actually, we only do so in cases where caller and callee are on the same network--we still use the protocol hierarchy for internetwork routing.) This provides substantial performance gains, but is, in a sense, cheating: it is a successful optimization because only the RPC package uses it. That is, we have modified the network-driver software to treat RPC packets as a special case; this would not be profitable if there were ten special cases. However, our aims imply that RPC is a special case: we intend it to become the dominant communication protocol. We believe that the utility of this optimization is not just an artifact of our particular implementation of the layered protocol hierarchy. Rather, it will always be possible for one particular transport level protocol to improve its performance significantly by by-passing the full generality of the lower layers.
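The special-casing described here can be pictured as a single branch in the receive path. The following is only a sketch under assumed names: `RPC_TYPE`, the frame fields, and the two handler callables are hypothetical and do not reflect the actual Cedar network driver.

```python
RPC_TYPE = 0x01  # hypothetical packet-type tag for RPC packets


def receive(frame, local_network, rpc_handler, protocol_stack):
    """Give same-network RPC packets a fast path; everything else takes the full stack."""
    if frame["type"] == RPC_TYPE and frame["src_network"] == local_network:
        return rpc_handler(frame)      # bypass the layered protocol software
    return protocol_stack(frame)       # internetwork or non-RPC traffic


fast = receive({"type": RPC_TYPE, "src_network": 1}, 1,
               lambda f: "fast path", lambda f: "layered stack")
print(fast)
```

With one special case the extra test is cheap; with many, the chain of tests would itself start to resemble the layering it was meant to avoid, which is the sense in which the paper calls the optimization cheating.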

   There are reasonable optimizations that we do not use: we could refrain from using the internet packet format for local network communication, we could use specialized packet formats for the simple calls, we could implement special purpose network microcode, we could forbid non-RPC communication, or we could save even more process switches by using busy-waits. We have avoided these optimizations because each is in some way inconvenient, and because we believe we have achieved sufficient efficiency for our purposes. Using them would probably have provided an extra factor of two in our performance.

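3.6 Other Optimizations

   The discussion above shows some of the optimizations we have adopted: we use subsequent packets as implicit acknowledgments of previous packets, we try to minimize the cost of maintaining our connections, we avoid the cost of establishing and terminating connections, and we reduce the number of process switches involved in a call. Some other detailed optimizations also have a significant payoff.

    When transmitting and receiving RPC packets, we bypass the software layers that correspond to the normal layers of a protocol hierarchy. (In fact, we do so only when caller and callee are on the same network; we still use the protocol hierarchy for internetwork routing.) This yields substantial performance gains but is, in a sense, cheating: it is a successful optimization because only the RPC package uses it. That is, we have modified the network-driver software to treat RPC packets as a special case, which would not be profitable if there were ten special cases. However, our aims imply that RPC is a special case: we intend it to become the dominant communication protocol. We believe the usefulness of this optimization is not just an artifact of our particular implementation of the layered protocol hierarchy. Rather, it will always be possible for one particular transport-level protocol to improve its performance significantly by bypassing the full generality of the lower layers.

    There are reasonable optimizations that we do not use: we could refrain from using the internet packet format for local network communication, we could use specialized packet formats for simple calls, we could implement special-purpose network microcode, we could forbid non-RPC communication, or we could save even more process switches by busy-waiting. We avoided these optimizations because each is inconvenient in some way, and because we believe we have achieved sufficient efficiency for our purposes. Using them would probably have improved our performance by an extra factor of two.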

3.7 Security

Our RPC package and protocol include facilities for providing encryption-based security for calls. These facilities use Grapevine as an authentication service (or key distribution center) and use the federal data encryption standard [5]. Callers are given a guarantee of the identity of the callee, and vice versa. We provide full end-to-end encryption of calls and results. The encryption techniques provide protection from eavesdropping (and conceal patterns of data), and detect attempts at modification, replay, or creation of calls. Unfortunately, there is insufficient space to describe here the additions and modifications we have made to support this mechanism. It will be reported in a later paper.

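3.7 Security

    Our RPC package and protocol include facilities for providing encryption-based security for calls. These facilities use Grapevine as an authentication service (or key distribution center) and use the federal data encryption standard [5]. Callers are given a guarantee of the identity of the callee, and vice versa. We provide full end-to-end encryption of calls and results. The encryption techniques protect against eavesdropping (and conceal patterns of data), and detect attempts at modification, replay, or creation of calls. Unfortunately, there is not enough space here to describe the additions and modifications we have made to support this mechanism. They will be reported in a later paper.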

4. PERFORMANCE

As we have mentioned already, Nelson's thesis included extensive analysis of several RPC protocols and implementations, and included an examination of the contributing factors to the differing performance characteristics. We do not repeat that information here.

  We have made the following measurements of use of our RPC package. The measurements were made for remote calls between two Dorados connected by an Ethernet. The Ethernet had a raw data rate of 2.94 megabits per second. The Dorados were running Cedar. The measurements were made on an Ethernet shared with other users, but the network was lightly loaded (apart from our tests), at five to ten percent of capacity. The times shown in Table I are all in microseconds, and were measured by counting Dorado microprocessor cycles and dividing by the known crystal frequency. They are accurate to within about ten percent. The times are elapsed times: they include time spent waiting for the network and time used by interference from other devices. We are measuring from when the user program invokes the local procedure exported by the user-stub until the corresponding return from that procedure call. This interval includes the time spent inside the user-stub, the RPCRuntime on both machines, the server-stub, and the server implementation of the procedures (and transmission times in both directions). The test procedures were all exported to a single interface. We were not using any of our encryption facilities.
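The timing method (cycle count divided by crystal frequency) can be written out as a small helper. The 10 MHz frequency and the cycle count used below are made-up illustration values, not the Dorado's actual figures, and `accuracy_bounds` is a hypothetical name for applying the stated ten percent accuracy.

```python
def cycles_to_microseconds(cycles, crystal_hz):
    """Elapsed time in microseconds for a given number of processor cycles."""
    return cycles * 1_000_000 / crystal_hz


def accuracy_bounds(elapsed_us, relative_error=0.10):
    """Range implied by 'accurate to within about ten percent'."""
    return elapsed_us * (1 - relative_error), elapsed_us * (1 + relative_error)


# Hypothetical example: 11,000 cycles counted on an assumed 10 MHz clock.
elapsed = cycles_to_microseconds(11_000, 10_000_000)
print(elapsed)                   # 1100.0 microseconds
print(accuracy_bounds(elapsed))  # roughly 990 to 1210 microseconds
```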

   We measured individually the elapsed times for 12,000 calls on each procedure. Table I shows the minimum elapsed time we observed, and the median time. We also present the total packet transmission times for each call (as calculated from the known packet sizes used by our protocol, rather than from direct measurement). Finally, we present the elapsed time for making corresponding calls if the user program is bound directly to the server program (i.e., when making a purely local call, without any involvement of the RPC package). The time for purely local calls should provide the reader with some calibration of the speed of the Dorado processor and the Mesa language. The times for local calls also indicate what part of the total time is due to the use of RPC.
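The three quantities reported per procedure (minimum, median, and calculated transmission time) can be derived as sketched below. The sample values and the 147-byte packet total are invented for illustration; only the 2.94 megabit per second rate comes from the text. `rpc_overhead_us` is a hypothetical helper expressing the comparison against the purely local call.

```python
import statistics

ETHERNET_BPS = 2_940_000  # raw data rate stated in the text


def summarize(samples_us):
    """Minimum and median elapsed time over repeated calls of one procedure."""
    return min(samples_us), statistics.median(samples_us)


def transmission_time_us(total_packet_bytes, rate_bps=ETHERNET_BPS):
    """Time on the wire, calculated from packet sizes rather than measured."""
    return total_packet_bytes * 8 * 1_000_000 / rate_bps


def rpc_overhead_us(remote_median_us, local_us):
    """Part of the elapsed time attributable to using RPC instead of a local call."""
    return remote_median_us - local_us


# Invented sample data for illustration only.
samples = [1100, 1059, 1102, 1300, 1097]
print(summarize(samples))         # (1059, 1100)
print(transmission_time_us(147))  # 400.0 for a hypothetical 147 bytes in total
```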

   The first five procedures had, respectively, 0, 1, 2, 4 and 10 arguments and 0, 1, 2, 4 and 10 results, each argument or result being 16 bits long. The next five procedures all had one argument and one result, each argument or result being an array of size 1, 4, 10, 40 and 100 words respectively. The second line from the bottom shows a call on a procedure that raises an exception which the caller resumes. The last line is for the same procedure raising an exception that the caller causes to be unwound.

   For transferring large amounts of data in one direction, protocols other than RPC have an advantage, since they can transmit fewer packets in the other direction. Nevertheless, by interleaving parallel remote calls from multiple processes, we have achieved a data rate of 2 megabits per second transferring between Dorado main memories on the 3 megabit Ethernet. This is equal to the rate achieved by our most highly optimized byte stream implementation (written in BCPL).
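Interleaving parallel remote calls can be illustrated with a thread pool, where each worker has one call outstanding at a time. This is a sketch only: `fetch_block` is a hypothetical stand-in that sleeps instead of performing a real remote call, and the block size, latency, and worker count are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_block(index, latency=0.01, block_size=512):
    """Stand-in for one remote call returning one block of data."""
    time.sleep(latency)                      # simulated network round trip
    return bytes([index % 256]) * block_size


def fetch_all(n_blocks, n_workers):
    """Issue calls from several workers so that round trips overlap."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in request order even though calls overlap in time.
        return b"".join(pool.map(fetch_block, range(n_blocks)))


data = fetch_all(n_blocks=8, n_workers=4)
print(len(data))  # 4096
```

Each individual call still waits for its own reply; throughput improves only because several calls are in flight at once, which is how a call-and-return protocol can approach the rate of a streaming one.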

   We have not measured the cost of exporting or importing an interface. Both of these operations are dominated by the time spent talking to the Grapevine server(s). After locating the exporter machine, calling the exporter to determine the dispatcher identifier uses an RPC call with a few words of data.

[Image 5 of the original post, apparently Table I; not reproduced here]

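4. Performance

As already mentioned, Nelson's thesis included an extensive analysis of several RPC protocols and implementations, and examined the factors contributing to their differing performance characteristics. We do not repeat that information here.

    We made the following measurements of the use of our RPC package. The measurements were made for remote calls between two Dorados connected by an Ethernet. The Ethernet had a raw data rate of 2.94 megabits per second. The Dorados were running Cedar. The measurements were made on an Ethernet shared with other users, but the network was lightly loaded (apart from our tests), at five to ten percent of capacity. The times shown in Table I are all in microseconds and were measured by counting Dorado microprocessor cycles and dividing by the known crystal frequency. They are accurate to within about ten percent. The times are elapsed times: they include time spent waiting for the network and time lost to interference from other devices. We measure from the moment the user program invokes the local procedure exported by the user-stub until the corresponding return from that procedure call. This interval includes the time spent inside the user-stub, the RPCRuntime on both machines, the server-stub, and the server implementation of the procedures (and the transmission times in both directions). The test procedures were all exported to a single interface. We were not using any of our encryption facilities.

    We measured individually the elapsed times of 12,000 calls on each procedure. Table I shows the minimum elapsed time we observed and the median time. We also give the total packet transmission time for each call (calculated from the known packet sizes used by our protocol rather than measured directly). Finally, we give the elapsed time for making the corresponding calls when the user program is bound directly to the server program (that is, when making a purely local call without any involvement of the RPC package). The time for purely local calls should give the reader some calibration of the speed of the Dorado processor and the Mesa language. The times for local calls also indicate what part of the total time is due to the use of RPC.

     The first five procedures had, respectively, 0, 1, 2, 4 and 10 arguments and 0, 1, 2, 4 and 10 results, each argument or result being 16 bits long. The next five procedures all had one argument and one result, each being an array of 1, 4, 10, 40 and 100 words respectively. The second line from the bottom shows a call on a procedure that raises an exception which the caller resumes. The last line is for the same procedure raising an exception that the caller causes to be unwound.

   For transferring large amounts of data in one direction, protocols other than RPC have an advantage, since they can transmit fewer packets in the other direction. Nevertheless, by interleaving parallel remote calls from multiple processes, we achieved a data rate of 2 megabits per second transferring between Dorado main memories on the 3 megabit Ethernet. This equals the rate achieved by our most highly optimized byte stream implementation (written in BCPL).

    We have not measured the cost of exporting or importing an interface. Both operations are dominated by the time spent talking to the Grapevine server(s). After locating the exporter machine, calling the exporter to determine the dispatcher identifier uses an RPC call with a few words of data.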

5. STATUS AND DISCUSSIONS

 The package as we have described it is fully implemented and in use by Cedar programmers. The entire RPCRuntime package amounts to four Cedar modules (packet exchange, packet sequencing, binding and security), totalling about 2,200 lines of source code. Lupine (the stub generator) is substantially larger. Clients are using RPC for several projects, including the complete communication protocol for Alpine (a file server supporting multimachine transactions), and the control communication for an Ethernet-based telephone and audio project. (It has also been used for two network games, providing real-time communication between players on multiple machines.) All of our clients have found the package convenient to use, although neither of the projects is yet in full-scale use. Implementations of the protocol have been made for BCPL, InterLisp, SmallTalk and C.

   We are still in the early stages of acquiring experience with the use of RPC and certainly more work needs to be done. We will have much more confidence in the strength of our design and the appropriateness of RPC when it has been used in earnest by the projects that are now committing to it. There are certain circumstances in which RPC seems to be the wrong communication paradigm. These correspond to situations where solutions based on multicasting or broadcasting seem more appropriate [2]. It may be that in a distributed environment there are times when procedure calls (together with our language's parallel processing and coroutine facilities) are not a sufficiently powerful tool, even though there do not appear to be any such situations in a single machine.

   One of our hopes in providing an RPC package with high performance and low cost is that it will encourage the development of new distributed applications that were formerly infeasible. At present it is hard to justify some of our insistence on good performance because we lack examples demonstrating the importance of such performance. But our belief is that the examples will come: the present lack is due to the fact that, historically, distributed communication has been inconvenient and slow. Already we are starting to see distributed algorithms being developed that are not considered a major undertaking; if this trend continues we will have been successful.

   A question on which we are still undecided is whether a sufficient level of performance for our RPC aims can be achieved by a general purpose transport protocol whose implementation adopts strategies suitable for RPC as well as ones suitable for bulk data transfer. Certainly, there is no entirely convincing argument that it would be impossible. On the other hand, we have not yet seen it achieved.

   We believe the parts of our RPC package here discussed are of general interest in several ways. They represent a particular point in the design spectrum of RPC. We believe that we have achieved very good performance without adopting extreme measures, and without sacrificing useful call and parameter semantics. The techniques for managing transport level connections so as to minimize the communication costs and the state that must be maintained by a server are important in our experience of servers dealing with large numbers of users. Our binding semantics are quite powerful, but conceptually simple for a programmer familiar with single machine binding. They were easy and efficient to implement.

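5. Status and Discussions

The package as we have described it is fully implemented and in use by Cedar programmers. The entire RPCRuntime package amounts to four Cedar modules (packet exchange, packet sequencing, binding and security), totalling about 2,200 lines of source code. Lupine (the stub generator) is substantially larger. Clients are using RPC in several projects, including the complete communication protocol for Alpine (a file server supporting multimachine transactions) and the control communication for an Ethernet-based telephone and audio project. (It has also been used for two network games, providing real-time communication between players on multiple machines.) All of our clients have found the package convenient to use, although neither of the projects is yet in full-scale use. Implementations of the protocol have been made for BCPL, InterLisp, SmallTalk and C.

    We are still in the early stages of acquiring experience with the use of RPC, and certainly more work needs to be done. We will have much more confidence in the strength of our design and the appropriateness of RPC once it has been used in earnest by the projects that are now committing to it. In certain circumstances RPC seems to be the wrong communication paradigm. These correspond to situations where solutions based on multicasting or broadcasting seem more appropriate [2]. It may be that in a distributed environment there are times when procedure calls (together with our language's parallel processing and coroutine facilities) are not a sufficiently powerful tool, even though no such situations appear to arise on a single machine.

     One of our hopes in providing an RPC package with high performance and low cost is that it will encourage the development of new distributed applications that were formerly infeasible. At present it is hard to justify some of our insistence on good performance, because we lack examples demonstrating the importance of such performance. But we believe the examples will come: the present lack is due to the fact that, historically, distributed communication has been inconvenient and slow. We are already starting to see distributed algorithms being developed that are not considered a major undertaking; if this trend continues, we will have been successful.

    A question on which we are still undecided is whether a sufficient level of performance for our RPC aims can be achieved by a general-purpose transport protocol whose implementation adopts strategies suitable for RPC as well as ones suitable for bulk data transfer. Certainly, there is no entirely convincing argument that it would be impossible. On the other hand, we have not yet seen it achieved.

    We believe the parts of our RPC package discussed here are of general interest in several ways. They represent a particular point in the design spectrum of RPC. We believe that we have achieved very good performance without adopting extreme measures and without sacrificing useful call and parameter semantics. The techniques for managing transport-level connections so as to minimize communication costs and the state a server must maintain are important in our experience of servers dealing with large numbers of users. Our binding semantics are quite powerful, yet conceptually simple for a programmer familiar with single-machine binding. They were easy and efficient to implement.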

REFERENCES

1. BIRRELL, A. D., LEVIN, R., NEEDHAM, R. M., AND SCHROEDER, M. D. Grapevine: an exercise in distributed computing. Commun. ACM 25, 4 (April 1982), 260-274.

2. BOGGS, D. R. Internet Broadcasting. PhD dissertation, Department of Electrical Engineering, Stanford University, Jan. 1982.

3. BOGGS, D. R., SHOCH, J. F., TAFT, E. A., AND METCALFE, R. M. PUP: An internetwork architecture. IEEE Trans. Commun. 28, 4 (April 1980), 612-634.

4. Courier: the remote procedure call protocol. Xerox System Integration Standard XSIS-038112, Xerox Corporation, Stamford, Connecticut, Dec. 1981.

5. DATA ENCRYPTION STANDARD. FIPS Publication 46. National Bureau of Standards, U.S. Department of Commerce, Washington D.C., January 1977.

6. DEUTSCH, L. P. AND TAFT, E. A. Requirements for an exceptional programming environment. Tech. Rep. CSL-80-10, Xerox Palo Alto Research Center, Palo Alto, Calif., 1980.

7. Ethernet, a local area network: data link layer and physical layer specifications version 1.0. Digital Equipment Corporation, Intel Corporation, Xerox Corporation, Sept. 1980.

8. LAMPSON, B. W. AND PIER, K. A. A processor for a high-performance personal computer. In Proc. 7th IEEE Symposium on Computer Architecture, (May 1980), IEEE, New York, pp. 146-160.

9. LAMPSON, B. W. AND SCHMIDT, E. E. Practical use of a polymorphic applicative language. In Proc. Tenth Annual ACM Symposium on Principles of Programming Languages (Austin, Texas, Jan. 24-26), ACM, New York (1983), pp. 237-255.

10. LISKOV, B. Primitives for distributed computing. Oper. Syst. Rev. 13, 5 (Dec. 1979), 33-42.

11. METCALFE, R. M. AND BOGGS, D. R. Ethernet: Distributed packet switching for local computer networks. Commun. ACM 19, 7 (July 1976), 395-404.

12. MITCHELL, J. G., MAYBURY, W. AND SWEET, R. Mesa language manual (Version 5.0). Tech. Rep. CSL-79-3, Xerox Palo Alto Research Center, Palo Alto, Calif. 1979.

13. NELSON, B. J. Remote procedure call. Tech. Rep. CSL-81-9, Xerox Palo Alto Research Center, Palo Alto, Calif. 1981.

14. SPECTOR, A. Z. Performing remote operations efficiently on a local computer network. Commun. ACM 25, 4 (April 1982), 246-260.

15. WHITE, J. E. A high-level framework for network-based resource sharing. In Proc. National Computer Conference, (June 1976).

Received March 1983; revised November 1983; accepted November 1983


Authors: BIRRELL and NELSON

Reference link: http://birrell.org/andrew/papers/ImplementingRPC.pdf
