PBFT算法部分翻译

算法

  Our algorithm is a form of state machine replication the service is modeled as a state machine that is replicated across different nodes in a distributed system. Each state machine replica maintains the service state and implements the service operations. We denote the set of replicas R by and identify each replica using an integer in {0, ..., |R| - 1} . For simplicity, we assume |R| = 3f + 1 where f is the maximum number of replicas that may be faulty; although there could be more than 3f + 1 replicas, the additional replicas degrade performance (since more and bigger messages are being exchanged) without providing improved resiliency.
  PBFT算法采用一种状态机复制的形式:服务被建模为一个状态机,在分布式系统中的不同节点之间复制。 每个状态机副本节点都维护服务的状态并实现服务操作。 我们用 {0, ..., |R| - 1} 中所有的整数表示一组副本节点,并使用每一个整数来标识每个副本节点。 为了简单,我们假设 |R| = 3f + 1f 是可能有错误的副本节点的最大数目; 尽管副本节点的总数有可能超过 3f + 1, 但是额外的副本降低了性能(因为更多和更大的消息交换)而不会提供更高的容错能力。

  The replicas move through a succession of configurations called views. In a view one replica is the primary and the others are backups. Views are numbered con-secutively. The primary of a view is replica p such that p = v mod |R| , where v is the view number. View changes are carried out when it appears that the primary has failed. Viewstamped Replication and Paxos used a similar approach to tolerate benign faults (as dis-cussed in Section 8.)
  副本节点通过一系列配置来移动,这些配置被称为视图。在一个视图中,一个副本节点是主节点,其他副本节点是备份节点。 视图是连续编号的。 一个视图中的主要副本的编号为 p,由公式 p = v mod | R | 计算得来,其中 v 是视图编号。 当主节点出现故障时,将执行视图更换。Viewstamped ReplicationPaxos 使用类似的方法来容忍良性故障(如第8节讨论的)

The algorithm works roughly as follows:
该算法工作流程大致如下:

  1. A client sends a request to invoke a service operation to the primary
  2. The primary multicasts the request to the backups
  3. Replicas execute the request and send a reply to the client
  4. The client waits for f + 1 replies from different replicas with the same result; this is the result of the operation.

  1. 客户端向主节点发送调用服务操作的请求。
  2. 主节点组播请求到各备份节点。
  3. 副本节点们执行请求并发送回复给客户端。
  4. 客户端等待来自不同副本的 f + 1 个相同的回复; 这就是这次请求的结果。

  Like all state machine replication techniques, we impose two requirements on replicas: they must be deterministic (i.e., the execution of an operation in a given state and with a given set of arguments must always produce the same result) and they must start in the same state. Given these two requirements, the algorithm ensures the safety property by guaranteeing that all non-faulty replicas agree on a total order for the execution of requests despite failures.
  像所有状态机复制技术一样,我们对副本节点施加两个要求:它们必须是确定性的(即,在给定状态以及给定一组参数的状态下,执行操作必须始终产生相同的结果),并且它们必须以同样的状态开始服务。考虑到这两个要求,该算法通过保证即使在失败的情况下,所有无故障副本节点都会就总的执行请求的顺序达成一致,来确保安全性。

  The remainder of this section describes a simplified version of the algorithm. We omit discussion of how nodes recover from faults due to lack of space. We also omit details related to message retransmissions. Furthermore, we assume that message authentication is achieved using digital signatures rather than the more efficient scheme based on message authentication codes; Section 5 discusses this issue further. A detailed formalization of the algorithm using the I/O automaton model is presented in.
  本节的其余部分介绍了该算法的简化版本。 我们忽略了节点由于缺乏存储空间而从故障中恢复的问题。 我们也省略了与消息重传有关的细节。 此外,我们假设消息认证是使用数字签名来实现的,而不是基于消息认证码的更有效的方案。 第5节将进一步讨论这个问题。 提出了使用 I / O 自动机模型的算法的详细形式化。

4.1 The Client

  A client c requests the execution of state machine operation o by sending [图片上传失败...(image-302321-1511683393005)] message to the primary. Timestamp is used to ensure exactly-once semantics for the execution of client requests. Timestamps for c's requests are totally ordered such that later requests have higher timestamps than earlier ones; for example, the timestamp could be the value of the client’s local clock when the request is issued.
  客户端通过向主节点发送一个消息

png.latex.png
来请求执行状态机操作。时间戳用于确保执行客户端请求的唯一,保证该请求不会重复执行。请求的时间戳是完全有序的,以便后来的请求比以前的有更高的时间戳。例如,时间戳可以是发出请求时客户端本地时钟的值。

  Each message sent by the replicas to the client includes the current view number, allowing the client to track the view and hence the current primary. A client sends a request to what it believes is the current primary using a point-to-point message. The primary atomically multicasts the request to all the backups using the protocol described in the next section.
  副本发送给客户端的每个消息都包含当前视图编号,使得客户端可以追踪视图以及当前的主视图。客户端使用点对点消息向其认为的当前主服务器发送请求。主要使用下一节中描述的协议将请求自动多播到所有备份。

  A replica sends the reply to the request directly to the client. The reply has the form [图片上传失败...(image-ed216c-1511683393005)] where v is the current view number, t is the timestamp of the corresponding request, i is the replica number, and r is the result of executing the requested operation.
  副本将答复直接发送到客户端。答复的形式为 [图片上传失败...(image-a0bfe1-1511683393005)],其中 v 是当前视图编号,t 是相应请求的时间戳,i 是副本编号,r 是执行请求的操作的结果。

  The client waits for f + 1 replies with valid signatures from different replicas, and with the same t and r, before accepting the result . This ensures that the result is valid, since at most f replicas can be faulty.
  客户端在接受结果之前等待来自不同副本的有效签名的f + 1回复,并且使用相同的t和r。 这确保了结果是有效的,因为至多f副本可能有错误。

  If the client does not receive replies soon enough, it broadcasts the request to all replicas. If the request has already been processed, the replicas simply re-send the reply; replicas remember the last reply message they sent to each client. Otherwise, if the replica is not the primary, it relays the request to the primary. If the primary does not multicast the request to the group, it will eventually be suspected to be faulty by enough replicas to cause a view change.
  如果客户端没有及时收到回复,它会将请求广播到所有副本。 如果请求已经被处理,副本只需重新发送答复; 副本记住他们发送给每个客户端的最后回复消息。 否则,如果副本不是主节点,它会将请求转发到主节点。 如果主节点没有组播请求,则最终会被足够的副本认为该主节点已失效,引起视图更改。

  In this paper we assume that the client waits for one request to complete before sending the next one. But we can allow a client to make asynchronous requests, yet preserve ordering constraints on them.
  在本文中,我们假设客户端在发送下一个请求之前会等待前一个请求完成。 但是,在保持对它们的排序约束的情况下,可以允许客户端发出异步请求。

4.2 Normal-Case Operation

  The state of each replica includes the state of the service, a message log containing messages the replica has accepted, and an integer denoting the replica's current view. We describe how to truncate the log in Section 4.3.
  每个副本的状态包括服务的状态,副本已接受消息的消息日志,以及表示副本当前视图的整数。我们在4.3节中描述如何截断日志。

  When the primary, p, receives a client request, m, it starts a three-phase protocol to atomically multicast the request to the replicas. The primary starts the protocol immediately unless the number of messages for which the protocol is in progress exceeds a given maximum. In this case, it buffers the request. Buffered requests are multicast later as a group to cut down on message traffic and CPU overheads under heavy load;this optimization is similar to a group commit in transactional systems . For simplicity, we ignore this optimization in the description below.
  当 primary 节点 p 接收到一个客户端请求 m 时,主节点 p 会启动一个三阶段协议,以原子方式将请求发送到其他副本。主节点立即启动协议,除非协议正在进行的消息数量超过给定的最大值。在这种情况下,它会缓冲请求。缓存的请求稍后作为一个组进行多播,以在负载较重的情况下减少消息流量和CPU开销;这种优化类似于事务性系统中的组提交。为了简单起见,我们在下面的描述中忽略了这个优化。

  The three phases are pre-prepare, prepare, and commit. The pre-prepare and prepare phases are used to totally order requests sent in the same view even when the primary, which proposes the ordering of requests, is faulty. The prepare and commit phases are used to ensure that requests that commit are totally ordered across views.
  三个阶段分别为预准备,准备和提交。预准备阶段和准备阶段用于完全排列在同一视图中发送的请求,即使提出请求排序的主要部分有故障。准备和提交阶段用于确保提交的请求在视图中完全排序。

  In the pre-prepare phase, the primary assigns a sequence number, n, to the request, multicasts a pre-prepare message with m piggybacked to all the backups, and appends the message to its log. The message has the form [图片上传失败...(image-faac88-1511683393005)], where v indicates the view in which the message is being sent, m is the client's request message, and d is m's digest.
  在准预备阶段,主要为请求分配一个序列号n,接着将预准备消息多播到所有的备份节点,并将消息添加加到它的日志中。消息的形式为[图片上传失败...(image-533e80-1511683393005)],其中 v 表示正在发送的消息所在视图的编号,m 是客户端的请求消息,d是m的摘要.

  Requests are not included in pre-prepare messages to keep them small. This is important because pre-prepare messages are used as a proof that the request was assigned sequence number n in view v in view changes. Additionally, it decouples the protocol to totally order requests from the protocol to transmit the request to the replicas; allowing us to use a transport optimized for small messages for protocol messages and a transport optimized for large messages for large requests.
  预准备的消息中不包括请求(可能的意思是不包含请求本身的内容),以减小预准备消息的大小。这一点很重要,因为预准备消息被用来作为在视图 v 中视图变化中为请求分配序号 n 的证据。另外,它将协议解耦以完全从协议中命令请求发送到副本;允许我们使用针对小型消息优化的传输协议消息,以及针对大型消息针对大型消息优化的传输。

A backup accepts a pre-prepare message provided:

  1. the signatures in the request and the pre-prepare message are correct and d is the digest for m;
  2. it is in view v;
  3. it has not accepted a pre-prepare message for view v and sequence number n containing a different digest;
  4. the sequence number in the pre-prepare message is between a low water mark,h , and a high water markm H.

  1. 请求中的签名和预准备消息是正确的,并且 d 是 m 的摘要;
  2. 它是在视图 v;
  3. 未接受包含不同摘要的,并且视图编号为 v 和序列号为 n 的预先准备消息;
  4. 预先准备消息中的序列号 n 在低水位 h 和高水位 H 之间。

  The last condition prevents a faulty primary from exhausting the space of sequence numbers by selecting a very large one. We discuss how H and h advance in Section 4.3.
  最后一个条件通过选择一个非常大的一个来防止一个有缺陷的初级排除序列号的空间。我们在4.3节中讨论H和H如何前进。

  If backup i accepts the [图片上传失败...(image-d83fb5-1511683393005)] message, it enters the prepare phase by multicasting a [图片上传失败...(image-cd0504-1511683393005)] message to all other replicas and adds both messagesto its log. Otherwise, it does nothing.
  如果备份节点 i 接受 [图片上传失败...(image-89f25c-1511683393005)] 消息,它将通过向所有其他副本多播一个 [图片上传失败...(image-f8cfbf-1511683393005)] 消息从而进入准备阶段,并添加这两个消息到其日志种。否则,它什么都不做。

  A replica (including the primary) accepts prepare messages and adds them to its log provided their signatures are correct, their view number equals the replica’s current view, and their sequence number is between h and H.
  只要准备消息的签名是正确的,它们的视图编号等于副本的当前视图,并且它们的序列号介于h和H,副本(包括主节点)便接受准备消息,并将它们添加到日志中

  We define the predicate prepared(m, v, n, i) to be true if and only if replica i has inserted in its log: the request m, a pre-prepare for m in view v with sequence number n, and 2f prepares from different backups that match the pre-prepare. The replicas verify whether the prepares match the pre-prepare by checking that they have the same view, sequence number, and digest.
  当且仅当副本 i 已经插入到它的日志中时,我们定义谓词 prepared(m, v, n, i) 为真:请求 m,序列号为 n 的视图 v 中 预准备 m, 2f 个与来自不同备份节点,并与预准备消息准备消息。副本通过检查它们是否具有相同的视图,序列号和摘要来验证准备消息是否与预准备消息相匹配。

  The pre-prepare and prepare phases of the algorithm guarantee that non-faulty replicas agree on a total order for the requests within a view. More precisely, they ensure the following invariant: if prepared(m, v, n, i) is true then prepared(m', v, n, j) is false for any non-faulty j replica (including i=j) and any m' such that D(m') ≠ D(m). This is true because prepared(m', v, n, i) and |R| = 3f +1 imply that at least f + 1 non-faultyreplicas have sent a pre-prepare or prepare for in view with sequence number n. Thus, for prepared(m', v, n, j) to be true at least one of these replicas needs to have sent two conflicting prepares (or pre-prepares if it is the primary for v), i.e., two prepares with the same view and sequence number and a different digest. But this is not possible because the replica is not faulty. Finally, our assumption about the strength of message digests ensures that the probability that m ≠ m and D(m') ≠ D(m) is negligible.
  算法的预准备和准备阶段保证无故障的副本们在视图内的请求的总排序上达成一致。更精确地说,它们确保了以下不变:如果prepared(m, v, n, i) 为真,那么对于任何非故障的副本 j,prepared(m', v, n, j) 和任何 m' 使得 D(m') ≠ D(m) 都是错误的。这是真的,因为prepared(m', v, n, i)| R | = 3f + 1意味着至少有f + 1个非故障复制已经发送了一个预先准备或准备在顺序号为n的视图中。因此,对于 prepared(m', v, n, i) 为真的这些副本中至少有一个需要发送两个冲突的准备消息(或者预准备消息,如果它是 v的主要准备),即两个准备消息具有相同的视图号和序列号以及不同的摘要。但这是不可能的,因为副本没有错误。最后,我们关于消息摘要强度的假设保证了 m ≠ mD(m') ≠ D(m)的概率是可以忽略的。

  Replica multicastsa [图片上传失败...(image-eb299-1511683393005)] to the other replicas when [图片上传失败...(image-fe64-1511683393005)] becomes true. This starts the commit phase. Replicas accept commit messages and insert them in their log provided they are properly signed, the view number in the message is equal to the replica’s current view, and the sequence number is between h and
  当 [图片上传失败...(image-738fdf-1511683393005)]成为true时,将其多播到其他副本节点。这便是提交阶段的开始。只要它们的签名正确,消息中的视图编号等于副本的当前视图,序列号介于h和H,副本节点便接受提交消息并将它们插入日志中。

  We define the committed and committed-local predicates as follows: committed(m, v, n) is true if and only if prepare(m, v, n, i) is true for all i in some set of f + 1 non-faultyreplicas;and committed-local(m, v, n, i) is true if and only if prepared(m, v, n, i) is true and i has accepted 2f + 1 commits (possibly including its own) from different replicas that match the pre-prepare for m; a commit matches a pre-prepare if they have the same view, sequence number, and digest.
  我们定义如下 committed 和 committed-local:当且仅当在某些集合中,对于所有的 i,committed(m, v, n) 是真的,i 已经接受了 2f + 1 非故障重复;committed-local(m, v, n, i) 从不同的副本提交(可能包括它自己的),这些副本与预准备消息中的 m 相匹配;如果具有相同的视图,序列号和摘要,则提交与预准备相匹配。

  The commit phase ensures the following invariant: if committed-local(m, v, n, i) is true for some non-faulty i then committed(m, v, n) is true. This invariant and the view-change protocol described in Section 4.4 ensure that non-faulty replicas agree on the sequence numbers of requests that commit locally even if they commit in different views at each replica. Furthermore, it ensures that any request that commits locally at a non-faulty replica will commit at f + 1 or more non-faulty replicas eventually.
  提交阶段确保以下不变:如果对于一些没有错误的i,committed-local(m, v, n, i) 是真的,那么 committed(m, v, n) 是真的。第4.4节中介绍的这种不变式和视图更改协议可确保非故障副本同意本地提交的请求的序列号,即使它们在每个副本的不同视图中提交。此外,它确保任何在没有错误的副本上本地提交的请求将最终提交到f + 1或更多无故障的副本。

  Each replica executes the operation requested by after committed-local(m, v, n, i) is true and i's state reflects the sequential execution of all requests with lower sequence numbers. This ensures that all nonfaulty replicas execute requests in the same order as required to provide the safety property. After executing there quested operation, replicas send a reply to the client. Replicas discard requests whose timestamp is lower than the timestamp in the last reply they sent to the client to guarantee exactly-once semantics.
  每个副本执行committed-local(m, v, n, i) 后所请求的操作,并且 i 的状态按照提供安全属性所需的相同顺序执行请求。在执行那里操作后,副本发送一个答复给客户端。副本放弃其时间戳低于发送给客户端的最后一个回复中的时间戳的请求,以保证一次的语义。

  We do not rely on ordered message delivery, and therefore it is possible for a replica to commit requests out of order. This does not matter since it keeps the preprepare, prepare, and commit messages logged until the corresponding request can be executed.
  我们不依赖于有序的消息传递,因此副本可能无序地提交请求。这并不重要,因为它保持预准备,准备和提交记录的消息,直到相应的请求可以被执行。

  Figure 1 shows the operation of the algorithm in the normal case of no primary faults. Replica0 is the primary, replica 3 is faulty, and C is the client.
  图1显示了在主节点没有故障的正常情况下算法的操作。副本0是主要的,副本3是有缺陷的,C是客户端。
[图片上传失败...(image-7b1d93-1511683393005)]

你可能感兴趣的:(PBFT算法部分翻译)