[Lamport Logical Clocks] Time, Clocks, and the Ordering of Events in a Distributed System

Paper: https://www.ics.uci.edu/~cs230/reading/time.pdf

Lamport, July 1978

Note: this post simply reproduces the content of the PDF, so that readers who run into unfamiliar words or passages can easily copy and translate them.

Lamport's logical clock is a foundational piece of distributed-systems theory: it underpins how events are ordered and how ordering is decided in a distributed system, and later building blocks such as mutual exclusion and concurrency, leader election, and consensus all rest on it.

A fairly simple and classic distributed protocol that many people know is Raft; for example, the term field used during Raft's leader-election phase (the election term) is a textbook application of Lamport logical clocks.

 

Summary

The paper shows a way of totally ordering events in distributed systems.

 

Distributed system

  • A process is a sequence of totally ordered events, i.e., for any events a and b in a process, either a comes before b or b comes before a.
  • A distributed system is a set of distinct, “spatially separated” processes that communicate by sending messages to each other.
  • A system is distributed if the time it takes to send a message from one process to another is significant compared to the time interval between events in a single process. Why is this distinction important?

 

Review of ordering relations

Taken from “Introduction to Discrete Mathematics” by Hirschfelder and Hirschfelder.

  • A binary relation ⪯ on a set A is reflexive if a ⪯ a for every a ∈ A.
  • A binary relation ⪯ on a set A is symmetric if, whenever a ⪯ b, then b ⪯ a.
  • A binary relation ⪯ on a set A is transitive if, whenever x ⪯ y and y ⪯ z, then x ⪯ z.
  • A relation ⪯ on a set A is antisymmetric if a ⪯ b and b ⪯ a imply a = b.
  • A partial order relation on a set A is a relation that is reflexive, antisymmetric, and transitive. The term partial indicates that two elements of the set need not be related at all.
  • A total order relation ⪯ on a set A is a partial order relation with the following additional property: if a and b are elements of A, then either a ⪯ b or b ⪯ a, i.e., any two elements of the set are related.
  • Examples:
    – The relations ≤ and ≥ are partial order relations that are also total order relations on the set of integers.
    – The relation < is not a partial order relation on the set of integers because it is not reflexive.
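The definitions above are mechanical enough to check by brute force on a small finite set. The sketch below (helper names are my own, not from the notes) verifies that ≤ is a total order on a sample of integers while < fails reflexivity:

```python
# Brute-force checks of the ordering properties on a finite set.
def is_reflexive(rel, A):
    return all(rel(a, a) for a in A)

def is_antisymmetric(rel, A):
    return all(not (rel(a, b) and rel(b, a)) or a == b
               for a in A for b in A)

def is_transitive(rel, A):
    return all(not (rel(x, y) and rel(y, z)) or rel(x, z)
               for x in A for y in A for z in A)

def is_total(rel, A):
    return all(rel(a, b) or rel(b, a) for a in A for b in A)

A = range(5)
leq = lambda a, b: a <= b
lt = lambda a, b: a < b

# ≤ is reflexive, antisymmetric, and transitive — and also total.
assert is_reflexive(leq, A) and is_antisymmetric(leq, A) and is_transitive(leq, A)
assert is_total(leq, A)
# < is not reflexive, so it is not a partial order under this definition.
assert not is_reflexive(lt, A)
```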

 

“Happened before” relation

It is assumed that sending or receiving a message is an event.

Definition. The relation → on the set of events of a system is the smallest relation satisfying the following three conditions:

  1. If a and b are events in the same process, and a comes before b, then a → b.
  2. If a is the sending of a message by one process and b is the receipt of the same message by another process, then a → b.
  3. If a → b and b → c then a → c.

Two distinct events a and b are concurrent if a ↛ b and b ↛ a. Assume a ↛ a for any event a. So → is an irreflexive partial ordering on the set of all events in the system. The ordering is only partial because events can be concurrent, in which case it is not known which event happened first.

One can also say that a → b means that a caused b to happen, e.g., the sending of a message causes the receipt of the same message. Events a and b are concurrent if they in no way can causally affect each other, e.g., events p3 and q3 in Fig. 1. So concurrent events do not necessarily have to occur at the same time. As long as there is no causality between them, they are concurrent.
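The three conditions above can be made concrete: conditions 1 and 2 give a base set of ordered pairs, and condition 3 is just the transitive closure. A small sketch, using a made-up two-process history loosely modeled on Fig. 1 (event names and the message edge are my own example, not the paper's figure):

```python
from itertools import product

# Events named (process, index). Base edges: program order within each
# process (condition 1) plus one send -> receive message edge (condition 2).
edges = {
    (("P", 1), ("P", 2)), (("P", 2), ("P", 3)),
    (("Q", 1), ("Q", 2)), (("Q", 2), ("Q", 3)),
    (("P", 1), ("Q", 2)),   # P1 sends a message received at Q2
}

def happened_before(edges):
    """Smallest transitive relation containing the edges (condition 3)."""
    hb = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb

hb = happened_before(edges)
concurrent = lambda a, b: (a, b) not in hb and (b, a) not in hb

# P3 and Q3 cannot causally affect each other, so they are concurrent.
assert concurrent(("P", 3), ("Q", 3))
# P1 -> Q2 and Q2 -> Q3 give P1 -> Q3 by transitivity.
assert (("P", 1), ("Q", 3)) in hb
```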

 

Logical clocks

  • No physical time is associated with the clocks. Clocks just assign numbers to events as they happen. The numbers assigned are the times at which events occurred.
  • Clock Ci ≡ a function which assigns a number Ci⟨a⟩ to any event a in process Pi.
  • Clock C ≡ a function which assigns a number C⟨b⟩ to any event b in the system, where C⟨b⟩ = Cj⟨b⟩ if b is an event in process Pj. So C represents all the clocks in the system.
  • For system events to be ordered correctly, the Clock Condition must be satisfied:
    • For any events a, b: if a → b then C⟨a⟩ < C⟨b⟩.
    • If event a happened before event b, then the clock value assigned to a is less than that assigned to b.
  • The Clock Condition is satisfied if the following two conditions hold:
    • C1. If a and b are events in process Pi and a comes before b, then Ci⟨a⟩ < Ci⟨b⟩.
    • C2. If a is the sending of a message by process Pi and b is the receipt of that message by process Pj, then Ci⟨a⟩ < Cj⟨b⟩.
    • C1 means that there must be at least a clock tick between any two events in a process.
    • C2 means that there must be at least a clock tick between the sending of a message by a process and its corresponding receipt by another process.
  • System clocks must satisfy conditions C1 and C2 to satisfy the Clock Condition. For this to happen, system processes must obey the following implementation rules:
    • IR1. Each process Pi increments Ci between any two successive events.
    • IR2. a. If event a is the sending of a message m by process Pi, then the message m contains a timestamp Tm = Ci⟨a⟩. b. Upon receiving message m, process Pj sets Cj greater than or equal to its current value and greater than Tm.
    • IR1 causes events in a process to happen at different “logical” times and satisfies condition C1.
    • IR2 causes process clocks to be synchronized and satisfies condition C2.
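IR1 and IR2 fit in a few lines of code. A minimal sketch (the class and method names are my own; the paper specifies only the rules, not an API):

```python
class LamportClock:
    """Minimal sketch of implementation rules IR1 and IR2."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # IR1: increment the counter between any two successive events.
        self.time += 1
        return self.time

    def send(self):
        # IR2a: a send is an event; its timestamp Tm rides on the message.
        return self.tick()

    def receive(self, tm):
        # IR2b: advance past both the local clock and the message timestamp.
        self.time = max(self.time, tm) + 1
        return self.time

p, q = LamportClock(), LamportClock()
p.tick()              # event a in P: Cp = 1
tm = p.send()         # P sends m:    Cp = 2, Tm = 2
q.tick()              # event in Q:   Cq = 1
t = q.receive(tm)     # Q receives m: Cq = max(1, 2) + 1 = 3
assert t > tm         # C2 holds: the receipt is timestamped after the send
```

Note that `receive` uses `max(...) + 1` rather than just `max(...)`: the receipt itself is an event, so its timestamp must be strictly greater than both the local clock and Tm.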

 

Total ordering of events

  • A system of clocks that satisfies the Clock Condition can be used to totally order system events.
  • To totally order the events in a system, the events are ordered according to their times of occurrence. In case two or more events occur at the same time, an arbitrary total ordering ≺ of processes is used to break the tie. To do this, the relation ⇒ is defined as follows:
    • If a is an event in process Pi and b is an event in process Pj, then a ⇒ b if and only if either:
      • i. Ci⟨a⟩ < Cj⟨b⟩, or
      • ii. Ci⟨a⟩ = Cj⟨b⟩ and Pi ≺ Pj.
      • This is a total ordering because for any two events in the system, it is clear which is ordered first.
  • The total ordering of events is very useful for distributed system implementation.
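The relation ⇒ is exactly lexicographic comparison on (timestamp, process id) pairs. A sketch, assuming the arbitrary process ordering ≺ is just string comparison of process ids (my choice for the example):

```python
# a => b: compare by logical timestamp first (rule i), then break ties
# by the fixed process ordering ≺ (rule ii), here plain string order.
def before(ev_a, ev_b):
    # Each event is a (timestamp, process_id) pair; Python compares
    # tuples lexicographically, which matches rules (i) and (ii).
    return ev_a < ev_b

a = (5, "P1")   # event in P1 at logical time 5
b = (5, "P2")   # concurrent event in P2 at the same logical time
c = (4, "P9")   # event with an earlier timestamp

assert before(c, a)                    # rule (i): smaller timestamp first
assert before(a, b)                    # rule (ii): tie broken by P1 ≺ P2
assert before(a, b) != before(b, a)    # any two distinct events are ordered
```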

 

The mutual exclusion problem

A system has a fixed number of processes which share one resource. The shared resource can be used by only one process at a time. Assume that the resource is initially allocated to one process. Find an algorithm that allocates the resource to a process and satisfies the following conditions:

  I. A process using the resource must release it before it can be given to another process.
  II. Requests for the resource must be granted in the order in which they were made. (This condition does not say what should be done when two processes request the resource at the same time. Which process should get it?)
  III. If every process using the resource eventually releases it, then every request is eventually granted (no starvation).

 

A distributed algorithm to solve the mutual exclusion problem

Assumptions

  • For any two processes Pi and Pj, the messages sent from Pi to Pj are received in the same order they were sent.
  • Every message is eventually received.
  • A process can send messages directly to every other process in the system.

Algorithm

Let P0 be the process to which the shared resource is initially allocated. Let T0 be less than the initial value of any logical clock in the system. Each process has its own private request queue. Initially, each request queue contains one message, T0 : P0 requests resource. The following rules define the algorithm. Each rule is an event. Note that implementation rules IR1 and IR2 are used to maintain the process clocks.

  1. Resource request. Process Pi sends the message Tm : Pi requests resource to every other process, where Tm is the process clock’s value at the time of the request. Pi also puts the request message on its own request queue.
  2. Resource request receipt. Pj receives Pi’s request message. Pj then puts the message on its request queue and sends an acknowledgement to Pi. By IR2, the acknowledgement is timestamped later than Tm. If Pj has already sent Pi a message timestamped later than Tm, it doesn’t have to send an acknowledgement, since all that Pi needs is a message from Pj timestamped later than Tm.
  3. Resource release. Pi removes the request message Tm : Pi requests resource from its queue and sends the release message Pi releases resource to every other process.
  4. Resource release receipt. Pj receives Pi’s resource release message. Pj removes Tm : Pi requests resource from its request queue.
  5. Resource allocation. Pi is allocated the resource when:
    (i) There is a Tm : Pi requests resource message in Pi’s request queue which is ordered before every other request in the queue by ⇒.
    (ii) Pi has received a message from every other process timestamped later than Tm.

See the paper for how the above rules satisfy conditions I–III of the mutual exclusion problem.
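Rule 5 is the only rule Pi evaluates purely from local state. A sketch of that allocation test, assuming (my representation, not the paper's) a queue of (Tm, pid) entries ordered by ⇒ as tuples, and a map from each other process to the timestamp of the latest message received from it:

```python
# Sketch of rule 5 (resource allocation) from Pi's local point of view.
def may_acquire(my_pid, queue, last_ts):
    """queue: list of (Tm, pid) requests on Pi's queue;
    last_ts: pid -> timestamp of the latest message received from pid."""
    mine = [req for req in queue if req[1] == my_pid]
    if not mine:
        return False
    my_req = min(mine)
    # (i) my request is ordered before every other request by =>
    if any(req < my_req for req in queue):
        return False
    # (ii) a message timestamped later than Tm has arrived from every
    # other process (an acknowledgement suffices, by rule 2)
    return all(ts > my_req[0] for pid, ts in last_ts.items() if pid != my_pid)

queue = [(1, "P1"), (2, "P2")]
assert may_acquire("P1", queue, {"P2": 3})       # both conditions hold for P1
assert not may_acquire("P2", queue, {"P1": 3})   # P1's request is ordered first
```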

 

Anomalous behavior

The algorithm described above orders requests using the relation ⇒. This can cause the following anomalous behavior. Assume a system of interconnected computers across the country with some shared resource. A user issues request a on computer A for the shared resource. He then calls a friend in another city, who issues request b on computer B for that same resource. Request b can be ordered before a on computer B if the request message for a reaches B only after request b has been made. This causes computers A and B to order the requests differently: a comes before b on computer A, and b comes before a on computer B. It can happen that at some point a is the first request on computer A and b is the first on B. This satisfies condition (i) of the algorithm’s rule 5 (resource allocation rule). Assume that condition (ii) has already been satisfied. Then both computers will try to obtain the shared resource at about the same time, causing a conflict.

There are two possible ways to avoid such anomalous behavior:

  1. Give the user the responsibility for avoiding anomalous behavior. For example, when request a is made, the user making the request is given the timestamp of that request. When that user then calls his friend, the friend can ask that request b be given a later timestamp than a.
  2. Let S be the set of all system events and 𝒮 be the set containing S along with the relevant events external to the system. Let ↪ denote the “happened before” relation for 𝒮. Construct a system of independent physical clocks that satisfies the following Strong Clock Condition: For any events a, b in 𝒮: if a ↪ b then C⟨a⟩ < C⟨b⟩.

 

Physical clocks

Let Ci(t) denote the reading of clock Ci at physical time t. Assume that Ci(t) is a continuous, differentiable function of t except for isolated discontinuities introduced by clock resets. Then dCi(t)/dt is the rate at which clock Ci is running at time t. To satisfy the Strong Clock Condition, the system of physical clocks must satisfy the following conditions:

  • PC1. There exists a constant κ ≪ 1 such that for all i: |dCi(t)/dt − 1| < κ.
  • PC2. For all i, j: |Ci(t) − Cj(t)| < ε.

Condition PC1 says that the rate at which each physical clock Ci runs should vary only by a very small amount (bounded by κ). The paper assumes that PC1 is satisfied, i.e., that clocks run at approximately the correct rate. Condition PC2 says that for all the clocks in the system to be synchronized, their readings at time t should differ by less than a threshold amount (ε). See the paper for details on how to determine the values of κ and ε.

To guarantee that PC2 is satisfied by the system of physical clocks, processes must obey the following implementation rules (specializations of IR1 and IR2). Let m be a message sent at physical time t and received at time t′. Let νm = t′ − t be the total delay of m, which is unknown to the receiving process. Let μm ≥ 0 be some minimum delay known by the receiving process such that μm ≤ νm. Let ξm = νm − μm be the unpredictable delay of m. Assume that each event occurs at a specific instant of physical time and that the events of a process occur at different times.

IR1′. For each i, if Pi does not receive a message at physical time t, then Ci is differentiable at t and dCi(t)/dt > 0.

IR2′. a. If Pi sends a message m at physical time t, then m contains a timestamp Tm = Ci(t). b. Upon receiving message m at time t′, process Pj sets Cj(t′) equal to max(Cj(t′ − 0), Tm + μm).

IR1′ states that clock readings change with physical time. IR2′ states how the clocks synchronize with each other (and that this synchronization is coupled with message receipts, since this is the only way that processes can communicate with each other). Pj’s clock is set to either its current reading or the message’s send time plus the known minimum transmission delay, whichever is greater.
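The IR2′ receive step reduces to one `max`. A sketch (the function name and the sample μm value are my own for illustration):

```python
# Sketch of IR2' part b: on receiving m at time t', the local clock
# jumps forward to Tm + mu_m if it is behind, and is untouched otherwise.
def on_receive(local_reading, tm, mu_m):
    """Return max(Cj(t' - 0), Tm + mu_m), the corrected clock reading."""
    return max(local_reading, tm + mu_m)

assert on_receive(100.0, 99.0, 0.5) == 100.0   # already ahead: no reset
assert on_receive(100.0, 101.0, 0.5) == 101.5  # behind: jump to Tm + mu_m
```

Unlike the logical-clock rule IR2, there is no `+ 1` here: physical clocks advance continuously between receipts (IR1′), so the rule only needs to guarantee the reading is at least Tm plus the minimum delay.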

See the paper for the theorem (and its proof) which states that IR1′ and IR2′ establish PC2. The theorem also bounds the time it takes for the clocks to become synchronized at system startup.
