Multi-Data Center Consistency Protocol: MDCC

MDCC: Multi-Data Center Consistency

A new commit protocol and transaction programming model for efficiently achieving strong consistency in databases across data centers.

 

MDCC Overview

Consistency across data centers

 

With the emergence of cloud services, distributed databases have benefited from many of the advantages of deploying on clusters of machines. However, entire data centers can fail. There have been several recent data center failures, and one even lost customer data. The simplest technique to make databases resilient to data center failures is to replicate the data to multiple, ideally geographically diverse, data centers. However, the only way to make sure the data is durable in the face of failures is to replicate it synchronously: when storing and committing data, the data must be stored in multiple data centers before it is considered durable. Network communication between geographically diverse data centers is slow and unpredictable, so committing data across data centers is slow, and developing predictable applications on top of this network variability is difficult. As a result, typical systems either give up on transactions and consistency, or use asynchronous replication.

MDCC (Multi-Data Center Consistency) is a new database solution which provides full transactions with strong consistency, and synchronous replication for fault-tolerant durability. MDCC includes two main components:

PLANET Transaction Programming Model:
A service-level-objective (SLO) aware transaction programming model which exposes more details of the stages of a transaction. It gives the developer more information and flexibility to handle unpredictable network latencies. Visit the PLANET (Predictive Latency-Aware NEtworked Transactions) page for more details.
MDCC Commit Protocol:
A new latency-aware commit protocol for cross-data center transactions, which can durably commit transactions with one round-trip message delay in most cases. It minimizes the amount of synchronous network communication to improve transaction latency and throughput.

For those interested in more details of the MDCC commit protocol, you can read our MDCC paper or our MDCC presentation from Eurosys 2013.

 

MDCC Commit Protocol

Minimizing message round trips

 

The MDCC commit protocol is based on the family of Paxos algorithms. Paxos gets a set of participants to reach consensus on (that is, choose) a value. Paxos guarantees that only a single value will ever be chosen by the set of participants, regardless of the number of proposers proposing new values. Classic Paxos, Multi-Paxos, Fast Paxos, and Generalized Paxos are different variations and optimizations of the Paxos consensus algorithm. The MDCC commit protocol uses all of these variations in different situations. At its core, MDCC uses Paxos to get replica storage nodes to agree on updates to records. By taking advantage of all these optimizations, the MDCC commit protocol can commit transactions with only one round trip in many situations.
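To make the "only a single value is ever chosen" guarantee concrete, here is a minimal in-memory sketch of classic single-value Paxos. It is not MDCC's implementation: the `Acceptor` class, the `propose` function, and the integer ballots are illustrative assumptions, and there is no network, failure handling, or fast round.

```python
class Acceptor:
    """A storage node acting as a Paxos acceptor (hypothetical, in-memory)."""

    def __init__(self):
        self.promised = -1          # highest ballot this acceptor has promised
        self.accepted_ballot = -1   # ballot of the value accepted so far, if any
        self.accepted_value = None

    def prepare(self, ballot):
        # Phase 1: promise to ignore lower ballots, report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return (True, self.accepted_ballot, self.accepted_value)
        return (False, None, None)

    def accept(self, ballot, value):
        # Phase 2: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors, ballot, value):
    """Run one classic round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    granted = [(b, v) for ok, b, v in replies if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest ballot instead of its own value.
    prev_ballot, prev_value = max(granted, key=lambda p: p[0])
    if prev_ballot >= 0:
        value = prev_value
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


nodes = [Acceptor() for _ in range(5)]
first = propose(nodes, ballot=1, value="A")    # "A" is chosen
second = propose(nodes, ballot=2, value="B")   # still "A": the choice is stable
stale = propose(nodes, ballot=1, value="C")    # stale ballot, round fails
```

A later proposer with a different value ends up re-proposing the value that was already chosen, which is the property MDCC relies on so that agreed updates are not lost.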

Here, we focus on a few key aspects of the commit protocol: Paxos, MDCC transactions with options, and efficient commutative updates with integrity constraints. More details can be found in our MDCC paper.

Background on Paxos

MDCC Transactions

The MDCC commit protocol achieves transactional updates by getting storage nodes to agree on update options, similar to the escrow method. An option for an update is a promise that the update can complete at some point in the future, although it has not yet been applied. When the options for all the updates to all the records in a transaction have been agreed upon by the storage nodes with Paxos (any of the variations mentioned above), the transaction is committed. Because Paxos guarantees that the update options cannot be lost or forgotten, the transaction is durable at that point. A final asynchronous commit message is then sent to execute the updates and make them visible. MDCC primarily uses fast rounds of Generalized Paxos, so most transaction commits take only one round trip.
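The commit rule above can be sketched as a simple check. This is a simplified illustration, not the protocol itself: the function names and the `votes_by_record` structure are hypothetical, acceptance is modeled as one boolean per storage node, and aborts, collisions, and recovery rounds are left out.

```python
def quorum_reached(votes, quorum):
    """True if at least `quorum` storage nodes accepted the option."""
    return sum(1 for accepted in votes if accepted) >= quorum


def transaction_is_durable(votes_by_record, quorum):
    """A transaction is durable once every record's option reached a quorum.

    votes_by_record maps a record name to one boolean per storage node
    (hypothetical structure used only for this sketch).
    """
    return all(quorum_reached(v, quorum) for v in votes_by_record.values())


# Five storage nodes per record and a quorum of four, as an example.
all_agreed = transaction_is_durable(
    {"record1": [True, True, True, True, False],
     "record2": [True, True, True, True, True]},
    quorum=4,
)
one_short = transaction_is_durable(
    {"record1": [True, True, True, True, False],
     "record2": [True, True, True, False, False]},
    quorum=4,
)
```

Only after this check succeeds would the asynchronous commit message be sent to make the updates visible.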

Storage nodes accept an update option only when the option can commit regardless of whether previously seen pending options are committed or aborted. Therefore, in order to accept an option, a storage node must consider all the commit and abort possibilities of its pending options. This guarantees that if a storage node accepts an option for an update, executing the commit will not fail.
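As a rough illustration of this acceptance rule, the sketch below models a numeric record whose options are additive deltas and whose value must stay at or above a lower bound. The numeric model, the `can_accept` function, and its parameters are assumptions made for this example; the check enumerates every commit/abort combination of the pending options.

```python
from itertools import combinations


def can_accept(committed_value, pending_deltas, new_delta, lower_bound=0):
    """Accept new_delta only if it is safe under every outcome of pending options.

    Each subset of pending_deltas stands for one possible outcome: the deltas
    in the subset commit, and the rest abort.
    """
    for size in range(len(pending_deltas) + 1):
        for committed_subset in combinations(pending_deltas, size):
            value = committed_value + sum(committed_subset) + new_delta
            if value < lower_bound:
                return False
    return True


ok = can_accept(2, [-1], -1)                     # worst case 2 - 1 - 1 = 0
too_many = can_accept(2, [-1, -1], -1)           # worst case 2 - 3 = -1
cannot_rely_on_increment = can_accept(2, [5, -2], -1)  # +5 might abort
```

The last call is rejected because the pending increment may still abort while the pending decrement commits, so the node cannot count on it.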

MDCC Transaction Animation

Step through or play through an animation of how MDCC transactions execute.

[Animation: Client TX 1 updates Record 1 and Record 2; each record has a leader, a proposer, and five storage nodes. Final step, "Asynchronous Commit for Visibility": The client transaction then asynchronously sends messages to the storage nodes to inform them of the commit, and to make the updates visible.]

MDCC Domain Integrity Constraints for Commutative Updates

MDCC often uses fast rounds of Generalized Paxos to avoid contacting a master, and further optimizes transactions with commutative updates to avoid conflicts between concurrent updates. However, attributes updated by commutative operations typically have domain integrity constraints, such as attribute ≥ 0. Such a constraint is easy to enforce in a single-site database, but difficult in a large, globally distributed system. Each record has several storage nodes accepting transactions, so enforcing domain integrity constraints can be difficult without expensive global coordination. MDCC uses a new demarcation protocol for quorum-based systems. The MDCC demarcation protocol is more optimistic, and more efficient because it reduces the amount of explicit global coordination.

The protocol to ensure integrity constraints in MDCC works as follows. Suppose an attribute has a constraint such as attribute ≥ 0 and an initial value of V. Also suppose there are N = 5 storage nodes and the fast quorum size is Q = 4. If each storage node allows decrement updates until its local value reaches 0, too many transactions may commit, violating the integrity constraint. This can happen because a transaction needs only Q/N = 4/5 of the storage nodes to respond, and in the worst case only Q = 4 messages are received. It can be shown that by setting the lower limit to:

$$\text{lowerLimit} \geq \frac{N - Q}{N} \cdot V = \frac{V}{5}$$
the lower limit guarantees that the integrity constraint is satisfied, while still being low enough to allow good concurrency. When storage nodes reach this lower limit, they must start rejecting new updates, which may cause transactions that could violate the constraint to fail.
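The effect of the limit can be checked with a small simulation, using the values from the text (N = 5, Q = 4) and an initial value of V = 4. This is only a sketch under stated assumptions: the function names are hypothetical, each transaction decrements the value by 1, each storage node sees the five transactions in a different rotated order, and a node lowers its local value as soon as it accepts an option.

```python
from fractions import Fraction


def demarcation_limit(n, q, v):
    """Lower limit (N - Q) / N * V, kept exact with Fraction."""
    return Fraction(n - q, n) * v


def simulate(n, q, v, lower_limit, num_txs=5, delta=1):
    """Return (transactions reaching a fast quorum, resulting value)."""
    txs = list(range(1, num_txs + 1))
    accepts = {tx: 0 for tx in txs}
    for node in range(n):
        # Assumed arrival order: node k sees the transactions rotated by k.
        shift = node % num_txs
        order = txs[num_txs - shift:] + txs[:num_txs - shift]
        local = v
        for tx in order:
            # Accept a decrement only while the local value stays at or
            # above the node's lower limit.
            if local - delta >= lower_limit:
                local -= delta
                accepts[tx] += 1
    committed = [tx for tx in txs if accepts[tx] >= q]
    return committed, v - delta * len(committed)


limit = demarcation_limit(5, 4, 4)              # Fraction(4, 5), i.e. 0.8
naive = simulate(5, 4, 4, lower_limit=0)        # nodes decrement down to 0
guarded = simulate(5, 4, 4, lower_limit=limit)  # nodes stop at the limit
```

With a limit of 0, every node accepts four decrements, all five transactions gather a fast quorum, and the value would end at -1. With the demarcation limit, each node accepts only three decrements, so in this worst-case ordering no transaction gathers a fast quorum and the constraint holds.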

MDCC Demarcation Animation

Step through or play through an animation on how the MDCC demarcation protocol works.

[Animation: "MDCC Demarcation: Fast Propose w/ Commutative Updates". Five clients (Client TX 1 to Client TX 5) each propose a decrement of 1 on Record 1 to five storage nodes. Initial value: 4. Constraint: value ≥ 0. Limit ≥ ((N - Q)/N) · V = 0.8. MDCC uses the demarcation protocol to modify the limit for each storage node, to take into account the fast quorum size. The limit is slightly more conservative, but guarantees satisfying the integrity constraints. Therefore, each storage node only accepts the first 3 transactions it receives, and rejects the last 2. Accepted transactions per storage node: (t1, t2, t3), (t5, t1, t2), (t4, t5, t1), (t3, t4, t5), (t2, t3, t4).]

 

Additional Information

Or questions?

 

MDCC was developed in the AMPLab at UC Berkeley by Tim Kraska, Gene Pang, Mike Franklin, Samuel Madden, and Alan Fekete. Read our MDCC paper or download our MDCC presentation from Eurosys 2013, if you are interested in more details.

The PLANET page has more details on the transaction programming model of MDCC. Please visit it for more information.

If you have any comments or questions, feel free to email us at: [email protected], [email protected].
