A new commit protocol and transaction programming model for efficiently achieving strong consistency in databases across data centers.
Consistency across data centers
With the emergence of cloud services, distributed databases have benefited from many of the advantages of deploying on a clusters of machines. However, entire data centers of machines can fail. Here are just a few recent data center failures, and one even losing customer data. The simplest technique to make databases resilient to data center failures is to replicate the data to multiple, ideally geographically diverse, data centers. However, the only way to make sure the data is durable in the face of failures is to synchronously replicate the data. This means when storing and committing data, the data must be stored in multiple data centers before being considered durable. However, network communication between geographically diverse data centers is very slow and unpredictable. Developing predictable applications with this network variability is difficult, and committing data across data centers is slow. As a result, typical systems either give up on transactions and consistency, or use asynchronous replication.
MDCC (Multi-Data Center Consistency) is a new database solution which provides full transactions with strong consistency, and synchronous replication for fault-tolerant durability. MDCC includes two main components:
For those interested in more details of the MDCC commit protocol, you can read our MDCC paper or our MDCC presentation from Eurosys 2013.
Minimizing message round trips
The MDCC commit protocol is based on the family of Paxos algorithms. The Paxos algorithm works to get a set of participants to reach consensus on, or choose a value. Paxos guarantees that only a single value will ever be chosen by the set of participants, regardless of the number of proposers proposing new values.Classic and Multi-Paxos, Fast Paxos, and Generalized Paxos are different variations and optimizations of the Paxos consensus algorithm. The MDCC commit protocol utilizes all of these variations of Paxos for different situations. At the core, MDCC uses Paxos to get replica storage nodes to agree upon updates to records. By taking advantage of all these optimizations, the MDCC commit protocol can commit transactions with only one round trip time for many situations.
Here, we focus on a few key aspects of the commit protocol: Paxos, MDCC transactions with options, and efficient commutative updates with integrity constraints. More details can be found in our MDCC paper.
The MDCC commit protocol achieves transactional updates by getting storage nodes to agree on update options, similar to the escrow method. An option of an update is basically a promise that the update can complete at some point in the future, but it has not yet been applied. When all the options of the updates to all the records in a transaction have been agreed upon by the storage nodes with Paxos (any previously mentioned variation of Paxos), the transaction is committed, since Paxos guarantees that the update options cannot be lost or forgotten. Therefore the transaction is durable. A final asynchronous commit message is sent to execute the update and make it visible. MDCC primarily uses fast rounds of Generalized Paxos, so most transaction commits only take 1 round-trip.
Storage nodes only accept update options when the option can commit even if previously seen pending options are committed or aborted. Therefore, in order to accept an option, storage nodes must consider all the commit and abort possibilities of pending options. This guarantees that if a storage node accepts an optionfor an update, executing the commit will not fail.
Step through or play through an animation on how MDCC transactions executes.
MDCC often uses fast rounds of Generalized Paxos to avoid contacting a master and further optimizes transactions with commutative updates to avoid conflicts for concurrent updates. However, for commutative operations, it is typical for some attributes in the database to have domain integrity constraints, such as attribute1≥0. This is easy to enforce in a single site database, but difficult in a large, global distributed system. Each record has several storage nodes accepting transactions, so enforcing domain integrity constraints can be difficult without expensive, global coordination. MDCC uses a new demarcation protocolfor quorum based systems. The MDCC demarcation protocol is more optimistic and is more efficient by reducing the amount of explicit global coordination.
The protocol to ensure integrity constraints in MDCC works as follows: Say an attribute as a constraint like attribute1≥0 and has an initial value of V. Also, suppose there are N=5 storage nodes, and the fast quorum size is Q=4. If each storage node allows decrement updates until the local value is at 0, there is a possibility that too many transactions may commit, and thus violating the integrity constraint. This may happen because a transaction only needs QN=45 of the storage nodes to respond, and in the worst case scenario, only Q=4 messages are received. It can be shown that by setting the lower limit to:
Step through or play through an animation on how the MDCC demarcation protocol works.
Or questions?
MDCC was developed in the AMPLab at UC Berkeley by Tim Kraska, Gene Pang, Mike Franklin, Samuel Madden, and Alan Fekete. Read our MDCC paper or download our MDCC presentation from Eurosys 2013, if you are interested in more details.
The PLANET page has more details on the transaction programming model of MDCC. Please visit it for more information.
If you have any comments or questions, feel free to email us at: [email protected], [email protected].