Data replication 同步技术


复本同步技术

There are 2 ways how the master propagate updates to the slave; State transfer and Operation transfer.

  • In State transfer, the master passes its latest state to the slave, which then replace its current state with the latest state.
  • In Operation transfer, the master propagate a sequence of operations to the slave which then apply the operations in its local state.

State Transfer Model (状态传输模型)

优点是比较robust, 消息丢失后面还可以同步弥补, 缺点传输的流量比较大 
当然为了提高效率, 也不会每次都传输所有的state, 只会传输改变的部分(delta change) 
问题是, 如果在分布式的环境下知道异地复本之间的差异? 最简单的方法就是把一方的数据全传过来看看, 但这样无法减少传输量 
常用的方法是使用Merkle tree, 通过传递digest来降低传输量, 来定位delta change

The state transfer model is more robust against message lost because as long as a latter more updated message arrives, the replica still be able to advance to the latest state.

Even in state transfer mode, we don't want to send the full object for updating other replicas because changes typically happens within a small portion of the object. In will be a waste of network bandwidth if we send the unchanged portion of the object, so we need a mechanism to detect and send just the delta (the portion that has been changed). 
One common approach is break the object into chunks and compute a hash tree of the object (Merkle tree). So the replica can just compare their hash tree to figure out which chunk of the object has been changed and only send those over.

Merkle Tree

Hash tree, http://en.wikipedia.org/wiki/Hash_tree

Hash trees were invented in 1979 by Ralph Merkle
hash tree or Merkle tree is a tree in which every non-leaf node is labelled with the hash of the labels of its children nodes.

如何使用Merkle trees经行高效的复本同步, 参考Amazon's Dynamo, 4.7处理永久性故障:副本同步

Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees

http://www.slideshare.net/quipo/modern-algorithms-and-data-structures-1-bloom-filters-merkle-trees

Data replication 同步技术_第1张图片

Data replication 同步技术_第2张图片

 

Operation transfer mode (操作日志传输模型)

Operation transfer 优点是需要传输的数据少, 但需要可靠的消息系统来保证不丢失 
在分布式的环境下会大大增加复杂性, 似乎很少有使用这种方案的 
In operation transfer mode, usually much less data need to be send over the network. However, it requires a reliable message mechanism with delivery order guarantee.

 

基于Gossip和Vector Clock的同步技术

State Transfer Model

In a state transfer model, each replica maintain a vector clock as well as a state version tree where each state is neither > or < among each other (based on vector clock comparison). In other words, the state version tree contains all the conflicting updates.  
每个replica都维护vector clock, 以及state version tree(Merkle tree)

Query Processing

Client发出query request, 并附上该replica在client端的V-client 
Server端收到后, 返回具有比client所附带vector clock新(V-state > V-client)的那部分数据和其对应的vector clock 
Client收到查询结果后, 会将自己的vector clock和server传回的vector clock进行merge

Data replication 同步技术_第3张图片

Update Processing

Client发出update命令并附上当前V-client 
Server收到后, 发现Vclient比当前服务器的所有V-state都旧(V-client < all V-state, 表明当前服务器的状态已经包含此更新), 则抛弃该更新命令 
否则说明是新的改动, server更新状态值, V-state和merle tree, 并返回新的V-state

Data replication 同步技术_第4张图片

Internode Gossiping

版本间同步, 需要通过Merkle Tree来定位delta change, 并最终完成同步

Data replication 同步技术_第5张图片

 

Operation Transfer Model

In an operation transfer approach, the sequence of applying the operations is very important. At the minimum causal order need to be maintained. 
Because of the ordering issue, each replica has to defer executing the operation until all the preceding operations has been executed
Therefore replicas save the operation request to a log file and exchange the log among each other and consolidate these operation logs to figure out the right sequence to apply the operations to their local store in an appropriate order.

对于这种模式, 必须保证 
1. 消息不丢失 
2. 所有更新命令以相同顺序被执行, 全序问题 
3. 更新命令需要defer执行, 直到在他之前的更新命令都已经被执行

 

Query Processing, 和STM没啥区别  
When a query is submitted by the client, it will also send along its vector clock which reflect the client's view of the world. The replica will check if it has a view of the state that is later than the client's view.

Data replication 同步技术_第6张图片

 

Update Processing

When an update operation is received, the replica will buffer the update operation until it can be applied to the local state. Every submitted operation will be tag with 2 timestamp, V-client indicates the client's view when he is making the update request. V-receive is the replica's view when it receives the submission. 
This update operation request will be sitting in the queue until the replica has received all the other updates that this one depends on. This condition is reflected in the vector clock Vi when it is larger than V-client. 
关键在于如何知道该更新操作之间的操作是否已经被执行? 
通过vector clock的比较是一种办法, 如图中算法, 当V-client < Vi(replica当前的vector)的时候, 认为replica已经执行过之前所有的更新命令

但vector clock只能保证偏序, 而非全序, 所以这个方法并无法保证更新顺序完全一致

The concurrent update problem at different replica can also happen. Which means there can be multiple valid sequences of operation. In order for different replica to apply concurrent update in the same order, we need a total ordering mechanism.

Data replication 同步技术_第7张图片

 

Internode Gossiping, 本质上update没有区别

On the background, different replicas exchange their log for the queued updates and update each other's vector clock. After the log exchange, each replica will check whether certain operation can be applied (when all the dependent operation has been received) and apply them accordingly.

Data replication 同步技术_第8张图片

你可能感兴趣的:(Data replication 同步技术)