Distributed Services: Replication

Why do we need replication?

  1. High availability. The system keeps working when one machine goes down or the network is interrupted.
  2. Lower latency. Place data geographically close to users.
  3. Scalability. We can handle a higher volume of reads by serving them from replicas.

 

What kinds of replication are there?

  1. Single-leader replication.
  2. Multi-leader replication.
  3. Leaderless replication.

 

What are the differences between these replication approaches?

Single-Leader replication

Upside:

  1. Easy to understand.
  2. No conflict resolution needed, because all writes go through one leader.
  3. We can trade off durability against availability by choosing asynchronous, semi-synchronous, or synchronous replication. There are also variants such as chain replication, used in Microsoft Azure.
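The trade-off in point 3 can be sketched with in-memory lists standing in for replica logs. This is only an illustration: `replicate`, the mode names, and the list-based logs are hypothetical, and a real system would ship the pending writes over the network in the background.

```python
def replicate(value, leader_log, follower_logs, mode):
    """Append a write on the leader and wait for some followers before acknowledging.

    Returns (copies_at_ack_time, followers_still_pending).
    """
    leader_log.append(value)
    # How many followers must confirm before the client gets its acknowledgement.
    wait_for = {
        "sync": len(follower_logs),               # every follower confirms
        "semi-sync": min(1, len(follower_logs)),  # one follower confirms
        "async": 0,                               # nobody confirms
    }[mode]
    for log in follower_logs[:wait_for]:
        log.append(value)  # confirmed before we return
    pending = follower_logs[wait_for:]  # would catch up later
    return 1 + wait_for, pending


leader, followers = [], [[], []]
copies, pending = replicate("x", leader, followers, "semi-sync")
```

More copies at acknowledgement time means better durability, but a single slow or dead follower can block writes in the synchronous mode.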

 

Multi-Leader replication

Upside:

  1. Tolerates the outage of a whole datacenter.
  2. Writes are processed in the local datacenter, which gives better perceived performance.

Downside:

  1. Concurrent writes on different leaders can conflict, so conflict resolution is required.

Note:

  1. The best approach is to avoid conflicting writes in the first place, for example by making all writes for a particular record go through the same leader.

 

Leaderless replication

Upside:

  1. No need to replicate data from a leader to followers; clients send writes to several replicas directly.
  2. No failover.

Downside:

  1. You may still read stale data even when using quorums.
  2. Usually cannot guarantee reading your own writes, monotonic reads, or consistent prefix reads.
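As a rough sketch of how quorums work: with n replicas, a write confirmed by w nodes and a read that queries r nodes overlap in at least one node when w + r > n. The function names and the (version, value) response shape below are assumptions made for illustration.

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True when every read set must share at least one node with every write set."""
    return w + r > n


def read_latest(responses):
    """Pick the response with the highest version among the r replies.

    responses: list of (version, value) pairs, one per replica that answered.
    """
    return max(responses, key=lambda pair: pair[0])


# One replica is still behind, but the overlap guarantees a fresh copy is seen.
latest = read_latest([(1, "old"), (2, "new")])
```

Even with w + r > n, edge cases such as concurrent writes or a write that succeeded on fewer than w nodes can still surface stale values, which is why the first downside above remains.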

 

 

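How can we replicate writes from the leader to the followers?

  1. Statement-based replication: forward each write statement (for example SQL) to the followers. Downside: nondeterministic functions such as NOW() or RAND() can produce a different value on each replica.
  2. Write-ahead log (WAL) shipping: the log records which bytes changed in which disk blocks. Downside: tightly coupled to the storage engine.
  3. Row-based (logical) log replication, such as the binlog in row-based mode: the log records data changes at the granularity of a table row.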
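The nondeterminism problem of statement-based replication can be shown with a toy example. Here dictionaries stand in for databases and `rng.random()` stands in for a function like RAND(); all names are hypothetical.

```python
import random


def apply_statement(db: dict, key: str, rng: random.Random) -> None:
    """Statement-based: each replica re-executes the statement itself."""
    db[key] = rng.random()  # stands in for a nondeterministic function such as RAND()


def apply_row_change(db: dict, key: str, new_value) -> None:
    """Row-based: the replica applies the value the leader already computed."""
    db[key] = new_value


leader, follower_stmt, follower_row = {}, {}, {}
apply_statement(leader, "token", random.Random(1))
# The follower has its own random state, so re-executing the statement diverges.
apply_statement(follower_stmt, "token", random.Random(2))
# Shipping the resulting row keeps the follower identical to the leader.
apply_row_change(follower_row, "token", leader["token"])
```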

 

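What guarantees should we provide when replication lag is large?

Read-after-write: (a) read the user's own data from the leader, or (b) remember the timestamp of the user's last update and only read from a replica that has caught up to that timestamp.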
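A minimal sketch that combines both options, assuming each follower reports how far it has applied the replication log. `choose_replica` and the (name, applied_ts) tuples are hypothetical, and the timestamps could just as well be log sequence numbers.

```python
def choose_replica(last_write_ts, followers):
    """Return a follower that already contains the user's last write, else the leader.

    followers: list of (name, applied_ts) pairs.
    """
    for name, applied_ts in followers:
        if applied_ts >= last_write_ts:
            return name
    # No follower has caught up yet, so fall back to option (a).
    return "leader"


target = choose_replica(10, [("follower-1", 8), ("follower-2", 12)])
```

Instead of falling back to the leader, the reader could also wait and retry until some follower catches up.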

Monotonic reads: make sure each user always reads from the same replica.
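One way to pin a user to a replica is to hash the user ID, sketched below. The replica names are made up, and the sketch ignores what happens when the chosen replica fails and the user has to be rerouted.

```python
import zlib

REPLICAS = ["replica-a", "replica-b", "replica-c"]  # hypothetical replica names


def replica_for(user_id: str, replicas=REPLICAS) -> str:
    """Map a user to one replica with a stable hash, so repeated reads hit the same node."""
    return replicas[zlib.crc32(user_id.encode("utf-8")) % len(replicas)]


first = replica_for("alice")
second = replica_for("alice")
```

Because a single replica only moves forward in time, reading from it repeatedly never shows older data after newer data.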

Consistent prefix reads: make sure causally related writes all go to the same partition.

 

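How can we resolve conflicts?

  1. If losing data is acceptable, use last write wins (LWW).
  2. Give each write operation its own unique key (for example a UUID), so that no two writes update the same key.
  3. Use version vectors to detect concurrent writes, then merge the concurrently written values.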
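Options 1 and 3 can be sketched as follows, assuming a version vector is a dict mapping a replica ID to a counter. The function names are illustrative only, and how the concurrent values themselves get merged is left to the application.

```python
def lww_merge(a, b):
    """Last write wins: keep the (timestamp, value) pair with the higher timestamp.

    The losing write is silently discarded.
    """
    return a if a[0] >= b[0] else b


def compare_vectors(va: dict, vb: dict) -> str:
    """Classify two version vectors as equal, ordered, or concurrent."""
    keys = set(va) | set(vb)
    a_ge = all(va.get(k, 0) >= vb.get(k, 0) for k in keys)
    b_ge = all(vb.get(k, 0) >= va.get(k, 0) for k in keys)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "concurrent"  # neither saw the other's write, so the values must be merged


def merge_vectors(va: dict, vb: dict) -> dict:
    """Element-wise maximum, used as the vector of the merged value."""
    return {k: max(va.get(k, 0), vb.get(k, 0)) for k in set(va) | set(vb)}


status = compare_vectors({"r1": 2}, {"r2": 1})
```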
