Raft Understandable Distributed Consensus

So What is Distributed Consensus?
Let’s start with an example…

Let’s say we have a single node system.
For this example, you can think of our node as a database server that stores a single value.
We also have a client that can send a value to the server.
Coming to agreement, or consensus, on that value is easy with one node.

But how do we come to consensus if we have multiple nodes?
That’s the problem of distributed consensus.
Raft is a protocol for implementing distributed consensus.
Let’s look at a high-level overview of how it works.
A node can be in one of three states: the Follower state, the Candidate state, or the Leader state.
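
As a rough illustration, the three states could be modeled as an enumeration. This is only a sketch: the names `State` and `state_names` are made up for this example and do not come from any Raft library.

```python
from enum import Enum


class State(Enum):
    """The three states a node can be in, as listed in the walkthrough."""
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# Hypothetical helper: list the state names, e.g. for logging.
def state_names():
    return [s.value for s in State]
```

A node would hold exactly one `State` value at any time.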

All our nodes start in the follower state.
If a follower doesn’t hear from a leader, it can become a candidate.
The candidate then requests votes from other nodes.
Nodes will reply with their vote.
The candidate becomes the leader if it gets votes from a majority of nodes.
This process is called Leader Election.
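
To make "a majority of nodes" concrete, here is a minimal sketch of the vote count. The function names are hypothetical, and the cluster size is assumed to be fixed and known to every node.

```python
def majority(cluster_size):
    """Smallest number of nodes that is more than half the cluster."""
    return cluster_size // 2 + 1


def wins_election(votes_received, cluster_size):
    """A candidate becomes leader once its votes reach a majority.

    votes_received includes the candidate's own vote for itself.
    """
    return votes_received >= majority(cluster_size)
```

For a five-node cluster, `majority(5)` is 3, so a candidate needs its own vote plus two more.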

All changes to the system now go through the leader.
Each change is added as an entry in the leader’s log.
This log entry is currently uncommitted, so it won’t update the node’s value.
To commit the entry, the leader first replicates it to the follower nodes, then waits until a majority of nodes have written the entry.
The entry is now committed on the leader node and the node’s value becomes “5”, the value this example’s client sent.
The leader then notifies the followers that the entry is committed.
The cluster has now come to consensus about the system state.
This process is called Log Replication.
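
The commit rule above can be sketched in a few lines. This is a simplified model under stated assumptions, not a real implementation: `LogEntry`, `Leader`, `propose`, and `on_acks` are hypothetical names, and networking is replaced by a plain count of follower acknowledgements.

```python
class LogEntry:
    def __init__(self, command):
        self.command = command
        self.committed = False  # entries start out uncommitted


class Leader:
    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.log = []
        self.value = None  # the node's value, updated only on commit

    def propose(self, new_value):
        """Append a change to the log without applying it yet."""
        entry = LogEntry(new_value)
        self.log.append(entry)
        return entry

    def on_acks(self, entry, follower_acks):
        """Commit once a majority of nodes have written the entry."""
        copies = follower_acks + 1  # the leader's own copy counts too
        if copies >= self.cluster_size // 2 + 1:
            entry.committed = True
            self.value = entry.command
        return entry.committed
```

In a five-node cluster the value only changes once two followers have acknowledged the entry, because the leader plus two followers make three of five.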

Leader Election

In Raft there are two timeout settings which control elections.
First is the election timeout.
The election timeout is the amount of time a follower waits until becoming a candidate.
The election timeout is randomized to be between 150ms and 300ms.
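
One way to picture the randomized timeout is the sketch below. The constant and function names are hypothetical; only the 150ms to 300ms range comes from the text.

```python
import random

ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300


def new_election_timeout_ms(rng=random):
    """Pick a fresh timeout so nodes rarely time out at the same moment."""
    return rng.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)
```

Each follower would call this whenever it resets its timer, so different nodes usually end up with different timeouts.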
After the election timeout the follower becomes a candidate, starts a new election term, votes for itself, and sends out Request Vote messages to other nodes.
If the receiving node hasn’t voted yet in this term, it votes for the candidate and resets its election timeout.
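
The voting rule can be sketched as a small handler. This is a simplification with hypothetical names (`Voter`, `on_request_vote`): it covers only the one-vote-per-term rule described here and leaves out the log comparison that full Raft also performs before granting a vote.

```python
class Voter:
    def __init__(self):
        self.current_term = 0
        self.voted_for = None
        self.timeout_resets = 0  # stand-in for restarting the election timer

    def on_request_vote(self, candidate_id, term):
        """Grant at most one vote per term."""
        if term < self.current_term:
            return False  # request from an old term
        if term > self.current_term:
            # A newer term starts with a clean slate.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            self.timeout_resets += 1
            return True
        return False  # already voted for someone else this term
```

A second candidate asking in the same term is refused, which is what makes a split vote possible.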
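Once a candidate has a majority of votes it becomes leader.
The leader begins sending out Append Entries messages to its followers.
These messages are sent at intervals specified by the heartbeat timeout, the second of the two timeout settings.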
Followers then respond to each Append Entries message.
This election term will continue until a follower stops receiving heartbeats and becomes a candidate.
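
The follower side of the heartbeat can be sketched with a simulated clock. `FollowerTimer` and its methods are hypothetical names, and times are passed in as plain millisecond numbers rather than read from a real clock.

```python
class FollowerTimer:
    def __init__(self, election_timeout_ms):
        self.election_timeout_ms = election_timeout_ms
        self.last_heartbeat_ms = 0

    def on_append_entries(self, now_ms):
        """Every Append Entries message from the leader resets the timer."""
        self.last_heartbeat_ms = now_ms

    def should_start_election(self, now_ms):
        """True once no heartbeat has arrived for a full election timeout."""
        return now_ms - self.last_heartbeat_ms >= self.election_timeout_ms
```

As long as heartbeats keep arriving more often than the election timeout, `should_start_election` stays false and the term continues.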
Let’s stop the leader and watch a re-election happen.
Node A is now leader of term 2.
Requiring a majority of votes guarantees that only one leader can be elected per term.
If two nodes become candidates at the same time then a split vote can occur.
Let’s take a look at a split vote example…
Two nodes both start an election for the same term…
Now each candidate has 2 votes and can receive no more for this term.
The nodes will wait for a new election and try again.
Node C received a majority of votes in term 7 so it becomes leader.
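
A split vote is easy to reproduce on paper. The sketch below assumes a four-node cluster (so each candidate can end up with 2 votes) and uses a hypothetical `election_winner` function that tallies one vote per node.

```python
from collections import Counter


def election_winner(votes, cluster_size):
    """votes maps each voter to the candidate it voted for.

    Returns the candidate with a majority, or None on a split vote.
    """
    needed = cluster_size // 2 + 1
    tally = Counter(votes.values())
    for candidate, count in tally.items():
        if count >= needed:
            return candidate
    return None


# Two candidates, two votes each: nobody reaches 3 of 4.
split = {"A": "A", "B": "A", "C": "C", "D": "C"}
# A later term in which one candidate collects three votes.
retry = {"A": "C", "B": "C", "C": "C", "D": "D"}
```

Here `election_winner(split, 4)` finds no winner, while the retry elects C. Since two candidates cannot both hold a majority of the same cluster, at most one leader can come out of any term.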

Log Replication

Once we have a leader elected we need to replicate all changes to our system to all nodes.
This is done by using the same Append Entries message that was used for heartbeats.
Let’s walk through the process.
First a client sends a change to the leader.
The change is appended to the leader’s log, then sent to the followers on the next heartbeat.
An entry is committed once a majority of nodes, counting the leader, have written it, and a response is sent to the client.
Now let’s send a command to increment the value by “2”.
Our system value is now updated to “7”.
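
The two commands in this walkthrough can be replayed with a small sketch. The `("SET", n)` and `("ADD", n)` tuples and the `apply_committed` function are invented for illustration, and the starting value of 0 is an assumption.

```python
def apply_committed(log, commit_index):
    """Replay only the committed prefix of the log to get the current value."""
    value = 0  # assumed initial value
    for op, arg in log[:commit_index]:
        if op == "SET":
            value = arg
        elif op == "ADD":
            value += arg
    return value


# Set the value to 5, then increment it by 2.
log = [("SET", 5), ("ADD", 2)]
```

With only the first entry committed the value is 5; once the second entry is committed too, `apply_committed(log, 2)` gives 7, matching the walkthrough.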

Raft can even stay consistent in the face of network partitions.
Let’s add a partition to separate A & B from C, D & E.
Because of our partition we now have two leaders in different terms.
Let’s add another client and try to update both leaders.
One client will try to set the value of node B to “3”.
Node B cannot replicate to a majority so its log entry stays uncommitted.
The other client will try to set the value of node C to “8”.
This will succeed because it can replicate to a majority.
Now let’s heal the network partition.
Node B will see the higher election term and step down.
Both nodes A & B will roll back their uncommitted entries and match the new leader’s log.
Our log is now consistent across our cluster.
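
What happens to node B when the partition heals can be sketched as follows. This is a heavily simplified model with hypothetical names (`Replica`, `on_append_entries`): it assumes the committed prefix already matches the new leader's log, whereas full Raft checks the previous entry's index and term before accepting entries. The term numbers are illustrative.

```python
class Replica:
    def __init__(self, term, log, commit_index, is_leader=False):
        self.term = term
        self.log = list(log)
        self.commit_index = commit_index  # number of committed entries
        self.is_leader = is_leader

    def on_append_entries(self, leader_term, leader_log, leader_commit):
        if leader_term < self.term:
            return False  # ignore a leader from an older term
        # A higher (or equal) term wins: adopt it and step down.
        self.term = leader_term
        self.is_leader = False
        # Keep the committed prefix, drop uncommitted entries,
        # and take the rest from the leader's log.
        keep = self.log[:self.commit_index]
        self.log = keep + list(leader_log[self.commit_index:])
        self.commit_index = leader_commit
        return True


# B was the old leader with an uncommitted "SET 3".
b = Replica(term=1, log=[("SET", 5), ("SET", 3)], commit_index=1, is_leader=True)
# C's log after committing "SET 8" on the majority side.
c_log = [("SET", 5), ("SET", 8)]
b.on_append_entries(leader_term=2, leader_log=c_log, leader_commit=2)
```

After the call, B is a follower in term 2 and its log matches C's, with the uncommitted "SET 3" gone.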
References
https://www.jdon.com/artichect/raft.html
http://thesecretlivesofdata.com/raft/
