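I'm working through the 2020 edition of MIT 6.824 and have finished Raft Lab 2B with all tests passing. For the previous lab, see 2020 6.824 Raft Lab 2A.
This lab has noticeably more pitfalls than 2A. It took me about three weeks to get everything to pass: roughly 20% of the time went into understanding, 10% into coding, and the remaining 70% into debugging, so I picked up the habit of reading logs along the way.
I didn't optimize the Lab 2B part either: the Lab 2B tests pass even without the conflictIndex/conflictTerm optimization.
The link below was very helpful for testing. Its main purpose is to run the tests many times, so that a failure that only shows up occasionally isn't missed just because a single run happened to pass.
## Run the 2B tests 100 times, 20 tests in parallel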
bash test_many.sh 100 20 2B
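The lab rules say you must not look at other people's code, and I didn't fully comply; the links I consulted are below. That said, using someone else's code also dug a hole for me: when you copy a snippet into your own code, edge cases are very easy to overlook. Reading other people's approach (the overall structure) and then implementing it yourself on top of your own code avoids many of these pitfalls. Best of all, of course, is writing everything yourself from start to finish. Here are the implementations I consulted:
For the overall structure I kept the design from my Raft 2A, so the 2B implementation mainly fills out two methods from 2A.
In addition, a few helper functions for log replication are needed.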
func (rf *Raft) SendHeartbeat() {
	for !rf.killed() {
		...
			for i := 0; i < len(rf.peers); i++ {
				...
				args := AppendEntriesArgs{
					...
				}
				go func(p int, args *AppendEntriesArgs) {
					...
					if reply.Success {
						// handle success
						...
					} else {
						// handle failure
						...
					}
				}(i, &args)
			}
		}()
	}
}
nextIndex := rf.nextIndex[i]
entries := make([]LogEntry, 0)
entries = append(entries, rf.log[nextIndex:]...)
args := AppendEntriesArgs{
	Term:         rf.currentTerm,
	LeaderId:     rf.me,
	Entries:      entries,
	PrevLogIndex: rf.getPrevLogIndex(i),
	PrevLogTerm:  rf.getPrevLogTerm(i),
	LeaderCommit: rf.commitIndex,
}
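The arguments above call getPrevLogIndex and getPrevLogTerm, and other snippets in this post call getLastLogIndex and getLastLogTerm, but the helpers themselves are not listed. Here is a minimal sketch of what they could look like. It assumes the log keeps a dummy entry at index 0 (so a log index equals its slice position) and uses a stripped-down Raft struct that exists only for this example.

```go
package main

import "fmt"

// LogEntry mirrors the fields used in this post.
type LogEntry struct {
	Command interface{}
	Index   int
	Term    int
}

// Raft is a stripped-down stand-in for the real struct (assumption: only the
// fields the helpers need, and log[0] is a dummy entry with Term 0).
type Raft struct {
	log       []LogEntry
	nextIndex []int
}

// getLastLogIndex returns the index of the last entry in the log.
func (rf *Raft) getLastLogIndex() int {
	return len(rf.log) - 1
}

// getLastLogTerm returns the term of the last entry in the log.
func (rf *Raft) getLastLogTerm() int {
	return rf.log[rf.getLastLogIndex()].Term
}

// getPrevLogIndex returns the index just before the first entry that will be
// sent to peer i.
func (rf *Raft) getPrevLogIndex(i int) int {
	return rf.nextIndex[i] - 1
}

// getPrevLogTerm returns the term of the entry at getPrevLogIndex(i).
func (rf *Raft) getPrevLogTerm(i int) int {
	return rf.log[rf.getPrevLogIndex(i)].Term
}

func main() {
	rf := &Raft{
		log: []LogEntry{
			{Index: 0, Term: 0}, // dummy entry
			{Command: "x", Index: 1, Term: 1},
			{Command: "y", Index: 2, Term: 2},
		},
		nextIndex: []int{3, 2, 1},
	}
	fmt.Println(rf.getLastLogIndex(), rf.getLastLogTerm())   // 2 2
	fmt.Println(rf.getPrevLogIndex(1), rf.getPrevLogTerm(1)) // 1 1
	fmt.Println(rf.getPrevLogIndex(2), rf.getPrevLogTerm(2)) // 0 0
}
```

With the dummy entry in place, PrevLogIndex 0 always refers to an entry that every peer has, which keeps the boundary case out of the helpers.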
func (rf *Raft) convertToLeader() {
	...
	// index of the next log entry to send to each peer
	// (initialized to the leader's last log index + 1)
	rf.nextIndex = make([]int, len(rf.peers))
	for i := 0; i < len(rf.peers); i++ {
		rf.nextIndex[i] = rf.getLastLogIndex() + 1
	}
	// highest log index known to be replicated on each peer
	// (initialized to 0, increases monotonically)
	// init match index is [0 0 0]
	rf.matchIndex = make([]int, len(rf.peers))
}
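On success, nextIndex and matchIndex need to be updated. Note that nextIndex and the length of the log may already have been changed by another goroutine, so matchIndex is computed from the arguments that were actually sent: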
rf.matchIndex[p] = args.PrevLogIndex + len(args.Entries)
rf.nextIndex[p] = rf.matchIndex[p] + 1
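At the same time, check whether commitIndex needs to advance. In the paper's terms (quoted below), this amounts to finding the median N of matchIndex; if N is larger than the current commitIndex and log[N] is from the current term, update commitIndex to N.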
If there exists an N such that N > commitIndex, a majority of matchIndex[i] ≥ N, and log[N].term == currentTerm: set commitIndex = N
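The corresponding code:
func (rf *Raft) advanceCommitIndex() {
	sortedMatchIndex := make([]int, len(rf.matchIndex))
	copy(sortedMatchIndex, rf.matchIndex)
	// the leader itself always holds its whole log
	sortedMatchIndex[rf.me] = len(rf.log) - 1
	sort.Ints(sortedMatchIndex)
	// After an ascending sort, the entries from position (n-1)/2 upward form a
	// majority for odd and even n; len(rf.peers)/2 is only correct for odd n.
	N := sortedMatchIndex[(len(rf.peers)-1)/2]
	if rf.currentState == Leader && N > rf.commitIndex && rf.log[N].Term == rf.currentTerm {
		rf.commitIndex = N
		rf.applyLog()
	}
}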
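To see why the median works, here is a standalone sketch of just the "largest N replicated on a majority" step. The function name majorityMatchIndex is made up for this example, and it assumes the leader's own slot has already been filled in with its last log index. It does not perform the log[N].term == currentTerm check, which is still required before committing.

```go
package main

import (
	"fmt"
	"sort"
)

// majorityMatchIndex returns the largest index N such that a majority of the
// values in matchIndex are >= N.
func majorityMatchIndex(matchIndex []int) int {
	sorted := make([]int, len(matchIndex))
	copy(sorted, matchIndex)
	sort.Ints(sorted)
	// After an ascending sort, positions (n-1)/2 .. n-1 form a majority for
	// both odd and even n.
	return sorted[(len(sorted)-1)/2]
}

func main() {
	fmt.Println(majorityMatchIndex([]int{5, 3, 4}))       // 4
	fmt.Println(majorityMatchIndex([]int{0, 0, 7, 7, 7})) // 7
	fmt.Println(majorityMatchIndex([]int{1, 2, 3, 4}))    // 2
}
```

In the first case two of three peers hold index 4, so 4 is the candidate; index 5 is held by only one peer and cannot be committed yet.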
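My current handling of a failed reply is incomplete, but it still passes the tests. The idea is to move nextIndex back by one and send again on the next round:
rf.nextIndex[p] = args.PrevLogIndex
Later labs will probably need the optimization. For the optimized implementation, see the student guide written by an MIT TA.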
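For reference, here is a rough sketch of the conflictIndex/conflictTerm rules as I understand them from the student guide. This is not what my code does. The function names are made up, logs are represented only by their terms with a dummy entry at index 0, and -1 stands for "conflictTerm = None" (matching the -1 initial value used in my AppendEntries).

```go
package main

import "fmt"

// noTerm marks "conflictTerm = None" (assumption: -1).
const noTerm = -1

// followerConflict is what a follower could compute when the consistency
// check fails. terms[i] is the term of the log entry at index i, and terms[0]
// is a dummy entry. ok is true when the check passes.
func followerConflict(terms []int, prevLogIndex, prevLogTerm int) (conflictIndex, conflictTerm int, ok bool) {
	if prevLogIndex >= len(terms) {
		// The follower's log is too short.
		return len(terms), noTerm, false
	}
	if terms[prevLogIndex] == prevLogTerm {
		return -1, noTerm, true
	}
	// Term mismatch: report the conflicting term and its first index.
	conflictTerm = terms[prevLogIndex]
	conflictIndex = prevLogIndex
	for conflictIndex > 0 && terms[conflictIndex-1] == conflictTerm {
		conflictIndex--
	}
	return conflictIndex, conflictTerm, false
}

// leaderNextIndex is how the leader could pick the new nextIndex from a
// failed reply.
func leaderNextIndex(terms []int, conflictIndex, conflictTerm int) int {
	if conflictTerm != noTerm {
		// Look for the last entry of conflictTerm in the leader's log.
		for i := len(terms) - 1; i > 0; i-- {
			if terms[i] == conflictTerm {
				return i + 1
			}
		}
	}
	return conflictIndex
}

func main() {
	leader := []int{0, 1, 1, 3, 3}

	// Follower has a conflicting run of term 2 that the leader never saw.
	ci, ct, ok := followerConflict([]int{0, 1, 1, 2, 2, 2}, 4, 3)
	fmt.Println(ci, ct, ok, leaderNextIndex(leader, ci, ct)) // 3 2 false 3

	// Follower's log is too short.
	ci, ct, ok = followerConflict([]int{0, 1}, 4, 3)
	fmt.Println(ci, ct, ok, leaderNextIndex(leader, ci, ct)) // 2 -1 false 2

	// Both logs contain the conflicting term 1.
	ci, ct, ok = followerConflict([]int{0, 1, 1, 1, 1}, 4, 3)
	fmt.Println(ci, ct, ok, leaderNextIndex(leader, ci, ct)) // 1 1 false 3
}
```

Compared with stepping back one entry per round trip, this skips a whole conflicting term per rejected RPC.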
func (rf *Raft) SendHeartbeat() {
	for !rf.killed() {
		time.Sleep(10 * time.Millisecond)
		func() {
			rf.mu.Lock()
			defer rf.mu.Unlock()
			if rf.currentState != Leader {
				return
			}
			now := time.Now()
			if now.Sub(rf.lastBroadcastTime) < 100*time.Millisecond {
				return
			}
			rf.lastBroadcastTime = time.Now()
			for i := 0; i < len(rf.peers); i++ {
				if i == rf.me {
					continue
				}
				nextIndex := rf.nextIndex[i]
				entries := make([]LogEntry, 0)
				entries = append(entries, rf.log[nextIndex:]...)
				args := AppendEntriesArgs{
					Term:         rf.currentTerm,
					LeaderId:     rf.me,
					Entries:      entries,
					PrevLogIndex: rf.getPrevLogIndex(i),
					PrevLogTerm:  rf.getPrevLogTerm(i),
					LeaderCommit: rf.commitIndex,
				}
				go func(p int, args *AppendEntriesArgs) {
					reply := AppendEntriesReply{}
					ok := rf.sendAppendEntries(p, args, &reply)
					if !ok {
						return
					}
					rf.mu.Lock()
					defer rf.mu.Unlock()
					// ignore the reply if our term changed while the RPC was in flight
					if rf.currentTerm != args.Term {
						return
					}
					if reply.Term > rf.currentTerm {
						rf.convertToFollower(reply.Term)
						return
					}
					if reply.Success {
						// on success: update nextIndex and matchIndex for this follower
						rf.matchIndex[p] = args.PrevLogIndex + len(args.Entries)
						rf.nextIndex[p] = rf.matchIndex[p] + 1
						rf.advanceCommitIndex()
					} else {
						// on failure: step nextIndex back by one and retry next round
						rf.nextIndex[p] = args.PrevLogIndex
					}
				}(i, &args)
			}
		}()
	}
}
The AppendEntries from 2A was quite crude. In Lab 2B it is the main event, and none of the five rules from the paper can be left out:
- Reply false if term < currentTerm (§5.1)
- Reply false if log doesn’t contain an entry at prevLogIndex whose term matches prevLogTerm (§5.3)
- If an existing entry conflicts with a new one (same index but different terms), delete the existing entry and all that follow it (§5.3)
- Append any new entries not already in the log
- If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of last new entry)
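My implementation first checks the cases where the reply is false, using goto to exit early, and then handles saving the log entries and updating commitIndex.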
func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	isSuccess := false
	conflictTerm := -1
	conflictIndex := -1
	...
	if args.Term < rf.currentTerm {
		goto label1
	}
	if args.Term > rf.currentTerm {
		...
	}
	// If a follower does not have prevLogIndex in its log, it should return with conflictIndex = len(log) and conflictTerm = None.
	if len(rf.log)-1 < args.PrevLogIndex {
		...
		goto label1
	}
	// if the previous entry exists locally, its term must match; otherwise reply false
	if args.PrevLogIndex > 0 && rf.log[args.PrevLogIndex].Term != args.PrevLogTerm {
		...
		goto label1
	}
	// save the log entries
	for i, logEntry := range args.Entries {
		...
	}
	// If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of last new entry)
	if args.LeaderCommit > rf.commitIndex {
		...
	}
	isSuccess = true
	goto label1
label1:
	rf.applyLog()
	reply.Success = isSuccess
	reply.Term = rf.currentTerm
	reply.ConflictIndex = conflictIndex
	reply.ConflictTerm = conflictTerm
	return
}
// if the previous entry exists locally, its term must match; otherwise reply false
if args.PrevLogIndex > 0 && rf.log[args.PrevLogIndex].Term != args.PrevLogTerm {
	goto label1
}
// save the log entries
for i, logEntry := range args.Entries {
	index := args.PrevLogIndex + i + 1
	if index > len(rf.log)-1 {
		rf.log = append(rf.log, logEntry)
	} else {
		if rf.log[index].Term != logEntry.Term {
			rf.log = rf.log[:index]
			rf.log = append(rf.log, logEntry)
		} // same term: nothing to do, keep comparing the following entries
	}
}
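The loop above is easier to reason about as a pure function. The following standalone sketch (appendNewEntries, fromTerms and termsOf are names invented for this example, and entries carry only a term) shows the two cases that matter: a real conflict truncates the log, while an old, shorter RPC whose entries already match leaves the longer log untouched.

```go
package main

import "fmt"

// LogEntry keeps only the term; commands do not matter for this example.
type LogEntry struct {
	Term int
}

// fromTerms builds a fresh log from a list of terms (hypothetical helper).
func fromTerms(terms ...int) []LogEntry {
	log := make([]LogEntry, 0, len(terms))
	for _, t := range terms {
		log = append(log, LogEntry{Term: t})
	}
	return log
}

// termsOf extracts the terms so the log is easy to print (hypothetical helper).
func termsOf(log []LogEntry) []int {
	out := make([]int, 0, len(log))
	for _, e := range log {
		out = append(out, e.Term)
	}
	return out
}

// appendNewEntries truncates the log only at the first entry whose term
// conflicts, and appends whatever is missing. log[0] is a dummy entry.
func appendNewEntries(log []LogEntry, prevLogIndex int, entries []LogEntry) []LogEntry {
	for i, e := range entries {
		index := prevLogIndex + i + 1
		if index >= len(log) {
			log = append(log, e)
		} else if log[index].Term != e.Term {
			log = append(log[:index], e)
		}
		// Same term at the same index: keep the existing entry.
	}
	return log
}

func main() {
	// Conflict at index 3 (term 2 vs 3): truncate, then append both entries.
	fmt.Println(termsOf(appendNewEntries(fromTerms(0, 1, 1, 2), 2, fromTerms(3, 3)))) // [0 1 1 3 3]

	// Stale RPC covering only index 2, which already matches: log unchanged.
	fmt.Println(termsOf(appendNewEntries(fromTerms(0, 1, 1, 2), 1, fromTerms(1)))) // [0 1 1 2]
}
```

Truncating unconditionally at PrevLogIndex+1 would lose the entry at index 3 in the second case, which is why the loop compares terms entry by entry.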
The corresponding rule from the paper:
If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of last new entry)
if args.LeaderCommit > rf.commitIndex {
	rf.commitIndex = args.LeaderCommit
	if len(rf.log)-1 < rf.commitIndex {
		rf.commitIndex = len(rf.log) - 1
	}
}
func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	isSuccess := false
	conflictTerm := -1
	conflictIndex := -1
	rf.lastReceived = time.Now()
	if args.Term < rf.currentTerm {
		goto label1
	}
	if args.Term > rf.currentTerm {
		rf.convertToFollower(args.Term)
	}
	// If a follower does not have prevLogIndex in its log, it should return with conflictIndex = len(log) and conflictTerm = None.
	if len(rf.log)-1 < args.PrevLogIndex {
		conflictIndex = len(rf.log)
		goto label1
	}
	// if the previous entry exists locally, its term must match; otherwise reply false
	if args.PrevLogIndex > 0 && rf.log[args.PrevLogIndex].Term != args.PrevLogTerm {
		goto label1
	}
	// save the log entries
	for i, logEntry := range args.Entries {
		index := args.PrevLogIndex + i + 1
		if index > len(rf.log)-1 {
			rf.log = append(rf.log, logEntry)
		} else {
			if rf.log[index].Term != logEntry.Term {
				rf.log = rf.log[:index]
				rf.log = append(rf.log, logEntry)
			} // same term: nothing to do, keep comparing the following entries
		}
	}
	// If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of last new entry)
	if args.LeaderCommit > rf.commitIndex {
		rf.commitIndex = args.LeaderCommit
		if len(rf.log)-1 < rf.commitIndex {
			rf.commitIndex = len(rf.log) - 1
		}
	}
	isSuccess = true
	goto label1
label1:
	rf.applyLog()
	reply.Success = isSuccess
	reply.Term = rf.currentTerm
	reply.ConflictIndex = conflictIndex
	reply.ConflictTerm = conflictTerm
	return
}
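The lab code has a comment explaining the Start function. Roughly: the application layer calls Start when it uses Raft. If the Raft peer that receives the call is not the leader, it returns false; otherwise it appends the command from the application to the end of the leader's log, fills in the corresponding fields (Index, Term), and returns immediately.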
the service using Raft (e.g. a k/v server) wants to start agreement on the next command to be appended to Raft’s log. if this server isn’t the leader, returns false. otherwise start the agreement and return immediately
func (rf *Raft) Start(command interface{}) (int, int, bool) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	index := -1
	term := -1
	isLeader := true
	// Your code here (2B).
	term = rf.currentTerm
	isLeader = rf.currentState == Leader
	if isLeader {
		index = len(rf.log)
		entry := LogEntry{
			Command: command,
			Index:   index,
			Term:    term,
		}
		rf.log = append(rf.log, entry)
	}
	return index, term, isLeader
}
In Lab 2A, VoteGranted did not take into account how the candidate's log compares with the receiver's log, that is:
If votedFor is null or candidateId, and candidate’s log is at least as up-to-date as receiver’s log, grant vote (§5.2, §5.4)
In Lab 2B the log comparison has to be taken into account, so it is enough to add isLogMoreUpToDate to the code:
func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
	...
	if (rf.votedFor == -1 || rf.votedFor == args.CandidateId) && rf.isLogMoreUpToDate(args.LastLogIndex, args.LastLogTerm) {
		rf.votedFor = args.CandidateId
		reply.VoteGranted = true
	}
	...
}
func (rf *Raft) isLogMoreUpToDate(index int, term int) bool {
	return term > rf.getLastLogTerm() || (term == rf.getLastLogTerm() && index >= rf.getLastLogIndex())
}
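A few concrete cases make the comparison easier to check. The sketch below restates the same rule as a free function (candidateIsUpToDate is a name invented for this example): the later last term wins, and with equal last terms the log that is at least as long wins.

```go
package main

import "fmt"

// candidateIsUpToDate reports whether a candidate whose last entry is
// (candIndex, candTerm) is at least as up-to-date as a voter whose last entry
// is (myIndex, myTerm).
func candidateIsUpToDate(candIndex, candTerm, myIndex, myTerm int) bool {
	if candTerm != myTerm {
		return candTerm > myTerm
	}
	return candIndex >= myIndex
}

func main() {
	fmt.Println(candidateIsUpToDate(3, 5, 10, 4)) // true: higher last term beats a longer log
	fmt.Println(candidateIsUpToDate(10, 4, 3, 5)) // false: longer log but older last term
	fmt.Println(candidateIsUpToDate(7, 4, 7, 4))  // true: identical logs count as up-to-date
	fmt.Println(candidateIsUpToDate(6, 4, 7, 4))  // false: same term, shorter log
}
```

Note that log length only matters as a tie-breaker; a short log ending in a newer term still wins the vote.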
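This corresponds to the rule from the paper quoted below: compare commitIndex with lastApplied, and if commitIndex is larger, apply log[lastApplied] to the state machine, which here means sending a message containing an ApplyMsg on rf.applyCh.
If commitIndex > lastApplied: increment lastApplied, apply log[lastApplied] to state machine (§5.3)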
func (rf *Raft) applyLog() {
	for rf.commitIndex > rf.lastApplied {
		rf.lastApplied += 1
		entry := rf.log[rf.lastApplied]
		msg := ApplyMsg{
			CommandValid: true,
			Command:      entry.Command,
			CommandIndex: entry.Index,
		}
		rf.applyCh <- msg
	}
}
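To make the loop concrete, here is a standalone sketch of what gets sent when lastApplied is 1 and commitIndex is 3. The pendingMsgs function is invented for this example; it separates building the messages from sending them only so that the result is easy to inspect, and the buffered channel stands in for the real applyCh.

```go
package main

import "fmt"

// ApplyMsg and LogEntry mirror the fields used in this post.
type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int
}

type LogEntry struct {
	Command interface{}
	Index   int
	Term    int
}

// pendingMsgs builds one ApplyMsg for each entry in
// log[lastApplied+1 .. commitIndex] and returns the new lastApplied.
func pendingMsgs(log []LogEntry, lastApplied, commitIndex int) ([]ApplyMsg, int) {
	var msgs []ApplyMsg
	for commitIndex > lastApplied {
		lastApplied++
		e := log[lastApplied]
		msgs = append(msgs, ApplyMsg{CommandValid: true, Command: e.Command, CommandIndex: e.Index})
	}
	return msgs, lastApplied
}

func main() {
	log := []LogEntry{
		{Index: 0, Term: 0}, // dummy entry
		{Command: "x", Index: 1, Term: 1},
		{Command: "y", Index: 2, Term: 1},
		{Command: "z", Index: 3, Term: 2},
	}
	applyCh := make(chan ApplyMsg, 8) // stand-in for rf.applyCh

	msgs, lastApplied := pendingMsgs(log, 1, 3)
	for _, m := range msgs {
		applyCh <- m
	}
	close(applyCh)

	for m := range applyCh {
		fmt.Println(m.CommandIndex, m.Command)
	}
	fmt.Println("lastApplied:", lastApplied)
	// Output:
	// 2 y
	// 3 z
	// lastApplied: 3
}
```

Entries are delivered strictly in index order, and entry 1 is not sent again because it was already applied.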