mit6.824 2022 lab2

MIT6.824 2022 Raft

  • Raft
    • leader election
    • log
    • persistence
    • log compaction
    • Overall testing
    • Issues found later
    • Reference code

Series index post: MIT6.824 2022

Raft

leader election

Whether you are reading or modifying Raft's mutable fields, you must hold the lock:

rf.mu.Lock()
if rf.state != Leader {
	rf.mu.Unlock()
	return
}
args := AppendEntriesArgs{Term: rf.currentTerm, LeaderId: rf.me}
rf.mu.Unlock()

This can be rewritten as:

rf.mu.Lock()
flag := (rf.state == Leader)
args := AppendEntriesArgs{Term: rf.currentTerm, LeaderId: rf.me}
rf.mu.Unlock()
if !flag {
	return
}

In 2A, implement as much functionality as possible rather than just passing the tests. I used the Chinese translation of the paper to get a rough idea of the Raft algorithm and wrote the code against the English version.
Notes:
In the RequestVote RPC, "at least as up-to-date" corresponds to:

Raft determines which of two logs is more up-to-date by comparing the index and term of the last entries in the logs. If the logs have last entries with different terms, then the log with the later term is more up-to-date. If the logs end with the same term, then whichever log is longer is more up-to-date.

Since it says "at least as", the equal case is included as well.
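Written out, the check is just a comparison of the terms and indexes of the two last entries; a minimal sketch (lastLogTerm/lastLogIndex stand for the voter's own last entry, matching the rule 2 snippet later in this post):

// candidate's log is at least as up-to-date as the voter's (§5.4.1)
upToDate := args.LastLogTerm > lastLogTerm ||
	(args.LastLogTerm == lastLogTerm && args.LastLogIndex >= lastLogIndex)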

votedFor:candidateId that received vote in current term (or null if none)
The candidate ID that received this server's vote in the current term (initially null).
This means that votedFor must be reset whenever the term changes.
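A minimal sketch of the resulting pattern, applied wherever a higher term shows up in an RPC request or reply (InvalidId is the -1 constant used in the reference code at the end of this post):

if args.Term > rf.currentTerm { // saw a higher term
	rf.currentTerm = args.Term
	rf.votedFor = InvalidId // new term: the old vote no longer counts
	rf.state = Follower
}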

If election timeout elapses without receiving AppendEntries RPC from current leader or granting vote to candidate: convert to candidate
If the election timer expires without the server having received an AppendEntries RPC from the current leader or having granted its vote to a candidate, it converts to candidate.
Note: do not overlook the "granting vote to candidate" part.
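In other words, the ticker should only start an election when neither event happened within the timeout; a sketch of the condition (lastHeartBeat/lastVote are timestamps updated by the two RPC handlers, as in the reference code at the end of this post):

timeout := time.Duration(rand.Intn(300)+300) * time.Millisecond // 300~600ms
time.Sleep(timeout)
rf.mu.Lock()
if rf.state != Leader &&
	time.Since(rf.lastHeartBeat) > timeout && // no AppendEntries from the current leader
	time.Since(rf.lastVote) > timeout { // and no vote granted to a candidate
	// become candidate and start an election ...
}
rf.mu.Unlock()
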
Reference:
MIT 6.824 Spring 2020 Lab2 Raft 实现笔记

log

Phase 2B did not take long to implement, but I spent a lot of time debugging. VS Code's built-in debugging turned out to be quite handy (the Go test plugin?).
Some details:
currentTerm: latest term server has seen (initialized to 0 on first boot, increases monotonically)
The latest term this server has seen, so whenever a larger term is observed, currentTerm must be set to it:
If RPC request or response contains term T > currentTerm: set currentTerm = T, convert to follower
votedFor:candidateId that received vote in current term (or null if none)
The candidate that received this server's vote in the current term, so votedFor must also be reset whenever the term changes.
log[]:log entries; each entry contains command for state machine, and term when entry was received by leader (first index is 1)
The log entries start at index 1; adding a dummy entry with term 0 at initialization makes the handling uniform (lastLogTerm, prevLogTerm).
commitIndex: the index of the highest log entry known to be committed; a new commitIndex can be derived by copying matchIndex and sorting it (see the sketch after the Students' Guide notes below).
AppendEntries RPC
rule 3: only when an existing entry conflicts with one in entries should that entry and everything after it be deleted; if there is no conflict, delete nothing. When comparing, use the smaller of the two maximum indexes to avoid going out of bounds (see the sketch after this list).
rule 5: "index of last new entry" means prevLogIndex+len(entries), not the local log's maximum index.
Note that an empty entries slice needs no special handling.
RequestVote RPC
rule 2 has two conditions:

	condition1 := (rf.votedFor == InvalidId) || (rf.votedFor == args.CandidateId)
	condition2 := (args.LastLogTerm > lastLogTerm) || ((args.LastLogTerm == lastLogTerm) && (args.LastLogIndex >= lastLogIndex))
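
For AppendEntries rules 3–5 above, a minimal sketch of the follower-side handling (it assumes prevLogIndex has already been verified to match, ignores the 2D snapshot offset, and uses the Min helper defined in the reference code at the end of this post):

lastNewEntry := args.PrevLogIndex + len(args.Entries) // rule 5: index of last new entry
limit := Min(lastNewEntry, len(rf.log)-1)             // compare only up to the shorter log
conflict := limit + 1
for i := args.PrevLogIndex + 1; i <= limit; i++ {
	if rf.log[i].Term != args.Entries[i-args.PrevLogIndex-1].Term {
		conflict = i
		break
	}
}
if conflict <= limit { // rule 3: truncate only when a real conflict exists
	rf.log = rf.log[:conflict]
}
if conflict-args.PrevLogIndex-1 < len(args.Entries) { // rule 4: append the missing suffix
	rf.log = append(rf.log, args.Entries[conflict-args.PrevLogIndex-1:]...)
}
if args.LeaderCommit > rf.commitIndex { // rule 5: cap by the last new entry, not the local max
	rf.commitIndex = Min(args.LeaderCommit, lastNewEntry)
}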

Some problems encountered:
Symptom: the nodes keep holding elections but never manage to elect a leader.
Cause: in 2A I never assigned lastLogIndex/lastLogTerm, so the RequestVote comparison never passed.

Symptom: a test reports PASS and then the program hits a runtime error, which cannot be reproduced by running that single test case on its own.
Cause: processing the reply without first checking the return value of the RPC call.

Intermittent problems:
After the obvious problems were fixed, a single run of go test -run 2B -race passed everything, but running it 100 times via a script occasionally produced a few failures. Problems that only show up with some probability are much more troublesome.

for i in {1..100}; do go test -run 2B -race; done
Test (2B): concurrent Start()s ...
--- FAIL: TestConcurrentStarts2B (31.27s)
    config.go:549: only 1 decided for index 6; wanted 3
Test (2B): RPC counts aren't too high ...
--- FAIL: TestCount2B (32.34s)
    config.go:549: only 2 decided for index 11; wanted 3

For this kind of issue you have to read through the test cases, add some print statements at suitable places (the print statements themselves may affect the outcome), run the tests again and again, and think about the cause.

Instead, I took a different route: too lazy to debug, I carefully reread the articles recommended at the start of lab 2 (previously I had only skimmed them because they were long):
Students’ Guide to Raft
Raft Q&A
Debugging by Pretty Printing
Instructors’ Guide to Raft
It turned out that several of the problems I had run into are mentioned in the Students' Guide to Raft, and now I remember those points very well.
Notes on the Students' Guide to Raft:
1 The paper mostly says must, not should. For example, it does not say the election timer should be reset whenever a server receives any RPC; it is reset only on "receiving AppendEntries RPC from current leader or granting vote to candidate".
2 Heartbeats must not be treated as a special kind of message: if a follower simply accepts a heartbeat, it implicitly tells the leader that its log matches up through prevLogIndex, which can then lead to incorrect commits.
3 Simply truncating the follower's log after prevLogIndex upon receiving a heartbeat is also wrong; the paper says "If an existing entry conflicts with a new one (same index but different terms), delete the existing entry and all that follow it", which is a conditional statement.
4 When the election timer expires, start the next election even if the previous one has not finished.
5 "Reply false" means return immediately.
6 In AppendEntries, having no entry at prevLogIndex also counts as a log mismatch.
7 Check that AppendEntries succeeds even when entries is empty, and handle indexes that run past the end of the local log.
8 Understand carefully what "last new entry" and "at least as up-to-date" mean.
9 When advancing commitIndex, remember the log[N].term == currentTerm requirement (see the sketch after this list).
10 nextIndex is an optimistic estimate while matchIndex is a conservative one, even though they are very often assigned as nextIndex = matchIndex + 1.
11 When an old RPC reply arrives, compare the current term with the term stored in the args; if they differ, return immediately without processing the reply.
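For note 9, combined with the matchIndex-sorting idea mentioned in the details above, a minimal sketch of the leader-side commit advance (it assumes the leader's own matchIndex is kept current, ignores the snapshot offset, and needs "sort" imported):

match := append([]int{}, rf.matchIndex...) // copy so sorting does not disturb the real slice
sort.Ints(match)
N := match[len(match)/2] // at least a majority of peers have replicated up to N
for ; N > rf.commitIndex; N-- {
	if rf.log[N].Term == rf.currentTerm { // only entries from the current term may be committed this way
		rf.commitIndex = N
		break
	}
}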

After reading the articles again I found four problems in my code:
1: Heartbeat replies were handled specially: only the returned term was checked against the current term, with no nextIndex/matchIndex updates and no retry.
2: Stale RPC replies were not handled.
3: matchIndex was not guaranteed to be monotonically non-decreasing.
4: A failed RPC call returned immediately instead of retrying (return rather than continue).

1 For heartbeats, simply reuse the log-agreement function triggered by Start; heartbeats can equally well be used to synchronize the servers' logs.
2 After checking whether the RPC succeeded and before processing the reply, compare the current term with the term in the args and return immediately if they differ (see the sketch after item 4).
3 Prevent matchIndex from moving backwards:

rf.matchIndex[server] = Max(rf.matchIndex[server], args.PrevLogIndex+len(args.Entries))
rf.nextIndex[server] = rf.matchIndex[server] + 1

4 My reading is that "retry" in the paper does not mean retry immediately, but retry when the corresponding timer next fires:
if a RequestVote RPC fails during an election, it is retried in the next election after the election timer expires;
if an AppendEntries RPC fails during agreement, it is retried when the heartbeat timer fires and the logs are synchronized again.
So I believe returning directly is the right choice; besides, writing continue would turn into an infinite loop, which is not what we want.
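A minimal sketch of the stale-reply guard from fix 2 (simplified: here the lock is taken only after the RPC returns, unlike the reference code, which unlocks and relocks inside the send wrapper):

ok := rf.sendAppendEntries(server, &args, &reply)
if !ok {
	return // RPC failed; the next heartbeat round will retry
}
rf.mu.Lock()
defer rf.mu.Unlock()
if args.Term != rf.currentTerm {
	return // the term changed while the RPC was in flight: this reply is stale
}
// safe to use reply from here on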

After running go test 1000 times, the two failures above no longer appeared, but a data race warning did:

Test (2B): leader backs up quickly over incorrect follower logs ...
==================
WARNING: DATA RACE
Write at 0x00c0005b0988 by goroutine 277:
  runtime.slicecopy()
      /usr/lib/go-1.13/src/runtime/slice.go:197 +0x0
  6.824/raft.(*Raft).AppendEntries()
      /root/mit6.824/6.824/src/raft/raft.go:291 +0x663
  runtime.call32()
      /usr/lib/go-1.13/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go-1.13/src/reflect/value.go:321 +0xd3
  6.824/labrpc.(*Service).dispatch()
      /root/mit6.824/6.824/src/labrpc/labrpc.go:496 +0x811
  6.824/labrpc.(*Server).dispatch()
      /root/mit6.824/6.824/src/labrpc/labrpc.go:420 +0x607
  6.824/labrpc.(*Network).processReq.func1()
      /root/mit6.824/6.824/src/labrpc/labrpc.go:240 +0x93

Previous read at 0x00c0005b0988 by goroutine 323:
  encoding/gob.encInt()
      /usr/lib/go-1.13/src/reflect/value.go:976 +0x1df
  encoding/gob.(*Encoder).encodeStruct()
      /usr/lib/go-1.13/src/encoding/gob/encode.go:328 +0x436
  encoding/gob.encOpFor.func4()
      /usr/lib/go-1.13/src/encoding/gob/encode.go:581 +0xf0
  encoding/gob.(*Encoder).encodeArray()
      /usr/lib/go-1.13/src/encoding/gob/encode.go:351 +0x26f
  encoding/gob.encOpFor.func1()
      /usr/lib/go-1.13/src/encoding/gob/encode.go:551 +0x1a3
  encoding/gob.(*Encoder).encodeStruct()
      /usr/lib/go-1.13/src/encoding/gob/encode.go:328 +0x436
  encoding/gob.(*Encoder).encode()
      /usr/lib/go-1.13/src/encoding/gob/encode.go:701 +0x1fe
  encoding/gob.(*Encoder).EncodeValue()
      /usr/lib/go-1.13/src/encoding/gob/encoder.go:251 +0x666
  encoding/gob.(*Encoder).Encode()
      /usr/lib/go-1.13/src/encoding/gob/encoder.go:176 +0x5b
  6.824/labgob.(*LabEncoder).Encode()
      /root/mit6.824/6.824/src/labgob/labgob.go:36 +0x7b
  6.824/labrpc.(*ClientEnd).Call()
      /root/mit6.824/6.824/src/labrpc/labrpc.go:93 +0x198
  6.824/raft.(*Raft).sendAppendEntries()
      /root/mit6.824/6.824/src/raft/raft.go:336 +0xd5
  6.824/raft.(*Raft).agreementTask.func1()
      /root/mit6.824/6.824/src/raft/raft.go:388 +0x562

Goroutine 277 (running) created at:
  6.824/labrpc.(*Network).processReq()
      /root/mit6.824/6.824/src/labrpc/labrpc.go:239 +0x174

Goroutine 323 (running) created at:
  6.824/raft.(*Raft).agreementTask()
      /root/mit6.824/6.824/src/raft/raft.go:376 +0xa3
==================
--- FAIL: TestBackup2B (16.85s)
    testing.go:853: race detected during execution of test

The problem is that when building the AppendEntriesArgs, Entries is a shallow copy (a sub-slice) of rf.log, and the args are read during the remote call. As long as this server stays leader its log is never modified, so no race occurs. But once the leader steps down its log can change: the AppendEntries handler modifies the log while Call reads the args without holding rf.mu, which triggers the race. The log slice therefore has to be deep-copied.

// wrong: Entries shares rf.log's backing array
args.Entries = rf.log[rf.nextIndex[server]:]
// correct: deep copy into a fresh array
args.Entries = append([]logEntry{}, rf.log[rf.nextIndex[server]:]...) 
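
The difference is easy to demonstrate in isolation: a plain sub-slice shares the original backing array, while appending into an empty slice copies the elements. A small self-contained example:

package main

import "fmt"

func main() {
	log := []int{1, 2, 3, 4}
	shallow := log[2:]                  // shares log's backing array
	deep := append([]int{}, log[2:]...) // copies into a fresh array
	log[2] = 99                         // simulate the log being rewritten later
	fmt.Println(shallow[0], deep[0])    // prints "99 3"
}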

go语言为什么空切片,nil切片可以继续使用?append()函数
You can simply set the iteration count in the bash script to 10000 and just hit Ctrl+Z (or Ctrl+C) whenever you no longer want it to run.

persistence

The 2C code is fairly simple: implement persist/readPersist following the provided example, then call persist after any of the three persistent state variables changes. For the AppendEntries nextIndex optimization, follow the method given in the guidance:

The accelerated log backtracking optimization is very underspecified, probably because the authors do not see it as being necessary for most deployments. It is not clear from the text exactly how the conflicting index and term sent back from the client should be used by the leader to determine what nextIndex to use. We believe the protocol the authors probably want you to follow is:
If a follower does not have prevLogIndex in its log, it should return with conflictIndex = len(log) and conflictTerm = None.
If a follower does have prevLogIndex in its log, but the term does not match, it should return conflictTerm = log[prevLogIndex].Term, and then search its log for the first index whose entry has term equal to conflictTerm.
Upon receiving a conflict response, the leader should first search its log for conflictTerm. If it finds an entry in its log with that term, it should set nextIndex to be the one beyond the index of the last entry in that term in its log.
If it does not find an entry with that term, it should set nextIndex = conflictIndex.

I did not quite get "the one beyond the index", so I simply set nextIndex to the last index whose entry has that term (see the sketch below); nextIndex only affects efficiency, not really correctness. If 2B is solid, 2C should be fine too; if 2C has problems, run the 2B tests a few thousand times to double-check.
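A minimal sketch of the leader-side handling of the conflict reply, following the guide's protocol except that nextIndex is set to the last entry with conflictTerm rather than one beyond it (snapshot offset omitted; NoneTerm is the -1 constant from the reference code):

if reply.ConflictTerm == NoneTerm {
	rf.nextIndex[server] = reply.ConflictIndex // follower's log is too short
} else {
	rf.nextIndex[server] = reply.ConflictIndex // fallback: conflictTerm not found in the leader's log
	for i := args.PrevLogIndex; i > 0; i-- {
		if rf.log[i].Term == reply.ConflictTerm {
			rf.nextIndex[server] = i // last leader entry with conflictTerm
			break
		}
	}
}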

Problem:

Test (2C): basic persistence ...
  ... Passed --   4.1  3  216   46172    6
Test (2C): more persistence ...
2022/10/30 06:44:20 next index retry. cur term:5 me:3 server:1 value:2   reply:term:-1  index:2
2022/10/30 06:44:20 next index retry. cur term:5 me:3 server:2 value:2   reply:term:-1  index:2
2022/10/30 06:44:22 next index retry. cur term:9 me:1 server:4 value:5   reply:term:-1  index:5
2022/10/30 06:44:22 next index retry. cur term:9 me:1 server:0 value:5   reply:term:-1  index:5
2022/10/30 06:44:24 next index retry. cur term:13 me:4 server:3 value:8   reply:term:-1  index:8
2022/10/30 06:44:24 next index retry. cur term:13 me:4 server:2 value:8   reply:term:-1  index:8
2022/10/30 06:44:29 next index retry. cur term:16 me:2 server:1 value:11   reply:term:-1  index:11
2022/10/30 06:44:29 next index retry. cur term:16 me:2 server:0 value:11   reply:term:-1  index:11
2022/10/30 06:44:31 next index retry. cur term:20 me:0 server:4 value:14   reply:term:-1  index:14
2022/10/30 06:44:31 next index retry. cur term:20 me:0 server:3 value:14   reply:term:-1  index:14
  ... Passed --  16.0  5  982  207944   16
Test (2C): partitioned leader and one follower crash, leader restarts ...
2022/10/30 06:44:35 next index retry. cur term:3 me:2 server:1 value:2   reply:term:-1  index:2
  ... Passed --   1.9  3   51   12120    4
Test (2C): Figure 8 ...
2022/10/30 06:44:37 next index retry. cur term:4 me:0 server:1 value:3   reply:term:-1  index:3
2022/10/30 06:44:41 next index retry. cur term:11 me:2 server:1 value:9   reply:term:-1  index:9
2022/10/30 06:44:41 next index retry. cur term:12 me:1 server:4 value:4   reply:term:-1  index:4
2022/10/30 06:44:41 next index retry. cur term:12 me:1 server:3 value:5   reply:term:-1  index:5
2022/10/30 06:44:43 next index retry. cur term:13 me:3 server:2 value:9   reply:term:8  index:9
2022/10/30 06:44:43 next index retry. cur term:14 me:4 server:1 value:10   reply:term:-1  index:10
2022/10/30 06:44:44 next index retry. cur term:15 me:2 server:3 value:11   reply:term:-1  index:11
2022/10/30 06:44:45 next index retry. cur term:16 me:3 server:0 value:10   reply:term:-1  index:10
2022/10/30 06:44:45 next index retry. cur term:16 me:3 server:0 value:9   reply:term:8  index:9
2022/10/30 06:44:46 next index retry. cur term:17 me:0 server:2 value:13   reply:term:-1  index:13
2022/10/30 06:44:47 next index retry. cur term:19 me:1 server:4 value:12   reply:term:-1  index:12
2022/10/30 06:44:48 next index retry. cur term:20 me:2 server:3 value:14   reply:term:-1  index:14
2022/10/30 06:44:49 next index retry. cur term:22 me:4 server:0 value:16   reply:term:-1  index:16
2022/10/30 06:44:50 next index retry. cur term:24 me:3 server:2 value:19   reply:term:-1  index:19
2022/10/30 06:44:52 next index retry. cur term:27 me:3 server:0 value:23   reply:term:-1  index:23
2022/10/30 06:44:52 next index retry. cur term:27 me:3 server:2 value:23   reply:term:-1  index:23
2022/10/30 06:44:53 next index retry. cur term:29 me:3 server:2 value:25   reply:term:-1  index:25
2022/10/30 06:44:53 next index retry. cur term:29 me:3 server:0 value:25   reply:term:-1  index:25
2022/10/30 06:44:54 next index retry. cur term:30 me:0 server:1 value:17   reply:term:-1  index:17
2022/10/30 06:44:54 next index retry. cur term:31 me:2 server:4 value:21   reply:term:-1  index:21
2022/10/30 06:44:55 next index retry. cur term:34 me:4 server:2 value:29   reply:term:-1  index:29
2022/10/30 06:44:55 next index retry. cur term:34 me:4 server:1 value:29   reply:term:-1  index:29
2022/10/30 06:44:57 next index retry. cur term:36 me:2 server:0 value:28   reply:term:-1  index:28
2022/10/30 06:44:58 next index retry. cur term:37 me:1 server:3 value:27   reply:term:-1  index:27
2022/10/30 06:44:58 next index retry. cur term:39 me:3 server:0 value:33   reply:term:-1  index:33
2022/10/30 06:44:58 next index retry. cur term:39 me:3 server:4 value:32   reply:term:-1  index:32
2022/10/30 06:44:59 next index retry. cur term:40 me:4 server:2 value:33   reply:term:-1  index:33
2022/10/30 06:45:00 next index retry. cur term:41 me:2 server:1 value:34   reply:term:-1  index:34
2022/10/30 06:45:01 next index retry. cur term:42 me:0 server:3 value:35   reply:term:-1  index:35
2022/10/30 06:45:02 next index retry. cur term:43 me:1 server:4 value:35   reply:term:40  index:35
2022/10/30 06:45:02 next index retry. cur term:43 me:1 server:3 value:35   reply:term:-1  index:35
race: limit on 8128 simultaneously alive goroutines is exceeded, dying

The number of goroutines exceeded the limit. Following the comment on the Kill function, add a !rf.killed() check to every periodic task so the goroutines exit once the test finishes.

//
// the tester doesn't halt goroutines created by Raft after each test,
// but it does call the Kill() method. your code can use killed() to
// check whether Kill() has been called. the use of atomic avoids the
// need for a lock.
//
// the issue is that long-running goroutines use memory and may chew
// up CPU time, perhaps causing later tests to fail and generating
// confusing debug output. any goroutine with a long-running loop
// should call killed() to check whether it should stop.
//
func (rf *Raft) Kill() {
	atomic.StoreInt32(&rf.dead, 1)
	// Your code here, if desired.
}
// send heartbeat messages to all peers
func (rf *Raft) heartBeatTask() {
	for !rf.killed() {
		rf.mu.Lock()
		flag := (rf.state == Leader)
		rf.mu.Unlock()
		if !flag {
			return
		}
		go rf.agreementTask()
		time.Sleep(HeartBeatInterval)
	}
}
// deliver committed commands to the service (buggy, see below)
func (rf *Raft) Submitter() {
	for !rf.killed() {
		rf.mu.Lock()
		if rf.lastApplied < rf.commitIndex {
			rf.lastApplied++
			msg := ApplyMsg{CommandValid: true, Command: rf.log[rf.lastApplied].Command, CommandIndex: rf.lastApplied}
			rf.applyCh <- msg
		}
		rf.mu.Unlock()
		time.Sleep(10 * time.Millisecond)
	}
}

log compaction

Implementing 2D is mostly a matter of details: checking that every access to the log, and to its length, has been rewritten relative to some starting offset X (see the helper sketch below):

access the entry at position index
rf.log[index]    ->  rf.log[index-X]
log length / nextIndex
len(rf.log)      ->  X+len(rf.log)
maximum local log index
len(rf.log) - 1  ->  X+len(rf.log)-1
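
Rather than repeating the offset arithmetic everywhere, it may help to wrap it in a couple of helpers; a sketch with hypothetical names logAt and lastLogIndex, and X = lastIncludedIndex as described below:

// entry at absolute index i (slot 0 holds the lastIncluded entry)
func (rf *Raft) logAt(i int) logEntry {
	return rf.log[i-rf.lastIncludedIndex]
}

// absolute index of the last local entry
func (rf *Raft) lastLogIndex() int {
	return rf.lastIncludedIndex + len(rf.log) - 1
}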

In my 2B implementation I had put a dummy entry with term 0 at log[0] to make the handling uniform. While changing the log's starting point for 2D I kept running into problems, until it occurred to me that slot 0 can simply hold the snapshot's last included entry, i.e. X = lastIncludedIndex, which keeps everything consistent. In other words, when the snapshot is empty the entry at slot 0 is the dummy entry added at initialization; otherwise it is the real lastIncluded entry of the snapshot (its Command field does not matter).
The constraints my 2D implementation follows are:

  1. The log is never empty.
  2. Terms of follower log entries at or before lastIncludedIndex/commitIndex are assumed to match the leader's.
  3. In AppendEntries, ConflictIndex must be greater than lastIncludedIndex.
  4. When the leader handles a retry, the "find same term" search must stay above lastIncludedIndex.
  5. Snapshots and log entries are applied from the same goroutine.

The first one is easy to understand: no matter what, the log must always contain the lastIncludedIndex entry. So even though step 7 of InstallSnapshot says "discard the entire log", an entry corresponding to the snapshot's lastIncludedIndex must still be added, and in the Snapshot function one entry fewer is removed so that slot 0 always holds the lastIncludedIndex entry.

The second: the leader has all committed entries, and committed entries at the same index have identical content, so there is no need to check their terms (this can be used for the prevLogIndex match in AppendEntries step 2 and the "different term" check in step 3). Keep this in mind at all times.

Leader Completeness: if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms.

The third and fourth describe roughly the same thing: the term at log slot 0 should ideally only be used for LastLogTerm in the RequestVote RPC; prevLogTerm can be assumed to match by rule two, and nextIndex-related operations never touch slot 0.

The fifth is related to the hint below. At first I did not implement CondInstallSnapshot, but I did write a separate function for delivering the snapshot, sending it to the service directly when a snapshot arrived from another server or on re-Make. The tests before the crash tests passed, but the crash tests hung immediately: other servers could never reach the crashed host and kept reporting failed RPC calls.

Previously, this lab recommended that you implement a function called CondInstallSnapshot to avoid the requirement that snapshots and log entries sent on applyCh are coordinated. This vestigal API interface remains, but you are discouraged from implementing it: instead, we suggest that you simply have it return true.

// the crash-test helper in test_test.go
func snapcommon(t *testing.T, name string, disconnect bool, reliable bool, crash bool) {
	iters := 30
	servers := 3
	cfg := make_config(t, servers, !reliable, true)
	defer cfg.cleanup()

	cfg.begin(name)

	cfg.one(rand.Int(), servers, true)
	leader1 := cfg.checkOneLeader()

	for i := 0; i < iters; i++ {
		victim := (leader1 + 1) % servers
		sender := leader1
		if i%3 == 1 {
			sender = (leader1 + 1) % servers
			victim = leader1
		}
		DPrintf("%v time,vicitm is %v\n", i, victim)
		if disconnect {
			cfg.disconnect(victim)
			cfg.one(rand.Int(), servers-1, true)
		}
		if crash {
			cfg.crash1(victim)
			DPrintf("%v crash", victim)
			cfg.one(rand.Int(), servers-1, true)
		}

		// perhaps send enough to get a snapshot
		nn := (SnapShotInterval / 2) + (rand.Int() % SnapShotInterval)
		for i := 0; i < nn; i++ {
			cfg.rafts[sender].Start(rand.Int())
		}

		// let applier threads catch up with the Start()'s
		if disconnect == false && crash == false {
			// make sure all followers have caught up, so that
			// an InstallSnapshot RPC isn't required for
			// TestSnapshotBasic2D().
			cfg.one(rand.Int(), servers, true)
		} else {
			cfg.one(rand.Int(), servers-1, true)
		}

		if cfg.LogSize() >= MAXLOGSIZE {
			cfg.t.Fatalf("Log size too large")
		}
		if disconnect {
			// reconnect a follower, who may be behind and
			// needs to receive a snapshot to catch up.
			cfg.connect(victim)
			cfg.one(rand.Int(), servers, true)
			leader1 = cfg.checkOneLeader()
		}
		if crash {
			cfg.start1(victim, cfg.applierSnap)
			DPrintf("%v restart\n", victim)
			cfg.connect(victim)
			cfg.one(rand.Int(), servers, true)
			leader1 = cfg.checkOneLeader()
		}
	}
	cfg.end()
}

After adding prints in test_test.go, I found the victim server stuck between crash and restart; inspecting the code showed it was blocked in Make while sending the snapshot to the service. So I changed the implementation: on receiving an InstallSnapshot RPC or on restart, only commitIndex is updated and nothing is delivered to the service; lastApplied is modified only in a single goroutine.

Problems encountered:
Issue 1: snapshot encoding error
When I first implemented this, I persisted lastIncludedIndex and lastIncludedTerm into the Persister's snapshot field, and the crash tests immediately failed with snapshot decode error:

Test (2D): install snapshots (crash) ...
2022/11/19 05:26:22 0 become leader
2022/11/19 05:26:24 2 become leader
2022/11/19 05:26:24 snapshot decode error
exit status 1

Reading the relevant code in config.go:

// where the tester generates a snapshot
if (m.CommandIndex+1)%SnapShotInterval == 0 {
	w := new(bytes.Buffer)
	e := labgob.NewEncoder(w)
	e.Encode(m.CommandIndex)
	var xlog []interface{}
	for j := 0; j <= m.CommandIndex; j++ {
		xlog = append(xlog, cfg.logs[i][j])
	}
	e.Encode(xlog)
	rf.Snapshot(m.CommandIndex, w.Bytes())
}
// where the tester reads the snapshot back
if cfg.saved[i] != nil {
	cfg.saved[i] = cfg.saved[i].Copy()

	snapshot := cfg.saved[i].ReadSnapshot()
	if snapshot != nil && len(snapshot) > 0 {
		// mimic KV server and process snapshot now.
		// ideally Raft should send it up on applyCh...
		err := cfg.ingestSnap(i, snapshot, -1)
		if err != "" {
			cfg.t.Fatal(err)
		}
	}
}

func (cfg *config) ingestSnap(i int, snapshot []byte, index int) string {
	if snapshot == nil {
		log.Fatalf("nil snapshot")
		return "nil snapshot"
	}
	r := bytes.NewBuffer(snapshot)
	d := labgob.NewDecoder(r)
	var lastIncludedIndex int
	var xlog []interface{}
	if d.Decode(&lastIncludedIndex) != nil ||
		d.Decode(&xlog) != nil {
		log.Panic()
		log.Fatalf("snapshot decode error")
		return "snapshot Decode() error"
	}
	if index != -1 && index != lastIncludedIndex {
		err := fmt.Sprintf("server %v snapshot doesn't match m.SnapshotIndex", i)
		return err
	}
	cfg.logs[i] = map[int]interface{}{}
	for j := 0; j < len(xlog); j++ {
		cfg.logs[i][j] = xlog[j]
	}
	cfg.lastApplied[i] = lastIncludedIndex
	return ""
}

Clearly the snapshot field should only hold the snapshot produced by the upper layer, with no extra state tacked on; all Raft state belongs in the raftstate field. Note also that the tester already encodes lastIncludedIndex inside the snapshot itself.

Issue 2: misunderstanding the InstallSnapshot RPC

If existing log entry has same index and term as snapshot’s last included entry, retain log entries following it and reply

I misread step 6 of the InstallSnapshot RPC as checking whether an identical snapshot already exists and went comparing lastIncludedIndex and lastIncludedTerm. In fact it only checks whether the local entry at that index has the same term, and everything at or before commitIndex is assumed to match.

logIndex := args.LastIncludedIndex - rf.lastIncludedIndex
if logIndex < 0 {
	WPrintf("server snapshot is newer. server id:%v lastIncludedIndex:%v  log len:%v\n", rf.me, rf.lastIncludedIndex, len(rf.log))
	return
}
if logIndex < len(rf.log) && rf.log[logIndex].Term == args.LastIncludedTerm { // step 6
	DPrintf("same log entry. server id:%v lastIncludedIndex:%v  log len:%v\n", rf.me, rf.lastIncludedIndex, len(rf.log))
	return
}

The first branch never printed anything.

DPrintf prints ordinary information
WPrintf prints abnormal situations
log.Panicf reports logic errors
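DPrintf comes from the lab's util.go; WPrintf is my own wrapper. A rough sketch of how they might look (the exact Debug flag and formatting are assumptions):

const Debug = false // set to true for verbose output

func DPrintf(format string, a ...interface{}) {
	if Debug {
		log.Printf(format, a...)
	}
}

// WPrintf is always printed: it flags situations that should be rare but are not fatal.
func WPrintf(format string, a ...interface{}) {
	log.Printf("WARN: "+format, a...)
}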

Remember to commit every now and then so you can see how the code has evolved.

Issue 3: changes with no observable effect
1 Garbage collection

Raft must discard old log entries in a way that allows the Go garbage collector to free and re-use the memory; this requires that there be no reachable references (pointers) to the discarded log entries.

Replace the statement as follows:

// rf.log = rf.log[index-rf.lastIncludedIndex:]
rf.log = append([]logEntry{}, rf.log[index-rf.lastIncludedIndex:]...)

Many blogs use copy for the deep copy so that the slice points to a different underlying array, but I find append a little more convenient.
切片(slice)性能及陷阱

2 Reconciling the persisted log with the snapshot's lastIncludedIndex

If, when the server comes back up, it reads the updated snapshot, but the outdated log, it may end up applying some log entries that are already contained within the snapshot. This happens since the commitIndex and lastApplied are not persisted, and so Raft doesn’t know that those log entries have already been applied. The fix for this is to introduce a piece of persistent state to Raft that records what “real” index the first entry in Raft’s persisted log corresponds to. This can then be compared to the loaded snapshot’s lastIncludedIndex to determine what elements at the head of the log to discard.

Students’ Guide to Raft
Compare the lastIncludedIndex encoded in the snapshot with the one in the state, and drop some log entries accordingly:

func (rf *Raft) compareStateAndSnapshot() {
	if rf.snapshot == nil || len(rf.snapshot) < 1 {
		return
	}
	r := bytes.NewBuffer(rf.snapshot)
	d := labgob.NewDecoder(r)
	var lastIncludedIndex int
	if d.Decode(&lastIncludedIndex) != nil {
		log.Panicf("compareStateAndSnapshot: snapshot decode error")
	}
	if rf.lastIncludedIndex != lastIncludedIndex {
		WPrintf("snapshot lastIncludedIndex is different. snapshot:%v  state:%v\n", lastIncludedIndex, rf.lastIncludedIndex)
		rf.log = append([]logEntry{}, rf.log[lastIncludedIndex-rf.lastIncludedIndex:]...)
		rf.lastIncludedIndex = lastIncludedIndex
		rf.lastIncludedTerm = rf.log[0].Term // term of log slot 0
		rf.commitIndex = rf.lastIncludedIndex
	}
}

Not useful in practice: the WPrintf statement never fired.

Since 3B encodes additional data into the snapshot, I no longer do this comparison.
Issue 4: a change that may or may not have helped
After 2D passed, I ran the full test suite and several times hit panic: test timed out after 10m0s in various 2D tests. Looking at the printed goroutine stacks, many goroutines were waiting for the lock and the apply goroutine was blocked on the channel send (in the implementation at the time it held the lock there). I suspected the service was not consuming ApplyMsg, so the server could never make progress, and changed the apply goroutine to not hold the lock while sending:

// deliver committed entries to the service (still has a problem, see below)
func (rf *Raft) applyTask() {
	for !rf.killed() {
		var msg ApplyMsg
		sendMsg := false
		rf.mu.Lock()
		if rf.lastApplied < rf.commitIndex {
			sendMsg = true
			if rf.lastApplied < rf.lastIncludedIndex {
				msg = ApplyMsg{CommandValid: false, SnapshotValid: true, Snapshot: rf.snapshot, SnapshotIndex: rf.lastIncludedIndex, SnapshotTerm: rf.lastIncludedTerm}
				DPrintf("applySnapshot. server id:%v commitIndex:%v  lastApplied:%v  lastIncludedIndex:%v\n", rf.me, rf.commitIndex, rf.lastApplied, rf.lastIncludedIndex)
				rf.lastApplied = rf.lastIncludedIndex
			} else {
				rf.lastApplied++
				msg = ApplyMsg{CommandValid: true, Command: rf.log[rf.lastApplied-rf.lastIncludedIndex].Command, CommandIndex: rf.lastApplied}
			}
		}
		rf.mu.Unlock()
		if sendMsg {
			rf.applyCh <- msg
		}
		time.Sleep(10 * time.Millisecond)
	}
}

Whether this really fixed the problem I do not know, but the issue never showed up again in later tests.
The most frustrating part of this lab is that you never know for sure whether a problem is truly fixed. In 2B some failures only showed up after more than a thousand runs, which is almost impossible to debug directly; all you can do is reread the Raft paper and the Students' Guide, polish the code, keep checking key variables, and print out abnormal situations.

Overall testing

After phases A through D are basically done, reread the related articles, go through the code once more, write some comments, and run the full test suite for a day; if nothing fails, consider lab 2 complete (a limited correctness guarantee). Related reading:
6.824 Lab 2: Raft
Students’ Guide to Raft
Raft Q&A
Raft Locking Advice
Raft Structure Advice
Debugging by Pretty Printing
Lab guidance

Some excerpts worth remembering:

Rule 1: Whenever you have data that more than one goroutine uses, and at least one goroutine might modify the data, the goroutines should use locks to prevent simultaneous use of the data.

Rule 2: Whenever code makes a sequence of modifications to shared data, and other goroutines might malfunction if they looked at the data midway through the sequence, you should use a lock around the whole sequence.
Rule 3: Whenever code does a sequence of reads of shared data (or reads and writes), and would malfunction if another goroutine modified the data midway through the sequence, you should use a lock around the whole sequence.
Rule 4: It’s usually a bad idea to hold a lock while doing anything that might wait: reading a Go channel, sending on a channel, waiting for a timer, calling time.Sleep(), or sending an RPC (and waiting for the reply).
Rule 5: Be careful about assumptions across a drop and re-acquire of a lock. One place this can arise is when avoiding waiting with locks held.

commitIndex is volatile because Raft can figure out a correct value for it after a reboot using just the persistent state. Once a leader successfully gets a new log entry committed, it knows everything before that point is also committed. A follower that crashes and comes back up will be told about the right commitIndex whenever the current leader sends it an AppendEntries RPC.

lastApplied starts at zero after a reboot because the Figure 2 design assumes the service (e.g., a key/value database) doesn’t keep any persistent state. Thus its state needs to be completely recreated by replaying all log entries. If the service does keep persistent state, it is expected to persistently remember how far in the log it has executed, and to ignore entries before that point. Either way it’s safe to start with lastApplied = 0 after a reboot.


Instead, the best approach is usually to work backwards and narrow down the size of phase 2 until it is as small as possible, so that the location of the fault is readily apparent. This is done by expanding the instrumentation of your code to surface errors sooner, and thereby spend less time in phase 2. This generally involves adding additional debugging statements and/or assertions to your code.

When possible, consider writing your code to “fail loudly”. Instead of trying to tolerate unexpected states, try to explicitly detect states that should never be allowed to happen, and immediately report these errors. Consider even immediately calling the Go ‘panic’ function in these cases to fail especially loudly. See also the Wikipedia page on Offensive programming techniques. Remember that the longer you allow errors to remain latent, the longer it will take to narrow down the true underlying fault.

When you’re failing a test, and it’s not obvious why, it’s usually worth taking the time to understand what the test is actually doing, and which part of the test is observing the problem. It can be helpful to add print statements to the test code so that you know when events are happening.

Check that persist is called whenever any persistent state changes, and SaveStateAndSnapshot whenever the snapshot changes.
On every server, the log up to the commit index is identical and can no longer change.

Problem encountered:
In 2D, to fix the panic: test timed out after 10m0s issue, the lock is no longer held while sending messages to the service:

// deliver committed entries to the service
func (rf *Raft) applyTask() {
	for !rf.killed() {
		var msg ApplyMsg
		sendMsg := false
		rf.mu.Lock()
		if rf.lastApplied < rf.commitIndex {
			sendMsg = true
			if rf.lastApplied < rf.lastIncludedIndex {
				msg = ApplyMsg{CommandValid: false, SnapshotValid: true, Snapshot: rf.snapshot, SnapshotIndex: rf.lastIncludedIndex, SnapshotTerm: rf.lastIncludedTerm}
				DPrintf("applySnapshot. server id:%v commitIndex:%v  lastApplied:%v  lastIncludedIndex:%v\n", rf.me, rf.commitIndex, rf.lastApplied, rf.lastIncludedIndex)
				rf.lastApplied = rf.lastIncludedIndex
			} else {
				rf.lastApplied++
				msg = ApplyMsg{CommandValid: true, Command: rf.log[rf.lastApplied-rf.lastIncludedIndex].Command, CommandIndex: rf.lastApplied}
			}
		}
		rf.mu.Unlock()
		if sendMsg {
			rf.applyCh <- msg
		}
		time.Sleep(10 * time.Millisecond)
	}
}

Then, while running the whole test suite, TestReliableChurn2C occasionally reported the following error:

config.go:628: one(6739062427422661052) failed to reach agreement

Adding a log.Panicf in front of the corresponding statement shows the call stack:

goroutine 19 [running]:
testing.tRunner.func1(0xc0000e6100)
	/usr/lib/go-1.13/src/testing/testing.go:874 +0x3a3
panic(0x5b28c0, 0xc00031ec50)
	/usr/lib/go-1.13/src/runtime/panic.go:679 +0x1b2
log.Panicf(0x604122, 0x21, 0xc000127d70, 0x1, 0x1)
	/usr/lib/go-1.13/src/log/log.go:345 +0xc0
6.824/raft.(*config).one(0xc0000e2140, 0x5b1d40, 0xc0002c7188, 0x5, 0x301, 0xc0009f2000)
	/root/mit6.824/6.824/src/raft/config.go:628 +0x73b
6.824/raft.internalChurn(0xc0000e6100, 0xbf8db3b4c200)
	/root/mit6.824/6.824/src/raft/test_test.go:1079 +0xace
6.824/raft.TestReliableChurn2C(0xc0000e6100)
	/root/mit6.824/6.824/src/raft/test_test.go:1107 +0x30
testing.tRunner(0xc0000e6100, 0x609aa8)
	/usr/lib/go-1.13/src/testing/testing.go:909 +0xc9
created by testing.(*T).Run
	/usr/lib/go-1.13/src/testing/testing.go:960 +0x350
exit status 2
// the test helper that reaches agreement on one log entry
// do a complete agreement.
// it might choose the wrong leader initially,
// and have to re-submit after giving up.
// entirely gives up after about 10 seconds.
// indirectly checks that the servers agree on the
// same value, since nCommitted() checks this,
// as do the threads that read from applyCh.
// returns index.
// if retry==true, may submit the command multiple
// times, in case a leader fails just after Start().
// if retry==false, calls Start() only once, in order
// to simplify the early Lab 2B tests.
func (cfg *config) one(cmd interface{}, expectedServers int, retry bool) int {
	t0 := time.Now()
	starts := 0
	for time.Since(t0).Seconds() < 10 && cfg.checkFinished() == false {
		// try all the servers, maybe one is the leader.
		index := -1
		for si := 0; si < cfg.n; si++ {
			starts = (starts + 1) % cfg.n
			var rf *Raft
			cfg.mu.Lock()
			if cfg.connected[starts] {
				rf = cfg.rafts[starts]
			}
			cfg.mu.Unlock()
			if rf != nil {
				index1, _, ok := rf.Start(cmd)
				if ok {
					index = index1
					break
				}
			}
		}
		if index != -1 {
			// somebody claimed to be the leader and to have
			// submitted our command; wait a while for agreement.
			t1 := time.Now()
			for time.Since(t1).Seconds() < 2 {
				nd, cmd1 := cfg.nCommitted(index)
				if nd > 0 && nd >= expectedServers {
					// committed
					if cmd1 == cmd {
						// and it was the command we submitted.
						return index
					}
				}
				time.Sleep(20 * time.Millisecond)
			}
			if retry == false {
				cfg.t.Fatalf("one(%v) failed to reach agreement", cmd)
			}
		} else {
			time.Sleep(50 * time.Millisecond)
		}
	}

	if cfg.checkFinished() == false {
		for si := 0; si < cfg.n; si++ {
			starts = (starts + 1) % cfg.n
			var rf *Raft
			cfg.mu.Lock()
			if cfg.connected[starts] {
				rf = cfg.rafts[starts]
			}
			cfg.mu.Unlock()
			if rf != nil {
				rf.mu.Lock()
				DPrintf("%+v", rf)
				rf.mu.Unlock()
			}
		}
		log.Panicf("one(%v) failed to reach agreement", cmd)
		cfg.t.Fatalf("one(%v) failed to reach agreement", cmd)
	}
	return -1
}
func internalChurn(t *testing.T, unreliable bool) {

	servers := 5
	cfg := make_config(t, servers, unreliable, false)
	defer cfg.cleanup()

	if unreliable {
		cfg.begin("Test (2C): unreliable churn")
	} else {
		cfg.begin("Test (2C): churn")
	}

	stop := int32(0)

	// create concurrent clients
	cfn := func(me int, ch chan []int) {
		var ret []int
		ret = nil
		defer func() { ch <- ret }()
		values := []int{}
		for atomic.LoadInt32(&stop) == 0 {
			x := rand.Int()
			index := -1
			ok := false
			for i := 0; i < servers; i++ {
				// try them all, maybe one of them is a leader
				cfg.mu.Lock()
				rf := cfg.rafts[i]
				cfg.mu.Unlock()
				if rf != nil {
					index1, _, ok1 := rf.Start(x)
					if ok1 {
						ok = ok1
						index = index1
					}
				}
			}
			if ok {
				// maybe leader will commit our value, maybe not.
				// but don't wait forever.
				for _, to := range []int{10, 20, 50, 100, 200} {
					nd, cmd := cfg.nCommitted(index)
					if nd > 0 {
						if xx, ok := cmd.(int); ok {
							if xx == x {
								values = append(values, x)
							}
						} else {
							cfg.t.Fatalf("wrong command type")
						}
						break
					}
					time.Sleep(time.Duration(to) * time.Millisecond)
				}
			} else {
				time.Sleep(time.Duration(79+me*17) * time.Millisecond)
			}
		}
		ret = values
	}

	ncli := 3
	cha := []chan []int{}
	for i := 0; i < ncli; i++ {
		cha = append(cha, make(chan []int))
		go cfn(i, cha[i])
	}

	for iters := 0; iters < 20; iters++ {
		if (rand.Int() % 1000) < 200 {
			i := rand.Int() % servers
			cfg.disconnect(i)
		}

		if (rand.Int() % 1000) < 500 {
			i := rand.Int() % servers
			if cfg.rafts[i] == nil {
				cfg.start1(i, cfg.applier)
			}
			cfg.connect(i)
		}

		if (rand.Int() % 1000) < 200 {
			i := rand.Int() % servers
			if cfg.rafts[i] != nil {
				cfg.crash1(i)
			}
		}

		// Make crash/restart infrequent enough that the peers can often
		// keep up, but not so infrequent that everything has settled
		// down from one change to the next. Pick a value smaller than
		// the election timeout, but not hugely smaller.
		time.Sleep((RaftElectionTimeout * 7) / 10)
	}

	time.Sleep(RaftElectionTimeout)
	cfg.setunreliable(false)
	for i := 0; i < servers; i++ {
		if cfg.rafts[i] == nil {
			cfg.start1(i, cfg.applier)
		}
		cfg.connect(i)
	}

	atomic.StoreInt32(&stop, 1)

	values := []int{}
	for i := 0; i < ncli; i++ {
		vv := <-cha[i]
		if vv == nil {
			t.Fatal("client failed")
		}
		values = append(values, vv...)
	}

	time.Sleep(RaftElectionTimeout)

	lastIndex := cfg.one(rand.Int(), servers, true)

The test was stuck at the lastIndex line above, whose expectedServers argument is servers, i.e. every server must commit the command.
The DPrintf("%+v", rf) output printed just before the failure shows:

commitIndex:1153 lastApplied:996 nextIndex:[] matchIndex:[] state:0 leaderId:2
commitIndex:1153 lastApplied:1153 nextIndex:[] matchIndex:[] state:0 leaderId:2
commitIndex:1153 lastApplied:1153 nextIndex:[] matchIndex:[] state:0 leaderId:2
commitIndex:1153 lastApplied:1153 nextIndex:[] matchIndex:[] state:0 leaderId:2
commitIndex:1153 lastApplied:1153 nextIndex:[1154 1154 1154 1154 1154] matchIndex:[1153 1153 1153 1153 1153] state:2 leaderId:2

In fact one server's commitIndex had already reached 1153; it just had not delivered everything to the service yet (its lastApplied was still 996). So applyTask was tweaked slightly: instead of sleeping after every send, it immediately tries again after a successful send and only sleeps when there was nothing to deliver.

// deliver committed entries to the service
func (rf *Raft) applyTask() {
	for !rf.killed() {
		var msg ApplyMsg
		sendMsg := false
		rf.mu.Lock()
		if rf.lastApplied < rf.commitIndex {
			sendMsg = true
			if rf.lastApplied < rf.lastIncludedIndex { // deliver the snapshot
				rf.lastApplied = rf.lastIncludedIndex
				msg = ApplyMsg{CommandValid: false, SnapshotValid: true, Snapshot: rf.snapshot, SnapshotIndex: rf.lastIncludedIndex, SnapshotTerm: rf.lastIncludedTerm}
				DPrintf("applySnapshot. server id:%v commitIndex:%v  lastApplied:%v  lastIncludedIndex:%v\n", rf.me, rf.commitIndex, rf.lastApplied, rf.lastIncludedIndex)
			} else { // deliver a log entry
				rf.lastApplied++
				msg = ApplyMsg{CommandValid: true, Command: rf.log[rf.lastApplied-rf.lastIncludedIndex].Command, CommandIndex: rf.lastApplied}
			}
		}
		rf.mu.Unlock()
		if sendMsg { // send the message to the service
			rf.applyCh <- msg
		} else { // nothing sent this round: sleep 10ms and check again
			time.Sleep(10 * time.Millisecond)
		}
	}
}

After running tests for a day with 233 consecutive PASSes, lab 2 is essentially correct:

Test (2A): initial election ...
  ... Passed --   3.0  3   58   15996    0
Test (2A): election after network failure ...
  ... Passed --   5.1  3  124   24683    0
Test (2A): multiple elections ...
  ... Passed --   7.0  7  600  120200    0
Test (2B): basic agreement ...
  ... Passed --   1.1  3   22    5338    3
Test (2B): RPC byte count ...
  ... Passed --   1.5  3   48  113842   11
Test (2B): agreement after follower reconnects ...
  ... Passed --   5.7  3  131   35047    8
Test (2B): no agreement if too many followers disconnect ...
  ... Passed --   3.6  5  200   42417    3
Test (2B): concurrent Start()s ...
  ... Passed --   0.5  3   20    5838    6
Test (2B): rejoin of partitioned leader ...
  ... Passed --   6.2  3  193   46403    4
Test (2B): leader backs up quickly over incorrect follower logs ...
  ... Passed --  16.6  5 2100 1576435  102
Test (2B): RPC counts aren't too high ...
  ... Passed --   2.0  3   58   17896   12
Test (2C): basic persistence ...
  ... Passed --   4.1  3   86   21883    6
Test (2C): more persistence ...
  ... Passed --  15.7  5  932  198468   16
Test (2C): partitioned leader and one follower crash, leader restarts ...
  ... Passed --   1.7  3   37    9365    4
Test (2C): Figure 8 ...
  ... Passed --  31.4  5  870  188356   47
Test (2C): unreliable agreement ...
  ... Passed --   2.2  5 1056  348389  246
Test (2C): Figure 8 (unreliable) ...
  ... Passed --  29.9  5 9385 19728838  106
Test (2C): churn ...
  ... Passed --  16.2  5 4892 9100813 1114
Test (2C): unreliable churn ...
  ... Passed --  16.1  5 6409 10687659 1013
Test (2D): snapshots basic ...
  ... Passed --   2.1  3  480  164064  220
Test (2D): install snapshots (disconnect) ...
  ... Passed --  35.1  3 1442  637310  324
Test (2D): install snapshots (disconnect+unreliable) ...
  ... Passed --  45.8  3 1674  652143  313
Test (2D): install snapshots (crash) ...
  ... Passed --  28.0  3 1174  488096  339
Test (2D): install snapshots (unreliable+crash) ...
  ... Passed --  36.4  3 1206  602020  297
Test (2D): crash and restart all servers ...
  ... Passed --   8.5  3  276   79532   62
PASS
ok  	6.824/raft	325.469s

Issues found later

While testing 3A I noticed that the leader's own matchIndex was 0 while all the others were 146, which was not what I expected. In the original design, all matchIndex entries are 0 when a node becomes leader, the leader's own matchIndex is set in Start, and the leader's own matchIndex is included when computing commitIndex, which is easier to reason about.

// during leader election
	// initialize leader-only variables
	rf.nextIndex = make([]int, rf.peerNumber)
	nextIndex := rf.lastIncludedIndex + len(rf.log)
	for i := 0; i < rf.peerNumber; i++ {
		rf.nextIndex[i] = nextIndex
	}
	rf.matchIndex = make([]int, rf.peerNumber)
// in the Start function
	// also update the leader's own nextIndex and matchIndex to make computing the commit index easier
	rf.nextIndex[rf.me] = rf.lastIncludedIndex + len(rf.log)
	rf.matchIndex[rf.me] = rf.lastIncludedIndex + len(rf.log) - 1
	
// recompute the commit index
func (rf *Raft) changeCommitIndex() { // leader rule 4
	matchIndex := append([]int{}, rf.matchIndex...) // deep copy
	sort.Ints(matchIndex)
	//  If there exists an N such that N > commitIndex, a majority
	// of matchIndex[i] ≥ N, and log[N].term == currentTerm:
	// set commitIndex = N
	for i := matchIndex[rf.peerNumber-rf.majorityPeer]; i > rf.commitIndex; i-- {
		if rf.log[i-rf.lastIncludedIndex].Term == rf.currentTerm {
			// DPrintf("%d change commindex from %d to %d\n", rf.me, rf.commitIndex, i)
			rf.commitIndex = i
			return
		}
	}
}

In the vast majority of cases this is fine. But if a node is elected leader and never receives a client command, and it shares a partition with just enough other nodes to form a majority while the rest sit in another partition, then the commit index can never advance.

Example:
node 0 is the leader; nodes 1 and 2 are in the same partition as node 0;
nodes 3 and 4 are in the other partition;
after being elected, node 0 has a log of length 10 and replicates those entries to nodes 1 and 2;
node 0's matchIndex is [0, 10, 10, 0, 0], so its commitIndex stays 0 forever.

The fix is simple: either ignore the leader itself when computing the commit index, or guarantee that the leader's own matchIndex always equals its highest log index (it could effectively be infinity). I chose the second option.

// recompute the commit index
func (rf *Raft) changeCommitIndex() { // leader rule 4
	rf.matchIndex[rf.me] = rf.lastIncludedIndex + len(rf.log) - 1 // the leader always matches itself
	matchIndex := append([]int{}, rf.matchIndex...)               // deep copy
	sort.Ints(matchIndex)

	//  If there exists an N such that N > commitIndex, a majority
	// of matchIndex[i] ≥ N, and log[N].term == currentTerm:
	// set commitIndex = N
	for i := matchIndex[rf.peerNumber-rf.majorityPeer]; i > rf.commitIndex; i-- {
		if rf.log[i-rf.lastIncludedIndex].Term == rf.currentTerm {
			// DPrintf("%d change commindex from %d to %d\n", rf.me, rf.commitIndex, i)
			rf.commitIndex = i
			return
		}
	}
}

This shows that code often behaves a bit differently from what you expect. Also, VS Code's run-and-debug support is genuinely handy.

Reference code

// peer state
type PeerState int

const (
	Follower PeerState = iota
	Candidate
	Leader
)

// constant definitions
const (
	InvalidId         = -1
	NoneTerm          = -1
	HeartBeatInterval = 100 * time.Millisecond
)

type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int

	// For 2D:
	SnapshotValid bool
	Snapshot      []byte
	SnapshotTerm  int
	SnapshotIndex int
}

type logEntry struct {
	Command interface{}
	Term    int
}

type Raft struct {
	mu        sync.Mutex          // Lock to protect shared access to this peer's state
	peers     []*labrpc.ClientEnd // RPC end points of all peers
	persister *Persister          // Object to hold this peer's persisted state
	me        int                 // this peer's index into peers[]
	dead      int32               // set by Kill()

	currentTerm int
	votedFor    int // candidate that received this server's vote in the current term
	log         []logEntry

	commitIndex int
	lastApplied int
	nextIndex   []int
	matchIndex  []int

	// additional variables
	state    PeerState
	leaderId int // not actually used

	// three variables just to mirror the three election-timer reset conditions; one would suffice
	lastHeartBeat       time.Time
	lastVote            time.Time
	lastInstallSnapshot time.Time

	// snapshot-related state
	snapshot          []byte
	lastIncludedIndex int
	lastIncludedTerm  int

	// effectively constants (set once at initialization, never changed afterwards)
	applyCh      chan ApplyMsg
	peerNumber   int
	majorityPeer int
}

func Min(n1, n2 int) int {
	if n1 > n2 {
		return n2
	}
	return n1
}

func Max(n1, n2 int) int {
	if n1 > n2 {
		return n1
	}
	return n2
}

// persist the current raft state
func (rf *Raft) persist() {
	w := new(bytes.Buffer)
	e := labgob.NewEncoder(w)
	e.Encode(rf.currentTerm)
	e.Encode(rf.votedFor)
	e.Encode(rf.log)
	e.Encode(rf.lastIncludedIndex)
	e.Encode(rf.lastIncludedTerm)
	data := w.Bytes()
	rf.persister.SaveRaftState(data)
}

// restore the raft state
func (rf *Raft) readPersist(data []byte) {
	if data == nil || len(data) < 1 { // bootstrap without any state?
		return
	}
	r := bytes.NewBuffer(data)
	d := labgob.NewDecoder(r)
	var currentTerm int
	var votedFor int
	var raftLog []logEntry
	var lastIncludedIndex int
	var lastIncludedTerm int
	if d.Decode(&currentTerm) != nil ||
		d.Decode(&votedFor) != nil || d.Decode(&raftLog) != nil || d.Decode(&lastIncludedIndex) != nil || d.Decode(&lastIncludedTerm) != nil {
		log.Panicf("state decode error")
	} else {
		rf.currentTerm = currentTerm
		rf.votedFor = votedFor
		rf.log = raftLog
		rf.lastIncludedIndex = lastIncludedIndex
		rf.lastIncludedTerm = lastIncludedTerm
		rf.commitIndex = rf.lastIncludedIndex // also set commitIndex
	}
}

// persist the current raft state together with the snapshot
func (rf *Raft) persistStateAndSnapshot() {
	w := new(bytes.Buffer)
	e := labgob.NewEncoder(w)
	e.Encode(rf.currentTerm)
	e.Encode(rf.votedFor)
	e.Encode(rf.log)
	e.Encode(rf.lastIncludedIndex)
	e.Encode(rf.lastIncludedTerm)
	state := w.Bytes()
	rf.persister.SaveStateAndSnapshot(state, rf.snapshot)
}

// compare the lastIncludedIndex encoded in the raft state with the one encoded in the snapshot
func (rf *Raft) compareStateAndSnapshot() {
	if rf.snapshot == nil || len(rf.snapshot) < 1 {
		return
	}
	// DPrintf("exist snapshot\n")
	r := bytes.NewBuffer(rf.snapshot)
	d := labgob.NewDecoder(r)
	var lastIncludedIndex int
	if d.Decode(&lastIncludedIndex) != nil {
		log.Panicf("compareStateAndSnapshot: snapshot decode error")
	}
	if rf.lastIncludedIndex != lastIncludedIndex {
		WPrintf("snapshot lastIncludedIndex is different. snapshot:%v  state:%v\n", lastIncludedIndex, rf.lastIncludedIndex)
		// drop the log entries before the snapshot's lastIncludedIndex and reset the related variables
		rf.log = append([]logEntry{}, rf.log[lastIncludedIndex-rf.lastIncludedIndex:]...)
		rf.lastIncludedIndex = lastIncludedIndex
		rf.lastIncludedTerm = rf.log[0].Term // term of log slot 0
		rf.commitIndex = rf.lastIncludedIndex
		rf.persist()
	}
}
func (rf *Raft) CondInstallSnapshot(lastIncludedTerm int, lastIncludedIndex int, snapshot []byte) bool {
	return true
}

func (rf *Raft) Snapshot(index int, snapshot []byte) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	if index <= rf.lastIncludedIndex { // must be newer than the current snapshot
		WPrintf("server%v no use snapshot.this index:%v last index:%v\n", rf.me, rf.lastIncludedIndex, index)
		return
	}
	// rf.log = rf.log[index-rf.lastIncludedIndex:]
	rf.log = append([]logEntry{}, rf.log[index-rf.lastIncludedIndex:]...)
	rf.lastIncludedIndex = index
	rf.lastIncludedTerm = rf.log[0].Term // term of log slot 0
	rf.snapshot = snapshot
	rf.persistStateAndSnapshot()
	// DPrintf("generate Snapshot. server id:%v Snapshot index:%v  log len:%v   commmit index:%v apply index:%v", rf.me, index, len(rf.log), rf.commitIndex, rf.lastApplied)
}

type RequestVoteArgs struct {
	Term         int
	CandidateId  int
	LastLogIndex int
	LastLogTerm  int
}

type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}

type AppendEntriesArgs struct {
	Term         int
	LeaderId     int
	PrevLogIndex int
	PrevLogTerm  int
	Entries      []logEntry
	LeaderCommit int
}

type AppendEntriesReply struct {
	Term          int
	Success       bool
	ConflictIndex int
	ConflictTerm  int
}

type InstallSnapshotArgs struct {
	Term              int
	LeaderId          int
	LastIncludedIndex int
	LastIncludedTerm  int
	Data              []byte
}

type InstallSnapshotReply struct {
	Term int
}

func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	reply.Term = rf.currentTerm
	if rf.currentTerm > args.Term { // RequestVote rule 1
		reply.VoteGranted = false
		return
	}

	needPersist := false            // whether state must be persisted before returning
	if rf.currentTerm < args.Term { // set currentTerm = T, convert to follower
		rf.currentTerm = args.Term
		rf.votedFor = InvalidId // reset votedFor
		rf.state = Follower
		needPersist = true
	}

	lastLogIndex := rf.lastIncludedIndex + len(rf.log) - 1 // highest local log index
	lastLogTerm := rf.log[lastLogIndex-rf.lastIncludedIndex].Term
	// votedFor is null or candidateId
	condition1 := (rf.votedFor == InvalidId) || (rf.votedFor == args.CandidateId)
	// at least as up-to-date
	condition2 := (args.LastLogTerm > lastLogTerm) || ((args.LastLogTerm == lastLogTerm) && (args.LastLogIndex >= lastLogIndex))
	// DPrintf("arg:%v %v  rf:%v %v\n", args.LastLogTerm, args.LastLogIndex, lastLogTerm, lastLogIndex)
	reply.VoteGranted = condition1 && condition2
	if reply.VoteGranted {
		rf.votedFor = args.CandidateId // record who received our vote
		rf.lastVote = time.Now()       // update the vote timestamp
		needPersist = true
		// DPrintf("%d vote to %d\n", rf.me, args.CandidateId)
	}

	if needPersist {
		rf.persist()
	}
}

func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	reply.Term = rf.currentTerm
	reply.Success = true
	if rf.currentTerm > args.Term { // AppendEntries rule 1
		reply.Success = false
		return
	}

	needPersist := false
	if rf.currentTerm < args.Term { // set currentTerm = T, convert to follower
		rf.currentTerm = args.Term
		rf.votedFor = InvalidId // reset votedFor
		rf.state = Follower
		needPersist = true
	}

	// reset follower state
	rf.lastHeartBeat = time.Now()
	rf.leaderId = args.LeaderId

	maxLocalLogIndex := rf.lastIncludedIndex + len(rf.log) - 1 // highest local log index
	var matchResult bool
	if maxLocalLogIndex < args.PrevLogIndex { // no entry at prevLogIndex: mismatch
		matchResult = false
	} else if rf.commitIndex >= args.PrevLogIndex { // entry already committed: assume it matches
		// DPrintf("commited log is same:  PrevLogIndex %v  lastIncludedIndex %v commitIndex %v\n", args.PrevLogIndex, rf.lastIncludedIndex, rf.commitIndex)
		matchResult = true
	} else if rf.lastIncludedIndex <= args.PrevLogIndex { // compare the term at prevLogIndex
		matchResult = (rf.log[args.PrevLogIndex-rf.lastIncludedIndex].Term == args.PrevLogTerm)
	} else {
		log.Panicf("unexpected condition. match error\n")
	}

	if !matchResult { // AppendEntries rule 2
		reply.Success = false
		// If a follower does not have prevLogIndex in its log, it should return with conflictIndex = len(log) and conflictTerm = None.
		// If a follower does have prevLogIndex in its log, but the term does not match, it should return conflictTerm = log[prevLogIndex].Term, and then search its log for the first index whose entry has term equal to conflictTerm.
		if maxLocalLogIndex < args.PrevLogIndex {
			reply.ConflictIndex = rf.lastIncludedIndex + len(rf.log) // conflictIndex = len(log)
			reply.ConflictTerm = NoneTerm
		} else {
			// nextIndex-related, so it must not involve lastIncludedTerm
			reply.ConflictTerm = rf.log[args.PrevLogIndex-rf.lastIncludedIndex].Term
			reply.ConflictIndex = rf.commitIndex + 1 // ConflictIndex must be greater than commitIndex
			for i := args.PrevLogIndex; i > rf.commitIndex+1; i-- {
				if rf.log[i-1-rf.lastIncludedIndex].Term != reply.ConflictTerm { // the previous entry has a different term, so this is the first entry with conflictTerm
					reply.ConflictIndex = i
					break
				}
			}
		}
		DPrintf("prelog term doesn't match. server id:%v commit index:%d maxLocalLogIndex:%v args.PrevLogIndex:%v rf.lastIncludedIndex:%v  ConflictTerm:%v ConflictIndex:%v \n", rf.me, rf.commitIndex, maxLocalLogIndex, args.PrevLogIndex, rf.lastIncludedIndex, reply.ConflictTerm, reply.ConflictIndex)
		if needPersist {
			rf.persist()
		}
		return
	}
	// DPrintf("%d preIndex %d  entry size:%d\n", rf.me, args.PrevLogIndex, len(args.Entries))
	// find the first index whose term differs
	lastNewEntryIndex := args.PrevLogIndex + len(args.Entries)
	minLogIndex := Max(rf.commitIndex+1, args.PrevLogIndex+1) // committed entries are guaranteed to match, no need to check
	maxLogIndex := Min(lastNewEntryIndex, maxLocalLogIndex)   // the smaller of the local and incoming maximum indexes
	diffItemIndex := maxLogIndex + 1
	for i := minLogIndex; i <= maxLogIndex; i++ {
		if rf.log[i-rf.lastIncludedIndex].Term != args.Entries[i-args.PrevLogIndex-1].Term {
			diffItemIndex = i
			break
		}
	}
	// AppendEntries rule 3
	if diffItemIndex != maxLogIndex+1 { // delete only if a conflicting entry actually exists
		rf.log = rf.log[:diffItemIndex-rf.lastIncludedIndex]
		needPersist = true
	}

	if diffItemIndex-args.PrevLogIndex-1 <= len(args.Entries)-1 { // some entries are still missing from the local log
		rf.log = append(rf.log, args.Entries[diffItemIndex-args.PrevLogIndex-1:]...) // AppendEntries rule 4
		needPersist = true
	}

	// take the smaller of LeaderCommit and the index of the last new entry
	if rf.commitIndex < args.LeaderCommit { // AppendEntries rule 5
		rf.commitIndex = Min(args.LeaderCommit, lastNewEntryIndex)
		// DPrintf("%d change commit index to %d  LeaderCommit:%d  lastNewEntryIndex:%d\n", rf.me, rf.commitIndex, args.LeaderCommit, lastNewEntryIndex)
	}

	if needPersist {
		rf.persist()
	}

}

func (rf *Raft) InstallSnapshot(args *InstallSnapshotArgs, reply *InstallSnapshotReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	reply.Term = rf.currentTerm
	if rf.currentTerm > args.Term {
		return
	}

	if rf.currentTerm < args.Term { // set currentTerm = T, convert to follower
		rf.currentTerm = args.Term
		rf.votedFor = InvalidId // reset votedFor
		rf.state = Follower
		rf.persist() // persist immediately
	}

	rf.lastInstallSnapshot = time.Now() // record when the snapshot was received
	rf.leaderId = args.LeaderId
	logIndex := args.LastIncludedIndex - rf.lastIncludedIndex // compare the leader's snapshot with the local one
	if logIndex < 0 {
		// WPrintf("local server snapshot is newer. server id:%v lastIncludedIndex:%v  log len:%v\n", rf.me, rf.lastIncludedIndex, len(rf.log))
		return
	}
	if logIndex < len(rf.log) && rf.log[logIndex].Term == args.LastIncludedTerm { // step 6
		DPrintf("same log entry. server id:%v lastIncludedIndex:%v  log len:%v\n", rf.me, rf.lastIncludedIndex, len(rf.log))
		return
	}

	// set the snapshot-related state and persist
	rf.snapshot = args.Data
	rf.lastIncludedIndex = args.LastIncludedIndex
	rf.lastIncludedTerm = args.LastIncludedTerm
	rf.log = append([]logEntry{}, logEntry{Term: rf.lastIncludedTerm}) // discard the entire log and reinsert the slot-0 entry
	// update commitIndex
	rf.commitIndex = rf.lastIncludedIndex
	rf.persistStateAndSnapshot()
	DPrintf("%v recv snapshot. lastIncludedIndex:%v log len:%v commit index:%v  apply index:%v\n", rf.me, rf.lastIncludedIndex, len(rf.log), rf.commitIndex, rf.lastApplied)
}

func (rf *Raft) sendRequestVote(server int, args *RequestVoteArgs, reply *RequestVoteReply) bool {
	ok := rf.peers[server].Call("Raft.RequestVote", args, reply)
	return ok
}

// unlock/relock around the remote call so that defer can be used by callers
func (rf *Raft) sendAppendEntries(server int, args *AppendEntriesArgs, reply *AppendEntriesReply) bool {
	rf.mu.Unlock()
	ok := rf.peers[server].Call("Raft.AppendEntries", args, reply)
	rf.mu.Lock()
	return ok
}

// unlock/relock around the remote call so that defer can be used by callers
func (rf *Raft) sendInstallSnapshot(server int, args *InstallSnapshotArgs, reply *InstallSnapshotReply) bool {
	rf.mu.Unlock()
	ok := rf.peers[server].Call("Raft.InstallSnapshot", args, reply)
	rf.mu.Lock()
	return ok
}

func (rf *Raft) GetState() (int, bool) {
	rf.mu.Lock()
	term := rf.currentTerm
	isleader := (rf.state == Leader)
	rf.mu.Unlock()
	return term, isleader
}

func (rf *Raft) Start(command interface{}) (int, int, bool) {
	index := -1
	term := -1
	isLeader := true
	rf.mu.Lock()
	defer rf.mu.Unlock()
	if rf.state != Leader {
		isLeader = false
		return index, term, isLeader
	}
	entry := logEntry{Term: rf.currentTerm, Command: command}
	rf.log = append(rf.log, entry)
	index = rf.lastIncludedIndex + len(rf.log) - 1
	term = rf.currentTerm
	// // also update the leader's own nextIndex and matchIndex to make computing the commit index easier
	// rf.nextIndex[rf.me] = rf.lastIncludedIndex + len(rf.log)
	// rf.matchIndex[rf.me] = rf.lastIncludedIndex + len(rf.log) - 1
	rf.persist()
	go rf.agreementTask() // kick off a round of log agreement
	return index, term, isLeader
}

func (rf *Raft) Kill() {
	atomic.StoreInt32(&rf.dead, 1)
}

func (rf *Raft) killed() bool {
	z := atomic.LoadInt32(&rf.dead)
	return z == 1
}

// recompute the commit index
func (rf *Raft) changeCommitIndex() { // leader rule 4
	rf.matchIndex[rf.me] = rf.lastIncludedIndex + len(rf.log) - 1 // the leader always matches itself
	matchIndex := append([]int{}, rf.matchIndex...)               // deep copy
	sort.Ints(matchIndex)

	//  If there exists an N such that N > commitIndex, a majority
	// of matchIndex[i] ≥ N, and log[N].term == currentTerm:
	// set commitIndex = N
	for i := matchIndex[rf.peerNumber-rf.majorityPeer]; i > rf.commitIndex; i-- {
		if rf.log[i-rf.lastIncludedIndex].Term == rf.currentTerm {
			// DPrintf("%d change commindex from %d to %d\n", rf.me, rf.commitIndex, i)
			rf.commitIndex = i
			return
		}
	}
}

// one round of log agreement with all peers
func (rf *Raft) agreementTask() {
	for i := 0; i < rf.peerNumber; i++ {
		if i == rf.me {
			continue
		}
		go func(server int) {
			rf.mu.Lock()
			defer rf.mu.Unlock()
			for !rf.killed() {
				if rf.state != Leader {
					return
				}
				if rf.nextIndex[server] <= rf.lastIncludedIndex { // the needed entries are already inside the snapshot
					if rf.snapshot == nil || len(rf.snapshot) < 1 {
						log.Panicf("agreementTask: wrong snapshot.\n")
					}
					DPrintf("%v send snapshot to %v  LastIncludedInde:%v\n", rf.me, server, rf.lastIncludedIndex)
					args := InstallSnapshotArgs{Term: rf.currentTerm, LeaderId: rf.me, LastIncludedIndex: rf.lastIncludedIndex, LastIncludedTerm: rf.lastIncludedTerm, Data: rf.snapshot}
					reply := InstallSnapshotReply{}
					res := rf.sendInstallSnapshot(server, &args, &reply) // unlocks/relocks internally
					if !res {
						// DPrintf("%v send InstallSnapshot to %v failed\n", server, rf.me)
						return
					}

					if rf.state != Leader {
						return
					}

					if args.Term != rf.currentTerm { // old rpc reply
						DPrintf("old InstallSnapshot rpc reply\n")
						return
					}

					if rf.currentTerm < reply.Term { // set currentTerm = T, convert to follower
						rf.currentTerm = reply.Term
						rf.votedFor = InvalidId
						rf.state = Follower
						rf.persist()
						return
					}
					// snapshot sent successfully
					rf.matchIndex[server] = Max(rf.matchIndex[server], args.LastIncludedIndex)
					rf.nextIndex[server] = rf.matchIndex[server] + 1
					rf.changeCommitIndex()
				} else {
					args := AppendEntriesArgs{Term: rf.currentTerm, LeaderId: rf.me, LeaderCommit: rf.commitIndex}
					args.PrevLogIndex = rf.nextIndex[server] - 1
					args.PrevLogTerm = rf.log[args.PrevLogIndex-rf.lastIncludedIndex].Term
					args.Entries = append([]logEntry{}, rf.log[rf.nextIndex[server]-rf.lastIncludedIndex:]...) // deep copy
					reply := AppendEntriesReply{}
					res := rf.sendAppendEntries(server, &args, &reply) // unlocks/relocks internally
					if !res {
						// DPrintf("%v send AppendEntries to %v failed\n", server, rf.me)
						return
					}

					if rf.state != Leader {
						return
					}

					if args.Term != rf.currentTerm { // old rpc reply
						DPrintf("old AppendEntries rpc reply\n")
						return
					}

					if reply.Success {
						// carefully update nextIndex and matchIndex
						rf.matchIndex[server] = Max(rf.matchIndex[server], args.PrevLogIndex+len(args.Entries))
						rf.nextIndex[server] = rf.matchIndex[server] + 1
						rf.changeCommitIndex()
						return
					}

					if rf.currentTerm < reply.Term { // set currentTerm = T, convert to follower
						rf.currentTerm = reply.Term
						rf.votedFor = InvalidId
						rf.state = Follower
						rf.persist()
						return
					}
					// Upon receiving a conflict response, the leader should first search its log for conflictTerm. If it finds an entry in its log with that term, it should set nextIndex to be the one beyond the index of the last entry in that term in its log.
					// If it does not find an entry with that term, it should set nextIndex = conflictIndex.
					if reply.ConflictTerm == NoneTerm {
						rf.nextIndex[server] = reply.ConflictIndex
					} else {
						rf.nextIndex[server] = reply.ConflictIndex
						for i := args.PrevLogIndex; i > rf.lastIncludedIndex; i-- {
							if rf.log[i-rf.lastIncludedIndex].Term == reply.ConflictTerm {
								rf.nextIndex[server] = i
								break
							}
						}
					}
					if rf.nextIndex[server] < 1 {
						log.Panicf("next index set wrong value.")
					}
					DPrintf("next index retry. cur term:%v leader:%d server:%d commit index:%d next index:%d\n", rf.currentTerm, rf.me, server, rf.commitIndex, rf.nextIndex[server])
				}
			}

		}(i)
	}
}

// send heartbeat messages to all peers
func (rf *Raft) heartBeatTask() {
	for !rf.killed() {
		rf.mu.Lock()
		flag := (rf.state == Leader)
		rf.mu.Unlock()
		if !flag {
			return
		}
		go rf.agreementTask()
		time.Sleep(HeartBeatInterval)
	}
}

// send RequestVote RPCs to all peers
func (rf *Raft) sendVoteTask(args RequestVoteArgs) {
	ticket := 1
	for i := 0; i < rf.peerNumber; i++ {
		if i == rf.me {
			continue
		}
		go func(server int) {
			reply := RequestVoteReply{}
			res := rf.sendRequestVote(server, &args, &reply)
			if !res {
				// DPrintf("sendRequestVote failed server:%d me:%d\n", server, rf.me)
				return
			}
			rf.mu.Lock()
			defer rf.mu.Unlock()
			if rf.state != Candidate { // return if this server is no longer a candidate
				return
			}

			if args.Term != rf.currentTerm {
				DPrintf("old RequestVote reply\n")
				return
			}

			if reply.VoteGranted {
				ticket++
				if ticket >= rf.majorityPeer { // won a majority: become leader and start the heartbeat task
					rf.state = Leader
					rf.leaderId = rf.me
					// initialize leader-only variables
					rf.nextIndex = make([]int, rf.peerNumber)
					nextIndex := rf.lastIncludedIndex + len(rf.log)
					for i := 0; i < rf.peerNumber; i++ {
						rf.nextIndex[i] = nextIndex
					}
					rf.matchIndex = make([]int, rf.peerNumber)
					DPrintf("%d become leader. commit index:%d  next index:%d\n", rf.me, rf.commitIndex, rf.nextIndex[0])
					go rf.heartBeatTask()
				}
			} else if rf.currentTerm < reply.Term { // reply term is larger: step down to follower and update the term
				rf.currentTerm = reply.Term
				rf.state = Follower
				rf.votedFor = InvalidId
				rf.persist()
			}
		}(i)
	}
}

func (rf *Raft) ticker() {
	for !rf.killed() {
		// Your code here to check if a leader election should
		// be started and to randomize sleeping time using
		// time.Sleep().
		sleepTime := time.Duration(rand.Intn(300)+300) * time.Millisecond // 300~600ms
		time.Sleep(sleepTime)
		rf.mu.Lock() // lock before reading state
		// && time.Since(rf.lastInstallSnapshot) > sleepTime  (not mentioned in the paper, so left out)
		if rf.state != Leader && time.Since(rf.lastHeartBeat) > sleepTime && time.Since(rf.lastVote) > sleepTime {
			rf.currentTerm++
			rf.votedFor = rf.me
			rf.lastHeartBeat = time.Now()
			rf.state = Candidate
			lastLogIndex := rf.lastIncludedIndex + len(rf.log) - 1
			lastLogTerm := rf.log[lastLogIndex-rf.lastIncludedIndex].Term
			args := RequestVoteArgs{Term: rf.currentTerm, CandidateId: rf.me, LastLogIndex: lastLogIndex, LastLogTerm: lastLogTerm}
			// DPrintf("%d join vote  term:%d\n", rf.me, rf.currentTerm)
			rf.persist()
			go rf.sendVoteTask(args)
		}
		rf.mu.Unlock()
	}
}

// deliver committed entries to the service
func (rf *Raft) applyTask() {
	for !rf.killed() {
		var msg ApplyMsg
		sendMsg := false
		rf.mu.Lock()
		if rf.lastApplied < rf.commitIndex {
			sendMsg = true
			if rf.lastApplied < rf.lastIncludedIndex { // deliver the snapshot
				rf.lastApplied = rf.lastIncludedIndex
				msg = ApplyMsg{CommandValid: false, SnapshotValid: true, Snapshot: rf.snapshot, SnapshotIndex: rf.lastIncludedIndex, SnapshotTerm: rf.lastIncludedTerm}
				DPrintf("applySnapshot. server id:%v commitIndex:%v  lastApplied:%v  lastIncludedIndex:%v\n", rf.me, rf.commitIndex, rf.lastApplied, rf.lastIncludedIndex)
			} else { // deliver a log entry
				rf.lastApplied++
				msg = ApplyMsg{CommandValid: true, Command: rf.log[rf.lastApplied-rf.lastIncludedIndex].Command, CommandIndex: rf.lastApplied}
			}
		}
		rf.mu.Unlock()
		if sendMsg { // send the message to the service
			rf.applyCh <- msg
		} else { // nothing sent this round: sleep 10ms and check again
			time.Sleep(10 * time.Millisecond)
		}
	}
}

func Make(peers []*labrpc.ClientEnd, me int, persister *Persister, applyCh chan ApplyMsg) *Raft {
	rf := &Raft{}
	rf.peers = peers
	rf.persister = persister
	rf.me = me

	// Your initialization code here (2A, 2B, 2C).
	rf.currentTerm = 0
	rf.votedFor = InvalidId
	rf.log = nil
	rf.commitIndex = 0
	rf.lastApplied = 0
	rf.nextIndex = nil
	rf.matchIndex = nil

	rf.state = Follower
	rf.leaderId = InvalidId
	rf.lastHeartBeat = time.Now()
	rf.lastVote = time.Now()
	rf.lastInstallSnapshot = time.Now()

	rf.lastIncludedIndex = 0
	rf.lastIncludedTerm = 0
	rf.log = append(rf.log, logEntry{Term: 0}) // add a dummy entry that serves as the initial lastIncluded slot

	rf.applyCh = applyCh
	rf.peerNumber = len(peers)
	rf.majorityPeer = (len(peers) / 2) + 1

	// initialize from state persisted before a crash
	rf.readPersist(persister.ReadRaftState())
	rf.snapshot = persister.ReadSnapshot()
	// rf.compareStateAndSnapshot()
	rand.Seed(time.Now().Unix()) // seed the RNG
	DPrintf("%v start. start index:%v  log len:%v  commit index:%v term:%v", rf.me, rf.lastIncludedIndex, len(rf.log), rf.commitIndex, rf.currentTerm)

	// start ticker goroutine to start elections
	go rf.ticker()
	go rf.applyTask()
	return rf
}
