Continued from the previous section.
10. Follower receives the sync data (Learner.syncWithLeader)
protected void syncWithLeader(long newLeaderZxid) throws IOException, InterruptedException{
QuorumPacket ack = new QuorumPacket(Leader.ACK, 0, null, null);
QuorumPacket qp = new QuorumPacket();
long newEpoch = ZxidUtils.getEpochFromZxid(newLeaderZxid);
boolean snapshotNeeded = true;
// Read the first packet from the leader
readPacket(qp);
// zxids of packets already committed
LinkedList<Long> packetsCommitted = new LinkedList<Long>();
// packets not yet committed
LinkedList<PacketInFlight> packetsNotCommitted = new LinkedList<PacketInFlight>();
synchronized (zk) {
if (qp.getType() == Leader.DIFF) {
LOG.info("Getting a diff from the leader 0x{}", Long.toHexString(qp.getZxid()));
snapshotNeeded = false;
} else if (qp.getType() == Leader.SNAP) {
LOG.info("Getting a snapshot from leader 0x" + Long.toHexString(qp.getZxid()));
// The leader is going to dump the database; clear our own database and read it
// Clear the current in-memory data
zk.getZKDatabase().clear();
// Deserialize the snapshot streamed over from the leader
zk.getZKDatabase().deserializeSnapshot(leaderIs);
String signature = leaderIs.readString("signature");
if (!signature.equals("BenWasHere")) {
LOG.error("Missing signature. Got " + signature);
throw new IOException("Missing signature");
}
// Set lastProcessedZxid
zk.getZKDatabase().setlastProcessedZxid(qp.getZxid());
} else if (qp.getType() == Leader.TRUNC) {
// The follower's zxid is ahead of the leader's: discard the log entries beyond the leader's zxid
//we need to truncate the log to the lastzxid of the leader
LOG.warn("Truncating log to get in sync with the leader 0x" + Long.toHexString(qp.getZxid()));
boolean truncated=zk.getZKDatabase().truncateLog(qp.getZxid());
if (!truncated) {
// not able to truncate the log
LOG.error("Not able to truncate the log " + Long.toHexString(qp.getZxid()));
System.exit(13);
}
zk.getZKDatabase().setlastProcessedZxid(qp.getZxid());
} else {
LOG.error("Got unexpected packet from leader "
+ qp.getType() + " exiting ... " );
System.exit(13);
}
zk.createSessionTracker();
long lastQueued = 0;
boolean isPreZAB1_0 = true;
boolean writeToTxnLog = !snapshotNeeded;
outerLoop:
while (self.isRunning()) {
// Read packets in a loop
readPacket(qp);
switch(qp.getType()) {
case Leader.PROPOSAL:
// Wrap the header (TxnHeader) and body (Record)
PacketInFlight pif = new PacketInFlight();
pif.hdr = new TxnHeader();
// Deserialize the transaction body
pif.rec = SerializeUtils.deserializeTxn(qp.getData(), pif.hdr);
if (pif.hdr.getZxid() != lastQueued + 1) {
LOG.warn("Got zxid 0x" + Long.toHexString(pif.hdr.getZxid())
+ " expected 0x" + Long.toHexString(lastQueued + 1));
}
lastQueued = pif.hdr.getZxid();
// Add the PacketInFlight to the not-yet-committed list
packetsNotCommitted.add(pif);
break;
case Leader.COMMIT:
if (!writeToTxnLog) {
pif = packetsNotCommitted.peekFirst();
// A mismatch means the sequence is not contiguous
if (pif.hdr.getZxid() != qp.getZxid()) {
LOG.warn("Committing " + qp.getZxid() + ", but next proposal is " + pif.hdr.getZxid());
} else {
// Apply the txn to the DataTree
zk.processTxn(pif.hdr, pif.rec);
// Remove it from the not-committed list
packetsNotCommitted.remove();
}
} else {
packetsCommitted.add(qp.getZxid());
}
break;
......
case Leader.UPTODATE:
// Pre-ZAB-1.0 leader: no NEWLEADER was sent during sync, so snapshot and set the epoch here
if (isPreZAB1_0) {
zk.takeSnapshot();
self.setCurrentEpoch(newEpoch);
}
self.cnxnFactory.setZooKeeperServer(zk);
break outerLoop;
case Leader.NEWLEADER: // Getting NEWLEADER here instead of in discovery
// Create the updatingEpoch marker file to record that the NEWLEADER data sync is in progress; throw if it cannot be created
File updating = new File(self.getTxnFactory().getSnapDir(), QuorumPeer.UPDATING_EPOCH_FILENAME);
if (!updating.exists() && !updating.createNewFile()) {
throw new IOException("Failed to create " + updating.toString());
}
// No snapshot is taken for a DIFF sync
if (snapshotNeeded) {
// The sync is complete; write a snapshot
zk.takeSnapshot();
}
// Update the epoch file
self.setCurrentEpoch(newEpoch);
if (!updating.delete()) {
throw new IOException("Failed to delete " + updating.toString());
}
writeToTxnLog = true; //Anything after this needs to go to the transaction log, not applied directly in memory
isPreZAB1_0 = false;
// Send the ACK for NEWLEADER
writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
break;
}
}
}
ack.setZxid(ZxidUtils.makeZxid(newEpoch, 0));
// Send the final ACK
writePacket(ack, true);
......
}
-
readPacket(qp)
Reads the first QuorumPacket sent by the leader, of type SNAP/TRUNC/DIFF; -
readPacket(qp)
Reads the leader's packets in a loop: first the sync data, then NEWLEADER; -
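Both `newEpoch` at the top of syncWithLeader and the final ACK's zxid rely on the zxid layout: the high 32 bits carry the epoch, the low 32 bits a per-epoch counter. A minimal standalone sketch of that packing (the class name is illustrative; the real helpers live in ZxidUtils):

```java
// Sketch of the zxid bit layout used by ZxidUtils:
// high 32 bits = epoch, low 32 bits = counter.
public class ZxidSketch {
    static long getEpochFromZxid(long zxid) {
        return zxid >> 32L;
    }

    static long getCounterFromZxid(long zxid) {
        return zxid & 0xffffffffL;
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL);
    }

    public static void main(String[] args) {
        long zxid = makeZxid(3, 7);
        System.out.println(Long.toHexString(zxid)); // 300000007
        System.out.println(getEpochFromZxid(zxid)); // 3
    }
}
```

This is why `ack.setZxid(ZxidUtils.makeZxid(newEpoch, 0))` produces a zxid whose counter is zero: it marks the very start of the new epoch.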
DIFF
1. case Leader.PROPOSAL
Deserialize the TxnHeader/Record, wrap them into a PacketInFlight, and append it to the packetsNotCommitted list;
2. case Leader.COMMIT
Append the QuorumPacket's zxid to the packetsCommitted list; -
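The PROPOSAL/COMMIT bookkeeping above can be sketched with plain queues. This is a toy model: zxids stand in for full PacketInFlight objects, and all names are made up, not ZooKeeper's.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the DIFF-phase bookkeeping: PROPOSALs queue up as in-flight
// packets; a COMMIT either applies the head immediately (before NEWLEADER,
// while writeToTxnLog is false) or just records the zxid for later replay.
public class DiffSyncSketch {
    static final Deque<Long> notCommitted = new ArrayDeque<>(); // zxids standing in for PacketInFlight
    static final Deque<Long> committed = new ArrayDeque<>();
    static boolean writeToTxnLog = false;

    static void onProposal(long zxid) {
        notCommitted.add(zxid);
    }

    static void onCommit(long zxid) {
        if (!writeToTxnLog) {
            Long head = notCommitted.peekFirst();
            if (head != null && head == zxid) {
                notCommitted.removeFirst(); // "apply" directly to in-memory state
            }
        } else {
            committed.add(zxid); // replayed after UPTODATE via fzk.commit(zxid)
        }
    }
}
```

After NEWLEADER flips `writeToTxnLog`, commits are no longer applied in place; they wait in `committed` until the UPTODATE handling replays them.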
SNAP
1. Clear the in-memory ZKDatabase;
2. Deserialize the snapshot streamed over from the leader;
3. Verify the signature; -
TRUNC
1. Clear the in-memory ZKDatabase;
2. Delete the txnlog entries whose zxid is greater than the leader's;
3. Reload the database (loadDataBase); -
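A toy model of the truncation step, with a list of zxids standing in for the on-disk txnlog (the real ZKDatabase.truncateLog operates on log files and then reloads the DataTree; the class here is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of TRUNC handling: drop every logged transaction whose zxid is
// greater than the leader's, keeping only the prefix both sides agree on.
public class TruncSketch {
    static List<Long> truncateLog(List<Long> txnLog, long leaderZxid) {
        List<Long> kept = new ArrayList<>();
        for (long zxid : txnLog) {
            if (zxid <= leaderZxid) {
                kept.add(zxid); // survives truncation
            }
        }
        return kept; // in-memory state is rebuilt from these entries
    }
}
```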
case Leader.NEWLEADER
After the data sync the leader sends NEWLEADER
1. Create the updatingEpoch file and delete it once the current epoch has been set. QuorumPeer.loadDataBase() uses this file to detect the case where the server dies after taking the snapshot but before setting the current epoch;
2. When the sync was not a DIFF, take a snapshot now that the sync has finished;
3. Update the epoch and, once that succeeds, delete the updatingEpoch file;
4. Send the leader the ACK for NEWLEADER;
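Step 1's marker-file dance can be sketched with plain java.io.File calls. The directory and file name below are illustrative (the real name comes from QuorumPeer.UPDATING_EPOCH_FILENAME), and the snapshot/epoch work is elided:

```java
import java.io.File;
import java.io.IOException;

// Sketch of the updatingEpoch marker protocol: create the marker, do the
// snapshot + epoch update, then delete the marker. If the server crashes in
// between, the marker survives on disk, telling recovery that the snapshot
// may be ahead of the recorded epoch.
public class EpochMarkerSketch {
    public static void updateEpoch(File snapDir, long newEpoch) throws IOException {
        File updating = new File(snapDir, "updatingEpoch"); // illustrative name
        if (!updating.exists() && !updating.createNewFile()) {
            throw new IOException("Failed to create " + updating);
        }
        // ... takeSnapshot() and setCurrentEpoch(newEpoch) would run here ...
        if (!updating.delete()) {
            throw new IOException("Failed to delete " + updating);
        }
    }
}
```

The ordering is the whole point: the marker exists for exactly the window in which the snapshot and the epoch file could disagree.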
11. Leader handles the NEWLEADER ACK (LearnerHandler.run)
public void run() {
......
qp = new QuorumPacket();
// Read the ACK
ia.readRecord(qp, "packet");
if(qp.getType() != Leader.ACK){
LOG.error("Next packet was supposed to be an ACK");
return;
}
LOG.info("Received NEWLEADER-ACK message from " + getSid());
// Wait for ACKs from a quorum of followers
leader.waitForNewLeaderAck(getSid(), qp.getZxid(), getLearnerType());
syncLimitCheck.start();
// now that the ack has been processed expect the syncLimit
sock.setSoTimeout(leader.self.tickTime * leader.self.syncLimit);
// Wait for the leader to finish starting before continuing
synchronized(leader.zk){
while(!leader.zk.isRunning() && !this.isInterrupted()){
leader.zk.wait(20);
}
}
// Once the leader is running, send UPTODATE
queuedPackets.add(new QuorumPacket(Leader.UPTODATE, -1, null, null));
......
}
-
ia.readRecord(qp, "packet")
Reads the ACK for NEWLEADER; -
leader.waitForNewLeaderAck(getSid(), qp.getZxid(), getLearnerType())
Blocks until NEWLEADER ACKs have arrived from a quorum of servers; -
leader.zk.wait(20)
Leader.lead() calls startZkServer(); once the zk server starts, its state becomes RUNNING, and until then this handler blocks and waits; -
queuedPackets.add(new QuorumPacket(Leader.UPTODATE, -1, null, null))
Sends UPTODATE to tell the follower that the leader has finished starting; -
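The quorum wait in waitForNewLeaderAck boils down to classic synchronized wait/notify over a set of server ids. A minimal sketch, assuming a fixed ensemble size and ignoring the zxid validation and learner-type checks the real method performs (all names are made up):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of quorum ACK counting: each ACK records the sender's sid;
// await() blocks until a strict majority of the ensemble has acked.
public class QuorumAckSketch {
    private final int ensembleSize;
    private final Set<Long> acked = new HashSet<>();

    QuorumAckSketch(int ensembleSize) {
        this.ensembleSize = ensembleSize;
    }

    synchronized void ack(long sid) {
        acked.add(sid); // a Set makes duplicate ACKs from one server idempotent
        if (hasQuorum()) {
            notifyAll();
        }
    }

    synchronized boolean hasQuorum() {
        return acked.size() > ensembleSize / 2;
    }

    synchronized void await() throws InterruptedException {
        while (!hasQuorum()) {
            wait();
        }
    }
}
```

With an ensemble of 3, two distinct sids are enough to release the waiters; this mirrors why the leader can proceed before every follower has finished syncing.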
12. Follower handles UPTODATE (Learner.syncWithLeader)
protected void syncWithLeader(long newLeaderZxid) throws IOException, InterruptedException{
......
case Leader.UPTODATE:
// Pre-ZAB-1.0 leader: no NEWLEADER was sent during sync, so snapshot and set the epoch here
if (isPreZAB1_0) {
zk.takeSnapshot();
self.setCurrentEpoch(newEpoch);
}
self.cnxnFactory.setZooKeeperServer(zk);
break outerLoop;
......
ack.setZxid(ZxidUtils.makeZxid(newEpoch, 0));
// Send the ACK
writePacket(ack, true);
sock.setSoTimeout(self.tickTime * self.syncLimit);
// Start the ZooKeeper server
zk.startup();
self.updateElectionVote(newEpoch);
// When syncing via snapshot, the leader may have made further changes between the start of the
// sync and the arrival of the UPTODATE packet; that data must be logged and committed here.
// After the steps above the follower has finished starting. As a follower it must serve requests
// from clients, and also handle the leader's pings, proposals, and commit requests
if (zk instanceof FollowerZooKeeperServer) {
FollowerZooKeeperServer fzk = (FollowerZooKeeperServer)zk;
for(PacketInFlight p: packetsNotCommitted) {
// Write to the transaction log
fzk.logRequest(p.hdr, p.rec);
}
for(Long zxid: packetsCommitted) {
fzk.commit(zxid);
}
}
......
}
-
case Leader.UPTODATE
1. Set the ServerCnxnFactory (NIOServerCnxnFactory by default) on the LearnerZooKeeperServer, so it can begin accepting client requests;
2. Exit the loop that reads sync data from the leader; -
writePacket(ack, true)
Sends the leader the ACK for the data sync; -
zk.startup()
Starts the ZooKeeperServer; -
fzk.logRequest(p.hdr, p.rec)
Builds a Request and enqueues it on SyncRequestProcessor.queuedRequests so it is written to the transaction log (see ZooKeeper (5): standalone server request processing); -
fzk.commit(zxid)
Puts the Request on the CommitProcessor.committedRequests list (analyzed in detail later with the request-processing flow); -
This completes the follower-side startup flow.
13. Leader handles the sync ACK (LearnerHandler.run)
public void run() {
......
// Sync finished; wait for requests from the follower/observer
while (true) {
qp = new QuorumPacket();
// Read the ACK
ia.readRecord(qp, "packet");
switch (qp.getType()) {
case Leader.ACK:
syncLimitCheck.updateAck(qp.getZxid());
leader.processAck(this.sid, qp.getZxid(), sock.getLocalSocketAddress());
break;
......
}
-
ia.readRecord(qp, "packet")
Reads the ACK (this while loop also handles other requests from the follower/observer; only the startup ACK is analyzed here); -
leader.processAck
outstandingProposals is empty at this point, so it returns immediately; -
This completes the leader-side startup flow.
---------over----------