Zookeeper (7) - Server Cluster Mode - Startup Flow - 2

Continuing from the previous section.

10. Follower receives sync data (Learner.syncWithLeader)

protected void syncWithLeader(long newLeaderZxid) throws IOException, InterruptedException{
    QuorumPacket ack = new QuorumPacket(Leader.ACK, 0, null, null);
    QuorumPacket qp = new QuorumPacket();
    long newEpoch = ZxidUtils.getEpochFromZxid(newLeaderZxid);
    boolean snapshotNeeded = true;
    // read the packet from the leader
    readPacket(qp);
    // packets that have already been committed
    LinkedList<Long> packetsCommitted = new LinkedList<Long>();
    // packets that have not been committed yet
    LinkedList<PacketInFlight> packetsNotCommitted = new LinkedList<PacketInFlight>();
    synchronized (zk) {
        if (qp.getType() == Leader.DIFF) {
            LOG.info("Getting a diff from the leader 0x{}", Long.toHexString(qp.getZxid()));
            snapshotNeeded = false;
        } else if (qp.getType() == Leader.SNAP) {
            LOG.info("Getting a snapshot from leader 0x" + Long.toHexString(qp.getZxid()));
            // The leader is going to dump the database: clear our own database and read the snapshot
            // clear the current in-memory data
            zk.getZKDatabase().clear();
            // deserialize the snapshot streamed from the leader
            zk.getZKDatabase().deserializeSnapshot(leaderIs);
            String signature = leaderIs.readString("signature");
            if (!signature.equals("BenWasHere")) {
                LOG.error("Missing signature. Got " + signature);
                throw new IOException("Missing signature");                   
            }
            // set lastProcessedZxid
            zk.getZKDatabase().setlastProcessedZxid(qp.getZxid());
        } else if (qp.getType() == Leader.TRUNC) {
            // the current server's zxid is greater than the leader's: discard the log entries past the leader's zxid
            //we need to truncate the log to the lastzxid of the leader
            LOG.warn("Truncating log to get in sync with the leader 0x" + Long.toHexString(qp.getZxid()));
            boolean truncated=zk.getZKDatabase().truncateLog(qp.getZxid());
            if (!truncated) {
                // not able to truncate the log
                LOG.error("Not able to truncate the log " + Long.toHexString(qp.getZxid()));
                System.exit(13);
            }
            zk.getZKDatabase().setlastProcessedZxid(qp.getZxid());
        } else {
            LOG.error("Got unexpected packet from leader "
                    + qp.getType() + " exiting ... " );
            System.exit(13);

        }
        zk.createSessionTracker();
        
        long lastQueued = 0;
        boolean isPreZAB1_0 = true;
        boolean writeToTxnLog = !snapshotNeeded;
        outerLoop:
        while (self.isRunning()) {
            // read packets from the leader in a loop
            readPacket(qp);
            switch(qp.getType()) {
            case Leader.PROPOSAL:
                // wrap the header (TxnHeader) and body (Record) into a PacketInFlight
                PacketInFlight pif = new PacketInFlight();
                pif.hdr = new TxnHeader();
                // deserialize the transaction body
                pif.rec = SerializeUtils.deserializeTxn(qp.getData(), pif.hdr);
                if (pif.hdr.getZxid() != lastQueued + 1) {
                    LOG.warn("Got zxid 0x" + Long.toHexString(pif.hdr.getZxid())
                            + " expected 0x" + Long.toHexString(lastQueued + 1));
                }
                lastQueued = pif.hdr.getZxid();
                // add the PacketInFlight to the not-yet-committed list
                packetsNotCommitted.add(pif);
                break;
            case Leader.COMMIT:
                if (!writeToTxnLog) {
                    pif = packetsNotCommitted.peekFirst();
                    // if the zxids differ, the proposals are not contiguous
                    if (pif.hdr.getZxid() != qp.getZxid()) {
                        LOG.warn("Committing " + qp.getZxid() + ", but next proposal is " + pif.hdr.getZxid());
                    } else {
                        // apply the txn to the in-memory DataTree
                        zk.processTxn(pif.hdr, pif.rec);
                        // remove it from the not-committed list
                        packetsNotCommitted.remove();
                    }
                } else {
                    packetsCommitted.add(qp.getZxid());
                }
                break;
            ......
            case Leader.UPTODATE:
                // pre-ZAB-1.0 leader
                if (isPreZAB1_0) {
                    zk.takeSnapshot();
                    self.setCurrentEpoch(newEpoch);
                }
                self.cnxnFactory.setZooKeeperServer(zk);                
                break outerLoop;
            case Leader.NEWLEADER: // Getting NEWLEADER here instead of in discovery 
                // create the updatingEpoch file to mark that a NEWLEADER data sync is in progress; throw if it cannot be created
                File updating = new File(self.getTxnFactory().getSnapDir(), QuorumPeer.UPDATING_EPOCH_FILENAME);
                if (!updating.exists() && !updating.createNewFile()) {
                    throw new IOException("Failed to create " + updating.toString());
                }
                // no snapshot is taken for a DIFF sync
                if (snapshotNeeded) {
                    // the sync is finished, write a snapshot file
                    zk.takeSnapshot();
                }
                // update the currentEpoch file
                self.setCurrentEpoch(newEpoch);
                if (!updating.delete()) {
                    throw new IOException("Failed to delete " + updating.toString());
                }
                writeToTxnLog = true; //Anything after this needs to go to the transaction log, not applied directly in memory
                isPreZAB1_0 = false;
                // send the ACK for NEWLEADER
                writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
                break;
            }
        }
    }
    ack.setZxid(ZxidUtils.makeZxid(newEpoch, 0));
    // send the ack
    writePacket(ack, true);
    ......
}
  • readPacket(qp) reads the QuorumPacket sent by the leader, whose type is SNAP/TRUNC/DIFF;
  • readPacket(qp) then reads the data sent by the leader in a loop: first the sync data, then NEWLEADER;
  • DIFF
    1. case Leader.PROPOSAL deserializes the TxnHeader/Record, wraps them into a PacketInFlight and adds it to the packetsNotCommitted list;
    2. case Leader.COMMIT adds the current QuorumPacket's zxid to the packetsCommitted list;
  • SNAP
    1. clear the in-memory ZKDatabase;
    2. deserialize the snapshot streamed from the leader;
    3. verify the signature;
  • TRUNC
    1. first clear the in-memory ZKDatabase;
    2. delete the part of the txnlog with zxids greater than the leader's zxid;
    3. reload the database (loadDataBase);
  • case Leader.NEWLEADER: after the data sync the leader sends NEWLEADER
    1. create the updatingEpoch file and delete it after setting the current epoch; QuorumPeer.loadDataBase() uses this file to detect the case where the server crashed after taking the snapshot but before setting the current epoch;
    2. when the sync is not a DIFF, take a snapshot once the sync finishes;
    3. update the epoch, and delete the updatingEpoch file once the update succeeds;
    4. send the ACK for NEWLEADER back to the leader;
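
The epoch handling in syncWithLeader relies on the zxid layout: the high 32 bits of a zxid hold the epoch and the low 32 bits hold a counter, which is what ZxidUtils.getEpochFromZxid and ZxidUtils.makeZxid work with. Below is a minimal standalone sketch of that bit arithmetic (illustrative only, not the actual ZxidUtils class):

public class ZxidLayoutSketch {
    // zxid layout: | 32-bit epoch | 32-bit counter |
    static long getEpochFromZxid(long zxid) {
        return zxid >> 32L;
    }

    static long getCounterFromZxid(long zxid) {
        return zxid & 0xffffffffL;
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL);
    }

    public static void main(String[] args) {
        long zxid = makeZxid(5, 42);
        System.out.println(Long.toHexString(zxid));      // 50000002a
        System.out.println(getEpochFromZxid(zxid));      // 5
        System.out.println(getCounterFromZxid(zxid));    // 42
    }
}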

11. Leader processes the NEWLEADER-ACK (LearnerHandler.run)

public void run() {
    ......
    qp = new QuorumPacket();
    // read the ack
    ia.readRecord(qp, "packet");
    if(qp.getType() != Leader.ACK){
        LOG.error("Next packet was supposed to be an ACK");
        return;
    }
    LOG.info("Received NEWLEADER-ACK message from " + getSid());
    // wait for ACKs from a quorum of followers
    leader.waitForNewLeaderAck(getSid(), qp.getZxid(), getLearnerType());
    syncLimitCheck.start();
    // now that the ack has been processed expect the syncLimit
    sock.setSoTimeout(leader.self.tickTime * leader.self.syncLimit);
    // wait until the leader has finished starting before continuing
    synchronized(leader.zk){
        while(!leader.zk.isRunning() && !this.isInterrupted()){
            leader.zk.wait(20);
        }
    }
    // once the leader has started, send UPTODATE
    queuedPackets.add(new QuorumPacket(Leader.UPTODATE, -1, null, null));
    ......
}
  • ia.readRecord(qp, "packet") reads the ACK for NEWLEADER;
  • leader.waitForNewLeaderAck(getSid(), qp.getZxid(), getLearnerType()) blocks until NEWLEADER-ACKs have been received from a quorum of servers (a simplified sketch of this wait-for-quorum pattern follows this list);
  • leader.zk.wait(20): Leader.lead() calls startZkServer(); once the zk server has started its state becomes RUNNING, otherwise the handler keeps waiting here;
  • queuedPackets.add(new QuorumPacket(Leader.UPTODATE, -1, null, null)) sends UPTODATE, telling the follower that the leader has finished starting;
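
waitForNewLeaderAck blocks each LearnerHandler thread until NEWLEADER ACKs have arrived from a majority of server ids. Here is a simplified, self-contained sketch of that wait-for-quorum pattern, using a hypothetical QuorumAckTracker class (the real Leader implementation additionally checks the zxid and the learner type):

import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of blocking until a majority of server ids have acked.
public class QuorumAckTracker {
    private final int clusterSize;
    private final Set<Long> ackedSids = new HashSet<Long>();

    public QuorumAckTracker(int clusterSize) {
        this.clusterSize = clusterSize;
    }

    // Called by each handler thread when it reads an ACK from its learner.
    public synchronized void ackAndWait(long sid, long timeoutMs) throws InterruptedException {
        ackedSids.add(sid);
        notifyAll();
        long deadline = System.currentTimeMillis() + timeoutMs;
        // Block until a majority has acked or the timeout elapses.
        while (!hasQuorum() && System.currentTimeMillis() < deadline) {
            wait(Math.max(1, deadline - System.currentTimeMillis()));
        }
        if (!hasQuorum()) {
            throw new InterruptedException("Timed out waiting for a NEWLEADER quorum");
        }
    }

    private synchronized boolean hasQuorum() {
        return ackedSids.size() > clusterSize / 2;
    }
}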

12. Follower processes UPTODATE (Learner.syncWithLeader)

protected void syncWithLeader(long newLeaderZxid) throws IOException, InterruptedException{
    ......
    case Leader.UPTODATE:
        // pre-ZAB-1.0 leader
        if (isPreZAB1_0) {
            zk.takeSnapshot();
            self.setCurrentEpoch(newEpoch);
        }
        self.cnxnFactory.setZooKeeperServer(zk);                
        break outerLoop;
    ......
    ack.setZxid(ZxidUtils.makeZxid(newEpoch, 0));
    // send the ack
    writePacket(ack, true);
    sock.setSoTimeout(self.tickTime * self.syncLimit);
    // start the ZooKeeper server
    zk.startup();
    self.updateElectionVote(newEpoch);
    // When syncing with a snapshot, the leader may have made further changes between the start of the
    // sync and the follower receiving the UPTODATE packet; those changes have to be committed here.
    // After these steps the follower has finished starting. As a follower it handles requests from
    // clients on one side, and the leader's heartbeats, proposals and commit requests on the other.
    if (zk instanceof FollowerZooKeeperServer) {
        FollowerZooKeeperServer fzk = (FollowerZooKeeperServer)zk;
        for(PacketInFlight p: packetsNotCommitted) {
            // write to the transaction log
            fzk.logRequest(p.hdr, p.rec);
        }
        for(Long zxid: packetsCommitted) {
            fzk.commit(zxid);
        }
    } 
    ......
}
  • case Leader.UPTODATE
    1. the LearnerZooKeeperServer is handed to the ServerCnxnFactory (NIOServerCnxnFactory by default), so it can start accepting client requests;
    2. the loop reading data from the leader ends;
  • writePacket(ack, true) sends the follower's ACK for the data sync back to the leader;
  • zk.startup() starts the ZooKeeperServer;
  • fzk.logRequest(p.hdr, p.rec) builds a Request, enqueues it into SyncRequestProcessor.queuedRequests and writes it to the transaction log (see Zookeeper (5) - Server Standalone Mode - Request Handling);
  • fzk.commit(zxid) puts the Request into the CommitProcessor.committedRequests list (analyzed in detail later together with the request-handling flow); a simplified model of this logRequest/commit split is sketched after this list;
    At this point the follower-side startup flow is complete.
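
To make the logRequest/commit split above concrete, here is a small illustrative model (a hypothetical PendingCommitModel class, not FollowerZooKeeperServer itself): a proposal is first recorded as pending under its zxid, and is only moved to the committed queue once commit(zxid) arrives for it.

import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

// Illustrative two-stage model of the proposal/commit split on a follower.
public class PendingCommitModel {
    private final Map<Long, String> pendingTxns = new LinkedHashMap<Long, String>();
    private final Queue<String> committedTxns = new LinkedList<String>();

    // Analogous to logRequest: record the proposal, keyed by its zxid.
    public void logRequest(long zxid, String txn) {
        pendingTxns.put(zxid, txn);
    }

    // Analogous to commit: promote the pending txn with this zxid to the committed queue.
    public void commit(long zxid) {
        String txn = pendingTxns.remove(zxid);
        if (txn == null) {
            throw new IllegalStateException("Committing unknown zxid 0x" + Long.toHexString(zxid));
        }
        committedTxns.add(txn);
    }

    public Queue<String> getCommittedTxns() {
        return committedTxns;
    }
}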

13. Leader processes the ACK (LearnerHandler.run)

public void run() {
    ......
    // sync finished; wait for requests from the follower or observer
    while (true) {
        qp = new QuorumPacket();
        // read the ack
        ia.readRecord(qp, "packet");
        switch (qp.getType()) {
        case Leader.ACK:
            syncLimitCheck.updateAck(qp.getZxid());
            leader.processAck(this.sid, qp.getZxid(), sock.getLocalSocketAddress());
            break;
    ......
}
  • ia.readRecord(qp, "packet") reads the ACK (this while loop also handles other requests from followers and observers; only the ACK from the startup flow is analyzed here);
  • leader.processAck: outstandingProposals is empty at this point, so it returns immediately (a simplified sketch of the ack-counting pattern follows this list);
    At this point the leader-side startup flow is complete.
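
For context, processAck follows a count-the-acks pattern: look up the outstanding proposal by zxid, record the ack, and commit once a quorum of servers has acked; with outstandingProposals empty (as during startup) there is nothing to do. Below is a simplified standalone sketch of that pattern, using a hypothetical AckCounterSketch class rather than the actual Leader code:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative ack-counting model, loosely following the shape of Leader.processAck.
public class AckCounterSketch {
    static class Proposal {
        final Set<Long> ackSids = new HashSet<Long>();
    }

    private final int clusterSize;
    private final Map<Long, Proposal> outstandingProposals = new HashMap<Long, Proposal>();

    public AckCounterSketch(int clusterSize) {
        this.clusterSize = clusterSize;
    }

    public void propose(long zxid) {
        outstandingProposals.put(zxid, new Proposal());
    }

    // Returns true if this ack completed a quorum and the proposal can be committed.
    public boolean processAck(long sid, long zxid) {
        Proposal p = outstandingProposals.get(zxid);
        if (p == null) {
            // Nothing outstanding for this zxid (the startup case in step 13): ignore.
            return false;
        }
        p.ackSids.add(sid);
        if (p.ackSids.size() > clusterSize / 2) {
            outstandingProposals.remove(zxid);
            return true; // quorum reached: the leader would now send COMMIT
        }
        return false;
    }
}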
    ---------over----------
