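ZooKeeper Source Reading 41: Leader and Learner Interaction: LearnerHandler Source Analysis

Summary

To keep the cluster communicating in real time and to stay in control of every Follower/Observer, the Leader keeps a long-lived TCP connection to each Follower/Observer server and creates an entity called LearnerHandler for each one. The LearnerHandler manages its Learner server: it handles the network traffic between that Follower/Observer and the Leader, including data synchronization, request forwarding, and voting on Proposals. The Leader holds the LearnerHandler for every Follower/Observer.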

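Contents

Introduction
Inner class
  SyncLimitCheck: bounds how long the leader waits for this learner to ACK a proposal
Fields
  tickOfNextAckDeadline: deadline for the next ACK (counted in ticks, not wall-clock time)
Methods
  Constructor
  Sending packets
  Validation and shutdown
    shutdown: closes the handler
    ping: checks whether a proposal sent to the learner has timed out; if not, sends a PING
    synced: whether the learner is in sync, judged from tickOfNextAckDeadline and the thread state
  Thread method run: parse the Learner info, send the Leader state, complete data sync, start the Leader server, then interact with the Learner normally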

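Introduction

LearnerHandler handles the interaction between the Leader and a Learner.
The Leader keeps one long-lived connection per Learner, and a dedicated LearnerHandler thread talks to that one Learner.
In other words, Learners and LearnerHandlers map one to one.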

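Inner class

SyncLimitCheck: bounds how long the leader waits for this learner to ACK a proposal

When the Leader sends a proposal, it records the send time and the zxid.
When the Leader receives the matching ACK, it clears the record for that zxid.
On each check, it compares the current time with the send time of the oldest proposal that has not been ACKed yet, to see whether the wait has timed out.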

    private class SyncLimitCheck {
        private boolean started = false;
        private long currentZxid = 0;//zxid of the oldest proposal sent but not yet ACKed
        private long currentTime = 0;//send time of the oldest proposal sent but not yet ACKed
        private long nextZxid = 0;//zxid of the newest proposal sent but not yet ACKed
        private long nextTime = 0;//send time of the newest proposal sent but not yet ACKed

        public synchronized void start() {//enable the sync timeout check
            started = true;
        }

        public synchronized void updateProposal(long zxid, long time) {//called when a proposal is sent: record its time
            if (!started) {
                return;
            }
            if (currentTime == 0) {//nothing outstanding yet: this becomes the oldest record
                currentTime = time;
                currentZxid = zxid;
            } else {//something is already outstanding: remember this one as the newest
                nextTime = time;
                nextZxid = zxid;
            }
        }

        public synchronized void updateAck(long zxid) {//called when the Learner ACKs zxid: update the bookkeeping
             if (currentZxid == zxid) {//ACK for the oldest outstanding proposal
                 currentTime = nextTime;//promote the newest record to oldest
                 currentZxid = nextZxid;
                 nextTime = 0;
                 nextZxid = 0;
             } else if (nextZxid == zxid) {//the older ACK is still missing, but the newer one arrived
                 LOG.warn("ACK for " + zxid + " received before ACK for " + currentZxid + "!!!!");
                 nextTime = 0;
                 nextZxid = 0;
             }
        }

        public synchronized boolean check(long time) {//returns true if the wait has NOT timed out
            if (currentTime == 0) {
                return true;
            } else {
                long msDelay = (time - currentTime) / 1000000;//ms elapsed since the oldest un-ACKed proposal was sent (inputs are nanoseconds)
                return (msDelay < (leader.self.tickTime * leader.self.syncLimit));
            }
        }
    };
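
To make the bookkeeping easier to poke at, here is a minimal standalone sketch of the same idea. It is not the ZooKeeper class: the name AckTimeoutSketch is made up, the started flag is dropped, and the limit is passed in as a plain number of milliseconds standing in for tickTime * syncLimit.

```java
import java.util.concurrent.TimeUnit;

/** Standalone sketch of the ACK-timeout bookkeeping (hypothetical class, not ZooKeeper code). */
final class AckTimeoutSketch {
    private final long limitMs;                    // stands in for tickTime * syncLimit
    private long currentZxid = 0, currentTime = 0; // oldest un-ACKed proposal
    private long nextZxid = 0, nextTime = 0;       // newest un-ACKed proposal

    AckTimeoutSketch(long limitMs) {
        this.limitMs = limitMs;
    }

    synchronized void updateProposal(long zxid, long nanos) {
        if (currentTime == 0) {        // nothing outstanding yet
            currentTime = nanos;
            currentZxid = zxid;
        } else {                       // keep only the newest extra record
            nextTime = nanos;
            nextZxid = zxid;
        }
    }

    synchronized void updateAck(long zxid) {
        if (currentZxid == zxid) {     // oldest one ACKed: promote the newest
            currentTime = nextTime;
            currentZxid = nextZxid;
            nextTime = 0;
            nextZxid = 0;
        } else if (nextZxid == zxid) { // out-of-order ACK: just forget the newest
            nextTime = 0;
            nextZxid = 0;
        }
    }

    /** Returns true while the oldest outstanding proposal is still within the limit. */
    synchronized boolean check(long nanos) {
        if (currentTime == 0) {
            return true;
        }
        return TimeUnit.NANOSECONDS.toMillis(nanos - currentTime) < limitMs;
    }

    public static void main(String[] args) {
        AckTimeoutSketch c = new AckTimeoutSketch(4000); // e.g. tickTime=2000 ms, syncLimit=2
        long ms = 1_000_000L;                            // nanoseconds per millisecond
        c.updateProposal(7L, 1 * ms);
        System.out.println(c.check(3000 * ms)); // still within the limit
        System.out.println(c.check(5000 * ms)); // past the limit
        c.updateAck(7L);
        System.out.println(c.check(5000 * ms)); // nothing outstanding any more
    }
}
```

Note that only two records are kept (oldest and newest), so the check is really about the oldest outstanding proposal; anything in between is not tracked individually.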

Fields

Figure: LearnerHandler fields

Apart from the IO and sock fields, the source is as follows

    final Leader leader;//the Leader this handler belongs to

    /** Deadline for receiving the next ack. If we are bootstrapping then
     * it's based on the initLimit, if we are done bootstrapping it's based
     * on the syncLimit. Once the deadline is past this learner should
     * be considered no longer "sync'd" with the leader. */
    volatile long tickOfNextAckDeadline;//deadline for the next ACK; one rule while bootstrapping (data sync), another once bootstrapping is done (normal interaction)
    
    /**
     * ZooKeeper server identifier of this learner
     */
    protected long sid = 0;//sid of this learner

    protected int version = 0x1;//protocol version of this learner

    final LinkedBlockingQueue<QuorumPacket> queuedPackets =
        new LinkedBlockingQueue<QuorumPacket>();//queue of packets waiting to be sent

    private SyncLimitCheck syncLimitCheck = new SyncLimitCheck();//proposal/ACK timeout check

    final QuorumPacket proposalOfDeath = new QuorumPacket();//sentinel packet: when dequeued, it stops the packet-sending thread

    private LearnerType  learnerType = LearnerType.PARTICIPANT;//default learner type (i.e. Follower); may also be set to OBSERVER

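Methods

Figure: LearnerHandler methods

As the method list shows, only a few of them really matter.

Constructor

The caller is Leader, which a later article covers.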

    LearnerHandler(Socket sock, Leader leader) throws IOException {//sock is already connected
        super("LearnerHandler-" + sock.getRemoteSocketAddress());
        this.sock = sock;
        this.leader = leader;
        leader.addLearnerHandler(this);//register in the leader's set of LearnerHandlers
    }

Sending packets

queuePacket

Adds a packet to the asynchronous send queue

    void queuePacket(QuorumPacket p) {
        queuedPackets.add(p);
    }

sendPackets

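Keeps consuming the send queue: it breaks out when it sees proposalOfDeath, and records the send time when it sees a PROPOSAL

    private void sendPackets() throws InterruptedException {//consume queuedPackets and send them until the proposalOfDeath packet is dequeued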
        long traceMask = ZooTrace.SERVER_PACKET_TRACE_MASK;
        while (true) {
            try {
                QuorumPacket p;
                p = queuedPackets.poll();
                if (p == null) {
                    bufferedOutput.flush();
                    p = queuedPackets.take();
                }

                if (p == proposalOfDeath) {//shutdown() was called
                    // Packet of death!
                    break;
                }
                if (p.getType() == Leader.PING) {
                    traceMask = ZooTrace.SERVER_PING_TRACE_MASK;
                }
                if (p.getType() == Leader.PROPOSAL) {
                    syncLimitCheck.updateProposal(p.getZxid(), System.nanoTime());//record the send time of this proposal
                }
                if (LOG.isTraceEnabled()) {
                    ZooTrace.logQuorumPacket(LOG, traceMask, 'o', p);
                }
                oa.writeRecord(p, "packet");
            } catch (IOException e) {
                if (!sock.isClosed()) {
                    LOG.warn("Unexpected exception at " + this, e);
                    try {
                        // this will cause everything to shutdown on
                        // this learner handler and will help notify
                        // the learner/observer instantaneously
                        sock.close();
                    } catch(IOException ie) {
                        LOG.warn("Error closing socket for handler " + this, ie);
                    }
                }
                break;
            }
        }
    }
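
The loop above is the classic "poison pill" consumer. Below is a minimal sketch of just that pattern, with strings instead of QuorumPacket and a list instead of a socket; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

/** Poison-pill consumer sketch (hypothetical names, not ZooKeeper code). */
final class PoisonPillSketch {
    // Sentinel compared by identity, in the same spirit as proposalOfDeath.
    static final String POISON = new String("poison");

    /** Drains the queue until the sentinel is dequeued and returns what was "sent". */
    static List<String> drain(LinkedBlockingQueue<String> queue) throws InterruptedException {
        List<String> sent = new ArrayList<>();
        while (true) {
            String p = queue.poll();   // non-blocking attempt first
            if (p == null) {
                // The real loop flushes the buffered output at this point,
                // so nothing sits in the buffer while the thread is parked.
                p = queue.take();      // block until something arrives
            }
            if (p == POISON) {         // identity check, not equals()
                break;
            }
            sent.add(p);
        }
        return sent;
    }

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.add("PROPOSAL");
        q.add("COMMIT");
        q.add(POISON);
        q.add("never sent");           // queued after the pill, so it is dropped
        System.out.println(drain(q));
    }
}
```

The poll-then-take combination is what lets the sender batch writes while the queue is busy and flush only when it is about to go idle.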

Validation and shutdown

shutdown

Closes the current handler and its sock, and makes the asynchronous sendPackets thread eventually stop

    public void shutdown() {
        // Send the packet of death
        try {
            queuedPackets.put(proposalOfDeath);
        } catch (InterruptedException e) {
            LOG.warn("Ignoring unexpected exception", e);
        }
        try {
            if (sock != null && !sock.isClosed()) {
                sock.close();
            }
        } catch (IOException e) {
            LOG.warn("Ignoring unexpected exception during socket close", e);
        }
        this.interrupt();
        leader.removeLearnerHandler(this);
    }

ping

Sends a PING. In essence it first checks whether a proposal sent to the learner has timed out (no ACK within the allowed time)

    public void ping() {
        long id;
        if (syncLimitCheck.check(System.nanoTime())) {//not timed out yet
            synchronized(leader) {
                id = leader.lastProposed;
            }
            QuorumPacket ping = new QuorumPacket(Leader.PING, id, null, null);
            queuePacket(ping);
        } else {//timed out: close this handler
            LOG.warn("Closing connection to peer due to transaction timeout.");
            shutdown();//close this handler and its sock, and let the sendPackets thread eventually stop
        }
    }

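synced

Whether the learner is still in sync

    public boolean synced() {
        return isAlive()
        && leader.self.tick <= tickOfNextAckDeadline;//the thread is alive and the current tick has not passed the deadline
    }

How tickOfNextAckDeadline is set is covered in the Thoughts section below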

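Thread method

run is the core method of this class. It performs the data sync with the Learner at startup and, once sync is complete, handles the normal interaction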

    public void run() {
        try {
            tickOfNextAckDeadline = leader.self.tick
                    + leader.self.initLimit + leader.self.syncLimit;//initial value: the leader's current tick (leader.self.tick) plus initLimit + syncLimit

            ia = BinaryInputArchive.getArchive(new BufferedInputStream(sock
                    .getInputStream()));
            bufferedOutput = new BufferedOutputStream(sock.getOutputStream());
            oa = BinaryOutputArchive.getArchive(bufferedOutput);

            QuorumPacket qp = new QuorumPacket();
            ia.readRecord(qp, "packet");
            if(qp.getType() != Leader.FOLLOWERINFO && qp.getType() != Leader.OBSERVERINFO){
                LOG.error("First packet " + qp.toString()
                        + " is not FOLLOWERINFO or OBSERVERINFO!");
                return;
            }
            byte learnerInfoData[] = qp.getData();//the LearnerInfo sent by the learner, which carries its sid
            if (learnerInfoData != null) {
                if (learnerInfoData.length == 8) {
                    ByteBuffer bbsid = ByteBuffer.wrap(learnerInfoData);
                    this.sid = bbsid.getLong();
                } else {
                    LearnerInfo li = new LearnerInfo();
                    ByteBufferInputStream.byteBuffer2Record(ByteBuffer.wrap(learnerInfoData), li);
                    this.sid = li.getServerid();
                    this.version = li.getProtocolVersion();
                }
            } else {
                this.sid = leader.followerCounter.getAndDecrement();
            }

            LOG.info("Follower sid: " + sid + " : info : "
                    + leader.self.quorumPeers.get(sid));
                        
            if (qp.getType() == Leader.OBSERVERINFO) {
                  learnerType = LearnerType.OBSERVER;
            }            
            
            long lastAcceptedEpoch = ZxidUtils.getEpochFromZxid(qp.getZxid());//the learner's latest epoch
            
            long peerLastZxid;
            StateSummary ss = null;
            long zxid = qp.getZxid();
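            long newEpoch = leader.getEpochToPropose(this.getSid(), lastAcceptedEpoch);//6. if the learner's epoch is higher than the leader's, the leader raises its own
            
            if (this.getVersion() < 0x10000) {//the learner speaks the old protocol version (getVersion() is the learner's version)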
                // we are going to have to extrapolate the epoch information
                long epoch = ZxidUtils.getEpochFromZxid(zxid);
                ss = new StateSummary(epoch, zxid);
                // fake the message
                leader.waitForEpochAck(this.getSid(), ss);
            } else {//the learner speaks the new protocol version
                byte ver[] = new byte[4];
                ByteBuffer.wrap(ver).putInt(0x10000);
                QuorumPacket newEpochPacket = new QuorumPacket(Leader.LEADERINFO, ZxidUtils.makeZxid(newEpoch, 0), ver, null); //7. send the leader state as a LEADERINFO packet
                oa.writeRecord(newEpochPacket, "packet");
                bufferedOutput.flush();
                QuorumPacket ackEpochPacket = new QuorumPacket();
                ia.readRecord(ackEpochPacket, "packet");//9. receive the learner's ACKEPOCH
                if (ackEpochPacket.getType() != Leader.ACKEPOCH) {
                    LOG.error(ackEpochPacket.toString()
                            + " is not ACKEPOCH");
                    return;
                }
                ByteBuffer bbepoch = ByteBuffer.wrap(ackEpochPacket.getData());
                ss = new StateSummary(bbepoch.getInt(), ackEpochPacket.getZxid());
                leader.waitForEpochAck(this.getSid(), ss);
            }
            peerLastZxid = ss.getLastZxid();
            
            /* the default to send to the follower */
            int packetToSend = Leader.SNAP;
            long zxidToSend = 0;
            long leaderLastZxid = 0;
            /** the packets that the follower needs to get updates from **/
            long updates = peerLastZxid;
            
            /* we are sending the diff check if we have proposals in memory to be able to 
             * send a diff to the 
             */ 
            ReentrantReadWriteLock lock = leader.zk.getZKDatabase().getLogLock();
            ReadLock rl = lock.readLock();
            try {
                rl.lock();        
                final long maxCommittedLog = leader.zk.getZKDatabase().getmaxCommittedLog();//largest zxid in the in-memory committed log
                final long minCommittedLog = leader.zk.getZKDatabase().getminCommittedLog();//smallest zxid in the in-memory committed log
                LOG.info("Synchronizing with Follower sid: " + sid
                        +" maxCommittedLog=0x"+Long.toHexString(maxCommittedLog)
                        +" minCommittedLog=0x"+Long.toHexString(minCommittedLog)
                        +" peerLastZxid=0x"+Long.toHexString(peerLastZxid));

                LinkedList<Proposal> proposals = leader.zk.getZKDatabase().getCommittedLog();//the committed proposals; every packet has type Leader.PROPOSAL

                if (peerLastZxid == leader.zk.getZKDatabase().getDataTreeLastProcessedZxid()) {
                    // Follower is already sync with us, send empty diff
                    LOG.info("leader and follower are in sync, zxid=0x{}",
                            Long.toHexString(peerLastZxid));
                    packetToSend = Leader.DIFF;
                    zxidToSend = peerLastZxid;//the learner is already in sync: still send DIFF, but with the learner's own zxid, which amounts to an empty DIFF
                } else if (proposals.size() != 0) {
                    LOG.debug("proposal size is {}", proposals.size());
                    if ((maxCommittedLog >= peerLastZxid)
                            && (minCommittedLog <= peerLastZxid)) {//the learner's zxid lies within the leader's [minCommittedLog, maxCommittedLog]
                        LOG.debug("Sending proposals to follower");

                        // as we look through proposals, this variable keeps track of previous
                        // proposal Id.
                        long prevProposalZxid = minCommittedLog;

                        // Keep track of whether we are about to send the first packet.
                        // Before sending the first packet, we have to tell the learner
                        // whether to expect a trunc or a diff
                        boolean firstPacket=true;

                        // If we are here, we can use committedLog to sync with
                        // follower. Then we only need to decide whether to
                        // send trunc or not
                        packetToSend = Leader.DIFF;//DIFF by default
                        zxidToSend = maxCommittedLog;

                        for (Proposal propose: proposals) {
                            // skip the proposals the peer already has
                            if (propose.packet.getZxid() <= peerLastZxid) {//the learner already has this committed proposal, so skip it
                                prevProposalZxid = propose.packet.getZxid();
                                continue;
                            } else {
                                // If we are sending the first packet, figure out whether to trunc
                                // in case the follower has some proposals that the leader doesn't
                                if (firstPacket) {//about to send the first packet
                                    firstPacket = false;
                                    // Does the peer have some proposals that the leader hasn't seen yet
                                    if (prevProposalZxid < peerLastZxid) {//the learner holds proposals the leader does not know about (normally prevProposalZxid == peerLastZxid)
                                        // send a trunc message before sending the diff
                                        packetToSend = Leader.TRUNC;//make the learner roll back
                                        zxidToSend = prevProposalZxid;
                                        updates = zxidToSend;//forward updates starting from the rollback point
                                    }
                                }
                                queuePacket(propose.packet);//queue the PROPOSAL
                                QuorumPacket qcommit = new QuorumPacket(Leader.COMMIT, propose.packet.getZxid(),
                                        null, null);
                                queuePacket(qcommit);//queue a COMMIT for that PROPOSAL so the learner applies it
                            }
                        }
                    } else if (peerLastZxid > maxCommittedLog) {//the learner's zxid is ahead of the leader's: make the learner roll back
                        LOG.debug("Sending TRUNC to follower zxidToSend=0x{} updates=0x{}",
                                Long.toHexString(maxCommittedLog),
                                Long.toHexString(updates));

                        packetToSend = Leader.TRUNC;
                        zxidToSend = maxCommittedLog;
                        updates = zxidToSend;
                    } else {
                        LOG.warn("Unhandled proposal scenario");
                    }
                } else {
                    // just let the state transfer happen
                    LOG.debug("proposals is empty");
                }               

                LOG.info("Sending " + Leader.getPacketType(packetToSend));
                leaderLastZxid = leader.startForwarding(this, updates);

            } finally {
                rl.unlock();
            }

             QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER,
                    ZxidUtils.makeZxid(newEpoch, 0), null, null);//build the NEWLEADER packet; it tells the learner that the leader has finished sending the data it needs to sync
             if (getVersion() < 0x10000) {
                oa.writeRecord(newLeaderQP, "packet");
            } else {
                queuedPackets.add(newLeaderQP);//add to the send queue
            }
            bufferedOutput.flush();//flush; NEWLEADER goes out here only on the old-protocol path, where it was written straight to oa
            //Need to set the zxidToSend to the latest zxid
            if (packetToSend == Leader.SNAP) {//for SNAP sync, use the leader's last processed zxid
                zxidToSend = leader.zk.getZKDatabase().getDataTreeLastProcessedZxid();
            }
            oa.writeRecord(new QuorumPacket(packetToSend, zxidToSend, null, null), "packet");//tell the learner how to sync
            bufferedOutput.flush();
            
            /* if we are not truncating or sending a diff just send a snapshot */
            if (packetToSend == Leader.SNAP) {//SNAP was announced, so the learner expects a snapshot-based sync
                LOG.info("Sending snapshot last zxid of peer is 0x"
                        + Long.toHexString(peerLastZxid) + " " 
                        + " zxid of leader is 0x"
                        + Long.toHexString(leaderLastZxid)
                        + "sent zxid of db as 0x" 
                        + Long.toHexString(zxidToSend));
                // Dump data to peer
                leader.zk.getZKDatabase().serializeSnapshot(oa);//SNAP sync simply sends the serialized content of the current db
                oa.writeString("BenWasHere", "signature");//followed by a fixed signature
            }
            bufferedOutput.flush();
            
            // Start sending packets
            new Thread() {
                public void run() {
                    Thread.currentThread().setName(
                            "Sender-" + sock.getRemoteSocketAddress());
                    try {
                        sendPackets();//keep sending packets until proposalOfDeath is dequeued
                    } catch (InterruptedException e) {
                        LOG.warn("Unexpected interruption",e);
                    }
                }
            }.start();//start the sender thread
            
            /*
             * Have to wait for the first ACK, wait until 
             * the leader is ready, and only then we can
             * start processing messages.
             */
            qp = new QuorumPacket();
            ia.readRecord(qp, "packet");
            if(qp.getType() != Leader.ACK){//a Learner that received NEWLEADER must reply with an ACK
                LOG.error("Next packet was supposed to be an ACK");
                return;
            }
            LOG.info("Received NEWLEADER-ACK message from " + getSid());
            leader.waitForNewLeaderAck(getSid(), qp.getZxid(), getLearnerType());//wait until a quorum of participants has ACKed

            syncLimitCheck.start();//start the sync timeout check
            
            // now that the ack has been processed expect the syncLimit
            sock.setSoTimeout(leader.self.tickTime * leader.self.syncLimit);//read timeout for the request phase: tickTime * syncLimit

            /*
             * Wait until leader starts up
             */
            synchronized(leader.zk){
                while(!leader.zk.isRunning() && !this.isInterrupted()){
                    leader.zk.wait(20);
                }
            }
            // Mutation packets will be queued during the serialize,
            // so we need to mark when the peer can actually start
            // using the data
            //
            queuedPackets.add(new QuorumPacket(Leader.UPTODATE, -1, null, null));//send UPTODATE: a quorum has ACKed NEWLEADER and the leader is running

            while (true) {//normal interaction: handle the learner's requests and so on
                qp = new QuorumPacket();
                ia.readRecord(qp, "packet");

                long traceMask = ZooTrace.SERVER_PACKET_TRACE_MASK;
                if (qp.getType() == Leader.PING) {
                    traceMask = ZooTrace.SERVER_PING_TRACE_MASK;
                }
                if (LOG.isTraceEnabled()) {
                    ZooTrace.logQuorumPacket(LOG, traceMask, 'i', qp);
                }
                tickOfNextAckDeadline = leader.self.tick + leader.self.syncLimit;


                ByteBuffer bb;
                long sessionId;
                int cxid;
                int type;

                switch (qp.getType()) {
                case Leader.ACK:
                    if (this.learnerType == LearnerType.OBSERVER) {
                        if (LOG.isDebugEnabled()) {
                            LOG.debug("Received ACK from Observer  " + this.sid);
                        }
                    }
                    syncLimitCheck.updateAck(qp.getZxid());//update the ACK bookkeeping for this proposal
                    leader.processAck(this.sid, qp.getZxid(), sock.getLocalSocketAddress());
                    break;
                case Leader.PING:
                    // Process the touches
                    ByteArrayInputStream bis = new ByteArrayInputStream(qp
                            .getData());
                    DataInputStream dis = new DataInputStream(bis);
                    while (dis.available() > 0) {
                        long sess = dis.readLong();
                        int to = dis.readInt();
                        leader.zk.touch(sess, to);//session management: keep the session alive
                    }
                    break;
                case Leader.REVALIDATE:
                    bis = new ByteArrayInputStream(qp.getData());
                    dis = new DataInputStream(bis);
                    long id = dis.readLong();
                    int to = dis.readInt();
                    ByteArrayOutputStream bos = new ByteArrayOutputStream();
                    DataOutputStream dos = new DataOutputStream(bos);
                    dos.writeLong(id);
                    boolean valid = leader.zk.touch(id, to);
                    if (valid) {
                        try {
                            //set the session owner
                            // as the follower that
                            // owns the session
                            leader.zk.setOwner(id, this);//set the owner to this LearnerHandler
                        } catch (SessionExpiredException e) {
                            LOG.error("Somehow session " + Long.toHexString(id) + " expired right after being renewed! (impossible)", e);
                        }
                    }
                    if (LOG.isTraceEnabled()) {
                        ZooTrace.logTraceMessage(LOG,
                                                 ZooTrace.SESSION_TRACE_MASK,
                                                 "Session 0x" + Long.toHexString(id)
                                                 + " is valid: "+ valid);
                    }
                    dos.writeBoolean(valid);
                    qp.setData(bos.toByteArray());//reply with whether the session is valid
                    queuedPackets.add(qp);
                    break;
                case Leader.REQUEST:                    
                    bb = ByteBuffer.wrap(qp.getData());
                    sessionId = bb.getLong();
                    cxid = bb.getInt();
                    type = bb.getInt();
                    bb = bb.slice();
                    Request si;
                    if(type == OpCode.sync){
                        si = new LearnerSyncRequest(this, sessionId, cxid, type, bb, qp.getAuthinfo());
                    } else {
                        si = new Request(null, sessionId, cxid, type, bb, qp.getAuthinfo());
                    }
                    si.setOwner(this);
                    leader.zk.submitRequest(si);//submit the request
                    break;
                default:
                    LOG.warn("unexpected quorum packet, type: {}", packetToString(qp));
                    break;
                }
            }
        } catch (IOException e) {
            if (sock != null && !sock.isClosed()) {
                LOG.error("Unexpected exception causing shutdown while sock "
                        + "still open", e);
                //close the socket to make sure the 
                //other side can see it being close
                try {
                    sock.close();
                } catch(IOException ie) {
                    // do nothing
                }
            }
        } catch (InterruptedException e) {
            LOG.error("Unexpected exception causing shutdown", e);
        } finally {
            LOG.warn("******* GOODBYE " 
                    + (sock != null ? sock.getRemoteSocketAddress() : "")
                    + " ********");
            shutdown();
        }
    }
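
The REQUEST branch above reads a fixed header from the packet data: a long session id, an int cxid, an int type, and then the rest of the buffer as the request body. A small sketch of that layout follows; the encode helper and the sample values are made up for illustration, only the field order comes from the code above.

```java
import java.nio.ByteBuffer;

/** Sketch of the REQUEST payload layout read in run() (hypothetical helper class). */
final class RequestHeaderSketch {

    /** Builds a payload as sessionId (8 bytes) + cxid (4) + type (4) + body. */
    static byte[] encode(long sessionId, int cxid, int type, byte[] body) {
        ByteBuffer bb = ByteBuffer.allocate(8 + 4 + 4 + body.length);
        bb.putLong(sessionId).putInt(cxid).putInt(type).put(body);
        return bb.array();
    }

    public static void main(String[] args) {
        // Arbitrary sample values, not taken from a real trace.
        byte[] data = encode(0x1234L, 7, 3, new byte[] {1, 2, 3});

        // Same reading order as the REQUEST case in run().
        ByteBuffer bb = ByteBuffer.wrap(data);
        long sessionId = bb.getLong();
        int cxid = bb.getInt();
        int type = bb.getInt();
        bb = bb.slice(); // what remains is the request body

        System.out.println("session=0x" + Long.toHexString(sessionId)
                + " cxid=" + cxid + " type=" + type
                + " bodyBytes=" + bb.remaining());
    }
}
```

ByteBuffer is big-endian by default, so writing and reading with the same calls in the same order is all that is needed for the round trip.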

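The step-by-step analysis is in the Thoughts section

Thoughts

How tickOfNextAckDeadline is checked

Every assignment to tickOfNextAckDeadline uses leader.self.tick as its base, i.e. the current tick count.
At startup the handler still has to finish initialization and data sync, so the value is leader.self.tick + leader.self.initLimit + leader.self.syncLimit;

After startup, once data sync is done and the handler is serving the normal interaction with the Learner, the value is tickOfNextAckDeadline = leader.self.tick + leader.self.syncLimit;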
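
The two rules and the synced() test boil down to a few additions and one comparison. A tiny sketch, with made-up names and example limits (initLimit=10 and syncLimit=5 are just sample values, not read from any config here):

```java
/** Sketch of the tick-based ACK deadline (hypothetical class, not ZooKeeper code). */
final class TickDeadlineSketch {

    /** Deadline while bootstrapping: current tick + initLimit + syncLimit. */
    static long bootstrapDeadline(long tick, int initLimit, int syncLimit) {
        return tick + initLimit + syncLimit;
    }

    /** Deadline during normal interaction: current tick + syncLimit. */
    static long steadyDeadline(long tick, int syncLimit) {
        return tick + syncLimit;
    }

    /** Mirrors synced(): thread alive and the current tick not past the deadline. */
    static boolean synced(boolean alive, long tick, long deadline) {
        return alive && tick <= deadline;
    }

    public static void main(String[] args) {
        int initLimit = 10, syncLimit = 5;                         // example values
        long deadline = bootstrapDeadline(100, initLimit, syncLimit);
        System.out.println(deadline);                              // 115
        System.out.println(synced(true, 115, deadline));           // true: exactly at the deadline
        System.out.println(synced(true, 116, deadline));           // false: one tick late

        deadline = steadyDeadline(120, syncLimit);                 // reset on each packet read
        System.out.println(deadline);                              // 125
    }
}
```

Because the deadline is pushed forward on every packet read in the main loop, a learner only falls out of sync if it stays silent for more than syncLimit ticks.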

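Summary of the steps in run

It covers the following steps from the overview in Source Reading 39

The Leader parses the Learner info and computes the new epoch (leader side)
Send the Leader state (leader side)
Data sync (leader and learner side)
Start the Leader and Learner servers (learner and leader)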
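
The epoch handling in these steps relies on how a zxid is packed: as far as I understand ZxidUtils, the epoch sits in the high 32 bits and a counter in the low 32 bits. The sketch below re-implements that packing under this assumption; ZxidSketch and its method names are made up, and the real class should be treated as the authority.

```java
/** Sketch of zxid packing, assuming epoch = high 32 bits, counter = low 32 bits. */
final class ZxidSketch {

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >> 32L;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long learnerZxid = makeZxid(3, 42);          // learner last saw epoch 3, counter 42
        long lastAcceptedEpoch = epochOf(learnerZxid);

        // A new epoch must be higher than anything a learner has accepted.
        long newEpoch = lastAcceptedEpoch + 1;
        long leaderInfoZxid = makeZxid(newEpoch, 0); // shape of the zxid carried by LEADERINFO / NEWLEADER

        System.out.println(Long.toHexString(learnerZxid));    // 30000002a
        System.out.println(Long.toHexString(leaderInfoZxid)); // 400000000
        System.out.println(counterOf(learnerZxid));           // 42
    }
}
```

This also explains why the packets built with makeZxid(newEpoch, 0) compare greater than any zxid from an earlier epoch.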

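Most of the code is about data sync

The leader compares the lastZxid sent by the Learner with its own lastZxid, maxCommittedLog and minCommittedLog
to choose between SNAP, DIFF and TRUNC; DIFF and TRUNC are usually followed by PROPOSAL and COMMIT packets
After those comes a NEWLEADER, which announces the new LEADER and marks the end of the data the learner needs to sync
Then comes an UPTODATE, which tells the learner it can start serving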
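
The branch structure that picks the sync mode can be condensed into a pure function. The sketch below follows the branches of run() shown earlier but simplifies the inputs: the committed log is a sorted list of zxids, and the class, enum and method names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Condensed sketch of the SNAP / DIFF / TRUNC decision (hypothetical class). */
final class SyncDecisionSketch {
    enum Mode { SNAP, DIFF, TRUNC }

    /**
     * @param peerLastZxid        last zxid reported by the learner
     * @param leaderLastProcessed last zxid applied to the leader's data tree
     * @param committed           zxids in the leader's in-memory committed log, ascending
     */
    static Mode decide(long peerLastZxid, long leaderLastProcessed, List<Long> committed) {
        if (peerLastZxid == leaderLastProcessed) {
            return Mode.DIFF;                  // already in sync: empty DIFF
        }
        if (committed.isEmpty()) {
            return Mode.SNAP;                  // nothing in memory: full state transfer
        }
        long min = committed.get(0);
        long max = committed.get(committed.size() - 1);
        if (peerLastZxid > max) {
            return Mode.TRUNC;                 // learner is ahead: roll it back
        }
        if (peerLastZxid < min) {
            return Mode.SNAP;                  // too far behind: falls through to the default
        }
        long prev = min;                       // within [min, max]
        for (long zxid : committed) {
            if (zxid <= peerLastZxid) {
                prev = zxid;                   // learner already has this one
                continue;
            }
            // First proposal the learner lacks: if the learner's zxid is not
            // in the log, it holds something the leader never committed.
            return prev < peerLastZxid ? Mode.TRUNC : Mode.DIFF;
        }
        return Mode.DIFF;
    }

    public static void main(String[] args) {
        List<Long> log = Arrays.asList(3L, 4L, 6L, 7L);
        System.out.println(decide(4, 7, log)); // DIFF
        System.out.println(decide(5, 7, log)); // TRUNC
        System.out.println(decide(9, 7, log)); // TRUNC
        System.out.println(decide(1, 7, log)); // SNAP
        System.out.println(decide(1, 7, Collections.<Long>emptyList())); // SNAP
    }
}
```

The second case is the subtle one: zxid 5 lies inside the log's range but is not in the log, so the learner is told to truncate back to 4 before the diff is applied.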

Once a quorum of LEARNERs has ACKed the NEWLEADER, startup is complete and the handler can interact with the LEARNER normally and process requests

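How many times does LearnerHandler wait for a quorum at startup

1. Waiting for a quorum of machines to register, via leader.waitForEpochAck(this.getSid(), ss);
2. Waiting for a quorum of Followers to ACK the NEWLEADER

As far as I can tell, leader.getEpochToPropose, which run calls before waitForEpochAck, also blocks until a quorum has connected, so it may be fairer to count three waits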
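
Each of these waits has the same general shape: every handler thread reports its sid and then blocks until more than half of the voting members have reported. Below is a generic sketch of that shape with wait/notifyAll. It is not the Leader code; the class name, the timeout handling, and the choice to count every sid that reports are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Generic wait-for-majority sketch (hypothetical class, not the Leader implementation). */
final class QuorumWaitSketch {
    private final int votingMembers;
    private final Set<Long> acked = new HashSet<>();

    QuorumWaitSketch(int votingMembers) {
        this.votingMembers = votingMembers;
    }

    /** Records an ACK from sid and wakes up waiters once a majority is reached. */
    synchronized boolean ack(long sid) {
        acked.add(sid);                // the set ignores duplicate sids
        if (hasQuorum()) {
            notifyAll();
        }
        return hasQuorum();
    }

    synchronized boolean hasQuorum() {
        return acked.size() > votingMembers / 2;
    }

    /** Blocks until a majority has ACKed or the timeout elapses; returns whether quorum was reached. */
    synchronized boolean awaitQuorum(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!hasQuorum()) {
            long left = deadline - System.currentTimeMillis();
            if (left <= 0) {
                return false;
            }
            wait(left);                // releases the monitor while waiting
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        QuorumWaitSketch q = new QuorumWaitSketch(5);
        System.out.println(q.ack(1));           // false: 1 of 5
        System.out.println(q.ack(2));           // false: 2 of 5
        System.out.println(q.ack(2));           // false: duplicate sid does not count twice
        System.out.println(q.ack(3));           // true: 3 of 5 is a majority
        System.out.println(q.awaitQuorum(10));  // true: returns immediately
    }
}
```

Which sids are allowed into the set is a separate decision; this sketch simply counts whatever it is given.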

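What is the difference between NEWLEADER and UPTODATE

It feels as if the two could be merged into one. Judging from Learner#syncWithLeader, the split also seems to be partly a compatibility matter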

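Open questions

How does LearnerHandler#run sync an observer?

An earlier table said that observers are updated through INFORM messages, while Followers get PROPOSAL and COMMIT messages
But run has no observer-specific path that sends INFORM; both go through the same logic

Which machines count while waiting for a quorum to register?

The quorums discussed so far were all quorums of participants, i.e. LearnerType.PARTICIPANT (also called Follower)
But when run waits for a quorum of machines to register, it calls

leader.waitForEpochAck(this.getSid(), ss);

That method, as read here, does not restrict the count to PARTICIPANTs, so it seems possible that many Followers have not registered yet
while observers have connected, and that this still counts as a "quorum"

The leader reads LearnerInfo and updates the epoch

Why does the epoch need updating, and when would a learner's epoch be higher than the leader's? Wasn't the leader's the highest at election time?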

References

http://www.cnblogs.com/leesf456/p/6139266.html
http://blog.csdn.net/vinowan/article/details/22196707
