zk源码阅读42:Leader源码解析

摘要

前面两节讲了Learner,定义了Learner角色。
以及LearnerHandler,完成Learner与Leader的交互,
这一节讲解Leader,定义Leader的角色,主要讲解

内部类
  Proposal,提议的数据结构
  ToBeAppliedRequestProcessor
  XidRolloverException
  LearnerCnxAcceptor,线程,监听Learner连接,启动LearnerHandler不断交互
属性
  消息类型相关
  其他
函数
  启动期相关函数
    lead,入口函数
    getEpochToPropose:获取集群最大的lastAcceptedEpoch,+1用于设置新的acceptedEpoch
    waitForEpochAck:等待过半机器(Learner和leader)针对Leader发出的LEADERINFO回复ACKEPOCH
    waitForNewLeaderAck:等到有过半的参与者针对Leader发出的NEWLEADER返回ACK
    startZkServer:启动zk
  运行期处理相关
    处理ACK
      processAck:针对提议回复ACK的处理逻辑,如果过半验证了就通知所有Learner
      commit,sendPacket:针对提议的确认,给参与者进行通知
      inform,sendObserverPacket:针对提议的确认,给观察者进行通知
    处理提议
      propose:根据Request产生提议,发给所有参与者
    处理同步请求
      processSync:处理同步请求(暂时不懂)
      sendSync:发送Sync请求给合适的server
  LearnerHandler相关:
    startForwarding:启动交互时,把内存中的提议,即将生效的提议(都还没记录在事务日志当中)告诉给Learner

内部类

Leader内部类

Proposal

即经常看的的 "提议"
源码如下

    static public class Proposal {//提议
        public QuorumPacket packet;//集群间传递的包

        public HashSet ackSet = new HashSet();//接收到ack的机器sid集合

        public Request request;//请求

        @Override
        public String toString() {
            return packet.getType() + ", " + packet.getZxid() + ", " + request;
        }
    }

ToBeAppliedRequestProcessor

这是一个请求处理的调用链,这里写成了内部类,以后讲调用链的时候再讲

XidRolloverException

Xid回滚异常,一般用于Xid低32位满了的时候用(高32位是epoch号码)
就是一个构造函数就没了

public static class XidRolloverException extends Exception {//xid回滚异常
        public XidRolloverException(String message) {
            super(message);
        }
    }

LearnerCnxAcceptor

用于接收Learner的连接,启动LearnerHandler

    class LearnerCnxAcceptor extends ZooKeeperThread{
        private volatile boolean stop = false;

        public LearnerCnxAcceptor() {
            super("LearnerCnxAcceptor-" + ss.getLocalSocketAddress());
        }

        @Override
        public void run() {
            try {
                while (!stop) {
                    try{
                        Socket s = ss.accept();//接收socket链接
                        // start with the initLimit, once the ack is processed
                        // in LearnerHandler switch to the syncLimit
                        s.setSoTimeout(self.tickTime * self.initLimit);
                        s.setTcpNoDelay(nodelay);
                        LearnerHandler fh = new LearnerHandler(s, Leader.this);//构造LearnerHandler
                        fh.start();//启动LearnerHandler
                    } catch (SocketException e) {
                        if (stop) {
                            LOG.info("exception while shutting down acceptor: "
                                    + e);

                            // When Leader.shutdown() calls ss.close(),
                            // the call to accept throws an exception.
                            // We catch and set stop to true.
                            stop = true;
                        } else {
                            throw e;
                        }
                    }
                }
            } catch (Exception e) {
                LOG.warn("Exception while accepting follower", e);
            }
        }
        
        public void halt() {
            stop = true;
        }
    }

属性

集群消息类型相关

各类型意义在源码分析38已经讲过,源码如下

    final static int DIFF = 13;
    final static int TRUNC = 14;
    final static int SNAP = 15;
    final static int OBSERVERINFO = 16;
    final static int NEWLEADER = 10;//leader发给learner的,当leader发送了同步数据的相关的packets之后发送
    final static int FOLLOWERINFO = 11;
    final static int UPTODATE = 12;
    public static final int LEADERINFO = 17;
    public static final int ACKEPOCH = 18;  
    final static int REQUEST = 1;
    public final static int PROPOSAL = 2;
    final static int ACK = 3;
    final static int COMMIT = 4;
    final static int PING = 5;
    final static int REVALIDATE = 6;
    final static int SYNC = 7;
    final static int INFORM = 8;//通知observer 有commit消息

其他主要属性

    final LeaderZooKeeperServer zk;

    final QuorumPeer self;

    private boolean quorumFormed = false;//是否已有过半参与者确认当前leader并且完成同步

    LearnerCnxAcceptor cnxAcceptor;//读取Learner消息的线程

    private final HashSet learners =
            new HashSet();//learnerHandler集合

    private final HashSet forwardingFollowers =
            new HashSet();//参与者的LearnerHandler集合

    private final HashSet observingLearners =
            new HashSet();//观察者LearnerHandler集合

    private final HashMap> pendingSyncs =
        new HashMap>();//正在处理的同步(要等到过半ack才算完)

    final AtomicLong followerCounter = new AtomicLong(-1);//只不过是一个临时变量,并不是Follower count,看调用方

    ConcurrentMap outstandingProposals = new ConcurrentHashMap();//已经提出,还没有处理完的提议的map,key是zxid

    ConcurrentLinkedQueue toBeApplied = new ConcurrentLinkedQueue();//即将生效的提议(已有过半确认)

    Proposal newLeaderProposal = new Proposal();//newLeader的提议

    StateSummary leaderStateSummary;//leader状态总结

    long epoch = -1;//新的acceptEpoch号码,各端的max(lastAcceptedEpoch) + 1, 先默认-1
    boolean waitingForNewEpoch = true;//是否在等待新的acceptEpoch号生成
    long lastCommitted = -1;//最近commit的zxid
    long lastProposed;//最近一次提议的zxid
    private HashSet connectingFollowers = new HashSet();//连接上leader的sid集合
    private HashSet electingFollowers = new HashSet();//针对LEADERINFO回复ACKEPOCH的集合
    private boolean electionFinished = false;//是否过半机器注册成功

函数

简单的一些getset之类的函数不讲

启动期相关函数

leader启动,会先和learner完成数据同步等,这部分函数如下

lead

选举完leader之后,leader的入口函数,源码如下

    void lead() throws IOException, InterruptedException {
        self.end_fle = System.currentTimeMillis();
        LOG.info("LEADING - LEADER ELECTION TOOK - " +
              (self.end_fle - self.start_fle));
        self.start_fle = 0;//开始fast leader election时间
        self.end_fle = 0;//结束时间

        zk.registerJMX(new LeaderBean(this, zk), self.jmxLocalPeerBean);

        try {
            self.tick = 0;//初始tick为0,代表第几个tickTime
            zk.loadData();//先loadData加载数据
            
            leaderStateSummary = new StateSummary(self.getCurrentEpoch(), zk.getLastProcessedZxid());//根据epoch和zxid记录状态

            // Start thread that waits for connection requests from 
            // new followers.
            cnxAcceptor = new LearnerCnxAcceptor();//等待learner的连接
            cnxAcceptor.start();
            
            readyToStart = true;
            long epoch = getEpochToPropose(self.getId(), self.getAcceptedEpoch());
            
            zk.setZxid(ZxidUtils.makeZxid(epoch, 0));
            
            synchronized(this){
                lastProposed = zk.getZxid();//获取zxid
            }
            
            newLeaderProposal.packet = new QuorumPacket(NEWLEADER, zk.getZxid(),
                    null, null);//生成NEWLEADER包


            if ((newLeaderProposal.packet.getZxid() & 0xffffffffL) != 0) {
                LOG.info("NEWLEADER proposal has Zxid of "
                        + Long.toHexString(newLeaderProposal.packet.getZxid()));
            }
            
            waitForEpochAck(self.getId(), leaderStateSummary);
            self.setCurrentEpoch(epoch);

            // We have to get at least a majority of servers in sync with
            // us. We do this by waiting for the NEWLEADER packet to get
            // acknowledged
            try {
                waitForNewLeaderAck(self.getId(), zk.getZxid(), LearnerType.PARTICIPANT);//等待一段时间,收到过半的参与者返回的ACK
            } catch (InterruptedException e) {
                shutdown("Waiting for a quorum of followers, only synced with sids: [ "
                        + getSidSetString(newLeaderProposal.ackSet) + " ]");
                HashSet followerSet = new HashSet();
                for (LearnerHandler f : learners)
                    followerSet.add(f.getSid());
                    
                if (self.getQuorumVerifier().containsQuorum(followerSet)) {
                    LOG.warn("Enough followers present. "
                            + "Perhaps the initTicks need to be increased.");
                }

                Thread.sleep(self.tickTime);
                self.tick++;
                return;
            }
            
            startZkServer();//启动zk
            
            /**
             * WARNING: do not use this for anything other than QA testing
             * on a real cluster. Specifically to enable verification that quorum
             * can handle the lower 32bit roll-over issue identified in
             * ZOOKEEPER-1277. Without this option it would take a very long
             * time (on order of a month say) to see the 4 billion writes
             * necessary to cause the roll-over to occur.
             * 
             * This field allows you to override the zxid of the server. Typically
             * you'll want to set it to something like 0xfffffff0 and then
             * start the quorum, run some operations and see the re-election.
             */
            String initialZxid = System.getProperty("zookeeper.testingonly.initialZxid");
            if (initialZxid != null) {
                long zxid = Long.parseLong(initialZxid);
                zk.setZxid((zk.getZxid() & 0xffffffff00000000L) | zxid);
            }
            
            if (!System.getProperty("zookeeper.leaderServes", "yes").equals("no")) {
                self.cnxnFactory.setZooKeeperServer(zk);
            }
            // Everything is a go, simply start counting the ticks
            // WARNING: I couldn't find any wait statement on a synchronized
            // block that would be notified by this notifyAll() call, so
            // I commented it out
            //synchronized (this) {
            //    notifyAll();
            //}
            // We ping twice a tick, so we only update the tick every other
            // iteration
            boolean tickSkip = true;//一个tick ping两次,
    
            while (true) {
                Thread.sleep(self.tickTime / 2);//每tickTime周期的一半
                if (!tickSkip) {
                    self.tick++;//每两次代表一个tick
                }
                HashSet syncedSet = new HashSet();

                // lock on the followers when we use it.
                syncedSet.add(self.getId());

                for (LearnerHandler f : getLearners()) {
                    // Synced set is used to check we have a supporting quorum, so only
                    // PARTICIPANT, not OBSERVER, learners should be used
                    if (f.synced() && f.getLearnerType() == LearnerType.PARTICIPANT) {//如果是同步的,且是参与者
                        syncedSet.add(f.getSid());
                    }
                    f.ping();//验证有没有LearnerHandler的proposal超时了没有处理
                }

                // check leader running status
                if (!this.isRunning()) {
                    shutdown("Unexpected internal error");
                    return;
                }

              if (!tickSkip && !self.getQuorumVerifier().containsQuorum(syncedSet)) {//需要验证时,集群验证过半验证失败
                //if (!tickSkip && syncedCount < self.quorumPeers.size() / 2) {
                    // Lost quorum, shutdown
                    shutdown("Not sufficient followers synced, only synced with sids: [ "
                            + getSidSetString(syncedSet) + " ]");
                    // make sure the order is the same!
                    // the leader goes to looking
                    return;
              } 
              tickSkip = !tickSkip;//翻转,进入下半个tickTime
            }
        } finally {
            zk.unregisterJMX(this);
        }
    }

这里不展开讲LearnerCnxAcceptor启动LearnerHandler与Learner完成交互的细节
主要完成的事情,就是

启动LearnerCnxAcceptor,利用LearnerHandler与各Learner进行IO
Leader(以及LearnerHandler)调用getEpochToPropose,接收Leader(以及Learner向Leader注册时发送的LearnerInfo),更新epoch号,作为新的AcceptedEpoch和CurrentEpoch
LearnerHandler把发送LEADERINFO包,把上面定的epoch也发送出去
调用waitForEpochAck,等待过半learner(以及leader),针对LEADERINFO返回ACKEPOCH包
(省略:LearnHandler中间不断给各Learner同步数据)
LearnerHandler发出NEWLEADER,Learner接收到,返回ACK
LearnerHandler以及Leader调用waitForNewLeaderAck等待过半PARTICIPANT回复ACK
启动zkServer,不断完成ping,保持过半参与者同步

里面涉及的函数,按照调用顺序来排,有
getEpochToPropose,waitForEpochAck,waitForNewLeaderAck,startZkServer,讲解如下

getEpochToPropose

被Leader和LearnerHandler调用,即获取集群最大的lastAcceptedEpoch,+1用于设置新的acceptedEpoch

    public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {//被Leader和LearnerHandler调用,即获取集群最大的lastAcceptedEpoch,+1用于设置新的acceptedEpoch
        synchronized(connectingFollowers) {
            if (!waitingForNewEpoch) {//如果还在等待new epoch
                return epoch;
            }
            if (lastAcceptedEpoch >= epoch) {//选出leader以及learner中最大一个lastAcceptedEpoch
                epoch = lastAcceptedEpoch+1;//更新自己的epoch为最大的+1,
            }
            connectingFollowers.add(sid);//连接的Learner添加记录
            QuorumVerifier verifier = self.getQuorumVerifier();
            if (connectingFollowers.contains(self.getId()) && 
                                            verifier.containsQuorum(connectingFollowers)) {//如果自己也连接上,并且已经有过半机器连接上
                waitingForNewEpoch = false;//不用再等了
                self.setAcceptedEpoch(epoch);//设置acceptedEpoch
                connectingFollowers.notifyAll();
            } else {
                long start = System.currentTimeMillis();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(waitingForNewEpoch && cur < end) {//如果已经有机器连接上leader了,那么最多等待一段时间直到其他机器通过 过半验证
                    connectingFollowers.wait(end - cur);
                    cur = System.currentTimeMillis();
                }
                if (waitingForNewEpoch) {
                    throw new InterruptedException("Timeout while waiting for epoch from quorum");        
                }
            }
            return epoch;
        }
    }

waitForEpochAck

验证leader的StateSummary是最新的
等待过半机器(Learner和leader)针对Leader发出的LEADERINFO回复ACKEPOCH

    public void waitForEpochAck(long id, StateSummary ss) throws IOException, InterruptedException {
        synchronized(electingFollowers) {
            if (electionFinished) {//如果已经过半返回了
                return;
            }
            if (ss.getCurrentEpoch() != -1) {
                if (ss.isMoreRecentThan(leaderStateSummary)) {//如果有currentEpoch,zxid比Leader自己的还要新
                    throw new IOException("Follower is ahead of the leader, leader summary: " 
                                                    + leaderStateSummary.getCurrentEpoch()
                                                    + " (current epoch), "
                                                    + leaderStateSummary.getLastZxid()
                                                    + " (last zxid)");
                }
                electingFollowers.add(id);
            }
            QuorumVerifier verifier = self.getQuorumVerifier();
            if (electingFollowers.contains(self.getId()) && verifier.containsQuorum(electingFollowers)) {//过半验证通过
                electionFinished = true;
                electingFollowers.notifyAll();
            } else {                
                long start = System.currentTimeMillis();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(!electionFinished && cur < end) {//等待一段时间
                    electingFollowers.wait(end - cur);
                    cur = System.currentTimeMillis();
                }
                if (!electionFinished) {//等待结束了还没有过半验证通过
                    throw new InterruptedException("Timeout while waiting for epoch to be acked by quorum");
                }
            }
        }
    }

waitForNewLeaderAck

//等到有过半的参与者针对Leader发出的NEWLEADER返回ACK

    public void waitForNewLeaderAck(long sid, long zxid, LearnerType learnerType)
            throws InterruptedException {

        synchronized (newLeaderProposal.ackSet) {

            if (quorumFormed) {//如果已经过半同步了
                return;
            }

            long currentZxid = newLeaderProposal.packet.getZxid();
            if (zxid != currentZxid) {
                LOG.error("NEWLEADER ACK from sid: " + sid
                        + " is from a different epoch - current 0x"
                        + Long.toHexString(currentZxid) + " receieved 0x"
                        + Long.toHexString(zxid));
                return;
            }

            if (learnerType == LearnerType.PARTICIPANT) {
                newLeaderProposal.ackSet.add(sid);//参与者返回的ACK才算数
            }

            if (self.getQuorumVerifier().containsQuorum(
                    newLeaderProposal.ackSet)) {//过半同步
                quorumFormed = true;//标记已经过半同步了
                newLeaderProposal.ackSet.notifyAll();
            } else {
                long start = System.currentTimeMillis();
                long cur = start;
                long end = start + self.getInitLimit() * self.getTickTime();
                while (!quorumFormed && cur < end) {//先同步的,等待一段时间,等过半的机器都同步
                    newLeaderProposal.ackSet.wait(end - cur);
                    cur = System.currentTimeMillis();
                }
                if (!quorumFormed) {//等了一段时间还没有过半同步,就报错
                    throw new InterruptedException(
                            "Timeout while waiting for NEWLEADER to be acked by quorum");
                }
            }
        }
    }

startZkServer

启动zk

    private synchronized void startZkServer() {//启动zk
        // Update lastCommitted and Db's zxid to a value representing the new epoch
        lastCommitted = zk.getZxid();
        LOG.info("Have quorum of supporters, sids: [ "
                + getSidSetString(newLeaderProposal.ackSet)
                + " ]; starting up and setting last processed zxid: 0x{}",
                Long.toHexString(zk.getZxid()));
        zk.startup();
        /*
         * Update the election vote here to ensure that all members of the
         * ensemble report the same vote to new servers that start up and
         * send leader election notifications to the ensemble.
         * 
         * @see https://issues.apache.org/jira/browse/ZOOKEEPER-1732
         */
        self.updateElectionVote(getEpoch());

        zk.getZKDatabase().setlastProcessedZxid(zk.getZxid());
    }

运行期处理相关

processAck

针对提议回复ACK的处理逻辑,如果过半验证了就通知所有Learner

synchronized public void processAck(long sid, long zxid, SocketAddress followerAddr) {
        if (LOG.isTraceEnabled()) {//log相关
            LOG.trace("Ack zxid: 0x{}", Long.toHexString(zxid));
            for (Proposal p : outstandingProposals.values()) {
                long packetZxid = p.packet.getZxid();
                LOG.trace("outstanding proposal: 0x{}",
                        Long.toHexString(packetZxid));
            }
            LOG.trace("outstanding proposals all");
        }

        if ((zxid & 0xffffffffL) == 0) {
            /*
             * We no longer process NEWLEADER ack by this method. However,
             * the learner sends ack back to the leader after it gets UPTODATE
             * so we just ignore the message.
             */
            return;
        }
    
        if (outstandingProposals.size() == 0) {//没有处理当中的提议
            if (LOG.isDebugEnabled()) {
                LOG.debug("outstanding is 0");
            }
            return;
        }
        if (lastCommitted >= zxid) {//提议已经处理过了
            if (LOG.isDebugEnabled()) {
                LOG.debug("proposal has already been committed, pzxid: 0x{} zxid: 0x{}",
                        Long.toHexString(lastCommitted), Long.toHexString(zxid));
            }
            // The proposal has already been committed
            return;
        }
        Proposal p = outstandingProposals.get(zxid);//获取zxid对应的提议
        if (p == null) {
            LOG.warn("Trying to commit future proposal: zxid 0x{} from {}",
                    Long.toHexString(zxid), followerAddr);
            return;
        }
        
        p.ackSet.add(sid);//对应提议的ack集合添加sid记录
        if (LOG.isDebugEnabled()) {
            LOG.debug("Count for zxid: 0x{} is {}",
                    Long.toHexString(zxid), p.ackSet.size());
        }
        if (self.getQuorumVerifier().containsQuorum(p.ackSet)){//过半回复ack
            if (zxid != lastCommitted+1) {
                LOG.warn("Commiting zxid 0x{} from {} not first!",
                        Long.toHexString(zxid), followerAddr);
                LOG.warn("First is 0x{}", Long.toHexString(lastCommitted + 1));
            }
            outstandingProposals.remove(zxid);//该proposal已经处理完了
            if (p.request != null) {
                toBeApplied.add(p);//即将应用的队列 添加该proposal
            }

            if (p.request == null) {
                LOG.warn("Going to commmit null request for proposal: {}", p);
            }
            commit(zxid);//提交,发给所有参与者
            inform(p);//告诉所有观察者
            zk.commitProcessor.commit(p.request);//leader自己也提交
            if(pendingSyncs.containsKey(zxid)){
                for(LearnerSyncRequest r: pendingSyncs.remove(zxid)) {//
                    sendSync(r);//发送同步请求给LearnerSyncRequest记录的server
                }
            }
        }
    }

里面调用了commit,inform函数,如下

commit

//更新lastCommitted,生成commit包,发给所有参与者

    public void commit(long zxid) {
        synchronized(this){
            lastCommitted = zxid;//更新最近commit的zxid
        }
        QuorumPacket qp = new QuorumPacket(Leader.COMMIT, zxid, null, null);//生成commit消息
        sendPacket(qp);//发给所有参与者
    }

inform

//生成inform包,告诉所有observer

    public void inform(Proposal proposal) {
        QuorumPacket qp = new QuorumPacket(Leader.INFORM, proposal.request.zxid, 
                                            proposal.packet.getData(), null);
        sendObserverPacket(qp);//告诉所有observer
    }

又分别调用了sendPacket和sendObserverPacket函数,如下

sendPacket&sendObserverPacket

即把QuorumPacket 加入到对应的LearnerHandler的列表里面去

    void sendPacket(QuorumPacket qp) {//QuorumPacket发给所有 参与者
        synchronized (forwardingFollowers) {
            for (LearnerHandler f : forwardingFollowers) {                
                f.queuePacket(qp);
            }
        }
    }

    void sendObserverPacket(QuorumPacket qp) {//QuorumPacket发给所有 观察者
        for (LearnerHandler f : getObservingLearners()) {
            f.queuePacket(qp);
        }
    }

propose

根据Request产生提议,发给所有参与者

    public Proposal propose(Request request) throws XidRolloverException {//根据Request产生提议,发给所有参与者
        /**
         * Address the rollover issue. All lower 32bits set indicate a new leader
         * election. Force a re-election instead. See ZOOKEEPER-1277
         */
        if ((request.zxid & 0xffffffffL) == 0xffffffffL) {//如果zxid低的32位已经满了,为了不溢出,就关闭掉,重新选举
            String msg =
                    "zxid lower 32 bits have rolled over, forcing re-election, and therefore new epoch start";
            shutdown(msg);
            throw new XidRolloverException(msg);//xid回滚异常
        }

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
        try {
            request.hdr.serialize(boa, "hdr");
            if (request.txn != null) {
                request.txn.serialize(boa, "txn");
            }
            baos.close();
        } catch (IOException e) {
            LOG.warn("This really should be impossible", e);
        }
        QuorumPacket pp = new QuorumPacket(Leader.PROPOSAL, request.zxid, 
                baos.toByteArray(), null);//产生提议
        
        Proposal p = new Proposal();
        p.packet = pp;
        p.request = request;
        synchronized (this) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("Proposing:: " + request);
            }

            lastProposed = p.packet.getZxid();//更新最近一次提议的zxid
            outstandingProposals.put(lastProposed, p);//出现了一个尚未处理的提议
            sendPacket(pp);//提议发给所有参与者
        }
        return p;
    }

processSync

处理同步请求,这里不是很懂

    synchronized public void processSync(LearnerSyncRequest r){//处理同步请求,注意上锁了
        if(outstandingProposals.isEmpty()){//没有正在处理的提议
            sendSync(r);//发送同步请求给LearnerSyncRequest记录的server
        } else {
            List l = pendingSyncs.get(lastProposed);//把当时最新的lastProposed记录下来
            if (l == null) {
                l = new ArrayList();
            }
            l.add(r);//lastProposed对应的记录添加进当前的request,这个列表的同步都是到lastProposed这个位置
            pendingSyncs.put(lastProposed, l);//放入map,等到这个提议通过
        }
    }

调用了sendSync如下

sendSync

//发送Sync请求给合适的server(LearnerSyncRequest记录的)

    public void sendSync(LearnerSyncRequest r){
        QuorumPacket qp = new QuorumPacket(Leader.SYNC, 0, null, null);
        r.fh.queuePacket(qp);
    }

LearnerHandler相关

startForwarding

就是learner同步的时候,同步方式是和leader的commitLog相关的,在此期间,记录在内存中的提议,以及即将生效的提议也要告诉给learner

    synchronized public long startForwarding(LearnerHandler handler,
            long lastSeenZxid) {//让leader知道Follower在进行同步,另外看zk当前有没有新的提议,或者同步的信息发送过去的
        // Queue up any outstanding requests enabling the receipt of
        // new requests
        if (lastProposed > lastSeenZxid) {//如果自己的zxid比发给learner的zxid大
            for (Proposal p : toBeApplied) {
                if (p.packet.getZxid() <= lastSeenZxid) {
                    continue;//即将生效的zxid,对应learner已经有记录了
                }
                handler.queuePacket(p.packet);
                // Since the proposal has been committed we need to send the
                // commit message also
                QuorumPacket qp = new QuorumPacket(Leader.COMMIT, p.packet//添加commit
                        .getZxid(), null, null);
                handler.queuePacket(qp);//加入handler的发送队列
            }
            // Only participant need to get outstanding proposals
            if (handler.getLearnerType() == LearnerType.PARTICIPANT) {//如果是参与者,顺便把提议也发送过去
                Listzxids = new ArrayList(outstandingProposals.keySet());
                Collections.sort(zxids);
                for (Long zxid: zxids) {
                    if (zxid <= lastSeenZxid) {
                        continue;
                    }
                    handler.queuePacket(outstandingProposals.get(zxid).packet);//添加packet
                }
            }
        }
        if (handler.getLearnerType() == LearnerType.PARTICIPANT) {//如果是参与者,就加到 参与者集合中
            addForwardingFollower(handler);//加入 参与者 集合中
        } else {
            addObserverLearnerHandler(handler);//否则加入到观察者 集合中
        }
                
        return lastProposed;
    }

思考

CurrentEpoch和AcceptedEpoch

在源码31节思考中已经讲过,
CurrentEpoch是当前周期,AcceptedEpoch是即将成为的周期,
前者相当于最终决定,后者相当于决定前的提议(个人理解)

getEpochToPropose函数的调用

这里是Leader和LearnerHandler都会调用,因此lastAcceptedEpoch是所有机器(包含leader)的最大lastAcceptedEpoch+1
(之前以为Leader先初始化好epoch值,然后各LearnerHandler再调用,还在想怎么会出现Learner的LastAcceptedEpoch比Leader的高)

Leader和Learner的交互,什么时候各自可以启动zkServer

Leader:自己发送NEWLEADER之后,过半参与者返回ACK,即可启动
Learner:对应的LearnerHandler知道了Leader满足了上述条件,给Learner发送UPTODATE消息,Learner知道过半机器都认可了Leader的数据,即可启动

getEpochToPropose,waitForEpochAck,waitForNewLeaderAck的共性

三个函数都是等待一段时间,如果指定时间内没有完成过半验证,就认为超时,抛出异常

问题

getEpochToPropose函数的过半返回机制以及lastAcceptedEpoch的矛盾

这里意思就是说,如果过半机器(包含leader和Learner)都调用了这个函数,那么就不用管后面的lastAcceptedEpoch比现在决定的AcceptedEpoch大了。
这样会出现一个矛盾,假如不考虑网络延迟,Learner过半的机器都调用了这个函数但是Leader还没调用,那么Leader自己本身的epoch会比大家决定的还要大,这样很不合理

processSync函数

现在还不是很理解这个函数,调用方是ProposalRequestProcessor
针对的是LearnerSyncRequest,对应的是OpCode.sync
也就是ZooKeeper#sync函数被调用才会触发
也就是zookeeper client api调用这个sync函数时会触发,
目前不太理解这个函数有什么用,其他什么getData之类的都好理解
看注释好像是client请求同步到最新?

吐槽

变量名,注释

followerCounter变量和命名都多大关系

你可能感兴趣的:(zk源码阅读42:Leader源码解析)