7.6.3 QuorumCnxManager Source Code

According to its class-level documentation, QuorumCnxManager is the connection manager dedicated to leader election. This section covers:

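Inner classes
  SendWorker: the network I/O sender; takes messages off the send queue and sends them to the server with the matching sid
  Message: defines the message structure, holding a sid and a ByteBuffer body
  RecvWorker: the network I/O receiver
  Listener: listens on electionPort and waits for connections from other servers
Fields
  recvQueue: the receive queue
  queueSendMap: the send queue for each sid
Methods
  Connection handling
  Producer/consumer logic for sending and receiving
  Others
Thoughts and summary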

Inner classes

[Figure: the inner classes of QuorumCnxManager]

There are four inner classes: SendWorker, Message, RecvWorker, and Listener.

SendWorker

The source comment describes this class as follows:

/**
  * Thread to send messages. Instance waits on a queue, and send a message 
  * as soon as there is one available. If connection breaks, then opens a new
  * one.
  */

This thread extends ZooKeeperThread and holds the Socket used to send messages. It blocks on its queue until a message arrives; if the connection breaks, a new one is opened.
Fields:

  Long sid; // server id of the remote peer
  Socket sock;
  RecvWorker recvWorker;
  volatile boolean running = true;
  DataOutputStream dout;

Main methods:

    /**
     * An instance of this thread receives messages to send
     * through a queue and sends them to the server sid.
     *
     * @param sock
     *            Socket to remote peer
     * @param sid
     *            Server identifier of remote peer
     */
    SendWorker(Socket sock, Long sid) {
        // Sets the thread name
        super("SendWorker:" + sid);
        this.sid = sid;
        this.sock = sock;
        recvWorker = null;
        try {
            dout = new DataOutputStream(sock.getOutputStream());
        } catch (IOException e) {
            LOG.error("Unable to access socket output stream", e);
            closeSocket(sock);
            running = false;
        }
        LOG.debug("Address of remote peer: " + this.sid);
    }

Since it is a thread, it naturally has a run method.

@Override
public void run() {
    threadCnt.incrementAndGet();
    try {
        /**
         * If there is nothing in the queue to send, then we
         * send the lastMessage to ensure that the last message
         * was received by the peer. The message could be dropped
         * in case self or the peer shutdown their connection
         * (and exit the thread) prior to reading/processing
         * the last message. Duplicate messages are handled correctly
         * by the peer.
         *
         * If the send queue is non-empty, then we have a recent
         * message than that stored in lastMessage. To avoid sending
         * stale message, we should send the message in the send queue.
         */
        // queueSendMap maps each sid to its send queue
        ArrayBlockingQueue<ByteBuffer> bq = queueSendMap.get(sid);
        if (bq == null || isSendQueueEmpty(bq)) {
            // lastMessageSent holds the last message sent to each sid. It is
            // re-sent when the queue is empty so that a message the peer may
            // have missed is still delivered.
            ByteBuffer b = lastMessageSent.get(sid);
            if (b != null) {
                LOG.debug("Attempting to send lastMessage to sid=" + sid);
                // Send the message
                send(b);
            }
        }
    } catch (IOException e) {
        LOG.error("Failed to send last message. Shutting down thread.", e);
        this.finish();
    }

    try {
        while (running && !shutdown && sock != null) {

            ByteBuffer b = null;
            try {
                ArrayBlockingQueue<ByteBuffer> bq = queueSendMap
                        .get(sid);
                if (bq != null) {
                    b = pollSendQueue(bq, 1000, TimeUnit.MILLISECONDS);
                } else {
                    LOG.error("No queue of incoming messages for " +
                              "server " + sid);
                    break;
                }

                if(b != null){
                    lastMessageSent.put(sid, b);
                    send(b);
                }
            } catch (InterruptedException e) {
                LOG.warn("Interrupted while waiting for message on queue",
                        e);
            }
        }
    } catch (Exception e) {
        LOG.warn("Exception when using channel: for id " + sid + " my id = " + 
                self.getId() + " error = " + e);
    }
    this.finish();
    LOG.warn("Send worker leaving thread");
}

synchronized void send(ByteBuffer b) throws IOException {
    byte[] msgBytes = new byte[b.capacity()];
    try {
        b.position(0);
        b.get(msgBytes);
    } catch (BufferUnderflowException be) {
        LOG.error("BufferUnderflowException ", be);
        return;
    }
    // Write the length first; RecvWorker reads this leading int and
    // rejects the packet if the length is out of range
    dout.writeInt(b.capacity());
    dout.write(b.array());
    dout.flush();
}

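When a SendWorker starts and finds the send queue for its server empty, it takes the most recently sent message from lastMessageSent and sends it again. This covers the case where the receiver went down just before or just after receiving the message, so the message was never processed correctly. ZooKeeper also guarantees that the receiver handles duplicate messages correctly.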
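
The startup rule described above can be sketched in isolation. This is only an illustration, not ZooKeeper's code: the class ResendSketch and the method firstMessageOnStartup are hypothetical names, and plain strings stand in for the ByteBuffer payloads.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical sketch of the SendWorker startup decision; not ZooKeeper code.
class ResendSketch {
    /**
     * Returns the message a freshly started sender should re-send, or null
     * if nothing should be re-sent. The last message is re-sent only when
     * the send queue is missing or empty.
     */
    static String firstMessageOnStartup(ArrayBlockingQueue<String> sendQueue,
                                        String lastSent) {
        if (sendQueue == null || sendQueue.isEmpty()) {
            return lastSent; // may itself be null if nothing was ever sent
        }
        return null; // a newer message is queued, so the stale one is skipped
    }

    public static void main(String[] args) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        System.out.println(firstMessageOnStartup(queue, "vote-1")); // prints "vote-1"
        queue.add("vote-2");
        System.out.println(firstMessageOnStartup(queue, "vote-1")); // prints "null"
    }
}
```

Re-sending is safe only because the receiver tolerates duplicates, which is the guarantee the paragraph above relies on.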

Message

static public class Message {

        Message(ByteBuffer buffer, long sid) {
            this.buffer = buffer;
            this.sid = sid;
        }

        ByteBuffer buffer;
        long sid;
    }

sid is the sid of the server the message came from, and buffer is the message body.

RecvWorker

This class is the receiver. Like SendWorker, it extends ZooKeeperThread; the thread keeps reading data from the network and puts it on the receive queue.
Fields:

Long sid;
Socket sock;
volatile boolean running = true;
DataInputStream din;
final SendWorker sw;

As the fields show, SendWorker and RecvWorker hold references to each other.
Main methods:

Constructor
RecvWorker(Socket sock, Long sid, SendWorker sw) {
            super("RecvWorker:" + sid);
            this.sid = sid;
            this.sock = sock;
            this.sw = sw;
            try {
                din = new DataInputStream(sock.getInputStream());
                // OK to wait until socket disconnects while reading.
                sock.setSoTimeout(0);
            } catch (IOException e) {
                LOG.error("Error while accessing socket for " + sid, e);
                closeSocket(sock);
                running = false;
            }
        }

 @Override
        public void run() {
            threadCnt.incrementAndGet();
            try {
                while (running && !shutdown && sock != null) {
                    /**
                     * Reads the first int to determine the length of the
                     * message
                     */
                    int length = din.readInt();
                    if (length <= 0 || length > PACKETMAXSIZE) {
                        throw new IOException(
                                "Received packet with invalid packet: "
                                        + length);
                    }
                    /**
                     * Allocates a new ByteBuffer to receive the message
                     */
                    byte[] msgArray = new byte[length];
                    din.readFully(msgArray, 0, length);
                    ByteBuffer message = ByteBuffer.wrap(msgArray);
                    // Duplicate the message and add it to recvQueue; addToRecvQueue itself does very little
                    addToRecvQueue(new Message(message.duplicate(), sid));
                }
            } catch (Exception e) {
                LOG.warn("Connection broken for id " + sid + ", my id = " + 
                        self.getId() + ", error = " , e);
            } finally {
                LOG.warn("Interrupting SendWorker");
                sw.finish();
                if (sock != null) {
                    closeSocket(sock);
                }
            }
        }
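
The wire format shared by send() and RecvWorker.run() is a 4-byte length followed by the payload. The sketch below reproduces that framing over in-memory streams so it can be run without sockets. FramingSketch, writeFrame, readFrame, and the MAX_SIZE value are assumptions made for illustration; the real limit is the PACKETMAXSIZE constant, whose value is not shown in this section.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of length-prefixed framing; not ZooKeeper code.
class FramingSketch {
    // Assumed limit for illustration only.
    static final int MAX_SIZE = 512 * 1024;

    /** Encodes a payload as [int length][payload bytes]. */
    static byte[] writeFrame(byte[] payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream dout = new DataOutputStream(bytes);
            dout.writeInt(payload.length);
            dout.write(payload);
            dout.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one frame and rejects lengths that are <= 0 or too large. */
    static byte[] readFrame(byte[] wire) {
        try {
            DataInputStream din = new DataInputStream(new ByteArrayInputStream(wire));
            int length = din.readInt();
            if (length <= 0 || length > MAX_SIZE) {
                throw new IOException("Invalid packet length: " + length);
            }
            byte[] payload = new byte[length];
            din.readFully(payload, 0, length);
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = writeFrame("vote".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readFrame(wire), StandardCharsets.UTF_8)); // prints "vote"
    }
}
```

Checking the length before allocating the array is what keeps a corrupt or hostile peer from forcing a huge allocation.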

Listener

It is also a subclass of ZooKeeperThread.

// The only field: the ServerSocket that listens on the election port
volatile ServerSocket ss = null;

The thread's run method:

 /**
  * Sleeps on accept().
  */
@Override
public void run() {
   ......
// The core loop
while (!shutdown) {
    Socket client = ss.accept();
    setSockOpts(client);
    LOG.info("Received connection request "
            + client.getRemoteSocketAddress());
    receiveConnection(client);
    numRetries = 0;
}
  ......
}

ss.accept() blocks on the election port until a connection request arrives.
receiveConnection() is the most important method:

public void receiveConnection(Socket sock) {
    Long sid = null;
    try {
      ......  
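    // If wins the challenge, then close the new connection.
    // This checks whether the sid of the server that opened the socket is
    // smaller than our own. The rule is that election connections are
    // initiated by the server with the larger sid.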
    if (sid < self.getId()) {
    /*
     * This replica might still believe that the connection to sid is
     * up, so we have to shut down the workers before trying to open a
     * new connection.
     */
    SendWorker sw = senderWorkerMap.get(sid);
    if (sw != null) {
        sw.finish();
    }

    /*
     * Now we start a new connection
     */
    LOG.debug("Create new connection to server: " + sid);
    closeSocket(sock);
    // Covered later; connects to the server with this sid
    connectOne(sid);

    // Otherwise start worker threads to receive data.
} else {
    // The Listener side creates the SendWorker and RecvWorker here
    SendWorker sw = new SendWorker(sock, sid);
    RecvWorker rw = new RecvWorker(sock, sid, sw);
    sw.setRecv(rw);

    SendWorker vsw = senderWorkerMap.get(sid);

    if(vsw != null)
        vsw.finish();

    senderWorkerMap.put(sid, sw);

    if (!queueSendMap.containsKey(sid)) {
        queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
                SEND_CAPACITY));
    }

    sw.start();
    rw.start();

    return;
}
}

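That covers the inner classes. To summarize:
SendWorker and RecvWorker depend on each other, and each holds a reference to the other.
QuorumCnxManager keeps the SendWorkers and RecvWorkers separated by sid, isolated from one another.
Both have a sid field: for each connection to another server, the sid identifies which RecvWorker and SendWorker serve it.
For example, if sid1 is connected to the other (n-1) servers, it has (n-1) RecvWorkers and (n-1) SendWorkers, one pair per sid.

Message wraps a message: the sid plus a ByteBuffer body.
Listener listens on the locally configured electionPort, keeps accepting incoming connections, and starts the RecvWorker and SendWorker for each one.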

With the inner classes covered, we turn to the main class:

QuorumCnxManager

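Fields:

Field                                                                  Default  Notes
RECV_CAPACITY                                                          100      Capacity of the receive queue
SEND_CAPACITY                                                          1        Capacity of each send queue; the reason is given under "Thoughts and summary"
ConcurrentHashMap<Long, SendWorker> senderWorkerMap                    -        The SendWorker for each sid
ConcurrentHashMap<Long, ArrayBlockingQueue<ByteBuffer>> queueSendMap   -        Send queues, keyed by each server's sid
ConcurrentHashMap<Long, ByteBuffer> lastMessageSent                    -        The last message sent to each sid
ArrayBlockingQueue<Message> recvQueue                                  -        The receive queue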

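Methods:
As noted for the Listener above, if our own sid is larger than the sid of the server that connected to us, connectOne is called to open the connection from our side.
connectOne: connects to the server with the given sid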

// Connect to the server with the given sid
synchronized void connectOne(long sid){
if (senderWorkerMap.get(sid) == null){ // no SendWorker recorded for this sid yet
    InetSocketAddress electionAddr;
    if (self.quorumPeers.containsKey(sid)) {
        // Look up the election address configured for this sid
        electionAddr =
               self.quorumPeers.get(sid).electionAddr;
    } else {
        LOG.warn("Invalid server id: " + sid);
        return;
    }
    try {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Opening channel to server " + sid);
        }
        Socket sock = new Socket();
        setSockOpts(sock);
        // Connect to the peer's election address
        sock.connect(self.getView().get(sid).electionAddr, cnxTO);
        if (LOG.isDebugEnabled()) {
            LOG.debug("Connected to server " + sid);
        }
        // Initialize the connection
        initiateConnection(sock, sid);
    } catch (UnresolvedAddressException e) {
       ......
        }
    }

The key step is initiateConnection.
initiateConnection: initializes the connection

/**
  * If this server has initiated the connection, then it gives up on the
  * connection if it loses challenge. Otherwise, it keeps the connection.
  */
    public boolean initiateConnection(Socket sock, Long sid) {
        DataOutputStream dout = null;
        try {
         // First send our own id to the peer as the challenge
          // Sending id and challenge
            dout = new DataOutputStream(sock.getOutputStream());
            dout.writeLong(self.getId());
            dout.flush();
        } catch (IOException e) {
            LOG.warn("Ignoring exception reading or writing challenge: ", e);
            closeSocket(sock);
            return false;
        }
        // Check whether our sid is the smaller one, in which case the connection is dropped
        // If lost the challenge, then drop the new connection
        if (sid > self.getId()) {
            LOG.info("Have smaller server identifier, so dropping the " +
                     "connection: (" + sid + ", " + self.getId() + ")");
            closeSocket(sock);
            // Otherwise proceed with the connection
        } else {
            SendWorker sw = new SendWorker(sock, sid);
            RecvWorker rw = new RecvWorker(sock, sid, sw);
            sw.setRecv(rw);

            SendWorker vsw = senderWorkerMap.get(sid);

            if(vsw != null)
                vsw.finish();

            senderWorkerMap.put(sid, sw);
            if (!queueSendMap.containsKey(sid)) {
                queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
                        SEND_CAPACITY));
            }

            sw.start();
            rw.start();
            return true;   
        }
        return false;
    }

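There is a notion here of winning and losing the challenge.
To ensure that each pair of servers shares only one socket, ZooKeeper only lets the server with the larger sid initiate the connection; a connection opened in the other direction is closed.
When opening a connection, our own sid must be the larger one; then the SendWorker and RecvWorker are constructed and their threads started. Otherwise the socket is closed.
When accepting a connection, our own sid must be the smaller one; then the SendWorker and RecvWorker are constructed and their threads started. Otherwise the socket is closed and connectOne opens a connection in the other direction.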
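
The two comparisons in initiateConnection and receiveConnection can be reduced to a pair of pure functions. The sketch below is an illustration only: ChallengeSketch, Action, onOutgoing, and onIncoming are hypothetical names, and the returned values just label the branch that the code shown above would take.

```java
// Hypothetical sketch of the win/lose challenge rule; not ZooKeeper code.
class ChallengeSketch {
    enum Action { KEEP, CLOSE, CLOSE_AND_DIAL_BACK }

    /** Decision taken by the side that opened the socket (initiateConnection). */
    static Action onOutgoing(long selfId, long remoteId) {
        // The initiator drops the connection when the remote sid is larger.
        return remoteId > selfId ? Action.CLOSE : Action.KEEP;
    }

    /** Decision taken by the side that accepted the socket (receiveConnection). */
    static Action onIncoming(long selfId, long remoteId) {
        // The acceptor closes the socket and connects back itself
        // when the remote sid is smaller.
        return remoteId < selfId ? Action.CLOSE_AND_DIAL_BACK : Action.KEEP;
    }

    public static void main(String[] args) {
        System.out.println(onOutgoing(3, 1)); // prints "KEEP"
        System.out.println(onOutgoing(1, 3)); // prints "CLOSE"
        System.out.println(onIncoming(1, 3)); // prints "KEEP"
        System.out.println(onIncoming(3, 1)); // prints "CLOSE_AND_DIAL_BACK"
    }
}
```

For any two distinct sids, exactly one direction yields KEEP on both ends, which is how a single socket per pair is enforced.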

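Thoughts and summary

1. The tricky part: only one connection between each pair of servers
With n servers, every server needs a connection to every other server.
Think of n points joined by undirected edges, and let [sidn,sid1] denote a connection that sidn opened to sid1.
Then [sid1,sidn] is unnecessary, so n*(n-1)/2 edges are enough:
1 + 2 + 3 + ... + (n-1) = n*(n-1)/2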
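
The count can be checked by enumerating the directed pairs in which the larger sid connects to the smaller one. This is a small sketch under the assumption that sids are simply 1..n; PairCountSketch and survivingConnections are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch that enumerates the surviving connections; not ZooKeeper code.
class PairCountSketch {
    /** Lists [from, to] pairs where the larger sid opened the connection. */
    static List<long[]> survivingConnections(int n) {
        List<long[]> edges = new ArrayList<>();
        for (long from = 1; from <= n; from++) {
            for (long to = 1; to <= n; to++) {
                if (from > to) {
                    edges.add(new long[] {from, to});
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        System.out.println(survivingConnections(4).size()); // prints "6"
    }
}
```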

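2. Why the send queue has length 1 and evicts the existing entry when it is full
Length 1: QuorumCnxManager#SEND_CAPACITY
Eviction: QuorumCnxManager#addToSendQueue
See the comment on SEND_CAPACITY:
// Initialized to 1 to prevent sending
// stale notifications to peers

Leader election votes have a special property: if a new vote is produced before the previous one has been sent, the old vote is obsolete and never needs to be sent.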
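
The evict-then-add behaviour described for the send queue can be sketched with a plain ArrayBlockingQueue of capacity 1. This is not the body of addToSendQueue, which is not shown in this section; StaleVoteQueueSketch and addDroppingOldest are hypothetical names, and strings stand in for votes.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical sketch of a bounded queue that drops stale entries; not ZooKeeper code.
class StaleVoteQueueSketch {
    /** Adds item, first removing the oldest entry if the queue is full. */
    static <T> void addDroppingOldest(ArrayBlockingQueue<T> queue, T item) {
        if (queue.remainingCapacity() == 0) {
            try {
                queue.remove(); // discard the stale entry
            } catch (NoSuchElementException e) {
                // A consumer emptied the queue in the meantime; nothing to drop.
            }
        }
        try {
            queue.add(item);
        } catch (IllegalStateException e) {
            // Another producer refilled the queue first; this item is dropped.
        }
    }

    public static void main(String[] args) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        addDroppingOldest(queue, "old vote");
        addDroppingOldest(queue, "new vote");
        System.out.println(queue.peek()); // prints "new vote"
    }
}
```

With capacity 1, the sender can never be more than one vote behind, so a peer only ever receives the most recent vote that was still waiting.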
