ZooKeeper communication mechanism: source code analysis

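This article covers zk communication from the client side (the ZooKeeper class).
It has two parts: the first covers how messages are sent and received; the second covers how the session between client and server is established.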

1. Sending and receiving messages

A zk client is normally constructed with code like this:

    ZooKeeper zooKeeper = new ZooKeeper("127.0.0.1:2181", 1000, new Watcher() {
        public void process(WatchedEvent event) {
            // handle the event
        }
    });

Tracing into this call leads to the constructor that does the actual work:

    public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher,
            boolean canBeReadOnly, HostProvider aHostProvider,
            ZKClientConfig clientConfig) throws IOException {
        LOG.info("Initiating client connection, connectString=" + connectString
                + " sessionTimeout=" + sessionTimeout + " watcher=" + watcher);

        if (clientConfig == null) {
            clientConfig = new ZKClientConfig();
        }
        this.clientConfig = clientConfig;
        // 1--1: create the watch manager and register the default watcher
        watchManager = defaultWatchManager();
        watchManager.defaultWatcher = watcher;
        ConnectStringParser connectStringParser = new ConnectStringParser(
                connectString);
        // 1--2: set the zk server address list
        hostProvider = aHostProvider;
        // 1--3: create ClientCnxn, which also initializes outgoingQueue and pendingQueue
        cnxn = createConnection(connectStringParser.getChrootPath(),
                hostProvider, sessionTimeout, this, watchManager,
                getClientCnxnSocket(), canBeReadOnly);
        // 1--4: start sendThread and eventThread (both created in the ClientCnxn constructor)
        cnxn.start();
    }

Parameters:

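  • connectString: the list of server addresses of the ZooKeeper ensemble, optionally followed by a chroot path.
  • sessionTimeout: ultimately gives rise to three time settings: the sessionTimeout negotiated with the server, readTimeout, and connectTimeout. The server uses the negotiated sessionTimeout: if the client sends no request at all within that period (normally the client sends a heartbeat at regular intervals, and each request makes the server recompute the session's expiry time), the server considers the session expired and cleans up its data. From then on the client's ZooKeeper object is useless, and a new one has to be created.
    The client uses connectTimeout and readTimeout to detect connection timeouts and read timeouts respectively. When either one expires, the client treats the current server as unstable and reconnects to the next server address.
  • watcher: the default Watcher of the ZooKeeper object, which receives event notifications such as a successful connection to the server, a disconnection, or session expiry.
  • canBeReadOnly: whether the client is allowed to connect to a server in read-only mode (for example a server that has been partitioned from the quorum).
  • aHostProvider: the zk server list (derived from connectString).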
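A quick sketch of how the two client-side timeouts relate to the session timeout. As far as I can tell from ClientCnxn in 3.5.x, readTimeout is two thirds of the session timeout and connectTimeout divides it evenly across the servers in the connect string; both are recomputed from the negotiated value once the session is established. ClientTimeouts is a hypothetical helper that only reproduces this arithmetic, so treat the formulas as my reading of the source rather than a documented contract.

```java
/**
 * Hypothetical helper (not a ZooKeeper API) reproducing how ClientCnxn
 * appears to derive its client-side timeouts from the session timeout.
 */
final class ClientTimeouts {
    private ClientTimeouts() {}

    /** Read timeout: two thirds of the session timeout, integer ms arithmetic. */
    static int readTimeout(int sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    /** Connect timeout: the session timeout split evenly across all servers. */
    static int connectTimeout(int sessionTimeoutMs, int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        return sessionTimeoutMs / serverCount;
    }

    public static void main(String[] args) {
        int negotiated = 30000; // e.g. a 30 s session timeout agreed with the server
        int servers = 3;        // e.g. "host1:2181,host2:2181,host3:2181"
        // prints "readTimeout=20000 connectTimeout=10000"
        System.out.println("readTimeout=" + readTimeout(negotiated)
                + " connectTimeout=" + connectTimeout(negotiated, servers));
    }
}
```

With a 30 s session timeout and three servers, each connection attempt gets 10 s, so one pass over all servers fits inside a single session timeout; that appears to be the intent of the division.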

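The constructor does four things:

  • 1: Creates the watcher. Note that this watcher is handed to the ZooKeeper class's ZKWatchManager; that is covered later in this series, not in this article.
  • 2: Sets the zk server list (a HostProvider implementation).
  • 3: Creates ClientCnxn, which also initializes outgoingQueue and pendingQueue.
  • 4: Starts sendThread and eventThread. These two threads only make sense together with the two queues above; all four are explained below.

The two queues and the two threads all live in ClientCnxn, so this article's main character, ClientCnxn, now takes the stage.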

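ClientCnxn is the low-level communication interface between client and server; you can think of it as the component that drives the communication. In the ClientCnxn source, the following fields and inner classes deserve your attention: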


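[Figure: the two queues (outgoingQueue, pendingQueue) and the two threads (SendThread, EventThread)]

[Figure: the threads and Packet]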

This leads to the concrete communication flow:

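  • The ZooKeeper object builds a Packet and puts it into outgoingQueue.
  • SendThread in ClientCnxn writes the packets in outgoingQueue to the server. Once a request packet (anything other than ping and auth) has been fully written, it is moved to pendingQueue, so that when the response arrives it can be compared with the original request; this is how the client checks that responses arrive in order.
  • SendThread also receives the server's responses. clientCnxnSocket has two implementations, ClientCnxnSocketNIO and ClientCnxnSocketNetty, but both end up calling SendThread#readResponse.
  • readResponse turns a watch notification into a WatchedEvent and puts it into EventThread's waitingEvents queue, where EventThread processes it; an ordinary reply is matched against the head of pendingQueue.

Note that this flow leaves out how the client and server establish the connection; that part is fairly involved and is covered in part 2.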
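The queue hand-offs above can be modeled in a few lines. The following is a single-threaded toy, not ClientCnxn itself: MiniCnxn and its methods are hypothetical, and only the queue names and the xid ordering check mirror the real code. Threads, sockets, and EventThread are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;
import java.util.Queue;

/** Toy, single-threaded model of ClientCnxn's two queues (hypothetical names). */
final class MiniCnxn {
    /** Stand-in for ClientCnxn.Packet: an xid, a request and a reply slot. */
    static final class Packet {
        int xid;               // assigned at send time, as in ClientCnxnSocket#doIO
        final String request;
        String reply;
        Packet(String request) { this.request = request; }
    }

    final Deque<Packet> outgoingQueue = new ArrayDeque<>(); // waiting to be written
    final Queue<Packet> pendingQueue = new LinkedList<>();  // written, awaiting a reply
    private int nextXid = 1;

    /** Caller side: enqueue a request (cf. ClientCnxn#queuePacket). */
    Packet queuePacket(String request) {
        Packet p = new Packet(request);
        outgoingQueue.add(p);
        return p;
    }

    /** Sender side: "write" one packet and move it to pendingQueue; returns its xid. */
    int sendOne() {
        Packet p = outgoingQueue.removeFirst();
        p.xid = nextXid++;
        pendingQueue.add(p); // the real code does this only after p.bb is fully written
        return p.xid;
    }

    /** Receiver side: replies must come back in send order, so check the head. */
    Packet readResponse(int replyXid, String reply) {
        Packet p = pendingQueue.remove();
        if (p.xid != replyXid) {
            throw new IllegalStateException("Xid out of order. Got " + replyXid
                    + " expected " + p.xid);
        }
        p.reply = reply;
        return p;
    }

    public static void main(String[] args) {
        MiniCnxn cnxn = new MiniCnxn();
        Packet a = cnxn.queuePacket("create /a");
        Packet b = cnxn.queuePacket("create /b");
        int xa = cnxn.sendOne();
        int xb = cnxn.sendOne();
        cnxn.readResponse(xa, "/a");
        cnxn.readResponse(xb, "/b");
        System.out.println(a.reply + " " + b.reply); // prints "/a /b"
    }
}
```

Because the server answers one session's requests in order, comparing the reply xid with the head of pendingQueue is enough; a mismatch means the connection is in an inconsistent state, which the real code reports as CONNECTIONLOSS.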

That is the client-side flow. It is rather convoluted, so let's walk through it in code, using node creation as the example.

    // The client calls create to create a node in the ensemble; it returns the path that was created
    zooKeeper.create("/test","".getBytes(),ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // -- The ZooKeeper object builds the request, hands it to ClientCnxn to execute, then processes the result.
    public String create(final String path, byte data[], List<ACL> acl,
            CreateMode createMode)
        throws KeeperException, InterruptedException
    {
        final String clientPath = path;
        PathUtils.validatePath(clientPath, createMode.isSequential());
        EphemeralType.validateTTL(createMode, -1);

        final String serverPath = prependChroot(clientPath);

        RequestHeader h = new RequestHeader();
        h.setType(createMode.isContainer() ? ZooDefs.OpCode.createContainer : ZooDefs.OpCode.create);
        // --1: build the CreateRequest
        CreateRequest request = new CreateRequest();
        CreateResponse response = new CreateResponse();
        request.setData(data);
        request.setFlags(createMode.toFlag());
        request.setPath(serverPath);
        if (acl != null && acl.size() == 0) {
            throw new KeeperException.InvalidACLException();
        }
        request.setAcl(acl);
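        // 2: send it to the server.
        // submitRequest wraps the CreateRequest in a Packet and calls queuePacket to put it into outgoingQueue,
        // where SendThread will send it to the server; the calling thread then wait()s for the reply.
        // 3: once the server responds, the result is filled into response and returned to the caller.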
        ReplyHeader r = cnxn.submitRequest(h, request, response, null);
        if (r.getErr() != 0) {
            throw KeeperException.create(KeeperException.Code.get(r.getErr()),
                    clientPath);
        }
        if (cnxn.chrootPath == null) {
            return response.getPath();
        } else {
            return response.getPath().substring(cnxn.chrootPath.length());
        }
    }
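create() converts between client and server paths in both directions: prependChroot adds the chroot before the request is sent, and the chroot is cut off the returned path again. readResponse applies a similar rule to notification paths further down. ChrootPaths is a hypothetical helper re-implementing those string rules, assuming a chroot such as "/app" that comes from a connect string like "127.0.0.1:2181/app"; the special case for "/" reflects my reading of ZooKeeper#prependChroot.

```java
/** Hypothetical helper mirroring the chroot string handling in ZooKeeper/ClientCnxn. */
final class ChrootPaths {
    private ChrootPaths() {}

    /** Client path -> server path (cf. ZooKeeper#prependChroot). */
    static String toServerPath(String chroot, String clientPath) {
        if (chroot == null) {
            return clientPath;
        }
        // The client root "/" maps to the chroot itself, not to chroot + "/".
        return clientPath.length() == 1 ? chroot : chroot + clientPath;
    }

    /** Server path -> client path, following the rules readResponse applies to events. */
    static String toClientPath(String chroot, String serverPath) {
        if (chroot == null) {
            return serverPath;
        }
        if (serverPath.equals(chroot)) {
            return "/";
        }
        if (serverPath.length() > chroot.length()) {
            return serverPath.substring(chroot.length());
        }
        // ClientCnxn only logs a warning in this case and leaves the path unchanged.
        return serverPath;
    }

    public static void main(String[] args) {
        String chroot = "/app";
        String serverPath = toServerPath(chroot, "/test");
        // prints "/app/test -> /test"
        System.out.println(serverPath + " -> " + toClientPath(chroot, serverPath));
    }
}
```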

You can trace the code in this order:
ClientCnxn#submitRequest() -> ClientCnxn#queuePacket()

    public Packet queuePacket(RequestHeader h, ReplyHeader r, Record request,
            Record response, AsyncCallback cb, String clientPath,
            String serverPath, Object ctx, WatchRegistration watchRegistration,
            WatchDeregistration watchDeregistration) {
        Packet packet = null;

        // Note that we do not generate the Xid for the packet yet. It is
        // generated later at send-time, by an implementation of ClientCnxnSocket::doIO(),
        // where the packet is actually sent.
        packet = new Packet(h, r, request, response, watchRegistration);
        packet.cb = cb;
        packet.ctx = ctx;
        packet.clientPath = clientPath;
        packet.serverPath = serverPath;
        packet.watchDeregistration = watchDeregistration;
        // The synchronized block here is for two purpose:
        // 1. synchronize with the final cleanup() in SendThread.run() to avoid race
        // 2. synchronized against each packet. So if a closeSession packet is added,
        // later packet will be notified.
        synchronized (state) {
            if (!state.isAlive() || closing) {
                conLossPacket(packet);
            } else {
                // If the client is asking to close the session then
                // mark as closing
                if (h.getType() == OpCode.closeSession) {
                    closing = true;
                }
                // put it into outgoingQueue to wait for SendThread to send it to the server
                outgoingQueue.add(packet);
            }
        }
        sendThread.getClientCnxnSocket().packetAdded();
        return packet;
    }
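queuePacket returns immediately; for a synchronous call such as create(), submitRequest then blocks on the returned Packet until SendThread marks it finished (the "wait" mentioned in create's comments). The sketch models that hand-off with plain wait()/notifyAll(). PendingCall and its methods are hypothetical; the real Packet carries a similar finished flag, and the real code must also finish packets when the connection is lost.

```java
/** Hypothetical model of the wait/notify hand-off between a caller and SendThread. */
final class PendingCall {
    private boolean finished;   // cf. Packet.finished
    private String response;

    /** Caller thread: block until the reply has been filled in (cf. submitRequest). */
    synchronized String await() throws InterruptedException {
        while (!finished) {      // loop guards against spurious wake-ups
            wait();
        }
        return response;
    }

    /** I/O thread: publish the reply and wake the caller (cf. finishPacket). */
    synchronized void finish(String reply) {
        response = reply;
        finished = true;
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        PendingCall call = new PendingCall();
        Thread io = new Thread(() -> call.finish("/test")); // plays the role of SendThread
        io.start();
        System.out.println("created " + call.await());     // prints "created /test"
        io.join();
    }
}
```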

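The packet is now in outgoingQueue, waiting for SendThread. This thread was started when the zk client was created, so let's look at what it does.
Here I have stripped out the code unrelated to this part (heartbeats, reconnection after a disconnect, and so on); it is covered later.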

        @Override
        public void run() {
            // --- hand this SendThread and outgoingQueue to the concrete implementation (ClientCnxnSocketNIO or ClientCnxnSocketNetty)
            clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
            clientCnxnSocket.updateNow();
            clientCnxnSocket.updateLastSendAndHeard();
            int to;
            long lastPingRwServer = Time.currentElapsedTime();
            final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
            InetSocketAddress serverAddress = null;
            while (state.isAlive()) {
                try {
                    // ... omitted code

                    // SendThread sends here (for NIO it also receives)
                    clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
                } catch (Throwable e) {
                    // ... omitted code
                }
            }
            synchronized (state) {
                // When it comes to this point, it guarantees that later queued
                // packet to outgoingQueue will be notified of death.
                cleanup();
            }
            clientCnxnSocket.close();
            if (state.isAlive()) {
                eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
                        Event.KeeperState.Disconnected, null));
            }
            eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
                        Event.KeeperState.Closed, null));
            ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
                    "SendThread exited loop for session: 0x"
                           + Long.toHexString(getSessionId()));
        }

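clientCnxnSocket has two implementations, ClientCnxnSocketNIO and ClientCnxnSocketNetty. If you follow each of them down, you will find that the packets they send all come from ClientCnxn's outgoingQueue, as described earlier, and that both serialize a packet by calling ClientCnxn.Packet#createBB, so the path loops back to ClientCnxn. createBB turns a Packet taken from outgoingQueue into the ByteBuffer that is then written to the server (NIO writes it with sock.write(p.bb)). Its source: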

        public void createBB() {
            try {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
                boa.writeInt(-1, "len"); // We'll fill this in later
                if (requestHeader != null) {
                    requestHeader.serialize(boa, "header");
                }
                if (request instanceof ConnectRequest) {
                    request.serialize(boa, "connect");
                    // append "am-I-allowed-to-be-readonly" flag
                    boa.writeBool(readOnly, "readOnly");
                } else if (request != null) {
                    request.serialize(boa, "request");
                }
                baos.close();
                this.bb = ByteBuffer.wrap(baos.toByteArray());
                this.bb.putInt(this.bb.capacity() - 4);
                this.bb.rewind();
            } catch (IOException e) {
                LOG.warn("Ignoring unexpected exception", e);
            }
        }
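createBB produces a length-prefixed frame: a 4-byte big-endian length, then the header and body. It first writes -1 as a placeholder and afterwards overwrites it with capacity() - 4, the size of everything after the prefix. The sketch below reproduces that trick with java.io.DataOutputStream in place of jute's BinaryOutputArchive (an assumption made to keep the example self-contained; as far as I know BinaryOutputArchive also writes ints big-endian through a DataOutput). Framing and frame() are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical re-creation of the length-prefix trick used by Packet#createBB. */
final class Framing {
    private Framing() {}

    static ByteBuffer frame(byte[] payload) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        out.writeInt(-1);              // placeholder for "len", filled in below
        out.write(payload);            // stands in for header + request serialization
        out.close();
        ByteBuffer bb = ByteBuffer.wrap(baos.toByteArray());
        bb.putInt(bb.capacity() - 4);  // overwrite the placeholder; length excludes itself
        bb.rewind();                   // position back to 0, ready for a channel write
        return bb;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer bb = frame(new byte[] {1, 2, 3});
        // prints "frame bytes=7 len=3"
        System.out.println("frame bytes=" + bb.remaining() + " len=" + bb.getInt(0));
    }
}
```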

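Next comes the receiving side. Communication is a round trip: A sends to B, B receives from A, B replies to A, and A receives B's reply. Here A is the client and B is the server. Having covered how A sends to B, let's see how A receives B's reply.
I said earlier that SendThread#readResponse handles server responses, but it is not the only method involved, because NIO and Netty receive data differently. Here is the answer up front: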

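  • For ClientCnxnSocketNIO, the receive path is ClientCnxnSocketNIO.doIO(); this method handles both sending and receiving.
  • For ClientCnxnSocketNetty, the receive path is ClientCnxnSocketNetty.ZKClientHandler#channelRead0 (its location differs between versions; mine is 3.5.7, but the logic is the same).

Let's start with ClientCnxnSocketNIO: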
    // 1-- handles both incoming responses/events and outgoing requests
    void doIO(List<Packet> pendingQueue, ClientCnxn cnxn)
      throws InterruptedException, IOException {
        SocketChannel sock = (SocketChannel) sockKey.channel();
        if (sock == null) {
            throw new IOException("Socket is null!");
        }
        // 1-- receive side
        if (sockKey.isReadable()) {
            int rc = sock.read(incomingBuffer);
            if (rc < 0) {
                throw new EndOfStreamException(
                        "Unable to read additional data from server sessionid 0x"
                                + Long.toHexString(sessionId)
                                + ", likely server has closed socket");
            }
            if (!incomingBuffer.hasRemaining()) {
                incomingBuffer.flip();
                if (incomingBuffer == lenBuffer) {
                    recvCount.getAndIncrement();
                    readLength();
                } else if (!initialized) {// 1-- not yet "initialized": the session between client and server is still being created
                    // 1-- so this reply is the response to the ConnectRequest
                    readConnectResult();
                    enableRead();
                    if (findSendablePacket(outgoingQueue,
                            sendThread.tunnelAuthInProgress()) != null) {
                        // Since SASL authentication has completed (if client is configured to do so),
                        // outgoing packets waiting in the outgoingQueue can now be sent.
                        enableWrite();
                    }
                    lenBuffer.clear();
                    incomingBuffer = lenBuffer;
                    updateLastHeard();
                    initialized = true;
                } else {// 1-- ordinary replies (create, getData, exists, ...), including watch notifications
                    sendThread.readResponse(incomingBuffer);
                    lenBuffer.clear();
                    incomingBuffer = lenBuffer;
                    updateLastHeard();
                }
            }
        }
        if (sockKey.isWritable()) {
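            // 4-- send side: write a Packet, then put it into pendingQueue (the ClientCnxn queue of requests awaiting a server reply) so the response can be handled later, but only when: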
            //4--p.requestHeader != null&& p.requestHeader.getType() != OpCode.ping&& p.requestHeader.getType() != OpCode.auth
            Packet p = findSendablePacket(outgoingQueue,
                    sendThread.tunnelAuthInProgress());

            if (p != null) {
                updateLastSend();
                // If we already started writing p, p.bb will already exist
                if (p.bb == null) {
                    if ((p.requestHeader != null) &&
                            (p.requestHeader.getType() != OpCode.ping) &&
                            (p.requestHeader.getType() != OpCode.auth)) {
                        // 1-- generate the client request sequence number (xid) and set it in the packet's request header
                        p.requestHeader.setXid(cnxn.getXid());
                    }
                    // --- serialize the packet into p.bb; the actual send is the sock.write below
                    p.createBB();
                }
                sock.write(p.bb);
                if (!p.bb.hasRemaining()) {
                    sentCount.getAndIncrement();
                    outgoingQueue.removeFirstOccurrence(p);
                    if (p.requestHeader != null
                            && p.requestHeader.getType() != OpCode.ping
                            && p.requestHeader.getType() != OpCode.auth) {
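                        // 1-- the same check is relied on when packets are taken back out of pendingQueue
                        // 1-- (ordinary requests such as create, getData, exists), i.e. in sendThread.readResponse
                        // 1-- pendingQueue here is ClientCnxn's pendingQueue
                        synchronized (pendingQueue) {
                            // 1--- this is where the packet is added to pendingQueue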
                            pendingQueue.add(p);
                        }
                    }
                }
            }
            if (outgoingQueue.isEmpty()) {
                // No more packets to send: turn off write interest flag.
                // Will be turned on later by a later call to enableWrite(),
                // from within ZooKeeperSaslClient (if client is configured
                // to attempt SASL authentication), or in either doIO() or
                // in doTransport() if not.
                disableWrite();
            } else if (!initialized && p != null && !p.bb.hasRemaining()) {
                // On initial connection, write the complete connect request
                // packet, but then disable further writes until after
                // receiving a successful connection response.  If the
                // session is expired, then the server sends the expiration
                // response and immediately closes its end of the socket.  If
                // the client is simultaneously writing on its end, then the
                // TCP stack may choose to abort with RST, in which case the
                // client would never receive the session expired event.  See
                // http://docs.oracle.com/javase/6/docs/technotes/guides/net/articles/connection_release.html
                disableWrite();
            } else {
                // Just in case
                enableWrite();
            }
        }
    }
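On the read side, doIO alternates between two buffers: the 4-byte lenBuffer, and a body buffer sized by readLength(). A frame may arrive split across several reads, which is why the code only acts once hasRemaining() is false. Below is a hypothetical, channel-free decoder following the same two-state logic over byte chunks; the class and method names are mine, and maxLen stands in for the size limit the real readLength() enforces (derived from jute.maxbuffer, if I read it right).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder mirroring the lenBuffer / incomingBuffer states of doIO. */
final class FrameDecoder {
    private final int maxLen;                          // stand-in for the packet size limit
    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incomingBuffer = lenBuffer;     // first expect a 4-byte length
    final List<byte[]> frames = new ArrayList<>();     // completed frame bodies

    FrameDecoder(int maxLen) {
        this.maxLen = maxLen;
    }

    /** Feed the bytes returned by one read; completes zero or more frames. */
    void feed(ByteBuffer chunk) {
        while (true) {
            if (!incomingBuffer.hasRemaining()) {      // current buffer is full
                incomingBuffer.flip();
                if (incomingBuffer == lenBuffer) {     // cf. readLength()
                    int len = lenBuffer.getInt();
                    if (len < 0 || len >= maxLen) {
                        throw new IllegalStateException("Packet len " + len + " is out of range");
                    }
                    incomingBuffer = ByteBuffer.allocate(len);
                } else {                               // cf. sendThread.readResponse(...)
                    byte[] body = new byte[incomingBuffer.remaining()];
                    incomingBuffer.get(body);
                    frames.add(body);
                    lenBuffer.clear();                 // back to reading a length
                    incomingBuffer = lenBuffer;
                }
                continue;
            }
            if (!chunk.hasRemaining()) {
                return;                                // wait for the next read
            }
            int n = Math.min(chunk.remaining(), incomingBuffer.remaining());
            for (int i = 0; i < n; i++) {
                incomingBuffer.put(chunk.get());
            }
        }
    }

    public static void main(String[] args) {
        FrameDecoder d = new FrameDecoder(1024);
        // One frame (len=2, body {7, 8}) split across two reads.
        d.feed(ByteBuffer.wrap(new byte[] {0, 0, 0}));
        d.feed(ByteBuffer.wrap(new byte[] {2, 7, 8}));
        System.out.println("frames=" + d.frames.size()); // prints "frames=1"
    }
}
```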

ClientCnxnSocketNetty.ZKClientHandler#channelRead0 also ends up calling sendThread.readResponse(), so I won't go into it here.
Let's go straight to sendThread.readResponse():

        void readResponse(ByteBuffer incomingBuffer) throws IOException {
            ByteBufferInputStream bbis = new ByteBufferInputStream(
                    incomingBuffer);
            BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
            ReplyHeader replyHdr = new ReplyHeader();

            // deserialize (decode) the reply header
            replyHdr.deserialize(bbia, "header");
            // 1-- these special xids correspond to the filter applied when packets were added to pendingQueue
            if (replyHdr.getXid() == -2) {
                // -2 is the xid for pings
                if (LOG.isDebugEnabled()) {
                    LOG.debug("Got ping response for sessionid: 0x"
                            + Long.toHexString(sessionId)
                            + " after "
                            + ((System.nanoTime() - lastPingSentNs) / 1000000)
                            + "ms");
                }
                return;
            }
            if (replyHdr.getXid() == -4) {
                // -4 is the xid for AuthPacket               
                if(replyHdr.getErr() == KeeperException.Code.AUTHFAILED.intValue()) {
                    state = States.AUTH_FAILED;                    
                    eventThread.queueEvent( new WatchedEvent(Watcher.Event.EventType.None, 
                            Watcher.Event.KeeperState.AuthFailed, null) );
                    eventThread.queueEventOfDeath();
                }
                if (LOG.isDebugEnabled()) {
                    LOG.debug("Got auth sessionid:0x"
                            + Long.toHexString(sessionId));
                }
                return;
            }
            if (replyHdr.getXid() == -1) {// 10-- watch notification
                // -1 means notification
                if (LOG.isDebugEnabled()) {
                    LOG.debug("Got notification sessionid:0x"
                        + Long.toHexString(sessionId));
                }
                // 10--1: deserialize the WatcherEvent from the byte stream
                WatcherEvent event = new WatcherEvent();
                event.deserialize(bbia, "response");

                // convert from a server path to a client path
                // 10--2: strip the chroot path
                if (chrootPath != null) {
                    String serverPath = event.getPath();
                    if(serverPath.compareTo(chrootPath)==0)
                        event.setPath("/");
                    else if (serverPath.length() > chrootPath.length())
                        event.setPath(serverPath.substring(chrootPath.length()));
                    else {
                        LOG.warn("Got server path " + event.getPath()
                                + " which is too short for chroot path "
                                + chrootPath);
                    }
                }
                // 10--3: rebuild the WatchedEvent
                WatchedEvent we = new WatchedEvent(event);
                if (LOG.isDebugEnabled()) {
                    LOG.debug("Got " + we + " for sessionid 0x"
                            + Long.toHexString(sessionId));
                }
                // 10--4: put the event into EventThread's waiting queue so the watcher can be called back
                eventThread.queueEvent( we );
                return;
            }

            // If SASL authentication is currently in progress, construct and
            // send a response packet immediately, rather than queuing a
            // response as with other packets.
            if (tunnelAuthInProgress()) {
                GetSASLRequest request = new GetSASLRequest();
                request.deserialize(bbia,"token");
                zooKeeperSaslClient.respondToServer(request.getToken(),
                  ClientCnxn.this);
                return;
            }

            Packet packet;
            synchronized (pendingQueue) {
                if (pendingQueue.size() == 0) {
                    throw new IOException("Nothing in the queue, but got "
                            + replyHdr.getXid());
                }
                packet = pendingQueue.remove();
            }
            /*
             * Since requests are processed in order, we better get a response
             * to the first request!
             */
            try {
                // enforce in-order responses
                if (packet.requestHeader.getXid() != replyHdr.getXid()) {
                    packet.replyHeader.setErr(
                            KeeperException.Code.CONNECTIONLOSS.intValue());
                    throw new IOException("Xid out of order. Got Xid "
                            + replyHdr.getXid() + " with err " +
                            + replyHdr.getErr() +
                            " expected Xid "
                            + packet.requestHeader.getXid()
                            + " for a packet with details: "
                            + packet );
                }

                packet.replyHeader.setXid(replyHdr.getXid());
                packet.replyHeader.setErr(replyHdr.getErr());
                packet.replyHeader.setZxid(replyHdr.getZxid());
                if (replyHdr.getZxid() > 0) {
                    lastZxid = replyHdr.getZxid();
                }
                // -- deserialize the response body into packet.response
                if (packet.response != null && replyHdr.getErr() == 0) {
                    packet.response.deserialize(bbia, "response");
                }

                if (LOG.isDebugEnabled()) {
                    LOG.debug("Reading reply sessionid:0x"
                            + Long.toHexString(sessionId) + ", packet:: " + packet);
                }
            } finally {
                // 1-- finish the packet: watch registration and related logic, then wake the caller or queue the callback
                finishPacket(packet);
            }
        }

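As you can see, a watch notification ends in eventThread.queueEvent(), which wraps the event and puts it into EventThread's waitingEvents queue, matching the earlier description. An ordinary reply is instead checked against the head of pendingQueue and completed in finishPacket.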
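readResponse first branches on the reply xid: a few negative values are reserved, and only ordinary replies are matched against pendingQueue. The enum and method below are a hypothetical summary of those branches; the values -2, -4 and -1 come from the code above, and the SASL branch (checked after notifications) is left out for brevity.

```java
/** Hypothetical summary of how SendThread#readResponse routes a reply by xid. */
final class ReplyRouting {
    enum Route { PING, AUTH, NOTIFICATION, PENDING_REQUEST }

    private ReplyRouting() {}

    static Route route(int xid) {
        switch (xid) {
            case -2: return Route.PING;            // ping reply: log and return
            case -4: return Route.AUTH;            // AuthPacket reply: may raise AuthFailed
            case -1: return Route.NOTIFICATION;    // watch event -> EventThread.waitingEvents
            default: return Route.PENDING_REQUEST; // must match the xid at pendingQueue's head
        }
    }

    public static void main(String[] args) {
        for (int xid : new int[] {-2, -4, -1, 7}) {
            System.out.println(xid + " -> " + route(xid));
        }
    }
}
```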

2. Establishing the session between client and server

Now that the roles of sendThread and eventThread have been explained, the following diagram should be clearer.


[Figure: zk client session establishment]

Session establishment actually involves two things:

  • a TCP connection between client and server
  • a session built on top of that TCP connection

Let's look at the code that was omitted earlier:
        @Override
        public void run() {
            // --- hand this SendThread and outgoingQueue to the concrete implementation (ClientCnxnSocketNIO or ClientCnxnSocketNetty)
            clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
            clientCnxnSocket.updateNow();
            clientCnxnSocket.updateLastSendAndHeard();
            int to;
            long lastPingRwServer = Time.currentElapsedTime();
            final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
            InetSocketAddress serverAddress = null;
            while (state.isAlive()) {
                try {
                    // -- not connected yet
                    // -- when the client starts creating the ZooKeeper object, its state is set to CONNECTING; once it has connected to a server, the state changes to CONNECTED.
                    if (!clientCnxnSocket.isConnected()) {
                        // don't re-establish connection if we are closing
                        // -- the client itself is closing, so do not reconnect
                        if (closing) {
                            break;
                        }
                        if (rwServerAddress != null) {
                            serverAddress = rwServerAddress;
                            rwServerAddress = null;
                        } else {
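                            // -- pick the next server address; next(1000) sleeps 1000 ms once it has cycled through every address in the list
                            // -- if the TCP connection drops before sessionTimeout has elapsed,
                            // -- the server still considers the sessionId alive, so the client connects to the next ZooKeeper server address
                            // -- once the TCP connection is up, it sends a ConnectRequest carrying the previous sessionId and password
                            // -- if that sessionId has not expired yet, the reconnect succeeds transparently in the background and the ZooKeeper object stays valid
                            // -- the sleep exists because possibly none of the addresses in the list can be reached, so the client backs off before retrying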
                            serverAddress = hostProvider.next(1000);
                        }
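                        // -- start connecting
                        // -- this opens the TCP connection
                        // -- the session handshake (primeConnection) follows once the connection completes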
                        startConnect(serverAddress);
                        // -- reset the last-send and last-heard timestamps
                        clientCnxnSocket.updateLastSendAndHeard();
                    }

                    // -- connected
                    if (state.isConnected()) {
                        // determine whether we need to send an AuthFailed event.
                        if (zooKeeperSaslClient != null) {
                            boolean sendAuthEvent = false;
                            if (zooKeeperSaslClient.getSaslState() == ZooKeeperSaslClient.SaslState.INITIAL) {
                                try {
                                    zooKeeperSaslClient.initialize(ClientCnxn.this);
                                } catch (SaslException e) {
                                   LOG.error("SASL authentication with Zookeeper Quorum member failed: " + e);
                                    state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                }
                            }
                            KeeperState authState = zooKeeperSaslClient.getKeeperState();
                            if (authState != null) {
                                if (authState == KeeperState.AuthFailed) {
                                    // An authentication error occurred during authentication with the Zookeeper Server.
                                    state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                } else {
                                    if (authState == KeeperState.SaslAuthenticated) {
                                        sendAuthEvent = true;
                                    }
                                }
                            }

                            if (sendAuthEvent) {
                                eventThread.queueEvent(new WatchedEvent(
                                      Watcher.Event.EventType.None,
                                      authState,null));
                                if (state == States.AUTH_FAILED) {
                                    // -- queue the event of death so EventThread exits
                                  eventThread.queueEventOfDeath();
                                }
                            }
                        }
                        // -- connected: check against the read timeout
                        to = readTimeout - clientCnxnSocket.getIdleRecv();
                    } else {
                        // -- not connected: check against the connect timeout
                        to = connectTimeout - clientCnxnSocket.getIdleRecv();
                    }
                    
                    if (to <= 0) {
                        String warnInfo;
                        warnInfo = "Client session timed out, have not heard from server in "
                            + clientCnxnSocket.getIdleRecv()
                            + "ms"
                            + " for sessionid 0x"
                            + Long.toHexString(sessionId);
                        LOG.warn(warnInfo);
                        throw new SessionTimeoutException(warnInfo);
                    }
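                    // -- keep sending pings; each request, pings included, makes the server recompute the session's expiry from the current time
                    // -- (session activation, which migrates the session to a new expiry bucket); bucket formula on the server:
                    // long expireTime = currentTime + sessionTimeout;
                    // expireTime = (expireTime / expirationInterval + 1) * expirationInterval;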
                    if (state.isConnected()) {
                        //1000(1 second) is to prevent race condition missing to send the second ping
                        //also make sure not to send too many pings when readTimeout is small 
                        int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend() - 
                                ((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
                        //send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
                        // -- clientCnxnSocket.getIdleSend(): time elapsed since the last packet was sent
                        if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
                            // -- the ping is also put into outgoingQueue to be sent
                            sendPing();
                            clientCnxnSocket.updateLastSend();
                        } else {
                            if (timeToNextPing < to) {
                                to = timeToNextPing;
                            }
                        }
                    }

                    // If we are in read-only mode, seek for read/write server
                    if (state == States.CONNECTEDREADONLY) {
                        long now = Time.currentElapsedTime();
                        int idlePingRwServer = (int) (now - lastPingRwServer);
                        if (idlePingRwServer >= pingRwTimeout) {
                            lastPingRwServer = now;
                            idlePingRwServer = 0;
                            pingRwTimeout =
                                Math.min(2*pingRwTimeout, maxPingRwTimeout);
                            pingRwServer();
                        }
                        to = Math.min(to, pingRwTimeout - idlePingRwServer);
                    }
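                    // SendThread sends here
                    // --- for NIO, the non-blocking TCP connect started in startConnect() completes here: OP_CONNECT -> finishConnect() -> primeConnection()
                    // --- (for Netty, the connect listener calls primeConnection() instead)
                    // --- ClientCnxnSocket maintains one long-lived TCP connection to the server
                    // --- performs the I/O: sends queued requests and reads the server's responses (NIO)
                    // --- Netty: sends requests only
                    // --- ClientCnxnSocketNIO.doTransport() -> doIO() (covers both writing and reading)
                    // --- ClientCnxnSocketNetty.doTransport() -> doWrite() -> sendPktOnly() -> sendPkt()
                    // --- in short: for ClientCnxnSocketNIO this call both sends requests and receives replies; for ClientCnxnSocketNetty it only sends
                    // 4 receive path: ClientCnxnSocketNIO.doIO()
                    // 4 receive path: ClientCnxnSocketNetty.channelRead0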
                    clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
                } catch (Throwable e) {
                    if (closing) {
                        if (LOG.isDebugEnabled()) {
                            // closing so this is expected
                            LOG.debug("An exception was thrown while closing send thread for session 0x"
                                    + Long.toHexString(getSessionId())
                                    + " : " + e.getMessage());
                        }
                        break;
                    } else {
                        // this is ugly, you have a better way speak up
                        if (e instanceof SessionExpiredException) {
                            LOG.info(e.getMessage() + ", closing socket connection");
                        } else if (e instanceof SessionTimeoutException) {
                            LOG.info(e.getMessage() + RETRY_CONN_MSG);
                        } else if (e instanceof EndOfStreamException) {
                            LOG.info(e.getMessage() + RETRY_CONN_MSG);
                        } else if (e instanceof RWServerFoundException) {
                            LOG.info(e.getMessage());
                        } else if (e instanceof SocketException) {
                            LOG.info("Socket error occurred: {}: {}", serverAddress, e.getMessage());
                        } else {
                            LOG.warn("Session 0x{} for server {}, unexpected error{}",
                                            Long.toHexString(getSessionId()),
                                            serverAddress,
                                            RETRY_CONN_MSG,
                                            e);
                        }
                        // At this point, there might still be new packets appended to outgoingQueue.
                        // they will be handled in next connection or cleared up if closed.
                        cleanAndNotifyState();
                    }
                }
            }
            synchronized (state) {
                // When it comes to this point, it guarantees that later queued
                // packet to outgoingQueue will be notified of death.
                cleanup();
            }
            clientCnxnSocket.close();
            if (state.isAlive()) {
                eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
                        Event.KeeperState.Disconnected, null));
            }
            eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
                        Event.KeeperState.Closed, null));
            ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
                    "SendThread exited loop for session: 0x"
                           + Long.toHexString(getSessionId()));
        }
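Two pieces of arithmetic drive the heartbeat in the loop above. On the client, SendThread schedules the next ping at half the read timeout after the last send, minus a one-second margin once it has been idle for more than a second. On the server, each touch recomputes the session's expiry time and rounds it up to the next expirationInterval bucket, per the formula in the comment. HeartbeatMath is a hypothetical helper that only reproduces both formulas; expirationInterval is server-side configuration treated here as an input (as far as I know it is the server's tickTime).

```java
/** Hypothetical helper reproducing the ping and session-expiry arithmetic. */
final class HeartbeatMath {
    private HeartbeatMath() {}

    /** Client side: ms until the next ping is due (<= 0 means "send now"). */
    static int timeToNextPing(int readTimeout, int idleSend) {
        return readTimeout / 2 - idleSend - (idleSend > 1000 ? 1000 : 0);
    }

    /** Server side: new expiry time after a touch, rounded up to the next bucket. */
    static long expiryBucket(long now, long sessionTimeout, long expirationInterval) {
        long expireTime = now + sessionTimeout;
        return (expireTime / expirationInterval + 1) * expirationInterval;
    }

    public static void main(String[] args) {
        // readTimeout 20000 ms, 3000 ms since the last send: 10000 - 3000 - 1000
        System.out.println(timeToNextPing(20000, 3000));      // prints 6000
        // now=12345, timeout=30000, interval=2000: 42345 rounds up to 44000
        System.out.println(expiryBucket(12345, 30000, 2000)); // prints 44000
    }
}
```

Rounding to buckets lets the server expire sessions in batches, one bucket per interval, instead of tracking an exact deadline per session; note that an expiry landing exactly on a bucket boundary still moves to the following bucket.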

Following the call chain down, we arrive at ClientCnxn.SendThread#primeConnection:

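        // -- after the connection to the server is established, SendThread sends a ConnectRequest asking to establish a session; the server assigns the client a sessionId and password and starts checking whether that session has timed out.
        // -- the request is wrapped in a network-I/O-layer Packet and put into outgoingQueue to wait to be sent to the server.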
        void primeConnection() throws IOException {
            LOG.info("Socket connection established, initiating session, client: {}, server: {}",
                    clientCnxnSocket.getLocalSocketAddress(),
                    clientCnxnSocket.getRemoteSocketAddress());
            isFirstConnect = false;
            long sessId = (seenRwServerBefore) ? sessionId : 0;
            ConnectRequest conReq = new ConnectRequest(0, lastZxid,
                    sessionTimeout, sessId, sessionPasswd);
            // We add backwards since we are pushing into the front
            // Only send if there's a pending watch
            // TODO: here we have the only remaining use of zooKeeper in
            // this class. It's to be eliminated!
            if (!clientConfig.getBoolean(ZKClientConfig.DISABLE_AUTO_WATCH_RESET)) {
                List<String> dataWatches = zooKeeper.getDataWatches();
                List<String> existWatches = zooKeeper.getExistWatches();
                List<String> childWatches = zooKeeper.getChildWatches();
                if (!dataWatches.isEmpty()
                        || !existWatches.isEmpty() || !childWatches.isEmpty()) {
                    Iterator<String> dataWatchesIter = prependChroot(dataWatches).iterator();
                    Iterator<String> existWatchesIter = prependChroot(existWatches).iterator();
                    Iterator<String> childWatchesIter = prependChroot(childWatches).iterator();
                    long setWatchesLastZxid = lastZxid;

                    while (dataWatchesIter.hasNext()
                           || existWatchesIter.hasNext() || childWatchesIter.hasNext()) {
                        List<String> dataWatchesBatch = new ArrayList<String>();
                        List<String> existWatchesBatch = new ArrayList<String>();
                        List<String> childWatchesBatch = new ArrayList<String>();
                        int batchLength = 0;

                        // Note, we may exceed our max length by a bit when we add the last
                        // watch in the batch. This isn't ideal, but it makes the code simpler.
                        while (batchLength < SET_WATCHES_MAX_LENGTH) {
                            final String watch;
                            if (dataWatchesIter.hasNext()) {
                                watch = dataWatchesIter.next();
                                dataWatchesBatch.add(watch);
                            } else if (existWatchesIter.hasNext()) {
                                watch = existWatchesIter.next();
                                existWatchesBatch.add(watch);
                            } else if (childWatchesIter.hasNext()) {
                                watch = childWatchesIter.next();
                                childWatchesBatch.add(watch);
                            } else {
                                break;
                            }
                            batchLength += watch.length();
                        }

                        SetWatches sw = new SetWatches(setWatchesLastZxid,
                                                       dataWatchesBatch,
                                                       existWatchesBatch,
                                                       childWatchesBatch);
                        RequestHeader header = new RequestHeader(-8, OpCode.setWatches);
                        Packet packet = new Packet(header, new ReplyHeader(), sw, null, null);
                        outgoingQueue.addFirst(packet);
                    }
                }
            }

            for (AuthData id : authInfo) {
                outgoingQueue.addFirst(new Packet(new RequestHeader(-4,
                        OpCode.auth), null, new AuthPacket(0, id.scheme,
                        id.data), null, null));
            }
            //--Put the ConnectRequest at the head of outgoingQueue, where it waits to be sent to the server.
            outgoingQueue.addFirst(new Packet(null, null, conReq,
                    null, null, readOnly));
            clientCnxnSocket.connectionPrimed();
            if (LOG.isDebugEnabled()) {
                LOG.debug("Session establishment request sent on "
                        + clientCnxnSocket.getRemoteSocketAddress());
            }
        }
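To make the queue ordering concrete, here is a small self-contained sketch. The class and method names are hypothetical, packets are modelled as plain strings, an ArrayDeque stands in for the real outgoingQueue, and the three data/exist/child watch lists are collapsed into one. It mimics the two things primeConnection does: it cuts the registered watch paths into SetWatches batches whose summed path length reaches a limit (SET_WATCHES_MAX_LENGTH in the real code), and it pushes every packet with addFirst, so the packet pushed last, the ConnectRequest, ends up at the head of the queue, ahead of the auth packets, the SetWatches batches and anything left in the queue from before the reconnect.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of SendThread#primeConnection: packets are plain strings
// and the data/exist/child watch lists are merged into a single list of paths.
class PrimeConnectionSketch {

    // Cut watch paths into batches. Like the real loop, a batch is closed once its summed
    // path length reaches maxLength, so the last path added may push it slightly past the limit.
    static List<List<String>> batchWatches(List<String> paths, int maxLength) {
        List<List<String>> batches = new ArrayList<>();
        Iterator<String> it = paths.iterator();
        while (it.hasNext()) {
            List<String> batch = new ArrayList<>();
            int batchLength = 0;
            while (batchLength < maxLength && it.hasNext()) {
                String watch = it.next();
                batch.add(watch);
                batchLength += watch.length();
            }
            batches.add(batch);
        }
        return batches;
    }

    // Push packets with addFirst in the same order as primeConnection: SetWatches batches,
    // then auth packets, then the ConnectRequest, which therefore ends up at the head.
    static Deque<String> prime(List<String> watchPaths, List<String> authSchemes, int maxLength) {
        Deque<String> outgoingQueue = new ArrayDeque<>();
        outgoingQueue.add("getData /existing"); // a request still queued from before the reconnect
        for (List<String> batch : batchWatches(watchPaths, maxLength)) {
            outgoingQueue.addFirst("SetWatches" + batch);
        }
        for (String scheme : authSchemes) {
            outgoingQueue.addFirst("Auth(" + scheme + ")");
        }
        outgoingQueue.addFirst("ConnectRequest");
        return outgoingQueue;
    }

    public static void main(String[] args) {
        System.out.println(prime(List.of("/a", "/bb", "/ccc"), List.of("digest"), 4));
    }
}
```

Running main prints [ConnectRequest, Auth(digest), SetWatches[/ccc], SetWatches[/a, /bb], getData /existing]. As in the real code, the batches and auth packets come out in reverse push order; what matters for the handshake is that the ConnectRequest is first, because the server treats the first message on a new connection as the connect request.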

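So the ConnectRequest is, in the end, also placed in outgoingQueue and sent by the sendThread. Its response, however, cannot be handled like the ordinary request/response flow described earlier, because when this request is sent no session exists yet between client and server (only the TCP connection is up). In the read path of ClientCnxnSocketNIO#doIO, and in the ClientCnxnSocketNetty.ZKClientHandler#channelRead0 callback, the first response on a connection (while the socket is not yet marked as initialized) is passed to ClientCnxnSocket#readConnectResult, which processes the server's reply to the connection request. Here is the code: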

    void readConnectResult() throws IOException {
        if (LOG.isTraceEnabled()) {
            StringBuilder buf = new StringBuilder("0x[");
            for (byte b : incomingBuffer.array()) {
                buf.append(Integer.toHexString(b) + ",");
            }
            buf.append("]");
            LOG.trace("readConnectResult " + incomingBuffer.remaining() + " "
                    + buf.toString());
        }
        ByteBufferInputStream bbis = new ByteBufferInputStream(incomingBuffer);
        BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
        //4 Deserialize the response into a ConnectResponse object,
        //4 which gives us the sessionId and passwd the server assigned to this client, plus the negotiated sessionTimeout.
        ConnectResponse conRsp = new ConnectResponse();
        conRsp.deserialize(bbia, "connect");

        // read "is read-only" flag
        boolean isRO = false;
        try {
            isRO = bbia.readBool("readOnly");
        } catch (IOException e) {
            // this is ok -- just a packet from an old server which
            // doesn't contain readOnly field
            LOG.warn("Connected to an old server; r-o mode will be unavailable");
        }

        //4 Take the session ID assigned by the ZooKeeper server
        this.sessionId = conRsp.getSessionId();
        sendThread.onConnected(conRsp.getTimeOut(), this.sessionId,
                conRsp.getPasswd(), isRO);
    }
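readConnectResult delegates the decoding to jute's BinaryInputArchive. As a rough illustration of what it reads, the sketch below parses the same layout by hand with java.nio.ByteBuffer instead. The ConnectResult class and its encode helper are hypothetical; the field order (protocolVersion, timeOut, sessionId, then passwd as a length-prefixed byte buffer) follows the ConnectResponse record in zookeeper.jute, and a missing trailing readOnly byte is treated as false, mirroring the IOException fallback above for old servers. It assumes the 4-byte frame length has already been consumed, which is the case for incomingBuffer when readConnectResult runs.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for jute's ConnectResponse plus the trailing readOnly flag.
class ConnectResult {
    final int protocolVersion;
    final int timeOut;      // negotiated session timeout in ms (<= 0 means the session expired)
    final long sessionId;
    final byte[] passwd;
    final boolean readOnly;

    ConnectResult(int protocolVersion, int timeOut, long sessionId, byte[] passwd, boolean readOnly) {
        this.protocolVersion = protocolVersion;
        this.timeOut = timeOut;
        this.sessionId = sessionId;
        this.passwd = passwd;
        this.readOnly = readOnly;
    }

    // Decode a ConnectResponse body; the 4-byte frame length must already be consumed.
    // ByteBuffer is big-endian by default, matching the DataInput/DataOutput streams jute uses.
    static ConnectResult parse(ByteBuffer buf) {
        int protocolVersion = buf.getInt();
        int timeOut = buf.getInt();
        long sessionId = buf.getLong();
        int len = buf.getInt();                      // jute buffer: int length, then the bytes
        byte[] passwd = new byte[Math.max(len, 0)];  // -1 (null) is treated as empty here
        buf.get(passwd);
        // Old servers omit the readOnly byte; treat a missing byte as "not read-only".
        boolean readOnly = buf.hasRemaining() && buf.get() != 0;
        return new ConnectResult(protocolVersion, timeOut, sessionId, passwd, readOnly);
    }

    // Encode the same layout; used here only to build sample input. Pass null for readOnly
    // to imitate an old server that does not send the flag.
    static ByteBuffer encode(int protocolVersion, int timeOut, long sessionId,
                             byte[] passwd, Boolean readOnly) {
        ByteBuffer buf = ByteBuffer.allocate(20 + passwd.length + (readOnly == null ? 0 : 1));
        buf.putInt(protocolVersion).putInt(timeOut).putLong(sessionId)
           .putInt(passwd.length).put(passwd);
        if (readOnly != null) {
            buf.put((byte) (readOnly ? 1 : 0));
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ConnectResult r = parse(encode(0, 30000, 0x10001L, new byte[16], false));
        System.out.println("sessionId=0x" + Long.toHexString(r.sessionId) + " timeout=" + r.timeOut);
    }
}
```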
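        //--SendThread#onConnected runs on the client; it is not server code.
        //--If, by the time the TCP connection is re-established, this sessionId has already timed out
        // (in which case the server has cleaned up all data tied to that sessionId), the server replies with
        // sessionTimeout = 0, sessionId = 0 and an empty password byte array. (That part happens on the server;
        // this function is the client-side handling of that reply, as described below.)
        // On receiving it, the client checks whether the negotiated sessionTimeout is <= 0.
        // If so, it uses the eventThread to first emit a KeeperState.Expired event to notify the relevant Watchers,
        // then ends the EventThread loop so the thread shuts down. From then on the ZooKeeper object is useless:
        // a new ZooKeeper object must be created, which will be assigned a new sessionId.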
        void onConnected(int _negotiatedSessionTimeout, long _sessionId,
                byte[] _sessionPasswd, boolean isRO) throws IOException {
            negotiatedSessionTimeout = _negotiatedSessionTimeout;
            if (negotiatedSessionTimeout <= 0) {
                state = States.CLOSED;

                eventThread.queueEvent(new WatchedEvent(
                        Watcher.Event.EventType.None,
                        Watcher.Event.KeeperState.Expired, null));
                eventThread.queueEventOfDeath();

                String warnInfo;
                warnInfo = "Unable to reconnect to ZooKeeper service, session 0x"
                    + Long.toHexString(sessionId) + " has expired";
                LOG.warn(warnInfo);
                throw new SessionExpiredException(warnInfo);
            }
            if (!readOnly && isRO) {
                LOG.error("Read/write client got connected to read-only server");
            }
            readTimeout = negotiatedSessionTimeout * 2 / 3;
            connectTimeout = negotiatedSessionTimeout / hostProvider.size();
            //1--Callback notifying the HostProvider of a successful connection
            hostProvider.onConnected();
            sessionId = _sessionId;
            sessionPasswd = _sessionPasswd;
            state = (isRO) ?
                    States.CONNECTEDREADONLY : States.CONNECTED;
            seenRwServerBefore |= !isRO;
            LOG.info("Session establishment complete on server "
                    + clientCnxnSocket.getRemoteSocketAddress()
                    + ", sessionid = 0x" + Long.toHexString(sessionId)
                    + ", negotiated timeout = " + negotiatedSessionTimeout
                    + (isRO ? " (READ-ONLY mode)" : ""));
            //4 Emit a connection-success event (SyncConnected, or ConnectedReadOnly for a read-only server) through the EventThread
            KeeperState eventState = (isRO) ?
                    KeeperState.ConnectedReadOnly : KeeperState.SyncConnected;
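            //4 This signals that the client/server session was created successfully; the event is handed to the EventThread.
            //4 EventThread#queueEvent looks up the matching Watchers in the ClientWatchManager;
            //4 for a SyncConnected event of type None that is simply the stored default Watcher,
            //4 which is then put into the EventThread's waitingEvents queue.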
            eventThread.queueEvent(new WatchedEvent(
                    Watcher.Event.EventType.None,
                    eventState, null));
        }
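The timeout arithmetic in onConnected is easy to check with concrete numbers. The helper below uses hypothetical names and is not a ZooKeeper API; it reproduces only that arithmetic and the choice of event: a non-positive negotiated timeout means Expired, otherwise readTimeout is 2/3 of the negotiated timeout, connectTimeout is the negotiated timeout divided by the number of servers in the HostProvider, and the event is SyncConnected or ConnectedReadOnly depending on isRO.

```java
// Hypothetical helper reproducing only the arithmetic and event choice of SendThread#onConnected.
class SessionTimeouts {
    enum Outcome { EXPIRED, SYNC_CONNECTED, CONNECTED_READ_ONLY }

    final Outcome outcome;
    final int readTimeout;     // ms, 0 when expired
    final int connectTimeout;  // ms, 0 when expired

    private SessionTimeouts(Outcome outcome, int readTimeout, int connectTimeout) {
        this.outcome = outcome;
        this.readTimeout = readTimeout;
        this.connectTimeout = connectTimeout;
    }

    static SessionTimeouts onConnected(int negotiatedSessionTimeout, int serverCount, boolean isRO) {
        if (negotiatedSessionTimeout <= 0) {
            // The server no longer knows the session; the real code queues an Expired
            // event plus the event of death and throws SessionExpiredException.
            return new SessionTimeouts(Outcome.EXPIRED, 0, 0);
        }
        int readTimeout = negotiatedSessionTimeout * 2 / 3;
        int connectTimeout = negotiatedSessionTimeout / serverCount;
        Outcome outcome = isRO ? Outcome.CONNECTED_READ_ONLY : Outcome.SYNC_CONNECTED;
        return new SessionTimeouts(outcome, readTimeout, connectTimeout);
    }

    public static void main(String[] args) {
        SessionTimeouts t = onConnected(30000, 3, false);
        // prints SYNC_CONNECTED readTimeout=20000 connectTimeout=10000
        System.out.println(t.outcome + " readTimeout=" + t.readTimeout + " connectTimeout=" + t.connectTimeout);
    }
}
```

With a 30 s negotiated timeout and three servers, each connection attempt is given at most 10 s, presumably so that one pass over all servers fits roughly within the session timeout. The 20 s readTimeout also drives the heartbeat: SendThread sends a ping once the connection has been idle for roughly half of readTimeout.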

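In both cases the client ends up calling eventThread.queueEvent(); only the event type differs (Expired when the session has already expired, SyncConnected or ConnectedReadOnly when the session was established).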
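Both branches go through eventThread.queueEvent(), and the expiry branch additionally calls queueEventOfDeath(). The underlying pattern is a single consumer thread draining a blocking queue, stopped by a sentinel object. The class below is a hypothetical, much simplified model of that pattern: events are plain strings instead of the real event/watcher pairs, and "delivering" an event just records it instead of calling Watcher.process(). One difference worth flagging: as far as we can tell, the real EventThread#run keeps processing until waitingEvents is empty after it has seen the sentinel, whereas this sketch simply stops at the sentinel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified model of ClientCnxn.EventThread: one consumer thread drains a
// blocking queue and stops when it takes a sentinel "event of death".
class MiniEventThread extends Thread {
    private static final Object EVENT_OF_DEATH = new Object();
    private final LinkedBlockingQueue<Object> waitingEvents = new LinkedBlockingQueue<>();
    // Stands in for calling Watcher.process(); only touched by this thread until join().
    private final List<String> delivered = new ArrayList<>();

    void queueEvent(String keeperState) {
        waitingEvents.add(keeperState);
    }

    void queueEventOfDeath() {
        waitingEvents.add(EVENT_OF_DEATH);
    }

    @Override
    public void run() {
        try {
            while (true) {
                Object event = waitingEvents.take();
                if (event == EVENT_OF_DEATH) {
                    break; // everything queued before the sentinel has been delivered
                }
                delivered.add((String) event);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Start the thread, queue the given states followed by the sentinel, and wait for it to finish.
    static List<String> deliver(String... keeperStates) {
        MiniEventThread t = new MiniEventThread();
        t.start();
        for (String state : keeperStates) {
            t.queueEvent(state);
        }
        t.queueEventOfDeath();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.delivered;
    }

    public static void main(String[] args) {
        System.out.println(deliver("SyncConnected", "Disconnected", "Closed"));
    }
}
```

Because the queue is FIFO and only one thread consumes it, watchers see events in exactly the order SendThread queued them, for example Disconnected before Closed at the end of SendThread#run.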
That is all for this article.
