To keep a session alive, the ZooKeeper client must periodically send ping packets to the server, commonly called heartbeats. This article walks through how that is implemented in the source code.
In the source covered in "How the client connects to the server", the final block of code is where the client sends ping packets to the server, and these packets are what keep the session alive. The ClientCnxn class contains the SendThread thread class, whose run() method drives the ping logic. With everything unrelated omitted, the ping-related part looks like this:
public void run() {
    ......
    final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
    while (state.isAlive()) {
        try {
            if (state.isConnected()) {
                to = readTimeout - clientCnxnSocket.getIdleRecv();
            } else {
                to = connectTimeout - clientCnxnSocket.getIdleRecv();
            }
            if (to <= 0) {
                String warnInfo = String.format(
                    "Client session timed out, have not heard from server in %dms for session id 0x%s",
                    clientCnxnSocket.getIdleRecv(),
                    Long.toHexString(sessionId));
                LOG.warn(warnInfo);
                throw new SessionTimeoutException(warnInfo);
            }
            if (state.isConnected()) {
                //1000(1 second) is to prevent race condition missing to send the second ping
                //also make sure not to send too many pings when readTimeout is small
                int timeToNextPing = readTimeout / 2
                    - clientCnxnSocket.getIdleSend()
                    - ((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
                //send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
                if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
                    // This only assembles the ping packet and queues it; the actual send happens below
                    sendPing();
                    clientCnxnSocket.updateLastSend();
                } else {
                    if (timeToNextPing < to) {
                        to = timeToNextPing;
                    }
                }
            }
            // This is the real entry point for sending packets over the wire
            clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
        } catch (Throwable e) {
            if (closing) {
                // closing so this is expected
                LOG.warn(
                    "An exception was thrown while closing send thread for session 0x{}.",
                    Long.toHexString(getSessionId()),
                    e);
                break;
            } else {
                LOG.warn(
                    "Session 0x{} for sever {}, Closing socket connection. "
                        + "Attempting reconnect except it is a SessionExpiredException.",
                    Long.toHexString(getSessionId()),
                    serverAddress,
                    e);
                // At this point, there might still be new packets appended to outgoingQueue.
                // they will be handled in next connection or cleared up if closed.
                cleanAndNotifyState();
            }
        }
    }
    ......
}
A few key points in the code above must be understood before the ping interval makes sense. First, look at how timeToNextPing is computed:
int timeToNextPing = readTimeout / 2
    - clientCnxnSocket.getIdleSend()
    - ((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
Where does readTimeout come from? Tracing the call chain shows it is initialized in the ClientCnxn constructor:
readTimeout = sessionTimeout * 2 / 3;
It is also initialized in the onConnected() method of ClientCnxn's inner class SendThread:
readTimeout = negotiatedSessionTimeout * 2 / 3;
Combining the two initializations: readTimeout = sessionTimeout (or negotiatedSessionTimeout) * 2 / 3.
Here sessionTimeout is the parameter passed in when the CuratorFramework client object is created; see the "How the client connects to the server" section in chapter one.
negotiatedSessionTimeout, on the other hand, is the value the client and server agree on after the connection is established. Since both sides can configure a session timeout, they negotiate a single value once connected.
Determining negotiatedSessionTimeout involves two important server-side parameters, maxSessionTimeout and minSessionTimeout, neither of which is present in the configuration file by default.
maxSessionTimeout: the maximum session timeout the ZooKeeper server allows a client to negotiate.
minSessionTimeout: the minimum session timeout the ZooKeeper server allows a client to negotiate.
If the server configuration file does not set these two parameters, the server falls back to defaults:
minSessionTimeout = tickTime * 2;
maxSessionTimeout = tickTime * 20;
tickTime, however, is one of the mandatory parameters and defaults to 2000 ms (2 s) in the configuration file. So minSessionTimeout defaults to 4 s and maxSessionTimeout to 40 s.
Given maxSessionTimeout and minSessionTimeout, how does ZooKeeper negotiate the final sessionTimeout?
As mentioned earlier, the client passes sessionTimeout and connectionTimeout when creating the ZooKeeper client object. Both can be customized; if the client uses Curator, the defaults are sessionTimeout=60s and connectionTimeout=15s. Note that connectionTimeout governs connection establishment; this article is about session timeouts, so we leave it aside and only look at the session after the connection succeeds.
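For reference, here is a minimal sketch of where those two values are supplied when building a Curator client. The connect string and retry policy below are placeholder values, not from the original article:
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorClientSketch {
    public static void main(String[] args) {
        // sessionTimeoutMs is only the value the client *proposes*; the server
        // clamps it into [minSessionTimeout, maxSessionTimeout] during setup.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
            "127.0.0.1:2181",                      // connectString (placeholder)
            60_000,                                // sessionTimeoutMs, Curator default
            15_000,                                // connectionTimeoutMs, Curator default
            new ExponentialBackoffRetry(1000, 3)); // retry policy (placeholder)
        client.start();
    }
}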
When the ZooKeeper server receives the session request, it validates the client's sessionTimeout: if it is smaller than the server's minSessionTimeout, then sessionTimeout = minSessionTimeout; if it is larger than the server's maxSessionTimeout, then sessionTimeout = maxSessionTimeout. In other words, no matter what the client asks for, the negotiated sessionTimeout (negotiatedSessionTimeout) always falls between minSessionTimeout and maxSessionTimeout. See ZooKeeperServer.processConnectRequest():
int sessionTimeout = connReq.getTimeOut();
byte[] passwd = connReq.getPasswd();
int minSessionTimeout = getMinSessionTimeout();
if (sessionTimeout < minSessionTimeout) {
    sessionTimeout = minSessionTimeout;
}
int maxSessionTimeout = getMaxSessionTimeout();
if (sessionTimeout > maxSessionTimeout) {
    sessionTimeout = maxSessionTimeout;
}
cnxn.setSessionTimeout(sessionTimeout);
With the defaults above, i.e. the client's sessionTimeout=60s and the server's range 4s <= sessionTimeout <= 40s, the negotiated value is negotiatedSessionTimeout=40s.
From that we can compute readTimeout: readTimeout = negotiatedSessionTimeout * 2 / 3 = 40s * 2 / 3 ≈ 26.7s (26666 ms with integer division).
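To make the arithmetic easy to check, here is a small standalone sketch (not ZooKeeper source) that reproduces the clamp in processConnectRequest() and the readTimeout derivation with the default values:
public class SessionTimeoutCalc {
    public static void main(String[] args) {
        int tickTime = 2000;                    // ms, default in zoo.cfg
        int minSessionTimeout = tickTime * 2;   // 4000 ms
        int maxSessionTimeout = tickTime * 20;  // 40000 ms
        int requested = 60_000;                 // client's sessionTimeout (Curator default)

        // Same clamp as ZooKeeperServer.processConnectRequest()
        int negotiated = requested;
        if (negotiated < minSessionTimeout) {
            negotiated = minSessionTimeout;
        }
        if (negotiated > maxSessionTimeout) {
            negotiated = maxSessionTimeout;
        }

        int readTimeout = negotiated * 2 / 3;   // integer division, as in ClientCnxn
        System.out.println("negotiatedSessionTimeout = " + negotiated + " ms");          // 40000
        System.out.println("readTimeout = " + readTimeout + " ms");                      // 26666
        System.out.println("ping interval = readTimeout / 2 = " + readTimeout / 2 + " ms"); // 13333
    }
}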
Now look again at the condition that decides when a ping is sent:
int timeToNextPing = readTimeout / 2
    - clientCnxnSocket.getIdleSend()
    - ((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
    sendPing();
    clientCnxnSocket.updateLastSend();
}
A ping can be sent once timeToNextPing <= 0, and clientCnxnSocket.getIdleSend() is the time elapsed since the last packet was sent. If we ignore the final term (the extra 1000 ms subtracted whenever getIdleSend() > 1000), timeToNextPing only drops to zero or below once getIdleSend() grows to readTimeout / 2. Roughly speaking, then, a ping becomes due whenever the idle-send time reaches readTimeout / 2, so readTimeout / 2 can be read as the ping interval.
The ping period therefore works out to:
ping interval = readTimeout / 2 = 26666 ms / 2 = 13333 ms ≈ 13.3 s
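As a sanity check on that cadence, here is a toy simulation (again, not ZooKeeper source) of the loop's timing. It assumes the clock only advances while the thread blocks in select(to), which matches the normal idle case shown later in this article:
public class PingCadenceSim {
    public static void main(String[] args) {
        final int readTimeout = 26_666;            // ms, from the calculation above
        final int MAX_SEND_PING_INTERVAL = 10_000; // ms, same constant as in run()
        long now = 0;                              // simulated clock, ms
        long lastSend = 0;
        int pings = 0;
        while (pings < 3) {
            int idleSend = (int) (now - lastSend);
            int timeToNextPing = readTimeout / 2 - idleSend - (idleSend > 1000 ? 1000 : 0);
            if (timeToNextPing <= 0 || idleSend > MAX_SEND_PING_INTERVAL) {
                System.out.println("t=" + now + " ms: send ping (idleSend=" + idleSend + ")");
                lastSend = now;        // mirrors clientCnxnSocket.updateLastSend()
                pings++;
            } else {
                now += timeToNextPing; // select(to) parks the thread until a ping is due
            }
        }
        // Prints pings at t=13333, 26666, 39999: one ping every ~13.3 s,
        // matching the timeToNextPing/IdleSend values in the log below.
    }
}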
Since this code is not easy to follow, let's add log statements to the source, keep all parameters at their defaults, and look at the output:
2023-05-12 18:00:51,821 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = -1011, IdleSend = 13344
2023-05-12 18:00:51,821 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1272] - ================ send ping ================= IdleSend = 13344
2023-05-12 18:00:51,856 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@890] - Got ping response for session id: 0x1000ab066000000 after 35ms.
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = -1013, IdleSend = 13346
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1272] - ================ send ping ================= IdleSend = 13346
2023-05-12 18:01:05,174 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@890] - Got ping response for session id: 0x1000ab066000000 after 2ms.
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = -1003, IdleSend = 13336
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1272] - ================ send ping ================= IdleSend = 13336
2023-05-12 18:01:18,509 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@890] - Got ping response for session id: 0x1000ab066000000 after 2ms.
Look at the timestamps in front of each ================ send ping ================= line: the three pings are about 13 s apart, which confirms that the output matches the analysis above.
Note that the send condition contains one more clause that we have not discussed yet. Back to the code:
final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
    sendPing();
    clientCnxnSocket.updateLastSend();
}
When clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL, a ping is also sent; that is, once more than 10 s have passed since the last packet went out, the client must ping the server. The normal flow never reaches this branch, so it is not yet clear what would trigger it; I'll note it here and come back once I understand it.
Why can't the normal flow reach it? Go back to the code at the start of this article and note the comments: the entry point for actually sending the ping packet is not sendPing() but clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this). Let's look at both methods in turn.
The sendPing() source:
// ClientCnxn.SendThread.sendPing()
private void sendPing() {
    lastPingSentNs = System.nanoTime();
    RequestHeader h = new RequestHeader(ClientCnxn.PING_XID, OpCode.ping);
    queuePacket(h, null, null, null, null, null, null, null, null);
}

// ClientCnxn.queuePacket()
public Packet queuePacket(
    RequestHeader h,
    ReplyHeader r,
    Record request,
    Record response,
    AsyncCallback cb,
    String clientPath,
    String serverPath,
    Object ctx,
    WatchRegistration watchRegistration) {
    return queuePacket(h, r, request, response, cb, clientPath, serverPath, ctx, watchRegistration, null);
}

// ClientCnxn.queuePacket()
public Packet queuePacket(
    RequestHeader h,
    ReplyHeader r,
    Record request,
    Record response,
    AsyncCallback cb,
    String clientPath,
    String serverPath,
    Object ctx,
    WatchRegistration watchRegistration,
    WatchDeregistration watchDeregistration) {
    Packet packet = null;
    // Note that we do not generate the Xid for the packet yet. It is
    // generated later at send-time, by an implementation of ClientCnxnSocket::doIO(),
    // where the packet is actually sent.
    packet = new Packet(h, r, request, response, watchRegistration);
    packet.cb = cb;
    packet.ctx = ctx;
    packet.clientPath = clientPath;
    packet.serverPath = serverPath;
    packet.watchDeregistration = watchDeregistration;
    // The synchronized block here is for two purpose:
    // 1. synchronize with the final cleanup() in SendThread.run() to avoid race
    // 2. synchronized against each packet. So if a closeSession packet is added,
    // later packet will be notified.
    synchronized (state) {
        if (!state.isAlive() || closing) {
            conLossPacket(packet);
        } else {
            // If the client is asking to close the session then
            // mark as closing
            if (h.getType() == OpCode.closeSession) {
                closing = true;
            }
            outgoingQueue.add(packet);
        }
    }
    sendThread.getClientCnxnSocket().packetAdded();
    return packet;
}
As the code shows, sendPing() only constructs the ping packet and adds it to the outgoingQueue send queue; nothing is sent to the server immediately.
clientCnxnSocket.doTransport() is implemented by ClientCnxnSocketNIO.doTransport():
@Override
void doTransport(
    int waitTimeOut,
    Queue<Packet> pendingQueue,
    ClientCnxn cnxn) throws IOException, InterruptedException {
    // Log statements added here to verify whether select() blocks
    LOG.debug("------------------- before select ----------------- waitTimeOut=" + waitTimeOut);
    selector.select(waitTimeOut);
    LOG.debug("------------------- after select -----------------");
    Set<SelectionKey> selected;
    synchronized (this) {
        selected = selector.selectedKeys();
    }
    // Everything below and until we get back to the select is
    // non blocking, so time is effectively a constant. That is
    // Why we just have to do this once, here
    updateNow();
    for (SelectionKey k : selected) {
        SocketChannel sc = ((SocketChannel) k.channel());
        if ((k.readyOps() & SelectionKey.OP_CONNECT) != 0) {
            if (sc.finishConnect()) {
                updateLastSendAndHeard();
                updateSocketAddresses();
                sendThread.primeConnection();
            }
        } else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) {
            // The session is already connected, so this branch calls doIO()
            doIO(pendingQueue, cnxn);
        }
    }
    if (sendThread.getZkState().isConnected()) {
        if (findSendablePacket(outgoingQueue, sendThread.tunnelAuthInProgress()) != null) {
            enableWrite();
        }
    }
    selected.clear();
}
Because the session is already connected, execution enters ClientCnxnSocketNIO.doIO():
void doIO(Queue<Packet> pendingQueue, ClientCnxn cnxn) throws InterruptedException, IOException {
    SocketChannel sock = (SocketChannel) sockKey.channel();
    if (sock == null) {
        throw new IOException("Socket is null!");
    }
    if (sockKey.isReadable()) {
        int rc = sock.read(incomingBuffer);
        if (rc < 0) {
            throw new EndOfStreamException("Unable to read additional data from server sessionid 0x"
                                           + Long.toHexString(sessionId)
                                           + ", likely server has closed socket");
        }
        if (!incomingBuffer.hasRemaining()) {
            incomingBuffer.flip();
            if (incomingBuffer == lenBuffer) {
                recvCount.getAndIncrement();
                readLength();
            } else if (!initialized) {
                readConnectResult();
                enableRead();
                if (findSendablePacket(outgoingQueue, sendThread.tunnelAuthInProgress()) != null) {
                    // Since SASL authentication has completed (if client is configured to do so),
                    // outgoing packets waiting in the outgoingQueue can now be sent.
                    enableWrite();
                }
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
                updateLastHeard();
                initialized = true;
            } else {
                sendThread.readResponse(incomingBuffer);
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
                updateLastHeard();
            }
        }
    }
    // When sending a packet, execution enters this branch
    if (sockKey.isWritable()) {
        Packet p = findSendablePacket(outgoingQueue, sendThread.tunnelAuthInProgress());
        if (p != null) {
            updateLastSend();
            // If we already started writing p, p.bb will already exist
            if (p.bb == null) {
                if ((p.requestHeader != null)
                    && (p.requestHeader.getType() != OpCode.ping)
                    && (p.requestHeader.getType() != OpCode.auth)) {
                    p.requestHeader.setXid(cnxn.getXid());
                }
                p.createBB();
            }
            // This write sends the packet out on the socket
            sock.write(p.bb);
            if (!p.bb.hasRemaining()) {
                sentCount.getAndIncrement();
                outgoingQueue.removeFirstOccurrence(p);
                if (p.requestHeader != null
                    && p.requestHeader.getType() != OpCode.ping
                    && p.requestHeader.getType() != OpCode.auth) {
                    synchronized (pendingQueue) {
                        pendingQueue.add(p);
                    }
                }
            }
        }
        if (outgoingQueue.isEmpty()) {
            // No more packets to send: turn off write interest flag.
            // Will be turned on later by a later call to enableWrite(),
            // from within ZooKeeperSaslClient (if client is configured
            // to attempt SASL authentication), or in either doIO() or
            // in doTransport() if not.
            disableWrite();
        } else if (!initialized && p != null && !p.bb.hasRemaining()) {
            // On initial connection, write the complete connect request
            // packet, but then disable further writes until after
            // receiving a successful connection response. If the
            // session is expired, then the server sends the expiration
            // response and immediately closes its end of the socket. If
            // the client is simultaneously writing on its end, then the
            // TCP stack may choose to abort with RST, in which case the
            // client would never receive the session expired event. See
            // http://docs.oracle.com/javase/6/docs/technotes/guides/net/articles/connection_release.html
            disableWrite();
        } else {
            // Just in case
            enableWrite();
        }
    }
}
Since we are sending a packet, execution takes the sockKey.isWritable() branch and pulls one packet out of the outgoingQueue:
// ClientCnxnSocketNIO.doIO()
Packet p = findSendablePacket(outgoingQueue, sendThread.tunnelAuthInProgress());

// findSendablePacket() source
private Packet findSendablePacket(LinkedBlockingDeque<Packet> outgoingQueue, boolean tunneledAuthInProgres) {
    if (outgoingQueue.isEmpty()) {
        return null;
    }
    // If we've already starting sending the first packet, we better finish
    if (outgoingQueue.getFirst().bb != null || !tunneledAuthInProgres) {
        return outgoingQueue.getFirst();
    }
    // Since client's authentication with server is in progress,
    // send only the null-header packet queued by primeConnection().
    // This packet must be sent so that the SASL authentication process
    // can proceed, but all other packets should wait until
    // SASL authentication completes.
    Iterator<Packet> iter = outgoingQueue.iterator();
    while (iter.hasNext()) {
        Packet p = iter.next();
        if (p.requestHeader == null) {
            // We've found the priming-packet. Move it to the beginning of the queue.
            iter.remove();
            outgoingQueue.addFirst(p);
            return p;
        } else {
            // Non-priming packet: defer it until later, leaving it in the queue
            // until authentication completes.
            LOG.debug("Deferring non-priming packet {} until SASL authentication completes.", p);
        }
    }
    return null;
}
The packet obtained there is then written to the server by sock.write(p.bb);.
That completes the send flow. Back to the earlier question: why does the normal flow never hit clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL? Look again at ClientCnxnSocketNIO.doTransport() and its first parameter, waitTimeOut; for readability, the unrelated source is omitted:
void doTransport(int waitTimeOut, Queue<Packet> pendingQueue, ClientCnxn cnxn) {
    // Log statements added here to verify whether select() blocks
    LOG.debug("------------------- before select ----------------- waitTimeOut=" + waitTimeOut);
    selector.select(waitTimeOut);
    LOG.debug("------------------- after select -----------------");
}
This touches on network programming, which we normally only use through higher-level wrappers, so I won't dig too deep here. Consulting selector.select(long timeout) shows that when no channel is selected, the call blocks for up to timeout milliseconds. In this code, that timeout is the value passed in as to.
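To see that behavior in isolation, here is a minimal, self-contained demo (unrelated to ZooKeeper) of Selector.select(timeout): with no channels registered, nothing can become ready, so the call blocks for roughly the full timeout:
import java.nio.channels.Selector;

public class SelectBlockingDemo {
    public static void main(String[] args) throws Exception {
        try (Selector selector = Selector.open()) {
            long start = System.currentTimeMillis();
            selector.select(1000); // blocks ~1 s because no channel is ready
            System.out.println("select returned after "
                + (System.currentTimeMillis() - start) + " ms");
        }
    }
}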
The formula for to:
// ClientCnxn.SendThread.run()
to = readTimeout - clientCnxnSocket.getIdleRecv();
if (timeToNextPing < to) {
    to = timeToNextPing;
}
From this code, to can never exceed timeToNextPing. So under normal conditions, after a ping goes out, the thread blocks in select() for roughly one ping period before the next packet is sent. To observe the blocking, add log statements to the source; the output looks like this:
2023-05-12 18:00:51,821 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = -1011, IdleSend = 13344
2023-05-12 18:00:51,821 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1272] - ================ send ping ================= IdleSend = 13344
2023-05-12 18:00:51,822 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13340
2023-05-12 18:00:51,822 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:00:51,822 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13332, IdleSend = 1
2023-05-12 18:00:51,825 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13332
2023-05-12 18:00:51,825 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:00:51,825 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13333, IdleSend = 0
2023-05-12 18:00:51,825 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13333
2023-05-12 18:00:51,855 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:00:51,856 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13302, IdleSend = 31
2023-05-12 18:00:51,856 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13302
2023-05-12 18:00:51,856 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:00:51,856 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@890] - Got ping response for session id: 0x1000ab066000000 after 35ms.
2023-05-12 18:00:51,856 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13301, IdleSend = 32
2023-05-12 18:00:51,857 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13301
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = -1013, IdleSend = 13346
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1272] - ================ send ping ================= IdleSend = 13346
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13352
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13333, IdleSend = 0
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13333
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13333, IdleSend = 0
2023-05-12 18:01:05,171 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13333
2023-05-12 18:01:05,172 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:05,174 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13331, IdleSend = 2
2023-05-12 18:01:05,174 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13331
2023-05-12 18:01:05,174 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:05,174 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@890] - Got ping response for session id: 0x1000ab066000000 after 2ms.
2023-05-12 18:01:05,174 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13330, IdleSend = 3
2023-05-12 18:01:05,174 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13330
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = -1003, IdleSend = 13336
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1272] - ================ send ping ================= IdleSend = 13336
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13333
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13333, IdleSend = 0
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13333
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13333, IdleSend = 0
2023-05-12 18:01:18,507 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13333
2023-05-12 18:01:18,508 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:18,508 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13331, IdleSend = 2
2023-05-12 18:01:18,508 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13331
2023-05-12 18:01:18,508 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
2023-05-12 18:01:18,509 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@890] - Got ping response for session id: 0x1000ab066000000 after 2ms.
2023-05-12 18:01:18,509 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1268] - ************ timeToNextPing = 13331, IdleSend = 2
2023-05-12 18:01:18,509 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@337] - ------------------- before select ----------------- waitTimeOut=13331
2023-05-12 18:01:31,850 [myid:127.0.0.1:2181] - DEBUG [main-SendThread(127.0.0.1:2181):ClientCnxnSocketNIO@339] - ------------------- after select -----------------
Two things stand out in the log above:
1. Before each "Got ping response for session id: xxxxx" line, the gap between ----- before select ----- and ----- after select ----- is essentially 0 ms.
2. Only after the "Got ping response" line does the gap between ----- before select ----- and ----- after select ----- grow to about 13 s, roughly one ping period. This confirms that blocking really does occur.
Why is there no blocking during the stretch described in point 1? It involves network-programming details I am not familiar with, so I will just give the conclusion here; if you want to verify it, the same trick applies: add log statements inside ClientCnxnSocketNIO.doIO(). The logs show that during that window the channel is busy reading and writing, so selector.select(long timeout) returns immediately instead of blocking.
The "Got ping response for session id: xxxxx" line means the client has received the server's ping response. After that, there are no more packets to send, so the next call to selector.select(long timeout) blocks again, waiting out the timeout.
That is the whole client-side heartbeat flow after a connection succeeds. Next, let's look at what the server does when it receives a heartbeat.
The server receives the packet exactly as described in the earlier article "How the server handles a connection", so we skip the receive path and focus on what the server does once it has the ping packet.
From that article we know the request ultimately reaches FinalRequestProcessor.processRequest(); its ping-handling code is:
switch (request.type) {
case OpCode.ping: {
    lastOp = "PING";
    updateStats(request, lastOp, lastZxid);
    cnxn.sendResponse(new ReplyHeader(ClientCnxn.PING_XID, lastZxid, 0), null, "response");
    return;
}
}
The updateStats() method:
private void updateStats(Request request, String lastOp, long lastZxid) {
    if (request.cnxn == null) {
        return;
    }
    long currentTime = Time.currentElapsedTime();
    zks.serverStats().updateLatency(request, currentTime);
    request.cnxn.updateStatsForResponse(request.cxid, lastZxid, lastOp, request.createTime, currentTime);
}
And updateStatsForResponse():
protected synchronized void updateStatsForResponse(long cxid, long zxid, String op, long start, long end) {
    // don't overwrite with "special" xids - we're interested
    // in the clients last real operation
    if (cxid >= 0) {
        lastCxid = cxid;
    }
    lastZxid = zxid;
    lastOp = op;
    lastResponseTime = end;
    long elapsed = end - start;
    lastLatency = elapsed;
    if (elapsed < minLatency) {
        minLatency = elapsed;
    }
    if (elapsed > maxLatency) {
        maxLatency = elapsed;
    }
    count++;
    totalLatency += elapsed;
}
updateStats() just updates request statistics, which we won't examine further here. Continuing with the cnxn.sendResponse() source:
// ServerCnxn.sendResponse()
public void sendResponse(ReplyHeader h, Record r, String tag) throws IOException {
    sendResponse(h, r, tag, null, null, -1);
}
What ultimately runs is the implementation NIOServerCnxn.sendResponse():
public void sendResponse(ReplyHeader h, Record r, String tag, String cacheKey, Stat stat, int opCode) {
    try {
        sendBuffer(serialize(h, r, tag, cacheKey, stat, opCode));
        decrOutstandingAndCheckThrottle(h);
    } catch (Exception e) {
        LOG.warn("Unexpected exception. Destruction averted.", e);
    }
}
This code sends the ping response back to the client, which concludes the server-side handling.
Now let's return to the client and see how it processes the server's ping response.
Back in ClientCnxnSocketNIO.doIO(), the relevant call is:
sendThread.readResponse(incomingBuffer);
This invokes SendThread.readResponse(), whose ping-handling code is:
switch (replyHdr.getXid()) {
case PING_XID:
    LOG.debug("Got ping response for session id: 0x{} after {}ms.",
        Long.toHexString(sessionId),
        ((System.nanoTime() - lastPingSentNs) / 1000000));
    return;
}
On receiving the ping response, this method merely logs the Got ping response for session id: xxxxx line and returns. That completes the entire flow of the client sending a ping to the server; building on this article, the next one analyzes the timeout mechanism.