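As the class description of QuorumCnxManager says, this is a connection manager dedicated to Leader election. This section covers:
Inner classes
SendWorker is the network IO sender: it takes messages from the send queue and sends them to the server with the corresponding sid
Message defines the message structure: a sid plus a ByteBuffer message body
RecvWorker is the network IO receiver
Listener listens on the electionPort and waits for connections from other servers
Fields
recvQueue is the receive queue
queueSendMap holds the send queue for each sid
Methods
Connection handling
Producing and consuming for the sender and the receiver
Others
Thoughts and summary
Inner classes:
There are four inner classes: SendWorker, Message, RecvWorker and Listener.
SendWorker
The code comment describes this class as follows: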
/**
* Thread to send messages. Instance waits on a queue, and send a message
* as soon as there is one available. If connection breaks, then opens a new
* one.
*/
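This thread extends ZooKeeperThread and holds a Socket used for sending messages. It blocks on the queue until a message arrives, and opens a new connection if the current one breaks.
Fields:
Long sid; // ServerId of the peer to connect to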
Socket sock;
RecvWorker recvWorker;
volatile boolean running = true;
DataOutputStream dout;
Main methods:
/**
* An instance of this thread receives messages to send
* through a queue and sends them to the server sid.
*
* @param sock
* Socket to remote peer
* @param sid
* Server identifier of remote peer
*/
SendWorker(Socket sock, Long sid) {
// Sets the thread name
super("SendWorker:" + sid);
this.sid = sid;
this.sock = sock;
recvWorker = null;
try {
dout = new DataOutputStream(sock.getOutputStream());
} catch (IOException e) {
LOG.error("Unable to access socket output stream", e);
closeSocket(sock);
running = false;
}
LOG.debug("Address of remote peer: " + this.sid);
}
Since it is a thread, it naturally has a run method.
@Override
public void run() {
threadCnt.incrementAndGet();
try {
/**
* If there is nothing in the queue to send, then we
* send the lastMessage to ensure that the last message
* was received by the peer. The message could be dropped
* in case self or the peer shutdown their connection
* (and exit the thread) prior to reading/processing
* the last message. Duplicate messages are handled correctly
* by the peer.
*
* If the send queue is non-empty, then we have a recent
* message than that stored in lastMessage. To avoid sending
* stale message, we should send the message in the send queue.
*/
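// queueSendMap maps each sid to its send queue
ArrayBlockingQueue<ByteBuffer> bq = queueSendMap.get(sid);
if (bq == null || isSendQueueEmpty(bq)) {
// lastMessageSent holds the last message sent to this sid; when the queue is empty it is resent to make sure the peer actually received it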
ByteBuffer b = lastMessageSent.get(sid);
if (b != null) {
LOG.debug("Attempting to send lastMessage to sid=" + sid);
// Send the message
send(b);
}
}
} catch (IOException e) {
LOG.error("Failed to send last message. Shutting down thread.", e);
this.finish();
}
try {
while (running && !shutdown && sock != null) {
ByteBuffer b = null;
try {
ArrayBlockingQueue<ByteBuffer> bq = queueSendMap
.get(sid);
if (bq != null) {
b = pollSendQueue(bq, 1000, TimeUnit.MILLISECONDS);
} else {
LOG.error("No queue of incoming messages for " +
"server " + sid);
break;
}
if(b != null){
lastMessageSent.put(sid, b);
send(b);
}
} catch (InterruptedException e) {
LOG.warn("Interrupted while waiting for message on queue",
e);
}
}
} catch (Exception e) {
LOG.warn("Exception when using channel: for id " + sid + " my id = " +
self.getId() + " error = " + e);
}
this.finish();
LOG.warn("Send worker leaving thread");
}
synchronized void send(ByteBuffer b) throws IOException {
byte[] msgBytes = new byte[b.capacity()];
try {
b.position(0);
b.get(msgBytes);
} catch (BufferUnderflowException be) {
LOG.error("BufferUnderflowException ", be);
return;
}
// Write the length of the byte array first; RecvWorker reads this leading int on the other side and checks that the length is within bounds
dout.writeInt(b.capacity());
dout.write(b.array());
dout.flush();
}
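When a SendWorker starts and finds that the send queue for its sid is empty, it takes the most recently sent message from lastMessageSent and sends it again. This covers the case where the connection was shut down (or the receiving server went down) before the last message was read or processed, so that message was never handled correctly. ZooKeeper also guarantees that the receiver handles duplicate messages correctly.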
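The decision made at the top of run() can be sketched in isolation. This is a minimal sketch, not ZooKeeper code: the class name LastMessageResendSketch and the method messageToResend are hypothetical, and the queue and last message are passed in as parameters instead of being looked up in queueSendMap and lastMessageSent.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical helper that mirrors the check at the start of SendWorker.run().
class LastMessageResendSketch {

    /**
     * Returns the message a freshly started sender should resend first,
     * or null when nothing should be resent.
     */
    static ByteBuffer messageToResend(ArrayBlockingQueue<ByteBuffer> sendQueue,
                                      ByteBuffer lastSent) {
        if (sendQueue == null || sendQueue.isEmpty()) {
            // Nothing newer is waiting, so repeat the last message (may be null).
            return lastSent;
        }
        // A newer message is queued; resending the old one would be stale.
        return null;
    }

    public static void main(String[] args) {
        ByteBuffer last = ByteBuffer.wrap(new byte[] {1});
        ArrayBlockingQueue<ByteBuffer> queue = new ArrayBlockingQueue<>(1);

        // Empty queue: the last message is chosen for resending.
        System.out.println(messageToResend(queue, last) == last);

        // Non-empty queue: nothing is resent.
        queue.add(ByteBuffer.wrap(new byte[] {2}));
        System.out.println(messageToResend(queue, last) == null);
    }
}
```

Both lines print true. Note that this check runs once when the thread starts; afterwards the loop only sends what it polls from the queue.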
Message
static public class Message {
Message(ByteBuffer buffer, long sid) {
this.buffer = buffer;
this.sid = sid;
}
ByteBuffer buffer;
long sid;
}
sid is the sid of the server the message came from, and buffer is the message body.
RecvWorker
This class is the "receiver". Like SendWorker, it extends ZooKeeperThread; the thread keeps reading data from the network and puts it into the receive queue.
Fields:
Long sid;
Socket sock;
volatile boolean running = true;
DataInputStream din;
final SendWorker sw;
As you can see, SendWorker and RecvWorker hold references to each other.
Main methods:
Constructor
RecvWorker(Socket sock, Long sid, SendWorker sw) {
super("RecvWorker:" + sid);
this.sid = sid;
this.sock = sock;
this.sw = sw;
try {
din = new DataInputStream(sock.getInputStream());
// OK to wait until socket disconnects while reading.
sock.setSoTimeout(0);
} catch (IOException e) {
LOG.error("Error while accessing socket for " + sid, e);
closeSocket(sock);
running = false;
}
}
@Override
public void run() {
threadCnt.incrementAndGet();
try {
while (running && !shutdown && sock != null) {
/**
* Reads the first int to determine the length of the
* message
*/
int length = din.readInt();
if (length <= 0 || length > PACKETMAXSIZE) {
throw new IOException(
"Received packet with invalid packet: "
+ length);
}
/**
* Allocates a new ByteBuffer to receive the message
*/
byte[] msgArray = new byte[length];
din.readFully(msgArray, 0, length);
ByteBuffer message = ByteBuffer.wrap(msgArray);
// Duplicate the message buffer and add it to recvQueue; addToRecvQueue itself does little more than enqueue
addToRecvQueue(new Message(message.duplicate(), sid));
}
} catch (Exception e) {
LOG.warn("Connection broken for id " + sid + ", my id = " +
self.getId() + ", error = " , e);
} finally {
LOG.warn("Interrupting SendWorker");
sw.finish();
if (sock != null) {
closeSocket(sock);
}
}
}
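The wire format shared by SendWorker.send() and RecvWorker.run() is a 4-byte length prefix followed by the message bytes. The round trip can be sketched with in-memory streams. This is a minimal sketch under assumptions: the class FramingSketch and its encode/decode methods are hypothetical, the value of PACKET_MAX_SIZE is assumed here for illustration, and the buffer is assumed to be array-backed (as produced by ByteBuffer.wrap).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical round trip of the length-prefixed framing, using byte arrays
// instead of a real socket.
class FramingSketch {
    // Assumed upper bound, standing in for PACKETMAXSIZE.
    static final int PACKET_MAX_SIZE = 1024 * 512;

    /** Sender side: length first, then the body, as in SendWorker.send(). */
    static byte[] encode(ByteBuffer b) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream dout = new DataOutputStream(bytes);
            dout.writeInt(b.capacity());
            dout.write(b.array());
            dout.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Receiver side: validate the length, then read exactly that many bytes. */
    static ByteBuffer decode(byte[] wire) {
        try {
            DataInputStream din = new DataInputStream(new ByteArrayInputStream(wire));
            int length = din.readInt();
            if (length <= 0 || length > PACKET_MAX_SIZE) {
                throw new IOException("Received packet with invalid length: " + length);
            }
            byte[] msgArray = new byte[length];
            din.readFully(msgArray, 0, length);
            return ByteBuffer.wrap(msgArray);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(ByteBuffer.wrap(new byte[] {7, 8, 9}));
        System.out.println(wire.length);            // 4-byte prefix + 3-byte body
        System.out.println(decode(wire).capacity());
    }
}
```

Validating the length before allocating the array is what keeps a corrupt or hostile prefix from triggering a huge allocation on the receiving side.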
Listener
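Also a subclass of ZooKeeperThread.
// The only field is a ServerSocket, which listens on the election port; incoming connections then go through a series of checks
volatile ServerSocket ss = null;
The thread's run method: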
/**
* Sleeps on accept().
*/
@Override
public void run() {
......
// The core loop
while (!shutdown) {
Socket client = ss.accept();
setSockOpts(client);
LOG.info("Received connection request "
+ client.getRemoteSocketAddress());
receiveConnection(client);
numRetries = 0;
}
......
}
ss.accept() blocks on the election port until a connection request arrives.
receiveConnection() is the most important method:
public void receiveConnection(Socket sock) {
Long sid = null;
try {
......
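// If wins the challenge, then close the new connection. This checks whether the sid of the server that opened the socket is smaller than our own; the rule is that election connections are initiated by the server with the larger sid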
if (sid < self.getId()) {
/*
* This replica might still believe that the connection to sid is
* up, so we have to shut down the workers before trying to open a
* new connection.
*/
SendWorker sw = senderWorkerMap.get(sid);
if (sw != null) {
sw.finish();
}
/*
* Now we start a new connection
*/
LOG.debug("Create new connection to server: " + sid);
closeSocket(sock);
// Covered later; it connects to the server with the given sid
connectOne(sid);
// Otherwise start worker threads to receive data.
} else {
// The SendWorker and RecvWorker are created here, on the Listener's thread
SendWorker sw = new SendWorker(sock, sid);
RecvWorker rw = new RecvWorker(sock, sid, sw);
sw.setRecv(rw);
SendWorker vsw = senderWorkerMap.get(sid);
if(vsw != null)
vsw.finish();
senderWorkerMap.put(sid, sw);
if (!queueSendMap.containsKey(sid)) {
queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
SEND_CAPACITY));
}
sw.start();
rw.start();
return;
}
}
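That roughly covers the inner classes. To summarize:
SendWorker and RecvWorker depend on each other, and each holds a reference to the other.
QuorumCnxManager keys SendWorker and RecvWorker instances by sid and keeps them isolated from one another.
Both have a sid field: when a server connects to the others, each connection gets its own RecvWorker and SendWorker, distinguished by sid.
For example, when sid1 is connected to the other (n-1) servers, it has (n-1) RecvWorkers and (n-1) SendWorkers, one pair per sid.
Message wraps a message: a sid plus a ByteBuffer as the message body.
Listener listens on the locally configured electionPort, keeps accepting incoming connections, and is responsible for starting the RecvWorker and SendWorker.
With the inner classes covered, let's look at the main class itself: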
QuorumCnxManager
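Fields:
Field | Default | Notes |
---|---|---|
RECV_CAPACITY | 100 | Capacity of the receive queue |
SEND_CAPACITY | 1 | Capacity of each send queue; the reason is given under "Thoughts and summary" |
`ConcurrentHashMap<Long, SendWorker>` senderWorkerMap | | The SendWorker for each sid |
`ConcurrentHashMap<Long, ArrayBlockingQueue<ByteBuffer>>` queueSendMap | | Send queues, keyed by each server's sid |
`ConcurrentHashMap<Long, ByteBuffer>` lastMessageSent | | The last message sent to each sid |
`ArrayBlockingQueue<Message>` recvQueue | | Receive queue |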
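Methods:
As described in the Listener section above, if our own sid is larger than the sid of the connecting server, connectOne is called to initiate a connection in the other direction.
connectOne: connect to the server with the given sid
// Connect to the server with the given sid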
synchronized void connectOne(long sid){
if (senderWorkerMap.get(sid) == null){ // no SendWorker recorded for this sid yet
InetSocketAddress electionAddr;
if (self.quorumPeers.containsKey(sid)) {
// Look up the election address of the server with this sid from the configuration
electionAddr =
self.quorumPeers.get(sid).electionAddr;
} else {
LOG.warn("Invalid server id: " + sid);
return;
}
try {
if (LOG.isDebugEnabled()) {
LOG.debug("Opening channel to server " + sid);
}
Socket sock = new Socket();
setSockOpts(sock);
sock.connect(self.getView().get(sid).electionAddr, cnxTO); // connect to the peer's election address
if (LOG.isDebugEnabled()) {
LOG.debug("Connected to server " + sid);
}
// Initialize the connection
initiateConnection(sock, sid);
} catch (UnresolvedAddressException e) {
......
}
}
The most important part of the flow is in initiateConnection.
initiateConnection: initialize the connection
/**
* If this server has initiated the connection, then it gives up on the
* connection if it loses challenge. Otherwise, it keeps the connection.
*/
public boolean initiateConnection(Socket sock, Long sid) {
DataOutputStream dout = null;
try {
// First send our own server id to the peer as the challenge
// Sending id and challenge
dout = new DataOutputStream(sock.getOutputStream());
dout.writeLong(self.getId());
dout.flush();
} catch (IOException e) {
LOG.warn("Ignoring exception reading or writing challenge: ", e);
closeSocket(sock);
return false;
}
// If our own id is the smaller one, the connection has to be dropped
// If lost the challenge, then drop the new connection
if (sid > self.getId()) {
LOG.info("Have smaller server identifier, so dropping the " +
"connection: (" + sid + ", " + self.getId() + ")");
closeSocket(sock);
// Otherwise proceed with the connection
} else {
SendWorker sw = new SendWorker(sock, sid);
RecvWorker rw = new RecvWorker(sock, sid, sw);
sw.setRecv(rw);
SendWorker vsw = senderWorkerMap.get(sid);
if(vsw != null)
vsw.finish();
senderWorkerMap.put(sid, sw);
if (!queueSendMap.containsKey(sid)) {
queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
SEND_CAPACITY));
}
sw.start();
rw.start();
return true;
}
return false;
}
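There is a concept here: winning the challenge and losing the challenge.
In ZooKeeper, to guarantee that each pair of servers shares only one socket, only the server with the larger sid is allowed to initiate a connection to another server; a connection opened the other way round is closed.
When initiating a connection: our own sid must be the larger one. If so, the SendWorker and RecvWorker are constructed and their threads started; otherwise the socket is closed.
When accepting a connection: our own sid must be the smaller one. If so, the SendWorker and RecvWorker are constructed and their threads started; otherwise the socket is closed.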
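The two rules above reduce to a pair of comparisons, one per side of the connection. This is a minimal sketch: the class ChallengeRuleSketch and its two methods are hypothetical names, and only the comparisons are taken from initiateConnection and receiveConnection.

```java
// Hypothetical helper that isolates the two sid comparisons.
class ChallengeRuleSketch {

    /**
     * Initiator side (initiateConnection): the outgoing socket is dropped
     * when the remote sid is larger than our own id.
     */
    static boolean keepOutgoing(long myId, long remoteSid) {
        return !(remoteSid > myId);
    }

    /**
     * Acceptor side (receiveConnection): the incoming socket is dropped
     * when the remote sid is smaller than our own id; in that case the
     * acceptor connects back with connectOne(remoteSid).
     */
    static boolean keepIncoming(long myId, long remoteSid) {
        return !(remoteSid < myId);
    }

    public static void main(String[] args) {
        // Server 3 connects to server 1: both ends keep the socket.
        System.out.println(keepOutgoing(3, 1) && keepIncoming(1, 3));
        // Server 1 connects to server 3: the initiator gives up.
        System.out.println(keepOutgoing(1, 3));
    }
}
```

For any two distinct sids exactly one direction survives on both ends, which is what leaves a single socket per pair of servers.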
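Thoughts and summary
1. Where the tricky part shows: there is only one connection between each pair of servers
Think of n servers that all need a connection to one another.
Picture n points joined by undirected edges, and write [sidn, sid1] for a connection that sidn established to sid1.
Then [sid1, sidn] is unnecessary, so n*(n-1)/2 edges are enough:
1 + 2 + 3 + ... + (n-1) = n*(n-1)/2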
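The edge count can be checked by enumerating every ordered pair of servers and keeping only the direction in which the larger sid initiates. This is a minimal sketch with a hypothetical class PairCountSketch; sids are assumed to be 1..n purely for illustration.

```java
// Hypothetical check of the n*(n-1)/2 edge count.
class PairCountSketch {

    /** Counts the connections kept among servers with sids 1..n. */
    static int countConnections(int n) {
        int edges = 0;
        for (long initiator = 1; initiator <= n; initiator++) {
            for (long acceptor = 1; acceptor <= n; acceptor++) {
                // Only a connection from the larger sid to the smaller one is kept.
                if (initiator > acceptor) {
                    edges++;
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        int n = 5;
        System.out.println(countConnections(n));   // 10
        System.out.println(n * (n - 1) / 2);       // 10
    }
}
```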
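2. Why does the send queue have length 1, and why is the older entry evicted when the queue is full on enqueue?
Length 1: QuorumCnxManager#SEND_CAPACITY
Eviction: QuorumCnxManager#addToSendQueue
See the comment on SEND_CAPACITY:
// Initialized to 1 to prevent sending
// stale notifications to peers
Because these are leader-election votes, there is a special requirement: if a new vote is produced before the previous one has been sent, the old vote is simply obsolete and never needs to be sent.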
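The evict-then-add behaviour can be sketched with a plain ArrayBlockingQueue of capacity 1. This is a minimal sketch in the spirit of addToSendQueue, not a copy of it: the class SendQueueSketch is hypothetical, the method is made generic so that strings can stand in for votes, and the exception handling is an assumption about how races with a concurrent consumer could be tolerated.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical model of a capacity-1 send queue that drops stale entries.
class SendQueueSketch {
    static final int SEND_CAPACITY = 1;

    /** Inserts msg, evicting the oldest entry first when the queue is full. */
    static <T> void addToSendQueue(ArrayBlockingQueue<T> queue, T msg) {
        if (queue.remainingCapacity() == 0) {
            try {
                queue.remove(); // drop the stale entry at the head
            } catch (NoSuchElementException e) {
                // A consumer emptied the queue in the meantime; nothing to evict.
            }
        }
        try {
            queue.add(msg);
        } catch (IllegalStateException e) {
            // Another producer refilled the queue first; this message is dropped.
        }
    }

    public static void main(String[] args) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(SEND_CAPACITY);
        addToSendQueue(queue, "vote-1");
        addToSendQueue(queue, "vote-2"); // replaces vote-1 before it was sent
        System.out.println(queue.peek()); // vote-2
    }
}
```

With capacity 1, the queue always holds at most the newest unsent vote, so a slow or disconnected peer never receives a backlog of outdated notifications.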