1: Introduction
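ZooKeeper is a distributed coordination service. Its most typical use is data publish/subscribe: a publisher writes data to a node, and subscribers watch that node. This feature is built on ZooKeeper's Watcher mechanism. When the data of a node that clients are watching changes, each of those clients receives a change notification.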
The figure below shows the overall flow in which a ZooKeeper client registers a Watcher with the server and then receives an event notification:
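The figure shows roughly how registration and notification work. When a ZooKeeper client registers a Watcher with the server, the Watcher object stays on the client. Once the server has acknowledged the request, the client stores the Watcher in its local ZKWatchManager. Later, when the server sends an event notification, the client takes the matching Watcher out of its local ZKWatchManager and runs its callback.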
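A small in-memory model can illustrate this division of labor. This is a sketch, not ZooKeeper code. ToyServer, ToyClient and their methods are hypothetical, direct method calls replace the network, and only data watches are modeled. The sketch shows that the callback object stays in the client's own map. The server only remembers which connection asked to watch which path.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical toy model: names are illustrative, not ZooKeeper's API.
class ToyServer {
    // path -> connections that asked to watch it (the server never sees callbacks)
    private final Map<String, Set<ToyClient>> watchTable = new HashMap<>();
    private final Map<String, String> data = new HashMap<>();

    // The "request" carries only the path and a boolean watch flag.
    String getData(ToyClient conn, String path, boolean watch) {
        if (watch) {
            watchTable.computeIfAbsent(path, p -> new HashSet<>()).add(conn);
        }
        return data.get(path);
    }

    void setData(String path, String value) {
        data.put(path, value);
        // One-shot: the registration is removed before notifying.
        Set<ToyClient> conns = watchTable.remove(path);
        if (conns != null) {
            for (ToyClient c : conns) {
                c.onNotification(path); // only the path goes over the "wire"
            }
        }
    }
}

class ToyClient {
    private final ToyServer server;
    // Client-side registry: path -> local callbacks (the role ZKWatchManager plays)
    private final Map<String, Set<Consumer<String>>> dataWatches = new HashMap<>();

    ToyClient(ToyServer server) {
        this.server = server;
    }

    String getData(String path, Consumer<String> watcher) {
        String value = server.getData(this, path, watcher != null);
        // Register locally only after the server has answered.
        if (watcher != null) {
            dataWatches.computeIfAbsent(path, p -> new LinkedHashSet<>()).add(watcher);
        }
        return value;
    }

    void onNotification(String path) {
        // Look up and remove the local callbacks, then run them.
        Set<Consumer<String>> ws = dataWatches.remove(path);
        if (ws != null) {
            ws.forEach(w -> w.accept(path));
        }
    }
}
```

For example, call `client.getData("/app/config", fired::add)`. The first `server.setData("/app/config", ...)` then runs the callback once. A second `setData` runs nothing, because the first notification consumed the watch and nothing re-registered it.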
2: Source Analysis of Client-Side Watcher Registration
Before diving in, here are a few concepts:
2.1 Packet
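Packet is the smallest unit of communication in ZooKeeper. Any object that has to be sent over the network is first wrapped in a Packet.
2.2 SendThread
SendThread is the client's I/O thread. It sends queued requests to the server, and when the server replies or pushes a notification, SendThread handles it first.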
2.3 EventThread
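EventThread is the client thread dedicated to processing event notifications. After SendThread receives a notification, it passes the event to EventThread, which runs the Watcher callbacks.
2.4 ServerCnxn
ServerCnxn represents a connection between a client and the server. When a client registers a Watcher, the Watcher object itself is never sent to the server. The server only keeps the ServerCnxn of that client's connection. ServerCnxn itself implements the Watcher interface, so the connection object acts as the watcher on the server side.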
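ZooKeeper accepts a Watcher object in many methods, such as exists and getData. Below we use the getData source to examine how the ZooKeeper client and server implement the Watcher mechanism.
With the native ZooKeeper API, getData reads a node's data and can take a Watcher object to watch that node. When the node's data changes, the server notifies the client that the node's state has changed.
Let's follow getData() to see how ZooKeeper handles event watching.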
```java
public byte[] getData(final String path, Watcher watcher, Stat stat)
        throws KeeperException, InterruptedException {
    // ...
    // the watch contains the un-chroot path
    WatchRegistration wcb = null;
    if (watcher != null) {
        wcb = new DataWatchRegistration(watcher, clientPath);
    }

    final String serverPath = prependChroot(clientPath);

    RequestHeader h = new RequestHeader();
    h.setType(ZooDefs.OpCode.getData);
    GetDataRequest request = new GetDataRequest();
    request.setPath(serverPath);
    request.setWatch(watcher != null);
    GetDataResponse response = new GetDataResponse();
    ReplyHeader r = cnxn.submitRequest(h, request, response, wcb);
    // ...
    return response.getData();
}
```
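(1) If the caller passes a non-null watcher, the client first wraps it in a WatchRegistration. WatchRegistration is an abstract class that records the association between the watcher and the node path.
(2) The client flags the request. If the watcher is not null, request.setWatch(true) marks the request as one that should set a watch. The RequestHeader carries header information such as the operation type (here ZooDefs.OpCode.getData).
(3) ClientCnxn submits the request, which is wrapped in a Packet: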
```java
public ReplyHeader submitRequest(RequestHeader h, Record request,
        Record response, WatchRegistration watchRegistration)
        throws InterruptedException {
    ReplyHeader r = new ReplyHeader();
    Packet packet = queuePacket(h, r, request, response, null, null, null,
            null, watchRegistration);
    synchronized (packet) {
        while (!packet.finished) {
            packet.wait();
        }
    }
    return r;
}
```
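submitRequest turns an asynchronous send into a synchronous call. The caller blocks on the Packet's monitor until another thread marks the Packet finished. The sketch below models that hand-off. It assumes hypothetical ToyPacket and ToyCnxn classes and a background thread that stands in for SendThread. In this toy the reply is built locally; in ZooKeeper it comes from the server.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the submitRequest / finishPacket hand-off.
class ToyPacket {
    final String request;
    String reply;       // filled in by the I/O thread
    boolean finished;   // guarded by this packet's monitor

    ToyPacket(String request) {
        this.request = request;
    }
}

class ToyCnxn {
    private final LinkedBlockingQueue<ToyPacket> outgoingQueue = new LinkedBlockingQueue<>();

    ToyCnxn() {
        // Stand-in for SendThread: take a packet, "receive" a reply, finish the packet.
        Thread io = new Thread(() -> {
            try {
                while (true) {
                    ToyPacket p = outgoingQueue.take();
                    finishPacket(p, "reply:" + p.request);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        io.setDaemon(true);
        io.start();
    }

    // Like submitRequest: enqueue, then wait on the packet until it is finished.
    String submitRequest(String request) throws InterruptedException {
        ToyPacket packet = new ToyPacket(request);
        outgoingQueue.add(packet);
        synchronized (packet) {
            while (!packet.finished) { // re-check after every wake-up
                packet.wait();
            }
        }
        return packet.reply;
    }

    private void finishPacket(ToyPacket p, String reply) {
        synchronized (p) {
            p.reply = reply;
            p.finished = true;
            p.notifyAll(); // wake the thread blocked in submitRequest
        }
    }
}
```

The while loop around wait() matters. Java allows spurious wake-ups, so the thread must re-check the condition after each wake-up. That is also why the ZooKeeper code loops on packet.finished.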
Within submitRequest, the key call is queuePacket, whose source is:
```java
Packet queuePacket(RequestHeader h, ReplyHeader r, Record request,
        Record response, AsyncCallback cb, String clientPath,
        String serverPath, Object ctx, WatchRegistration watchRegistration) {
    Packet packet = null;

    // Note that we do not generate the Xid for the packet yet. It is
    // generated later at send-time, by an implementation of ClientCnxnSocket::doIO(),
    // where the packet is actually sent.
    synchronized (outgoingQueue) {
        packet = new Packet(h, r, request, response, watchRegistration);
        packet.cb = cb;
        packet.ctx = ctx;
        packet.clientPath = clientPath;
        packet.serverPath = serverPath;
        if (!state.isAlive() || closing) {
            conLossPacket(packet);
        } else {
            // If the client is asking to close the session then
            // mark as closing
            if (h.getType() == OpCode.closeSession) {
                closing = true;
            }
            outgoingQueue.add(packet);
        }
    }
    sendThread.getClientCnxnSocket().wakeupCnxn();
    return packet;
}
```
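queuePacket wraps the request and the WatchRegistration in a Packet object and adds it to the send queue (outgoingQueue). Note, however, that the WatchRegistration in the Packet is not sent over the network.
The Packet is passed to ClientCnxnSocket's sendPacket method, which calls createBB() to serialize the packet and then writes it to the socket. The source of both methods follows: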
```java
@Override
void sendPacket(Packet p) throws IOException {
    SocketChannel sock = (SocketChannel) sockKey.channel();
    if (sock == null) {
        throw new IOException("Socket is null!");
    }
    p.createBB();
    ByteBuffer pbb = p.bb;
    sock.write(pbb);
}
```
```java
public void createBB() {
    try {
        // ...
        if (requestHeader != null) {
            requestHeader.serialize(boa, "header");
        }
        if (request instanceof ConnectRequest) {
            request.serialize(boa, "connect");
            // append "am-I-allowed-to-be-readonly" flag
            boa.writeBool(readOnly, "readOnly");
        } else if (request != null) {
            request.serialize(boa, "request");
        }
        // ...
    } catch (IOException e) {
        LOG.warn("Ignoring unexpected exception", e);
    }
}
```
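To reduce network traffic, createBB serializes only requestHeader and request. It never writes the WatchRegistration, or the Watcher inside it, to the wire.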
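The sketch below applies the same idea to a hypothetical SketchPacket class. It uses an assumed, simplified wire layout loosely modeled on RequestHeader and GetDataRequest: xid, type, the path as length plus UTF-8 bytes, and the watch flag, all behind a 4-byte length prefix. The packet object holds the watcher reference, but only the boolean watch flag reaches the byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical sketch: field order loosely follows RequestHeader + GetDataRequest.
class SketchPacket {
    final int xid;
    final int opType;
    final String path;
    final boolean watch;
    // Kept only in client memory; never written to the wire.
    final Consumer<String> watchRegistration;

    SketchPacket(int xid, int opType, String path, Consumer<String> watchRegistration) {
        this.xid = xid;
        this.opType = opType;
        this.path = path;
        this.watch = watchRegistration != null; // only this flag is sent
        this.watchRegistration = watchRegistration;
    }

    byte[] createBB() throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(body);
        out.writeInt(xid);       // "header": xid
        out.writeInt(opType);    // "header": operation type
        byte[] p = path.getBytes(StandardCharsets.UTF_8);
        out.writeInt(p.length);  // "request": path as length + UTF-8 bytes
        out.write(p);
        out.writeBoolean(watch); // "request": watch flag (1 byte)
        out.flush();

        // Prefix the frame with the body length before it goes to the socket.
        ByteArrayOutputStream frame = new ByteArrayOutputStream();
        DataOutputStream f = new DataOutputStream(frame);
        f.writeInt(body.size());
        body.writeTo(f);
        f.flush();
        return frame.toByteArray();
    }
}
```

Packets with and without a watcher serialize to frames of the same size. They differ only in the final flag byte.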
(4) Step (1) showed that the watcher is temporarily held in a WatchRegistration. After the client sends the request, SendThread.readResponse() receives the server's reply:
```java
class SendThread extends Thread {
    // ...
    void readResponse(ByteBuffer incomingBuffer) throws IOException {
        ByteBufferInputStream bbis = new ByteBufferInputStream(incomingBuffer);
        BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
        ReplyHeader replyHdr = new ReplyHeader();

        replyHdr.deserialize(bbia, "header");
        // ...
        try {
            if (packet.requestHeader.getXid() != replyHdr.getXid()) {
                // ...
            }
            // ...
        } finally {
            finishPacket(packet);
        }
    }
}
```
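The finishPacket method takes the WatchRegistration out of the Packet we built earlier. It then calls p.watchRegistration.register() with the reply's error code, which saves the temporarily held Watcher in ZKWatchManager.
This also shows that ZooKeeper never actually sends the Watcher object to the server. The Watcher stays in the client's local ZKWatchManager.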
```java
private void finishPacket(Packet p) {
    if (p.watchRegistration != null) {
        p.watchRegistration.register(p.replyHeader.getErr());
    }
    // ...
}
```
```java
public void register(int rc) {
    if (shouldAddWatch(rc)) {
        Map<String, Set<Watcher>> watches = getWatches(rc);
        synchronized (watches) {
            Set<Watcher> watchers = watches.get(clientPath);
            if (watchers == null) {
                watchers = new HashSet<Watcher>();
                watches.put(clientPath, watchers);
            }
            watchers.add(watcher);
        }
    }
}
```
In this example, getData uses DataWatchRegistration:
```java
class DataWatchRegistration extends WatchRegistration {
    public DataWatchRegistration(Watcher watcher, String clientPath) {
        super(watcher, clientPath);
    }

    @Override
    protected Map<String, Set<Watcher>> getWatches(int rc) {
        return watchManager.dataWatches;
    }
}
```
The Watcher therefore goes into the dataWatches of the local ZKWatchManager. dataWatches is a Map from data-node path to the set of Watchers registered on that path.
```java
private static class ZKWatchManager implements ClientWatchManager {
    private final Map<String, Set<Watcher>> dataWatches =
            new HashMap<String, Set<Watcher>>();
    // ...
}
```
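getData() uses DataWatchRegistration, but other calls register watches differently. The simplified sketch below is based on our reading of the ZooKeeper 3.x client source. It shows how the reply code passed to register() decides whether a watch is stored and where. The Sketch* class names and the use of Runnable as the watcher type are hypothetical. For exists(), the watch is registered even when the reply is NONODE and goes into existWatches. That is what lets exists() watch a node that has not been created yet.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the client-side watch tables.
class SketchWatchManager {
    final Map<String, Set<Runnable>> dataWatches = new HashMap<>();
    final Map<String, Set<Runnable>> existWatches = new HashMap<>();
}

// Modeled on WatchRegistration: remembers (watcher, path) until the reply arrives.
abstract class SketchWatchRegistration {
    static final int OK = 0;
    static final int NONODE = -101; // int value of KeeperException.Code.NONODE

    protected final SketchWatchManager manager;
    private final Runnable watcher;
    private final String clientPath;

    SketchWatchRegistration(SketchWatchManager manager, Runnable watcher, String clientPath) {
        this.manager = manager;
        this.watcher = watcher;
        this.clientPath = clientPath;
    }

    // Which table receives the watcher, depending on the reply code.
    protected abstract Map<String, Set<Runnable>> getWatches(int rc);

    // By default, keep the watch only if the request succeeded.
    protected boolean shouldAddWatch(int rc) {
        return rc == OK;
    }

    // Called from finishPacket with the reply's error code.
    void register(int rc) {
        if (shouldAddWatch(rc)) {
            Map<String, Set<Runnable>> watches = getWatches(rc);
            synchronized (watches) {
                watches.computeIfAbsent(clientPath, p -> new HashSet<>()).add(watcher);
            }
        }
    }
}

class SketchDataWatchRegistration extends SketchWatchRegistration {
    SketchDataWatchRegistration(SketchWatchManager m, Runnable w, String path) {
        super(m, w, path);
    }

    @Override
    protected Map<String, Set<Runnable>> getWatches(int rc) {
        return manager.dataWatches;
    }
}

// exists() may watch a node that does not exist yet: NONODE also registers,
// and in that case the watcher goes into existWatches instead of dataWatches.
class SketchExistsWatchRegistration extends SketchWatchRegistration {
    SketchExistsWatchRegistration(SketchWatchManager m, Runnable w, String path) {
        super(m, w, path);
    }

    @Override
    protected Map<String, Set<Runnable>> getWatches(int rc) {
        return rc == OK ? manager.dataWatches : manager.existWatches;
    }

    @Override
    protected boolean shouldAddWatch(int rc) {
        return rc == OK || rc == NONODE;
    }
}
```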
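3: Server-Side Watcher Handling
When the server handles a request with the watch flag set, it records the client's ServerCnxn against the node path. When that node later changes, the server fires the corresponding event. The main flow is simple and can be summarized as in the figure below:
The code that fires the event (WatchManager.triggerWatch) is: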
```java
public Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {
    WatchedEvent e = new WatchedEvent(type, KeeperState.SyncConnected, path);
    HashSet<Watcher> watchers;
    synchronized (this) {
        watchers = watchTable.remove(path);
        if (watchers == null || watchers.isEmpty()) {
            if (LOG.isTraceEnabled()) {
                ZooTrace.logTraceMessage(LOG,
                        ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                        "No watchers for " + path);
            }
            return null;
        }
        for (Watcher w : watchers) {
            HashSet<String> paths = watch2Paths.get(w);
            if (paths != null) {
                paths.remove(path);
            }
        }
    }
    for (Watcher w : watchers) {
        if (supress != null && supress.contains(w)) {
            continue;
        }
        w.process(e);
    }
    return watchers;
}
```
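Step 1: Wrap the event type, the path and the notification state (KeeperState.SyncConnected) in a WatchedEvent.
Step 2: Remove the watchers for the path from watchTable, and remove the path from each watcher's entry in watch2Paths. The server removes the watchers rather than just reading them, so a server-side watch also fires only once.
Step 3: Call process() on each watcher that is not in the supress set. On the server, that watcher is the client's ServerCnxn, whose process() method is: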
```java
@Override
synchronized public void process(WatchedEvent event) {
    ReplyHeader h = new ReplyHeader(-1, -1L, 0);
    if (LOG.isTraceEnabled()) {
        ZooTrace.logTraceMessage(LOG, ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                "Deliver event " + event + " to 0x"
                + Long.toHexString(this.sessionId)
                + " through " + this);
    }

    // Convert WatchedEvent to a type that can be sent over the wire
    WatcherEvent e = event.getWrapper();

    sendResponse(h, e, "notification");
}
```
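In this callback, getWrapper() converts the WatchedEvent into a WatcherEvent, which can be serialized for network transport. The reply header is built with xid -1, which marks the message as a notification.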
Finally, sendResponse() sends the notification.
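The sketch below makes the xid = -1 convention concrete. It writes and reads a notification frame with DataOutputStream and DataInputStream. The field order follows the classes above: ReplyHeader xid, zxid and err, then WatcherEvent type, state and path. The byte layout is a simplified assumption, not a byte-exact copy of ZooKeeper's jute encoding. NotificationCodec is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a notification frame: a reply header with xid = -1,
// followed by the event's type, state and path.
final class NotificationCodec {
    static final int NOTIFICATION_XID = -1;

    private NotificationCodec() { }

    // Server side: what sendResponse(h, e, "notification") conceptually writes.
    static byte[] encode(int eventType, int keeperState, String path) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(NOTIFICATION_XID); // ReplyHeader.xid: marks a notification
        out.writeLong(-1L);             // ReplyHeader.zxid
        out.writeInt(0);                // ReplyHeader.err
        out.writeInt(eventType);        // WatcherEvent.type
        out.writeInt(keeperState);      // WatcherEvent.state
        byte[] p = path.getBytes(StandardCharsets.UTF_8);
        out.writeInt(p.length);         // WatcherEvent.path as length + UTF-8 bytes
        out.write(p);
        out.flush();
        return bos.toByteArray();
    }

    // Client side: read the header first, and only treat xid == -1 as an event.
    static String decodePathIfNotification(byte[] frame) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        int xid = in.readInt();
        in.readLong(); // zxid
        in.readInt();  // err
        if (xid != NOTIFICATION_XID) {
            return null; // an ordinary reply, handled elsewhere
        }
        in.readInt(); // type
        in.readInt(); // state
        byte[] p = new byte[in.readInt()];
        in.readFully(p);
        return new String(p, StandardCharsets.UTF_8);
    }
}
```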
That completes the server-side handling of a Watcher. Note that the server runs none of the Watcher's actual business logic. That logic runs on the client.
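A few lines of Java can model the triggerWatch logic shown earlier in this section. This sketch assumes a hypothetical SketchServerWatcher interface, which stands in for Watcher/ServerCnxn, and uses strings for event types. It keeps the two indexes, watchTable and watch2Paths. It shows that the server also removes a watch once it fires and runs the callbacks outside the lock.

```java
import java.util.*;

// Hypothetical stand-in for Watcher; on a real server this is the client's ServerCnxn.
interface SketchServerWatcher {
    void process(String eventType, String path);
}

// Hypothetical sketch of the server-side WatchManager bookkeeping.
class SketchServerWatchManager {
    // path -> watchers registered on it
    private final Map<String, Set<SketchServerWatcher>> watchTable = new HashMap<>();
    // watcher -> paths it watches (reverse index)
    private final Map<SketchServerWatcher, Set<String>> watch2Paths = new HashMap<>();

    synchronized void addWatch(String path, SketchServerWatcher watcher) {
        watchTable.computeIfAbsent(path, p -> new LinkedHashSet<>()).add(watcher);
        watch2Paths.computeIfAbsent(watcher, w -> new HashSet<>()).add(path);
    }

    Set<SketchServerWatcher> triggerWatch(String path, String type,
                                          Set<SketchServerWatcher> supress) {
        Set<SketchServerWatcher> watchers;
        synchronized (this) {
            // Remove, not just read: the watch is consumed by this event.
            watchers = watchTable.remove(path);
            if (watchers == null || watchers.isEmpty()) {
                return null;
            }
            for (SketchServerWatcher w : watchers) {
                Set<String> paths = watch2Paths.get(w);
                if (paths != null) {
                    paths.remove(path);
                }
            }
        }
        // Call back outside the lock, skipping suppressed watchers.
        for (SketchServerWatcher w : watchers) {
            if (supress != null && supress.contains(w)) {
                continue;
            }
            w.process(type, path);
        }
        return watchers;
    }

    synchronized int watchCount() {
        int n = 0;
        for (Set<SketchServerWatcher> s : watchTable.values()) {
            n += s.size();
        }
        return n;
    }
}
```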
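4: Client Callback
After a watcher fires, the server sends a WatcherEvent to the client over the TCP connection that the corresponding ServerCnxn represents. When the client receives it, the client processes it as follows.
As mentioned earlier, the client's SendThread receives event notifications: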
```java
class SendThread extends Thread {
    private long lastPingSentNs;
    private final ClientCnxnSocket clientCnxnSocket;
    private Random r = new Random(System.nanoTime());
    private boolean isFirstConnect = true;

    // receives messages from the server
    void readResponse(ByteBuffer incomingBuffer) throws IOException {
        ByteBufferInputStream bbis = new ByteBufferInputStream(incomingBuffer);
        BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
        ReplyHeader replyHdr = new ReplyHeader();

        replyHdr.deserialize(bbia, "header");
        if (replyHdr.getXid() == -2) {
            // -2 is the xid for pings
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got ping response for sessionid: 0x"
                        + Long.toHexString(sessionId)
                        + " after "
                        + ((System.nanoTime() - lastPingSentNs) / 1000000)
                        + "ms");
            }
            return;
        }
        // ...
        if (replyHdr.getXid() == -1) {
            // -1 means notification
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got notification sessionid:0x"
                        + Long.toHexString(sessionId));
            }
            // deserialize the event
            WatcherEvent event = new WatcherEvent();
            event.deserialize(bbia, "response");

            // convert from a server path to a client path
            if (chrootPath != null) {
                String serverPath = event.getPath();
                // ...
            }

            WatchedEvent we = new WatchedEvent(event);
            eventThread.queueEvent(we);
            return;
        }
        // ...
    }
}
```
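When the header's xid is -1, the client first deserializes the byte stream into a WatcherEvent. If a chroot is set, it converts the server path to a client path. It then rebuilds a WatchedEvent and finally hands it to EventThread by calling eventThread.queueEvent().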
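The path conversion elided in readResponse() matters when the client's connect string has a chroot suffix (for example host:2181/app). The server reports full paths, so the client must strip the chroot prefix before looking up its watchers. Below is a sketch of that conversion with a hypothetical ChrootPaths helper, following our reading of the ZooKeeper 3.x source.

```java
// Hypothetical helper mirroring the server-path -> client-path step in readResponse().
final class ChrootPaths {
    private ChrootPaths() { }

    static String toClientPath(String chrootPath, String serverPath) {
        if (chrootPath == null) {
            return serverPath; // no chroot configured: paths are identical
        }
        if (serverPath.equals(chrootPath)) {
            return "/"; // the chroot node itself is the client's root
        }
        if (serverPath.length() > chrootPath.length()) {
            return serverPath.substring(chrootPath.length()); // strip the prefix
        }
        // Shorter than the chroot should not happen; leave the path unchanged
        // (the real client only logs a warning here, to our understanding).
        return serverPath;
    }
}
```

For example, with chroot /app, the server path /app/config becomes /config, and /app itself becomes /.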
```java
public void queueEvent(WatchedEvent event) {
    if (event.getType() == EventType.None
            && sessionState == event.getState()) {
        return;
    }
    sessionState = event.getState();

    // materialize the watchers based on the event
    WatcherSetEventPair pair = new WatcherSetEventPair(
            watcher.materialize(event.getState(), event.getType(),
                    event.getPath()),
            event);
    // queue the pair (watch set & event) for later processing
    waitingEvents.add(pair);
}
```
EventThread.queueEvent() first fetches the matching Watchers from ZKWatchManager through materialize():
```java
public Set<Watcher> materialize(Watcher.Event.KeeperState state,
                                Watcher.Event.EventType type,
                                String clientPath) {
    // ...
    Set<Watcher> result = new HashSet<Watcher>();

    switch (type) {
    case None:
        // ...
        synchronized (dataWatches) {
            for (Set<Watcher> ws : dataWatches.values()) {
                result.addAll(ws);
            }
            if (clear) {
                dataWatches.clear();
            }
        }
        // ...
        return result;
    case NodeDataChanged:
        // ...
    case NodeCreated:
        synchronized (dataWatches) {
            addTo(dataWatches.remove(clientPath), result);
        }
        synchronized (existWatches) {
            addTo(existWatches.remove(clientPath), result);
        }
        break;
    // ...
    }
    return result;
}
```
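The client callback code shows that the client first determines the event type and then removes the matching Watchers from the corresponding Map. In other words, a Watcher is one-shot. After it fires, the client must register it again to receive further notifications.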
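The sketch below shows this removal and which tables each event type drains. It is a simplified, hypothetical model (SketchZKWatchManager, with Runnable as the watcher type) based on our reading of the ZooKeeper 3.x client. It omits the None case (session state changes) and the default watcher.

```java
import java.util.*;

// Simplified, hypothetical sketch of ZKWatchManager.materialize() for node events.
class SketchZKWatchManager {
    enum EventType { NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged }

    final Map<String, Set<Runnable>> dataWatches = new HashMap<>();
    final Map<String, Set<Runnable>> existWatches = new HashMap<>();
    final Map<String, Set<Runnable>> childWatches = new HashMap<>();

    private static void addTo(Set<Runnable> from, Set<Runnable> to) {
        if (from != null) {
            to.addAll(from);
        }
    }

    // remove() both collects and unregisters: that is what makes watches one-shot.
    Set<Runnable> materialize(EventType type, String clientPath) {
        Set<Runnable> result = new HashSet<>();
        switch (type) {
            case NodeDataChanged:
            case NodeCreated:
                synchronized (dataWatches) { addTo(dataWatches.remove(clientPath), result); }
                synchronized (existWatches) { addTo(existWatches.remove(clientPath), result); }
                break;
            case NodeChildrenChanged:
                synchronized (childWatches) { addTo(childWatches.remove(clientPath), result); }
                break;
            case NodeDeleted:
                synchronized (dataWatches) { addTo(dataWatches.remove(clientPath), result); }
                synchronized (existWatches) { addTo(existWatches.remove(clientPath), result); }
                synchronized (childWatches) { addTo(childWatches.remove(clientPath), result); }
                break;
        }
        return result;
    }
}
```

A data-change event on a path leaves that path's child watches in place. A second identical event finds nothing, because the first one removed the data watch.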
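Once it has the Watchers, queueEvent() pairs them with the event and adds the pair to the waitingEvents queue. waitingEvents is a linked-list-based LinkedBlockingQueue created without a capacity argument, so its bound is only Integer.MAX_VALUE. The run() method of EventThread (not SendThread) keeps taking entries from this queue and processing them.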
```java
@Override
public void run() {
    try {
        isRunning = true;
        while (true) {
            Object event = waitingEvents.take();
            if (event == eventOfDeath) {
                wasKilled = true;
            } else {
                processEvent(event);
            }
            if (wasKilled)
                synchronized (waitingEvents) {
                    if (waitingEvents.isEmpty()) {
                        isRunning = false;
                        break;
                    }
                }
        }
    } catch (InterruptedException e) {
        LOG.error("Event thread exiting due to interruption", e);
    }

    LOG.info("EventThread shut down");
}
```
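A single thread processes these events serially. Take care that one Watcher's long-running processing does not delay the other Watchers waiting in the client's callback queue.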
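Below is a minimal sketch of that single-consumer loop. It assumes a hypothetical SketchEventThread whose events are plain Runnable callbacks. Like the run() method above, it uses an "event of death" sentinel that stops the thread only after the queue has drained. It also shows that a slow callback delays the ones queued behind it.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of EventThread's single-consumer loop.
class SketchEventThread extends Thread {
    // Sentinel that asks the loop to stop once the queue is drained.
    private static final Object EVENT_OF_DEATH = new Object();
    // No capacity given, so the queue is bounded only by Integer.MAX_VALUE.
    private final LinkedBlockingQueue<Object> waitingEvents = new LinkedBlockingQueue<>();

    void queueEvent(Runnable callback) {
        waitingEvents.add(callback);
    }

    void queueEventOfDeath() {
        waitingEvents.add(EVENT_OF_DEATH);
    }

    @Override
    public void run() {
        boolean wasKilled = false;
        try {
            while (true) {
                Object event = waitingEvents.take();
                if (event == EVENT_OF_DEATH) {
                    wasKilled = true;
                } else {
                    // Callbacks run one at a time on this thread: a slow one delays the rest.
                    ((Runnable) event).run();
                }
                if (wasKilled && waitingEvents.isEmpty()) {
                    break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If a callback has slow work to do, one common approach is to hand that work from inside process() to a separate executor. That keeps the event thread free for the next notification.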
That completes the whole Watcher flow: the client's request, the server's handling, and the client's callback.
5: Watcher Example
```java
package com.travelsky.pss.react.zookeeper.watcher;

import java.io.IOException;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public final class WatcherExample implements Watcher {

    // released once the session is connected
    private static CountDownLatch latch = new CountDownLatch(1);
    // released once a child-change notification has been handled
    private static CountDownLatch countDownLatch = new CountDownLatch(1);
    private static ZooKeeper zk = null;

    public static void main(String[] args) throws IOException, InterruptedException {
        WatcherExample.zkInit();
        countDownLatch.await();
    }

    @Override
    public void process(WatchedEvent event) {
        if (KeeperState.SyncConnected == event.getState()) {
            if (event.getType() == EventType.None && null == event.getPath()) {
                // connection established
                latch.countDown();
            } else if (event.getType() == EventType.NodeChildrenChanged) {
                try {
                    System.out.println("Children changed!");
                    // watches are one-shot: passing true re-registers the default watcher
                    List<String> list = zk.getChildren(event.getPath(), true);
                    for (final String node : list) {
                        System.out.println("Child after change: " + node);
                    }
                    countDownLatch.countDown();
                } catch (KeeperException e) {
                    e.printStackTrace();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }

    private static void zkInit() {
        try {
            zk = new ZooKeeper("ip:2181", 5000, new WatcherExample());
            latch.await();
            zk.create("/NodeWatcher", "nodeWatcher".getBytes(),
                    Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            zk.create("/NodeWatcher/ChildWatcher", "ChildWatcher".getBytes(),
                    Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            // getChildren(path, true) sets a child watch using the default watcher (this class)
            System.out.println("Children of /NodeWatcher: " + zk.getChildren("/NodeWatcher", true));
            zk.create("/NodeWatcher/ChildWatcher2", "ChildWatcher2".getBytes(),
                    Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (KeeperException e) {
            e.printStackTrace();
        }
    }
}
```

The output is:

```
Children of /NodeWatcher: [ChildWatcher]
Children changed!
Child after change: ChildWatcher2
Child after change: ChildWatcher
```

When the children of the watched node change, the ZooKeeper client receives the corresponding notification.