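This post discusses how to design a distributed queue on top of ZooKeeper.
A distributed queue generally has to address two questions:
- How tasks enter the queue
- How to guarantee that each task is consumed by only one consumer
To begin with, ZooKeeper itself offers a recipe for queues, although the official documentation describes it only briefly: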
Distributed queues are a common data structure. To implement a distributed queue in ZooKeeper, first designate a znode to hold the queue, the queue node. The distributed clients put something into the queue by calling create() with a pathname ending in “queue-“, with the sequence and ephemeral flags in the create() call set to true. Because the sequence flag is set, the new pathnames will have the form path-to-queue-node/queue-X, where X is a monotonic increasing number. A client that wants to be removed from the queue calls ZooKeeper’s getChildren( ) function, with watch set to true on the queue node, and begins processing nodes with the lowest number. The client does not need to issue another getChildren( ) until it exhausts the list obtained from the first getChildren( ) call. If there are are no children in the queue node, the reader waits for a watch notification to check the queue again.
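As the quote shows, if a znode is created with a name ending in queue- and with the sequence and ephemeral flags enabled, it is automatically named queue-X, where X is a monotonically increasing counter that never repeats within the queue node. The exact prefix does not matter. What produces the numbered name is the sequence flag (the ephemeral flag only ties the znode's lifetime to the session that created it), but by convention the prefix ends with -.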
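To make the naming rule concrete, here is a small sketch that does not talk to ZooKeeper at all. It assumes ZooKeeper's documented behaviour of appending the counter as a zero-padded 10-digit number; the class and method names (SequentialNames, nameFor, inQueueOrder) are hypothetical and exist only for this illustration. It shows why a plain string sort of the child names, as the consumer further down does, yields creation order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SequentialNames {
    // Mimics the naming rule: the prefix followed by a zero-padded 10-digit counter.
    static String nameFor(String prefix, int counter) {
        return String.format("%s%010d", prefix, counter);
    }

    // Returns a copy of the child names in plain string order.
    static List<String> inQueueOrder(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> children = new ArrayList<>();
        children.add(nameFor("queue-", 10));
        children.add(nameFor("queue-", 2));
        children.add(nameFor("queue-", 1));
        // prints [queue-0000000001, queue-0000000002, queue-0000000010]
        System.out.println(inQueueOrder(children));
    }
}
```

Without the zero padding, "queue-10" would sort before "queue-2", so the padding is what lets lexicographic order stand in for numeric order.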
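This takes care of how producers add tasks to the queue. But how do we guarantee that each task is consumed by only one consumer? In ZooKeeper, reads and deletes are atomic operations, so if a consumer can read a znode in the queue and then successfully delete it, that consumer has claimed the current task and can go ahead and process it.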
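The claim-by-delete idea can be sketched without a ZooKeeper server. The example below is an in-memory stand-in, not the ZooKeeper API: a ConcurrentHashMap plays the role of the queue node's children, and its atomic remove() plays the role of delete(). ClaimByDelete, tryClaim, and race are hypothetical names made up for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class ClaimByDelete {
    // In-memory stand-in for the children of the queue node: name -> payload.
    private final ConcurrentMap<String, String> children = new ConcurrentHashMap<>();

    void put(String name, String payload) {
        children.put(name, payload);
    }

    // Returns the payload if this caller removed the entry, or null if it was already gone.
    String tryClaim(String name) {
        return children.remove(name);
    }

    // Lets several threads compete for one task and returns how many of them claimed it.
    static int race(int consumers) {
        ClaimByDelete queue = new ClaimByDelete();
        queue.put("queue-0000000001", "job content...");
        AtomicInteger winners = new AtomicInteger();
        Thread[] threads = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            threads[i] = new Thread(() -> {
                if (queue.tryClaim("queue-0000000001") != null) {
                    winners.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("winners = " + race(5)); // prints winners = 1
    }
}
```

However many threads compete, only one removal can succeed, which is the same property the ZooKeeper-based consumer relies on when it deletes a znode.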
Here is a simple implementation. First, the producer:
package kite.zookeeper_demo.queue;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkQueueProducer implements Runnable {
    private ZooKeeper zooKeeper;
    private String znode;

    public ZkQueueProducer(ZooKeeper zooKeeper, String znode) {
        this.zooKeeper = zooKeeper;
        this.znode = znode;
    }

    @Override
    public void run() {
        try {
            // Enqueue 200 tasks, one every 500 ms.
            for (int i = 0; i < 200; i++) {
                // The sequential flag makes ZooKeeper append the counter to "queue-".
                zooKeeper.create(znode + "/queue-", "job content...".getBytes(),
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
                Thread.sleep(500);
            }
        } catch (KeeperException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
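Note two things: (1) the znode name ends with "queue-", and (2) the create mode is CreateMode.EPHEMERAL_SEQUENTIAL. Together they give every new znode an automatically generated name of the form queue-X.
Next, the consumer implementation: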
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Collectors;

public class ZkQueueConsumer implements Watcher {
    private ZooKeeper zooKeeper;
    private String znode;
    private String consumerName;
    private List<String> childrenList;
    ReentrantLock reentrantLock = new ReentrantLock();

    public ZkQueueConsumer(ZooKeeper zooKeeper, String znode, String consumerName) throws KeeperException, InterruptedException {
        this.zooKeeper = zooKeeper;
        this.znode = znode;
        this.consumerName = consumerName;
        // Read the current children and leave a watch on the queue node.
        childrenList = zooKeeper.getChildren(znode, this);
        consumeChildrenList();
    }

    @Override
    public void process(WatchedEvent watchedEvent) {
        if (watchedEvent.getType().equals(Event.EventType.NodeChildrenChanged)) {
            try {
                // Take the lock before the inner try so unlock() only runs if we hold it.
                reentrantLock.lockInterruptibly();
                try {
                    // Watches fire once, so re-register while re-reading the children.
                    childrenList = zooKeeper.getChildren(znode, this);
                    consumeChildrenList();
                } finally {
                    reentrantLock.unlock();
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (KeeperException e) {
                e.printStackTrace();
            }
        }
    }

    private void consumeChildrenList() throws InterruptedException {
        System.out.println(consumerName + " begin consume.children list size = " + childrenList.size());
        // Sequence numbers are zero-padded, so string order is creation order.
        childrenList = childrenList.stream().sorted().collect(Collectors.toList());
        for (String child : childrenList) {
            try {
                String path = znode + "/" + child;
                if (zooKeeper.getData(path, false, null) == null) {
                    System.out.println(consumerName + " get empty item");
                    continue;
                }
                // exists() returns null if another consumer deleted the znode in the meantime.
                Stat stat = zooKeeper.exists(path, false);
                if (stat == null) {
                    continue;
                }
                // Only the consumer whose delete succeeds owns the task.
                zooKeeper.delete(path, stat.getVersion());
                Thread.sleep(200); // simulates processing the task
                System.out.println(consumerName + " consumed:" + child);
            } catch (KeeperException e) {
                // For example NoNodeException when another consumer got there first.
                e.printStackTrace();
            }
        }
        System.out.println(consumerName + " set a new watch");
    }
}
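If zooKeeper.getData(path, false, null) returns data and the following zooKeeper.delete(path, version) call does not throw, the current consumer has successfully claimed the task: a znode can be deleted only once, and a second delete of the same znode throws an exception (KeeperException.NoNodeException). This guarantees that a task is consumed only once. In practice things went better than I expected: during my tests this exception was never thrown, and the "get empty item" message was never printed either. I put this down to ZooKeeper being very fast, although the test setup likely contributes too: all five consumers share one ZooKeeper client, whose watch callbacks are delivered one at a time on a single event thread, and the producer adds only one task every 500 ms. Note also that getData throws NoNodeException for a missing znode rather than returning null, so the "get empty item" branch is only reached for a znode created without data.
Finally, the main method connects to ZooKeeper and starts the producer and the consumers.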
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ZkQueue {
    static ExecutorService threadpool = Executors.newCachedThreadPool();

    public static void main(String[] args) throws IOException, InterruptedException, KeeperException {
        // The queue node must already exist on the server.
        String znode = "/test/my_queue";
        ZooKeeper zooKeeper = ZookeeperFactory.connect();
        threadpool.submit(new ZkQueueProducer(zooKeeper, znode));
        // Keep references to the five consumers, which all share one client.
        List<ZkQueueConsumer> consumerList = new LinkedList<>();
        for (int i = 0; i < 5; i++) {
            String consumerName = "consumer-" + i;
            consumerList.add(new ZkQueueConsumer(zooKeeper, znode, consumerName));
        }
        // Block the main thread so the process keeps running.
        Thread.currentThread().join();
    }
}
The code above uses a ZookeeperFactory helper, shown below:
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

public class ZookeeperFactory {
    private static String connStr = "localhost:2181";

    public static ZooKeeper connect() throws InterruptedException, IOException {
        CountDownLatch countDownLatch = new CountDownLatch(1);
        System.out.println("connecting to zookeeper:" + connStr);
        // The constructor returns immediately; the watcher is told when the session is up.
        ZooKeeper zk = new ZooKeeper(connStr, 3000, (WatchedEvent event) -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                countDownLatch.countDown();
            }
        });
        // Block until the SyncConnected event has arrived.
        countDownLatch.await();
        System.out.println("connected to zookeeper!");
        return zk;
    }
}
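The factory blocks on a CountDownLatch until the connection callback fires. The sketch below isolates that pattern with the standard library only: a background thread stands in for ZooKeeper's connection event, and ConnectLatchDemo and awaitConnected are hypothetical names for this illustration. As a variation on the factory above, which waits indefinitely, it uses the timed await() so the caller can give up if the callback never arrives.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ConnectLatchDemo {
    // Waits up to timeoutMs for a simulated "connected" callback that fires after delayMs.
    static boolean awaitConnected(long delayMs, long timeoutMs) {
        CountDownLatch connected = new CountDownLatch(1);
        // This thread plays the role of the client's event callback.
        Thread callback = new Thread(() -> {
            try {
                Thread.sleep(delayMs);
                connected.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        callback.setDaemon(true);
        callback.start();
        try {
            // Returns true if the latch reached zero in time, false on timeout.
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitConnected(10, 2000));  // callback arrives in time
        System.out.println(awaitConnected(2000, 50));  // caller gives up first
    }
}
```

Whether a timeout is appropriate depends on the application; with an unreachable server the untimed await() in the factory would simply never return.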